Sample records for optimal feature subsets

  1. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    NASA Astrophysics Data System (ADS)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2017-04-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selecting a robust feature vector means choosing an optimal subset of features from the large number available in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves classifier performance and decreases the computational burden that many features impose on the classifier. Selecting an optimal subset from a large number of available features is a difficult search problem: for n features, there are 2^n possible subsets, so the problem is NP-hard. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples were selected from mammogram images of the DDSM database, and 50 features extracted from these samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier; a support vector machine is used as the classifier. Experimental results show that the PSO-based and BBO-based algorithms select an optimal subset of features for classifying MCCs as benign or malignant better than the GA-based algorithm.
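
    A minimal sketch of the wrapper scheme this record describes, assuming scikit-learn and generic (X, y) arrays: the fitness of a binary feature mask is the cross-validated accuracy of an SVM restricted to the selected columns, and a simple genetic algorithm evolves the masks. The operators and parameters below are illustrative, not the authors' exact configuration.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)

      def fitness(mask, X, y):
          # Fitness = correct classification rate of an SVM on the selected columns.
          if not mask.any():
              return 0.0
          return cross_val_score(SVC(), X[:, mask], y, cv=5).mean()

      def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05):
          n = X.shape[1]
          pop = rng.random((pop_size, n)) < 0.5            # random initial bit masks
          for _ in range(generations):
              scores = np.array([fitness(m, X, y) for m in pop])
              parents = pop[np.argsort(scores)[::-1][:pop_size // 2]]  # keep fitter half
              cuts = rng.integers(1, n, size=pop_size - len(parents))
              kids = np.array([np.concatenate((parents[i % len(parents), :c],
                                               parents[(i + 1) % len(parents), c:]))
                               for i, c in enumerate(cuts)])          # one-point crossover
              kids ^= rng.random(kids.shape) < p_mut                  # bit-flip mutation
              pop = np.vstack((parents, kids))
          scores = np.array([fitness(m, X, y) for m in pop])
          return pop[scores.argmax()]

    The same mask-based fitness drops unchanged into a PSO or BBO search loop, which is the three-way comparison the record reports.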

  2. Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE.

    PubMed

    Chen, Qi; Meng, Zhaopeng; Liu, Xinyi; Jin, Qianguo; Su, Ran

    2018-06-15

    Feature selection, which identifies a set of the most informative features from the original feature space, has been widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective for reducing data dimension and increasing efficiency. RFE produces a ranking of features, along with candidate subsets and their corresponding accuracies. The subset with the highest accuracy (HA) or a preset number of features (PreNum) is often used as the final subset. However, this may lead to a large number of features being selected, and without prior knowledge of the preset number, final subset selection is often ambiguous and subjective. A proper decision variant is in high demand to automatically determine the optimal subset. In this study, we conduct pioneering work exploring decision variants applied after obtaining the list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants for automatically selecting the optimal feature subset, introducing a random forest-based recursive feature elimination (RF-RFE) algorithm and a voting strategy. We validated the variants on two very different molecular biology datasets, one from a toxicogenomic study and the other from protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
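
    A sketch of the pipeline under scikit-learn assumptions: one RFE fit over a random forest yields the elimination ranking (hence one candidate subset per size), and a plausible decision variant, the smallest subset within one standard deviation of the best mean accuracy, picks the final subset automatically. The paper's voting strategy is not reproduced here.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.feature_selection import RFE
      from sklearn.model_selection import cross_val_score

      def rf_rfe_auto_subset(X, y):
          rf = RandomForestClassifier(n_estimators=100, random_state=0)
          # One RFE fit gives an elimination ranking; ranking <= k is the size-k candidate.
          ranking = RFE(rf, n_features_to_select=1).fit(X, y).ranking_
          stats = []
          for k in range(1, X.shape[1] + 1):
              acc = cross_val_score(rf, X[:, ranking <= k], y, cv=5)
              stats.append((ranking <= k, acc.mean(), acc.std()))
          means = np.array([m for _, m, _ in stats])
          threshold = means.max() - stats[means.argmax()][2]
          for support, mean, _ in stats:      # candidates ordered by increasing size
              if mean >= threshold:           # decision variant: smallest "good enough" subset
                  return support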

  3. Adaptive feature selection using v-shaped binary particle swarm optimization.

    PubMed

    Teng, Xuyang; Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers.
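
    A compact sketch of the core update, with a generic fitness callable standing in for the paper's correlation-information-entropy objective; |tanh(v)| is one common V-shaped transfer function. Under the V-shaped rule a large velocity magnitude flips the corresponding bit, rather than setting it from a sigmoid as conventional BPSO does.

      import numpy as np

      rng = np.random.default_rng(1)

      def v_transfer(v):
          # A common V-shaped transfer function: velocity -> flip probability.
          return np.abs(np.tanh(v))

      def bpso_select(fitness, n_features, n_particles=30, iters=100,
                      w=0.7, c1=1.5, c2=1.5):
          X = rng.random((n_particles, n_features)) < 0.5    # positions are bit masks
          V = rng.uniform(-1, 1, (n_particles, n_features))
          pbest = X.copy()
          pbest_f = np.array([fitness(x) for x in X])
          gbest = pbest[pbest_f.argmax()].copy()
          for _ in range(iters):
              r1, r2 = rng.random(V.shape), rng.random(V.shape)
              V = (w * V + c1 * r1 * (pbest.astype(float) - X)
                         + c2 * r2 * (gbest.astype(float) - X))
              flip = rng.random(V.shape) < v_transfer(V)     # V-shaped rule: flip bits
              X = np.where(flip, ~X, X)
              f = np.array([fitness(x) for x in X])
              better = f > pbest_f
              pbest[better], pbest_f[better] = X[better], f[better]
              gbest = pbest[pbest_f.argmax()].copy()
          return gbest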

  4. Adaptive feature selection using v-shaped binary particle swarm optimization

    PubMed Central

    Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850

  5. Fish swarm intelligent to optimize real time monitoring of chips drying using machine vision

    NASA Astrophysics Data System (ADS)

    Hendrawan, Y.; Hawa, L. C.; Damayanti, R.

    2018-03-01

    This study applies a machine vision-based drying monitoring system to optimise the drying process of cassava chips. The objective is to propose fish swarm intelligent (FSI) optimization algorithms that find the most significant set of image features for predicting the water content of cassava chips during drying with an artificial neural network (ANN) model. Feature selection entails choosing the feature subset that maximizes the prediction accuracy of the ANN. Multi-objective optimization (MOO) was used, consisting of prediction accuracy maximization and feature-subset size minimization. The results showed that the best feature subset comprised grey mean, L(Lab) mean, a(Lab) energy, red entropy, hue contrast, and grey homogeneity. This subset was tested successfully in the ANN model to describe the relationship between image features and water content during drying, with an R2 between real and predicted data of 0.9.

  6. Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms.

    PubMed

    Lin, Kuan-Cheng; Hsieh, Yi-Hsiu

    2015-10-01

    The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a feature subset selection problem, i.e., a combinatorial optimization problem. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to optimization problems in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms, in conjunction with a support vector machine (SVM), for the selection of optimal feature subsets for the classification of datasets. Experiments on specific UCI medical datasets demonstrate that the classification accuracy of the proposed hybrid evolutionary algorithm is superior to that of the basic PSO, EPSO and ABC algorithms, using subsets with a reduced number of features.

  7. Inline Measurement of Particle Concentrations in Multicomponent Suspensions using Ultrasonic Sensor and Least Squares Support Vector Machines.

    PubMed

    Zhan, Xiaobin; Jiang, Shulan; Yang, Yili; Liang, Jian; Shi, Tielin; Li, Xiwen

    2015-09-18

    This paper proposes an ultrasonic measurement system based on least squares support vector machines (LS-SVM) for inline measurement of particle concentrations in multicomponent suspensions. Firstly, the ultrasonic signals are analyzed and processed, and the optimal feature subset that contributes the best model performance is selected based on feature importance. Secondly, the LS-SVM model is tuned, trained and tested with different feature subsets to obtain the optimal model. In addition, a comparison is made between the partial least squares (PLS) model and the LS-SVM model. Finally, the optimal LS-SVM model with the optimal feature subset is applied to inline measurement of particle concentrations in the mixing process. The results show that the proposed method is reliable and accurate for inline measurement of particle concentrations in multicomponent suspensions, with accuracy sufficiently high for industrial application. Furthermore, the method is applicable to dynamic modeling of nonlinear systems and provides a feasible way to monitor industrial processes.

  8. The Fisher-Markov selector: fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data.

    PubMed

    Cheng, Qiang; Zhou, Hongbo; Cheng, Jie

    2011-06-01

    Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can obtain only a local optimum instead of the global optimum. Toward efficiently selecting the globally optimal subset of features, we introduce a new selector, which we call the Fisher-Markov selector, to identify those features that are most useful in describing essential differences among the possible groups. In particular, we present a way to represent essential discriminating characteristics together with sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach to optimizing the measures so as to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including a mid-dimensional optical handwritten digit dataset and high-dimensional microarray gene-expression datasets. The effectiveness of our method is confirmed by experimental results. From a pattern recognition and model selection viewpoint, our procedure shows that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function, which in fact can be obtained with an explicit expression.
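
    The Markov random field machinery is beyond a short sketch, but the per-feature Fisher criterion underlying the "maximally separable" objective is easy to state. This assumes a numpy matrix X (samples by features) and integer class labels y:

      import numpy as np

      def fisher_scores(X, y):
          # Per-feature Fisher criterion: between-class scatter over within-class scatter.
          # Higher scores mark features that better separate the classes.
          mu = X.mean(axis=0)
          between = np.zeros(X.shape[1])
          within = np.zeros(X.shape[1])
          for c in np.unique(y):
              Xc = X[y == c]
              between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
              within += len(Xc) * Xc.var(axis=0)
          return between / (within + 1e-12)   # guard against zero variance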

  9. Toward optimal feature and time segment selection by divergence method for EEG signals classification.

    PubMed

    Wang, Jie; Feng, Zuren; Lu, Na; Luo, Jing

    2018-06-01

    Feature selection plays an important role in motor imagery pattern classification based on EEG signals. It is a process that aims to select an optimal feature subset from the original set, with two significant advantages: lowering the computational burden so as to speed up the learning procedure, and removing redundant and irrelevant features so as to improve classification performance. Feature selection is therefore widely employed in the classification of EEG signals in practical brain-computer interface systems. In this paper, we present a novel statistical model that selects the optimal feature subset based on the Kullback-Leibler divergence measure and automatically selects the optimal subject-specific time segment. The proposed method comprises four successive stages: broad frequency band filtering and common spatial pattern enhancement as preprocessing; feature extraction by autoregressive models and log-variance; Kullback-Leibler divergence based optimal feature and time segment selection; and linear discriminant analysis classification. More importantly, this paper provides a potential framework for combining other feature extraction models and classification algorithms with the proposed method for EEG signal classification. Experiments on single-trial EEG signals from two public competition datasets not only demonstrate that the proposed method is effective in selecting discriminative features and time segments, but also show that it yields relatively better classification results than other competitive methods.
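
    As a hedged illustration of divergence-based feature scoring (not the authors' exact statistical model), the closed-form Kullback-Leibler divergence between per-class Gaussian fits can rank features extracted from a candidate time segment; binary labels 0/1 are assumed:

      import numpy as np

      def gaussian_kl(m1, v1, m2, v2):
          # KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians.
          return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

      def kl_feature_scores(X, y):
          # Symmetric KL divergence between the two class-conditional Gaussian
          # fits of each feature; larger values suggest more discriminative features.
          X0, X1 = X[y == 0], X[y == 1]
          m0, v0 = X0.mean(axis=0), X0.var(axis=0) + 1e-12
          m1, v1 = X1.mean(axis=0), X1.var(axis=0) + 1e-12
          return gaussian_kl(m0, v0, m1, v1) + gaussian_kl(m1, v1, m0, v0)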

  10. Decontaminate feature for tracking: adaptive tracking via evolutionary feature subset

    NASA Astrophysics Data System (ADS)

    Liu, Qiaoyuan; Wang, Yuru; Yin, Minghao; Ren, Jinchang; Li, Ruizhi

    2017-11-01

    Although various visual tracking algorithms have been proposed in the last two to three decades, effective tracking under fast motion, deformation, occlusion, etc. remains a challenging problem. Under complex tracking conditions, most tracking models are not sufficiently discriminative and adaptive. When combined feature vectors are input to the visual model, redundancy may cause low efficiency and ambiguity may cause poor performance. An effective tracking algorithm is proposed that decontaminates features for each video sequence adaptively, treating visual modeling as an optimization problem from the perspective of evolution. Each feature vector is regarded as a biological individual and then decontaminated via classical evolutionary algorithms. With the optimized subsets of features, the "curse of dimensionality" is avoided while the accuracy of the visual model is improved. The proposed algorithm has been tested on several publicly available datasets with various tracking challenges and benchmarked against a number of state-of-the-art approaches. Comprehensive experiments have demonstrated the efficacy of the proposed methodology.

  11. Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes

    NASA Astrophysics Data System (ADS)

    Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie

    2017-03-01

    Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) according to histologic subtype has been well documented over the past decade. In parallel, quantitative image biomarkers have recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADC and 110 SqCC patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers: support vector machines with radial basis function kernel (RBF-SVM), random forest (RF), K-nearest neighbor (KNN), and RUSBoost. To improve classifier performance, an optimal feature subset is selected from the original feature set using an iterative forward inclusion and backward elimination algorithm. Extensive experimental results demonstrate that radiomics features achieve encouraging classification results on both the complete feature set (AUC = 0.89) and the optimal feature subset (AUC = 0.91).
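
    A sketch of a generic iterative forward-inclusion/backward-elimination loop of the kind the record describes, assuming scikit-learn and binary labels; the criterion here is cross-validated AUC of an RBF-SVM, whereas the authors' exact classifier pairing and stopping rule may differ.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      def auc(cols, X, y):
          # Criterion: cross-validated AUC on the candidate columns (binary y assumed).
          return cross_val_score(SVC(), X[:, sorted(cols)], y,
                                 cv=5, scoring="roc_auc").mean()

      def forward_backward(X, y):
          selected, best = set(), -np.inf
          improved = True
          while improved:
              improved = False
              for j in set(range(X.shape[1])) - selected:   # forward inclusion
                  s = auc(selected | {j}, X, y)
                  if s > best:
                      best, add, improved = s, j, True
              if improved:
                  selected.add(add)
              for j in list(selected):                      # backward elimination
                  if len(selected) > 1:
                      s = auc(selected - {j}, X, y)
                      if s > best:
                          selected.remove(j)
                          best, improved = s, True
          return sorted(selected), best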

  12. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    NASA Astrophysics Data System (ADS)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem in dermatology, and the disease is difficult to detect; it is also a major cause of skin cancer. Data mining in the medical field helps experts diagnose precisely, accurately, and inexpensively. In this research, we use data mining techniques to develop a diagnosis model based on a multiclass SVM with a novel hybrid feature selection method. Our hybrid method, named ChiGA (Chi Square and Genetic Algorithm), combines the advantages of filter and wrapper methods to select the optimal feature subset from the original features: chi square is used as the filter method to remove redundant features, and a GA is used as the wrapper method to select the ideal feature subset, with an SVM as the classifier. Experiments were performed with 10-fold cross validation on the erythemato-squamous diseases dataset from the University of California Irvine (UCI) machine learning repository. The results show that the proposed multiclass SVM model with Chi Square and GA yields an optimum feature subset: 18 optimum features with 99.18% accuracy.
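
    The filter-then-wrapper pipeline is easy to sketch with scikit-learn. The chi-square filter stage is shown concretely below on a stand-in dataset (chi2 needs non-negative features); the GA wrapper stage would then search subsets of the filtered columns, as in the GA sketch after record 1 above.

      from sklearn.datasets import load_iris   # stand-in for the UCI dermatology data
      from sklearn.feature_selection import SelectKBest, chi2
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      X, y = load_iris(return_X_y=True)
      X_filtered = SelectKBest(chi2, k=2).fit_transform(X, y)   # filter stage
      # Wrapper stage (GA over the filtered columns) omitted; here the filtered
      # subset is checked directly with the SVM classifier and 10-fold CV.
      print(cross_val_score(SVC(), X_filtered, y, cv=10).mean())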

  13. Feature selection with harmony search.

    PubMed

    Diao, Ren; Shen, Qiang

    2012-12-01

    Many search strategies have been exploited for the task of feature selection (FS), in an effort to identify more compact and better quality subsets. Such work typically involves the use of greedy hill climbing (HC), or nature-inspired heuristics, in order to discover the optimal solution without going through exhaustive search. In this paper, a novel FS approach based on harmony search (HS) is presented. It is a general approach that can be used in conjunction with many subset evaluation techniques. The simplicity of HS is exploited to reduce the overall complexity of the search process. The proposed approach is able to escape from local solutions and identify multiple solutions owing to the stochastic nature of HS. Additional parameter control schemes are introduced to reduce the effort and impact of parameter configuration. These can be further combined with the iterative refinement strategy, tailored to enforce the discovery of quality subsets. The resulting approach is compared with those that rely on HC, genetic algorithms, and particle swarm optimization, accompanied by in-depth studies of the suggested improvements.
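
    A minimal binary harmony search for feature selection, given a generic fitness callable over bit masks; the HMCR and PAR values are illustrative, and the parameter-control and iterative-refinement schemes the paper adds are omitted.

      import numpy as np

      rng = np.random.default_rng(2)

      def harmony_search_fs(fitness, n_features, hms=20, hmcr=0.9, par=0.3, iters=500):
          # hmcr: probability of drawing a bit from memory; par: pitch-adjust (flip) rate.
          memory = rng.random((hms, n_features)) < 0.5
          scores = np.array([fitness(h) for h in memory])
          for _ in range(iters):
              new = np.empty(n_features, dtype=bool)
              for j in range(n_features):
                  if rng.random() < hmcr:
                      new[j] = memory[rng.integers(hms), j]   # memory consideration
                      if rng.random() < par:
                          new[j] = ~new[j]                    # pitch adjustment: flip
                  else:
                      new[j] = rng.random() < 0.5             # random consideration
              s = fitness(new)
              worst = scores.argmin()
              if s > scores[worst]:                           # replace the worst harmony
                  memory[worst], scores[worst] = new, s
          return memory[scores.argmax()]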

  14. Enhancing the Discrimination Ability of a Gas Sensor Array Based on a Novel Feature Selection and Fusion Framework.

    PubMed

    Deng, Changjian; Lv, Kun; Shi, Debo; Yang, Bo; Yu, Song; He, Zhiyi; Yan, Jia

    2018-06-12

    In this paper, a novel feature selection and fusion framework is proposed to enhance the discrimination ability of gas sensor arrays for odor identification. Firstly, we put forward an efficient feature selection method based on separability and dissimilarity, which determines the selection order for each type of feature as the dimension of the selected feature subset increases. Secondly, the K-nearest neighbor (KNN) classifier is applied to determine the dimensions of the optimal feature subsets for the different types of features. Finally, in establishing the feature fusion, we propose a classification-dominance feature fusion strategy built on an effective basic feature. Experimental results on two datasets show recognition rates of 97.5% on Database I and 80.11% on Database II when k = 1 for the KNN classifier and the distance metric is correlation distance (COR), demonstrating the superiority of the proposed feature selection and fusion framework in representing signal features. The proposed feature selection method can effectively select feature subsets that are conducive to classification, while the feature fusion framework can fuse various features describing different characteristics of the sensor signals, enhancing the discrimination ability of gas sensors and, to a certain extent, suppressing the drift effect.

  15. A global optimization approach to multi-polarity sentiment analysis.

    PubMed

    Li, Xinmiao; Li, Jing; Wu, Yukeng

    2015-01-01

    Following the rapid development of social media, sentiment analysis has become an important social media mining technique. The performance of automatic sentiment analysis primarily depends on feature selection and sentiment classification. While information gain (IG) and support vector machines (SVM) are two important techniques, few studies have optimized both approaches in sentiment analysis, and the effectiveness of applying a global optimization approach to sentiment analysis remains unclear. We propose a global optimization-based sentiment analysis (PSOGO-Senti) approach that improves sentiment analysis with IG for feature selection and SVM as the learning engine. The PSOGO-Senti approach utilizes a particle swarm optimization algorithm to obtain a globally optimal combination of feature dimensions and SVM parameters. We evaluate the PSOGO-Senti model on two datasets from different fields. The experimental results showed that the PSOGO-Senti model can improve binary and multi-polarity Chinese sentiment analysis. We compared the optimal feature subset selected by PSOGO-Senti with the features in the sentiment dictionary; this comparison indicated that PSOGO-Senti can effectively remove redundant and noisy features and select a domain-specific feature subset with higher explanatory power for a particular sentiment analysis task. The experimental results showed that the PSOGO-Senti approach is effective and robust for sentiment analysis tasks in different domains. Comparing the improvements on two-polarity, three-polarity and five-polarity sentiment analysis, we found that five-polarity analysis gained the largest improvement and two-polarity analysis the smallest, so PSOGO-Senti achieves greater improvement on more complicated sentiment analysis tasks. We also compared PSOGO-Senti with the genetic algorithm (GA) and grid search method; the results indicate that PSOGO-Senti is more suitable for improving difficult multi-polarity sentiment analysis problems.

  16. The depth estimation of 3D face from single 2D picture based on manifold learning constraints

    NASA Astrophysics Data System (ADS)

    Li, Xia; Yang, Yang; Xiong, Hailiang; Liu, Yunxia

    2018-04-01

    Depth estimation is vitally important in 3D face reconstruction. In this paper, we propose a t-SNE approach based on manifold learning constraints and introduce the K-means method to divide the original database into several subsets; estimating the 3D face depth information only from the selected optimal subset greatly reduces the computational complexity. Firstly, t-SNE reduces the key feature points in each 3D face model from 1×249 to 1×2. Secondly, K-means divides the training 3D database into several subsets. Thirdly, the Euclidean distance between the 83 feature points of the image to be estimated and the pre-reduction feature point information of each cluster center is calculated, and the category of the image is judged by the minimum Euclidean distance. Finally, the method of Kong D is applied only within the optimal subset to estimate the depth values of the 83 feature points of the 2D face image, so the computational complexity of the final depth estimation is greatly reduced. Compared with the traditional traversal search estimation method, the error rate is reduced by 0.49, and the number of searches decreases with the category. To validate our approach, we use a public database to mimic the task of estimating face depth from 2D images; the average number of searches decreased by 83.19%.
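
    A sketch of the partitioning stage under stated assumptions (hypothetical array shapes, scikit-learn): t-SNE embeds each model's 249-dimensional key-point vector in 2-D, K-means clusters the embeddings, and a query is matched against original-space cluster representatives, mirroring the paper's use of pre-reduction feature information at the cluster centers (t-SNE itself cannot embed unseen points).

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.manifold import TSNE

      rng = np.random.default_rng(3)
      models = rng.random((500, 249))     # hypothetical 3D-face key-point vectors

      emb = TSNE(n_components=2, random_state=0).fit_transform(models)  # 249-D -> 2-D
      km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(emb)     # partition database

      # Represent each cluster by the original-space mean of its members and
      # assign a query to the nearest center; depth estimation then runs only
      # inside the matched subset.
      centers = np.array([models[km.labels_ == c].mean(axis=0) for c in range(8)])
      query = rng.random(249)             # hypothetical query feature vector
      subset = models[km.labels_ == np.linalg.norm(centers - query, axis=1).argmin()]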

  17. Optimal number of features as a function of sample size for various classification rules.

    PubMed

    Hua, Jianping; Xiong, Zixiang; Lowey, James; Suh, Edward; Dougherty, Edward R

    2005-04-15

    Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study are considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases; these are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification. For the companion website, please visit http://public.tgen.org/tamu/ofs/ (contact: e-dougherty@ee.tamu.edu).
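
    The peaking phenomenon the record maps out can be reproduced in miniature. This hedged simulation (a simple linear Gaussian model, LDA, fixed training size) is far smaller than the paper's massively parallel study but shows the same error-versus-feature-count shape:

      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      rng = np.random.default_rng(4)

      def lda_error(n_train, d, n_test=5000, delta=0.25):
          # Two Gaussian classes whose means differ by delta in every dimension.
          def draw(n):
              y = rng.integers(0, 2, n)
              return rng.normal(0.0, 1.0, (n, d)) + delta * y[:, None], y
          Xtr, ytr = draw(n_train)
          Xte, yte = draw(n_test)
          clf = LinearDiscriminantAnalysis().fit(Xtr, ytr)
          return (clf.predict(Xte) != yte).mean()

      for d in (2, 5, 10, 20, 40):
          errs = [lda_error(n_train=30, d=d) for _ in range(20)]
          print(d, round(float(np.mean(errs)), 3))   # error typically falls, then rises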

  18. Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization

    NASA Technical Reports Server (NTRS)

    Holst, Terry L.

    2004-01-01

    A genetic algorithm approach suitable for solving multi-objective optimization problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding Pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the Pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide Pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.

  19. Better physical activity classification using smartphone acceleration sensor.

    PubMed

    Arif, Muhammad; Bilal, Mohsin; Kattan, Ahmed; Ahamed, S Iqbal

    2014-09-01

    Obesity is becoming one of the most serious problems for the health of the worldwide population. Social interaction on mobile phones and computers via internet-based social networks is one of the major causes of lack of physical activity. For health specialists, it is important to track the physical activities of obese or overweight patients to supervise weight loss. In this study, the acceleration sensor present in a smartphone is used to monitor the physical activity of the user. Six physical activities are classified: Walking, Jogging, Sitting, Standing, Walking upstairs and Walking downstairs. Time-domain features are extracted from the acceleration data recorded by the smartphone during the different activities. The time and space complexity of the whole framework is reduced by optimal feature subset selection and pruning of instances. Classification results for the six physical activities are reported in this paper: using simple time-domain features, 99% classification accuracy is achieved. Furthermore, attribute subset selection is used to remove redundant features and minimize the time complexity of the algorithm; a subset of 30 features produced more than 98% classification accuracy for the six physical activities.
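
    A sketch of the kind of time-domain feature extraction the record relies on, for a numpy window of tri-axial accelerometer samples; the exact feature list the authors used is not reproduced.

      import numpy as np

      def time_domain_features(window):
          # window: (n_samples, 3) tri-axial acceleration for one segment.
          mag = np.linalg.norm(window, axis=1)
          corr = np.corrcoef(window, rowvar=False)
          return np.concatenate([
              window.mean(axis=0),             # per-axis mean
              window.std(axis=0),              # per-axis standard deviation
              [mag.mean(), mag.std()],         # signal-magnitude statistics
              corr[np.triu_indices(3, k=1)],   # pairwise axis correlations
          ])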

  20. A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization.

    PubMed

    Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein

    2016-06-01

    This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter stage using the Fisher criterion, which reduces the initial genes and hence the search space and time complexity. Then, a wrapper approach based on cellular learning automata (CLA) optimized with the ant colony method (ACO) is used to find the set of features that improves classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The features selected in the last phase are evaluated using ROC curves, and the most effective yet smallest feature subset is determined. The classifiers evaluated in the proposed framework are K-nearest neighbor, support vector machine and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy.

  1. A Genetic Algorithm Based Support Vector Machine Model for Blood-Brain Barrier Penetration Prediction

    PubMed Central

    Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2015-01-01

    The blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. The support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and the feature subset selection are the most important factors affecting prediction accuracy. In most studies they are treated as two independent problems, but it has been proven that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize the kernel parameters and feature subset selection for SVM regression, and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models; therefore, optimizing both the SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among these properties, lipophilicity enhances BBB penetration while all the others are negatively correlated with it. PMID:26504797
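
    The record's key point, optimizing kernel parameters and the feature subset simultaneously, comes down to one chromosome encoding both. A hedged sketch with scikit-learn (parameter ranges and the [0, 1] gene encoding are illustrative):

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVR

      def decode(chrom):
          # chrom: real vector in [0, 1]^(2 + n_features).
          C = 10.0 ** (chrom[0] * 4 - 1)        # C in [0.1, 1000]
          gamma = 10.0 ** (chrom[1] * 4 - 3)    # gamma in [0.001, 10]
          return C, gamma, chrom[2:] > 0.5      # remaining genes -> feature mask

      def fitness(chrom, X, y):
          C, gamma, mask = decode(chrom)
          if not mask.any():
              return -np.inf                    # empty subsets are infeasible
          return cross_val_score(SVR(kernel="rbf", C=C, gamma=gamma),
                                 X[:, mask], y, cv=5, scoring="r2").mean()

      # A GA then evolves chromosomes of length 2 + n_features, so the kernel
      # parameters and the feature subset are searched as one joint solution.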

  2. Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor

    PubMed Central

    Alamedine, D.; Khalil, M.; Marque, C.

    2013-01-01

    Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones that are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The last two methods are classifier-based and were tested with three types of classifiers. These methods have allowed us to identify common features that are relevant for contraction classification. PMID:24454536

  3. Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.

    PubMed

    Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao

    2015-04-01

    Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnosis and classification. However, these data are usually redundant and noisy, and only a subset of them presents distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select a good subset of informative genes relevant to classification. In the proposed algorithm, the Fisher-Markov selector is first used to choose a fixed number of genes. Secondly, to make biogeography based optimization suitable for the feature selection problem, a discrete migration model and a discrete mutation model are proposed to balance exploration and exploitation. Discrete biogeography based optimization, which we call DBBO, is then built by integrating the discrete migration and mutation models. Finally, DBBO is used for feature selection with three classifiers, evaluated by 10-fold cross-validation. To show the effectiveness and efficiency of the algorithm, it is tested on four breast cancer benchmark datasets. Compared with the genetic algorithm, particle swarm optimization, the differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better than or at least comparable to previous methods from the literature in terms of the quality of the solutions obtained.

  4. A ℓ2, 1 norm regularized multi-kernel learning for false positive reduction in Lung nodule CAD.

    PubMed

    Cao, Peng; Liu, Xiaoli; Zhang, Jian; Li, Wei; Zhao, Dazhe; Huang, Min; Zaiane, Osmar

    2017-03-01

    The aim of this paper is to describe a novel algorithm for false positive reduction in lung nodule computer-aided detection (CAD). We describe a new CT lung CAD method that aims to detect solid nodules. Specifically, we propose a multi-kernel classifier with an ℓ2,1 norm regularizer for heterogeneous feature fusion and selection at the feature subset level, and design two efficient strategies to optimize the kernel weights in the non-smooth ℓ2,1 regularized multiple kernel learning algorithm. The first optimization algorithm adapts a proximal gradient method for handling the ℓ2,1 norm of the kernel weights, accelerated in the style of FISTA; the second employs an iterative scheme based on an approximate gradient descent method. The results demonstrate that the FISTA-style accelerated proximal descent method is efficient for the ℓ2,1 norm formulation of multiple kernel learning, with a theoretical guarantee on the convergence rate. Moreover, the experimental results demonstrate the effectiveness of the proposed methods in terms of geometric mean (G-mean) and area under the ROC curve (AUC), significantly outperforming the competing methods. The proposed approach exhibits remarkable advantages in both the heterogeneous feature subset fusion and classification phases. Compared with feature-level and decision-level fusion strategies, the proposed ℓ2,1 norm multi-kernel learning algorithm accurately fuses the complementary and heterogeneous feature sets, and automatically prunes irrelevant and redundant feature subsets to form a more discriminative feature set, leading to promising classification performance. Moreover, the proposed algorithm consistently outperforms comparable classification approaches in the literature.
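
    The reusable core of the first solver is the proximal operator of the ℓ2,1 norm: a closed-form group soft-threshold applied to each row (group) of the kernel-weight matrix at every FISTA iteration. A minimal numpy sketch:

      import numpy as np

      def prox_l21(W, t):
          # Proximal operator of t * ||W||_{2,1} with rows as groups: each row
          # is shrunk toward zero by t in Euclidean norm; small rows vanish,
          # which is what prunes irrelevant feature subsets / kernels.
          norms = np.linalg.norm(W, axis=1, keepdims=True)
          return np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12)) * W

      # One FISTA-style step: W = prox_l21(W - step * grad(W), step * lam)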

  5. Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization

    NASA Technical Reports Server (NTRS)

    Holst, Terry L.

    2005-01-01

    A genetic algorithm approach suitable for solving multi-objective problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding Pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the Pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide Pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.

  6. Local Feature Selection for Data Classification.

    PubMed

    Armanfard, Narges; Reilly, James P; Komeili, Majid

    2016-06-01

    Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this paper we propose a novel localized feature selection (LFS) approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary both in membership and size across the sample space. This allows the feature set to optimally adapt to local variations in the sample space. An associated method for measuring the similarities of a query datum to each of the respective classes is also proposed. The proposed method makes no assumptions about the underlying structure of the samples; hence the method is insensitive to the distribution of the data over the sample space. The method is efficiently formulated as a linear programming optimization problem. Furthermore, we demonstrate the method is robust against the over-fitting problem. Experimental results on eleven synthetic and real-world data sets demonstrate the viability of the formulation and the effectiveness of the proposed algorithm. In addition we show several examples where localized feature selection produces better results than a global feature selection method.

  7. Identifying Patients with Atrioventricular Septal Defect in Down Syndrome Populations by Using Self-Normalizing Neural Networks and Feature Selection.

    PubMed

    Pan, Xiaoyong; Hu, Xiaohua; Zhang, Yu Hang; Feng, Kaiyan; Wang, Shao Peng; Chen, Lei; Huang, Tao; Cai, Yu Dong

    2018-04-12

    Atrioventricular septal defect (AVSD) is a clinically significant subtype of congenital heart disease (CHD) that severely influences the health of babies at birth and is associated with Down syndrome (DS). Thus, exploring the differences in functional genes between DS samples with and without AVSD is a critical way to investigate the complex association between AVSD and DS. In this study, we present a computational method to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN). First, each patient was encoded by the copy numbers of probes on chromosome 21. The encoded features were ranked by the reliable Monte Carlo feature selection (MCFS) method to obtain a ranked feature list. Based on this list, we used a two-stage incremental feature selection to construct two series of feature subsets and applied SNNs to build classifiers and identify optimal features. 2737 optimal features were obtained, and the corresponding optimal SNN classifier constructed on them yielded a Matthews correlation coefficient (MCC) of 0.748. For comparison, random forest was also used to build classifiers and uncover optimal features; this method achieved an optimal MCC of 0.582 when the top 132 features were utilized. Finally, we analyzed some key features derived from the optimal features of the SNN and found literature support that further reveals their essential roles.

  8. An enhanced data visualization method for diesel engine malfunction classification using multi-sensor signals.

    PubMed

    Li, Yiqing; Wang, Yu; Zi, Yanyang; Zhang, Mingquan

    2015-10-21

    The various multi-sensor signal features from a diesel engine constitute a complex high-dimensional dataset. The non-linear dimensionality reduction method t-distributed stochastic neighbor embedding (t-SNE) provides an effective way to implement data visualization for complex high-dimensional data. However, irrelevant features can deteriorate the performance of data visualization and thus should be eliminated a priori. This paper proposes a feature subset score based t-SNE (FSS-t-SNE) data visualization method to deal with high-dimensional data collected from multi-sensor signals. In this method, the optimal feature subset is constructed by a feature subset score criterion, and the high-dimensional data are then visualized in 2-dimensional space. Tests on UCI datasets show that FSS-t-SNE can effectively improve classification accuracy. An experiment was performed with a large power marine diesel engine to validate the proposed method for diesel engine malfunction classification, with multi-sensor signals collected by a cylinder vibration sensor and a cylinder pressure sensor. Compared with other conventional data visualization methods, the proposed method shows good visualization performance and high classification accuracy in multi-malfunction classification of a diesel engine.

  9. An Enhanced Data Visualization Method for Diesel Engine Malfunction Classification Using Multi-Sensor Signals

    PubMed Central

    Li, Yiqing; Wang, Yu; Zi, Yanyang; Zhang, Mingquan

    2015-01-01

    The various multi-sensor signal features from a diesel engine constitute a complex high-dimensional dataset. The non-linear dimensionality reduction method t-distributed stochastic neighbor embedding (t-SNE) provides an effective way to implement data visualization for complex high-dimensional data. However, irrelevant features can deteriorate the performance of data visualization and thus should be eliminated a priori. This paper proposes a feature subset score based t-SNE (FSS-t-SNE) data visualization method to deal with high-dimensional data collected from multi-sensor signals. In this method, the optimal feature subset is constructed by a feature subset score criterion, and the high-dimensional data are then visualized in 2-dimensional space. Tests on UCI datasets show that FSS-t-SNE can effectively improve classification accuracy. An experiment was performed with a large power marine diesel engine to validate the proposed method for diesel engine malfunction classification, with multi-sensor signals collected by a cylinder vibration sensor and a cylinder pressure sensor. Compared with other conventional data visualization methods, the proposed method shows good visualization performance and high classification accuracy in multi-malfunction classification of a diesel engine. PMID:26506347

  10. Feature selection for neural network based defect classification of ceramic components using high frequency ultrasound.

    PubMed

    Kesharaju, Manasa; Nagarajah, Romesh

    2015-09-01

    The motivation for this research stems from the need to provide a non-destructive testing method capable of detecting and locating defects and microstructural variations within armour ceramic components before issuing them to the soldiers who rely on them for their survival. The development of an automated ultrasonic inspection based classification system would make possible the checking of each ceramic component and immediately alert the operator to the presence of defects. Generally, in many classification problems the choice of features or dimensionality reduction is significant and simultaneously very difficult, as substantial computational effort is required to evaluate possible feature subsets. In this research, a combination of artificial neural networks and genetic algorithms is used to optimize the feature subset used in classification of various defects in reaction-sintered silicon carbide ceramic components. Initially, wavelet based feature extraction is implemented from the region of interest. An artificial neural network classifier is employed to evaluate the performance of these features, and genetic algorithm based feature selection is performed. Principal component analysis, a popular technique for feature selection, is compared with the genetic algorithm based technique in terms of classification accuracy and selection of the optimal number of features. The experimental results confirm that features identified by principal component analysis lead to improved classification performance (96%) compared with the genetic algorithm (94%).

  11. Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W

    2016-10-01

    This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF) using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of PAF onset is clinically important because it increases the possibility of electrically stabilizing the heart and preventing the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and the classifier parameters are optimized simultaneously using a procedure based on a genetic algorithm (GA); both the full feature set and a statistically significant feature subset are optimized by the GA. For the statistically significant subset, the Mann-Whitney U test is used to filter out features that fail the statistical test at the 20% significance level. The final stage of the predictor is a classifier based on a support vector machine (SVM). With 10-fold cross-validation, the proposed method achieves 79.3% prediction accuracy using 15-minute HRV segments. This accuracy is comparable to that achieved by existing methods using 30-minute HRV segments, most of which achieve around 80%; more importantly, our method significantly outperforms those that applied segments shorter than 30 minutes.
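
    The statistical filtering step is straightforward to sketch with scipy, assuming a feature matrix X and binary PAF labels y: features that fail a Mann-Whitney U test at the 20% significance level are discarded before the GA/SVM stage.

      import numpy as np
      from scipy.stats import mannwhitneyu

      def mwu_filter(X, y, alpha=0.20):
          # Keep features whose PAF and non-PAF distributions differ under a
          # two-sided Mann-Whitney U test at significance level alpha.
          keep = []
          for j in range(X.shape[1]):
              _, p = mannwhitneyu(X[y == 1, j], X[y == 0, j],
                                  alternative="two-sided")
              if p < alpha:
                  keep.append(j)
          return np.array(keep)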

  12. YamiPred: A Novel Evolutionary Method for Predicting Pre-miRNAs and Selecting Relevant Features.

    PubMed

    Kleftogiannis, Dimitrios; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina

    2015-01-01

    MicroRNAs (miRNAs) are small non-coding RNAs which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem, and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameter optimization. YamiPred was tested on a new and realistic human dataset and compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of the geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data also successfully predicted pre-miRNAs in other organisms, including viruses.

  13. Constraint programming based biomarker optimization.

    PubMed

    Zhou, Manli; Luo, Youxi; Sun, Guoquan; Mai, Guoqin; Zhou, Fengfeng

    2015-01-01

    Efficient and intuitive characterization of biological big data is becoming a major challenge for modern bio-OMIC based scientists, and interactive visualization and exploration of big data has proven to be one of the successful solutions. Most existing feature selection algorithms do not allow interactive input from users during the optimization process of feature selection. This study investigates this question by fixing a few user-input features in the final selected feature subset and formulating these user-input features as constraints in a programming model. The proposed algorithm, fsCoP (feature selection based on constrained programming), performs similarly to or much better than the existing feature selection algorithms, even with constraints from both the literature and the existing algorithms. An fsCoP biomarker may be intriguing for further wet-lab validation, since it satisfies both the classification optimization function and biomedical knowledge. fsCoP may also be used for interactive exploration of bio-OMIC big data by interactively adding user-defined constraints for modeling.

  14. Computer-Aided Breast Cancer Diagnosis with Optimal Feature Sets: Reduction Rules and Optimization Techniques.

    PubMed

    Mathieson, Luke; Mendes, Alexandre; Marsden, John; Pond, Jeffrey; Moscato, Pablo

    2017-01-01

    This chapter introduces a new method for knowledge extraction from databases for the purpose of finding a discriminative set of features that is also a robust set for within-class classification. Our method is generic, and we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686-690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified, and then a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: first, a series of extremely powerful reduction techniques, which do not lose the optimal solution, is employed; second, a metaheuristic search identifies the remaining features to be considered or disregarded. Two algorithms were tested with a public domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.

  15. Channel and feature selection in multifunction myoelectric control.

    PubMed

    Khushaba, Rami N; Al-Jumaily, Adel

    2007-01-01

    Real-time control of devices based on myoelectric signals (MES) is a challenging research problem. This paper presents a new approach to reduce the computational cost of real-time systems driven by myoelectric signals (a.k.a. electromyography, EMG). The approach evaluates the significance of feature/channel selection for MES pattern recognition. Particle swarm optimization (PSO), an evolutionary computational technique, is employed to search the feature/channel space for important subsets, which are then evaluated using a multilayer perceptron trained with the back propagation neural network (BPNN) algorithm. Practical results are presented from tests on datasets of MES signals measured noninvasively from six subjects using surface electrodes. It is shown that minimum error rates can be achieved by considering the correct combination of features/channels, providing a feasible system for practical implementation in patient rehabilitation.

  16. An opinion formation based binary optimization approach for feature selection

    NASA Astrophysics Data System (ADS)

    Hamedmoghadam, Homayoun; Jalili, Mahdi; Yu, Xinghuo

    2018-02-01

    This paper proposes a novel optimization method based on opinion formation in complex network systems. The proposed technique mimics the human-human interaction mechanism based on a mathematical model derived from the social sciences. Our method encodes a subset of selected features as the opinion of an artificial agent and simulates the opinion formation process among a population of agents to solve the feature selection problem. The agents interact through an underlying interaction network structure and reach consensus in their opinions while finding better solutions to the problem. A number of mechanisms are employed to avoid getting trapped in local minima. We compare the performance of the proposed method with a number of classical population-based optimization methods and a state-of-the-art opinion formation based method. Our experiments on a number of high-dimensional datasets show that the proposed algorithm outperforms the others.

  17. Particle Swarm Optimization approach to defect detection in armour ceramics.

    PubMed

    Kesharaju, Manasa; Nagarajah, Romesh

    2017-03-01

    In this research, various extracted features were used in the development of an automated ultrasonic sensor based inspection system that enables defect classification in each ceramic component prior to despatch to the field. Classification is an important task, and the large number of irrelevant and redundant features commonly introduced into a dataset reduces classifier performance. Feature selection aims to reduce the dimensionality of the dataset while improving the performance of the classification system. For a multi-criteria optimization problem such as the one discussed in this research (minimizing the classification error rate while reducing the number of features), the literature suggests that evolutionary algorithms offer good results. Moreover, Particle Swarm Optimization (PSO) has been little explored in the classification of high-frequency ultrasonic signals. Hence, a binary-coded Particle Swarm Optimization (BPSO) technique is investigated for feature subset selection and for optimizing the classification error rate. In the proposed method, the population data are used as input to an Artificial Neural Network (ANN) based classification system to obtain the error rate, since the ANN serves as the evaluator of the PSO fitness function.
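
    A minimal sketch of such a binary PSO loop may help make the mechanics concrete. It binarizes velocities with a sigmoid transfer function and, since the paper's ANN architecture is not specified, substitutes a k-NN classifier as the fitness evaluator; the dataset, swarm size, and coefficients are illustrative assumptions, not the authors' settings.

    ```python
    # Illustrative binary PSO for feature-subset selection (not the paper's
    # exact setup): fitness is a classifier's cross-validated error rate.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X, y = load_breast_cancer(return_X_y=True)
    n_particles, n_features, n_iter = 20, X.shape[1], 10

    def fitness(mask):
        sel = mask.astype(bool)
        if not sel.any():
            return 1.0  # penalize empty subsets
        return 1.0 - cross_val_score(KNeighborsClassifier(), X[:, sel], y, cv=3).mean()

    pos = (rng.random((n_particles, n_features)) < 0.5).astype(int)
    vel = rng.normal(size=(n_particles, n_features))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmin()].copy()

    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)  # sigmoid transfer
        fit = np.array([fitness(p) for p in pos])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[pbest_fit.argmin()].copy()

    print("selected features:", int(gbest.sum()), "error rate:", round(pbest_fit.min(), 3))
    ```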

  18. Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.

    PubMed

    Liu, Liang; Cai, Yudong; Lu, Wencong; Feng, Kaiyan; Peng, Chunrong; Niu, Bing

    2009-03-06

    Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection framework, this paper presents a computational system to predict protein-protein interactions (PPIs) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poorly performing and/or redundant features, leaving 103 features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall prediction accuracy of 76.18%, evaluated by a 10-fold cross-validation test, which is 1.46% higher than that obtained with the initial 114 features and 6.51% higher than that obtained with the 20 features coded by amino acid composition. The PPI predictor developed for this research is available for public use at http://chemdata.shu.edu.cn/ppi.
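
    The filter-plus-wrapper idea is straightforward to sketch. The fragment below ranks features by a greedy mRMR criterion (relevance to the class minus mean redundancy with already-selected features, both estimated with mutual information) and then lets a k-NN wrapper keep the best-scoring prefix. The toy dataset, discretization, and subset sizes are assumptions for illustration; the paper's PseAA features and exact mRMR estimator are not reproduced.

    ```python
    # Sketch of an mRMR ranking followed by a k-NN wrapper search.
    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.metrics import mutual_info_score
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)
    n = X.shape[1]
    relevance = mutual_info_classif(X, y, random_state=0)

    # Quartile-discretize each feature so pairwise mutual information is cheap.
    Xd = np.column_stack([np.digitize(X[:, j], np.quantile(X[:, j], [0.25, 0.5, 0.75]))
                          for j in range(n)])

    selected = [int(relevance.argmax())]
    while len(selected) < 8:                      # filter stage: greedy mRMR
        scores = {}
        for i in set(range(n)) - set(selected):
            redundancy = np.mean([mutual_info_score(Xd[:, i], Xd[:, j]) for j in selected])
            scores[i] = relevance[i] - redundancy
        selected.append(max(scores, key=scores.get))

    best_k, best_acc = 1, 0.0                     # wrapper stage: best prefix by CV
    for k in range(1, len(selected) + 1):
        acc = cross_val_score(KNeighborsClassifier(), X[:, selected[:k]], y, cv=5).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    print("optimized subset:", selected[:best_k], "CV accuracy:", round(best_acc, 3))
    ```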

  19. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues.

    PubMed

    Guo, Song; Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one prerequisite is to correctly identify candidate protein tyrosine sulfation residues. In this paper, a novel method was presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicates that the proposed method can be effective in identifying important protein posttranslational modifications, and that the feature selection scheme can be powerful in research on predicting protein functional residues.

  20. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues

    PubMed Central

    Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one prerequisite is to correctly identify candidate protein tyrosine sulfation residues. In this paper, a novel method was presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicates that the proposed method can be effective in identifying important protein posttranslational modifications, and that the feature selection scheme can be powerful in research on predicting protein functional residues. PMID:27034949

  1. Hypergraph Based Feature Selection Technique for Medical Diagnosis.

    PubMed

    Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar

    2016-11-01

    The impact of the internet and information systems across various domains has resulted in the substantial generation of multidimensional datasets. Data mining and knowledge discovery techniques that extract the information contained in these multidimensional datasets play a significant role in exploiting their full benefit. The presence of a large number of features in high-dimensional datasets incurs high computational cost in terms of computing power and time. Hence, feature selection techniques are commonly used to build robust machine learning models by selecting a subset of relevant features that preserves the maximal information content of the original dataset. In this paper, a novel Rough Set based K-Helly feature selection technique (RSKHT), which hybridizes Rough Set Theory (RST) with the K-Helly property of hypergraph representations, is designed to identify the optimal feature subset, or reduct, for medical diagnostic applications. Experiments carried out using medical datasets from the UCI repository demonstrate the superiority of RSKHT over other feature selection techniques with respect to reduct size, classification accuracy, and time complexity. The performance of RSKHT was validated using the WEKA tool, showing that RSKHT is computationally attractive and flexible over massive datasets.

  2. A Hypergraph and Arithmetic Residue-based Probabilistic Neural Network for classification in Intrusion Detection Systems.

    PubMed

    Raman, M R Gauthama; Somu, Nivethitha; Kirthivasan, Kannan; Sriram, V S Shankar

    2017-08-01

    Over the past few decades, the design of an intelligent Intrusion Detection System (IDS) has remained an open challenge to the research community. Continuous efforts by researchers have resulted in the development of several learning models based on Artificial Neural Networks (ANN) to improve the performance of IDSs. However, there exists a tradeoff between the stability of the ANN architecture and the detection rate for less frequent attacks. This paper presents a novel approach based on the Helly property of hypergraphs and an Arithmetic Residue-based Probabilistic Neural Network (HG AR-PNN) to address the classification problem in IDS. The Helly property of the hypergraph is exploited to identify the optimal feature subset, and the arithmetic residue of the optimal feature subset is used to train the PNN. The performance of HG AR-PNN was evaluated using the KDD CUP 1999 intrusion dataset. Experimental results show the superiority of the HG AR-PNN classifier over existing classifiers with respect to stability and an improved detection rate for less frequent attacks.

  3. TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine.

    PubMed

    Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun

    2016-10-01

    As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. The identification of m6A sites in RNAs is significantly important for both basic biomedical research and practical drug development. In this study, we designed a computational method, called TargetM6A, to rapidly and accurately target m6A sites solely from the primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a much more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that the proposed TargetM6A method outperformed the existing methods for predicting m6A sites and remarkably improved the prediction performances, with MCC = 0.526 and AUC = 0.818. We also provide a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
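
    The IFS step lends itself to a compact sketch: rank features once, then score nested prefixes of the ranking and keep the best one. The ranking criterion (ANOVA F-score), dataset, and classifier settings below are illustrative stand-ins, not the paper's PSNP/PSDP pipeline.

    ```python
    # Sketch of an incremental feature selection (IFS) loop: score nested
    # prefixes of a fixed ranking with an SVM and keep the best prefix.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    order = np.argsort(f_classif(X, y)[0])[::-1]   # rank by ANOVA F-score

    scores = []
    for k in range(1, X.shape[1] + 1):
        model = make_pipeline(StandardScaler(), SVC())
        scores.append(cross_val_score(model, X[:, order[:k]], y, cv=5).mean())

    best_k = int(np.argmax(scores)) + 1
    print(f"optimal subset: top {best_k} features, CV accuracy {max(scores):.3f}")
    ```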

  4. Feature Selection in Classification of Eye Movements Using Electrooculography for Activity Recognition

    PubMed Central

    Mala, S.; Latha, K.

    2014-01-01

    Activity recognition is needed in diverse applications, for example, surveillance systems, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. To select a subset of features, Differential Evolution (DE), an efficient evolutionary optimizer, is used to find informative features from eye movements recorded using electrooculography (EOG). Many researchers use EOG signals in human-computer interaction with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness-based features, minimum-redundancy maximum-relevance features, and Differential Evolution based features. This work concentrates on the DE-based feature selection algorithm in order to improve classification for reliable activity recognition. PMID:25574185

  5. Feature selection in classification of eye movements using electrooculography for activity recognition.

    PubMed

    Mala, S; Latha, K

    2014-01-01

    Activity recognition is needed in diverse applications, for example, surveillance systems, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. To select a subset of features, Differential Evolution (DE), an efficient evolutionary optimizer, is used to find informative features from eye movements recorded using electrooculography (EOG). Many researchers use EOG signals in human-computer interaction with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness-based features, minimum-redundancy maximum-relevance features, and Differential Evolution based features. This work concentrates on the DE-based feature selection algorithm in order to improve classification for reliable activity recognition.
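
    A bare-bones DE/rand/1/bin loop for feature selection, shown below, conveys the idea: continuous genomes are thresholded into binary feature masks, and the cross-validated error of a classifier serves as the objective. The dataset, classifier, and DE settings are illustrative assumptions; the study's EOG features and exact DE variant are not reproduced.

    ```python
    # Minimal differential evolution (DE/rand/1/bin) for feature selection:
    # genomes in [0, 1]^D are thresholded at 0.5 to obtain feature masks.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(1)
    X, y = load_digits(return_X_y=True)
    NP, D, F, CR, GEN = 15, X.shape[1], 0.8, 0.9, 10

    def error(genome):
        mask = genome > 0.5
        if not mask.any():
            return 1.0  # penalize empty subsets
        return 1.0 - cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

    pop = rng.random((NP, D))
    fit = np.array([error(g) for g in pop])
    for _ in range(GEN):
        for i in range(NP):
            choices = [j for j in range(NP) if j != i]
            a, b, c = pop[rng.choice(choices, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)    # mutation
            cross = rng.random(D) < CR
            cross[rng.integers(D)] = True                  # keep at least one gene
            trial = np.where(cross, mutant, pop[i])        # binomial crossover
            if (e := error(trial)) <= fit[i]:              # greedy selection
                pop[i], fit[i] = trial, e

    best = pop[fit.argmin()] > 0.5
    print("selected:", int(best.sum()), "features, error:", round(fit.min(), 3))
    ```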

  6. A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets.

    PubMed

    Li, Der-Chiang; Liu, Chiao-Wen; Hu, Susan C

    2011-05-01

    Medical data sets are usually small and have very high dimensionality. Too many attributes make the analysis less efficient without necessarily increasing accuracy, while too few data decrease modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small. This paper proposes a fuzzy-based non-linear transformation method to extend classification-related information from the original attribute values of a small data set. Based on the transformed data set, this study applies principal component analysis (PCA) to extract the optimal subset of features. Finally, the transformed data with these optimal features are used as input to a learning tool, a support vector machine (SVM). Six medical data sets, Pima Indians' diabetes, Wisconsin diagnostic breast cancer, Parkinson disease, echocardiogram, BUPA liver disorders, and bladder cancer cases in Taiwan, are employed to illustrate the approach presented in this paper. This research uses the t-test to evaluate classification accuracy on a single data set, and the Friedman test to show that the proposed method is better than other methods over the multiple data sets. The experimental results indicate that the proposed method has better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and suggest that creating new purpose-related information improves analysis performance. This paper shows that feature extraction is important as a complement to feature selection for efficient data analysis: when the data set is small, using the fuzzy-based transformation method presented in this work to increase the available information produces better results than the PCA and KPCA approaches.

  7. Enhancement web proxy cache performance using Wrapper Feature Selection methods with NB and J48

    NASA Astrophysics Data System (ADS)

    Mahmoud Al-Qudah, Dua'a; Funke Olanrewaju, Rashidah; Wong Azman, Amelia

    2017-11-01

    Web proxy caching reduces response time by storing copies of pages between the client and server sides. If requested pages are cached in the proxy, there is no need to access the server. Because cache is limited in size and costly compared with other storage, a cache replacement algorithm is used to decide which page to evict when the cache is full. Conventional replacement algorithms such as Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), and randomized policies may discard important pages just before they are used. Furthermore, conventional algorithms are hard to optimize well, since intelligently evicting a page requires an informed decision. Hence, most researchers propose integrating intelligent classifiers with the replacement algorithm to improve its performance. This research proposes using automated wrapper feature selection methods to choose the best subset of features that are relevant and influence classifier prediction accuracy. The results show that the wrapper feature selection methods, namely Best First Search (BFS), Incremental Wrapper Subset Selection (IWSS) embedded with NB, and Particle Swarm Optimization (PSO), reduce the number of features and have a good impact on reducing computation time. PSO improves NB classifier accuracy by 1.1%, 0.43%, and 0.22% over NB with all features, BFS, and IWSS embedded NB, respectively. PSO raises J48 accuracy by 0.03%, 1.91%, and 0.04% over J48 with all features, IWSS-embedded NB, and BFS, respectively. IWSS embedded NB speeds up the NB and J48 classifiers much more than BFS and PSO, reducing the computation time of NB by 0.1383 and of J48 by 2.998.

  8. xMSanalyzer: automated pipeline for improved feature detection and downstream analysis of large-scale, non-targeted metabolomics data.

    PubMed

    Uppal, Karan; Soltow, Quinlyn A; Strobel, Frederick H; Pittard, W Stephen; Gernert, Kim M; Yu, Tianwei; Jones, Dean P

    2013-01-16

    Detection of low abundance metabolites is important for de novo mapping of metabolic pathways related to diet, microbiome or environmental exposures. Multiple algorithms are available to extract m/z features from liquid chromatography-mass spectral data in a conservative manner, which tends to preclude detection of low abundance chemicals and chemicals found in small subsets of samples. The present study provides software to enhance such algorithms for feature detection, quality assessment, and annotation. xMSanalyzer is a set of utilities for automated processing of metabolomics data. The utilities can be classified into four main modules to: 1) improve feature detection for replicate analyses by systematic re-extraction with multiple parameter settings and data merger to optimize the balance between sensitivity and reliability, 2) evaluate sample quality and feature consistency, 3) detect feature overlap between datasets, and 4) characterize high-resolution m/z matches to small molecule metabolites and biological pathways using multiple chemical databases. The package was tested with plasma samples and shown to more than double the number of features extracted while improving quantitative reliability of detection. MS/MS analysis of a random subset of peaks that were exclusively detected using xMSanalyzer confirmed that the optimization scheme improves detection of real metabolites. xMSanalyzer is a package of utilities for data extraction, quality control assessment, detection of overlapping and unique metabolites in multiple datasets, and batch annotation of metabolites. The program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.

  9. Hybrid feature selection for supporting lightweight intrusion detection systems

    NASA Astrophysics Data System (ADS)

    Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin

    2017-08-01

    Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. Such features slow down the training and testing process in network traffic classification. Therefore, this paper designs a hybrid feature selection approach that combines wrapper and filter selection to build a lightweight intrusion detection system. The method involves two main phases. The first phase conducts a preliminary search for an optimal subset of features using chi-square feature selection. The set of features selected in the first phase is then refined in the second phase in a wrapper manner, in which a Random Forest (RF) guides the selection process and retains an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. Experimental results on the NSL-KDD datasets show that our approach results in higher detection accuracy as well as faster training and testing.
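
    The two-phase scheme maps naturally onto a scikit-learn pipeline, sketched below with a chi-square filter feeding an RF-guided wrapper (sklearn's RFE standing in for the paper's wrapper search). The dataset and the feature-count choices are illustrative assumptions.

    ```python
    # Hybrid filter + wrapper selection: chi-square pre-filter, then an
    # RF-guided recursive elimination, then an RF detection model.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE, SelectKBest, chi2
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler

    X, y = load_breast_cancer(return_X_y=True)

    pipe = make_pipeline(
        MinMaxScaler(),                       # chi2 requires non-negative inputs
        SelectKBest(chi2, k=20),              # phase 1: filter search
        RFE(RandomForestClassifier(n_estimators=100, random_state=0),
            n_features_to_select=10),         # phase 2: wrapper refinement
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean().round(3))
    ```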

  10. AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

    PubMed

    Sun, Lei; Wang, Jun; Wei, Jinmao

    2017-03-14

    The Receiver Operating Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and identify disease-related genes (features). Existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset because they lack effective means to reduce redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by measuring the distances between misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach based on ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. Experimental results on a broad range of microarray data sets validate that classifiers built on the feature subset selected by our approach achieve a minimal balanced error rate with a small number of significant features. Compared with other ROC-based feature selection approaches, our approach selects fewer features and effectively improves classification performance.

  11. Facial Affect Recognition Using Regularized Discriminant Analysis-Based Algorithms

    NASA Astrophysics Data System (ADS)

    Lee, Chien-Cheng; Huang, Shin-Sheng; Shih, Cheng-Yuan

    2010-12-01

    This paper presents a novel and effective method for facial expression recognition covering happiness, disgust, fear, anger, sadness, surprise, and the neutral state. The proposed method utilizes a regularized discriminant analysis-based boosting algorithm (RDAB) with effective Gabor features to recognize the facial expressions. An entropy criterion is applied to select an effective Gabor feature subset that is informative and nonredundant. The proposed RDAB algorithm uses RDA as the learner in the boosting algorithm. RDA combines the strengths of linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA); it solves the small-sample-size and ill-posed problems suffered by QDA and LDA through a regularization technique. Additionally, this study uses the particle swarm optimization (PSO) algorithm to estimate optimal parameters in RDA. Experimental results demonstrate that our approach can accurately and robustly recognize facial expressions.
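
    For readers unfamiliar with RDA, the regularization alluded to above is usually written (following Friedman's formulation; the paper's exact notation is not reproduced here) as a two-parameter shrinkage of each class covariance matrix:

    ```latex
    % \lambda blends the class covariance \hat{\Sigma}_k with the pooled
    % covariance \hat{\Sigma} (interpolating QDA toward LDA); \gamma then
    % shrinks the result toward a scaled identity, curing singularity when
    % samples are scarce.
    \hat{\Sigma}_k(\lambda) = (1 - \lambda)\,\hat{\Sigma}_k + \lambda\,\hat{\Sigma},
    \qquad
    \hat{\Sigma}_k(\lambda, \gamma) = (1 - \gamma)\,\hat{\Sigma}_k(\lambda)
      + \frac{\gamma}{d}\,\operatorname{tr}\!\left[\hat{\Sigma}_k(\lambda)\right] I_d .
    ```

    In this setting, the role of PSO is to search the (λ, γ) plane for the pair that minimizes validation error.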

  12. Learning in data-limited multimodal scenarios: Scandent decision forests and tree-based features.

    PubMed

    Hor, Soheil; Moradi, Mehdi

    2016-12-01

    Incomplete and inconsistent datasets often pose difficulties in multimodal studies. We introduce the concept of scandent decision trees to tackle these difficulties. Scandent trees are decision trees that optimally mimic the partitioning of the data determined by another decision tree, and crucially, use only a subset of the feature set. We show how scandent trees can be used to enhance the performance of decision forests trained on a small number of multimodal samples when we have access to larger datasets with vastly incomplete feature sets. Additionally, we introduce the concept of tree-based feature transforms in the decision forest paradigm. When combined with scandent trees, the tree-based feature transforms enable us to train a classifier on a rich multimodal dataset, and use it to classify samples with only a subset of features of the training data. Using this methodology, we build a model trained on MRI and PET images of the ADNI dataset, and then test it on cases with only MRI data. We show that this is significantly more effective in staging of cognitive impairments compared to a similar decision forest model trained and tested on MRI only, or one that uses other kinds of feature transform applied to the MRI data.

  13. Consciousness Indexing and Outcome Prediction with Resting-State EEG in Severe Disorders of Consciousness.

    PubMed

    Stefan, Sabina; Schorr, Barbara; Lopez-Rolon, Alex; Kolassa, Iris-Tatjana; Shock, Jonathan P; Rosenfelder, Martin; Heck, Suzette; Bender, Andreas

    2018-04-17

    We applied the following methods to resting-state EEG data from patients with disorders of consciousness (DOC) for consciousness indexing and outcome prediction: microstates, entropy (i.e. approximate, permutation), power in alpha and delta frequency bands, and connectivity (i.e. weighted symbolic mutual information, symbolic transfer entropy, complex network analysis). Patients with unresponsive wakefulness syndrome (UWS) and patients in a minimally conscious state (MCS) were classified into these two categories by fitting and testing a generalised linear model. We subsequently aimed to develop an automated system for outcome prediction in severe DOC by selecting an optimal subset of features using sequential floating forward selection (SFFS). The two outcome categories were defined as UWS or dead, and MCS or emerged from MCS. Percentage of time spent in microstate D in the alpha frequency band performed best at distinguishing MCS from UWS patients. The average clustering coefficient obtained from thresholding beta coherence performed best at predicting outcome. The optimal subset of features selected with SFFS consisted of the frequency of microstate A in the 2-20 Hz frequency band, path length obtained from thresholding alpha coherence, and average path length obtained from thresholding alpha coherence. Combining these features seemed to afford high prediction power. Python and MATLAB toolboxes for the above calculations are freely available under the GNU public license for non-commercial use (https://qeeg.wordpress.com).
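
    SFFS itself is compact enough to sketch: a greedy forward step adds the best feature, then a floating backward step removes features for as long as removal improves the criterion. The dataset, classifier, and stopping size below are illustrative assumptions; the study's EEG-derived features are not reproduced.

    ```python
    # Compact SFFS sketch: greedy forward inclusion followed by "floating"
    # conditional exclusions that run while they improve the criterion.
    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_wine(return_X_y=True)

    def score(subset):
        if not subset:
            return 0.0
        clf = LogisticRegression(max_iter=5000)
        return cross_val_score(clf, X[:, sorted(subset)], y, cv=5).mean()

    selected, best = set(), 0.0
    while len(selected) < 6:
        # Forward step: add the single most beneficial remaining feature.
        gains = {j: score(selected | {j}) for j in range(X.shape[1]) if j not in selected}
        j_add = max(gains, key=gains.get)
        selected.add(j_add)
        best = gains[j_add]
        # Floating step: drop features (never the one just added) while it helps.
        while len(selected) > 2:
            drops = {j: score(selected - {j}) for j in selected if j != j_add}
            j_drop = max(drops, key=drops.get)
            if drops[j_drop] <= best:
                break
            selected.remove(j_drop)
            best = drops[j_drop]

    print("SFFS subset:", sorted(selected), "CV accuracy:", round(best, 3))
    ```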

  14. Feature Selection and Pedestrian Detection Based on Sparse Representation.

    PubMed

    Yao, Shihong; Wang, Tao; Shen, Weiming; Pan, Shaoming; Chong, Yanwen; Ding, Fei

    2015-01-01

    Research on pedestrian detection has recently been devoted to the extraction of effective pedestrian features, which has become one of the obstacles to applying pedestrian detection, given the variety of pedestrian features and their large dimension. Based on a theoretical analysis of six frequently used features, SIFT, SURF, Haar, HOG, LBP, and LSS, and on a comparison with experimental results, this paper screens out sparse feature subsets via sparse representation to investigate whether the sparse subsets retain the same descriptive ability and which features are most stable. When any two of the six features are fused, the fusion feature is sparsely represented to obtain its important components. Sparse subsets of the fusion features can be generated rapidly by avoiding calculation of the corresponding index of dimension numbers of these feature descriptors; thus, the speed of feature dimension reduction is improved and pedestrian detection time is reduced. Experimental results show that sparse feature subsets are capable of keeping the important components of these six feature descriptors. The sparse features of HOG and LSS possess the same descriptive ability as, and consume less time than, their full features. The ratios of the sparse feature subsets of HOG and LSS to their full sets are the highest among the six, so these two features best describe pedestrian characteristics, and the sparse feature subsets of the HOG-LSS combination show better distinguishing ability and parsimony.

  15. On the Impact of Local Taxes in a Set Cover Game

    NASA Astrophysics Data System (ADS)

    Escoffier, Bruno; Gourvès, Laurent; Monnot, Jérôme

    Given a collection C of weighted subsets of a ground set E, the set cover problem is to find a minimum-weight subset of C that covers all elements of E. We study a strategic game defined upon this classical optimization problem. Every element of E is a player which chooses one set of C in which it appears. Following a public tax function, every player is charged a fraction of the weight of the set that it has selected. Our motivation is to design a tax function with the following features: it can be implemented in a distributed manner, existence of an equilibrium is guaranteed, and the social cost of these equilibria is minimized.
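
    For reference, the underlying optimization problem can be stated as the standard weighted set cover integer program (textbook form, not taken from the paper):

    ```latex
    % x_S = 1 iff set S \in C is bought; w_S is its weight; every element of
    % the ground set E must be covered by at least one chosen set.
    \min \sum_{S \in C} w_S\, x_S
    \quad \text{s.t.} \quad
    \sum_{S \ni e} x_S \ge 1 \;\; \forall e \in E,
    \qquad x_S \in \{0, 1\} \;\; \forall S \in C .
    ```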

  16. Wavelet-based energy features for glaucomatous image classification.

    PubMed

    Dua, Sumeet; Acharya, U Rajendra; Chowriappa, Pradeep; Sree, S Vinitha

    2012-01-01

    Texture features within images are actively pursued for accurate and efficient glaucoma classification. Energy distribution over wavelet subbands is applied to find these important texture features. In this paper, we investigate the discriminatory potential of wavelet features obtained from the Daubechies (db3), Symlets (sym3), and biorthogonal (bio3.3, bio3.5, and bio3.7) wavelet filters. We propose a novel technique to extract energy signatures obtained using the 2-D discrete wavelet transform, and subject these signatures to different feature ranking and feature selection strategies. We have gauged the effectiveness of the resultant ranked and selected feature subsets using support vector machine, sequential minimal optimization, random forest, and naïve Bayes classification strategies. We observed an accuracy of around 93% using tenfold cross-validation, demonstrating the effectiveness of these methods.
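
    Extracting such energy signatures takes only a few lines with PyWavelets; the sketch below decomposes an image with the filters named above (PyWavelets spells the biorthogonal filter "bior3.3") and returns normalized per-subband energies. The synthetic image stands in for a fundus image, and the decomposition level is an assumption.

    ```python
    # Wavelet-energy texture features: per-subband sums of squared 2-D DWT
    # coefficients, normalized to form an energy signature.
    import numpy as np
    import pywt

    rng = np.random.default_rng(0)
    image = rng.random((128, 128))          # stand-in for a fundus image

    def wavelet_energy_features(img, wavelets=("db3", "sym3", "bior3.3"), level=2):
        feats = []
        for w in wavelets:
            coeffs = pywt.wavedec2(img, w, level=level)
            feats.append(np.sum(coeffs[0] ** 2))             # approximation energy
            for cH, cV, cD in coeffs[1:]:                    # detail subbands
                feats += [np.sum(cH ** 2), np.sum(cV ** 2), np.sum(cD ** 2)]
        total = sum(feats)
        return np.array(feats) / total                       # normalized energies

    print(wavelet_energy_features(image).round(4))
    ```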

  17. Individual differences in attention strategies during detection, fine discrimination, and coarse discrimination

    PubMed Central

    Hecker, Elizabeth A.; Serences, John T.; Srinivasan, Ramesh

    2013-01-01

    Interacting with the environment requires the ability to flexibly direct attention to relevant features. We examined the degree to which individuals attend to visual features within and across Detection, Fine Discrimination, and Coarse Discrimination tasks. Electroencephalographic (EEG) responses were measured to an unattended peripheral flickering (4 or 6 Hz) grating while individuals (n = 33) attended to orientations that were offset by 0°, 10°, 20°, 30°, 40°, and 90° from the orientation of the unattended flicker. These unattended responses may be sensitive to attentional gain at the attended spatial location, since attention to features enhances early visual responses throughout the visual field. We found no significant differences in tuning curves across the three tasks in part due to individual differences in strategies. We sought to characterize individual attention strategies using hierarchical Bayesian modeling, which grouped individuals into families of curves that reflect attention to the physical target orientation (“on-channel”) or away from the target orientation (“off-channel”) or a uniform distribution of attention. The different curves were related to behavioral performance; individuals with “on-channel” curves had lower thresholds than individuals with uniform curves. Individuals with “off-channel” curves during Fine Discrimination additionally had lower thresholds than those assigned to uniform curves, highlighting the perceptual benefits of attending away from the physical target orientation during fine discriminations. Finally, we showed that a subset of individuals with optimal curves (“on-channel”) during Detection also demonstrated optimal curves (“off-channel”) during Fine Discrimination, indicating that a subset of individuals can modulate tuning optimally for detection and discrimination. PMID:23678013

  18. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    PubMed

    Zawbaa, Hossam M; Szlęk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlęk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  19. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres—Focus on Feature Selection

    PubMed Central

    Zawbaa, Hossam M.; Szlęk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlęk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205

  20. A Fault Recognition System for Gearboxes of Wind Turbines

    NASA Astrophysics Data System (ADS)

    Yang, Zhiling; Huang, Haiyue; Yin, Zidong

    2017-12-01

    Costs of maintenance and the loss of power generation caused by faults in wind turbine gearboxes are the main components of operating costs for a wind farm. Therefore, the technology of condition monitoring and fault recognition for wind turbine gearboxes is becoming a hot topic. A condition monitoring and fault recognition system (CMFRS) for the condition-based maintenance of wind turbine gearboxes is presented in this paper. Vibration signals from acceleration sensors at different locations on the gearbox and data from the supervisory control and data acquisition (SCADA) system are fed to the CMFRS. A feature extraction and optimization algorithm is then applied to these operational data. Furthermore, to recognize gearbox faults, the GSO-LSSVR algorithm is proposed, combining least squares support vector regression (LSSVR) with the Glowworm Swarm Optimization (GSO) algorithm. Finally, the results show that the fault recognition system used in this paper achieves a high rate of identifying three states of wind turbine gears; moreover, the combination of data features affects the identification rate, and the selection optimization algorithm presented in this paper yields a good data feature subset for fault recognition.

  1. Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis

    NASA Astrophysics Data System (ADS)

    Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan

    2017-10-01

    This research paper proposes a hybrid of the ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms for feature selection, i.e., for choosing relevant features from customer review datasets. Information gain (IG), the genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper also discusses the significance test used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. The study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, validated with parametric statistical significance tests. The evaluation has statistically proven that the ACO-KNN algorithm improves significantly on the baseline algorithms. In addition, the experimental results show that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that represents the actual data in customer review datasets.

  2. Transferability of optimally-selected climate models in the quantification of climate change impacts on hydrology

    NASA Astrophysics Data System (ADS)

    Chen, Jie; Brissette, François P.; Lucas-Picher, Philippe

    2016-11-01

    Given the ever increasing number of climate change simulations being carried out, it has become impractical to use all of them to cover the uncertainty of climate change impacts. Various methods have been proposed to optimally select subsets of a large ensemble of climate simulations for impact studies. However, the behaviour of optimally-selected subsets of climate simulations for climate change impacts is unknown, since the transfer process from climate projections to the impact study world is usually highly non-linear. Consequently, this study investigates the transferability of optimally-selected subsets of climate simulations in the case of hydrological impacts. Two different methods were used for the optimal selection of subsets of climate scenarios, and both were found to be capable of adequately representing the spread of selected climate model variables contained in the original large ensemble. However, in both cases, the optimal subsets had limited transferability to hydrological impacts. To capture a similar variability in the impact model world, many more simulations have to be used than those that are needed to simply cover variability from the climate model variables' perspective. Overall, both optimal subset selection methods were better than random selection when small subsets were selected from a large ensemble for impact studies. However, as the number of selected simulations increased, random selection often performed better than the two optimal methods. To ensure adequate uncertainty coverage, the results of this study imply that selecting as many climate change simulations as possible is the best avenue. Where this was not possible, the two optimal methods were found to perform adequately.

  3. Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers

    NASA Astrophysics Data System (ADS)

    Weinmann, Martin; Jutzi, Boris; Hinz, Stefan; Mallet, Clément

    2015-07-01

    3D scene analysis in terms of automatically assigning 3D points a respective semantic label has become a topic of great importance in photogrammetry, remote sensing, computer vision and robotics. In this paper, we address the issue of how to increase the distinctiveness of geometric features and select the most relevant ones among these for 3D scene analysis. We present a new, fully automated and versatile framework composed of four components: (i) neighborhood selection, (ii) feature extraction, (iii) feature selection and (iv) classification. For each component, we consider a variety of approaches which allow applicability in terms of simplicity, efficiency and reproducibility, so that end-users can easily apply the different components and do not require expert knowledge in the respective domains. In a detailed evaluation involving 7 neighborhood definitions, 21 geometric features, 7 approaches for feature selection, 10 classifiers and 2 benchmark datasets, we demonstrate that the selection of optimal neighborhoods for individual 3D points significantly improves the results of 3D scene analysis. Additionally, we show that the selection of adequate feature subsets may even further increase the quality of the derived results while significantly reducing both processing time and memory consumption.

  4. A novel approach for dimension reduction of microarray.

    PubMed

    Aziz, Rabia; Verma, C K; Srivastava, Namita

    2017-12-01

    This paper proposes a new hybrid search technique for feature (gene) selection (FS) using Independent Component Analysis (ICA) and the Artificial Bee Colony (ABC) algorithm, called ICA+ABC, to select informative genes based on a Naïve Bayes (NB) classifier. An important trait of this technique is the optimization of the ICA feature vector using ABC. ICA+ABC is a hybrid search algorithm that combines the benefits of an extraction approach, to reduce the size of the data, and a wrapper approach, to optimize the reduced feature vectors. The technique is evaluated on six standard gene expression classification datasets. Extensive experiments were conducted to compare the performance of ICA+ABC with the results obtained from the recently published Minimum Redundancy Maximum Relevance (mRMR)+ABC algorithm for the NB classifier. To further check how ICA+ABC performs as a feature selector with the NB classifier, the combination of ICA with popular filter techniques and with similar bio-inspired algorithms such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) was also compared. The results show that ICA+ABC has a significant ability to generate small subsets of genes from the ICA feature vector that markedly improve the classification accuracy of the NB classifier compared to other previously suggested methods.

  5. Entropy-based gene ranking without selection bias for the predictive classification of microarray data.

    PubMed

    Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe

    2003-11-06

    We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
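
    The speed-up comes from eliminating chunks rather than single genes per iteration. The sketch below captures that idea with a deliberately simplified chunk rule (drop half of the features whose SVM weight magnitude falls below the mean); the paper's entropy measure of the weight distribution, resampling scheme, and Zipf-profile stopping rule are not reproduced.

    ```python
    # Chunked recursive feature elimination in the spirit of E-RFE: refit a
    # linear SVM and discard a block of low-|w| features each round, instead
    # of one feature at a time as in standard RFE.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=100, n_features=500, n_informative=20,
                               random_state=0)
    active = np.arange(X.shape[1])
    while active.size > 20:
        w = np.abs(LinearSVC(max_iter=20000).fit(X[:, active], y).coef_).ravel()
        n_low = int((w < w.mean()).sum())        # features in the flat region
        n_drop = max(1, n_low // 2)              # simplified chunk size
        active = np.delete(active, np.argsort(w)[:n_drop])

    print("surviving features:", active.size)
    ```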

  6. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling

    PubMed Central

    Zhou, Fuqun; Zhang, Aining

    2016-01-01

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2–3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests’ features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data. PMID:27792152

  7. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    PubMed

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests' features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.
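
    The variable-importance step can be sketched in a few lines: train a Random Forest, rank the input bands by feature_importances_, and compare accuracy on the top half against the full set, mirroring the finding quoted above. The synthetic table below stands in for the 10-day MODIS composites.

    ```python
    # Random Forest variable importance used to pick an optimal band subset.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=600, n_features=40, n_informative=12,
                               random_state=0)   # 40 stand-in "composite bands"
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    half = order[: X.shape[1] // 2]              # keep the top half of the bands

    full = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
    sub = cross_val_score(RandomForestClassifier(random_state=0), X[:, half], y, cv=5).mean()
    print(f"all bands: {full:.3f}  top half: {sub:.3f}")
    ```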

  8. Diagnosis of Atopic Dermatitis: Mimics, Overlaps, and Complications

    PubMed Central

    Siegfried, Elaine C.; Hebert, Adelaide A.

    2015-01-01

    Atopic dermatitis (AD) is one of the most common skin diseases affecting infants and children. A smaller subset of adults has persistent or new-onset AD. AD is characterized by pruritus, erythema, induration, and scale, but these features are also typical of several other conditions that can mimic, coexist with, or complicate AD. These include inflammatory skin conditions, infections, infestations, malignancies, genetic disorders, immunodeficiency disorders, nutritional disorders, graft-versus-host disease, and drug eruptions. Familiarity of the spectrum of these diseases and their distinguishing features is critical for correct and timely diagnosis and optimal treatment. PMID:26239454

  9. Skin lesion computational diagnosis of dermoscopic images: Ensemble models based on input feature manipulation.

    PubMed

    Oliveira, Roberta B; Pereira, Aledir S; Tavares, João Manuel R S

    2017-10-01

    The number of deaths worldwide due to melanoma has risen in recent times, in part because melanoma is the most aggressive type of skin cancer. Computational systems have been developed to assist dermatologists in early diagnosis of skin cancer, or even to monitor skin lesions. However, there still remains a challenge to improve classifiers for the diagnosis of such skin lesions. The main objective of this article is to evaluate different ensemble classification models based on input feature manipulation to diagnose skin lesions. Input feature manipulation processes are based on feature subset selections from shape properties, colour variation and texture analysis to generate diversity for the ensemble models. Three subset selection models are presented here: (1) a subset selection model based on specific feature groups, (2) a correlation-based subset selection model, and (3) a subset selection model based on feature selection algorithms. Each ensemble classification model is generated using an optimum-path forest classifier and integrated with a majority voting strategy. The proposed models were applied on a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by the first ensemble classification model that generates a feature subset ensemble based on specific feature groups. The skin lesion diagnosis computational system achieved 94.3% accuracy, 91.8% sensitivity and 96.7% specificity. The input feature manipulation process based on specific feature subsets generated the greatest diversity for the ensemble classification model with very promising results.

  10. McTwo: a two-step feature selection algorithm based on maximal information coefficient.

    PubMed

    Ge, Ruiquan; Zhou, Manli; Luo, Youxi; Meng, Qinghan; Mai, Guoqin; Ma, Dongli; Wang, Guoqing; Zhou, Fengfeng

    2016-03-23

    High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature. McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.

  11. Differential evolution-based multi-objective optimization for the definition of a health indicator for fault diagnostics and prognostics

    NASA Astrophysics Data System (ADS)

    Baraldi, P.; Bonfanti, G.; Zio, E.

    2018-03-01

    The identification of the current degradation state of an industrial component and the prediction of its future evolution is a fundamental step for the development of condition-based and predictive maintenance approaches. The objective of the present work is to propose a general method for extracting a health indicator to measure the amount of component degradation from a set of signals measured during operation. The proposed method is based on the combined use of feature extraction techniques, such as Empirical Mode Decomposition and Auto-Associative Kernel Regression, and a multi-objective Binary Differential Evolution (BDE) algorithm for selecting the subset of features optimal for the definition of the health indicator. The objectives of the optimization are desired characteristics of the health indicator, such as monotonicity, trendability and prognosability. A case study is considered, concerning the prediction of the remaining useful life of turbofan engines. The obtained results confirm that the method is capable of extracting health indicators suitable for accurate prognostics.
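
    As one concrete example of such an objective, a commonly used monotonicity score for a candidate health indicator is the normalized difference between the numbers of positive and negative one-step changes; a sketch follows. This generic definition is an assumption for illustration; the paper's exact formulations of monotonicity, trendability, and prognosability may differ.

    ```python
    # Generic monotonicity score for a candidate health-indicator series:
    # 1.0 for a strictly monotone trend, near 0.0 for trendless noise.
    import numpy as np

    def monotonicity(series):
        d = np.diff(series)
        return abs(int((d > 0).sum()) - int((d < 0).sum())) / max(len(d), 1)

    rng = np.random.default_rng(0)
    degrading = np.cumsum(rng.random(100))   # steadily increasing indicator
    noisy = rng.normal(size=100)             # no degradation trend
    print(monotonicity(degrading), monotonicity(noisy))
    ```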

  12. Pythran: enabling static optimization of scientific Python programs

    NASA Astrophysics Data System (ADS)

    Guelton, Serge; Brunet, Pierrick; Amini, Mehdi; Merlini, Adrien; Corbillon, Xavier; Raynaud, Alan

    2015-01-01

    Pythran is an open source static compiler that turns modules written in a subset of Python language into native ones. Assuming that scientific modules do not rely much on the dynamic features of the language, it trades them for powerful, possibly inter-procedural, optimizations. These optimizations include detection of pure functions, temporary allocation removal, constant folding, Numpy ufunc fusion and parallelization, explicit thread-level parallelism through OpenMP annotations, false variable polymorphism pruning, and automatic vector instruction generation such as AVX or SSE. In addition to these compilation steps, Pythran provides a C++ runtime library that leverages the C++ STL to provide generic containers, and the Numeric Template Toolbox for Numpy support. It takes advantage of modern C++11 features such as variadic templates, type inference, move semantics and perfect forwarding, as well as classical idioms such as expression templates. Unlike the Cython approach, Pythran input code remains compatible with the Python interpreter. Output code is generally as efficient as the annotated Cython equivalent, if not more, but without the backward compatibility loss.
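
    A small example of the workflow (the file name and function are hypothetical): the module below is plain Python and still runs under the interpreter, while the `#pythran export` comment declares the signature from which `pythran dprod.py` generates a native extension.

    ```python
    # dprod.py -- valid Python, also compilable by Pythran.
    # The export line tells Pythran to generate a native entry point for
    # dprod taking two 1-D float64 arrays; plain CPython sees only a comment.
    #pythran export dprod(float64[], float64[])
    import numpy as np

    def dprod(a, b):
        # A Numpy expression Pythran can fuse and vectorize.
        return np.sum(a * b)
    ```

    Compiling is then just `pythran dprod.py`, which produces a shared library importable in place of the pure-Python module.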

  13. Predicting conversion from MCI to AD using resting-state fMRI, graph theoretical approach and SVM.

    PubMed

    Hojjati, Seyed Hani; Ebrahimzadeh, Ata; Khazaee, Ali; Babajani-Feremi, Abbas

    2017-04-15

    We investigated the identification of patients with mild cognitive impairment (MCI) who progress to Alzheimer's disease (AD), MCI converters (MCI-C), versus those with MCI who do not progress to AD, MCI non-converters (MCI-NC), based on resting-state fMRI (rs-fMRI). Graph theory and a machine learning approach were utilized to predict the progression of patients with MCI to AD using rs-fMRI. Eighteen MCI converters (average age 73.6 years; 11 male) and 62 age-matched MCI non-converters (average age 73.0 years, 28 male) were included in this study. We trained and tested a support vector machine (SVM) to classify MCI-C from MCI-NC using features constructed from local and global graph measures. A novel feature selection algorithm was developed and utilized to select an optimal subset of features. Using the optimal feature subset in the SVM, we classified MCI-C from MCI-NC with an accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve of 91.4%, 83.24%, 90.1%, and 0.95, respectively. Furthermore, results of our statistical analyses were used to identify the affected brain regions in AD. To the best of our knowledge, this is the first study that combines graph measures (constructed from rs-fMRI) with a machine learning approach to accurately classify MCI-C from MCI-NC. Results of this study demonstrate the potential of the proposed approach for early AD diagnosis, and the capability of rs-fMRI to predict conversion from MCI to AD by identifying the affected brain regions underlying this conversion. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Predicting protein-protein interactions by combining various sequence-derived features into the general form of Chou's Pseudo amino acid composition.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2012-05-01

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machinery of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.
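
    The two-stage pipeline described above (PCA compression of the 930 sequence-derived features to 67 components, followed by an SVM) can be sketched with scikit-learn as follows; the data and labels are synthetic placeholders.

    ```python
    # Sketch of the abstract's two-stage pipeline: compress 930 sequence-derived
    # features with PCA (67 components, following the abstract), then classify
    # with an SVM. Data and labels are synthetic placeholders.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.random((200, 930))      # placeholder sequence-derived features
    y = rng.integers(0, 2, 200)     # placeholder interacting / non-interacting

    model = make_pipeline(StandardScaler(), PCA(n_components=67), SVC(kernel="rbf"))
    print(cross_val_score(model, X, y, cv=5).mean())
    ```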

  15. Predicting Protein-Protein Interactions by Combining Various Sequence-Derived Features.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2011-09-20

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machinery of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.

  16. Comparing the role of shape and texture on staging hepatic fibrosis from medical imaging

    NASA Astrophysics Data System (ADS)

    Zhang, Xuejun; Louie, Ryan; Liu, Brent J.; Gao, Xin; Tan, Xiaomin; Qu, Xianghe; Long, Liling

    2016-03-01

    The purpose of this study is to investigate the role of shape and texture in the classification of hepatic fibrosis by selecting the optimal parameters for a better computer-aided diagnosis (CAD) system. Ten surface shape features are extracted from a standardized profile of the liver, while 15 texture features calculated from the gray-level co-occurrence matrix (GLCM) are extracted within an ROI in the liver. Each combination of these input subsets is checked using a support vector machine (SVM) with the leave-one-case-out method to differentiate fibrosis into two groups: normal or abnormal. The accuracy using all 15 texture features is 66.83%, while using all 10 shape features it is 85.74%. The irregularity of liver shape efficiently indicates the fibrotic grade, and texture features from CT images are not recommended for use together with shape features in the interpretation of cirrhosis.
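
    A sketch of the GLCM texture-feature step using scikit-image (which provides graycomatrix/graycoprops; older releases spell them greycomatrix/greycoprops). The ROI is a random placeholder rather than an actual CT liver region.

    ```python
    # GLCM texture features from a (placeholder) liver ROI with scikit-image.
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    rng = np.random.default_rng(0)
    roi = rng.integers(0, 256, (64, 64), dtype=np.uint8)   # stand-in for a CT ROI

    glcm = graycomatrix(roi, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    features = {prop: float(graycoprops(glcm, prop)[0, 0])
                for prop in ("contrast", "homogeneity", "energy", "correlation")}
    print(features)
    ```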

  17. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data.

    PubMed

    Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large number of gene expression data. Lately, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs global search, and tabu search (TS), which conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes.

  18. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    PubMed Central

    Aorigele; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large number of gene expression data. Lately, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs global search, and tabu search (TS), which conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes. PMID:27579323

  19. Sulcal set optimization for cortical surface registration.

    PubMed

    Joshi, Anand A; Pantazis, Dimitrios; Li, Quanzheng; Damasio, Hanna; Shattuck, David W; Toga, Arthur W; Leahy, Richard M

    2010-04-15

    Flat-mapping-based cortical surface registration constrained by manually traced sulcal curves has been widely used for inter-subject comparisons of neuroanatomical data. Even for an experienced neuroanatomist, manual sulcal tracing can be quite time consuming, with the cost increasing with the number of sulcal curves used for registration. We present a method for estimating an optimal subset of size N_C from N possible candidate sulcal curves that minimizes a mean squared error metric over all combinations of N_C curves. The resulting procedure allows us to estimate a subset with a reduced number of curves to be traced as part of the registration procedure, leading to optimal use of manual labeling effort for registration. To minimize the error metric, we analyze the correlation structure of the errors in the sulcal curves by modeling them as a multivariate Gaussian distribution. For a given subset of sulci used as constraints in surface registration, the proposed model estimates registration error based on the correlation structure of the sulcal errors. The optimal subset of constraint curves consists of the N_C sulci that jointly minimize the estimated error variance for the subset of unconstrained curves conditioned on the N_C constraint curves. The optimal subsets of sulci are presented, and the estimated and actual registration errors for these subsets are computed. Copyright 2009 Elsevier Inc. All rights reserved.

  20. Multisensor-based real-time quality monitoring by means of feature extraction, selection and modeling for Al alloy in arc welding

    NASA Astrophysics Data System (ADS)

    Zhang, Zhifen; Chen, Huabin; Xu, Yanling; Zhong, Jiyong; Lv, Na; Chen, Shanben

    2015-08-01

    Multisensory data fusion-based online welding quality monitoring has gained increasing attention in intelligent welding processes. This paper mainly focuses on the automatic detection of typical welding defects for Al alloy in gas tungsten arc welding (GTAW) by means of analyzing arc spectrum, sound, and voltage signals. Based on the developed algorithms in the time and frequency domains, 41 feature parameters were successively extracted from these signals to characterize the welding process and seam quality. Then, the proposed feature selection approach, a hybrid Fisher-based filter and wrapper, was successfully utilized to evaluate the sensitivity of each feature and reduce the feature dimensions. Finally, the optimal feature subset with 19 features was selected to obtain the highest accuracy, 94.72%, using the established classification model. This study provides a guideline for feature extraction, selection, and dynamic modeling based on heterogeneous multisensory data to achieve a reliable online defect detection system in arc welding.

  1. T-ray relevant frequencies for osteosarcoma classification

    NASA Astrophysics Data System (ADS)

    Withayachumnankul, W.; Ferguson, B.; Rainsford, T.; Findlay, D.; Mickan, S. P.; Abbott, D.

    2006-01-01

    We investigate the classification of the T-ray response of normal human bone cells and human osteosarcoma cells grown in culture. Given the magnitude and phase responses within a reliable spectral range as features for input vectors, a trained support vector machine can correctly classify the two cell types to some extent. The performance of the support vector machine is degraded by the curse of dimensionality, which results from the comparatively large number of features in the input vectors. Feature subset selection methods are used to select only an optimal number of relevant features for input. As a result, an improvement in generalization performance is attainable, and the selected frequencies can be used to further describe the different mechanisms by which the cells respond to T-rays. We demonstrate a consistent classification accuracy of 89.6% while only one fifth of the original features are retained in the data set.

  2. Paroxysmal atrial fibrillation prediction based on HRV analysis and non-dominated sorting genetic algorithm III.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B

    2018-01-01

    This paper presents a method that is able to predict paroxysmal atrial fibrillation (PAF). The method uses shorter heart rate variability (HRV) signals compared to existing methods, and achieves good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinically important because it increases the possibility of electrically stabilizing the heart and preventing the onset of atrial arrhythmias with different pacing techniques. We propose a multi-objective optimization algorithm based on the non-dominated sorting genetic algorithm III for optimizing the baseline PAF prediction system, which consists of the stages of pre-processing, HRV feature extraction, and a support vector machine (SVM) model. The pre-processing stage comprises heart rate correction, interpolation, and signal detrending. After that, time-domain, frequency-domain, and non-linear HRV features are extracted from the pre-processed data in the feature extraction stage. Then, these features are used as input to the SVM for predicting the PAF event. The proposed optimization algorithm is used to optimize the parameters and settings of various HRV feature extraction algorithms, select the best feature subsets, and tune the SVM parameters simultaneously for maximum prediction performance. The proposed method achieves an accuracy rate of 87.7%, which significantly outperforms most of the previous works. This accuracy rate is achieved even with the HRV signal length being reduced from the typical 30 min to just 5 min (a reduction of 83%). Furthermore, the sensitivity rate, which is considered more important than other performance metrics in this paper, can be improved at the cost of lower specificity. Copyright © 2017 Elsevier B.V. All rights reserved.
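
    Of the HRV features mentioned, the standard time-domain ones have simple closed forms (SDNN, RMSSD, pNN50). A minimal sketch on a placeholder NN-interval series follows; the paper's detrending, frequency-domain and non-linear features, and the NSGA-III tuning are omitted.

    ```python
    # Standard time-domain HRV features (SDNN, RMSSD, pNN50) on a placeholder
    # series of NN intervals in milliseconds.
    import numpy as np

    def hrv_time_domain(nn_ms: np.ndarray) -> dict:
        d = np.diff(nn_ms)
        return {
            "SDNN": nn_ms.std(ddof=1),           # overall variability
            "RMSSD": np.sqrt(np.mean(d ** 2)),   # short-term variability
            "pNN50": np.mean(np.abs(d) > 50.0),  # fraction of successive jumps > 50 ms
        }

    print(hrv_time_domain(np.array([812.0, 845.0, 790.0, 880.0, 860.0, 905.0])))
    ```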

  3. Identification of features in indexed data and equipment therefore

    DOEpatents

    Jarman, Kristin H [Richland, WA; Daly, Don Simone [Richland, WA; Anderson, Kevin K [Richland, WA; Wahl, Karen L [Richland, WA

    2002-04-02

    Embodiments of the present invention provide methods of identifying a feature in an indexed dataset. Such embodiments encompass selecting an initial subset of indices, the initial subset of indices being encompassed by an initial window-of-interest and comprising at least one beginning index and at least one ending index; computing an intensity weighted measure of dispersion for the subset of indices using a subset of responses corresponding to the subset of indices; and comparing the intensity weighted measure of dispersion to a dispersion critical value determined from an expected value of the intensity weighted measure of dispersion under a null hypothesis of no transient feature present. Embodiments of the present invention also encompass equipment configured to perform the methods of the present invention.

  4. Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech

    PubMed Central

    Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor

    2015-01-01

    In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacked generalization, by means of integrating an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757

  5. Using learning automata to determine proper subset size in high-dimensional spaces

    NASA Astrophysics Data System (ADS)

    Seyyedi, Seyyed Hossein; Minaei-Bidgoli, Behrouz

    2017-03-01

    In this paper, we offer a new method called FSLA (Finding the best candidate Subset using Learning Automata), which combines the filter and wrapper approaches for feature selection in high-dimensional spaces. Considering the difficulties of dimension reduction in high-dimensional spaces, FSLA's multi-objective functionality is to determine, in an efficient manner, a feature subset that leads to an appropriate tradeoff between the learning algorithm's accuracy and efficiency. First, using an existing weighting function, the feature list is sorted and subsets of the list of different sizes are considered. Then, a learning automaton verifies the performance of each subset when it is used as the input space of the learning algorithm and estimates its fitness based on the algorithm's accuracy and the subset size, which determines the algorithm's efficiency. Finally, FSLA introduces the fittest subset as the best choice. We tested FSLA in the framework of text classification. The results confirm its promising performance in attaining the identified goal.

  6. Multi-probe-based resonance-frequency electrical impedance spectroscopy for detection of suspicious breast lesions: improving performance using partial ROC optimization

    NASA Astrophysics Data System (ADS)

    Lederman, Dror; Zheng, Bin; Wang, Xingwei; Wang, Xiao Hui; Gur, David

    2011-03-01

    We have developed a multi-probe resonance-frequency electrical impedance spectroscopy (REIS) system to detect breast abnormalities. Based on assessing asymmetry in REIS signals acquired between the left and right breasts, we developed several machine learning classifiers to classify younger women (i.e., under 50 years old) into two groups having high and low risk for developing breast cancer. In this study, we investigated a new method to optimize performance based on the area under a selected partial receiver operating characteristic (ROC) curve when optimizing an artificial neural network (ANN), and tested whether it could improve classification performance. From an ongoing prospective study, we selected a dataset of 174 cases for which we have both REIS signals and diagnostic status verification. The dataset includes 66 "positive" cases recommended for biopsy due to detection of highly suspicious breast lesions and 108 "negative" cases determined by imaging-based examinations. A set of REIS-based feature differences, extracted from the two breasts using a mirror-matched approach, was computed and constituted an initial feature pool. Using a leave-one-case-out cross-validation method, we applied a genetic algorithm (GA) to train the ANN with an optimal subset of features. Two optimization criteria were used separately in the GA optimization, namely the area under the entire ROC curve (AUC) and the partial area under the ROC curve up to a predetermined threshold (i.e., 90% specificity). The results showed that although the ANN optimized using the entire AUC yielded higher overall performance (AUC = 0.83 versus 0.76), the ANN optimized using the partial ROC area criterion achieved substantially higher operational performance (i.e., increasing the sensitivity level from 28% to 48% at 95% specificity and/or from 48% to 58% at 90% specificity).
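
    The partial-ROC criterion (area under the ROC curve up to 90% specificity, i.e., false-positive rates at or below 0.10) is directly available in scikit-learn through the max_fpr argument of roc_auc_score, which reports it in standardized (McClish) form. A sketch with placeholder labels and scores:

    ```python
    # Partial ROC area up to 90% specificity (false-positive rate <= 0.10) via
    # scikit-learn's max_fpr argument. Labels and scores are random placeholders.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, 174)    # placeholder biopsy / negative labels
    y_score = rng.random(174)           # placeholder ANN output scores

    print("full AUC:", roc_auc_score(y_true, y_score))
    print("pAUC (FPR <= 0.1):", roc_auc_score(y_true, y_score, max_fpr=0.10))
    ```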

  7. The effect of feature selection methods on computer-aided detection of masses in mammograms

    NASA Astrophysics Data System (ADS)

    Hupse, Rianne; Karssemeijer, Nico

    2010-05-01

    In computer-aided diagnosis (CAD) research, feature selection methods are often used to improve generalization performance of classifiers and shorten computation times. In an application that detects malignant masses in mammograms, we investigated the effect of using a selection criterion that is similar to the final performance measure we are optimizing, namely the mean sensitivity of the system in a predefined range of the free-response receiver operating characteristics (FROC). To obtain the generalization performance of the selected feature subsets, a cross validation procedure was performed on a dataset containing 351 abnormal and 7879 normal regions, each region providing a set of 71 mass features. The same number of noise features, not containing any information, were added to investigate the ability of the feature selection algorithms to distinguish between useful and non-useful features. It was found that significantly higher performances were obtained using feature sets selected by the general test statistic Wilks' lambda than using feature sets selected by the more specific FROC measure. Feature selection leads to better performance when compared to a system in which all features were used.

  8. glmnetLRC f/k/a lrc package: Logistic Regression Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2016-06-09

    Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds model fitting features to the existing glmnet and bestglm R packages. This package was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross-validation with a customizable loss function. It also identifies the optimal threshold for binary classification.

  9. Use of genetic algorithm for the selection of EEG features

    NASA Astrophysics Data System (ADS)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C-Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with a classification accuracy of 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.
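
    A compact, self-contained GA for electrode-subset selection in the spirit of this methodology. Binary chromosomes mask the 47 electrodes and fitness is cross-validated accuracy; a k-nearest-neighbor classifier stands in for the paper's FCM-based evaluation, and all data are placeholders.

    ```python
    # A bare-bones GA for electrode-subset selection. Chromosomes are boolean
    # masks over 47 electrodes; fitness is cross-validated accuracy. A kNN
    # classifier stands in for the paper's Fuzzy C-Means evaluation.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.random((96, 47))         # placeholder: 96 ERP epochs x 47 electrodes
    y = rng.integers(0, 2, 96)       # placeholder: correct vs. incorrect action

    def fitness(mask):
        if not mask.any():
            return 0.0
        return cross_val_score(KNeighborsClassifier(3), X[:, mask], y, cv=3).mean()

    pop = rng.random((20, 47)) < 0.2    # initial population of sparse masks
    for generation in range(15):
        fit = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(fit)[-10:]]              # truncation selection
        children = []
        for _ in range(10):
            a, b = parents[rng.integers(0, 10, size=2)]
            cut = int(rng.integers(1, 46))
            child = np.concatenate([a[:cut], b[cut:]])    # one-point crossover
            child ^= rng.random(47) < 0.02                # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])

    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected electrodes:", np.flatnonzero(best))
    ```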

  10. IDH mutation assessment of glioma using texture features of multimodal MR images

    NASA Astrophysics Data System (ADS)

    Zhang, Xi; Tian, Qiang; Wu, Yu-Xia; Xu, Xiao-Pan; Li, Bao-Juan; Liu, Yi-Xiong; Liu, Yang; Lu, Hong-Bing

    2017-03-01

    Purpose: To 1) find effective texture features from multimodal MRI that can distinguish IDH mutant from wild-type status, and 2) propose a radiomic strategy for preoperatively detecting IDH mutation in patients with glioma. Materials and Methods: 152 patients with glioma were retrospectively included from the Cancer Genome Atlas. The corresponding pre- and post-contrast T1-weighted images, T2-weighted images, and fluid-attenuated inversion recovery images from the Cancer Imaging Archive were analyzed. Specific statistical tests were applied to analyze the different kinds of baseline information of the LrGG patients. Finally, 168 texture features were derived from the multimodal MRI of each patient. Then, a support vector machine-based recursive feature elimination (SVM-RFE) and classification strategy was adopted to find the optimal feature subset and build the identification models for detecting the IDH mutation. Results: Among the 152 patients, 92 and 60 were confirmed to be IDH wild-type and mutant, respectively. Statistical analysis showed that the patients without IDH mutation were significantly older than patients with IDH mutation (p<0.01), and the distribution of some histological subtypes was significantly different between the IDH wild-type and mutant groups (p<0.01). After SVM-RFE, 15 optimal features were determined for IDH mutation detection. The accuracy, sensitivity, specificity, and AUC after SVM-RFE and parameter optimization were 82.2%, 85.0%, 78.3%, and 0.841, respectively. Conclusion: This study presented a radiomic strategy for noninvasively discriminating IDH mutation in patients with glioma. It effectively incorporated multiple kinds of texture features from multimodal MRI with an SVM-based classification strategy. The results suggest that the features selected by SVM-RFE have greater potential for identifying IDH mutation. The proposed radiomics strategy could facilitate clinical decision making in patients with glioma.
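
    SVM-RFE as used here maps directly onto scikit-learn's RFE wrapper around a linear SVM: the features with the smallest weights are recursively dropped until the requested 15 remain. The data below are placeholders for the 168-feature texture matrix.

    ```python
    # SVM-RFE with scikit-learn: recursively drop the features with the smallest
    # linear-SVM weights until 15 remain. Placeholder data stand in for the
    # 152-patient, 168-feature texture matrix.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.feature_selection import RFE
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.random((152, 168))      # placeholder texture-feature matrix
    y = rng.integers(0, 2, 152)     # placeholder IDH wild-type (0) / mutant (1)

    rfe = RFE(SVC(kernel="linear"), n_features_to_select=15, step=1)
    rfe.fit(StandardScaler().fit_transform(X), y)
    print("selected feature indices:", np.flatnonzero(rfe.support_))
    ```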

  11. A Novel Approach for Lie Detection Based on F-Score and Extreme Learning Machine

    PubMed Central

    Gao, Junfeng; Wang, Zhao; Yang, Yong; Zhang, Wenjia; Tao, Chunyi; Guan, Jinan; Rao, Nini

    2013-01-01

    A new machine learning method referred to as F-score_ELM was proposed to classify lying and truth-telling using electroencephalogram (EEG) signals from 28 guilty and innocent subjects. Thirty-one features were extracted from the probe responses of these subjects. Then, a recently developed classifier called the extreme learning machine (ELM) was combined with F-score, a simple but effective feature selection method, to jointly optimize the number of hidden nodes of the ELM and the feature subset via a grid-searching training procedure. The method was compared to two classification models combining principal component analysis with back-propagation network and support vector machine classifiers. We thoroughly assessed the performance of these classification models, including the training and testing time, the sensitivity and specificity on the training and testing sets, and the network size. The experimental results showed that the number of hidden nodes can be effectively optimized by the proposed method. Also, F-score_ELM obtained the best classification accuracy and required the shortest training and testing time. PMID:23755136
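
    The F-score used for the filtering step has a simple closed form (the between-class separation of each feature divided by its within-class scatter, per Chen and Lin's definition); a sketch with placeholder EEG-derived features follows. The ELM and the grid search are not reproduced.

    ```python
    # Per-feature F-score (Chen and Lin's definition): squared distances of the
    # class means from the overall mean, divided by the summed within-class
    # variances. Higher values indicate more discriminative features.
    import numpy as np

    def f_scores(X, y):
        pos, neg = X[y == 1], X[y == 0]
        num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
        den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
        return num / den

    rng = np.random.default_rng(0)
    X = rng.random((56, 31))        # placeholder: 56 subjects x 31 probe features
    y = rng.integers(0, 2, 56)      # placeholder guilty / innocent labels
    print("top 5 features by F-score:", np.argsort(f_scores(X, y))[::-1][:5])
    ```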

  12. Muon tomography imaging improvement using optimized limited angle data

    NASA Astrophysics Data System (ADS)

    Bai, Chuanyong; Simon, Sean; Kindem, Joel; Luo, Weidong; Sossong, Michael J.; Steiger, Matthew

    2014-05-01

    The image resolution of muon tomography is limited by the range of zenith angles of cosmic ray muons and the flux rate at sea level. The low flux rate limits the use of advanced data rebinning and processing techniques to improve image quality. By optimizing the limited-angle data, however, image resolution can be improved. To demonstrate the idea, physical data of tungsten blocks were acquired on a muon tomography system. The angular distribution and energy spectrum of muons measured on the system were also used to generate simulation data of tungsten blocks in different arrangements (geometries). The data were grouped into subsets using the zenith angle, and volume images were reconstructed from the data subsets using two algorithms. One was a distributed PoCA (point of closest approach) algorithm and the other was an accelerated iterative maximum likelihood/expectation maximization (MLEM) algorithm. Image resolution was compared for different subsets. Results showed that image resolution was better in the vertical direction for subsets with greater zenith angles and better in the horizontal plane for subsets with smaller zenith angles. The overall image resolution appeared to be a compromise among those of the different subsets. This work suggests that the acquired data can be grouped into different limited-angle data subsets to optimize image resolution in desired directions. The use of multiple images with resolution optimized in different directions can improve overall imaging fidelity for the intended applications.

  13. Unbiased feature selection in learning random forests for high-dimensional data.

    PubMed

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This causes RFs to have poor accuracy when working with high-dimensional data. Besides that, RFs have a bias in the feature selection process that favors multivalued features. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features when learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees while reducing dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets, including image datasets. The experimental results show that RFs with the proposed approach outperformed existing random forests in both the accuracy and the AUC measures.

  14. Automated discrimination of dementia spectrum disorders using extreme learning machine and structural T1 MRI features.

    PubMed

    Jongin Kim; Boreom Lee

    2017-07-01

    The classification of neuroimaging data for the diagnosis of Alzheimer's Disease (AD) is one of the main research goals of the neuroscience and clinical fields. In this study, we applied an extreme learning machine (ELM) classifier to discriminate AD and mild cognitive impairment (MCI) from normal controls (NC). We compared the performance of the ELM with that of a linear kernel support vector machine (SVM) for 718 structural MRI images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data consisted of normal controls, MCI converters (MCI-C), MCI non-converters (MCI-NC), and AD. We employed the SVM-based recursive feature elimination (RFE-SVM) algorithm to find the optimal subset of features. In this study, we found that the RFE-SVM feature selection approach in combination with the ELM shows superior classification accuracy to that of the linear kernel SVM for structural T1 MRI data.

  15. Feature Extraction and Selection Strategies for Automated Target Recognition

    NASA Technical Reports Server (NTRS)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPL's Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region-of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms, as well as selecting important subsets of that data which may aid in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine (SVM) and a neural net (NN) classifier.

  16. Feature extraction and selection strategies for automated target recognition

    NASA Astrophysics Data System (ADS)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-04-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPL's Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region-of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms, as well as selecting important subsets of that data which may aid in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine (SVM) and a neural net (NN) classifier.

  17. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    PubMed

    Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
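
    Relief-F, CFS, and FCBF are not shipped with scikit-learn, but the same filter-style, classifier-independent selection pattern can be sketched with a mutual-information filter standing in for them; the selected subset is then handed to any of the generic classifiers, as in the study.

    ```python
    # Filter-style, classifier-independent feature selection sketched with a
    # mutual-information filter (Relief-F, CFS and FCBF are not in scikit-learn).
    # The selected subset is then evaluated with one of the generic classifiers.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.random((1804, 76))      # placeholder: task windows x 76 signal features
    y = rng.integers(0, 18, 1804)   # placeholder: 18 mobility-task classes

    X_sel = SelectKBest(mutual_info_classif, k=20).fit_transform(X, y)
    print(cross_val_score(GaussianNB(), X_sel, y, cv=5).mean())
    ```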

  18. Feature Selection for Wearable Smartphone-Based Human Activity Recognition with Able bodied, Elderly, and Stroke Patients

    PubMed Central

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations. PMID:25885272

  19. Efficient feature subset selection with probabilistic distance criteria. [pattern recognition

    NASA Technical Reports Server (NTRS)

    Chittineni, C. B.

    1979-01-01

    Recursive expressions are derived for efficiently computing the commonly used probabilistic distance measures as a change in the criteria, both when a feature is added to and when a feature is deleted from the current feature subset. A combinatorial algorithm is presented for generating all possible r-feature combinations from a given set of s features in C(s, r) steps, with a change of a single feature at each step. These expressions can also be used for both forward and backward sequential feature selection.
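
    As a concrete example of the class of criteria the report treats, the Bhattacharyya distance between two Gaussian classes restricted to a candidate feature subset can be evaluated directly; the report's contribution is updating such criteria recursively as single features are added or deleted, which this sketch does not implement.

    ```python
    # Bhattacharyya distance between two Gaussian classes restricted to a
    # candidate feature subset, evaluated directly (no recursive updates).
    import numpy as np

    def bhattacharyya(X1, X2, subset):
        A, B = X1[:, subset], X2[:, subset]
        m = A.mean(0) - B.mean(0)
        S1, S2 = np.cov(A.T), np.cov(B.T)
        S = (S1 + S2) / 2
        # mean-separation term plus covariance-mismatch term
        term1 = m @ np.linalg.solve(S, m) / 8
        term2 = 0.5 * np.log(np.linalg.det(S) /
                             np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
        return term1 + term2

    rng = np.random.default_rng(0)
    X1 = rng.normal(0.0, 1.0, (60, 8))    # placeholder class-1 feature vectors
    X2 = rng.normal(0.5, 1.0, (60, 8))    # placeholder class-2 feature vectors
    print(bhattacharyya(X1, X2, [0, 2, 5]))
    ```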

  20. SCI model structure determination program (OSR) user's guide. [optimal subset regression

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The computer program OSR (Optimal Subset Regression), which estimates models for rotorcraft body and rotor force and moment coefficients, is described. The technique used is based on the subset regression algorithm. Given time histories of aerodynamic coefficients, aerodynamic variables, and control inputs, the program computes correlations between the various time histories. The model structure determination is based on these correlations. Inputs and outputs of the program are given.

  1. Relevance popularity: A term event model based feature selection scheme for text classification.

    PubMed

    Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao

    2017-01-01

    Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e., the document frequency) is often used. However, the frequency with which a given term appears in each document has not been fully investigated, even though it is a promising feature for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing the inner parameters by their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), our numerical experimental results obtained using two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperforms representative feature selection methods.

  2. Optimization of breast mass classification using sequential forward floating selection (SFFS) and a support vector machine (SVM) model

    PubMed Central

    Tan, Maxine; Pu, Jiantao; Zheng, Bin

    2014-01-01

    Purpose: Improving radiologists’ performance in classification between malignant and benign breast lesions is important to increase cancer detection sensitivity and reduce false-positive recalls. For this purpose, developing computer-aided diagnosis (CAD) schemes has been attracting research interest in recent years. In this study, we investigated a new feature selection method for the task of breast mass classification. Methods: We initially computed 181 image features based on mass shape, spiculation, contrast, presence of fat or calcifications, texture, isodensity, and other morphological features. From this large image feature pool, we used a sequential forward floating selection (SFFS)-based feature selection method to select relevant features, and analyzed their performance using a support vector machine (SVM) model trained for the classification task. On a database of 600 benign and 600 malignant mass regions of interest (ROIs), we performed the study using a ten-fold cross-validation method. Feature selection and optimization of the SVM parameters were conducted on the training subsets only. Results: The area under the receiver operating characteristic curve (AUC) = 0.805±0.012 was obtained for the classification task. The results also showed that the most frequently-selected features by the SFFS-based algorithm in 10-fold iterations were those related to mass shape, isodensity and presence of fat, which are consistent with the image features frequently used by radiologists in the clinical environment for mass classification. The study also indicated that accurately computing mass spiculation features from the projection mammograms was difficult, and failed to perform well for the mass classification task due to tissue overlap within the benign mass regions. Conclusions: In conclusion, this comprehensive feature analysis study provided new and valuable information for optimizing computerized mass classification schemes that may have potential to be useful as a “second reader” in future clinical practice. PMID:24664267
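
    The SFFS wrapper described above is available off the shelf in the mlxtend package, where floating=True adds the conditional exclusion step that distinguishes SFFS from plain forward selection. A sketch with placeholder mass features:

    ```python
    # SFFS around an SVM via mlxtend; floating=True enables the conditional
    # exclusion step of SFFS. The 181-feature matrix is a random placeholder,
    # so this is slow but runnable.
    import numpy as np
    from mlxtend.feature_selection import SequentialFeatureSelector as SFS
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.random((120, 181))      # placeholder: ROIs x 181 image features
    y = rng.integers(0, 2, 120)     # placeholder benign / malignant labels

    sffs = SFS(SVC(kernel="rbf"), k_features=10, forward=True, floating=True,
               scoring="roc_auc", cv=5)
    sffs = sffs.fit(X, y)
    print("selected feature indices:", sffs.k_feature_idx_)
    ```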

  3. Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

    PubMed

    Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin

    2017-04-01

    As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only cure to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depends on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, the Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose the Chronic Kidney Disease, two essential types of feature selection methods, namely wrapper and filter approaches, were chosen to reduce the dimension of the Chronic Kidney Disease dataset. In the wrapper approach, the classifier subset evaluator with the greedy stepwise search engine and the wrapper subset evaluator with the Best First search engine were used. In the filter approach, the correlation feature selection subset evaluator with the greedy stepwise search engine and the filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier using the filtered subset evaluator with the Best First search engine feature selection method has a higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to the other selected methods.

  4. Machine learning based detection of age-related macular degeneration (AMD) and diabetic macular edema (DME) from optical coherence tomography (OCT) images

    PubMed Central

    Wang, Yu; Zhang, Yaonan; Yao, Zhaomin; Zhao, Ruixue; Zhou, Fengfeng

    2016-01-01

    Non-lethal macular diseases greatly impact patients' quality of life and cause vision loss at late stages. Visual inspection of optical coherence tomography (OCT) images by experienced clinicians is the main diagnostic technique. We proposed a computer-aided diagnosis (CAD) model to discriminate age-related macular degeneration (AMD), diabetic macular edema (DME), and healthy maculae. The linear configuration pattern (LCP) based features of the OCT images were screened by the Correlation-based Feature Subset (CFS) selection algorithm. The best model, based on the sequential minimal optimization (SMO) algorithm, achieved an overall accuracy of 99.3% for the three classes of samples. PMID:28018716

  5. Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization.

    PubMed

    Libbrecht, Maxwell W; Bilmes, Jeffrey A; Noble, William Stafford

    2018-04-01

    Selecting a non-redundant representative subset of sequences is a common step in many bioinformatics workflows, such as the creation of non-redundant training sets for sequence and structural models or selection of "operational taxonomic units" from metagenomics data. Previous methods for this task, such as CD-HIT, PISCES, and UCLUST, apply a heuristic threshold-based algorithm that has no theoretical guarantees. We propose a new approach based on submodular optimization. Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success for other representative set selection problems. We demonstrate that the submodular optimization approach results in representative protein sequence subsets with greater structural diversity than sets chosen by existing methods, using as a gold standard the SCOPe library of protein domain structures. In this setting, submodular optimization consistently yields protein sequence subsets that include more SCOPe domain families than sets of the same size selected by competing approaches. We also show how the optimization framework allows us to design a mixture objective function that performs well for both large and small representative sets. The framework we describe is the best possible in polynomial time (under some assumptions), and it is flexible and intuitive because it applies a suite of generic methods to optimize one of a variety of objective functions. © 2018 Wiley Periodicals, Inc.
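
    The standard submodular-selection recipe underlying this approach is greedy maximization of a monotone objective such as facility location, where each candidate sequence is credited to its most similar selected representative and greedy selection carries the classic (1 - 1/e) approximation guarantee. A sketch on a placeholder similarity matrix, not the paper's mixture objective:

    ```python
    # Greedy maximization of a facility-location objective: each sequence is
    # credited to its most similar selected representative. The similarity
    # matrix is a random symmetric placeholder for pairwise sequence identities.
    import numpy as np

    def greedy_facility_location(sim, k):
        n = sim.shape[0]
        selected, cover = [], np.zeros(n)
        for _ in range(k):
            # marginal gain of each candidate j: added coverage over all items
            gains = np.maximum(sim, cover[:, None]).sum(axis=0) - cover.sum()
            gains[selected] = -np.inf          # never re-pick a selected item
            j = int(np.argmax(gains))
            selected.append(j)
            cover = np.maximum(cover, sim[:, j])
        return selected

    rng = np.random.default_rng(0)
    sim = rng.random((200, 200))
    sim = (sim + sim.T) / 2                    # symmetric placeholder
    print(greedy_facility_location(sim, 10))
    ```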

  6. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata

    PubMed Central

    Liu, Aiming; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-01-01

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validity of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems. PMID:29117100

  7. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata.

    PubMed

    Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-11-08

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validity of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.

  8. Fast Solution in Sparse LDA for Binary Classification

    NASA Technical Reports Server (NTRS)

    Moghaddam, Baback

    2010-01-01

    An algorithm that performs sparse linear discriminant analysis (Sparse-LDA) finds near-optimal solutions in far less time than the prior art when specialized to binary classification (of 2 classes). Sparse-LDA is a type of feature- or variable-selection problem with numerous applications in statistics, machine learning, computer vision, computational finance, operations research, and bio-informatics. Because of their combinatorial nature, feature- or variable-selection problems are NP-hard or computationally intractable in cases involving more than 30 variables or features. Therefore, one typically seeks approximate solutions by means of greedy search algorithms. The prior Sparse-LDA algorithm was a greedy algorithm that considered the best variable or feature to add to or delete from its subsets in order to maximally discriminate between multiple classes of data. The present algorithm is designed for the special but prevalent case of 2-class or binary classification (e.g., 1 vs. 0, functioning vs. malfunctioning, or change versus no change). The present algorithm provides near-optimal solutions on large real-world datasets having hundreds or even thousands of variables or features (e.g., selecting the fewest wavelength bands in a hyperspectral sensor to do terrain classification) and does so in typical computation times of minutes, as compared to the days or weeks taken by the prior art. Sparse-LDA requires solving generalized eigenvalue problems for a large number of variable subsets (represented by the submatrices of the input within-class and between-class covariance matrices). In the general (full-rank) case, the amount of computation scales at least cubically with the number of variables, and thus the size of the problems that can be solved is limited accordingly. However, in binary classification, the principal eigenvalues can be found using a special analytic formula, without resorting to costly iterative techniques. The present algorithm exploits this analytic form along with the inherently sequential nature of greedy search itself. Together, this enables the use of highly efficient partitioned-matrix-inverse techniques that result in large speedups of computation in both the forward-selection and backward-elimination stages of greedy algorithms in general.

  9. A Framework for Cloudy Model Optimization and Database Storage

    NASA Astrophysics Data System (ADS)

    Calvén, Emilia; Helton, Andrew; Sankrit, Ravi

    2018-01-01

    We present a framework for producing Cloudy photoionization models of the nebular emission from novae ejecta and storing a subset of the results in SQL database format for later use. The database can be searched for the models that best fit observed spectral line ratios. Additionally, the framework includes an optimization feature that can be used in tandem with the database to search for and improve on models by creating new Cloudy models while varying the parameters. The database search and optimization can be used to explore the structures of nebulae by deriving their properties from the best-fit models. The goal is to provide the community with a large database of Cloudy photoionization models, generated from parameters reflecting conditions within novae ejecta, that can be easily fitted to observed spectral lines, either by directly accessing the database using the framework code or by using a website specifically made for this purpose.

  10. Automatic classification of artifactual ICA-components for artifact removal in EEG signals.

    PubMed

    Winkler, Irene; Haufe, Stefan; Tangermann, Michael

    2011-08-02

    Artifacts contained in EEG recordings hamper both the visual interpretation by experts and the algorithmic processing and analysis (e.g., for Brain-Computer Interfaces (BCI) or for Mental State Monitoring). While hand-optimized selection of source components derived from Independent Component Analysis (ICA) to clean EEG data is widespread, the field could greatly profit from automated solutions based on Machine Learning methods. Existing ICA-based removal strategies depend on explicit recordings of an individual's artifacts or have not been shown to reliably identify muscle artifacts. We propose an automatic method for the classification of general artifactual source components. They are estimated by TDSEP, an ICA method that takes temporal correlations into account. The linear classifier is based on an optimized feature subset determined by a Linear Programming Machine (LPM). The subset is composed of features from the frequency, spatial, and temporal domains. A subject-independent classifier was trained on 640 TDSEP components (reaction time (RT) study, n = 12) that were hand-labeled by experts as artifactual or brain sources, and tested on 1080 new components of RT data from the same study. Generalization was tested on new data from two studies (auditory Event Related Potential (ERP) paradigm, n = 18; motor imagery BCI paradigm, n = 80) that used data with different channel setups and from new subjects. Based on only six features, the optimized linear classifier performed on a level with the inter-expert disagreement (<10% Mean Squared Error (MSE)) on the RT data. On data from the auditory ERP study, the same pre-calculated classifier generalized well and achieved 15% MSE. On data from the motor imagery paradigm, we demonstrate that the discriminant information used for BCI is preserved when removing up to 60% of the most artifactual source components. We propose a universal and efficient classifier of ICA components for the subject-independent removal of artifacts from EEG data. Based on linear methods, it is applicable to different electrode placements and supports the introspection of results. Trained on expert ratings of large data sets, it is not restricted to the detection of eye and muscle artifacts. Its performance and generalization ability are demonstrated on data from different EEG studies.

  11. Optimal subset selection of primary sequence features using the genetic algorithm for thermophilic proteins identification.

    PubMed

    Wang, LiQiang; Li, CuiFeng

    2014-10-01

    A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained on a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. The method reached an overall accuracy of 95.4 % in a jackknife test using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8 and 96.9 %. The overall accuracies of three independent tests were 93, 93.4 and 91.8 %. The observed results of detecting thermophilic proteins suggest that the GA-MLR approach described herein should be a powerful method for selecting features that characterize thermostability and an aid in the design of more stable proteins.
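
    The sketch below illustrates the general shape of GA-driven subset search with a multiple-linear-regression fitness; the data, population size, and operators are toy placeholders rather than the paper's settings.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)
        X = rng.standard_normal((200, 40))               # fake dipeptide frequencies
        y = (X[:, :3].sum(axis=1) > 0).astype(float)     # fake thermophilic labels

        def fitness(mask):
            if not mask.any():
                return -1.0
            # Cross-validated fit quality of an MLR model on the selected columns.
            return cross_val_score(LinearRegression(), X[:, mask], y,
                                   cv=5, scoring="r2").mean()

        pop = rng.integers(0, 2, size=(30, 40)).astype(bool)
        for gen in range(25):
            fit = np.array([fitness(m) for m in pop])
            parents = pop[np.argsort(fit)[-10:]]         # truncation selection
            children = parents[rng.integers(0, 10, size=30)].copy()
            flip = rng.random(children.shape) < 0.05     # bit-flip mutation
            children[flip] = ~children[flip]
            pop = children
        best = pop[np.argmax([fitness(m) for m in pop])]
        print("best subset size:", int(best.sum()))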

  12. Fizzy: feature subset selection for metagenomics.

    PubMed

    Ditzler, Gregory; Morrison, J Calvin; Lan, Yemin; Rosen, Gail L

    2015-11-04

    Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. We have developed a new Python command line tool, compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
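
    Fizzy itself is a command line tool, but the core idea can be sketched in a few lines: score each OTU by mutual information with the phenotype and keep the top-ranked ones. The data below are synthetic, and scikit-learn's estimator stands in for Fizzy's information-theoretic criteria.

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        rng = np.random.default_rng(2)
        otu_table = rng.poisson(3.0, size=(60, 500))     # samples x OTUs (fake counts)
        phenotype = rng.integers(0, 2, size=60)          # fake case/control labels

        mi = mutual_info_classif(otu_table, phenotype, discrete_features=True)
        top_k = np.argsort(mi)[-25:][::-1]               # 25 most informative OTUs
        print("selected OTU indices:", top_k)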

  13. Fizzy. Feature subset selection for metagenomics

    DOE PAGES

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; ...

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α– & β–diversity. Feature subset selection – a sub-field of machine learning – can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command line tool, compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

  14. Parsing the genetic heterogeneity of chromosome 12q susceptibility genes for Alzheimer disease by family-based association analysis.

    PubMed

    Lin, Ping-I; Martin, Eden R; Browning-Large, Carrie A; Schmechel, Donald E; Welsh-Bohmer, Kathleen A; Doraiswamy, P Murali; Gilbert, John R; Haines, Jonathan L; Pericak-Vance, Margaret A

    2006-07-01

    Previous linkage studies have suggested that chromosome 12 may harbor susceptibility genes for late-onset Alzheimer disease (LOAD), but no risk genes on chromosome 12 have been conclusively identified yet. We have reported that the linkage evidence for LOAD in a 12q region was significantly increased in autopsy-confirmed families, particularly those showing no linkage to the alpha-T catenin gene, a LOAD candidate gene on chromosome 10 [LOD score increased from 0.1 in the autopsy-confirmed subset to 4.19 in the unlinked subset (optimal subset); p<0.0001 for the increase in LOD score], indicating a one-LOD support interval spanning 6 Mb. To further investigate this finding and to identify potential candidate LOAD risk genes for follow-up analysis, we analyzed 99 single nucleotide polymorphisms in this region in the overall sample, the autopsy-confirmed subset, and the optimal subset for comparison. We saw no significant association (p<0.01) in the overall sample. In the autopsy-confirmed subset, the best finding was obtained in the activation transcription factor 7 (ATF7) gene (single-locus association, p=0.002; haplotype association global, p=0.007). In the optimal subset, the best finding was obtained in the hypothetical protein FLJ20436 (FLJ20436) gene (single-locus association, p=0.0026). These results suggest that subset and covariate analyses may be one approach to help identify novel susceptibility genes on chromosome 12q for LOAD.

  15. Design of a verifiable subset for HAL/S

    NASA Technical Reports Server (NTRS)

    Browne, J. C.; Good, D. I.; Tripathi, A. R.; Young, W. D.

    1979-01-01

    An attempt to evaluate the applicability of program verification techniques to the existing programming language HAL/S is discussed. HAL/S is a general purpose high level language designed to accommodate the software needs of the NASA Space Shuttle project; it offers a diversity of features for scientific computing, concurrent and real-time programming, and error handling. The criteria by which features were evaluated for inclusion in the verifiable subset are described. Individual features of HAL/S are examined with respect to these criteria, and justification for the omission of various features from the subset is provided. Conclusions drawn from the research are presented, along with recommendations for the use of HAL/S in the area of program verification.

  16. Design of focused and restrained subsets from extremely large virtual libraries.

    PubMed

    Jamois, Eric A; Lin, Chien T; Waldman, Marvin

    2003-11-01

    With the current and ever-growing offering of reagents along with the vast palette of organic reactions, virtual libraries accessible to combinatorial chemists can reach sizes of billions of compounds or more. Extracting practical-size subsets for experimentation has remained an essential step in the design of combinatorial libraries. A typical approach to computational library design involves enumeration of structures and properties for the entire virtual library, which may be impractical for such large libraries. This study describes a new approach, termed on-the-fly optimization (OTFO), in which descriptors are computed as needed within the subset optimization cycle, without intermediate enumeration of structures. Results reported herein highlight the advantages of coupling an ultra-fast descriptor calculation engine to subset optimization capabilities. We also show that enumeration of properties for the entire virtual library may be not only impractical but also wasteful. Successful design of focused and restrained subsets can be achieved while sampling only a small fraction of the virtual library. We also investigate the stability of the method and compare results obtained from simulated annealing (SA) and genetic algorithms (GA).
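
    The sketch below conveys the on-the-fly idea with simulated annealing on an invented library: descriptors are computed lazily and cached only for candidates the optimizer actually visits, so the full library is never enumerated. The descriptor, target value, and cooling schedule are placeholders.

        import math, random

        random.seed(3)
        LIBRARY = range(10 ** 6)                         # virtual library indices

        _cache = {}
        def descriptor(i):
            # Placeholder property, computed only when first requested.
            if i not in _cache:
                _cache[i] = 150.0 + (i * 37 % 400)
            return _cache[i]

        def cost(subset):
            # Penalize deviation of the subset's mean property from a target value.
            mean = sum(descriptor(i) for i in subset) / len(subset)
            return abs(mean - 350.0)

        subset = random.sample(LIBRARY, 96)
        T = 50.0
        for step in range(2000):
            cand = list(subset)
            cand[random.randrange(96)] = random.choice(LIBRARY)  # swap one member
            d = cost(cand) - cost(subset)
            if d < 0 or random.random() < math.exp(-d / T):
                subset = cand
            T *= 0.995                                   # geometric cooling
        print("final cost:", round(cost(subset), 3),
              "descriptors computed:", len(_cache))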

  17. Enhanced Product Generation at NASA Data Centers Through Grid Technology

    NASA Technical Reports Server (NTRS)

    Barkstrom, Bruce R.; Hinke, Thomas H.; Gavali, Shradha; Seufzer, William J.

    2003-01-01

    This paper describes how grid technology can support the ability of NASA data centers to provide customized data products. A combination of grid technology and commodity processors is proposed to provide the bandwidth necessary to perform customized processing of data, with customized data subsetting providing the initial example. This customized subsetting engine can be used to support a new type of subsetting, called phenomena-based subsetting, where data are subsetted based on their association with phenomena such as mesoscale convective systems or hurricanes. This concept is expanded to allow the phenomena to be detected in one type of data, with the subsetting requirements transmitted to the subsetting engine to subset a different type of data. The subsetting requirements are generated by a data mining system and transmitted to the subsetter in the form of an XML feature index that describes the spatial and temporal extent of the phenomena. For this work, a grid-based mining system called the Grid Miner is used to identify the phenomena and generate the feature index. This paper discusses the value of grid technology in facilitating the development of high-performance customized product processing and the coupling of a grid mining system to support phenomena-based subsetting.
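
    A toy version of phenomena-based subsetting is sketched below: an XML feature index describing a phenomenon's spatial extent drives the selection of data granules that intersect it. The XML schema and granule records are invented for illustration.

        import xml.etree.ElementTree as ET

        INDEX = """<features>
          <feature type="hurricane">
            <extent lat_min="20" lat_max="30" lon_min="-80" lon_max="-60"/>
          </feature>
        </features>"""

        # Fake granules: (name, lat_min, lat_max, lon_min, lon_max)
        granules = [("g1", 22, 28, -75, -65),
                    ("g2", 40, 50, -10, 10)]

        extents = ET.fromstring(INDEX).findall(".//extent")

        def intersects(g, e):
            _, lat0, lat1, lon0, lon1 = g
            return (lat0 <= float(e.get("lat_max")) and lat1 >= float(e.get("lat_min"))
                    and lon0 <= float(e.get("lon_max")) and lon1 >= float(e.get("lon_min")))

        keep = [g[0] for g in granules if any(intersects(g, e) for e in extents)]
        print("granules to subset:", keep)               # -> ['g1']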

  18. A Novel Feature Optimization for Wearable Human-Computer Interfaces Using Surface Electromyography Sensors

    PubMed Central

    Zhang, Xiong; Zhao, Yacong; Zhang, Yu; Zhong, Xuefei; Fan, Zhaowen

    2018-01-01

    The novel human-computer interface (HCI) using bioelectrical signals as input is a valuable tool to improve the lives of people with disabilities. In this paper, surface electromyography (sEMG) signals induced by four classes of wrist movements were acquired from four sites on the lower arm with our designed system. Forty-two features were extracted from the time, frequency and time-frequency domains. Optimal channels were determined from the single-channel classification performance rank. Optimal features were selected according to a modified entropy criterion (EC) and the Fisher discrimination (FD) criterion. The feature selection results were evaluated by four different classifiers and compared with other conventional feature subsets. In online tests, the wearable system acquired real-time sEMG signals. The selected features and trained classifier model were used to control a telecar through four different paradigms in a designed environment with simple obstacles. Performance was evaluated based on travel time (TT) and recognition rate (RR). The results of the hardware evaluation verified the feasibility of our acquisition systems and ensured signal quality. Single-channel analysis indicated that the channel located on the extensor carpi ulnaris (ECU) performed best, with a mean classification accuracy of 97.45% for all movement pairs. Channels placed on the ECU and the extensor carpi radialis (ECR) were selected according to the accuracy rank. Experimental results showed that the proposed FD method was better than other feature selection methods and single-type features. The combination of FD and random forest (RF) performed best in offline analysis, with 96.77% multi-class RR. Online results illustrated that the state-machine paradigm with a 125 ms window had the highest maneuverability and was closest to real-life control. Subjects could accomplish online sessions using three sEMG-based paradigms, with average times of 46.02, 49.06 and 48.08 s, respectively. These experiments validate the feasibility of the proposed real-time wearable HCI system and algorithms, providing a potential assistive device interface for persons with disabilities. PMID:29543737
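
    The Fisher discrimination criterion used for ranking can be sketched compactly: features whose between-class scatter is large relative to their within-class scatter score highly. The sEMG data below are synthetic stand-ins.

        import numpy as np

        rng = np.random.default_rng(4)
        X = rng.standard_normal((400, 42))               # trials x sEMG features (fake)
        y = rng.integers(0, 4, size=400)                 # four wrist-movement classes

        def fisher_score(f, labels):
            classes = np.unique(labels)
            overall = f.mean()
            between = sum((labels == c).sum() * (f[labels == c].mean() - overall) ** 2
                          for c in classes)
            within = sum(((f[labels == c] - f[labels == c].mean()) ** 2).sum()
                         for c in classes)
            return between / within

        scores = np.array([fisher_score(X[:, j], y) for j in range(X.shape[1])])
        print("top 10 features:", np.argsort(scores)[-10:][::-1])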

  19. Weighted score-level feature fusion based on Dempster-Shafer evidence theory for action recognition

    NASA Astrophysics Data System (ADS)

    Zhang, Guoliang; Jia, Songmin; Li, Xiuzhi; Zhang, Xiangyin

    2018-01-01

    The majority of human action recognition methods use a multifeature fusion strategy to improve classification performance, but the contribution of different features to a specific action has not received enough attention. We present an extendible and universal weighted score-level feature fusion method using the Dempster-Shafer (DS) evidence theory based on the pipeline of bag-of-visual-words. First, the partially distinctive samples in the training set are selected to construct the validation set. Then, local spatiotemporal features and pose features are extracted from these samples to obtain evidence information. The DS evidence theory and the proposed rule of survival of the fittest are employed to achieve evidence combination and calculate optimal weight vectors of every feature type belonging to each action class. Finally, the recognition results are deduced via the weighted summation strategy. The performance of the established recognition framework is evaluated on the Penn Action dataset and a subset of the joint-annotated human motion database (sub-JHMDB). The experimental results demonstrate that the proposed feature fusion method can adequately exploit the complementarity among multiple features and improve upon most of the state-of-the-art algorithms on the Penn Action and sub-JHMDB datasets.
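
    At the heart of the method is Dempster's combination rule. A minimal sketch for two evidence sources over singleton action hypotheses follows; the mass values are invented and the paper's weighting scheme is not reproduced.

        def dempster_combine(m1, m2):
            classes = list(m1)
            # Conflict: mass jointly assigned to incompatible singleton pairs.
            conflict = sum(m1[a] * m2[b] for a in classes for b in classes if a != b)
            k = 1.0 - conflict                           # normalization factor
            return {c: m1[c] * m2[c] / k for c in classes}

        m_motion = {"run": 0.6, "jump": 0.3, "wave": 0.1}   # spatiotemporal evidence
        m_pose = {"run": 0.5, "jump": 0.4, "wave": 0.1}     # pose evidence
        print(dempster_combine(m_motion, m_pose))           # fused beliefs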

  20. Fukunaga-Koontz feature transformation for statistical structural damage detection and hierarchical neuro-fuzzy damage localisation

    NASA Astrophysics Data System (ADS)

    Hoell, Simon; Omenzetter, Piotr

    2017-07-01

    Considering jointly damage sensitive features (DSFs) of signals recorded by multiple sensors, applying advanced transformations to these DSFs and assessing systematically their contribution to damage detectability and localisation can significantly enhance the performance of structural health monitoring systems. This philosophy is explored here for partial autocorrelation coefficients (PACCs) of acceleration responses. They are interrogated with the help of the linear discriminant analysis based on the Fukunaga-Koontz transformation using datasets of the healthy and selected reference damage states. Then, a simple but efficient fast forward selection procedure is applied to rank the DSF components with respect to statistical distance measures specialised for either damage detection or localisation. For the damage detection task, the optimal feature subsets are identified based on the statistical hypothesis testing. For damage localisation, a hierarchical neuro-fuzzy tool is developed that uses the DSF ranking to establish its own optimal architecture. The proposed approaches are evaluated experimentally on data from non-destructively simulated damage in a laboratory scale wind turbine blade. The results support our claim of being able to enhance damage detectability and localisation performance by transforming and optimally selecting DSFs. It is demonstrated that the optimally selected PACCs from multiple sensors or their Fukunaga-Koontz transformed versions can not only improve the detectability of damage via statistical hypothesis testing but also increase the accuracy of damage localisation when used as inputs into a hierarchical neuro-fuzzy network. Furthermore, the computational effort of employing these advanced soft computing models for damage localisation can be significantly reduced by using transformed DSFs.
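
    Constructing the PACC-based damage-sensitive features is straightforward to sketch with statsmodels; the acceleration records below are synthetic and the lag count is an arbitrary choice.

        import numpy as np
        from statsmodels.tsa.stattools import pacf

        rng = np.random.default_rng(5)
        acc = rng.standard_normal((3, 4096))             # 3 sensors x time samples (fake)

        # First 20 partial autocorrelation lags per sensor, concatenated into one DSF vector.
        dsf = np.concatenate([pacf(acc[s], nlags=20)[1:] for s in range(3)])
        print("DSF length:", dsf.shape[0])               # 3 sensors x 20 lags = 60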

  1. Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition.

    PubMed

    Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Möller, Steffen; Hartmann, Enno; Kalies, Kai-Uwe; Suganthan, P N; Martinetz, Thomas

    2010-12-01

    Apoptosis is an essential process for controlling tissue homeostasis by regulating a physiological balance between cell proliferation and cell death. The subcellular locations of proteins performing the cell death are determined by mostly independent cellular mechanisms. Standard bioinformatics tools to predict the subcellular locations of such apoptotic proteins often fail. This work proposes a model for the sorting of proteins that are involved in apoptosis, allowing both the prediction of their subcellular locations and the identification of the molecular properties that contribute to it. We report a novel hybrid Genetic Algorithm (GA)/Support Vector Machine (SVM) approach to predict apoptotic protein sequences using 119 sequence-derived properties such as frequency of amino acid groups, secondary structure, and physicochemical properties. The GA is used to select a near-optimal subset of informative features that is most relevant for the classification. Jackknife cross-validation is applied to test the predictive capability of the proposed method on 317 apoptosis proteins. Our method achieved 85.80% accuracy using all 119 features and 89.91% accuracy for 25 features selected by the GA. Our models were examined on a test dataset of 98 apoptosis proteins and obtained an overall accuracy of 90.34%. The results show that the proposed approach is promising; it is able to select small subsets of features and still improve the classification accuracy. Our model can contribute to the understanding of programmed cell death and to drug discovery. The software and dataset are available at http://www.inb.uni-luebeck.de/tools-demos/apoptosis/GASVM.

  2. ALK mutation and inhibition in lung cancer

    PubMed Central

    Le, Tri; Gerber, David E.

    2016-01-01

    The advent of precision medicine in non-small cell lung cancer has remarkably altered the direction of research and improved clinical outcomes. The identification of molecular subsets with differential response to targeted therapies began with the identification of epidermal growth factor receptor mutated tumors in subsets of non-small cell lung cancer (NSCLC). Emboldened by unprecedented response rates to kinase inhibitors seen in that subset, the oncologic community searched for other molecular subsets featuring oncogene addiction. An early result of this search was the discovery of NSCLC driven by activating rearrangements of the anaplastic lymphoma kinase (ALK) gene. In an astoundingly brief period following the recognition of ALK-positive NSCLC, details of the biology, clinicopathologic features, development of targeted inhibitors, mechanisms of therapeutic resistance, and new generations of treatment were elucidated. This review summarizes the current understanding of the pathologic features, diagnostic approach, treatment options, resistance mechanisms, and future research areas for ALK-positive NSCLC. PMID:27637426

  3. A model of proto-object based saliency

    PubMed Central

    Russell, Alexander F.; Mihalaş, Stefan; von der Heydt, Rudiger; Niebur, Ernst; Etienne-Cummings, Ralph

    2013-01-01

    Organisms use the process of selective attention to optimally allocate their computational resources to the instantaneously most relevant subsets of a visual scene, ensuring that they can parse the scene in real time. Many models of bottom-up attentional selection assume that elementary image features, like intensity, color and orientation, attract attention. Gestalt psychologists, however, argue that humans perceive whole objects before they analyze individual features. This is supported by recent psychophysical studies that show that objects predict eye fixations better than features. In this report we present a neurally inspired algorithm of object-based, bottom-up attention. The model rivals the performance of state-of-the-art non-biologically-plausible feature-based algorithms (and outperforms biologically plausible feature-based algorithms) in its ability to predict perceptual saliency (eye fixations and subjective interest points) in natural scenes. The model achieves this by computing saliency as a function of proto-objects that establish the perceptual organization of the scene. All computational mechanisms of the algorithm have direct neural correlates, and our results provide evidence for the interface theory of attention. PMID:24184601

  4. Deploying a quantum annealing processor to detect tree cover in aerial imagery of California

    PubMed Central

    Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Mukhopadhyay, Supratik; Nemani, Ramakrishna R.

    2017-01-01

    Quantum annealing is an experimental and potentially breakthrough computational technology for handling hard optimization problems, including problems of computer vision. We present a case study in training a production-scale classifier of tree cover in remote sensing imagery, using early-generation quantum annealing hardware built by D-Wave Systems, Inc. Beginning within a known boosting framework, we train decision stumps on texture features and vegetation indices extracted from four-band, one-meter-resolution aerial imagery from the state of California. We then impose a regularized quadratic training objective to select an optimal voting subset from among these stumps. The votes of the subset define the classifier. For optimization, the logical variables in the objective function map to quantum bits in the hardware device, while quadratic couplings encode as the strength of physical interactions between the quantum bits. Hardware design limits the number of couplings between these basic physical entities to five or six. To account for this limitation in mapping large problems to the hardware architecture, we propose a truncation and rescaling of the training objective through a trainable metaparameter. The boosting process on our basic 108- and 508-variable problems, thus constituted, returns classifiers that incorporate a diverse range of color- and texture-based metrics and discriminate tree cover with accuracies as high as 92% in validation and 90% on a test scene encompassing the open space preserves and dense suburban build of Mill Valley, CA. PMID:28241028
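
    The QUBO form that an annealer minimizes can be sketched on a toy problem and solved by brute force; the stump outputs, labels, and penalty strength below are invented, and no D-Wave API is involved.

        import itertools
        import numpy as np

        rng = np.random.default_rng(6)
        n = 8                                            # number of weak classifiers (stumps)
        H = rng.choice([-1, 1], size=(n, 200))           # stump outputs on 200 samples
        y = rng.choice([-1, 1], size=200)                # labels
        lam = 0.05                                       # sparsity penalty

        # Expand ||(1/n) H^T w - y||^2 + lam * sum(w) into quadratic and linear terms.
        Q = (H @ H.T) / n ** 2
        c = -2.0 * (H @ y) / n + lam * np.ones(n)

        def energy(w):
            w = np.array(w)
            return w @ Q @ w + c @ w

        best = min(itertools.product([0, 1], repeat=n), key=energy)
        print("selected stumps:", [i for i, wi in enumerate(best) if wi])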

  5. A ROC-based feature selection method for computer-aided detection and diagnosis

    NASA Astrophysics Data System (ADS)

    Wang, Songyuan; Zhang, Guopeng; Liao, Qimei; Zhang, Junying; Jiao, Chun; Lu, Hongbing

    2014-03-01

    Image-based computer-aided detection and diagnosis (CAD) has been a very active research topic aiming to assist physicians in detecting lesions and distinguishing benign from malignant ones. However, the datasets fed into a classifier usually suffer from a small number of samples, as well as significantly fewer samples available in one class (having a disease) than the other, resulting in suboptimal classifier performance. Identifying the most characterizing features of the observed data for lesion detection is critical to improving the sensitivity and minimizing the false positives of a CAD system. In this study, we propose a novel feature selection method, mR-FAST, that combines the minimal-redundancy-maximal-relevance (mRMR) framework with the selection metric FAST (feature assessment by sliding thresholds), which is based on the area under a ROC curve (AUC) generated on optimal simple linear discriminants. With three feature datasets extracted from CAD systems for colon polyps and bladder cancer, we show that the space of candidate features selected by mR-FAST is more characterizing for lesion detection, with higher AUC, enabling it to find a compact subset of superior features at low cost.
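
    The AUC-driven ranking idea can be sketched per feature: score each candidate by the area under the ROC curve it achieves as a one-dimensional discriminant. Using roc_auc_score on the raw feature value is a simplification of FAST's sliding-threshold construction, and the data are synthetic.

        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(7)
        X = rng.standard_normal((300, 50))               # fake CAD features
        y = rng.integers(0, 2, size=300)                 # lesion vs. non-lesion

        auc = np.array([max(roc_auc_score(y, X[:, j]),   # orientation-invariant score
                            roc_auc_score(y, -X[:, j])) for j in range(50)])
        ranked = np.argsort(auc)[::-1]
        print("10 most characterizing features:", ranked[:10])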

  6. Seismic Characterizations of Fractures: Dynamic Diagnostics

    NASA Astrophysics Data System (ADS)

    Pyrak-Nolte, L. J.

    2017-12-01

    Fracture geometry controls fluid flow in a fracture, affects mechanical stability and influences the energy partitioning that governs wave scattering. Our ability to detect and monitor fracture evolution is controlled by the frequency of the signal used to probe a fracture system, i.e. the frequency selects the scales. No matter the frequency chosen, some set of discontinuities will be optimal for detection because different wavelengths sample different subsets of fractures. The selected subset of fractures is based on the stiffness of the fractures, which in turn is linked to fluid flow. A goal is obtaining information from scales outside the optimal detection regime. Fracture geometry trajectories are a potential approach to drive a fracture system across observation scales, i.e. moving systems between effective medium and scattering regimes. Dynamic trajectories (such as perturbing stress, fluid pressure, chemical alteration, etc.) can be used to perturb fracture geometry to enhance scattering or give rise to discrete modes that are intimately related to the micro-structural evolution of a fracture. However, identification of these signal features will require methods for identifying these micro-structural signatures in complicated scattered fields. Acknowledgment: This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Geosciences Research Program under Award Number DE-FG02-09ER16022.

  7. Encouraging top-down attention in visual search: A developmental perspective.

    PubMed

    Lookadoo, Regan; Yang, Yingying; Merrill, Edward C

    2017-10-01

    Four experiments are reported in which 60 younger children (7-8 years old), 60 older children (10-11 years old), and 60 young adults (18-25 years old) performed a conjunctive visual search task (15 per group in each experiment). The number of distractors of each feature type was unbalanced across displays to evaluate participants' ability to restrict search to the smaller subset of features. The use of top-down attention processes to restrict search was encouraged by providing external aids for identifying and maintaining attention on the smaller set. In Experiment 1, no external assistance was provided. In Experiment 2, precues and instructions were provided to focus attention on that subset. In Experiment 3, trials in which the smaller subset was represented by the same feature were presented in alternating blocks to eliminate the need to switch attention between features from trial to trial. In Experiment 4, consecutive blocks of the same subset features were presented in the first or second half of the experiment, providing additional consistency. All groups benefited from external support of top-down attention, although the pattern of improvement varied across experiments. The younger children benefited most from precues and instruction, using the subset search strategy when instructed. Furthermore, younger children benefited from blocking trials only when blocks of the same features did not alternate. Older participants benefited from the blocking of trials in both Experiments 3 and 4, but not from precues and instructions. Hence, our results revealed both malleability and limits of children's top-down control of attention.

  8. Insights into multimodal imaging classification of ADHD

    PubMed Central

    Colby, John B.; Rudie, Jeffrey D.; Brown, Jesse A.; Douglas, Pamela K.; Cohen, Mark S.; Shehzad, Zarrar

    2012-01-01

    Attention deficit hyperactivity disorder (ADHD) is currently diagnosed in children by clinicians via subjective ADHD-specific behavioral instruments and by reports from parents and teachers. Considering its high prevalence and large economic and societal costs, a quantitative tool that aids diagnosis by characterizing the underlying neurobiology would be extremely valuable. This provided motivation for the ADHD-200 machine learning (ML) competition, a multisite collaborative effort to investigate imaging classifiers for ADHD. Here we present our ML approach, which used structural and functional magnetic resonance imaging data, combined with demographic information, to predict the diagnostic status (ADHD versus typically developing (TD)) of children across eight different research sites. Structural features included quantitative metrics from 113 cortical and non-cortical regions. Functional features included Pearson correlation functional connectivity matrices, nodal and global graph theoretical measures, nodal power spectra, voxelwise global connectivity, and voxelwise regional homogeneity. We performed feature ranking for each site and modality using the multiple support vector machine recursive feature elimination (SVM-RFE) algorithm, and feature subset selection by optimizing the expected generalization performance of a radial basis function kernel SVM (RBF-SVM) trained across a range of the top features. Site-specific RBF-SVMs using these optimal feature sets from each imaging modality were used to predict the class labels of an independent hold-out test set. A voting approach was used to combine these multiple predictions and assign final class labels. With this methodology we were able to predict the diagnosis of ADHD with 55% accuracy (versus a 39% chance level in this sample), 33% sensitivity, and 80% specificity. This approach also allowed us to evaluate predictive structural and functional features, giving insight into abnormal brain circuitry in ADHD. PMID:22912605
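
    A sketch of the per-modality ranking step, using scikit-learn's RFE with a linear SVM as a common stand-in for SVM-RFE; the RBF-SVM tuning and voting stages are omitted, and the data are synthetic.

        import numpy as np
        from sklearn.feature_selection import RFE
        from sklearn.svm import SVC

        rng = np.random.default_rng(14)
        X = rng.standard_normal((160, 113))              # subjects x structural metrics (fake)
        y = rng.integers(0, 2, size=160)                 # ADHD vs. typically developing

        # Recursively drop the 5 lowest-weighted features until 20 remain.
        ranker = RFE(SVC(kernel="linear"), n_features_to_select=20, step=5)
        ranker.fit(X, y)
        print("top-ranked features:", np.where(ranker.support_)[0])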

  9. HPV-Associated Head and Neck Cancer: Unique Features of Epidemiology and Clinical Management

    PubMed Central

    Maxwell, Jessica H.; Grandis, Jennifer R.; Ferris, Robert L.

    2017-01-01

    Human papillomavirus (HPV) is a recently identified causative agent for a subset of head and neck cancers, primarily in the oropharynx, and is largely responsible for the rising worldwide incidence of oropharyngeal cancer (OPC). Patients with HPV-positive OPC have distinct risk factor profiles and generally have a better prognosis than patients with traditional, HPV-negative, head and neck cancer. Concurrent chemotherapy and radiation is a widely accepted primary treatment modality for many patients with HPV-positive OPC. However, recent advances in surgical modalities, including transoral laser and robotic surgery, have led to the reemergence of primary surgical treatment for HPV-positive patients. Clinical trials are under way to determine optimal treatment strategies for the growing subset of patients with HPV-positive OPC. Similarly, identifying those patients with HPV-positive cancer who are at risk for recurrence and poor survival is critical in order to tailor individual treatment regimens and avoid potential undertreatment. PMID:26332002

  10. Feature selection for the classification of traced neurons.

    PubMed

    López-Cabrera, José D; Lorenzo-Ginori, Juan V

    2018-06-01

    The great availability of computational tools to calculate the properties of traced neurons has led to many descriptors that allow the automated classification of neurons from these reconstructions. This situation creates the need to eliminate irrelevant features as well as to select the most appropriate among them, in order to improve the quality of the classification obtained. The dataset used contains a total of 318 traced neurons, classified by human experts into 192 GABAergic interneurons and 126 pyramidal cells. The features were extracted by means of the L-measure software, which is one of the most used computational tools in neuroinformatics to quantify traced neurons. We review current feature selection techniques such as filter, wrapper, embedded and ensemble methods. The stability of the feature selection methods was measured. For the ensemble methods, several aggregation methods based on different metrics were applied to combine the subsets obtained during the feature selection process. The subsets obtained by applying feature selection methods were evaluated using supervised classifiers, among which Random Forest, C4.5, SVM, Naïve Bayes, Knn, Decision Table and the Logistic classifier were used as classification algorithms. Feature selection methods of the filter, embedded, wrapper and ensemble types were compared, and the subsets returned were tested in classification tasks with different classification algorithms. The L-measure features EucDistanceSD, PathDistanceSD, Branch_pathlengthAve, Branch_pathlengthSD and EucDistanceAve were present in more than 60% of the selected subsets, which provides evidence of their importance in the classification of these neurons. Copyright © 2018 Elsevier B.V. All rights reserved.
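
    One common way to quantify the stability mentioned above is the average pairwise Jaccard similarity of the subsets returned across runs; the abstract does not specify the exact index used, so the sketch below is a generic stand-in built on feature names from the study.

        from itertools import combinations

        subsets = [{"EucDistanceSD", "PathDistanceSD", "Branch_pathlengthAve"},
                   {"EucDistanceSD", "PathDistanceSD", "EucDistanceAve"},
                   {"EucDistanceSD", "Branch_pathlengthSD", "PathDistanceSD"}]

        def jaccard(a, b):
            return len(a & b) / len(a | b)

        pairs = list(combinations(subsets, 2))
        stability = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
        print(f"stability: {stability:.2f}")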

  11. Feature Selection Methods for Robust Decoding of Finger Movements in a Non-human Primate

    PubMed Central

    Padmanaban, Subash; Baker, Justin; Greger, Bradley

    2018-01-01

    Objective: The performance of machine learning algorithms used for neural decoding of dexterous tasks may be impeded due to problems arising when dealing with high-dimensional data. The objective of feature selection algorithms is to choose a near-optimal subset of features from the original feature space to improve the performance of the decoding algorithm. The aim of our study was to compare the effects of four feature selection techniques, Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis (PCA), and Mutual Information Maximization on SVM classification performance for a dexterous decoding task. Approach: A nonhuman primate (NHP) was trained to perform small coordinated movements—similar to typing. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials (AP) during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based upon AP firing rates. We used the SVM classification to examine the functional parameters of (i) robustness to simulated failure and (ii) longevity of classification. We also compared the effect of using isolated-neuron and multi-unit firing rates as the feature vector supplied to the SVM. Main results: The average decoding accuracy for multi-unit features and single-unit features using Mutual Information Maximization (MIM) across 47 sessions was 96.74 ± 3.5% and 97.65 ± 3.36% respectively. The reduction in decoding accuracy between using 100% of the features and 10% of features based on MIM was 45.56% (from 93.7 to 51.09%) and 4.75% (from 95.32 to 90.79%) for multi-unit and single-unit features respectively. MIM had best performance compared to other feature selection methods. Significance: These results suggest improved decoding performance can be achieved by using optimally selected features. The results based on clinically relevant performance metrics also suggest that the decoding algorithm can be made robust by using optimal features and feature selection algorithms. We believe that even a few percent increase in performance is important and improves the decoding accuracy of the machine learning algorithm potentially increasing the ease of use of a brain machine interface. PMID:29467602

  12. New and Improved Version of the ASDC MOPITT Search and Subset Web Application

    Atmospheric Science Data Center

    2016-07-06

    A new and improved version of the ASDC MOPITT Search and Subset Web Application was released on June 24, 2016. New features include: Versions 5 and 6 ...

  13. High Accuracy Human Activity Recognition Based on Sparse Locality Preserving Projections.

    PubMed

    Zhu, Xiangbin; Qiu, Huiling

    2016-01-01

    Human activity recognition (HAR) from temporal streams of sensory data has been applied to many fields, such as healthcare services, intelligent environments and cyber security. However, the classification accuracy of most existing methods is insufficient for some applications, especially healthcare services. To improve accuracy, it is necessary to develop a novel method that takes full account of the intrinsic sequential characteristics of time-series sensory data. Moreover, each human activity may have correlated feature relationships at different levels. Therefore, in this paper, we propose a three-stage continuous hidden Markov model (TSCHMM) approach to recognize human activities. The proposed method comprises coarse, fine and accurate classification. Feature reduction is an important step in classification processing. In this paper, sparse locality preserving projections (SpLPP) is exploited to determine the optimal feature subsets for accurate classification of the stationary-activity data. It can extract more discriminative activity features from the sensor data than locality preserving projections. Furthermore, all of the gyro-based features are used for accurate classification of the moving data. Compared with other methods, our method uses significantly fewer features, and the overall accuracy is markedly improved.

  14. High Accuracy Human Activity Recognition Based on Sparse Locality Preserving Projections

    PubMed Central

    2016-01-01

    Human activity recognition (HAR) from temporal streams of sensory data has been applied to many fields, such as healthcare services, intelligent environments and cyber security. However, the classification accuracy of most existing methods is insufficient for some applications, especially healthcare services. To improve accuracy, it is necessary to develop a novel method that takes full account of the intrinsic sequential characteristics of time-series sensory data. Moreover, each human activity may have correlated feature relationships at different levels. Therefore, in this paper, we propose a three-stage continuous hidden Markov model (TSCHMM) approach to recognize human activities. The proposed method comprises coarse, fine and accurate classification. Feature reduction is an important step in classification processing. In this paper, sparse locality preserving projections (SpLPP) is exploited to determine the optimal feature subsets for accurate classification of the stationary-activity data. It can extract more discriminative activity features from the sensor data than locality preserving projections. Furthermore, all of the gyro-based features are used for accurate classification of the moving data. Compared with other methods, our method uses significantly fewer features, and the overall accuracy is markedly improved. PMID:27893761

  15. Initial development of a computer-aided diagnosis tool for solitary pulmonary nodules

    NASA Astrophysics Data System (ADS)

    Catarious, David M., Jr.; Baydush, Alan H.; Floyd, Carey E., Jr.

    2001-07-01

    This paper describes the development of a computer-aided diagnosis (CAD) tool for solitary pulmonary nodules. This CAD tool is built upon physically meaningful features that were selected because of their relevance to shape and texture. These features included a modified version of the Hotelling statistic (HS), a channelized HS, three measures of fractal properties, two measures of spicularity, and three manually measured shape features. These features were measured from a difficult database consisting of 237 regions of interest (ROIs) extracted from digitized chest radiographs. The center of each 256x256 pixel ROI contained a suspicious lesion which was sent to follow-up by a radiologist and whose nature was later clinically determined. Linear discriminant analysis (LDA) was used to search the feature space via sequential forward search using percentage correct as the performance metric. An optimized feature subset, selected for the highest accuracy, was then fed into a three layer artificial neural network (ANN). The ANN's performance was assessed by receiver operating characteristic (ROC) analysis. A leave-one-out testing/training methodology was employed for the ROC analysis. The performance of this system is competitive with that of three radiologists on the same database.

  16. An improved wrapper-based feature selection method for machinery fault diagnosis

    PubMed Central

    2017-01-01

    A major issue of machinery fault diagnosis using vibration signals is that it is over-reliant on personnel knowledge and experience in interpreting the signal. Thus, machine learning has been adapted for machinery fault diagnosis. The quantity and quality of the input features, however, influence the fault classification performance. Feature selection plays a vital role in selecting the most representative feature subset for the machine learning algorithm. However, a trade-off between the capability to select the best feature subset and the computational effort is inevitable in the wrapper-based feature selection (WFS) method. This paper proposes an improved WFS technique integrated with a support vector machine (SVM) classifier as a complete fault diagnosis system for a rolling element bearing case study. The bearing vibration dataset made available by the Case Western Reserve University Bearing Data Centre was processed using the proposed WFS, and its performance has been analysed and discussed. The results reveal that the proposed WFS secures the best feature subset with lower computational effort by eliminating the redundancy of re-evaluation. The proposed WFS has therefore been found to be capable and efficient in carrying out feature selection tasks. PMID:29261689

  17. Systematic feature selection improves accuracy of methylation-based forensic age estimation in Han Chinese males.

    PubMed

    Feng, Lei; Peng, Fuduan; Li, Shanfei; Jiang, Li; Sun, Hui; Ji, Anquan; Zeng, Changqing; Li, Caixia; Liu, Fan

    2018-03-23

    Estimating individual age from biomarkers may provide key information facilitating forensic investigations. Recent progress has shown DNA methylation at age-associated CpG sites to be the most informative biomarker for estimating the individual age of an unknown donor. Optimal feature selection plays a critical role in determining the performance of the final prediction model. In this study we investigate methylation levels at 153 age-associated CpG sites from 21 previously reported genomic regions using the EpiTYPER system for their predictive power on individual age in 390 Han Chinese males ranging from 15 to 75 years of age. We conducted a systematic feature selection using a stepwise backward multiple linear regression analysis as well as an exhaustive search algorithm. Both approaches identified the same subset of 9 CpG sites, which in linear combination provided the optimal model fit, with a mean absolute deviation (MAD) of 2.89 years of age and explainable variance (R²) of 0.92. The final model was validated in two independent Han Chinese male samples (validation set 1, N = 65, MAD = 2.49, R² = 0.95, and validation set 2, N = 62, MAD = 3.36, R² = 0.89). Other competing models such as support vector machines and artificial neural networks did not outperform the linear model to any noticeable degree. Validation set 1 was additionally analyzed using Pyrosequencing technology for cross-platform validation and was termed validation set 3. Directly applying our model, in which the methylation levels were detected by the EpiTYPER system, to the data from Pyrosequencing technology showed, however, less accurate results in terms of MAD (validation set 3, N = 65 Han Chinese males, MAD = 4.20, R² = 0.93), suggesting the presence of a batch effect between different data generation platforms. This batch effect could be partially overcome by a z-score transformation (MAD = 2.76, R² = 0.93). Overall, our systematic feature selection identified 9 CpG sites as the optimal subset for forensic age estimation, and the prediction model consisting of these 9 markers demonstrated high potential in forensic practice. An age estimator implementing our prediction model and allowing missing markers is freely available at http://liufan.big.ac.cn/AgePrediction. Copyright © 2018 Elsevier B.V. All rights reserved.
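
    The z-score transformation used to soften the cross-platform batch effect amounts to standardizing each CpG site within its own platform before applying the linear model. The sketch below uses invented coefficients and synthetic methylation values; it only illustrates the transformation step, not the published estimator.

        import numpy as np

        rng = np.random.default_rng(8)
        epityper = rng.random((390, 9))                  # training betas, 9 CpGs (fake)
        pyro = rng.random((65, 9)) * 0.9 + 0.05          # shifted test-platform betas (fake)

        def zscore(m):
            # Standardize each CpG site within its own platform.
            return (m - m.mean(axis=0)) / m.std(axis=0)

        coef = rng.normal(0, 5, size=9)                  # hypothetical model weights
        intercept = 45.0                                 # hypothetical mean age
        ages = zscore(pyro) @ coef + intercept           # batch-adjusted predictions
        print("predicted ages (first 5):", np.round(ages[:5], 1))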

  18. High Dimensional Classification Using Features Annealed Independence Rules.

    PubMed

    Fan, Jianqing; Fan, Yingying

    2008-01-01

    Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
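
    A compact sketch of the selection step: rank features by two-sample t-statistics and keep those above a threshold before applying an independence rule. The threshold here is a fixed constant rather than the error-bound-derived choice the paper proposes, and the data are synthetic.

        import numpy as np
        from scipy.stats import ttest_ind

        rng = np.random.default_rng(9)
        X = rng.standard_normal((80, 2000))              # fake expression matrix
        y = rng.integers(0, 2, size=80)
        X[y == 1, :20] += 1.0                            # 20 genuinely informative genes

        t, _ = ttest_ind(X[y == 0], X[y == 1], axis=0)   # per-feature t-statistics
        keep = np.abs(t) > 3.0                           # hard threshold on |t|
        print("features kept:", int(keep.sum()))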

  19. Comparative study of feature selection with ensemble learning using SOM variants

    NASA Astrophysics Data System (ADS)

    Filali, Ameni; Jlassi, Chiraz; Arous, Najet

    2017-03-01

    Ensemble learning has improved the stability and accuracy of clustering, but its runtime prohibits scaling up to real-world applications. This study addresses the problem of selecting a subset of the most pertinent features for every cluster of a dataset. The proposed method is an extension of the Random Forests approach using self-organizing map (SOM) variants on unlabeled data that estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims at dimensionality reduction, visualization and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement in terms of clustering accuracy over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.

  20. Optimal Frequency-Domain System Realization with Weighting

    NASA Technical Reports Server (NTRS)

    Juang, Jer-Nan; Maghami, Peiman G.

    1999-01-01

    Several approaches are presented to identify an experimental system model directly from frequency response data. The formulation uses a matrix-fraction description as the model structure. Frequency weighting such as exponential weighting is introduced to solve a weighted least-squares problem to obtain the coefficient matrices for the matrix-fraction description. A multi-variable state-space model can then be formed using the coefficient matrices of the matrix-fraction description. Three different approaches are introduced to fine-tune the model using nonlinear programming methods to minimize the desired cost function. The first method uses an eigenvalue assignment technique to reassign a subset of system poles to improve the identified model. The second method deals with the model in the real Schur or modal form, reassigns a subset of system poles, and adjusts the columns (rows) of the input (output) influence matrix using a nonlinear optimizer. The third method also optimizes a subset of poles, but the input and output influence matrices are refined at every optimization step through least-squares procedures.
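
    The weighted least-squares step can be sketched on a scalar toy problem: fit a one-pole rational model to noisy frequency-response samples under an exponential frequency weighting. The multivariable matrix-fraction machinery and the pole-reassignment refinements are omitted.

        import numpy as np

        rng = np.random.default_rng(13)
        w = np.linspace(0.1, 10.0, 200)                  # frequency grid (rad/s)
        H = 1.0 / (1j * w + 2.0) + 0.01 * rng.standard_normal(200)  # noisy FRF samples

        weight = np.exp(-0.2 * w)                        # exponential frequency weighting
        # Fit H ~ b0 / (jw + a0); linearize as  jw*H = -a0*H + b0.
        A = np.column_stack([-H, np.ones_like(H)])
        rhs = 1j * w * H
        W = np.diag(weight)
        sol, *_ = np.linalg.lstsq(W @ A, W @ rhs, rcond=None)
        a0, b0 = sol.real
        print(f"identified pole: {a0:.2f}, gain: {b0:.2f}")  # close to 2.00 and 1.00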

  1. Automatic Classification of Artifactual ICA-Components for Artifact Removal in EEG Signals

    PubMed Central

    2011-01-01

    Background Artifacts contained in EEG recordings hamper both the visual interpretation by experts and the algorithmic processing and analysis (e.g. for Brain-Computer Interfaces (BCI) or for Mental State Monitoring). While hand-optimized selection of source components derived from Independent Component Analysis (ICA) to clean EEG data is widespread, the field could greatly profit from automated solutions based on Machine Learning methods. Existing ICA-based removal strategies depend on explicit recordings of an individual's artifacts or have not been shown to reliably identify muscle artifacts. Methods We propose an automatic method for the classification of general artifactual source components. They are estimated by TDSEP, an ICA method that takes temporal correlations into account. The linear classifier is based on an optimized feature subset determined by a Linear Programming Machine (LPM). The subset is composed of features from the frequency, spatial and temporal domains. A subject-independent classifier was trained on 640 TDSEP components (reaction time (RT) study, n = 12) that were hand-labeled by experts as artifactual or brain sources and tested on 1080 new components of RT data from the same study. Generalization was tested on new data from two studies (auditory Event Related Potential (ERP) paradigm, n = 18; motor imagery BCI paradigm, n = 80) that used different channel setups and new subjects. Results Based on six features only, the optimized linear classifier performed on a par with the inter-expert disagreement (<10% Mean Squared Error (MSE)) on the RT data. On data of the auditory ERP study, the same pre-calculated classifier generalized well and achieved 15% MSE. On data of the motor imagery paradigm, we demonstrate that the discriminant information used for BCI is preserved when removing up to 60% of the most artifactual source components. Conclusions We propose a universal and efficient classifier of ICA components for the subject-independent removal of artifacts from EEG data. Based on linear methods, it is applicable to different electrode placements and supports the introspection of results. Trained on expert ratings of large data sets, it is not restricted to the detection of eye and muscle artifacts. Its performance and generalization ability are demonstrated on data from different EEG studies. PMID:21810266

  2. Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection

    NASA Astrophysics Data System (ADS)

    Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline

    2014-07-01

    Feature selection is a very important aspect of machine learning. It entails the search for an optimal subset from a very large data set with a high-dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of features also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced for the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using the discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strengths of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as the filter method. As for the wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNN model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
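
    The filter stage described above is easy to sketch: wavelet-decompose each EEG segment and reduce the coefficient energies with PCA. The wrapper stage (harmony search) and the wavelet neural network are omitted, and the signals are synthetic.

        import numpy as np
        import pywt
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(10)
        eeg = rng.standard_normal((100, 1024))           # 100 fake EEG segments

        def wavelet_energies(x):
            # Energy of each coefficient set from a 5-level db4 decomposition.
            return [float(np.sum(c ** 2)) for c in pywt.wavedec(x, "db4", level=5)]

        F = np.array([wavelet_energies(seg) for seg in eeg])
        F_reduced = PCA(n_components=3).fit_transform(F)
        print("reduced feature shape:", F_reduced.shape)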

  3. Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction

    PubMed Central

    Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio

    2014-01-01

    The study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied in each phase in order to seek the most relevant feature subset. The three-phase approach was selected to check the validity of the proposed approach. The achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm for automatic emotion recognition, with all the different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy search approach (FSS-Forward) has been applied, and a comparison between them is provided. Based on the achieved results, a set of the most relevant non-speaker-dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686

  4. Image Correlation Pattern Optimization for Micro-Scale In-Situ Strain Measurements

    NASA Technical Reports Server (NTRS)

    Bomarito, G. F.; Hochhalter, J. D.; Cannon, A. H.

    2016-01-01

    The accuracy and precision of digital image correlation (DIC) is a function of three primary ingredients: image acquisition, image analysis, and the subject of the image. Development of the first two (i.e., image acquisition techniques and image correlation algorithms) has led to widespread use of DIC; however, fewer developments have focused on the third ingredient. Typically, subjects of DIC images are mechanical specimens with either a natural surface pattern or a pattern applied to the surface. Research in the area of DIC patterns has primarily aimed at identifying which surface patterns are best suited for DIC by comparing patterns to each other. Because the easiest and most widespread methods of applying patterns have a high degree of randomness associated with them (e.g., airbrush, spray paint, particle decoration, etc.), less effort has been spent on the exact construction of ideal patterns. With the development of patterning techniques such as microstamping and lithography, patterns can be applied to a specimen pixel by pixel from a patterned image. In these cases, especially because the patterns are reused many times, an optimal pattern is sought such that the error introduced into DIC by the pattern is minimized. DIC consists of tracking the motion of an array of nodes from a reference image to a deformed image. Every pixel in the images has an associated intensity (grayscale) value, with discretization depending on the bit depth of the image. Because individual pixel matching by intensity value yields a non-unique, scale-dependent problem, subsets around each node are used for identification. A correlation criterion is used to find the best match of a particular subset of a reference image within a deformed image. The reader is referred to the references for enumerations of typical correlation criteria. As illustrated by Schreier and Sutton and by Lu and Cary, systematic errors can be introduced by representing the underlying deformation with under-matched shape functions. An important implication, as discussed by Sutton et al., is that in the presence of highly localized deformations (e.g., crack fronts), error can be reduced by minimizing the subset size. In other words, smaller subsets allow more accurate resolution of localized deformations. Conversely, the choice of optimal subset size has been widely studied, and the general consensus is that larger subsets with more information content are less prone to random error. Thus, an optimal subset size balances the systematic error from under-matched shape functions with the random error from measurement noise. The alternative approach pursued in the current work is to choose a small subset size and optimize the information content within it (i.e., to optimize an applied DIC pattern), rather than to find an optimal subset size. In the literature, many pattern quality metrics have been proposed, e.g., the sum of square of subset intensity gradients (SSSIG), mean subset fluctuation, gray-level co-occurrence, autocorrelation-based metrics, and speckle-based metrics. The majority of these metrics were developed to quantify the quality of common pseudo-random patterns after they have been applied and were not created with the intent of pattern generation. As such, it is found that none of the metrics examined in this study is fit to be the objective function of a pattern-generation optimization. In some cases, such as with speckle-based metrics, application to pixel-by-pixel patterns is ill-conditioned and requires somewhat arbitrary extensions.
In other cases, such as with the SSSIG, it is shown that trivial solutions exist for the optimum of the metric that are ill-suited for DIC (such as a checkerboard pattern). In the current work, a multi-metric optimization method is proposed whereby quality is viewed as a combination of individual quality metrics. Specifically, the SSSIG and two autocorrelation metrics, whose objectives generally compete, are used. Thus, each metric can be viewed as a constraint imposed upon the others, precluding the achievement of their trivial solutions. In this way, the optimization produces a pattern that balances the benefits of multiple quality metrics. The resulting pattern, along with randomly generated patterns, is subjected to numerical deformations and analyzed with DIC software. The optimal pattern is shown to outperform the randomly generated patterns.
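
    As a concrete illustration of the SSSIG metric discussed above, the following is a minimal numpy sketch of the metric as it is commonly defined in the DIC literature; the 21 x 21 subset size and the checkerboard/speckle comparison are illustrative assumptions, not the paper's data. It also exhibits the trivial optimum the abstract warns about: a checkerboard maximizes intensity gradients while being ill-suited for unique subset matching.

```python
import numpy as np

def sssig(subset: np.ndarray) -> float:
    """Sum of square of subset intensity gradients (SSSIG).

    Larger values mean more intensity contrast inside the subset,
    which generally lowers the random error of DIC matching.
    """
    gy, gx = np.gradient(subset.astype(float))
    return float(np.sum(gx ** 2 + gy ** 2))

# A checkerboard maximizes pixel-to-pixel gradients (the trivial SSSIG
# optimum the text warns about) yet is periodic, so it is useless for
# unique subset matching.
checker = (np.indices((21, 21)).sum(axis=0) % 2) * 255.0
speckle = np.random.default_rng(0).integers(0, 256, (21, 21)).astype(float)
print(sssig(checker), sssig(speckle))
```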

  5. An Efficient Method Coupling Kernel Principal Component Analysis with Adjoint-Based Optimal Control and Its Goal-Oriented Extensions

    NASA Astrophysics Data System (ADS)

    Thimmisetty, C.; Talbot, C.; Tong, C. H.; Chen, X.

    2016-12-01

    The representativeness of available data poses a significant fundamental challenge to the quantification of uncertainty in geophysical systems. Furthermore, the successful application of machine learning methods to geophysical problems involving data assimilation is inherently constrained by the extent to which obtainable data represent the problem considered. We show how the adjoint method, coupled with optimization based on methods of machine learning, can facilitate the minimization of an objective function defined on a space of significantly reduced dimension. By considering uncertain parameters as constituting a stochastic process, the Karhunen-Loeve expansion and its nonlinear extensions furnish an optimal basis with respect to which optimization using L-BFGS can be carried out. In particular, we demonstrate that kernel PCA can be coupled with adjoint-based optimal control methods to successfully determine the distribution of material parameter values for problems in the context of channelized deformable media governed by the equations of linear elasticity. Since certain subsets of the original data are characterized by different features, the convergence rate of the method in part depends on, and may be limited by, the observations used to furnish the kernel principal component basis. By determining appropriate weights for realizations of the stochastic random field, then, one may accelerate the convergence of the method. To this end, we present a formulation of weighted PCA combined with a gradient-based scheme that uses automatic differentiation to iteratively re-weight observations concurrently with the determination of an optimal reduced set of control variables in the feature space. We demonstrate how improvements in the accuracy and computational efficiency of the weighted linear method can be achieved over existing unweighted kernel methods, and discuss nonlinear extensions of the algorithm.
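
    The reduced-space optimization loop described above can be sketched compactly. The snippet below is a toy stand-in, not the authors' implementation: synthetic Gaussian realizations replace the channelized-media ensemble, scikit-learn's KernelPCA furnishes the reduced basis, and L-BFGS-B with finite-difference gradients stands in for the adjoint-computed gradient.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(1)

# Stand-in ensemble of parameter-field realizations (one per row); in
# the paper these would come from a stochastic channelized-media model.
realizations = rng.normal(size=(200, 50))
observed = realizations[0] + 0.01 * rng.normal(size=50)  # synthetic "data"

# Kernel PCA furnishes a low-dimensional basis; fit_inverse_transform
# lets us map reduced coordinates back to the full parameter space.
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1,
                 fit_inverse_transform=True).fit(realizations)

def misfit(z: np.ndarray) -> float:
    """Least-squares data misfit as a function of reduced coordinates."""
    field = kpca.inverse_transform(z.reshape(1, -1)).ravel()
    return 0.5 * float(np.sum((field - observed) ** 2))

# L-BFGS in the reduced space; finite differences replace the adjoint
# gradient used in the paper.
result = minimize(misfit, x0=np.zeros(5), method="L-BFGS-B")
print(result.fun, result.nfev)
```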

  6. Variable Selection for Road Segmentation in Aerial Images

    NASA Astrophysics Data System (ADS)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For the extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classifying superpixels into road and non-road, after which the classification results were refined using minimum cost paths and non-local optimization methods. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in photogrammetry and remote sensing were missing. This motivated us to implement a variable selection approach that builds a model for classification using portions of the training data and subsets of the features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; we tested the approach with Logistic Regression and Random Forests and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but extended them with additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even to reduce it, on a challenging dataset.
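
    A minimal sketch of the wrapper-style variable selection loop this abstract describes (build a model on a feature subset, evaluate it, update the subset, stop on a criterion), here realized as greedy forward selection with Logistic Regression on synthetic stand-in data; the stopping rule and classifier settings are assumptions rather than the authors' choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for superpixel features (road vs. non-road).
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=6, random_state=0)

selected: list[int] = []
remaining = list(range(X.shape[1]))
best_score = 0.0

while remaining:
    # Try adding each remaining variable; keep the best candidate.
    scores = [(cross_val_score(LogisticRegression(max_iter=1000),
                               X[:, selected + [j]], y, cv=3).mean(), j)
              for j in remaining]
    score, j = max(scores)
    if score <= best_score:
        break  # stopping criterion: no single addition improves CV accuracy
    best_score = score
    selected.append(j)
    remaining.remove(j)

print(sorted(selected), round(best_score, 3))
```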

  7. Co-arrays in the Next Fortran Standard

    DOE PAGES

    Reid, John; Numrich, Robert W.

    2007-01-01

    The WG5 committee, at its meeting in Delft, May 2005, decided to include co-arrays in the next Fortran Standard. A Fortran program containing co-arrays is interpreted as if it were replicated a fixed number of times and all copies were executed asynchronously. Each copy has its own set of data objects and is called an image. The array syntax of Fortran is extended with additional trailing subscripts in square brackets to give a clear and straightforward representation of access to data on other images. References without square brackets are to local data, so code that can run independently is uncluttered. Any occurrence of square brackets is a warning about communication between images. The additional syntax requires support in the compiler, but it has been designed to be easy to implement and to give the compiler scope both to apply its optimizations within each image and to optimize the communication between images. The extension includes execution control statements for synchronizing images and intrinsic procedures to return the number of images, to return the index of the current image, and to perform collective operations. The paper does not attempt to describe the full details of the feature as it now appears in the draft of the new standard. Instead, we describe a subset and demonstrate the use of this subset with examples.

  8. Aggressive behavior and poor prognosis of endometrial stromal sarcomas with YWHAE-FAM22 rearrangement indicate the clinical importance to recognize this subset.

    PubMed

    Kruse, Arnold-Jan; Croce, Sabrina; Kruitwagen, Roy F P M; Riedl, Robert G; Slangen, Brigitte F M; Van Gorp, Toon; Van de Vijver, Koen K

    2014-11-01

    Although the World Health Organization (WHO) stated in 2003 that endometrial stromal sarcomas (ESSs) in general have a good prognosis, considerable differences in clinical behavior and prognosis may exist between different patients with ESS. ESSs of the type associated with the YWHAE-NUTM2 (previously named YWHAE-FAM22) fusion have a more aggressive clinical behavior and poorer prognosis than conventional ESS. Recently, the WHO 2014 classification recognized this subset of ESS as a separate entity and classifies these tumors as high-grade ESSs. Recognition of this subset therefore has an important clinical impact. We performed a review of the literature to delineate the clinicopathologic features of ESS patients with a YWHAE-NUTM2 rearrangement, with the goal of recognizing this subset of ESS. We report a case of a woman with WHO 2014-defined high-grade ESS. Furthermore, the published English literature was reviewed for YWHAE-FAM22 ESS and uterus. Twenty patients were identified, with a median age of 50 (range, 28-67) years. There were no clinical features that allowed recognition of YWHAE-NUTM2 ESS; however, these tumors characteristically contain specific histopathological features. Furthermore, YWHAE-NUTM2 ESSs are strongly cyclin D1 positive, in contrast to conventional low-grade ESSs. YWHAE-NUTM2 ESSs represent a subset of ESSs with an aggressive clinical behavior and poor prognosis. Specific histopathological features may indicate the presence of a YWHAE-NUTM2 rearrangement, which can subsequently be confirmed by cyclin D1 immunostaining.

  9. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    PubMed

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution to the feature selection problem; however, typical swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and other cancers. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods. Copyright © 2010 Elsevier Ltd. All rights reserved.
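
    The abstract does not give the algorithm's details, but the flavor of binary-PSO feature selection with a bias toward small subsets can be sketched as follows; the sparse initialization echoes the subset-of-particles idea, and all constants (inertia, acceleration coefficients, swarm size) are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=5, random_state=0)
rng = np.random.default_rng(0)
n_particles, n_feat, n_iter = 10, X.shape[1], 10

def fitness(bits: np.ndarray) -> float:
    """Cross-validated accuracy of the features switched on in `bits`."""
    mask = bits.astype(bool)
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

# Sparse initialization: each particle starts with few features on.
pos = (rng.random((n_particles, n_feat)) < 0.1).astype(float)
vel = np.zeros((n_particles, n_feat))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, n_feat))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    # Sigmoid of velocity gives the probability of switching a bit on.
    pos = (rng.random((n_particles, n_feat)) < 1 / (1 + np.exp(-vel))).astype(float)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print(int(gbest.sum()), "features, CV accuracy", round(pbest_fit.max(), 3))
```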

  10. Algorithm For Solution Of Subset-Regression Problems

    NASA Technical Reports Server (NTRS)

    Verhaegen, Michel

    1991-01-01

    Reliable and flexible algorithm for solution of subset-regression problem performs QR decomposition with new column-pivoting strategy, enables selection of subset directly from originally defined regression parameters. This feature, in combination with number of extensions, makes algorithm very flexible for use in analysis of subset-regression problems in which parameters have physical meanings. Also extended to enable joint processing of columns contaminated by noise with those free of noise, without using scaling techniques.
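
    The core mechanism, QR decomposition with column pivoting used to pick a regressor subset directly in terms of the originally defined parameters, can be sketched with scipy's standard pivoting strategy; the record's new pivoting strategy and its noise-handling extensions are not reproduced here.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 8))
A[:, 5] = A[:, 0] + 1e-8 * rng.normal(size=100)  # near-duplicate regressor
y = A @ np.array([2.0, 0, 0, 1.0, 0, 0, 0, 0]) + 0.01 * rng.normal(size=100)

# Column pivoting reorders the regressors so that the leading columns
# form a well-conditioned basis; the permutation refers directly to the
# originally defined parameters, so the subset keeps its physical meaning.
Q, R, piv = qr(A, pivoting=True)
k = 4                                  # size of the retained subset
subset = piv[:k]
coef, *_ = np.linalg.lstsq(A[:, subset], y, rcond=None)
print("selected columns:", subset, "coefficients:", np.round(coef, 3))
```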

  11. Clinicopathologic features and management of blastoid variant of mantle cell lymphoma.

    PubMed

    Shrestha, Rajesh; Bhatt, Vijaya Raj; Guru Murthy, Guru Subramanian; Armitage, James O

    2015-01-01

    The blastoid variant of mantle cell lymphoma (MCL), which accounts for less than one-third of MCL, may arise de novo or as a transformation from the classical form of MCL. Blastoid variant, which predominantly involves men in their sixth decade, has frequent extranodal involvement (40-60%), stage IV disease (up to 85%) and central nervous system (CNS) involvement. Diagnosis relies on morphological features and is challenging. Immunophenotyping may display CD23 and CD10 positivity and CD5 negativity in a subset. Genetic analysis demonstrates an increased number of complex genetic alterations. Blastoid variant responds poorly to conventional chemotherapy and has a short duration of response. Although the optimal therapy remains to be established, CNS prophylaxis and the use of aggressive immunochemotherapy followed by autologous stem cell transplant may prolong the remission rate and survival. Further studies are crucial to expand our understanding of this disease entity and improve the clinical outcome.

  12. A novel clinical decision support system using improved adaptive genetic algorithm for the assessment of fetal well-being.

    PubMed

    Ravindran, Sindhu; Jambek, Asral Bahari; Muthusamy, Hariharan; Neoh, Siew-Chin

    2015-01-01

    A novel clinical decision support system is proposed in this paper for evaluating fetal well-being from the cardiotocogram (CTG) dataset through an Improved Adaptive Genetic Algorithm (IAGA) and Extreme Learning Machine (ELM). IAGA employs a new scaling technique (called sigma scaling) to avoid premature convergence and applies adaptive crossover and mutation techniques with masking concepts to enhance population diversity. This search algorithm also utilizes three different fitness functions (two single-objective fitness functions and one multi-objective fitness function) to assess its performance. The classification results show that a promising classification accuracy of 94% is obtained with an optimal feature subset using IAGA. The classification results are also compared with those of other feature reduction techniques to substantiate its thorough search toward the global optimum. In addition, five other benchmark datasets are used to gauge the strength of the proposed IAGA algorithm.
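
    Sigma scaling itself is easy to illustrate. The following is one common textbook form, not necessarily the exact formula used by IAGA; the constant c = 2 and the clipping at zero are assumptions.

```python
import numpy as np

def sigma_scale(fitness: np.ndarray, c: float = 2.0) -> np.ndarray:
    """Sigma scaling of raw GA fitness values (assumed textbook form)."""
    mu, sigma = fitness.mean(), fitness.std()
    if sigma == 0.0:
        return np.ones_like(fitness)  # uniform selection when all equal
    return np.clip(1.0 + (fitness - mu) / (c * sigma), 0.0, None)

raw = np.array([0.91, 0.93, 0.94, 0.99])  # e.g., CV accuracies of chromosomes
probs = sigma_scale(raw)
probs /= probs.sum()                      # roulette-wheel probabilities
print(np.round(probs, 3))
```

    Because the scaled fitness of an average chromosome stays near 1, an early lucky outlier no longer floods the mating pool, which is how sigma scaling counteracts premature convergence.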

  13. Novel Approaches to Improve Iris Recognition System Performance Based on Local Quality Evaluation and Feature Fusion

    PubMed Central

    2014-01-01

    For building a new iris template, this paper proposes a strategy to fuse different portions of the iris, based on a machine learning method that evaluates the local quality of the iris. There are three novelties compared to previous work. Firstly, the normalized segmented iris is divided into multiple tracks, and each track is then assessed individually to analyze the recognition accuracy rate (RAR). Secondly, six local quality evaluation parameters are adopted to analyze the texture information of each track. In addition, particle swarm optimization (PSO) is employed to obtain the weights of these evaluation parameters and the corresponding weighted coefficients of the different tracks. Finally, all tracks' information is fused according to the weights of the different tracks. The experimental results, based on subsets of three public and one private iris image databases, demonstrate three contributions of this paper. (1) Our experimental results show, in several respects, that a partial iris image cannot completely replace the entire iris image in an iris recognition system. (2) The proposed quality evaluation algorithm is self-adaptive and can automatically optimize its parameters according to the characteristics of the iris image samples themselves. (3) Our feature information fusion strategy can effectively improve the performance of an iris recognition system. PMID:24693243

  14. Novel approaches to improve iris recognition system performance based on local quality evaluation and feature fusion.

    PubMed

    Chen, Ying; Liu, Yuanning; Zhu, Xiaodong; Chen, Huiling; He, Fei; Pang, Yutong

    2014-01-01

    For building a new iris template, this paper proposes a strategy to fuse different portions of the iris, based on a machine learning method that evaluates the local quality of the iris. There are three novelties compared to previous work. Firstly, the normalized segmented iris is divided into multiple tracks, and each track is then assessed individually to analyze the recognition accuracy rate (RAR). Secondly, six local quality evaluation parameters are adopted to analyze the texture information of each track. In addition, particle swarm optimization (PSO) is employed to obtain the weights of these evaluation parameters and the corresponding weighted coefficients of the different tracks. Finally, all tracks' information is fused according to the weights of the different tracks. The experimental results, based on subsets of three public and one private iris image databases, demonstrate three contributions of this paper. (1) Our experimental results show, in several respects, that a partial iris image cannot completely replace the entire iris image in an iris recognition system. (2) The proposed quality evaluation algorithm is self-adaptive and can automatically optimize its parameters according to the characteristics of the iris image samples themselves. (3) Our feature information fusion strategy can effectively improve the performance of an iris recognition system.

  15. Discriminative illumination: per-pixel classification of raw materials based on optimal projections of spectral BRDF.

    PubMed

    Liu, Chao; Gu, Jinwei

    2014-01-01

    Classifying raw, unpainted materials--metal, plastic, ceramic, fabric, and so on--is an important yet challenging task for computer vision. Previous works measure subsets of surface spectral reflectance as features for classification. However, acquiring the full spectral reflectance is time consuming and error-prone. In this paper, we propose to use coded illumination to directly measure discriminative features for material classification. Optimal illumination patterns--which we call "discriminative illumination"--are learned from training samples such that, after projection onto them, the spectral reflectances of different materials are maximally separated. This projection is realized automatically by the integration of incident light during surface reflection. While a single discriminative illumination is capable of linear, two-class classification, we show that multiple discriminative illuminations can be used for nonlinear and multiclass classification. We also show theoretically that the proposed method has a higher signal-to-noise ratio than previous methods due to light multiplexing. Finally, we construct an LED-based multispectral dome and use the discriminative illumination method to classify a variety of raw materials, including metal (aluminum, alloy, steel, stainless steel, brass, and copper), plastic, ceramic, fabric, and wood. Experimental results demonstrate its effectiveness.
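
    The idea of learning an illumination pattern as a discriminative projection of spectral reflectance can be sketched with ordinary linear discriminant analysis standing in for the paper's learning procedure; the toy reflectance data, the LDA objective, and the split of the projection into two nonnegative patterns (light intensities cannot be negative) are illustrative assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Toy spectral reflectance: 30 spectral/angular bins per sample.
metal = rng.normal(0.6, 0.05, size=(50, 30))
plastic = rng.normal(0.4, 0.05, size=(50, 30))
X = np.vstack([metal, plastic])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
w = lda.coef_.ravel()  # the "illumination pattern" as a projection vector

# Light intensities cannot be negative, so realize w as the difference
# of two nonnegative illumination patterns (two captures per sample).
w_plus, w_minus = np.clip(w, 0, None), np.clip(-w, 0, None)
score = X @ w_plus - X @ w_minus  # optically: two measurements, subtracted
print("class means along projection:", score[:50].mean(), score[50:].mean())
```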

  16. Integrated approach using data mining-based decision tree and object-based image analysis for high-resolution urban mapping of WorldView-2 satellite sensor data

    NASA Astrophysics Data System (ADS)

    Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd

    2016-04-01

    This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles on DM-based OBIA have been published in recent years for different applications; however, less attention has been paid to generating a transferable model for characterizing detailed urban land-cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classification in the first study area. After this process, we validated the capability and transferability of the classification rules on the second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracy, respectively. The results indicate that DM is an efficient method for providing optimal and transferable classification rules for OBIA, which accelerates the rule-set creation stage in the OBIA classification domain.

  17. Hash Bit Selection for Nearest Neighbor Search.

    PubMed

    Xianglong Liu; Junfeng He; Shih-Fu Chang

    2017-11-01

    To overcome the barriers of storage and computation when dealing with gigantic-scale data sets, compact hashing has been studied extensively to approximate nearest neighbor search. Despite recent advances, critical design issues remain open regarding how to select the right features, hashing algorithms, and/or parameter settings. In this paper, we address these issues by posing an optimal hash bit selection problem, in which an optimal subset of hash bits is selected from a pool of candidate bits generated by different features, algorithms, or parameters. Inspired by the optimization criteria used in existing hashing algorithms, we adopt bit reliability and bit complementarity as the selection criteria, which can be carefully tailored to hashing performance in different tasks. The bit selection solution is then discovered by finding the best tradeoff between search accuracy and time using a modified dynamic programming method. To further reduce the computational complexity, we employ the pairwise relationships among hash bits to approximate the high-order independence property and formulate the problem as an efficient quadratic programming method that is theoretically equivalent to the normalized dominant set problem in a vertex- and edge-weighted graph. Extensive large-scale experiments have been conducted under several important application scenarios of hashing techniques, where our bit selection framework achieves superior performance over both naive selection methods and state-of-the-art hashing algorithms, with significant relative accuracy gains ranging from 10% to 50%.

  18. Artificially intelligent recognition of Arabic speaker using voice print-based local features

    NASA Astrophysics Data System (ADS)

    Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz

    2016-11-01

    Local features for any pattern recognition system are based on information extracted locally. In this paper, a local feature extraction technique was developed. The feature was extracted in the time-frequency plane by taking the moving average along the diagonal directions of the time-frequency plane. This feature captures time-frequency events, producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we refer to this technique as a voice print-based local feature. The proposed feature was compared to other features, including the mel-frequency cepstral coefficient (MFCC), for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database consisting of two short sentences uttered by 182 speakers. The proposed feature attained a 98.35% recognition rate, compared to 96.7% for MFCC, on the LDC subset.
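
    A sketch of the diagonal moving-average idea on a spectrogram follows; the sampling rate, window length, averaging length, and the use of only one diagonal direction are assumptions, since the record does not specify them.

```python
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(0)
signal = rng.normal(size=16000)  # stand-in for one second of speech at 16 kHz
_, _, S = spectrogram(signal, fs=16000, nperseg=256)
log_s = np.log(S + 1e-10)

def diagonal_moving_average(tf: np.ndarray, length: int = 5) -> np.ndarray:
    """Average along the main diagonal direction of the T-F plane.

    Each output cell summarizes a short diagonal trajectory, capturing
    joint time-frequency events rather than purely spectral content.
    """
    n_f, n_t = tf.shape
    out = np.zeros((n_f - length + 1, n_t - length + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.mean([tf[i + k, j + k] for k in range(length)])
    return out

features = diagonal_moving_average(log_s)
print(features.shape)
```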

  19. Optimization of Regression Models of Experimental Data Using Confirmation Points

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2010-01-01

    A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that has already been tested at regression-model-independent confirmation points before it is ever used to predict an unknown response from a set of regressors.
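
    The search metric itself is simple to compute. Below is a sketch under stated assumptions (sample standard deviations with ddof = 1, PRESS residuals obtained from the hat-matrix shortcut e_i / (1 - h_ii)); the quadratic term combination shown is illustrative, not the balance-calibration model.

```python
import numpy as np

def search_metric(X_fit, y_fit, X_conf, y_conf):
    """Greater of (i) the standard deviation of the PRESS residuals at
    the fit points and (ii) the standard deviation of the response
    residuals at the confirmation points."""
    beta, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    H = X_fit @ np.linalg.pinv(X_fit.T @ X_fit) @ X_fit.T
    press = (y_fit - X_fit @ beta) / (1.0 - np.diag(H))
    conf_resid = y_conf - X_conf @ beta
    return max(press.std(ddof=1), conf_resid.std(ddof=1))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=40)
y = 1.0 + 2.0 * x + 0.5 * x ** 2 + 0.05 * rng.normal(size=40)
X = np.column_stack([np.ones_like(x), x, x ** 2])  # one candidate term set
fit, conf = slice(0, 30), slice(30, 40)
print(search_metric(X[fit], y[fit], X[conf], y[conf]))
```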

  20. A non-linear data mining parameter selection algorithm for continuous variables

    PubMed Central

    Razavi, Marianne; Brady, Sean

    2017-01-01

    In this article, we propose a new data mining algorithm that can both capture the non-linearity in data and find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that captures complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method we present here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. The algorithm introduces interpretable parameters by transforming the original inputs and also provides a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least squares regression framework. This new automatic variable transformation and model selection method can offer an optimal and stable model that minimizes the mean square error and variability, while combining all-possible-subsets selection methodology with the inclusion of variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829

  1. Optimizing the design of vertical seismic profiling (VSP) for imaging fracture zones over hardrock basement geothermal environments

    NASA Astrophysics Data System (ADS)

    Reiser, Fabienne; Schmelzbach, Cedric; Maurer, Hansruedi; Greenhalgh, Stewart; Hellwig, Olaf

    2017-04-01

    A primary focus of geothermal seismic imaging is to map dipping faults and fracture zones that control rock permeability and fluid flow. Vertical seismic profiling (VSP) is therefore a most valuable means to image the immediate surroundings of an existing borehole to guide, for example, the placing of new boreholes to optimize production from known faults and fractures. We simulated 2D and 3D acoustic synthetic seismic data and processed them through to pre-stack depth migration to optimize VSP survey layouts for mapping moderately to steeply dipping fracture zones within possible basement geothermal reservoirs. Our VSP survey optimization procedure, which sequentially selects source locations to define the area where source points are best located for optimal imaging, makes use of a cross-correlation statistic by which a subset of migrated shot gathers is compared with a target or reference image derived from a comprehensive set of source gathers. In geothermal exploration at established sites, it is reasonable to assume that sufficient a priori information is available to construct such a target image. We generally obtained good results with a relatively small number of optimally chosen source positions distributed over an ideal source location area for different fracture zone scenarios (different dips, azimuths, and distances from the surveying borehole). Adding further sources outside the optimal source area did not necessarily improve the results, but rather introduced image distortions. It was found that fracture zones located at borehole-receiver depths and laterally offset from the borehole by 300 m can be imaged reliably for a range of dips, but more source positions and large offsets between sources and the borehole are required for imaging steeply dipping interfaces. When such features cross-cut the borehole, they are particularly difficult to image. For fracture zones with different azimuths, 3D effects are observed: far-offset source positions contribute less to the image quality as the fracture zone azimuth increases. Our optimization methodology is best suited for designing future field surveys with a favorable benefit-cost ratio in areas with significant a priori knowledge. Moreover, our optimization workflow is valuable for selecting useful subsets of acquired data for optimal target-oriented processing.
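
    The selection loop, comparing images obtained from candidate source subsets against an a priori target image via a cross-correlation statistic, can be caricatured in a few lines. Everything below is a stand-in: random arrays replace migrated shot gathers, simple stacking replaces migration of a subset, and greedy selection replaces the paper's sequential procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_candidates, ny, nx = 40, 64, 64

# Stand-ins: one "migrated image" per candidate source position, plus a
# target image built from a priori knowledge of the fracture zone.
target = rng.normal(size=(ny, nx))
images = (0.05 * rng.normal(size=(n_candidates, ny, nx))
          + 0.1 * target * rng.random((n_candidates, 1, 1)))

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-normalized cross-correlation between two images."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

selected, stack = [], np.zeros((ny, nx))
for _ in range(6):  # grow a small optimal source set
    gains = [(ncc(stack + images[i], target), i)
             for i in range(n_candidates) if i not in selected]
    best, i = max(gains)
    if selected and best <= ncc(stack, target):
        break  # extra sources would only distort the image
    selected.append(i)
    stack += images[i]

print(selected, round(ncc(stack, target), 3))
```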

  2. The role of attention in illusory conjunctions.

    PubMed

    Tsal, Y; Meiran, N; Lavie, N

    1994-03-01

    In five experiments, we investigated the effects of attention on illusory conjunctions formed between features of unrelated objects. The first three experiments used a weak manipulation of attention and found that illusory conjunctions formed either among features receiving high attentional priority or among features receiving low attentional priority were not more frequent than were conjunctions formed between mixed features of different attentional priority. The last two experiments used a strong manipulation of attention and failed to reveal any evidence of true illusory conjunctions. The results are inconsistent with the feature-integration theory, which predicts that when attention is focused on a subset of items, illusory conjunctions ought to occur within and outside of the attended subset, but not between the attended and unattended items.

  3. Schizophrenia classification using functional network features

    NASA Astrophysics Data System (ADS)

    Rish, Irina; Cecchi, Guillermo A.; Heuton, Kyle

    2012-03-01

    This paper focuses on discovering statistical biomarkers (features) that are predictive of schizophrenia, with a particular focus on topological properties of fMRI functional networks. We consider several network properties, such as node (voxel) strength, clustering coefficients, and local efficiency, as well as a simple subset of pairwise correlations. While all types of features demonstrate highly significant statistical differences in several brain areas and close to 80% classification accuracy, the most remarkable result of 93% accuracy is achieved by using a small subset of only a dozen of the most informative (lowest p-value) correlation features. Our results suggest that voxel-level correlations and the functional network features derived from them are highly informative about schizophrenia and can be used as statistical biomarkers for the disease.

  4. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    NASA Astrophysics Data System (ADS)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant, and noisy features in the process model results in poor predictive accuracy. Much research in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection is especially valuable in medical data mining because a diagnosis can then be made in this patient-care activity with a minimal number of significant features. The objective of this work is to show that selecting the more significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on the reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.
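
    For reference, the classical F-score used for feature ranking can be computed as follows; this is the plain variant, whereas the entry's kernel F-score evaluates the same statistic in a kernel-induced space, and the breast-cancer dataset here is only a stand-in for the diabetes data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)  # stand-in binary medical data

def f_score(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Classical F-score of each feature for a binary label: between-class
    separation of the feature means over within-class variance."""
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return num / den

scores = f_score(X, y)
top = np.argsort(scores)[::-1][:8]  # keep the most discriminative features
print(top, np.round(scores[top], 2))
```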

  5. Reliability-based design optimization using a generalized subset simulation method and posterior approximation

    NASA Astrophysics Data System (ADS)

    Ma, Yuan-Zhuo; Li, Hong-Shuang; Yao, Wei-Xing

    2018-05-01

    The evaluation of the probabilistic constraints in reliability-based design optimization (RBDO) problems has always been significant and challenging work, which strongly affects the performance of RBDO methods. This article deals with RBDO problems using a recently developed generalized subset simulation (GSS) method and a posterior approximation approach. The posterior approximation approach is used to transform all the probabilistic constraints into ordinary constraints as in deterministic optimization. The assessment of multiple failure probabilities required by the posterior approximation approach is achieved by GSS in a single run at all supporting points, which are selected by a proper experimental design scheme combining Sobol' sequences and Bucher's design. Sequentially, the transformed deterministic design optimization problem can be solved by optimization algorithms, for example, the sequential quadratic programming method. Three optimization problems are used to demonstrate the efficiency and accuracy of the proposed method.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou Juefei; Szafruga, Urszula B.; Kuzyk, Mark G.

    We use numerical optimization to study the properties of (1) the class of one-dimensional potential energy functions and (2) systems of point nuclei in two dimensions that yield the largest intrinsic hyperpolarizabilities, which we find to be within 30% of the fundamental limit. In all cases, we use a one-electron model. It is found that a broad range of optimized potentials, each of very different character, yield the same intrinsic hyperpolarizability ceiling of 0.709. Furthermore, all optimized potential energy functions share common features such as (1) the value of the normalized transition dipole moment to the dominant state, which forces the hyperpolarizability to be dominated by only two excited states, and (2) the energy ratio between the two dominant states. All optimized potentials are found to obey the three-level ansatz to within about 1%. Many of these potential energy functions may be implementable in multiple quantum well structures. The subset of potentials with undulations reaffirms that modulation of conjugation may be one approach to making better organic molecules, though there appear to be many others. Additionally, our results suggest that one-dimensional molecules may have a larger diagonal intrinsic hyperpolarizability β_xxx^int than higher-dimensional systems.

  7. Merits and limitations of optimality criteria method for structural optimization

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Guptill, James D.; Berke, Laszlo

    1993-01-01

    The merits and limitations of the optimality criteria (OC) method for the minimum weight design of structures subjected to multiple load conditions under stress, displacement, and frequency constraints were investigated by examining several numerical examples. The examples were solved utilizing the Optimality Criteria Design Code that was developed for this purpose at NASA Lewis Research Center. This OC code incorporates OC methods available in the literature with generalizations for stress constraints, fully utilized design concepts, and hybrid methods that combine both techniques. Salient features of the code include multiple choices for Lagrange multiplier and design variable update methods, design strategies for several constraint types, variable linking, displacement and integrated force method analyzers, and analytical and numerical sensitivities. The performance of the OC method, on the basis of the examples solved, was found to be satisfactory for problems with few active constraints or with small numbers of design variables. For problems with large numbers of behavior constraints and design variables, the OC method appears to follow a subset of active constraints that can result in a heavier design. The computational efficiency of OC methods appears to be similar to some mathematical programming techniques.

  8. Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles.

    PubMed

    Piao, Yongjun; Piao, Minghao; Ryu, Keun Ho

    2017-01-01

    Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs, known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNAs can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets used in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/. Copyright © 2016 Elsevier Ltd. All rights reserved.
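
    The combination rule described above (base classifiers trained on different feature subsets, fused by averaging posterior probabilities) can be sketched as follows; random subsets and logistic regression stand in for the paper's relevance/redundancy-driven subsets and its base learners.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for miRNA expression profiles over several cancer classes.
X, y = make_classification(n_samples=300, n_features=60, n_informative=12,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
subsets = [rng.choice(X.shape[1], size=15, replace=False) for _ in range(7)]

# One base classifier per feature subset; fuse by averaging posteriors.
models = [LogisticRegression(max_iter=1000).fit(Xtr[:, s], ytr)
          for s in subsets]
proba = np.mean([m.predict_proba(Xte[:, s])
                 for m, s in zip(models, subsets)], axis=0)
print("accuracy:", (proba.argmax(axis=1) == yte).mean())
```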

  9. An ensemble method for extracting adverse drug events from social media.

    PubMed

    Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi

    2016-06-01

    Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address the high-dimensional features. We then evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristic curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance ADE extraction capability. Kernel-based approaches, which avoid the feature sparsity issue, are well suited to the ADE extraction problem. Combining individual classifiers using suitable combination methods can further enhance ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Automatic plankton image classification combining multiple view features via multiple kernel learning.

    PubMed

    Zheng, Haiyong; Wang, Ruchen; Yu, Zhibin; Wang, Nan; Gu, Zhaorui; Zheng, Bing

    2017-12-28

    Plankton, including phytoplankton and zooplankton, are the main source of food for organisms in the ocean and form the base of the marine food chain. As fundamental components of marine ecosystems, plankton are very sensitive to environmental changes, and the study of plankton abundance and distribution is crucial for understanding environmental changes and protecting marine ecosystems. This study was carried out to develop a broadly applicable plankton classification system with high accuracy for the increasing number of various imaging devices. The literature shows that most plankton image classification systems have been limited to a single specific imaging device and a relatively narrow taxonomic scope; a truly practical system for automatic plankton classification does not yet exist, and this study partly fills that gap. Guided by the analysis of the literature and the development of technology, we focused on the requirements of practical application and propose an automatic system for plankton image classification combining multiple view features via multiple kernel learning (MKL). First, to describe the biomorphic characteristics of plankton more completely and comprehensively, we combined general features with robust features, in particular adding features such as the Inner-Distance Shape Context for morphological representation. Second, we divided all the features into different types from multiple views and fed them to multiple classifiers instead of only one, by optimally combining the different kernel matrices computed from the different types of features via multiple kernel learning. Moreover, we applied a feature selection method to choose optimal feature subsets from the redundant features to suit different datasets from different imaging devices. We implemented our proposed classification system on three different datasets covering more than 20 categories from phytoplankton to zooplankton. The experimental results validate that our system outperforms state-of-the-art plankton image classification systems in terms of accuracy and robustness. This study demonstrates an automatic plankton image classification system that combines multiple view features using multiple kernel learning. The results indicate that multiple view features combined by NLMKL using three kernel functions (linear, polynomial, and Gaussian) can better describe and exploit the feature information, thereby achieving higher classification accuracy.
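
    The kernel-combination step at the heart of MKL can be illustrated with precomputed kernel matrices; here the three kernel families mentioned in the record (linear, polynomial, Gaussian/RBF) are mixed with fixed weights, whereas a true MKL solver would learn the weights from data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=40, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

kernels = [linear_kernel, polynomial_kernel, rbf_kernel]
weights = np.array([0.2, 0.3, 0.5])  # fixed here; MKL would learn these

# A weighted sum of valid kernels is itself a valid kernel, so the
# combined matrix can be fed to an SVM as a precomputed kernel.
K_train = sum(w * k(Xtr, Xtr) for w, k in zip(weights, kernels))
K_test = sum(w * k(Xte, Xtr) for w, k in zip(weights, kernels))

clf = SVC(kernel="precomputed").fit(K_train, ytr)
print("accuracy:", clf.score(K_test, yte))
```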

  11. On the complexity and approximability of some Euclidean optimal summing problems

    NASA Astrophysics Data System (ADS)

    Eremeev, A. V.; Kel'manov, A. V.; Pyatkin, A. V.

    2016-10-01

    The complexity status of several well-known discrete optimization problems with the direction of optimization switching from maximum to minimum is analyzed. The task is to find a subset of a finite set of Euclidean points (vectors). In these problems, the objective functions depend either only on the norm of the sum of the elements from the subset or on this norm and the cardinality of the subset. It is proved that, if the dimension of the space is a part of the input, then all these problems are strongly NP-hard. Additionally, it is shown that, if the space dimension is fixed, then all the problems are NP-hard even for dimension 2 (on a plane) and there are no approximation algorithms with a guaranteed accuracy bound for them unless P = NP. It is shown that, if the coordinates of the input points are integer, then all the problems can be solved in pseudopolynomial time in the case of a fixed space dimension.

  12. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a multiclass classification problem. One approach to solving it is to first transform it into several binary problems. Microarray gene expression datasets have the two main characteristics of medical data: extremely many features (genes) and only a small number of samples. The application of machine learning to a microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, improved to handle multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features in a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets, and SVM-RFE is applied to each subset. Then, the features selected on each subset are given to separate classifiers. This enhances the feature selection ability of each single SVM-RFE. The twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity: whereas an ordinary SVM finds a single optimal hyperplane, the main objective of the twin SVM is to find two non-parallel optimal hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method classifies 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method classifies the data of the normal and MD classes with 100% accuracy.
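
    The single-RFE core of the method can be reproduced directly with scikit-learn; the record's multiple multiclass variant additionally partitions the features into subsets, runs RFE on each, and combines the resulting classifiers, which is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# Stand-in for microarray data: many genes, few samples.
X, y = make_classification(n_samples=60, n_features=1000,
                           n_informative=20, random_state=0)

# Plain SVM-RFE: repeatedly drop the genes with the smallest |weight|.
rfe = RFE(LinearSVC(max_iter=5000), n_features_to_select=100, step=0.1)
rfe.fit(X, y)
selected = rfe.get_support(indices=True)
print(len(selected), selected[:10])
```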

  13. New TES Search and Subset Application

    Atmospheric Science Data Center

    2017-08-23

    ... Wednesday, September 19, 2012 The Atmospheric Science Data Center (ASDC) at NASA Langley Research Center in collaboration ... pleased to announce the release of the TES Search and Subset Web Application for select TES Level 2 products. Features of the Search and ...

  14. SERDP and ESTCP Expert Panel Workshop on Research and Development Needs for the Environmental Remediation Application of Molecular Biological Tools

    DTIC Science & Technology

    2005-10-01

    used to infer metabolic rates in marine systems. For example, there is evidence from both pure cultures and environmental samples that rbcL...It includes many useful bioinformatics features such as constructing a neighbor-joining tree for a subset of sequences, downloading a subset of...further provide software that allow users to extract useful information from sequences. The most commonly used feature is probe/primer design

  15. In Silico Syndrome Prediction for Coronary Artery Disease in Traditional Chinese Medicine

    PubMed Central

    Lu, Peng; Chen, Jianxin; Zhao, Huihui; Gao, Yibo; Luo, Liangtao; Zuo, Xiaohan; Shi, Qi; Yang, Yiping; Yi, Jianqiang; Wang, Wei

    2012-01-01

    Coronary artery disease (CAD) is among the leading causes of death in the world. The differentiation of syndrome (ZHENG) is the criterion for diagnosis and therapy in TCM; therefore, syndrome prediction in silico can improve the performance of treatment. In this paper, we present a Bayesian network framework to construct a high-confidence syndrome predictor based on an optimum symptom subset selected by Support Vector Machine (SVM) feature selection. Syndromes of CAD can be divided into asthenia and sthenia syndromes. In view of the hierarchical characteristics of syndromes, and because several syndromes can co-occur in one patient, we first label every case with one of three syndrome types (asthenia, sthenia, or both). On the basis of these three syndrome classes, we design an SVM feature selection procedure to obtain the optimum symptom subset and compare this subset with Markov blanket feature selection using ROC analysis. Using this subset, six predictors of CAD syndromes are constructed with the Bayesian network technique. We also evaluate Naïve Bayes, C4.5, Logistic, and radial basis function (RBF) network classifiers for comparison with the Bayesian network. In conclusion, the Bayesian network method based on the optimum symptom subset provides a practical way to predict the six syndromes of CAD in TCM. PMID:22567030

  16. GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

    PubMed

    Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann

    2012-09-24

    Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding similar or better classification results to what has been reported for these data sets with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More important, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin

    Background: Some current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α– & β–diversity. Feature subset selection – a sub-field of machine learning – can also provide unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can identify the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between the protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command line tool, compatible with the widely adopted BIOM format, that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

  18. Optimization of camera exposure durations for multi-exposure speckle imaging of the microcirculation

    PubMed Central

    Kazmi, S. M. Shams; Balial, Satyajit; Dunn, Andrew K.

    2014-01-01

    Improved Laser Speckle Contrast Imaging (LSCI) blood flow analyses that incorporate inverse models of the underlying laser-tissue interaction have been used to develop more quantitative implementations of speckle flowmetry such as Multi-Exposure Speckle Imaging (MESI). In this paper, we determine the optimal camera exposure durations required for obtaining flow information with accuracy comparable to that of the prevailing MESI implementation utilized in recent in vivo rodent studies. A looping leave-one-out (LOO) algorithm was used to identify exposure subsets, which were analyzed for accuracy against flows obtained from analysis with the original full exposure set over 9 animals comprising n = 314 regional flow measurements. From the 15 original exposures, the LOO process identified 6 exposures that provide comparable accuracy, defined as deviating by no more than 10% from the original flow measurements. The optimal subset of exposures provides a basis set of camera durations for speckle flowmetry studies of the microcirculation and confers a two-fold faster acquisition rate and a 28% reduction in processing time without sacrificing accuracy. Additionally, the optimization process can be used to identify further reductions in the exposure subsets, tailoring imaging to less expansive flow distributions to enable even faster imaging. PMID:25071956

  19. Optimizing an Actuator Array for the Control of Multi-Frequency Noise in Aircraft Interiors

    NASA Technical Reports Server (NTRS)

    Palumbo, D. L.; Padula, S. L.

    1997-01-01

    Techniques developed for selecting an optimized actuator array for interior noise reduction at a single frequency are extended to the multi-frequency case. Transfer functions for 64 actuators were obtained at 5 frequencies from ground testing the rear section of a fully trimmed DC-9 fuselage. A single loudspeaker facing the left side of the aircraft was the primary source. A combinatorial search procedure (tabu search) was employed to find optimum actuator subsets of from 2 to 16 actuators. Noise reduction predictions derived from the transfer functions were used as a basis for evaluating actuator subsets during optimization. Results indicate that it is necessary to constrain actuator forces during optimization. Unconstrained optimizations selected actuators which require unrealistically large forces. Two methods of constraint are evaluated. It is shown that a fast, but approximate, method yields results equivalent to an accurate, but computationally expensive, method.

  20. HIV-1 protease cleavage site prediction based on two-stage feature selection method.

    PubMed

    Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong

    2013-03-01

    Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. An accurate, robust, and rapid method to correctly predict cleavage sites in proteins is crucial in the search for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a genetic algorithm. Thirty important biochemical features were found, based on a jackknife test, from the original data set containing 4,248 features. Using the AdaBoost method with the thirty selected features, the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, an increase in accuracy over the original feature set of 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.

  1. Fold assessment for comparative protein structure modeling.

    PubMed

    Melo, Francisco; Sali, Andrej

    2007-11-01

    Accurate and automated assessment of both geometrical errors and incompleteness of comparative protein structure models is necessary for an adequate use of the models. Here, we describe a composite score for discriminating between models with the correct and incorrect fold. To find an accurate composite score, we designed and applied a genetic algorithm method that searched for a most informative subset of 21 input model features as well as their optimized nonlinear transformation into the composite score. The 21 input features included various statistical potential scores, stereochemistry quality descriptors, sequence alignment scores, geometrical descriptors, and measures of protein packing. The optimized composite score was found to depend on (1) a statistical potential z-score for residue accessibilities and distances, (2) model compactness, and (3) percentage sequence identity of the alignment used to build the model. The accuracy of the composite score was compared with the accuracy of assessment by single and combined features as well as by other commonly used assessment methods. The testing set was representative of models produced by automated comparative modeling on a genomic scale. The composite score performed better than any other tested score in terms of the maximum correct classification rate (i.e., 3.3% false positives and 2.5% false negatives) as well as the sensitivity and specificity across the whole range of thresholds. The composite score was implemented in our program MODELLER-8 and was used to assess models in the MODBASE database that contains comparative models for domains in approximately 1.3 million protein sequences.

  2. TMS combined with EEG in genetic generalized epilepsy: A phase II diagnostic accuracy study.

    PubMed

    Kimiskidis, Vasilios K; Tsimpiris, Alkiviadis; Ryvlin, Philippe; Kalviainen, Reetta; Koutroumanidis, Michalis; Valentin, Antonio; Laskaris, Nikolaos; Kugiumtzis, Dimitris

    2017-02-01

    (A) To develop a TMS-EEG stimulation and data analysis protocol in genetic generalized epilepsy (GGE). (B) To investigate the diagnostic accuracy of TMS-EEG in GGE. Pilot experiments resulted in the development and optimization of a paired-pulse TMS-EEG protocol at rest, during hyperventilation (HV), and post-HV, combined with multi-level data analysis. This protocol was applied in 11 controls (C) and 25 GGE patients (P), further dichotomized into responders to antiepileptic drugs (R, n=13) and non-responders (n-R, n=12). Features (n=57) extracted from TMS-EEG responses after multi-level analysis were given to a feature selection scheme and a Bayesian classifier, and the accuracy of assigning participants to the classes P-C and R-nR was computed. On the basis of the optimal feature subset, the cross-validated accuracy of TMS-EEG for the classification P-C was 0.86 at rest, 0.81 during HV, and 0.92 post-HV, whereas for R-nR the corresponding figures were 0.80, 0.78, and 0.65, respectively. Applying a fusion approach over all conditions resulted in an accuracy of 0.84 for the classification P-C and 0.76 for the classification R-nR. TMS-EEG can be used for diagnostic purposes and for assessing the response to antiepileptic drugs. TMS-EEG holds significant diagnostic potential in GGE. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.

  3. Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

    PubMed

    Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

    2014-09-09

    A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.

  4. Rough sets and Laplacian score based cost-sensitive feature selection

    PubMed Central

    Yu, Shenglong

    2018-01-01

    Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one; as a result, they do not consider the relationships among features. In this paper, we propose a new algorithm for minimal-cost feature selection, called rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and the Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes the relationships among features into consideration through the locality preservation of the Laplacian score. We select a feature subset with maximal feature importance and minimal cost, where the cost is drawn from three different distributions to simulate different applications. Unlike existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects a predetermined number of “good” features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum-cost subset. In addition, the results of our method are more promising than those of other cost-sensitive feature selection algorithms. PMID:29912884

  5. Rough sets and Laplacian score based cost-sensitive feature selection.

    PubMed

    Yu, Shenglong; Zhao, Hong

    2018-01-01

    Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one; as a result, they do not consider the relationships among features. In this paper, we propose a new algorithm for minimal-cost feature selection, called rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and the Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes the relationships among features into consideration through the locality preservation of the Laplacian score. We select a feature subset with maximal feature importance and minimal cost, where the cost is drawn from three different distributions to simulate different applications. Unlike existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum-cost subset. In addition, the results of our method are more promising than those of other cost-sensitive feature selection algorithms.

  6. Pharmacogenetic research in partnership with American Indian and Alaska Native communities

    PubMed Central

    Woodahl, Erica L; Lesko, Lawrence J; Hopkins, Scarlett; Robinson, Renee F; Thummel, Kenneth E; Burke, Wylie

    2014-01-01

    Pharmacogenetics is a subset of personalized medicine that applies knowledge about genetic variation in gene–drug pairs to help guide optimal dosing. There is a lack of data, however, about pharmacogenetic variation in underserved populations. One strategy for increasing participation of underserved populations in pharmacogenetic research is to include communities in the research process. We have established academic–community partnerships with American Indian and Alaska Native people living in Alaska and Montana to study pharmacogenetics. Key features of the partnership include community oversight of the project, research objectives that address community health priorities, and bidirectional learning that builds capacity in both the community and the research team. Engaging the community as coresearchers can help build trust to advance pharmacogenetic research objectives. PMID:25141898

  7. Activity recognition using dynamic multiple sensor fusion in body sensor networks.

    PubMed

    Gao, Lei; Bourke, Alan K; Nelson, John

    2012-01-01

    Multiple sensor fusion is a main research direction for activity recognition. However, such systems face two challenges: the energy consumption due to wireless transmission and the classifier design required by a dynamic feature vector. This paper proposes a multi-sensor fusion framework which consists of a sensor selection module and a hierarchical classifier. The sensor selection module adopts convex optimization to select the sensor subset in real time. The hierarchical classifier combines a Decision Tree classifier with a Naïve Bayes classifier. A dataset collected from 8 subjects, who performed 8 scenario activities, was used to evaluate the proposed system. The results show that the proposed system markedly reduces energy consumption while guaranteeing recognition accuracy.
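
    A two-stage classifier of the kind described, with a decision tree routing each sample to a coarse activity group and a per-group Naive Bayes model making the final decision, can be sketched as follows. The group map and tree depth are illustrative assumptions, not the authors' actual hierarchy.

        import numpy as np
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        class HierarchicalClassifier:
            def __init__(self, group_of):
                self.group_of = group_of          # dict: activity label -> group id

            def fit(self, X, y):
                groups = np.array([self.group_of[label] for label in y])
                self.tree = DecisionTreeClassifier(max_depth=5).fit(X, groups)
                self.nb = {g: GaussianNB().fit(X[groups == g], np.asarray(y)[groups == g])
                           for g in np.unique(groups)}
                return self

            def predict(self, X):
                g = self.tree.predict(X)          # stage 1: coarse group
                out = np.empty(len(X), dtype=object)
                for grp in np.unique(g):
                    out[g == grp] = self.nb[grp].predict(X[g == grp])   # stage 2
                return out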

  8. Multiview Locally Linear Embedding for Effective Medical Image Retrieval

    PubMed Central

    Shen, Hualei; Tao, Dacheng; Ma, Dianfu

    2013-01-01

    Content-based medical image retrieval continues to gain attention for its potential to assist radiological image interpretation and decision making. Many approaches have been proposed to improve the performance of medical image retrieval systems, among which visual features such as SIFT, LBP, and the intensity histogram play a critical role. Typically, these features are concatenated into a long vector to represent medical images, and thus traditional dimension reduction techniques such as locally linear embedding (LLE), principal component analysis (PCA), or Laplacian eigenmaps (LE) can be employed to reduce the “curse of dimensionality”. Though these approaches show promising performance for medical image retrieval, the feature-concatenating method ignores the fact that different features have distinct physical meanings. In this paper, we propose a new method called multiview locally linear embedding (MLLE) for medical image retrieval. Following the patch alignment framework, MLLE preserves the geometric structure of the local patch in each feature space according to the LLE criterion. To explore complementary properties among a range of features, MLLE assigns different weights to local patches from different feature spaces. Finally, MLLE employs global coordinate alignment and alternating optimization techniques to learn a smooth low-dimensional embedding from the different features. To justify the effectiveness of MLLE for medical image retrieval, we compare it with conventional spectral embedding methods. We conduct experiments on a subset of the IRMA medical image data set. Evaluation results show that MLLE outperforms state-of-the-art dimension reduction methods. PMID:24349277

  9. The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.

    PubMed

    Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo

    2018-05-17

    The gene expression profile has the characteristics of high dimension, small sample size, and continuous values, and it is a great challenge to use gene expression profile data for the classification of tumor samples. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. First, multiple filters are applied to the microarray data in order to obtain several pre-selected feature subsets with different classification abilities. The top N genes with the highest rank in each subset are integrated to form a new data set. Second, the cross-entropy algorithm is used to remove the redundant data in the data set. Finally, a wrapper method based on forward feature selection is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and that it can achieve a higher classification accuracy with fewer characteristic genes.
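
    A minimal sketch of the multi-filter idea: merge the top-N genes of several filter rankings, then run a forward wrapper search over the merged pool. The choice of filters (ANOVA F and mutual information), the logistic-regression wrapper, and N are illustrative assumptions standing in for the paper's filters and cross-entropy step.

        import numpy as np
        from sklearn.feature_selection import f_classif, mutual_info_classif
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def multi_filter_union(X, y, n_top=50):
            rankings = [f_classif(X, y)[0],
                        mutual_info_classif(X, y, random_state=0)]
            keep = set()
            for s in rankings:                    # union of each filter's top N
                keep.update(np.argsort(s)[::-1][:n_top].tolist())
            return sorted(keep)

        def forward_wrapper(X, y, candidates, max_feats=20):
            chosen, best = [], 0.0
            clf = LogisticRegression(max_iter=1000)
            while len(chosen) < max_feats:
                gains = [(cross_val_score(clf, X[:, chosen + [f]], y, cv=5).mean(), f)
                         for f in candidates if f not in chosen]
                if not gains:
                    break
                acc, f = max(gains)
                if acc <= best:                   # stop when accuracy stalls
                    break
                best, chosen = acc, chosen + [f]
            return chosen, best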

  10. Robustly Aligning a Shape Model and Its Application to Car Alignment of Unknown Pose.

    PubMed

    Li, Yan; Gu, Leon; Kanade, Takeo

    2011-09-01

    Precisely localizing in an image a set of feature points that form the shape of an object, such as a car or a face, is called alignment. Previous shape alignment methods attempted to fit a whole shape model to the observed data, based on the assumption of Gaussian observation noise and the associated regularization process. However, such an approach, though able to deal with Gaussian noise in feature detection, turns out not to be robust or precise, because it is vulnerable to gross feature detection errors or outliers resulting from partial occlusions or spurious features from the background or neighboring objects. We address this problem by adopting a randomized hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate a shape-and-pose hypothesis of the object from a partial shape, or a subset of feature points. For alignment, a large number of hypotheses are generated by randomly sampling subsets of feature points and then evaluated to find the one that minimizes the shape prediction error. This method of randomized subset-based matching can effectively handle outliers and recover the correct object shape. We apply this approach to a challenging data set of over 5,000 different-posed car images, spanning a wide variety of car types, lighting, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods in both accuracy and robustness.
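
    The hypothesis-and-test loop can be sketched in a RANSAC-like form: repeatedly fit a 2D similarity transform from the mean shape to a random subset of detected points and keep the hypothesis with the lowest robust prediction error. The least-squares similarity fit and the median residual below stand in for the paper's Bayesian shape-and-pose inference; the subset size and trial count are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)

        def fit_similarity(src, dst):
            # Solve dst ≈ m*src + t in the complex plane (scale-rotation m, shift t).
            a = src[:, 0] + 1j * src[:, 1]
            b = dst[:, 0] + 1j * dst[:, 1]
            A = np.column_stack([a, np.ones_like(a)])
            (m, t), *_ = np.linalg.lstsq(A, b, rcond=None)
            return m, t

        def align(mean_shape, detected, n_hyp=500, subset=3):
            best_err, best_pred = np.inf, None
            z = mean_shape[:, 0] + 1j * mean_shape[:, 1]
            for _ in range(n_hyp):
                idx = rng.choice(len(detected), size=subset, replace=False)
                m, t = fit_similarity(mean_shape[idx], detected[idx])
                pred = m * z + t
                pred = np.column_stack([pred.real, pred.imag])
                # Median residual tolerates outlying feature detections.
                err = np.median(np.linalg.norm(pred - detected, axis=1))
                if err < best_err:
                    best_err, best_pred = err, pred
            return best_pred, best_err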

  11. Use of MODIS Cloud Top Pressure to Improve Assimilation Yields of AIRS Radiances in GSI

    NASA Technical Reports Server (NTRS)

    Zavodsky, Bradley; Srikishen, Jayanthi

    2014-01-01

    Radiances from hyperspectral sounders such as the Atmospheric Infrared Sounder (AIRS) are routinely assimilated both globally and regionally in operational numerical weather prediction (NWP) systems using the Gridpoint Statistical Interpolation (GSI) data assimilation system. However, only thinned, cloud-free radiances from a 281-channel subset are used, so the overall percentage of these observations that are assimilated is on the order of 5%. Cloud checks are performed within GSI to determine which channels peak above cloud top; inaccuracies may lead to fewer assimilated radiances or the introduction of biases from cloud-contaminated radiances. The relatively large footprint of AIRS may not optimally represent small-scale cloud features that might be better resolved by higher-resolution imagers like the Moderate Resolution Imaging Spectroradiometer (MODIS). The objective of this project is to "swap" the MODIS-derived cloud top pressure (CTP) for that designated by the AIRS-only quality control within GSI, to test the hypothesis that better representation of cloud features will result in higher assimilated radiance yields and improved forecasts.

  12. Evaluation of subset matching methods and forms of covariate balance.

    PubMed

    de Los Angeles Resa, María; Zubizarreta, José R

    2016-11-30

    This paper conducts a Monte Carlo simulation study to evaluate the performance of multivariate matching methods that select a subset of treatment and control observations. The matching methods studied are the widely used nearest neighbor matching with propensity score calipers and the more recently proposed methods, optimal matching of an optimally chosen subset and optimal cardinality matching. The main findings are: (i) covariate balance, as measured by differences in means, variance ratios, Kolmogorov-Smirnov distances, and cross-match test statistics, is better with cardinality matching because by construction it satisfies balance requirements; (ii) for given levels of covariate balance, the matched samples are larger with cardinality matching than with the other methods; (iii) in terms of covariate distances, optimal subset matching performs best; (iv) treatment effect estimates from cardinality matching have lower root-mean-square errors, provided strong balance requirements are imposed, specifically fine balance or strength-k balance plus close mean balance. In standard practice, a matched sample is considered to be balanced if the absolute differences in means of the covariates across treatment groups are smaller than 0.1 standard deviations. However, the simulation results suggest that stronger forms of balance should be pursued in order to remove systematic biases due to observed covariates when a difference in means treatment effect estimator is used. In particular, if the true outcome model is additive, then marginal distributions should be balanced, and if the true outcome model is additive with interactions, then low-dimensional joints should be balanced. Copyright © 2016 John Wiley & Sons, Ltd.

  13. The human Vδ2+ T-cell compartment comprises distinct innate-like Vγ9+ and adaptive Vγ9- subsets.

    PubMed

    Davey, Martin S; Willcox, Carrie R; Hunter, Stuart; Kasatskaya, Sofya A; Remmerswaal, Ester B M; Salim, Mahboob; Mohammed, Fiyaz; Bemelman, Frederike J; Chudakov, Dmitriy M; Oo, Ye H; Willcox, Benjamin E

    2018-05-02

    Vδ2+ T cells form the predominant human γδ T-cell population in peripheral blood and mediate T-cell receptor (TCR)-dependent anti-microbial and anti-tumour immunity. Here we show that the Vδ2+ compartment comprises both innate-like and adaptive subsets. Vγ9+Vδ2+ T cells display semi-invariant TCR repertoires, featuring public Vγ9 TCR sequences equivalent in cord and adult blood. By contrast, we also identify a separate Vγ9−Vδ2+ T-cell subset that typically has a CD27hi CCR7+ CD28+ IL-7Rα+ naive-like phenotype and a diverse TCR repertoire but, in response to viral infection, undergoes clonal expansion and differentiation to a CD27lo CD45RA+ CX3CR1+ granzyme A/B+ effector phenotype. Consistent with a function in solid tissue immunosurveillance, we detect human intrahepatic Vγ9−Vδ2+ T cells featuring dominant clonal expansions and an effector phenotype. These findings redefine human γδ T-cell subsets by delineating the Vδ2+ T-cell compartment into innate-like (Vγ9+) and adaptive (Vγ9−) subsets, which have distinct functions in microbial immunosurveillance.

  14. Statistical Learning of Origin-Specific Statically Optimal Individualized Treatment Rules

    PubMed Central

    van der Laan, Mark J.; Petersen, Maya L.

    2008-01-01

    Consider a longitudinal observational or controlled study in which one collects chronological data over time on a random sample of subjects. The time-dependent process one observes on each subject contains time-dependent covariates, time-dependent treatment actions, and an outcome process or single final outcome of interest. A statically optimal individualized treatment rule (as introduced in van der Laan et al. (2005) and Petersen et al. (2007)) is a treatment rule which at any point in time conditions on a user-supplied subset of the past, computes the future static treatment regimen that maximizes a (conditional) mean future outcome of interest, and applies the first treatment action of that regimen. In particular, Petersen et al. (2007) clarified that, in order to be statically optimal, an individualized treatment rule should not depend on the observed treatment mechanism. Petersen et al. (2007) further developed estimators of statically optimal individualized treatment rules based on a past capturing all confounding of past treatment history on outcome. In practice, however, one typically wishes to find individualized treatment rules responding to a user-supplied subset of the complete observed history, which may not be sufficient to capture all confounding. The current article provides an important advance on Petersen et al. (2007) by developing locally efficient double robust estimators of statically optimal individualized treatment rules responding to such a user-supplied subset of the past. However, failure to capture all confounding comes at a price: the static optimality of the resulting rules becomes origin-specific. We explain origin-specific static optimality and discuss the practical importance of the proposed methodology. We further present the results of a data analysis in which we estimate a statically optimal rule for switching antiretroviral therapy among patients infected with drug-resistant HIV. PMID:19122792

  15. Histological Image Feature Mining Reveals Emergent Diagnostic Properties for Renal Cancer

    PubMed Central

    Kothari, Sonal; Phan, John H.; Young, Andrew N.; Wang, May D.

    2016-01-01

    Computer-aided histological image classification systems are important for making objective and timely cancer diagnostic decisions. These systems use combinations of image features that quantify a variety of image properties. Because researchers tend to validate their diagnostic systems on specific cancer endpoints, it is difficult to predict which image features will perform well given a new cancer endpoint. In this paper, we define a comprehensive set of common image features (consisting of 12 distinct feature subsets) that quantify a variety of image properties. We use a data-mining approach to determine which feature subsets and image properties emerge as part of an “optimal” diagnostic model when applied to specific cancer endpoints. Our goal is to assess the performance of such comprehensive image feature sets for application to a wide variety of diagnostic problems. We perform this study on 12 endpoints, including 6 renal tumor subtype endpoints and 6 renal cancer grade endpoints. Keywords: histology, image mining, computer-aided diagnosis. PMID:28163980

  16. Correlative feature analysis of FFDM images

    NASA Astrophysics Data System (ADS)

    Yuan, Yading; Giger, Maryellen L.; Li, Hui; Sennett, Charlene

    2008-03-01

    Identifying the corresponding image pair of a lesion is an essential step for combining information from different views of the lesion to improve the diagnostic ability of both radiologists and CAD systems. Because of the non-rigidity of the breasts and the 2D projective property of mammograms, this task is not trivial. In this study, we present a computerized framework that differentiates the corresponding images from different views of a lesion from non-corresponding ones. A dual-stage segmentation method, which employs an initial radial gradient index (RGI)-based segmentation and an active contour model, was first applied to extract mass lesions from the surrounding tissues. Then various lesion features were automatically extracted from each of the two views of each lesion to quantify the characteristics of margin, shape, size, texture, and context of the lesion, as well as its distance to the nipple. We employed a two-step method to select an effective subset of features and combined it with a Bayesian artificial neural network (BANN) to obtain a discriminant score, which yielded an estimate of the probability that the two images are of the same physical lesion. ROC analysis was used to evaluate the performance of the individual features and the selected feature subset in the task of distinguishing between corresponding and non-corresponding pairs. Using an FFDM database with 124 corresponding image pairs and 35 non-corresponding pairs, the distance feature yielded an AUC (area under the ROC curve) of 0.80 with leave-one-out evaluation by lesion, and the feature subset, which included the distance feature, lesion size, and lesion contrast, yielded an AUC of 0.86. The improvement from using multiple features over single-feature performance was statistically significant (p < 0.001).
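
    The evaluation step can be sketched as follows: leave-one-out probability estimates feed an AUC comparison of the distance feature alone against the selected subset. A logistic regression stands in for the BANN, and the column indices are hypothetical placeholders.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import LeaveOneOut, cross_val_predict

        def loo_auc(X, y):
            clf = LogisticRegression(max_iter=1000)
            prob = cross_val_predict(clf, X, y, cv=LeaveOneOut(),
                                     method="predict_proba")[:, 1]
            return roc_auc_score(y, prob)

        # Hypothetical column indices for the distance, size, and contrast features:
        # auc_single = loo_auc(X[:, [dist]], y)
        # auc_subset = loo_auc(X[:, [dist, size, contrast]], y)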

  17. An Active RBSE Framework to Generate Optimal Stimulus Sequences in a BCI for Spelling

    NASA Astrophysics Data System (ADS)

    Moghadamfalahi, Mohammad; Akcakaya, Murat; Nezamfar, Hooman; Sourati, Jamshid; Erdogmus, Deniz

    2017-10-01

    A class of brain-computer interfaces (BCIs) employs noninvasive recordings of electroencephalography (EEG) signals to enable users with severe speech and motor impairments to interact with their environment and social network. For example, EEG-based BCIs for typing popularly utilize event related potentials (ERPs) for inference. Presentation paradigms in current ERP-based letter-by-letter typing BCIs typically query the user with an arbitrary subset of characters. However, typing accuracy and typing speed can potentially be enhanced with more informed subset selection and flash assignment. In this manuscript, we introduce the active recursive Bayesian state estimation (active-RBSE) framework for inference and sequence optimization. Prior to presentation in each iteration, rather than showing a subset of randomly selected characters, the developed framework optimally selects a subset based on a query function. Selected queries are adaptively specialized for users during each intent detection. Through a simulation-based study, we assess the effect of active-RBSE on the performance of a language-model-assisted typing BCI in terms of typing speed and accuracy. To provide a baseline for comparison, we also utilize standard presentation paradigms, namely the row-and-column matrix presentation paradigm and random rapid serial visual presentation paradigms. The results show that utilization of active-RBSE can enhance the online performance of the system, in terms of both typing accuracy and speed.

  18. Analysis of the single-vehicle cyclic inventory routing problem

    NASA Astrophysics Data System (ADS)

    Aghezzaf, El-Houssaine; Zhong, Yiqing; Raa, Birger; Mateo, Manel

    2012-11-01

    The single-vehicle cyclic inventory routing problem (SV-CIRP) consists of a repetitive distribution of a product from a single depot to a selected subset of customers. For each customer selected for replenishment, the supplier collects a corresponding fixed reward. The objective is to determine the subset of customers to replenish, the quantity of the product to be delivered to each, and the design of the vehicle route, so that the resulting profit (the difference between the total reward and the total logistical cost) is maximised while preventing stockouts at each of the selected customers. This problem often appears as a sub-problem in larger logistical problems. In this article, the SV-CIRP is formulated as a mixed-integer program with a nonlinear objective function. After a thorough analysis of the structure of the problem and its features, an exact algorithm for its solution is proposed. This exact algorithm requires only solutions of linear mixed-integer programs. Values of a savings-based heuristic for this problem are compared to the optimal values obtained for a set of test problems. In general, the gap may grow as large as 25%, which justifies the effort to continue exploring and developing exact and approximation algorithms for the SV-CIRP.

  19. Towards automation of palynology 1: analysis of pollen shape and ornamentation using simple geometric measures, derived from scanning electron microscope images

    NASA Astrophysics Data System (ADS)

    Treloar, W. J.; Taylor, G. E.; Flenley, J. R.

    2004-12-01

    This is the first of a series of papers on the theme of automated pollen analysis. The automation of pollen analysis could bring numerous advantages for the reconstruction of past environments, making larger data sets practical and offering objectivity and fine-resolution sampling. There are also applications in apiculture and medicine. Previous work on the classification of pollen using texture measures has been successful with small numbers of pollen taxa. However, as the number of pollen taxa to be identified increases, more features may be required to achieve a successful classification. This paper describes the use of simple geometric measures to augment the texture measures. The feasibility of this new approach is tested using scanning electron microscope (SEM) images of 12 taxa of fresh pollen taken from reference material collected on Henderson Island, Polynesia. Pollen images were captured directly from a SEM connected to a PC. A threshold grey-level was set and binary images were then generated. Pollen edges were then located and the boundaries were traced using a chain-coding system. A number of simple geometric variables were calculated directly from the chain code of the pollen, and a variable selection procedure was used to choose the optimal subset for classification. The efficiency of these variables was tested using a leave-one-out classification procedure. The system successfully split the original 12-taxa sample into five sub-samples containing no more than six pollen taxa each. The further subdivision of echinate pollen types was then attempted with a subset of four pollen taxa. A set of difference codes was constructed for a range of displacements along the chain code, and from these difference codes probability variables were calculated. A variable selection procedure was again used to choose the optimal subset of probabilities to be used for classification, and the efficiency of these variables was again tested using a leave-one-out classification procedure. The proportion of correctly classified pollen ranged from 81% to 100%, depending on the subset of variables used. The best set of variables had an overall classification rate averaging about 95%, which is comparable with the classification rates from the earlier texture analysis work for other types of pollen.
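
    The chain-code step can be sketched as follows: an ordered 8-connected boundary is turned into a Freeman chain code, and difference codes at a given displacement yield a small set of rotation-tolerant probability variables. This is a minimal sketch, assuming the boundary is supplied as an ordered list of pixel coordinates tracing a closed contour.

        import numpy as np

        # Freeman directions 0..7 for E, NE, N, NW, W, SW, S, SE.
        DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
                (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

        def chain_code(boundary):
            # boundary: ordered list of (x, y) pixels of a closed contour.
            pairs = zip(boundary, boundary[1:] + boundary[:1])
            return [DIRS[(x1 - x0, y1 - y0)] for (x0, y0), (x1, y1) in pairs]

        def probability_features(codes, displacement=1):
            # Normalized histogram of difference codes: 8 probability variables.
            c = np.asarray(codes)
            diff = (np.roll(c, -displacement) - c) % 8
            return np.bincount(diff, minlength=8) / len(diff)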

  20. Efficient Simulation Budget Allocation for Selecting an Optimal Subset

    NASA Technical Reports Server (NTRS)

    Chen, Chun-Hung; He, Donghai; Fu, Michael; Lee, Loo Hay

    2008-01-01

    We consider a class of the subset selection problem in ranking and selection. The objective is to identify the top m out of k designs based on simulated output. Traditional procedures are conservative and inefficient. Using the optimal computing budget allocation framework, we formulate the problem as that of maximizing the probability of correctly selecting all of the top-m designs subject to a constraint on the total number of samples available. For an approximation of this correct selection probability, we derive an asymptotically optimal allocation and propose an easy-to-implement heuristic sequential allocation procedure. Numerical experiments indicate that the resulting allocations are superior to other methods in the literature that we tested, and the relative efficiency increases for larger problems. In addition, preliminary numerical results indicate that the proposed new procedure has the potential to enhance computational efficiency for simulation optimization.

  1. Automatic design of basin-specific drought indexes for highly regulated water systems

    NASA Astrophysics Data System (ADS)

    Zaniolo, Marta; Giuliani, Matteo; Castelletti, Andrea Francesco; Pulido-Velazquez, Manuel

    2018-04-01

    Socio-economic costs of drought are progressively increasing worldwide due to ongoing alterations of hydro-meteorological regimes induced by climate change. Although drought management is largely studied in the literature, traditional drought indexes often fail at detecting critical events in highly regulated systems, where natural water availability is conditioned by the operation of water infrastructures such as dams, diversions, and pumping wells. Here, ad hoc index formulations are usually adopted, based on empirical combinations of several supposedly significant hydro-meteorological variables. These customized formulations, however, while effective in the basin they were designed for, can hardly be generalized and transferred to different contexts. In this study, we contribute FRIDA (FRamework for Index-based Drought Analysis), a novel framework for the automatic design of basin-customized drought indexes. In contrast to ad hoc empirical approaches, FRIDA is fully automated, generalizable, and portable across different basins. FRIDA builds an index that acts as a surrogate of the drought conditions of the basin, computed by combining all the relevant available information about the water circulating in the system, identified by means of a feature extraction algorithm. We used the Wrapper for Quasi-Equally Informative Subset Selection (W-QEISS), which features a multi-objective evolutionary algorithm to find Pareto-efficient subsets of variables by maximizing the wrapper accuracy, minimizing the number of selected variables, and optimizing relevance and redundancy of the subset. The preferred variable subset is selected among the efficient solutions and used to formulate the final index according to alternative model structures. We apply FRIDA to the case study of the Jucar river basin (Spain), a drought-prone and highly regulated Mediterranean water resource system, where an advanced drought management plan relying on an ad hoc state index is used to trigger drought management measures. The state index was constructed empirically in a trial-and-error process begun in the 1980s and finalized in 2007, guided by experts from the Confederación Hidrográfica del Júcar (CHJ). Our results show that the automated variable selection outcomes align with CHJ's 25-year-long empirical refinement. In addition, the resulting FRIDA index outperforms the official state index in terms of accuracy in reproducing the target variable and the cardinality of the selected input set.

  2. Designing basin-customized combined drought indices via feature extraction

    NASA Astrophysics Data System (ADS)

    Zaniolo, Marta; Giuliani, Matteo; Castelletti, Andrea

    2017-04-01

    The socio-economic costs of drought are progressively increasing worldwide due to the ongoing alteration of hydro-meteorological regimes induced by climate change. Although drought management is largely studied in the literature, most traditional drought indexes fail in detecting critical events in highly regulated systems; the indexes used in such systems generally rely on ad hoc formulations and cannot be generalized to different contexts. In this study, we contribute a novel framework for the design of a basin-customized drought index. This index represents a surrogate of the state of the basin and is computed by combining the available information about the water in the system so as to reproduce a representative target variable for the drought condition of the basin (e.g., water deficit). To select the relevant variables and how to combine them, we use an advanced feature extraction algorithm called the Wrapper for Quasi-Equally Informative Subset Selection (W-QEISS). The W-QEISS algorithm relies on a multi-objective evolutionary algorithm to find Pareto-efficient subsets of variables by maximizing the wrapper accuracy, minimizing the number of selected variables (cardinality), and optimizing relevance and redundancy of the subset. The accuracy objective is evaluated through the calibration of a pre-defined model (i.e., an extreme learning machine) of the water deficit for each candidate subset of variables, with the final index selected from the resulting solutions as a suitable compromise between accuracy, cardinality, relevance, and redundancy. The proposed methodology is tested on the case study of Lake Como in northern Italy, a regulated lake mainly operated for irrigation supply to four downstream agricultural districts. In the absence of an institutional drought monitoring system, we constructed the combined index using all the hydrological variables from the existing monitoring system as well as the most common drought indicators at multiple time aggregations. The soil moisture deficit in the root zone, computed by a distributed-parameter water balance model of the agricultural districts, is used as the target variable. Numerical results show that our framework succeeds in constructing a combined drought index that reproduces the soil moisture deficit. Moreover, this index provides valuable information for supporting appropriate drought management strategies, including the possibility of directly informing the lake operations about drought conditions and improving the overall reliability of the irrigation supply system.

  3. pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine.

    PubMed

    Zhang, Shanxin; Zhou, Zhiping; Chen, Xinmeng; Hu, Yong; Yang, Lindong

    2017-08-07

    DNase I hypersensitive sites (DHSs) are accessible chromatin regions hypersensitive to cleavage by DNase I endonucleases. DHSs are indicative of cis-regulatory DNA elements (CREs), all of which play important roles in global gene expression regulation. Recognizing DHSs in a genome is therefore helpful for discovering CREs. To accelerate such investigations, developing cost-effective computational methods to identify DHSs is an important complement to experiments. However, there is a lack of tools for identifying DHSs in plant genomes. Here we present pDHS-SVM, a computational predictor to identify plant DHSs. To integrate global sequence-order information and local DNA properties, reverse-complement k-mer features and the dinucleotide-based auto-covariance of DNA sequences were applied to construct the feature space. In this work, fifteen physical-chemical properties of dinucleotides were used, and a Support Vector Machine (SVM) was employed. To further improve the performance of the predictor and extract an optimized subset of nucleotide physical-chemical properties informative for DHSs, a heuristic nucleotide physical-chemical property selection algorithm was introduced. With the optimized subset of properties, experimental results on Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM achieved accuracies of up to 87.00% and 85.79%, respectively. These results indicate the effectiveness of the proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genomes. Our implementation of pDHS-SVM is freely available as source code at https://github.com/shanxinzhang/pDHS-SVM. Copyright © 2017 Elsevier Ltd. All rights reserved.
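
    The reverse-complement k-mer idea can be sketched as follows: a k-mer and its reverse complement are mapped to one canonical key, halving the feature space while keeping strand-symmetric sequence-order information. This is a minimal sketch, assuming sequences over the plain A/C/G/T alphabet.

        from itertools import product

        COMP = str.maketrans("ACGT", "TGCA")

        def revcomp(kmer):
            return kmer.translate(COMP)[::-1]

        def rc_kmer_features(seq, k=3):
            # Canonical key: lexicographic minimum of a k-mer and its reverse complement.
            canon = sorted({min(km, revcomp(km))
                            for km in (''.join(p) for p in product("ACGT", repeat=k))})
            counts = dict.fromkeys(canon, 0)
            for i in range(len(seq) - k + 1):
                km = seq[i:i + k]
                counts[min(km, revcomp(km))] += 1
            total = max(1, len(seq) - k + 1)
            return [counts[c] / total for c in canon]

        # 32 canonical 3-mer frequencies for a toy sequence:
        features = rc_kmer_features("ACGTACGTGGCC", k=3)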

  4. An unsupervised technique for optimal feature selection in attribute profiles for spectral-spatial classification of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Bhardwaj, Kaushal; Patra, Swarnajyoti

    2018-04-01

    Inclusion of spatial information along with spectral features plays a significant role in the classification of remote sensing images. Attribute profiles have already proved their ability to represent spatial information. In order to incorporate proper spatial information, multiple attributes are required, and for each attribute large profiles need to be constructed by varying the filter parameter values within a wide range. Thus, the constructed profiles that represent the spectral-spatial information of a hyperspectral image have a huge dimension, which leads to the Hughes phenomenon and increases the computational burden. To mitigate these problems, this work presents an unsupervised feature selection technique that selects, from the constructed high-dimensional multi-attribute profile, a subset of filtered images that is sufficiently informative to discriminate well among classes. To this end, the proposed technique exploits genetic algorithms (GAs). The fitness function of the GA is defined in an unsupervised way with the help of mutual information. The effectiveness of the proposed technique is assessed using a one-against-all support vector machine classifier. The experiments conducted on three hyperspectral data sets show the robustness of the proposed method in terms of computation time and classification accuracy.

  5. Predictive classification of self-paced upper-limb analytical movements with EEG.

    PubMed

    Ibáñez, Jaime; Serrano, J I; del Castillo, M D; Minguez, J; Pons, J L

    2015-11-01

    The extent to which the electroencephalographic activity allows the characterization of movements with the upper limb is an open question. This paper describes the design and validation of a classifier of upper-limb analytical movements based on electroencephalographic activity extracted from intervals preceding self-initiated movement tasks. Features selected for the classification are subject specific and associated with the movement tasks. Further tests are performed to reject the hypothesis that other information different from the task-related cortical activity is being used by the classifiers. Six healthy subjects were measured performing self-initiated upper-limb analytical movements. A Bayesian classifier was used to classify among seven different kinds of movements. Features considered covered the alpha and beta bands. A genetic algorithm was used to optimally select a subset of features for the classification. An average accuracy of 62.9 ± 7.5% was reached, which was above the baseline level observed with the proposed methodology (30.2 ± 4.3%). The study shows how the electroencephalography carries information about the type of analytical movement performed with the upper limb and how it can be decoded before the movement begins. In neurorehabilitation environments, this information could be used for monitoring and assisting purposes.

  6. Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia.

    PubMed

    Kavakiotis, Ioannis; Xochelli, Aliki; Agathangelidis, Andreas; Tsoumakas, Grigorios; Maglaveras, Nicos; Stamatopoulos, Kostas; Hadzidimitriou, Anastasia; Vlahavas, Ioannis; Chouvarda, Ioanna

    2016-06-06

    Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high quality datasets for SHM analysis in IG receptors of CLL patients. This dataset is used as the basis for a higher level integration procedure, inspired by social choice theory. This is applied in the Towards analysis, our attempt to investigate the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families, through SHM. The data integration process, followed by feature extraction, resulted in the generation of a dataset containing information about mutations occurring through SHM. The Towards analysis, performed on the integrated dataset using voting techniques, revealed the distinct behaviour of subset #201 compared to other subsets, as regards SHM-related movements among gene clans, both in allele-conserved and non-conserved gene areas. With respect to movement between genes, a high percentage of movement towards pseudogenes was found in all CLL subsets. This data integration and feature extraction process can set the basis for exploratory analysis or a fully automated computational data mining approach to many as yet unanswered, clinically relevant biological questions.

  7. Probabilistic quantum cloning of a subset of linearly dependent states

    NASA Astrophysics Data System (ADS)

    Rui, Pinshu; Zhang, Wen; Liao, Yanlin; Zhang, Ziyun

    2018-02-01

    It is well known that a quantum state, secretly chosen from a certain set, can be probabilistically cloned with positive cloning efficiencies if and only if all the states in the set are linearly independent. In this paper, we focus on probabilistic quantum cloning of a subset of linearly dependent states. We show that a linearly independent subset of linearly dependent quantum states {|Ψ1⟩, |Ψ2⟩, …, |Ψn⟩} can be probabilistically cloned if and only if no state in the subset can be expressed as a linear superposition of the other states in the set {|Ψ1⟩, |Ψ2⟩, …, |Ψn⟩}. The optimal cloning efficiencies are also investigated.

  8. Feature selection in feature network models: finding predictive subsets of features with the Positive Lasso.

    PubMed

    Frank, Laurence E; Heiser, Willem J

    2008-05-01

    A set of features is the basis for the network representation of proximity data achieved by feature network models (FNMs). Features are binary variables that characterize the objects in an experiment, with some measure of proximity as the response variable. Sometimes features are provided by theory and play an important role in the construction of the experimental conditions. In some research settings, the features are not known a priori. This paper shows how to generate features in this situation and how to select an adequate subset of features that strikes a good compromise between model fit and model complexity, using a new version of least angle regression that restricts coefficients to be non-negative, called the Positive Lasso. It is shown that features can be generated efficiently with Gray codes that are naturally linked to the FNMs. The model selection strategy makes use of the fact that the FNM can be considered a univariate multiple regression model. A simulation study shows that the proposed strategy leads to satisfactory results if the number of objects is less than or equal to 22. If the number of objects is larger than 22, the number of features selected by our method exceeds the true number of features in some conditions.
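
    A non-negativity-constrained lasso of this kind is easy to sketch with scikit-learn, whose coordinate-descent Lasso accepts a positive=True constraint; it stands in here for the least-angle formulation of the Positive Lasso. The design-matrix layout and alpha are illustrative assumptions.

        import numpy as np
        from sklearn.linear_model import Lasso

        def positive_lasso_select(F, proximity, alpha=0.01):
            # F: pairs-by-features matrix of feature-derived distances;
            # proximity: observed dissimilarity for each object pair.
            model = Lasso(alpha=alpha, positive=True, max_iter=10000)
            model.fit(F, proximity)
            selected = np.flatnonzero(model.coef_ > 0)   # features kept in the network
            return selected, model.coef_[selected]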

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aghaei, Faranak; Tan, Maxine; Liu, Hong

    Purpose: To identify a new clinical marker based on quantitative kinetic image feature analysis and assess its feasibility for predicting tumor response to neoadjuvant chemotherapy. Methods: The authors assembled a dataset of breast MR images acquired from 68 cancer patients before undergoing neoadjuvant chemotherapy. Among them, 25 patients had a complete response (CR) and 43 had a partial response or no response (NR) to chemotherapy based on the response evaluation criteria in solid tumors. The authors developed a computer-aided detection scheme to segment breast areas and tumors depicted on the breast MR images and computed a total of 39 kinetic image features from both tumor and background parenchymal enhancement regions. The authors then applied and tested two approaches to classify between CR and NR cases. The first analyzed each individual feature and applied a simple feature fusion method that combines classification results from multiple features. The second tested an attribute-selected classifier that integrates an artificial neural network (ANN) with a wrapper subset evaluator, optimized using a leave-one-case-out validation method. Results: In the pool of 39 features, 10 yielded relatively higher classification performance, with areas under the receiver operating characteristic curve (AUCs) ranging from 0.61 to 0.78 for classifying between CR and NR cases. Using the feature fusion method, the maximum AUC was 0.85 ± 0.05. Using the ANN-based classifier, the AUC value significantly increased to 0.96 ± 0.03 (p < 0.01). Conclusions: This study demonstrated that quantitative analysis of kinetic image features computed from breast MR images acquired before chemotherapy has the potential to generate a useful clinical marker for predicting tumor response to chemotherapy.

  10. An enhancement of binary particle swarm optimization for gene selection in classifying cancer classes

    PubMed Central

    2013-01-01

    Background: Gene expression data could be of momentous help in developing proficient cancer diagnosis and classification platforms. Lately, many researchers have analyzed gene expression data using diverse computational intelligence methods to select a small subset of informative genes from the data for cancer classification. Many computational methods face difficulties in selecting small subsets due to the small number of samples compared to the huge number of genes (high dimension), irrelevant genes, and noisy genes. Methods: We propose an enhanced binary particle swarm optimization to perform the selection of small subsets of informative genes, which is significant for cancer classification. Particle speed, a new rule, and a modified sigmoid function are introduced in the proposed method to increase the probability that the bits in a particle's position are zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results: The performance of the proposed method proved to be superior to other previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also requires lower computational time compared to BPSO. PMID:23617960
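
    A plain binary PSO with a sigmoid transfer function can be sketched as follows; the SVM fitness, the subset-size penalty, and all parameters are illustrative assumptions, not the paper's enhanced variant with its modified sigmoid and particle-speed rule.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)

        def fitness(mask, X, y, w=0.9):
            sel = mask.astype(bool)
            if not sel.any():
                return 0.0
            acc = cross_val_score(SVC(), X[:, sel], y, cv=3).mean()
            return w * acc + (1 - w) * (1 - sel.mean())   # favor small subsets

        def bpso_select(X, y, n_particles=20, iters=30, c1=2.0, c2=2.0):
            n = X.shape[1]
            pos = (rng.random((n_particles, n)) < 0.5).astype(float)
            vel = rng.normal(0.0, 1.0, (n_particles, n))
            pbest = pos.copy()
            pbest_f = np.array([fitness(p, X, y) for p in pos])
            gbest = pbest[np.argmax(pbest_f)].copy()
            for _ in range(iters):
                r1, r2 = rng.random((2, n_particles, n))
                vel = vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
                vel = np.clip(vel, -6.0, 6.0)             # keep sigmoid well-behaved
                prob_one = 1.0 / (1.0 + np.exp(-vel))     # sigmoid transfer function
                pos = (rng.random((n_particles, n)) < prob_one).astype(float)
                f = np.array([fitness(p, X, y) for p in pos])
                better = f > pbest_f
                pbest[better], pbest_f[better] = pos[better], f[better]
                gbest = pbest[np.argmax(pbest_f)].copy()
            return np.flatnonzero(gbest)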

  11. Choice: 36 band feature selection software with applications to multispectral pattern recognition

    NASA Technical Reports Server (NTRS)

    Jones, W. C.

    1973-01-01

    Feature selection software was developed at the Earth Resources Laboratory that is capable of inputting up to 36 channels and selecting channel subsets according to several criteria based on divergence. One of the criteria used is compatible with the table look-up classifier requirements. The software indicates which channel subset best separates (based on average divergence) each class from all other classes. The software employs an exhaustive search technique, and computer time is not prohibitive. A typical task to select the best 4 of 22 channels for 12 classes takes 9 minutes on a Univac 1108 computer.
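
    The exhaustive search can be sketched as follows, scoring each candidate channel subset by the average pairwise divergence between Gaussian class models. The symmetric Kullback-Leibler form below is one standard divergence measure and assumes enough samples per class for stable covariance estimates.

        from itertools import combinations
        import numpy as np

        def divergence(m0, C0, m1, C1):
            # Symmetric KL divergence between two Gaussian class models.
            iC0, iC1 = np.linalg.inv(C0), np.linalg.inv(C1)
            d = m0 - m1
            return 0.5 * (np.trace(iC0 @ C1) + np.trace(iC1 @ C0)
                          + d @ (iC0 + iC1) @ d) - len(m0)

        def best_subset(X, y, k=4):
            classes = np.unique(y)
            stats = {c: (X[y == c].mean(axis=0), np.cov(X[y == c].T)) for c in classes}
            best, best_div = None, -np.inf
            for subset in combinations(range(X.shape[1]), k):
                s = list(subset)
                divs = [divergence(stats[a][0][s], stats[a][1][np.ix_(s, s)],
                                   stats[b][0][s], stats[b][1][np.ix_(s, s)])
                        for i, a in enumerate(classes) for b in classes[i + 1:]]
                if np.mean(divs) > best_div:
                    best_div, best = np.mean(divs), subset
            return best, best_div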

  12. Multimodal biometric approach for cancelable face template generation

    NASA Astrophysics Data System (ADS)

    Paul, Padma Polash; Gavrilova, Marina

    2012-06-01

    Due to the rapid growth of biometric technology, template protection has become crucial for securing the integrity of a biometric security system and preventing unauthorized access. Cancelable biometrics is emerging as one of the best solutions for securing biometric identification and verification systems. We present a novel, robust cancelable template generation technique that takes advantage of multimodal biometrics using feature-level fusion. Feature-level fusion of different facial features is applied to generate the cancelable template. A proposed algorithm based on multi-fold random projection and a fuzzy communication scheme is used for this purpose. In cancelable template generation, one of the main difficulties is preserving the interclass variance of the features. We have found that interclass variations of the features that are lost during multi-fold random projection can be recovered using fusion of different feature subsets and projection into a new feature domain. Applying the multimodal technique at the feature level, we enhance the interclass variability and hence improve the performance of the system. We have tested the system with classifier fusion for different feature subsets and with fusion of different cancelable templates. Experiments have shown that the cancelable template improves the performance of the biometric system compared with the original template.
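
    The projection step can be sketched as follows: each feature subset gets its own seeded random matrix, and the projected parts are concatenated into the template, so revoking a compromised template only requires issuing a new seed. The dimensions and per-subset seeding are illustrative assumptions, and the paper's fuzzy communication scheme is not modeled here.

        import numpy as np

        def cancelable_template(feature_subsets, seed, out_dim=64):
            parts = []
            for i, f in enumerate(feature_subsets):
                rng = np.random.default_rng((seed, i))   # user- and subset-specific
                P = rng.normal(0.0, 1.0 / np.sqrt(out_dim), (out_dim, len(f)))
                parts.append(P @ f)                      # random projection
            return np.concatenate(parts)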

  13. Optimized diffusion gradient orientation schemes for corrupted clinical DTI data sets.

    PubMed

    Dubois, J; Poupon, C; Lethimonnier, F; Le Bihan, D

    2006-08-01

    A method is proposed for generating schemes of diffusion gradient orientations which allow the diffusion tensor to be reconstructed from partial data sets in clinical DT-MRI, should the acquisition be corrupted or terminated before completion because of patient motion. A general energy-minimization electrostatic model was developed in which the interactions between orientations are weighted according to their temporal order during acquisition. In this report, two corruption scenarios were specifically considered for generating relatively uniform schemes of 18 and 60 orientations, with useful subsets of 6 and 15 orientations. The sets and subsets were compared to conventional sets through their energy, condition number and rotational invariance. Schemes of 18 orientations were tested on a volunteer. The optimized sets were similar to uniform sets in terms of energy, condition number and rotational invariance, whether the complete set or only a subset was considered. Diffusion maps obtained in vivo were close to those for uniform sets whatever the acquisition time was. This was not the case with conventional schemes, whose subset uniformity was insufficient. With the proposed approach, sets of orientations responding to several corruption scenarios can be generated, which is potentially useful for imaging uncooperative patients or infants.
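
    The weighted electrostatic idea can be sketched as follows: orientations are points on the unit sphere (with antipodal symmetry) that repel one another, and each pair's interaction is weighted by the later of the two temporal indices, so early prefixes of the scheme stay close to uniform. The weighting, step size, and plain gradient descent are illustrative assumptions, not the authors' exact model.

        import numpy as np

        rng = np.random.default_rng(0)

        def optimize_scheme(n=18, iters=2000, lr=1e-3):
            u = rng.normal(size=(n, 3))
            u /= np.linalg.norm(u, axis=1, keepdims=True)
            # Pairs involving early orientations weigh more: a corrupted scan
            # keeps only its first k directions.
            w = 1.0 / (1.0 + np.maximum.outer(np.arange(n), np.arange(n)))
            for _ in range(iters):
                grad = np.zeros_like(u)
                for i in range(n):
                    for j in range(n):
                        if i == j:
                            continue
                        for sign in (1.0, -1.0):         # antipodal charges
                            d = u[i] - sign * u[j]
                            grad[i] -= w[i, j] * d / np.linalg.norm(d) ** 3
                u -= lr * grad                           # descend the 1/r energy
                u /= np.linalg.norm(u, axis=1, keepdims=True)
            return u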

  14. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak.

    PubMed

    Donoho, David; Jin, Jiashun

    2008-09-30

    In important application fields today (genomics and proteomics are examples), selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, ..., p, let π_i denote the two-sided P-value associated with the ith feature Z-score and π_(i) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p − π_(i)) / √((i/p)(1 − i/p)). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.
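
    The thresholding rule translates directly into code: compute two-sided P-values from the Z-scores, sort them, maximize the HC objective over the lower tail, and keep every feature whose |Z| clears the threshold. The half-range search and the toy data below are conventional and illustrative choices.

        import numpy as np
        from scipy.stats import norm

        def hc_threshold(z):
            p = 2 * norm.sf(np.abs(z))           # two-sided P-values
            order = np.argsort(p)
            n = len(z)
            k = n // 2                           # search the lower half, as usual
            i = np.arange(1, k + 1)
            hc = (i / n - p[order][:k]) / np.sqrt(i / n * (1 - i / n))
            istar = np.argmax(hc)
            return np.abs(z[order[istar]])       # threshold on |Z|

        z = np.random.default_rng(1).normal(size=1000)
        z[:20] += 3.0                            # a few rare, weak useful features
        selected = np.flatnonzero(np.abs(z) >= hc_threshold(z))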

  15. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

    PubMed Central

    Donoho, David; Jin, Jiashun

    2008-01-01

    In important application fields today (genomics and proteomics are examples), selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, …, p, let π_i denote the two-sided P-value associated with the ith feature Z-score and π_(i) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p − π_(i)) / √((i/p)(1 − i/p)). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT. PMID:18815365

  16. Predictive value of initial FDG-PET features for treatment response and survival in esophageal cancer patients treated with chemo-radiation therapy using a random forest classifier.

    PubMed

    Desbordes, Paul; Ruan, Su; Modzelewski, Romain; Pineau, Pascal; Vauclin, Sébastien; Gouel, Pierrick; Michel, Pierre; Di Fiore, Frédéric; Vera, Pierre; Gardin, Isabelle

    2017-01-01

    In oncology, texture features extracted from positron emission tomography with 18-fluorodeoxyglucose images (FDG-PET) are of increasing interest for predictive and prognostic studies, leading to several tens of features per tumor. To select the best features, the use of a random forest (RF) classifier was investigated. Sixty-five patients with an esophageal cancer treated with a combined chemo-radiation therapy were retrospectively included. All patients underwent a pretreatment whole-body FDG-PET. The patients were followed for 3 years after the end of the treatment. The response assessment was performed 1 month after the end of the therapy. Patients were classified as complete responders and non-complete responders. Sixty-one features were extracted from medical records and PET images. First, Spearman's analysis was performed to eliminate correlated features. Then, the best predictive and prognostic subsets of features were selected using an RF algorithm. These results were compared to those obtained by a Mann-Whitney U test (predictive study) and a univariate Kaplan-Meier analysis (prognostic study). Among the 61 initial features, 28 were not correlated. From these 28 features, the best subset of complementary features found using the RF classifier to predict response was composed of 2 features: metabolic tumor volume (MTV) and homogeneity from the co-occurrence matrix. The corresponding predictive value (AUC = 0.836 ± 0.105, Se = 82 ± 9%, Sp = 91 ± 12%) was higher than the best predictive results found using the Mann-Whitney test: busyness from the gray level difference matrix (P < 0.0001, AUC = 0.810, Se = 66%, Sp = 88%). The best prognostic subset found using RF was composed of 3 features: MTV and 2 clinical features (WHO status and nutritional risk index) (AUC = 0.822 ± 0.059, Se = 79 ± 9%, Sp = 95 ± 6%), while no feature was significantly prognostic according to the Kaplan-Meier analysis. The RF classifier can improve predictive and prognostic values compared to the Mann-Whitney U test and the univariate Kaplan-Meier survival analysis when applied to several tens of features in a limited patient database.
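
    The two-stage selection can be sketched as follows: drop one member of every highly Spearman-correlated feature pair, then rank the survivors by random forest importance. The 0.8 correlation cutoff and forest size are illustrative assumptions, not the paper's settings.

        import numpy as np
        from scipy.stats import spearmanr
        from sklearn.ensemble import RandomForestClassifier

        def decorrelate(X, threshold=0.8):
            rho = np.abs(spearmanr(X)[0])        # feature-by-feature correlations
            keep = []
            for j in range(X.shape[1]):
                if all(rho[j, k] < threshold for k in keep):
                    keep.append(j)
            return keep

        def rf_rank(X, y, keep, n_select=3):
            rf = RandomForestClassifier(n_estimators=500, random_state=0)
            rf.fit(X[:, keep], y)
            order = np.argsort(rf.feature_importances_)[::-1]
            return [keep[i] for i in order[:n_select]]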

  17. Input Decimated Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features, randomly selected subsets of features, or features created using principal components analysis, across a wide range of domains.
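
    A rough sketch of the input-decimation idea under simple assumptions: for each class, keep the features most correlated with that class's one-vs-rest label, train one base classifier per subset, and average the predicted probabilities. The subset size and base model are illustrative:

    ```python
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    n_keep, members = 32, []

    for c in np.unique(ytr):
        target = (ytr == c).astype(float)       # one-vs-rest label for class c
        Xc = Xtr - Xtr.mean(0)
        tc = target - target.mean()
        denom = np.sqrt((Xc ** 2).sum(0) * (tc ** 2).sum()) + 1e-12
        corr = np.abs(Xc.T @ tc) / denom        # per-feature correlation
        subset = np.argsort(corr)[-n_keep:]     # most discriminative features
        clf = LogisticRegression(max_iter=5000).fit(Xtr[:, subset], ytr)
        members.append((subset, clf))

    # decorrelated base classifiers, combined by averaging posteriors
    proba = np.mean([clf.predict_proba(Xte[:, s]) for s, clf in members], axis=0)
    print("ID ensemble accuracy:", (proba.argmax(1) == yte).mean())
    ```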

  18. Diagnostic support for glaucoma using retinal images: a hybrid image analysis and data mining approach.

    PubMed

    Yu, Jin; Abidi, Syed Sibte Raza; Artes, Paul; McIntyre, Andy; Heywood, Malcolm

    2005-01-01

    The availability of modern imaging techniques such as Confocal Scanning Laser Tomography (CSLT) for capturing high-quality optic nerve images offers the potential for developing automatic and objective methods for diagnosing glaucoma. We present a hybrid approach that features the analysis of CSLT images using moment methods to derive abstract image-defining features. The features are then used to train classifiers for automatically distinguishing CSLT images of normal and glaucoma patients. As a first step, we present investigations into feature subset selection methods for reducing the relatively large input space produced by the moment methods. We use neural networks and support vector machines to determine a subset of moments that offers high classification accuracy. We demonstrate the efficacy of our methods to discriminate between healthy and glaucomatous optic disks based on shape information automatically derived from optic disk topography and reflectance images.

  19. An evaluation of exact methods for the multiple subset maximum cardinality selection problem.

    PubMed

    Brusco, Michael J; Köhn, Hans-Friedrich; Steinley, Douglas

    2016-05-01

    The maximum cardinality subset selection problem requires finding the largest possible subset from a set of objects, such that one or more conditions are satisfied. An important extension of this problem is to extract multiple subsets, where the addition of one more object to a larger subset would always be preferred to increases in the size of one or more smaller subsets. We refer to this as the multiple subset maximum cardinality selection problem (MSMCSP). A recently published branch-and-bound algorithm solves the MSMCSP as a partitioning problem. Unfortunately, the computational requirement associated with the algorithm is often enormous, thus rendering the method infeasible from a practical standpoint. In this paper, we present an alternative approach that successively solves a series of binary integer linear programs to obtain a globally optimal solution to the MSMCSP. Computational comparisons of the methods using published similarity data for 45 food items reveal that the proposed sequential method is computationally far more efficient than the branch-and-bound approach. © 2016 The British Psychological Society.
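
    As a concrete illustration of one stage of such a sequential scheme, the sketch below solves a single binary integer program that finds a maximum-cardinality subset in which every within-subset pair meets a similarity threshold; the pairwise condition, the threshold, and the use of PuLP as a generic solver are assumptions of the sketch, not the paper's exact formulation:

    ```python
    import numpy as np
    import pulp

    rng = np.random.default_rng(1)
    n, tau = 12, 0.5
    S = rng.random((n, n)); S = (S + S.T) / 2   # symmetric similarity matrix

    prob = pulp.LpProblem("max_cardinality_subset", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n)]
    prob += pulp.lpSum(x)                       # maximize the subset size
    for i in range(n):
        for j in range(i + 1, n):
            if S[i, j] < tau:                   # incompatible pair:
                prob += x[i] + x[j] <= 1        # at most one may be selected

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print("selected objects:", [i for i in range(n) if x[i].value() == 1])
    ```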

  20. A genetic programming approach to oral cancer prognosis.

    PubMed

    Tan, Mei Sze; Tan, Jing Wei; Chang, Siow-Wee; Yap, Hwa Jen; Abdul Kareem, Sameem; Zain, Rosnah Binti

    2016-01-01

    The potential of genetic programming (GP) in various fields has been demonstrated in recent years. In the biomedical field, much GP research has focused on the recognition of cancerous cells and on gene expression profiling data. The aim of this research is to study the performance of GP on survival prediction for a small oral cancer prognosis dataset, the first such study in the field of oral cancer prognosis. GP is applied to an oral cancer dataset that contains 31 cases collected from the Malaysia Oral Cancer Database and Tissue Bank System (MOCDTBS). The feature subsets automatically selected through GP were noted, and their influence on the results of GP was recorded. In addition, a comparison between the performance of GP and that of the Support Vector Machine (SVM) and logistic regression (LR) was performed to verify the predictive capabilities of GP. The results show that GP performed best (average accuracy of 83.87% and average AUROC of 0.8341) when the features selected were smoking, drinking, chewing, histological differentiation of SCC, and oncogene p63. Based on the comparison results, we also found that GP outperformed SVM and LR in oral cancer prognosis. Some of the features in the dataset are statistically correlated: the accuracy of the GP prediction drops when one of the features in the best feature subset is excluded. Thus, GP provides an automatic feature selection function, choosing features that are highly correlated with the prognosis of oral cancer. This makes GP an ideal prediction model for clinical and genomic cancer data that can aid physicians at the diagnosis or prognosis stage of decision making.

  1. Evaluating suitability of Pol-SAR (TerraSAR-X, Radarsat-2) for automated sea ice classification

    NASA Astrophysics Data System (ADS)

    Ressel, Rudolf; Singha, Suman; Lehner, Susanne

    2016-05-01

    Satellite-borne SAR imagery has become an invaluable tool in the field of sea ice monitoring. Previously, single-polarization imagery was employed in supervised and unsupervised classification schemes for sea ice investigation, preceded by image processing steps such as segmentation and the extraction of textural features. Recently, with the advent of polarimetric SAR sensors, the investigation of polarimetric features of sea ice has attracted increased attention. While dual-polarimetric data have already been investigated in a number of works, full-polarimetric data have so far not been a major scientific focus. To explore the possibilities of full-polarimetric data and compare the differences between C- and X-band, we analyze in detail an array of simultaneously acquired datasets in C-band (RADARSAT-2) and X-band (TerraSAR-X) over ice-infested areas. First, we propose an array of polarimetric features (Pauli- and lexicographic-based). Ancillary data from national ice services, SMOS data and expert judgement were utilized to identify the governing ice regimes. Based on these observations, we then extracted the aforementioned features. The subsequent supervised classification approach was based on an Artificial Neural Network (ANN). To gain quantitative insight into the quality of the features themselves (and reduce a possible impact of the Hughes phenomenon), we employed mutual information to unearth the relevance and redundancy of the features. The results of this information-theoretic analysis guided a pruning process to arrive at an optimal subset of features. In the last step we compared the classification results across all sensors and images, stated the respective accuracies, and discussed output discrepancies in the cases of simultaneous acquisitions.
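
    A hedged sketch of such an information-theoretic pruning step, in the spirit of minimum-redundancy-maximum-relevance selection: greedily keep features with high mutual information with the class and low mutual information with features already kept. The scoring rule, subset size and data are assumptions, not the authors' exact procedure:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import (mutual_info_classif,
                                           mutual_info_regression)

    X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                               random_state=0)
    relevance = mutual_info_classif(X, y, random_state=0)

    selected = [int(np.argmax(relevance))]
    while len(selected) < 8:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # redundancy: mean MI between candidate j and selected features
            red = np.mean([mutual_info_regression(X[:, [j]], X[:, k],
                                                  random_state=0)[0]
                           for k in selected])
            score = relevance[j] - red          # relevance minus redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    print("pruned feature subset:", selected)
    ```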

  2. A novel automated spike sorting algorithm with adaptable feature extraction.

    PubMed

    Bestel, Robert; Daus, Andreas W; Thielemann, Christiane

    2012-10-15

    To study the electrophysiological properties of neuronal networks, in vitro studies based on microelectrode arrays have become a viable tool for analysis. Although in constant progress, a challenging task still remains in this area: the development of an efficient spike sorting algorithm that allows an accurate signal analysis at the single-cell level. Most sorting algorithms currently available extract only a specific feature type, such as the principal components or wavelet coefficients of the measured spike signals, in order to separate different spike shapes generated by different neurons. However, due to the great variety in the obtained spike shapes, the derivation of an optimal feature set remains a very complex issue that current algorithms struggle with. To address this problem, we propose a novel algorithm that (i) extracts a variety of geometric, wavelet and principal component-based features and (ii) automatically derives a feature subset most suitable for sorting an individual set of spike signals. The new approach evaluates the probability distribution of the obtained spike features and determines the candidates most suitable for the actual spike sorting. These candidates are formed into an individually adjusted set of spike features, allowing a separation of the various shapes present in the obtained neuronal signal by a subsequent expectation-maximisation clustering algorithm. Test results with simulated data files and data obtained from chick embryonic neurons cultured on microelectrode arrays showed excellent classification results, indicating the superior performance of the described algorithm. Copyright © 2012 Elsevier B.V. All rights reserved.
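
    A loose sketch of that pipeline on toy waveforms: extract several candidate feature types, keep those whose empirical distribution looks most multimodal, and cluster with a Gaussian-mixture EM. The crude multimodality score and all sizes are placeholders for the paper's probability-distribution evaluation:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 40)                   # toy spikes from two neurons
    shape_a = np.exp(-((t - 0.3) / 0.05) ** 2)
    shape_b = -np.exp(-((t - 0.5) / 0.08) ** 2)
    spikes = np.vstack([shape_a + 0.1 * rng.normal(size=(150, 40)),
                        shape_b + 0.1 * rng.normal(size=(150, 40))])

    pcs = PCA(n_components=5).fit_transform(spikes)        # PCA features
    geom = np.column_stack([spikes.min(1), spikes.max(1),  # geometric features
                            spikes.argmin(1), np.ptp(spikes, axis=1)])
    feats = np.column_stack([pcs, geom])

    def multimodality(v):       # crude stand-in score (assumption)
        h, _ = np.histogram(v, bins=20)
        return np.std(h) / (np.mean(h) + 1e-9)

    scores = np.array([multimodality(feats[:, j]) for j in range(feats.shape[1])])
    subset = np.argsort(scores)[-3:]                       # best few features
    gm = GaussianMixture(n_components=2, random_state=0)   # EM clustering
    labels = gm.fit_predict(feats[:, subset])
    print("cluster sizes:", np.bincount(labels))
    ```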

  3. Structural Optimization of a Force Balance Using a Computational Experiment Design

    NASA Technical Reports Server (NTRS)

    Parker, P. A.; DeLoach, R.

    2002-01-01

    This paper proposes a new approach to force balance structural optimization featuring a computational experiment design. Currently, this multi-dimensional design process requires the designer to perform a simplification by executing parameter studies on a small subset of design variables. This one-factor-at-a-time approach varies a single variable while holding all others at a constant level. Consequently, subtle interactions among the design variables, which can be exploited to achieve the design objectives, are undetected. The proposed method combines Modern Design of Experiments techniques to direct the exploration of the multi-dimensional design space, and a finite element analysis code to generate the experimental data. To efficiently search for an optimum combination of design variables and minimize the computational resources, a sequential design strategy was employed. Experimental results from the optimization of a non-traditional force balance measurement section are presented. An approach to overcome the unique problems associated with the simultaneous optimization of multiple response criteria is described. A quantitative single-point design procedure that reflects the designer's subjective impression of the relative importance of various design objectives, and a graphical multi-response optimization procedure that provides further insights into available tradeoffs among competing design objectives, are illustrated. The proposed method enhances the intuition and experience of the designer by providing new perspectives on the relationships between the design variables and the competing design objectives, providing a systematic foundation for advancements in structural design.

  4. Data approximation using a blending type spline construction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dalmo, Rune; Bratlie, Jostein

    2014-11-18

    Generalized expo-rational B-splines (GERBS) is a blending-type spline construction in which local functions at each knot are blended together by C^k-smooth basis functions. One way of approximating discrete regular data using GERBS is to partition the data set into subsets and fit a local function to each subset. Partitioning and fitting strategies can be devised such that important or interesting data points are interpolated in order to preserve certain features. We present a method for fitting discrete data using a tensor product GERBS construction. The method is based on detection of feature points using differential geometry. Derivatives, which are necessary for feature point detection and used to construct local surface patches, are approximated from the discrete data using finite differences.

  5. Optimization of black-box models with uncertain climatic inputs—Application to sunflower ideotype design

    PubMed Central

    Picheny, Victor; Trépos, Ronan; Casadebaig, Pierre

    2017-01-01

    Accounting for the interannual climatic variations is a well-known issue for simulation-based studies of environmental systems. It often requires intensive sampling (e.g., averaging the simulation outputs over many climatic series), which hinders many sequential processes, in particular optimization algorithms. We propose here an approach based on a subset selection in a large basis of climatic series, using an ad-hoc similarity function and clustering. A non-parametric reconstruction technique is introduced to estimate accurately the distribution of the output of interest using only the subset sampling. The proposed strategy is non-intrusive and generic (i.e., transposable to most models with climatic data inputs), and can be combined with most “off-the-shelf” optimization solvers. We apply our approach to sunflower ideotype design using the crop model SUNFLO. The underlying optimization problem is formulated as a multi-objective one to account for risk-aversion. Our approach achieves good performance even for limited computational budgets, significantly outperforming standard strategies. PMID:28542198
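
    A minimal sketch of the subset-selection idea: cluster a large basis of climatic series by summary statistics and keep one representative (medoid) per cluster, so the simulator only needs to run on the subset. Plain Euclidean distance on a few summary statistics stands in for the paper's ad-hoc similarity function:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    series = rng.normal(size=(200, 365)).cumsum(axis=1)   # toy climatic series
    stats = np.column_stack([series.mean(1), series.std(1),
                             series.max(1), series.min(1)])

    km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(stats)
    medoids = [int(np.argmin(((stats - c) ** 2).sum(1)))  # series nearest to
               for c in km.cluster_centers_]              # each cluster centre
    print("representative series indices:", sorted(medoids))
    ```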

  6. Intelligent feature selection techniques for pattern classification of Lamb wave signals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hinders, Mark K.; Miller, Corey A.

    2014-02-18

    Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crosshole tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required, but it is never known a priori which features will be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy.

  7. Variational theory of the tapered impedance transformer

    NASA Astrophysics Data System (ADS)

    Erickson, Robert P.

    2018-02-01

    Superconducting amplifiers are key components of modern quantum information circuits. To minimize information loss and reduce oscillations, a tapered impedance transformer of new design is needed at the input/output for compliance with other 50 Ω components. We show that an optimal tapered transformer of length ℓ, joining the amplifier to the input line, can be constructed using a variational principle applied to the linearized Riccati equation describing the voltage reflection coefficient of the taper. For an incident signal of frequency ωo, the variational solution results in an infinite set of equivalent optimal transformers, each with the same form for the reflection coefficient, each able to eliminate input-line reflections. For the special case of optimal lossless transformers, the group velocity vg is shown to be constant, with characteristic impedance dependent on frequency ωc = πvg/ℓ. While these solutions inhibit input-line reflections only for frequency ωo, a subset of optimal lossless transformers with ωo significantly detuned from ωc does exhibit a wide bandpass. Specifically, by choosing ωo → 0 (ωo → ∞), we obtain a subset of optimal low-pass (high-pass) lossless tapers with bandwidth (0, ˜ ωc) [(˜ωc, ∞)]. From the subset of solutions, we derive both the wide-band low-pass and high-pass transformers, and we discuss the extent to which they can be realized given fabrication constraints. In particular, we demonstrate the superior reflection response of our high-pass transformer when compared to other taper designs. Our results have application to amplifiers, transceivers, and other components sensitive to impedance mismatch.

  8. Feature extraction from high resolution satellite imagery as an input to the development and rapid update of a METRANS geographic information system (GIS).

    DOT National Transportation Integrated Search

    2011-06-01

    This report describes an accuracy assessment of extracted features derived from three subsets of Quickbird pan-sharpened high resolution satellite image for the area of the Port of Los Angeles, CA. Visual Learning Systems Feature Analyst and D...

  9. Feature Sampling in Detection: Implications for the Measurement of Perceptual Independence

    ERIC Educational Resources Information Center

    Macho, Siegfried

    2007-01-01

    The article presents the feature sampling signal detection (FS-SDT) model, an extension of the multivariate signal detection (SDT) model. The FS-SDT model assumes that, because of attentional shifts, different subsets of features are sampled for different presentations of the same multidimensional stimulus. Contrary to the SDT model, the FS-SDT…

  10. Paving the way to a full chip gate level double patterning application

    NASA Astrophysics Data System (ADS)

    Haffner, Henning; Meiring, Jason; Baum, Zachary; Halle, Scott

    2007-10-01

    Double patterning lithography processes can offer significant yield enhancement for challenging circuit designs. Many decomposition (i.e. the process of dividing the layout design into first and second exposures) techniques are possible, but the focus of this paper is on the use of a secondary "cut" mask to trim away extraneous features left from the first exposure. This approach has the advantage that each exposure only needs to support a subset of critical features (e.g. dense lines with the first exposure, isolated spaces with the second one). The extraneous features ("printing assist features" or PrAFs) are designed to support the process window of critical features much like the role of the subresolution assist features (SRAFs) in conventional processes. However, the printing nature of PrAFs leads to many more design options, and hence a greater process and decomposition parameter exploration space, than are available for SRAFs. A decomposition scheme using PrAFs was developed for a gate level process. A critical driver of the work was to deliver improved across-chip linewidth variation (ACLV) performance versus an optimized single exposure process while providing support for a larger range of critical features. A variety of PrAF techniques were investigated by simulation, with a PrAF scheme similar to standard SRAF rules being chosen as the optimal solution [1]. This paper discusses aspects of the code development for an automated PrAF generation and placement scheme and the subsequent decomposition of a layout into two mask levels. While PrAF placement and decomposition is straightforward for layouts with pitch and orientation restrictions, it becomes rather complex for unrestricted layout styles. Because this higher complexity yields more irregularly shaped PrAFs, mask making becomes another critical driver of the optimum placement and clean-up strategies. Examples are given of how those challenges are met or can be successfully circumvented. During subsequent decomposition of the PrAF-enhanced layout into two independent mask levels, various geometric decomposition parameters have to be considered. As an example, the removal of PrAFs has to be guaranteed by a minimum required overlap of the cut mask opening past any PrAF edge. It is discussed that process assumptions such as CD tolerances and overlay as well as inter-level relationship ground rules need to be considered to successfully optimize the final decomposition scheme. Furthermore, simulation and experimental results regarding not only ACLV but also across-device linewidth variation (ADLV) are analyzed.

  11. Defining an essence of structure determining residue contacts in proteins.

    PubMed

    Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-12-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to-date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Ca RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts-such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling" that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Ca RMSD with as little as 8% of the native contacts (Ca-Ca and Cb-Cb). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.

  12. Defining an Essence of Structure Determining Residue Contacts in Proteins

    PubMed Central

    Sathyapriya, R.; Duarte, Jose M.; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-01-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to-date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Ca RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts—such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling” that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Ca RMSD with as little as 8% of the native contacts (Ca-Ca and Cb-Cb). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking. PMID:19997489

  13. Random Partition Distribution Indexed by Pairwise Information

    PubMed Central

    Dahl, David B.; Day, Ryan; Tsai, Jerry W.

    2017-01-01

    We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution. PMID:29276318

  14. Role for early-differentiated natural killer cells in infectious mononucleosis

    PubMed Central

    Azzi, Tarik; Lünemann, Anna; Murer, Anita; Ueda, Seigo; Béziat, Vivien; Malmberg, Karl-Johan; Staubli, Georg; Gysin, Claudine; Berger, Christoph; Münz, Christian

    2014-01-01

    A growing body of evidence suggests that the human natural killer (NK)-cell compartment is phenotypically and functionally heterogeneous and is composed of several differentiation stages. Moreover, NK-cell subsets have been shown to exhibit adaptive immune features during herpes virus infection in experimental mice and to expand preferentially during viral infections in humans. However, both phenotype and role of NK cells during acute symptomatic Epstein-Barr virus (EBV) infection, termed infectious mononucleosis (IM), remain unclear. Here, we longitudinally assessed the kinetics, the differentiation, and the proliferation of subsets of NK cells in pediatric IM patients. Our results indicate that acute IM is characterized by the preferential proliferation of early-differentiated CD56(dim) NKG2A(+) immunoglobulin-like receptor(-) NK cells. Moreover, this NK-cell subset exhibits features of terminal differentiation and persists at higher frequency during at least the first 6 months after acute IM. Finally, we demonstrate that this NK-cell subset preferentially degranulates and proliferates on exposure to EBV-infected B cells expressing lytic antigens. Thus, early-differentiated NK cells might play a key role in the immune control of primary infection with this persistent tumor-associated virus. PMID:25205117

  15. Role for early-differentiated natural killer cells in infectious mononucleosis.

    PubMed

    Azzi, Tarik; Lünemann, Anna; Murer, Anita; Ueda, Seigo; Béziat, Vivien; Malmberg, Karl-Johan; Staubli, Georg; Gysin, Claudine; Berger, Christoph; Münz, Christian; Chijioke, Obinna; Nadal, David

    2014-10-16

    A growing body of evidence suggests that the human natural killer (NK)-cell compartment is phenotypically and functionally heterogeneous and is composed of several differentiation stages. Moreover, NK-cell subsets have been shown to exhibit adaptive immune features during herpes virus infection in experimental mice and to expand preferentially during viral infections in humans. However, both phenotype and role of NK cells during acute symptomatic Epstein-Barr virus (EBV) infection, termed infectious mononucleosis (IM), remain unclear. Here, we longitudinally assessed the kinetics, the differentiation, and the proliferation of subsets of NK cells in pediatric IM patients. Our results indicate that acute IM is characterized by the preferential proliferation of early-differentiated CD56(dim) NKG2A(+) immunoglobulin-like receptor(-) NK cells. Moreover, this NK-cell subset exhibits features of terminal differentiation and persists at higher frequency during at least the first 6 months after acute IM. Finally, we demonstrate that this NK-cell subset preferentially degranulates and proliferates on exposure to EBV-infected B cells expressing lytic antigens. Thus, early-differentiated NK cells might play a key role in the immune control of primary infection with this persistent tumor-associated virus. © 2014 by The American Society of Hematology.

  16. Anatomical constraints on attention: Hemifield independence is a signature of multifocal spatial selection

    PubMed Central

    Alvarez, George A; Gill, Jonathan; Cavanagh, Patrick

    2012-01-01

    Previous studies have shown independent attentional selection of targets in the left and right visual hemifields during attentional tracking (Alvarez & Cavanagh, 2005) but not during a visual search (Luck, Hillyard, Mangun, & Gazzaniga, 1989). Here we tested whether multifocal spatial attention is the critical process that operates independently in the two hemifields. It is explicitly required in tracking (attend to a subset of object locations, suppress the others) but not in the standard visual search task (where all items are potential targets). We used a modified visual search task in which observers searched for a target within a subset of display items, where the subset was selected based on location (Experiments 1 and 3A) or based on a salient feature difference (Experiments 2 and 3B). The results show hemifield independence in this subset visual search task with location-based selection but not with feature-based selection; this effect cannot be explained by general difficulty (Experiment 4). Combined, these findings suggest that hemifield independence is a signature of multifocal spatial attention and highlight the need for cognitive and neural theories of attention to account for anatomical constraints on selection mechanisms. PMID:22637710

  17. Immune Reactions against Gene Gun Vaccines Are Differentially Modulated by Distinct Dendritic Cell Subsets in the Skin

    PubMed Central

    Deressa, Tekalign; Strandt, Helen; Florindo Pinheiro, Douglas; Mittermair, Roberta; Pizarro Pesado, Jennifer; Thalhamer, Josef; Hammerl, Peter; Stoecklinger, Angelika

    2015-01-01

    The skin accommodates multiple dendritic cell (DC) subsets with remarkable functional diversity. Immune reactions are initiated and modulated by the triggering of DC by pathogen-associated or endogenous danger signals. In contrast to these processes, the influence of intrinsic features of protein antigens on the strength and type of immune responses is much less understood. Therefore, we investigated the involvement of distinct DC subsets in immune reactions against two structurally different model antigens, E. coli beta-galactosidase (betaGal) and chicken ovalbumin (OVA) under otherwise identical conditions. After epicutaneous administration of the respective DNA vaccines with a gene gun, wild type mice induced robust immune responses against both antigens. However, ablation of langerin+ DC almost abolished IgG1 and cytotoxic T lymphocytes against betaGal but enhanced T cell and antibody responses against OVA. We identified epidermal Langerhans cells (LC) as the subset responsible for the suppression of anti-OVA reactions and found regulatory T cells critically involved in this process. In contrast, reactions against betaGal were not affected by the selective elimination of LC, indicating that this antigen required a different langerin+ DC subset. The opposing findings obtained with OVA and betaGal vaccines were not due to immune-modulating activities of either the plasmid DNA or the antigen gene products, nor did the differential cellular localization, size or dose of the two proteins account for the opposite effects. Thus, skin-borne protein antigens may be differentially handled by distinct DC subsets, and, in this way, intrinsic features of the antigen can participate in immune modulation. PMID:26030383

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eads, Damian Ryan; Rosten, Edward; Helmbold, David

    The authors present BEAMER: a new spatially exploitative approach to learning object detectors which shows excellent results when applied to the task of detecting objects in greyscale aerial imagery in the presence of ambiguous and noisy data. There are four main contributions used to produce these results. First, they introduce a grammar-guided feature extraction system, enabling the exploration of a richer feature space while constraining the features to a useful subset. This is specified with a rule-based generative grammar crafted by a human expert. Second, they learn a classifier on this data using a newly proposed variant of AdaBoost which takes into account the spatially correlated nature of the data. Third, they perform another round of training to optimize the method of converting the pixel classifications generated by boosting into a high quality set of (x,y) locations. Lastly, they carefully define three common problems in object detection and define two evaluation criteria that are tightly matched to these problems. Major strengths of this approach are: (1) a way of randomly searching a broad feature space, (2) its performance when evaluated on well-matched evaluation criteria, and (3) its use of the location prediction domain to learn object detectors as well as to generate detections that perform well on several tasks: object counting, tracking, and target detection. They demonstrate the efficacy of BEAMER with a comprehensive experimental evaluation on a challenging data set.

  19. A comparative study of machine learning methods for time-to-event survival data for radiomics risk modelling.

    PubMed

    Leger, Stefan; Zwanenburg, Alex; Pilz, Karoline; Lohaus, Fabian; Linge, Annett; Zöphel, Klaus; Kotzerke, Jörg; Schreiber, Andreas; Tinhofer, Inge; Budach, Volker; Sak, Ali; Stuschke, Martin; Balermpas, Panagiotis; Rödel, Claus; Ganswindt, Ute; Belka, Claus; Pigorsch, Steffi; Combs, Stephanie E; Mönnich, David; Zips, Daniel; Krause, Mechthild; Baumann, Michael; Troost, Esther G C; Löck, Steffen; Richter, Christian

    2017-10-16

    Radiomics applies machine learning algorithms to quantitative imaging data to characterise the tumour phenotype and predict clinical outcome. For the development of radiomics risk models, a variety of different algorithms is available and it is not clear which one gives optimal results. Therefore, we assessed the performance of 11 machine learning algorithms combined with 12 feature selection methods by the concordance index (C-Index), to predict loco-regional tumour control (LRC) and overall survival for patients with head and neck squamous cell carcinoma. The considered algorithms are able to deal with continuous time-to-event survival data. Feature selection and model building were performed on a multicentre cohort (213 patients) and validated using an independent cohort (80 patients). We found several combinations of machine learning algorithms and feature selection methods which achieve similar results, e.g. C-Index = 0.71 and BT-COX: C-Index = 0.70 in combination with Spearman feature selection. Using the best performing models, patients were stratified into groups of low and high risk of recurrence. Significant differences in LRC were obtained between both groups on the validation cohort. Based on the presented analysis, we identified a subset of algorithms which should be considered in future radiomics studies to develop stable and clinically relevant predictive models for time-to-event endpoints.

  20. EEG correlates of P300-based brain-computer interface (BCI) performance in people with amyotrophic lateral sclerosis

    NASA Astrophysics Data System (ADS)

    Mak, Joseph N.; McFarland, Dennis J.; Vaughan, Theresa M.; McCane, Lynn M.; Tsui, Phillippa Z.; Zeitlin, Debra J.; Sellers, Eric W.; Wolpaw, Jonathan R.

    2012-04-01

    The purpose of this study was to identify electroencephalography (EEG) features that correlate with P300-based brain-computer interface (P300 BCI) performance in people with amyotrophic lateral sclerosis (ALS). Twenty people with ALS used a P300 BCI spelling application in copy-spelling mode. Three types of EEG features were found to be good predictors of P300 BCI performance: (1) the root-mean-square amplitude and (2) the negative peak amplitude of the event-related potential to target stimuli (target ERP) at Fz, Cz, P3, Pz, and P4; and (3) EEG theta frequency (4.5-8 Hz) power at Fz, Cz, P3, Pz, P4, PO7, PO8 and Oz. A statistical prediction model that used a subset of these features accounted for >60% of the variance in copy-spelling performance (p < 0.001, mean R2 = 0.6175). The correlations reflected between-subject, rather than within-subject, effects. The results enhance understanding of performance differences among P300 BCI users. The predictors found in this study might help in: (1) identifying suitable candidates for long-term P300 BCI operation; (2) assessing performance online. Further work on within-subject effects needs to be done to establish whether P300 BCI user performance could be improved by optimizing one or more of these EEG features.

  1. Long range personalized cancer treatment strategies incorporating evolutionary dynamics.

    PubMed

    Yeang, Chen-Hsiang; Beckman, Robert A

    2016-10-22

    Current cancer precision medicine strategies match therapies to static consensus molecular properties of an individual's cancer, thus determining the next therapeutic maneuver. These strategies typically maintain a constant treatment while the cancer is not worsening. However, cancers feature complicated sub-clonal structure and dynamic evolution. We have recently shown, in a comprehensive simulation of two non-cross resistant therapies across a broad parameter space representing realistic tumors, that substantial improvement in cure rates and median survival can be obtained utilizing dynamic precision medicine strategies. These dynamic strategies explicitly consider intratumoral heterogeneity and evolutionary dynamics, including predicted future drug resistance states, and reevaluate optimal therapy every 45 days. However, the optimization is performed in single 45 day steps ("single-step optimization"). Herein we evaluate analogous strategies that think multiple therapeutic maneuvers ahead, considering potential outcomes at 5 steps ahead ("multi-step optimization") or 40 steps ahead ("adaptive long term optimization (ALTO)") when recommending the optimal therapy in each 45 day block, in simulations involving both 2 and 3 non-cross resistant therapies. We also evaluate an ALTO approach for situations where simultaneous combination therapy is not feasible ("Adaptive long term optimization: serial monotherapy only (ALTO-SMO)"). Simulations utilize populations of 764,000 and 1,700,000 virtual patients for 2 and 3 drug cases, respectively. Each virtual patient represents a unique clinical presentation including sizes of major and minor tumor subclones, growth rates, evolution rates, and drug sensitivities. While multi-step optimization and ALTO provide no significant average survival benefit, cure rates are significantly increased by ALTO. Furthermore, in the subset of individual virtual patients demonstrating clinically significant difference in outcome between approaches, by far the majority show an advantage of multi-step or ALTO over single-step optimization. ALTO-SMO delivers cure rates superior or equal to those of single- or multi-step optimization, in 2 and 3 drug cases respectively. In selected virtual patients incurable by dynamic precision medicine using single-step optimization, analogous strategies that "think ahead" can deliver long-term survival and cure without any disadvantage for non-responders. When therapies require dose reduction in combination (due to toxicity), optimal strategies feature complex patterns involving rapidly interleaved pulses of combinations and high dose monotherapy. This article was reviewed by Wendy Cornell, Marek Kimmel, and Andrzej Swierniak. Wendy Cornell and Andrzej Swierniak are external reviewers (not members of the Biology Direct editorial board). Andrzej Swierniak was nominated by Marek Kimmel.

  2. The mechanisms and boundary conditions of the Einstellung effect in chess: evidence from eye movements.

    PubMed

    Sheridan, Heather; Reingold, Eyal M

    2013-01-01

    In a wide range of problem-solving settings, the presence of a familiar solution can block the discovery of better solutions (i.e., the Einstellung effect). To investigate this effect, we monitored the eye movements of expert and novice chess players while they solved chess problems that contained a familiar move (i.e., the Einstellung move), as well as an optimal move that was located in a different region of the board. When the Einstellung move was an advantageous (but suboptimal) move, both the expert and novice chess players who chose the Einstellung move continued to look at this move throughout the trial, whereas the subset of expert players who chose the optimal move were able to gradually disengage their attention from the Einstellung move. However, when the Einstellung move was a blunder, all of the experts and the majority of the novices were able to avoid selecting the Einstellung move, and both the experts and novices gradually disengaged their attention from the Einstellung move. These findings shed light on the boundary conditions of the Einstellung effect, and provide convergent evidence for the conclusion of Bilalić, McLeod, and Gobet (2008) that the Einstellung effect operates by biasing attention towards problem features that are associated with the familiar solution rather than the optimal solution.

  3. Avian photoreceptor patterns represent a disordered hyperuniform solution to a multiscale packing problem

    NASA Astrophysics Data System (ADS)

    Jiao, Yang; Lau, Timothy; Hatzikirou, Haralampos; Meyer-Hermann, Michael; Corbo, Joseph C.; Torquato, Salvatore

    2014-02-01

    Optimal spatial sampling of light rigorously requires that identical photoreceptors be arranged in perfectly regular arrays in two dimensions. Examples of such perfect arrays in nature include the compound eyes of insects and the nearly crystalline photoreceptor patterns of some fish and reptiles. Birds are highly visual animals with five different cone photoreceptor subtypes, yet their photoreceptor patterns are not perfectly regular. By analyzing the chicken cone photoreceptor system consisting of five different cell types using a variety of sensitive microstructural descriptors, we find that the disordered photoreceptor patterns are "hyperuniform" (exhibiting vanishing infinite-wavelength density fluctuations), a property that had heretofore been identified in a unique subset of physical systems, but had never been observed in any living organism. Remarkably, the patterns of both the total population and the individual cell types are simultaneously hyperuniform. We term such patterns "multihyperuniform" because multiple distinct subsets of the overall point pattern are themselves hyperuniform. We have devised a unique multiscale cell packing model in two dimensions that suggests that photoreceptor types interact with both short- and long-ranged repulsive forces and that the resultant competition between the types gives rise to the aforementioned singular spatial features characterizing the system, including multihyperuniformity. These findings suggest that a disordered hyperuniform pattern may represent the most uniform sampling arrangement attainable in the avian system, given intrinsic packing constraints within the photoreceptor epithelium. In addition, they show how fundamental physical constraints can change the course of a biological optimization process. Our results suggest that multihyperuniform disordered structures have implications for the design of materials with novel physical properties and therefore may represent a fruitful area for future research.

  4. Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics.

    PubMed

    Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai

    2017-12-26

    Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
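
    A hedged sketch of the SVM-RFE ranking stage plus an automatic cutoff. The overlap term below is a crude proxy (inverse distance between class means); the paper's overlapping-ratio definition may differ, and all settings are illustrative:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=120, n_features=40, n_informative=6,
                               random_state=0)
    # recursive feature elimination gives a full ranking (1 = best)
    ranking = RFE(SVC(kernel="linear"), n_features_to_select=1).fit(X, y).ranking_

    best_k, best_score = 1, -np.inf
    for k in range(1, 21):
        subset = np.where(ranking <= k)[0]
        acc = cross_val_score(SVC(kernel="linear"), X[:, subset], y, cv=5).mean()
        mu0 = X[y == 0][:, subset].mean(0)
        mu1 = X[y == 1][:, subset].mean(0)
        overlap = 1.0 / (np.linalg.norm(mu1 - mu0) + 1e-9)  # overlap proxy
        score = acc - 0.1 * overlap             # weighting is illustrative
        if score > best_score:
            best_k, best_score = k, score
    print("selected number of features:", best_k)
    ```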

  5. Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets

    PubMed Central

    Gratzl, Samuel; Gehlenborg, Nils; Lex, Alexander; Pfister, Hanspeter; Streit, Marc

    2016-01-01

    Answering questions about complex issues often requires analysts to take into account information contained in multiple interconnected datasets. A common strategy in analyzing and visualizing large and heterogeneous data is dividing it into meaningful subsets. Interesting subsets can then be selected and the associated data and the relationships between the subsets visualized. However, neither the extraction and manipulation nor the comparison of subsets is well supported by state-of-the-art techniques. In this paper we present Domino, a novel multiform visualization technique for effectively representing subsets and the relationships between them. By providing comprehensive tools to arrange, combine, and extract subsets, Domino allows users to create both common visualization techniques and advanced visualizations tailored to specific use cases. In addition to the novel technique, we present an implementation that enables analysts to manage the wide range of options that our approach offers. Innovative interactive features such as placeholders and live previews support rapid creation of complex analysis setups. We introduce the technique and the implementation using a simple example and demonstrate scalability and effectiveness in a use case from the field of cancer genomics. PMID:26356916

  6. Vibration and acoustic frequency spectra for industrial process modeling using selective fusion multi-condition samples and multi-source features

    NASA Astrophysics Data System (ADS)

    Tang, Jian; Qiao, Junfei; Wu, ZhiWei; Chai, Tianyou; Zhang, Jian; Yu, Wen

    2018-01-01

    Frequency spectral data of mechanical vibration and acoustic signals relate to difficult-to-measure production quality and quantity parameters of complex industrial processes. A selective ensemble (SEN) algorithm can be used to build a soft sensor model of these process parameters by fusing valued information selectively from different perspectives. However, a combination of several optimized ensemble sub-models with SEN cannot guarantee the best prediction model. In this study, we use several techniques to construct mechanical vibration and acoustic frequency spectra of a data-driven industrial process parameter model based on selective fusion multi-condition samples and multi-source features. Multi-layer SEN (MLSEN) strategy is used to simulate the domain expert cognitive process. Genetic algorithm and kernel partial least squares are used to construct the inside-layer SEN sub-model based on each mechanical vibration and acoustic frequency spectral feature subset. Branch-and-bound and adaptive weighted fusion algorithms are integrated to select and combine outputs of the inside-layer SEN sub-models. Then, the outside-layer SEN is constructed. Thus, "sub-sampling training examples"-based and "manipulating input features"-based ensemble construction methods are integrated, thereby realizing the selective information fusion process based on multi-condition history samples and multi-source input features. This novel approach is applied to a laboratory-scale ball mill grinding process. A comparison with other methods indicates that the proposed MLSEN approach effectively models mechanical vibration and acoustic signals.

  7. Structured sparse linear graph embedding.

    PubMed

    Wang, Haixian

    2012-03-01

    Subspace learning is a core issue in pattern recognition and machine learning. Linear graph embedding (LGE) is a general framework for subspace learning. In this paper, we propose a structured sparse extension to LGE (SSLGE) by introducing a structured sparsity-inducing norm into LGE. Specifically, SSLGE casts the projection bases learning into a regression-type optimization problem, and then the structured sparsity regularization is applied to the regression coefficients. The regularization selects a subset of features and meanwhile encodes high-order information reflecting a priori structure information of the data. The SSLGE technique provides a unified framework for discovering structured sparse subspace. Computationally, by using a variational equality and the Procrustes transformation, SSLGE is efficiently solved with closed-form updates. Experimental results on face images show the effectiveness of the proposed method. Copyright © 2011 Elsevier Ltd. All rights reserved.

  8. An Investigation of Unified Memory Access Performance in CUDA

    PubMed Central

    Landaverde, Raphael; Zhang, Tiansheng; Coskun, Ayse K.; Herbordt, Martin

    2015-01-01

    Managing memory between the CPU and GPU is a major challenge in GPU computing. A programming model, Unified Memory Access (UMA), has been recently introduced by Nvidia to simplify the complexities of memory management while claiming good overall performance. In this paper, we investigate this programming model and evaluate its performance and programming model simplifications based on our experimental results. We find that beyond on-demand data transfers to the CPU, the GPU is also able to request subsets of data it requires on demand. This feature allows UMA to outperform full data transfer methods for certain parallel applications and small data sizes. We also find, however, that for the majority of applications and memory access patterns, the performance overheads associated with UMA are significant, while the simplifications to the programming model restrict flexibility for adding future optimizations. PMID:26594668

  9. Prediction of lysine ubiquitylation with ensemble classifier and feature selection.

    PubMed

    Zhao, Xiaowei; Li, Xiangtao; Ma, Zhiqiang; Yin, Minghao

    2011-01-01

    Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.
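
    A sketch of this ensemble construction under stated assumptions: different feature subsets are produced by restarting a simple search from different random seeds (standing in for the paper's search procedure), one random forest is trained per subset, and predictions are combined by majority voting:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=468, n_features=60, n_informative=8,
                               random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    members = []
    for seed in range(5):                       # different starting points
        rng = np.random.default_rng(seed)
        start = rng.choice(X.shape[1], size=30, replace=False)
        mi = mutual_info_classif(Xtr[:, start], ytr, random_state=seed)
        subset = start[np.argsort(mi)[-15:]]    # keep the informative half
        clf = RandomForestClassifier(n_estimators=200, random_state=seed)
        members.append((subset, clf.fit(Xtr[:, subset], ytr)))

    votes = np.stack([clf.predict(Xte[:, s]) for s, clf in members])
    consensus = (votes.mean(axis=0) > 0.5).astype(int)   # majority voting
    print("consensus accuracy:", (consensus == yte).mean())
    ```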

  10. Machine learning-based quantitative texture analysis of CT images of small renal masses: Differentiation of angiomyolipoma without visible fat from renal cell carcinoma.

    PubMed

    Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei

    2018-04-01

    To evaluate the diagnostic performance of machine-learning based quantitative texture analysis of CT images to differentiate small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institutional retrospective study included 58 patients with pathologically proven small renal mass (17 in AMLwvf and 41 in RCC groups). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation in preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied to select features preliminarily. Then support vector machine with recursive feature elimination (SVM-RFE) and synthetic minority oversampling technique (SMOTE) were adopted to establish discriminative classifiers, and the performance of classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P < 0.05) and had good interobserver agreement. An optimal feature subset including 11 features was further selected by the SVM-RFE method. The SVM-RFE+SMOTE classifier achieved the best performance in discriminating between small AMLwvf and RCC, with the highest accuracy, sensitivity, specificity and AUC of 93.9 %, 87.8 %, 100 % and 0.955, respectively. Machine learning analysis of CT texture features can facilitate the accurate differentiation of small AMLwvf from RCC. • Although conventional CT is useful for diagnosis of SRMs, it has limitations. • Machine-learning based CT texture analysis facilitate differentiation of small AMLwvf from RCC. • The highest accuracy of SVM-RFE+SMOTE classifier reached 93.9 %. • Texture analysis combined with machine-learning methods might spare unnecessary surgery for AMLwvf.
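
    A hedged sketch of an SVM-RFE plus SMOTE pipeline for a small imbalanced dataset, with imbalanced-learn providing the oversampling; sample sizes, the number of retained features and other parameters are illustrative rather than the study's settings:

    ```python
    import numpy as np
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.model_selection import StratifiedKFold
    from sklearn.svm import SVC

    # stand-in for 58 patients (17 AMLwvf vs 41 RCC) with texture features
    X, y = make_classification(n_samples=58, n_features=16, n_informative=5,
                               weights=[0.7, 0.3], random_state=0)

    accs = []
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for tr, te in cv.split(X, y):
        # oversample the minority class on the training fold only
        Xr, yr = SMOTE(random_state=0, k_neighbors=3).fit_resample(X[tr], y[tr])
        rfe = RFE(SVC(kernel="linear"), n_features_to_select=11).fit(Xr, yr)
        clf = SVC(kernel="linear").fit(rfe.transform(Xr), yr)
        accs.append(clf.score(rfe.transform(X[te]), y[te]))
    print("cross-validated accuracy:", np.mean(accs))
    ```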

  11. Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine

    PubMed Central

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud

    2018-01-01

    Background: Gene expression data are characteristically high dimensional, with a sample size that is small in contrast to the number of features, and they exhibit the variability inherent in biological processes; both properties contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system, increasing the ability of the classifier to generalize. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy is 100% for the leukemia data, 96.67% for colon cancer, and 98% for breast cancer. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier without user intervention. PMID:29535919
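
    One common way to realize the fuzzy-membership idea is to down-weight samples that lie far from their class centroid and pass the memberships to the SVM as per-sample weights; the membership formula below is an illustrative choice, not necessarily the paper's definition:

    ```python
    # Fuzzy sample weighting for an SVM: likely outliers (far from their class
    # centroid) receive low membership, reducing their influence on the fit.
    import numpy as np
    from sklearn.svm import SVC

    def fuzzy_memberships(X, y, eps=1e-6):
        m = np.empty(len(y))
        for c in np.unique(y):
            idx = np.where(y == c)[0]
            d = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
            m[idx] = 1.0 - d / (d.max() + eps)  # far from centroid -> low weight
        return m

    def fit_fuzzy_svm(X, y):
        clf = SVC(kernel="rbf")  # RBF was the best-performing kernel reported
        clf.fit(X, y, sample_weight=fuzzy_memberships(X, y))
        return clf
    ```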

  12. Classification of toxicity effects of biotransformed hepatic drugs using whale optimized support vector machines.

    PubMed

    Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella

    2017-04-01

    Measuring toxicity is an important step in drug development. Nevertheless, the current experimental methods used to estimate the drug toxicity are expensive and time-consuming, indicating that they are not suitable for large-scale evaluation of drug toxicity in the early stage of drug development. Hence, there is a high demand to develop computational models that can predict the drug toxicity risks. In this study, we used a dataset that consists of 553 drugs that are biotransformed in the liver. The toxic effects were calculated for the current data, namely, the mutagenic, tumorigenic, irritant and reproductive effects. Each drug is represented by 31 chemical descriptors (features). The proposed model consists of three phases. In the first phase, the most discriminative subset of features is selected using rough set-based methods to reduce the classification time while improving the classification performance. In the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling, the Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE and Safe Level SMOTE are used to solve the problem of the imbalanced dataset. In the third phase, the Support Vector Machines (SVM) classifier is used to classify an unknown drug as toxic or non-toxic. SVM parameters such as the penalty parameter and kernel parameter have a great impact on the classification accuracy of the model. In this paper, the Whale Optimization Algorithm (WOA) has been proposed to optimize the parameters of the SVM, so that the classification error can be reduced. The experimental results proved that the proposed model achieved high sensitivity to all toxic effects. Overall, the high sensitivity of the WOA+SVM model indicates that it could be used for the prediction of drug toxicity in the early stage of drug development. Copyright © 2017 Elsevier Inc. All rights reserved.
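
    A compact sketch of WOA tuning the SVM penalty parameter C and RBF kernel width gamma by cross-validated accuracy, assuming scikit-learn; the log-space search bounds, population size, and greedy best-update are illustrative choices rather than the paper's settings:

    ```python
    # Whale Optimization Algorithm searching (log10 C, log10 gamma) to
    # minimize the cross-validated classification error of an RBF-kernel SVM.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def woa_tune_svm(X, y, n_whales=10, n_iter=30, seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = np.array([-2.0, -4.0]), np.array([4.0, 1.0])
        pos = rng.uniform(lo, hi, size=(n_whales, 2))

        def fitness(p):
            C, gamma = 10.0 ** p
            return 1.0 - cross_val_score(SVC(C=C, gamma=gamma),
                                         X, y, cv=5).mean()

        fit = np.array([fitness(p) for p in pos])
        b = int(fit.argmin())
        best, best_fit = pos[b].copy(), fit[b]
        for t in range(n_iter):
            a = 2.0 * (1.0 - t / n_iter)        # decreases linearly 2 -> 0
            for i in range(n_whales):
                r1, r2 = rng.random(2), rng.random(2)
                A, Cv = 2.0 * a * r1 - a, 2.0 * r2
                if rng.random() < 0.5:
                    if np.all(np.abs(A) < 1.0):  # exploit: encircle the best
                        pos[i] = best - A * np.abs(Cv * best - pos[i])
                    else:                        # explore: follow a random whale
                        rand = pos[rng.integers(n_whales)]
                        pos[i] = rand - A * np.abs(Cv * rand - pos[i])
                else:                            # spiral update toward the best
                    l = rng.uniform(-1.0, 1.0)
                    pos[i] = (np.abs(best - pos[i]) * np.exp(l)
                              * np.cos(2.0 * np.pi * l) + best)
                pos[i] = np.clip(pos[i], lo, hi)
                fit[i] = fitness(pos[i])
                if fit[i] < best_fit:            # greedy global-best update
                    best, best_fit = pos[i].copy(), fit[i]
        C, gamma = 10.0 ** best
        return SVC(C=C, gamma=gamma).fit(X, y)
    ```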

  13. Column Subset Selection, Matrix Factorization, and Eigenvalue Optimization

    DTIC Science & Technology

    2008-07-01

    Pietsch and Grothendieck, which are regarded as basic instruments in modern functional analysis [Pis86]. • The methods for computing these... Pietsch factorization and the maxcut semidefinite program [GW95]. 1.2. Overview. We focus on the algorithmic version of the Kashin–Tzafriri theorem...will see that the desired subset is exposed by factoring the random submatrix. This factorization, which was invented by Pietsch, is regarded as a basic

  14. Ratiometric Decoding of Pheromones for a Biomimetic Infochemical Communication System.

    PubMed

    Wei, Guangfen; Thomas, Sanju; Cole, Marina; Rácz, Zoltán; Gardner, Julian W

    2017-10-30

    Biosynthetic infochemical communication is an emerging scientific field employing molecular compounds for information transmission, labelling, and biochemical interfacing; having potential application in diverse areas ranging from pest management to group coordination of swarming robots. Our communication system comprises a chemoemitter module that encodes information by producing volatile pheromone components and a chemoreceiver module that decodes the transmitted ratiometric information via polymer-coated piezoelectric Surface Acoustic Wave Resonator (SAWR) sensors. The inspiration for such a system is based on the pheromone-based communication between insects. Ten features are extracted from the SAWR sensor response and analysed using multivariate classification techniques, i.e., Linear Discriminant Analysis (LDA), Probabilistic Neural Network (PNN), and Multilayer Perceptron Neural Network (MLPNN) methods, and an optimal feature subset is identified. A combination of steady state and transient features of the sensor signals showed superior performances with LDA and MLPNN. Although MLPNN gave excellent results, reaching a 100% recognition rate at 400 s, over all time stations PNN gave the best performance based on an expanded dataset with adjacent neighbours. In this case, 100% of the pheromone mixtures were successfully identified just 200 s after they were first injected into the wind tunnel. We believe that this approach can be used for future chemical communication employing simple mixtures of airborne molecules.
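
    For the classification stage, an LDA decoder over the ten extracted sensor features can be sketched as below, assuming scikit-learn; the synthetic data stand in for real SAWR responses:

    ```python
    # LDA recognition of pheromone mixtures from a 10-feature sensor vector.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_mixtures, n_per_class, n_features = 4, 30, 10
    # Synthetic stand-in: each mixture class has a shifted feature mean.
    X = np.vstack([rng.normal(loc=k, scale=1.0, size=(n_per_class, n_features))
                   for k in range(n_mixtures)])
    y = np.repeat(np.arange(n_mixtures), n_per_class)

    lda = LinearDiscriminantAnalysis()
    print("mean CV recognition rate:", cross_val_score(lda, X, y, cv=5).mean())
    ```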

  15. Ratiometric Decoding of Pheromones for a Biomimetic Infochemical Communication System

    PubMed Central

    Wei, Guangfen; Thomas, Sanju; Cole, Marina; Rácz, Zoltán

    2017-01-01

    Biosynthetic infochemical communication is an emerging scientific field employing molecular compounds for information transmission, labelling, and biochemical interfacing; having potential application in diverse areas ranging from pest management to group coordination of swarming robots. Our communication system comprises a chemoemitter module that encodes information by producing volatile pheromone components and a chemoreceiver module that decodes the transmitted ratiometric information via polymer-coated piezoelectric Surface Acoustic Wave Resonator (SAWR) sensors. The inspiration for such a system is based on the pheromone-based communication between insects. Ten features are extracted from the SAWR sensor response and analysed using multivariate classification techniques, i.e., Linear Discriminant Analysis (LDA), Probabilistic Neural Network (PNN), and Multilayer Perceptron Neural Network (MLPNN) methods, and an optimal feature subset is identified. A combination of steady state and transient features of the sensor signals showed superior performances with LDA and MLPNN. Although MLPNN gave excellent results, reaching a 100% recognition rate at 400 s, over all time stations PNN gave the best performance based on an expanded dataset with adjacent neighbours. In this case, 100% of the pheromone mixtures were successfully identified just 200 s after they were first injected into the wind tunnel. We believe that this approach can be used for future chemical communication employing simple mixtures of airborne molecules. PMID:29084158

  16. Integration trumps selection in object recognition.

    PubMed

    Saarela, Toni P; Landy, Michael S

    2015-03-30

    Finding and recognizing objects is a fundamental task of vision. Objects can be defined by several "cues" (color, luminance, texture, etc.), and humans can integrate sensory cues to improve detection and recognition [1-3]. Cortical mechanisms fuse information from multiple cues [4], and shape-selective neural mechanisms can display cue invariance by responding to a given shape independent of the visual cue defining it [5-8]. Selective attention, in contrast, improves recognition by isolating a subset of the visual information [9]. Humans can select single features (red or vertical) within a perceptual dimension (color or orientation), giving faster and more accurate responses to items having the attended feature [10, 11]. Attention elevates neural responses and sharpens neural tuning to the attended feature, as shown by studies in psychophysics and modeling [11, 12], imaging [13-16], and single-cell and neural population recordings [17, 18]. Besides single features, attention can select whole objects [19-21]. Objects are among the suggested "units" of attention because attention to a single feature of an object causes the selection of all of its features [19-21]. Here, we pit integration against attentional selection in object recognition. We find, first, that humans can integrate information near optimally from several perceptual dimensions (color, texture, luminance) to improve recognition. They cannot, however, isolate a single dimension even when the other dimensions provide task-irrelevant, potentially conflicting information. For object recognition, it appears that there is mandatory integration of information from multiple dimensions of visual experience. The advantage afforded by this integration, however, comes at the expense of attentional selection. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Integration trumps selection in object recognition

    PubMed Central

    Saarela, Toni P.; Landy, Michael S.

    2015-01-01

    Summary Finding and recognizing objects is a fundamental task of vision. Objects can be defined by several “cues” (color, luminance, texture etc.), and humans can integrate sensory cues to improve detection and recognition [1–3]. Cortical mechanisms fuse information from multiple cues [4], and shape-selective neural mechanisms can display cue-invariance by responding to a given shape independent of the visual cue defining it [5–8]. Selective attention, in contrast, improves recognition by isolating a subset of the visual information [9]. Humans can select single features (red or vertical) within a perceptual dimension (color or orientation), giving faster and more accurate responses to items having the attended feature [10,11]. Attention elevates neural responses and sharpens neural tuning to the attended feature, as shown by studies in psychophysics and modeling [11,12], imaging [13–16], and single-cell and neural population recordings [17,18]. Besides single features, attention can select whole objects [19–21]. Objects are among the suggested “units” of attention because attention to a single feature of an object causes the selection of all of its features [19–21]. Here, we pit integration against attentional selection in object recognition. We find, first, that humans can integrate information near-optimally from several perceptual dimensions (color, texture, luminance) to improve recognition. They cannot, however, isolate a single dimension even when the other dimensions provide task-irrelevant, potentially conflicting information. For object recognition, it appears that there is mandatory integration of information from multiple dimensions of visual experience. The advantage afforded by this integration, however, comes at the expense of attentional selection. PMID:25802154

  18. Integrating prediction, provenance, and optimization into high energy workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schram, M.; Bansal, V.; Friese, R. D.

    We propose a novel approach for efficient execution of workflows on distributed resources. The key components of this framework include: performance modeling to quantitatively predict workflow component behavior; optimization-based scheduling, such as choosing an optimal subset of resources to meet demand and assigning tasks to resources; distributed I/O optimizations such as prefetching; and provenance methods for collecting performance data. In preliminary results, these techniques improve throughput on a small Belle II workflow by 20%.

  19. Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods.

    PubMed

    Martínez, María Jimena; Ponzoni, Ignacio; Díaz, Mónica F; Vazquez, Gustavo E; Soto, Axel J

    2015-01-01

    The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step concentrate on statistical associations among descriptors and target properties, whereas chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating domain experts' knowledge in the selection process is needed to increase confidence in the final set of descriptors. In this paper we propose a software tool, Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The capabilities of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets, or how to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise in the descriptor selection process with a low cognitive effort, in contrast with the alternative of an ad-hoc manual analysis of the selected descriptors. Graphical abstract: VIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.

  20. Feature relevance assessment for the semantic interpretation of 3D point cloud data

    NASA Astrophysics Data System (ADS)

    Weinmann, M.; Jutzi, B.; Mallet, C.

    2013-10-01

    The automatic analysis of large 3D point clouds represents a crucial task in photogrammetry, remote sensing and computer vision. In this paper, we propose a new methodology for the semantic interpretation of such point clouds which involves feature relevance assessment in order to reduce both processing time and memory consumption. Given a standard benchmark dataset with 1.3 million 3D points, we first extract a set of 21 geometric 3D and 2D features. Subsequently, we apply a classifier-independent ranking procedure which involves a general relevance metric in order to derive compact and robust subsets of versatile features which are generally applicable for a large variety of subsequent tasks. This metric is based on 7 different feature selection strategies and thus addresses different intrinsic properties of the given data. For the example of semantically interpreting 3D point cloud data, we demonstrate the great potential of smaller subsets consisting of only the most relevant features with 4 different state-of-the-art classifiers. The results reveal that, instead of including as many features as possible in order to compensate for lack of knowledge, a crucial task such as scene interpretation can be carried out with only a few versatile features and even improved accuracy.
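
    A minimal sketch of a classifier-independent relevance metric in the spirit described here, assuming scikit-learn and SciPy: rank the features under several filter strategies (three stand in for the paper's seven) and average the ranks, so that a low mean rank marks a robustly relevant feature:

    ```python
    # Aggregate feature ranks from several filter-based selection strategies.
    import numpy as np
    from scipy.stats import rankdata
    from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

    def mean_rank_relevance(X, y):
        scores = [f_classif(X, y)[0],
                  mutual_info_classif(X, y),
                  chi2(np.abs(X), y)[0]]   # chi2 needs non-negative input;
                                           # abs() is a simplification here
        # Higher score -> better (lower) rank under each strategy.
        ranks = np.stack([rankdata(-s) for s in scores])
        return ranks.mean(axis=0)          # low mean rank = robustly relevant
    ```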

  1. A Scalable and Robust Multi-Agent Approach to Distributed Optimization

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan

    2005-01-01

    Modularizing a large optimization problem so that the solutions to the subproblems provide a good overall solution is a challenging problem. In this paper we present a multi-agent approach to this problem based on aligning the agent objectives with the system objectives, obviating the need to impose external mechanisms to achieve collaboration among the agents. This approach naturally addresses scaling and robustness issues by ensuring that the agents do not rely on the reliable operation of other agents. We test this approach in the difficult distributed optimization problem of imperfect device subset selection [Challet and Johnson, 2002]. In this problem, there are n devices, each of which has a "distortion", and the task is to find the subset of those n devices that minimizes the average distortion. Our results show that in large systems (1000 agents) the proposed approach provides improvements of over an order of magnitude over both traditional optimization methods and traditional multi-agent methods. Furthermore, the results show that even in extreme cases of agent failures (i.e., half the agents fail midway through the simulation) the system remains coordinated and still outperforms a failure-free and centralized optimization algorithm.

  2. An overview of the NASA Langley Atmospheric Data Center: Online tools to effectively disseminate Earth science data products

    NASA Astrophysics Data System (ADS)

    Parker, L.; Dye, R. A.; Perez, J.; Rinsland, P.

    2012-12-01

    Over the past decade the Atmospheric Science Data Center (ASDC) at NASA Langley Research Center has archived and distributed a variety of satellite mission and aircraft campaign data sets. These datasets posed unique challenges to the user community at large due to the sheer volume and variety of the data and the lack of intuitive features in the order tools available to the investigator. Some of these data sets also lacked sufficient metadata to provide rudimentary data discovery. To meet the needs of emerging users, the ASDC addressed issues in data discovery and delivery through the use of standards in data and access methods, and distribution through appropriate portals. The ASDC is currently undergoing a refresh of its webpages and Ordering Tools that will leverage updated collection-level metadata in an effort to enhance the user experience. The ASDC is now providing search and subset capability for key mission satellite data sets. The ASDC has collaborated with Science Teams to accommodate prospective science users in the climate and modeling communities. The ASDC is using a common framework that enables more rapid development and deployment of search and subset tools that provide enhanced access features for the user community. Features of the Search and Subset web application enable a more sophisticated approach to selecting and ordering data subsets by parameter, date, time, and geographic area. The ASDC has also applied key practices from satellite missions to the multi-campaign aircraft missions executed for Earth Venture-1 and MEaSUReS.

  3. Differential IL-1β secretion by monocyte subsets is regulated by Hsp27 through modulating mRNA stability.

    PubMed

    Hadadi, Eva; Zhang, Biyan; Baidžajevas, Kajus; Yusof, Nurhashikin; Puan, Kia Joo; Ong, Siew Min; Yeap, Wei Hseun; Rotzschke, Olaf; Kiss-Toth, Endre; Wilson, Heather; Wong, Siew Cheng

    2016-12-15

    Monocytes play a central role in regulating inflammation in response to infection or injury, and during auto-inflammatory diseases. Human blood contains classical, intermediate and non-classical monocyte subsets that each express characteristic patterns of cell surface CD16 and CD14; each subset also has specific functional properties, but the mechanisms underlying many of their distinctive features are undefined. Of particular interest is how monocyte subsets regulate secretion of the apical pro-inflammatory cytokine IL-1β, which is central to the initiation of immune responses but is also implicated in the pathology of various auto-immune/auto-inflammatory conditions. Here we show that primary human non-classical monocytes, exposed to LPS or LPS + BzATP (3'-O-(4-benzoyl)benzyl-ATP, a P2X7R agonist), produce approx. 80% less IL-1β than intermediate or classical monocytes. Despite their low CD14 expression, LPS-sensing, caspase-1 activation and P2X7R activity were comparable in non-classical monocytes to other subsets: their diminished ability to produce IL-1β instead arose from 50% increased IL-1β mRNA decay rates, mediated by Hsp27. These findings identify the Hsp27 pathway as a novel therapeutic target for the management of conditions featuring dysregulated IL-1β production, and represent an advancement in understanding of both physiological inflammatory responses and the pathogenesis of inflammatory diseases involving monocyte-derived IL-1β.

  4. Subset selective search on the basis of color and preview.

    PubMed

    Donk, Mieke

    2017-01-01

    In the preview paradigm observers are presented with one set of elements (the irrelevant set) followed by the addition of a second set among which the target is presented (the relevant set). Search efficiency in such a preview condition has been demonstrated to be higher than that in a full-baseline condition in which both sets are simultaneously presented, suggesting that a preview of the irrelevant set reduces its influence on the search process. However, numbers of irrelevant and relevant elements are typically not independently manipulated. Moreover, subset selective search also occurs when both sets are presented simultaneously but differ in color. The aim of the present study was to investigate how numbers of irrelevant and relevant elements contribute to preview search in the absence and presence of a color difference between subsets. In two experiments it was demonstrated that a preview reduced the influence of the number of irrelevant elements in the absence but not in the presence of a color difference between subsets. In the presence of a color difference, a preview lowered the effect of the number of relevant elements but only when the target was defined by a unique feature within the relevant set (Experiment 1); when the target was defined by a conjunction of features (Experiment 2), search efficiency as a function of the number of relevant elements was not modulated by a preview. Together the results are in line with the idea that subset selective search is based on different simultaneously operating mechanisms.

  5. Automatic Target Recognition: Statistical Feature Selection of Non-Gaussian Distributed Target Classes

    DTIC Science & Technology

    2011-06-01

    implementing, and evaluating many feature selection algorithms. Mucciardi and Gose compared seven different techniques for choosing subsets of pattern... [1] A. Mucciardi and E. Gose, “A comparison of seven techniques for

  6. Orthotopic AY-27 rat bladder urothelial cell carcinoma model presented an elevated methemoglobin proportion in the increased total hemoglobin content when evaluated in vivo by single-fiber reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Sun, Tengfei; Davis, Carole A.; Hurst, Robert E.; Slaton, Joel W.; Piao, Daqing

    2017-02-01

    In vivo single-fiber reflectance spectroscopy (SfRS) was performed on an orthotopic AY-27 rat bladder urothelial cell carcinoma model to explore potential spectroscopic features revealing neoplastic changes. AY-27 bladder tumor cells were intravesically instilled in four rats and allowed to implant and grow for one week, with two additional rats as the control. A total of 107 SfRS measurements were taken from 27 sites on two control bladders and 80 from four AY-27 treated bladders. The spectral profiles obtained from AY-27 treated bladders revealed various levels of a methemoglobin (MetHb) characteristic spectral feature around 635 nm. A multisegment spectral analysis method estimated the concentrations of five chromophores: oxyhemoglobin, deoxyhemoglobin, MetHb, lipid and water. The total hemoglobin concentration ([HbT]), the MetHb proportion in the total hemoglobin and the lipid volume content showed possible correlations. The 80 measurements from the AY-27 treated bladders could be separated into three subsets according to the MetHb proportion. Specifically, 72 were in subset 1 with a low proportion (5.3% < [MetHb] < 7%), 6 in subset 2 with a moderate proportion (7% < [MetHb] < 30%), and 2 in subset 3 with a significant proportion (>30%). When grouped according to [MetHb], [HbT] increased from 368 μM in subset 1 to 488 μM in subset 2 and 541 μM in subset 3, in comparison to 285 μM in the control. The increased total hemoglobin and the elevated MetHb proportion may signify angiogenesis and degradation of hemoglobin oxygen transport. Additionally, the lipid volume content decreased from 2.58% in the control to <0.2% in the tumor groups, indicating disruption of the subepithelial tissue architecture.
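
    The spectral fitting step can be sketched as a non-negative least-squares unmixing, assuming SciPy; the extinction matrix is a placeholder to be filled from tabulated spectra of the five chromophores:

    ```python
    # Recover non-negative chromophore concentrations from a measured
    # absorption spectrum by non-negative least squares.
    import numpy as np
    from scipy.optimize import nnls

    def unmix_chromophores(E, mu_a):
        """E: (n_wavelengths, n_chromophores) extinction coefficients, e.g.
        columns for HbO2, Hb, MetHb, lipid, water; mu_a: measured absorption
        at each wavelength. Returns concentrations and the fit residual."""
        conc, residual = nnls(E, mu_a)
        return conc, residual
    ```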

  7. A Survey of Object Oriented Languages in Programming Environments.

    DTIC Science & Technology

    1987-06-01

    subset of natural languages might be more effective, and make the human-computer interface more friendly... and complexity of Ada. He meant that the language contained too many features that made it complicated to use effectively. Much of the complexity comes... by sending messages to a class instance. A small subset of the methods in Smalltalk-80 are not expressed in the Smalltalk-80 programming language

  8. High-dimensional cluster analysis with the Masked EM Algorithm

    PubMed Central

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  9. The Subset Sum game.

    PubMed

    Darmann, Andreas; Nicosia, Gaia; Pferschy, Ulrich; Schauer, Joachim

    2014-03-16

    In this work we address a game theoretic variant of the Subset Sum problem, in which two decision makers (agents/players) compete for the usage of a common resource represented by a knapsack capacity. Each agent owns a set of integer weighted items and wants to maximize the total weight of its own items included in the knapsack. The solution is built as follows: Each agent, in turn, selects one of its items (not previously selected) and includes it in the knapsack if there is enough capacity. The process ends when the remaining capacity is too small for including any item left. We look at the problem from a single agent point of view and show that finding an optimal sequence of items to select is an NP-hard problem. Therefore we propose two natural heuristic strategies and analyze their worst-case performance when (1) the opponent is able to play optimally and (2) the opponent adopts a greedy strategy. From a centralized perspective we observe that some known results on the approximation of the classical Subset Sum can be effectively adapted to the multi-agent version of the problem.
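
    A minimal simulation of the game with both players following a greedy strategy, under one simple reading of the turn rule (a player with no fitting item passes); the function and its toy inputs are illustrative:

    ```python
    # Two-agent Subset Sum game: each turn, the current player inserts its
    # largest still-fitting item; play ends when no remaining item fits.
    def subset_sum_game(items_a, items_b, capacity):
        pools = [sorted(items_a, reverse=True), sorted(items_b, reverse=True)]
        packed = [0, 0]
        turn = 0
        while any(w <= capacity for pool in pools for w in pool):
            choice = next((w for w in pools[turn] if w <= capacity), None)
            if choice is not None:          # greedy: biggest item that fits
                pools[turn].remove(choice)
                packed[turn] += choice
                capacity -= choice
            turn = 1 - turn                 # no fitting item: pass the turn
        return packed  # total weight each agent placed in the knapsack

    print(subset_sum_game([8, 5, 3], [7, 4, 2], capacity=15))  # -> [8, 7]
    ```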

  10. The Subset Sum game

    PubMed Central

    Darmann, Andreas; Nicosia, Gaia; Pferschy, Ulrich; Schauer, Joachim

    2014-01-01

    In this work we address a game theoretic variant of the Subset Sum problem, in which two decision makers (agents/players) compete for the usage of a common resource represented by a knapsack capacity. Each agent owns a set of integer weighted items and wants to maximize the total weight of its own items included in the knapsack. The solution is built as follows: Each agent, in turn, selects one of its items (not previously selected) and includes it in the knapsack if there is enough capacity. The process ends when the remaining capacity is too small for including any item left. We look at the problem from a single agent point of view and show that finding an optimal sequence of items to select is an NP-hard problem. Therefore we propose two natural heuristic strategies and analyze their worst-case performance when (1) the opponent is able to play optimally and (2) the opponent adopts a greedy strategy. From a centralized perspective we observe that some known results on the approximation of the classical Subset Sum can be effectively adapted to the multi-agent version of the problem. PMID:25844012

  11. Performance Optimizing Multi-Objective Adaptive Control with Time-Varying Model Reference Modification

    NASA Technical Reports Server (NTRS)

    Nguyen, Nhan T.; Hashemi, Kelley E.; Yucelen, Tansel; Arabi, Ehsan

    2017-01-01

    This paper presents a new adaptive control approach that involves a performance optimization objective. The problem is cast as a multi-objective optimal control. The control synthesis involves the design of a performance optimizing controller from a subset of control inputs. The effect of the performance optimizing controller is to introduce an uncertainty into the system that can degrade tracking of the reference model. An adaptive controller from the remaining control inputs is designed to reduce the effect of the uncertainty while maintaining a notion of performance optimization in the adaptive control system.

  12. A genetic programming approach to oral cancer prognosis

    PubMed Central

    Tan, Mei Sze; Tan, Jing Wei; Yap, Hwa Jen; Abdul Kareem, Sameem; Zain, Rosnah Binti

    2016-01-01

    Background The potential of genetic programming (GP) on various fields has been attained in recent years. In bio-medical field, many researches in GP are focused on the recognition of cancerous cells and also on gene expression profiling data. In this research, the aim is to study the performance of GP on the survival prediction of a small sample size of oral cancer prognosis dataset, which is the first study in the field of oral cancer prognosis. Method GP is applied on an oral cancer dataset that contains 31 cases collected from the Malaysia Oral Cancer Database and Tissue Bank System (MOCDTBS). The feature subsets that is automatically selected through GP were noted and the influences of this subset on the results of GP were recorded. In addition, a comparison between the GP performance and that of the Support Vector Machine (SVM) and logistic regression (LR) are also done in order to verify the predictive capabilities of the GP. Result The result shows that GP performed the best (average accuracy of 83.87% and average AUROC of 0.8341) when the features selected are smoking, drinking, chewing, histological differentiation of SCC, and oncogene p63. In addition, based on the comparison results, we found that the GP outperformed the SVM and LR in oral cancer prognosis. Discussion Some of the features in the dataset are found to be statistically co-related. This is because the accuracy of the GP prediction drops when one of the feature in the best feature subset is excluded. Thus, GP provides an automatic feature selection function, which chooses features that are highly correlated to the prognosis of oral cancer. This makes GP an ideal prediction model for cancer clinical and genomic data that can be used to aid physicians in their decision making stage of diagnosis or prognosis. PMID:27688975

  13. Uncertainty Analysis via Failure Domain Characterization: Unrestricted Requirement Functions

    NASA Technical Reports Server (NTRS)

    Crespo, Luis G.; Kenny, Sean P.; Giesy, Daniel P.

    2011-01-01

    This paper proposes an uncertainty analysis framework based on the characterization of the uncertain parameter space. This characterization enables the identification of worst-case uncertainty combinations and the approximation of the failure and safe domains with a high level of accuracy. Because these approximations are composed of subsets of readily computable probability, they enable the calculation of arbitrarily tight upper and lower bounds on the failure probability. The methods developed herein, which are based on nonlinear constrained optimization, are applicable to requirement functions whose functional dependency on the uncertainty is arbitrary and whose explicit form may even be unknown. Some of the most prominent features of the methodology are the substantial desensitization of the calculations from the assumed uncertainty model (i.e., the probability distribution describing the uncertainty) as well as the accommodation for changes in such a model with a practically insignificant amount of computational effort.

  14. Predicting the disease of Alzheimer with SNP biomarkers and clinical data using data mining classification approach: decision tree.

    PubMed

    Erdoğan, Onur; Aydin Son, Yeşim

    2014-01-01

    Single Nucleotide Polymorphisms (SNPs) are the most common genomic variations, where only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers. But there is a need to determine the SNP subsets and the patients' clinical data which are informative for the diagnosis. Data mining approaches have the highest potential for extracting the knowledge from genomic datasets and selecting the representative SNPs, as well as the most effective and informative clinical features, for the clinical diagnosis of the diseases. In this study, we have applied one of the most widely used data mining classification methodologies, the "decision tree", for associating SNP biomarkers and significant clinical data with Alzheimer's disease (AD), which is the most common form of dementia. Different tree construction parameters have been compared for the optimization, and the most accurate tree for predicting AD is presented.
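
    Comparing tree-construction parameters can be sketched with a scikit-learn grid search; the grid values are illustrative, and genotypes are assumed coded numerically (e.g., 0/1/2 minor-allele counts) alongside the clinical features:

    ```python
    # Search over decision-tree construction parameters by cross-validated
    # accuracy and return the most accurate tree.
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    def best_tree(X, y):
        grid = {"criterion": ["gini", "entropy"],
                "max_depth": [3, 5, 8, None],
                "min_samples_leaf": [1, 5, 10]}
        search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                              grid, cv=5, scoring="accuracy")
        search.fit(X, y)
        return search.best_estimator_, search.best_params_
    ```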

  15. Treatment of ALK-rearranged non-small cell lung cancer: A review of the landscape and approach to emerging patterns of treatment resistance in the Australian context.

    PubMed

    Itchins, M; Chia, P L; Hayes, S A; Howell, V M; Gill, A J; Cooper, W A; John, T; Mitchell, P; Millward, M; Clarke, S J; Solomon, B; Pavlakis, N

    2017-08-01

    Since the identification of anaplastic lymphoma kinase (ALK) gene rearrangements in non-small cell lung cancer (NSCLC) in 2005, the treatment of ALK-rearranged NSCLC (ALK+ NSCLC) has evolved at a rapid pace. This molecularly distinct subset of NSCLC has uniquely important biology, clinicopathologic features and mechanisms of drug resistance which impact on the choice of treatment for a patient with this disease. There are multiple ALK tyrosine kinase inhibitors now available in clinical practice with efficacy data continuing to emerge and guide the optimal treatment algorithm. A detailed search of medical databases and clinical trial registries was conducted to capture all relevant articles on this topic enabling an updated detailed overview of the landscape of management of ALK-rearranged NSCLC. © 2017 John Wiley & Sons Australia, Ltd.

  16. Optimizing data collection for public health decisions: a data mining approach

    PubMed Central

    2014-01-01

    Background Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys, guided by a data mining technique called feature selection, can (a) identify a reduced dataset while (b) not damaging the signal inside that data. Methods The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed, thereby reducing cost. PMID:24919484

  17. Optimizing data collection for public health decisions: a data mining approach.

    PubMed

    Partington, Susan N; Papakroni, Vasil; Menzies, Tim

    2014-06-12

    Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys, guided by a data mining technique called feature selection, can (a) identify a reduced dataset while (b) not damaging the signal inside that data. The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed, thereby reducing cost.
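
    The scoring-and-validation step common to both versions of this study can be sketched as follows, assuming scikit-learn and SciPy; the selected item indices (keep) abstract away the feature-selection step itself:

    ```python
    # Rebuild survey scores from a reduced item set via linear regression,
    # then compare full and reduced scores with a Wilcoxon rank-sum test.
    from scipy.stats import ranksums
    from sklearn.linear_model import LinearRegression

    def compare_reduced_scores(items, full_scores, keep):
        """items: (n_outlets, n_items) survey responses; keep: columns
        retained by feature selection."""
        model = LinearRegression().fit(items[:, keep], full_scores)
        reduced_scores = model.predict(items[:, keep])  # weighted item sums
        stat, p = ranksums(full_scores, reduced_scores)
        return reduced_scores, p  # large p: no detectable damage to the signal
    ```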

  18. Fast, Safe, Propellant-Efficient Spacecraft Motion Planning Under Clohessy-Wiltshire-Hill Dynamics

    NASA Technical Reports Server (NTRS)

    Starek, Joseph A.; Schmerling, Edward; Maher, Gabriel D.; Barbee, Brent W.; Pavone, Marco

    2016-01-01

    This paper presents a sampling-based motion planning algorithm for real-time and propellant-optimized autonomous spacecraft trajectory generation in near-circular orbits. Specifically, this paper applies recent algorithmic advances in the field of robot motion planning to the problem of impulsively actuated, propellant-optimized rendezvous and proximity operations under the Clohessy-Wiltshire-Hill dynamics model. The approach calls upon a modified version of the FMT* algorithm to grow a set of feasible trajectories over a deterministic, low-dispersion set of sample points covering the free state space. To enforce safety, the tree is only grown over the subset of actively safe samples, from which there exists a feasible one-burn collision-avoidance maneuver that can safely circularize the spacecraft orbit along its coasting arc under a given set of potential thruster failures. Key features of the proposed algorithm include 1) theoretical guarantees in terms of trajectory safety and performance, 2) amenability to real-time implementation, and 3) generality, in the sense that a large class of constraints can be handled directly. As a result, the proposed algorithm offers the potential for widespread application, ranging from on-orbit satellite servicing to orbital debris removal and autonomous inspection missions.

  19. Rectal bleeding, fecal incontinence, and high stool frequency after conformal radiotherapy for prostate cancer: Normal tissue complication probability modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peeters, Stephanie; Hoogeman, Mischa S.; Heemsbergen, Wilma D.

    2006-09-01

    Purpose: To analyze whether inclusion of predisposing clinical features in the Lyman-Kutcher-Burman (LKB) normal tissue complication probability (NTCP) model improves the estimation of late gastrointestinal toxicity. Methods and Materials: This study includes 468 prostate cancer patients participating in a randomized trial comparing 68 with 78 Gy. We fitted the probability of developing late toxicity within 3 years (rectal bleeding, high stool frequency, and fecal incontinence) with the original, and a modified LKB model, in which a clinical feature (e.g., history of abdominal surgery) was taken into account by fitting subset-specific TD50s. The ratio of these TD50s is the dose-modifying factor for that clinical feature. Dose distributions of anorectal (bleeding and frequency) and anal wall (fecal incontinence) were used. Results: The modified LKB model gave significantly better fits than the original LKB model. Patients with a history of abdominal surgery had a lower tolerance to radiation than did patients without previous surgery, with a dose-modifying factor of 1.1 for bleeding and of 2.5 for fecal incontinence. The dose-response curve for bleeding was approximately two times steeper than that for frequency and three times steeper than that for fecal incontinence. Conclusions: Inclusion of predisposing clinical features significantly improved the estimation of the NTCP. For patients with a history of abdominal surgery, more severe dose constraints should therefore be used during treatment plan optimization.
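
    For reference, a sketch of the standard LKB formulation this abstract builds on; on this reading, fitting subset-specific TD50s amounts to dividing TD50 by the dose-modifying factor (DMF) for patients carrying the risk factor:

    ```latex
    % Standard Lyman-Kutcher-Burman NTCP model (gEUD form); the subset-specific
    % TD50 corresponds to dividing TD50 by a dose-modifying factor (DMF).
    \[
      \mathrm{NTCP} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^{2}/2}\,dx,
      \qquad
      t = \frac{\mathrm{gEUD} - TD_{50}}{m \, TD_{50}},
      \qquad
      \mathrm{gEUD} = \Bigl(\sum_{i} v_i \, D_i^{1/n}\Bigr)^{\!n},
    \]
    \[
      TD_{50}^{\mathrm{risk\ group}} = \frac{TD_{50}}{\mathrm{DMF}},
      \qquad \text{e.g.\ } \mathrm{DMF} = 2.5 \text{ for fecal incontinence
      after abdominal surgery.}
    \]
    ```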

  20. Learning what matters: A neural explanation for the sparsity bias.

    PubMed

    Hassall, Cameron D; Connor, Patrick C; Trappenberg, Thomas P; McDonald, John J; Krigolson, Olave E

    2018-05-01

    The visual environment is filled with complex, multi-dimensional objects that vary in their value to an observer's current goals. When faced with multi-dimensional stimuli, humans may rely on biases to learn to select those objects that are most valuable to the task at hand. Here, we show that decision making in a complex task is guided by the sparsity bias: the focusing of attention on a subset of available features. Participants completed a gambling task in which they selected complex stimuli that varied randomly along three dimensions: shape, color, and texture. Each dimension comprised three features (e.g., color: red, green, yellow). Only one dimension was relevant in each block (e.g., color), and a randomly-chosen value ranking determined outcome probabilities (e.g., green > yellow > red). Participants were faster to respond to infrequent probe stimuli that appeared unexpectedly within stimuli that possessed a more valuable feature than to probes appearing within stimuli possessing a less valuable feature. Event-related brain potentials recorded during the task provided a neurophysiological explanation for sparsity as a learning-dependent increase in optimal attentional performance (as measured by the N2pc component of the human event-related potential) and a concomitant learning-dependent decrease in prediction errors (as measured by the feedback-elicited reward positivity). Together, our results suggest that the sparsity bias guides human reinforcement learning in complex environments. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Optimal tree increment models for the Northeastern United States

    Treesearch

    Don C. Bragg

    2003-01-01

    I used the potential relative increment (PRI) methodology to develop optimal tree diameter growth models for the Northeastern United States. Thirty species from the Eastwide Forest Inventory Database yielded 69,676 individuals, which were then reduced to fast-growing subsets for PRI analysis. For instance, only 14 individuals from the greater than 6,300-tree eastern...

  2. Optimal Tree Increment Models for the Northeastern United States

    Treesearch

    Don C. Bragg

    2005-01-01

    I used the potential relative increment (PRI) methodology to develop optimal tree diameter growth models for the Northeastern United States. Thirty species from the Eastwide Forest Inventory Database yielded 69,676 individuals, which were then reduced to fast-growing subsets for PRI analysis. For instance, only 14 individuals from the greater than 6,300-tree eastern...

  3. Narrative Performance of Optimal Outcome Children and Adolescents with a History of an Autism Spectrum Disorder (ASD)

    ERIC Educational Resources Information Center

    Suh, Joyce; Eigsti, Inge-Marie; Naigles, Letitia; Barton, Marianne; Kelley, Elizabeth; Fein, Deborah

    2014-01-01

    Autism Spectrum Disorders (ASDs) have traditionally been considered a lifelong condition; however, a subset of people makes such significant improvements that they no longer meet diagnostic criteria for an ASD. The current study examines whether these "optimal outcome" (OO) children and adolescents continue to have subtle pragmatic…

  4. Image segmentation via foreground and background semantic descriptors

    NASA Astrophysics Data System (ADS)

    Yuan, Ding; Qiang, Jingjing; Yin, Jihao

    2017-09-01

    In the field of image processing, it has been a challenging task to obtain a complete foreground that is not uniform in color or texture. Unlike other methods, which segment the image by only using low-level features, we present a segmentation framework, in which high-level visual features, such as semantic information, are used. First, the initial semantic labels were obtained by using the nonparametric method. Then, a subset of the training images, with a similar foreground to the input image, was selected. Consequently, the semantic labels could be further refined according to the subset. Finally, the input image was segmented by integrating the object affinity and refined semantic labels. State-of-the-art performance was achieved in experiments with the challenging MSRC 21 dataset.

  5. An alternative approach for neural network evolution with a genetic algorithm: crossover by combinatorial optimization.

    PubMed

    García-Pedrajas, Nicolás; Ortiz-Boyer, Domingo; Hervás-Martínez, César

    2006-05-01

    In this work we present a new approach to the crossover operator in the genetic evolution of neural networks. The most widely used evolutionary computation paradigm for neural network evolution is evolutionary programming. This paradigm is usually preferred due to the problems caused by the application of crossover to neural network evolution. However, crossover is the most innovative operator within the field of evolutionary computation. One of the most notorious problems with the application of crossover to neural networks is known as the permutation problem. This problem occurs because the same network can be represented in a genetic coding by many different codifications. Our approach modifies the standard crossover operator taking into account the special features of the individuals to be mated. We present a new model for mating individuals that considers the structure of the hidden layer and redefines the crossover operator. As each hidden node represents a non-linear projection of the input variables, we approach the crossover as a combinatorial optimization problem. We can formulate the problem as the extraction of a subset of near-optimal projections to create the hidden layer of the new network. This new approach is compared to classical crossover on 25 real-world problems with excellent performance. Moreover, the networks obtained are much smaller than those obtained with the classical crossover operator.
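
    The subset-extraction view of crossover can be sketched as pooling the two parents' hidden-unit projections and keeping the best-scoring ones; scoring by output-weight magnitude is an illustrative proxy for the paper's selection criterion, and the function name is mine:

    ```python
    # Crossover as combinatorial subset extraction over hidden-unit
    # projections from two parent networks (single-output case).
    import numpy as np

    def crossover_hidden_units(W_in_a, W_out_a, W_in_b, W_out_b, n_hidden):
        """W_in_*: (n_hidden_parent, n_inputs) input weights per hidden unit;
        W_out_*: (n_hidden_parent,) output weights. Returns child weights."""
        W_in = np.vstack([W_in_a, W_in_b])        # pool candidate projections
        W_out = np.concatenate([W_out_a, W_out_b])
        keep = np.argsort(-np.abs(W_out))[:n_hidden]  # near-optimal subset proxy
        return W_in[keep], W_out[keep]
    ```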

  6. Classification of Alzheimer's disease and prediction of mild cognitive impairment-to-Alzheimer's conversion from structural magnetic resource imaging using feature ranking and a genetic algorithm.

    PubMed

    Beheshti, Iman; Demirel, Hasan; Matsuda, Hiroshi

    2017-04-01

    We developed a novel computer-aided diagnosis (CAD) system that uses feature ranking and a genetic algorithm to analyze structural magnetic resonance imaging data; using this system, we can predict conversion of mild cognitive impairment (MCI) to Alzheimer's disease (AD) between one and three years before clinical diagnosis. The CAD system was developed in four stages. First, we used a voxel-based morphometry technique to investigate global and local gray matter (GM) atrophy in an AD group compared with healthy controls (HCs). Regions with significant GM volume reduction were segmented as volumes of interest (VOIs). Second, these VOIs were used to extract voxel values from the respective atrophy regions in AD, HC, stable MCI (sMCI) and progressive MCI (pMCI) patient groups. The voxel values were then extracted into a feature vector. Third, at the feature-selection stage, all features were ranked according to their respective t-test scores, and a genetic algorithm was designed to find the optimal feature subset. The Fisher criterion was used as part of the objective function in the genetic algorithm. Finally, the classification was carried out using a support vector machine (SVM) with 10-fold cross validation. We evaluated the proposed automatic CAD system by applying it to baseline values from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (160 AD, 162 HC, 65 sMCI and 71 pMCI subjects). The experimental results indicated that the proposed system is capable of distinguishing between sMCI and pMCI patients, and would be appropriate for practical use in a clinical setting. Copyright © 2017 Elsevier Ltd. All rights reserved.
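
    The feature-selection stage can be sketched as a t-test pre-ranking followed by a small genetic algorithm over binary feature masks with the Fisher criterion as the objective; the population size, rates, and top-k pre-filter are illustrative, not the paper's settings:

    ```python
    # t-test ranking plus a genetic algorithm maximizing the Fisher criterion
    # over binary feature masks (two-class case).
    import numpy as np
    from scipy.stats import ttest_ind

    def fisher_criterion(X, y, mask):
        Xa, Xb = X[y == 0][:, mask], X[y == 1][:, mask]
        num = (Xa.mean(0) - Xb.mean(0)) ** 2
        den = Xa.var(0) + Xb.var(0) + 1e-12
        return (num / den).sum()

    def ga_select(X, y, top_k=200, pop=30, gens=50, p_mut=0.02, seed=0):
        rng = np.random.default_rng(seed)
        t, _ = ttest_ind(X[y == 0], X[y == 1])    # per-feature t statistics
        cand = np.argsort(-np.abs(t))[:top_k]     # t-test pre-ranking
        Xc = X[:, cand]
        masks = rng.random((pop, top_k)) < 0.5
        for _ in range(gens):
            fit = np.array([fisher_criterion(Xc, y, m) if m.any() else 0.0
                            for m in masks])
            parents = masks[np.argsort(-fit)[:pop // 2]]  # truncation selection
            children = []
            for _ in range(pop - len(parents)):
                a, b = parents[rng.integers(len(parents), size=2)]
                cut = rng.integers(1, top_k)
                child = np.concatenate([a[:cut], b[cut:]])  # 1-point crossover
                child ^= rng.random(top_k) < p_mut          # bit-flip mutation
                children.append(child)
            masks = np.vstack([parents, children])
        best = masks[np.argmax([fisher_criterion(Xc, y, m) if m.any() else 0.0
                                for m in masks])]
        return cand[best]                          # selected feature indices
    ```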

  7. Modeling shape and topology of low-resolution density maps of biological macromolecules.

    PubMed Central

    De-Alarcón, Pedro A; Pascual-Montano, Alberto; Gupta, Amarnath; Carazo, Jose M

    2002-01-01

    In the present work we develop an efficient way of representing the geometry and topology of volumetric datasets of biological structures from medium to low resolution, aiming at storing and querying them in a database framework. We make use of a new vector quantization algorithm to select the points within the macromolecule that best approximate the probability density function of the original volume data. Connectivity among points is obtained with the use of the alpha shapes theory. This novel data representation has a number of interesting characteristics, such as 1) it allows us to automatically segment and quantify a number of important structural features from low-resolution maps, such as cavities and channels, opening the possibility of querying large collections of maps on the basis of these quantitative structural features; 2) it provides a compact representation in terms of size; 3) it contains a subset of three-dimensional points that optimally quantify the densities of medium resolution data; and 4) a general model of the geometry and topology of the macromolecule (as opposed to a spatially unrelated bunch of voxels) is easily obtained by the use of the alpha shapes theory. PMID:12124252

  8. Patient-Specific Early Seizure Detection from Scalp EEG

    PubMed Central

    Minasyan, Georgiy R.; Chatten, John B.; Chatten, Martha Jane; Harner, Richard N.

    2010-01-01

    Objective Develop a method for automatic detection of seizures prior to or immediately after clinical onset using features derived from scalp EEG. Methods This detection method is patient-specific. It uses recurrent neural networks and a variety of input features. For each patient we trained and optimized the detection algorithm for two cases: 1) during the period immediately preceding seizure onset, and 2) during the period immediately following seizure onset. Continuous scalp EEG recordings (duration 15–62 h, median 25 h) from 25 patients, including a total of 86 seizures, were used in this study. Results Pre-onset detection was successful in 14 of the 25 patients. For these 14 patients, all of the testing seizures were detected prior to seizure onset with a median pre-onset time of 51 sec and a false positive rate of 0.06/h. Post-onset detection had 100% sensitivity, a 0.023/h false positive rate and a median delay of 4 sec after onset. Conclusions The unique results of this study relate to pre-onset detection. Significance Our results suggest that reliable pre-onset seizure detection may be achievable for a significant subset of epilepsy patients without use of invasive electrodes. PMID:20461014

  9. IL-12-producing monocytes and HLA-E control HCMV-driven NKG2C+ NK cell expansion.

    PubMed

    Rölle, Alexander; Pollmann, Julia; Ewen, Eva-Maria; Le, Vu Thuy Khanh; Halenius, Anne; Hengel, Hartmut; Cerwenka, Adelheid

    2014-12-01

    Human cytomegalovirus (HCMV) infection is the most common cause of congenital viral infections and a major source of morbidity and mortality after organ transplantation. NK cells are pivotal effector cells in the innate defense against CMV. Recently, hallmarks of adaptive responses, such as memory-like features, have been recognized in NK cells. HCMV infection elicits the expansion of an NK cell subset carrying an activating receptor heterodimer, comprising CD94 and NKG2C (CD94/NKG2C), a response that resembles the clonal expansion of adaptive immune cells. Here, we determined that expansion of this NKG2C(+) subset and general NK cell recovery rely on signals derived from CD14(+) monocytes. In a coculture system, a subset of CD14(+) cells with inflammatory monocyte features produced IL-12 in response to HCMV-infected fibroblasts, and neutralization of IL-12 in this model substantially reduced CD25 upregulation and NKG2C(+) subset expansion. Finally, blockade of CD94/NKG2C on NK cells or silencing of the cognate ligand HLA-E in infected fibroblasts greatly impaired expansion of NKG2C(+) NK cells. Together, our results reveal that IL-12, CD14(+) cells, and the CD94/NKG2C/HLA-E axis are critical for the expansion of NKG2C(+) NK cells in response to HCMV infection. Moreover, strategies targeting the NKG2C(+) NK cell subset have the potential to be exploited in NK cell-based intervention strategies against viral infections and cancer.

  10. IL-12–producing monocytes and HLA-E control HCMV-driven NKG2C+ NK cell expansion

    PubMed Central

    Rölle, Alexander; Pollmann, Julia; Ewen, Eva-Maria; Le, Vu Thuy Khanh; Halenius, Anne; Hengel, Hartmut; Cerwenka, Adelheid

    2014-01-01

    Human cytomegalovirus (HCMV) infection is the most common cause of congenital viral infections and a major source of morbidity and mortality after organ transplantation. NK cells are pivotal effector cells in the innate defense against CMV. Recently, hallmarks of adaptive responses, such as memory-like features, have been recognized in NK cells. HCMV infection elicits the expansion of an NK cell subset carrying an activating receptor heterodimer, comprising CD94 and NKG2C (CD94/NKG2C), a response that resembles the clonal expansion of adaptive immune cells. Here, we determined that expansion of this NKG2C+ subset and general NK cell recovery rely on signals derived from CD14+ monocytes. In a coculture system, a subset of CD14+ cells with inflammatory monocyte features produced IL-12 in response to HCMV-infected fibroblasts, and neutralization of IL-12 in this model substantially reduced CD25 upregulation and NKG2C+ subset expansion. Finally, blockade of CD94/NKG2C on NK cells or silencing of the cognate ligand HLA-E in infected fibroblasts greatly impaired expansion of NKG2C+ NK cells. Together, our results reveal that IL-12, CD14+ cells, and the CD94/NKG2C/HLA-E axis are critical for the expansion of NKG2C+ NK cells in response to HCMV infection. Moreover, strategies targeting the NKG2C+ NK cell subset have the potential to be exploited in NK cell–based intervention strategies against viral infections and cancer. PMID:25384219

  11. Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data.

    PubMed

    Kasthurirathne, Suranga N; Dixon, Brian E; Gichoya, Judy; Xu, Huiping; Xia, Yuni; Mamlin, Burke; Grannis, Shaun J

    2017-05-01

    Existing approaches to derive decision models from plaintext clinical data frequently depend on medical dictionaries as the sources of potential features. Prior research suggests that decision models developed using non-dictionary based feature sourcing approaches and "off the shelf" tools could predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary based models to models built using features derived from medical dictionaries. We evaluated the detection of cancer cases from free text pathology reports using decision models built with combinations of dictionary or non-dictionary based feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristics (ROC) curve. Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70% and 90%. The source of features and feature subset size had no impact on the performance of a decision model. Our study suggests there is little value in leveraging medical dictionaries for extracting features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve comparable results to those built using medical dictionaries. Overall, this suggests that existing "off the shelf" approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER) based feature extraction, automated feature selection and modeling approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
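
    As a concrete illustration of such an "off the shelf", non-dictionary pipeline, the sketch below extracts n-gram features directly from report text, applies automated feature selection, and trains a standard classifier; the vectorizer settings, subset size, and toy reports are assumptions for illustration only.

```python
# Non-dictionary feature sourcing: n-grams from the reports themselves,
# automated feature selection, and an off-the-shelf classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("select", SelectKBest(chi2, k=10)),      # subset size is illustrative
    ("clf", LogisticRegression(max_iter=1000)),
])

reports = [
    "infiltrating ductal carcinoma identified in biopsy",
    "benign fibrous tissue with no evidence of malignancy",
    "high grade adenocarcinoma involving resection margin",
    "unremarkable specimen showing normal glandular tissue",
]
labels = [1, 0, 1, 0]
pipeline.fit(reports, labels)
print(pipeline.predict(["poorly differentiated carcinoma present"]))
```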

  12. Multi-color flow cytometry for evaluating age-related changes in memory lymphocyte subsets in dogs.

    PubMed

    Withers, Sita S; Moore, Peter F; Chang, Hong; Choi, Jin W; McSorley, Stephen J; Kent, Michael S; Monjazeb, Arta M; Canter, Robert J; Murphy, William J; Sparger, Ellen E; Rebhun, Robert B

    2018-05-31

    While dogs are increasingly being utilized as large-animal models of disease, important features of age-related immunosenescence in the dog have yet to be evaluated due to the lack of defined naïve vs. memory T lymphocyte phenotypes. We therefore performed multi-color flow cytometry on peripheral blood mononuclear cells from young and aged beagles, and determined the differential cytokine production by proposed memory subsets. CD4+ and CD8+ T lymphocytes in aged dogs displayed increased cytokine production, and decreased proliferative capacity. Antibodies targeting CD45RA and CD62L, but less so CD28 or CD44, defined canine cells that consistently exhibited properties of naïve-, central memory-, effector memory-, and terminal effector-like CD4+ and CD8+ T lymphocyte subsets. Older dogs demonstrated decreased frequencies of naïve-like CD4+ and CD8+ T lymphocytes, and an increased frequency of terminal effector-like CD8+ T lymphocytes. Overall findings revealed that aged dogs displayed features of immunosenescence similar to those reported in other species. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. ROC analysis of lesion descriptors in breast ultrasound images

    NASA Astrophysics Data System (ADS)

    Andre, Michael P.; Galperin, Michael; Phan, Peter; Chiu, Peter

    2003-05-01

    Breast biopsy serves as the key diagnostic tool in the evaluation of breast masses for malignancy, yet the procedure affects patients physically and emotionally and may obscure results of future mammograms. Studies show that high quality ultrasound can distinguish benign from malignant lesions accurately; however, it has proven difficult to teach, and clinical results are highly variable. The purpose of this study is to develop a means to optimize an automated Computer Aided Imaging System (CAIS) to assess Level of Suspicion (LOS) of a breast mass. We examine the contribution of 15 object features to lesion classification by calculating the Wilcoxon area under the ROC curve, Aw, for all combinations in a set of 146 masses with known findings. For each interval of Aw, the frequency of appearance of each feature and its combinations with others was computed as a means to find an "optimum" feature vector. The original set of 15 was reduced to 6 (area, perimeter, Feret diameter Y, relief, homogeneity, average energy), with an improvement from Aw=0.82±0.04 for the original 15 to Aw=0.93±0.02 for the subset of 6, p=0.03. For comparison, two sub-specialty mammography radiologists also scored the images for LOS, resulting in Az of 0.90 and 0.87. The CAIS performed significantly better, p=0.02.
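
    The subset search the authors describe can be illustrated with a brute-force scan over small feature combinations scored by the area under the ROC curve; the synthetic data, classifier, and combination sizes below are assumptions for demonstration, not the CAIS implementation.

```python
# Exhaustive search over small feature subsets, scoring each combination
# by the (Wilcoxon/Mann-Whitney) area under the ROC curve.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(146, 15))        # 146 masses x 15 descriptors (synthetic)
y = rng.integers(0, 2, size=146)      # benign (0) vs. malignant (1)

best_auc, best_subset = 0.0, None
for k in range(1, 4):                 # limiting k keeps the scan tractable
    for subset in combinations(range(X.shape[1]), k):
        proba = cross_val_predict(LogisticRegression(max_iter=1000),
                                  X[:, subset], y, cv=5,
                                  method="predict_proba")[:, 1]
        auc = roc_auc_score(y, proba)
        if auc > best_auc:
            best_auc, best_subset = auc, subset
print(best_subset, round(best_auc, 3))
```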

  14. Unique cortical physiology associated with ipsilateral hand movements and neuroprosthetic implications.

    PubMed

    Wisneski, Kimberly J; Anderson, Nicholas; Schalk, Gerwin; Smyth, Matt; Moran, Daniel; Leuthardt, Eric C

    2008-12-01

    Brain computer interfaces (BCIs) offer little direct benefit to patients with hemispheric stroke because current platforms rely on signals derived from the contralateral motor cortex (the same region injured by the stroke). For BCIs to assist hemiparetic patients, the implant must use unaffected cortex ipsilateral to the affected limb. This requires the identification of distinct electrophysiological features from the motor cortex associated with ipsilateral hand movements. In this study we examined 6 patients undergoing temporary placement of intracranial electrode arrays. Electrocorticographic (ECoG) signals were recorded while the subjects engaged in specific ipsilateral or contralateral hand motor tasks. Spectral changes were identified with regard to frequency, location, and timing. Ipsilateral hand movements were associated with electrophysiological changes that occur in lower frequency spectra, at distinct anatomic locations, and earlier than changes associated with contralateral hand movements. In a subset of 3 patients, features specific to ipsilateral and contralateral hand movements were used to control a cursor on a screen in real time. Ipsilaterally derived control was optimal with lower-frequency spectra. There are distinctive cortical electrophysiological features associated with ipsilateral movements which can be used for device control. These findings have implications for patients with hemispheric stroke because they offer a potential methodology by which a single hemisphere can be used to enhance function lost to stroke-induced hemiparesis.

  15. Automated global structure extraction for effective local building block processing in XCS.

    PubMed

    Butz, Martin V; Pelikan, Martin; Llorà, Xavier; Goldberg, David E

    2006-01-01

    Learning Classifier Systems (LCSs), such as the accuracy-based XCS, evolve distributed problem solutions represented by a population of rules. During evolution, features are specialized, propagated, and recombined to provide increasingly accurate subsolutions. Recently, it was shown that, as in conventional genetic algorithms (GAs), some problems require efficient processing of subsets of features to find problem solutions efficiently. In such problems, standard variation operators of genetic and evolutionary algorithms used in LCSs suffer from potential disruption of groups of interacting features, resulting in poor performance. This paper introduces efficient crossover operators to XCS by incorporating techniques derived from competent GAs: the extended compact GA (ECGA) and the Bayesian optimization algorithm (BOA). Instead of simple crossover operators such as uniform crossover or one-point crossover, ECGA or BOA-derived mechanisms are used to build a probabilistic model of the global population and to generate offspring classifiers locally using the model. Several offspring generation variations are introduced and evaluated. The results show that it is possible to achieve performance similar to runs with an informed crossover operator that is specifically designed to yield ideal problem-dependent exploration, exploiting provided problem structure information. Thus, we create the first competent LCSs, XCS/ECGA and XCS/BOA, that detect dependency structures online and propagate corresponding lower-level dependency structures effectively without any information about these structures given in advance.

  16. On sample size and different interpretations of snow stability datasets

    NASA Astrophysics Data System (ADS)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: how capable are such stability interpretations of supporting reliable conclusions? There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale; and (iii) that the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional scale stability variations are quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined whether the complete dataset constitutes an appropriate sample size. (ii) Smaller subsets were created with aspect distributions similar to that of the complete dataset. We used 100 different subsets for each sample size. Statistical variations obtained in the complete dataset were also tested on the smaller subsets using the Mann-Whitney or the Kruskal-Wallis test. For each subset size, the number of subsets in which the significance level was reached was counted. For these tests no nominal data scale was assumed. (iii) For the same subsets described above, the distribution of the aspect median was determined, and we counted how often this distribution was substantially different from the distribution obtained with the complete dataset. Since two valid stability interpretations were available (an objective and a subjective interpretation as described above), the effect of the arbitrary choice of the interpretation on spatial variability results was tested. In over one third of the cases the two interpretations came to different results. The effect of these differences was studied with a method similar to that described in (iii): the distribution of the aspect median was determined for subsets of the complete dataset using both interpretations, and compared against each other as well as to the results of the complete dataset. For the complete dataset the two interpretations showed mainly identical results. Therefore, the subset size was determined from the point at which the results of the two interpretations converged.
A universal result for the optimal subset size cannot be presented since results differed between the different situations contained in the dataset. The optimal subset size is thus dependent on the stability variation in a given situation, which is unknown initially. There are indications that for some situations even the complete dataset might not be large enough. At a subset size of approximately 25, the significant differences between aspect groups (as determined using the whole dataset) were only obtained in one out of five situations. In some situations, up to 20% of the subsets showed a substantially different distribution of the aspect median. Thus, in most cases, 25 measurements (which can be achieved by six two-person teams in one day) did not allow reliable conclusions to be drawn.

  17. Design and Application of Drought Indexes in Highly Regulated Mediterranean Water Systems

    NASA Astrophysics Data System (ADS)

    Castelletti, A.; Zaniolo, M.; Giuliani, M.

    2017-12-01

    Costs of drought are progressively increasing due to the ongoing alteration of hydro-meteorological regimes induced by climate change. Although drought management is largely studied in the literature, most of the traditional drought indexes fail in detecting critical events in highly regulated systems, which generally rely on ad-hoc formulations that cannot be generalized to different contexts. In this study, we contribute a novel framework for the design of a basin-customized drought index. This index represents a surrogate of the state of the basin and is computed by combining the available information about the water available in the system to reproduce a representative target variable for the drought condition of the basin (e.g., water deficit). To select the relevant variables and combinations thereof, we use an advanced feature extraction algorithm called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS). W-QEISS relies on a multi-objective evolutionary algorithm to find Pareto-efficient subsets of variables by maximizing the wrapper accuracy, minimizing the number of selected variables, and optimizing relevance and redundancy of the subset. The accuracy objective is evaluated through the calibration of an extreme learning machine of the water deficit for each candidate subset of variables, with the index selected from the resulting solutions identifying a suitable compromise between accuracy, cardinality, relevance, and redundancy. The approach is tested on Lake Como, Italy, a regulated lake mainly operated for irrigation supply. In the absence of an institutional drought monitoring system, we constructed the combined index using all the hydrological variables from the existing monitoring system as well as common drought indicators at multiple time aggregations. The soil moisture deficit in the root zone computed by a distributed-parameter water balance model of the agricultural districts is used as target variable. Numerical results show that our combined drought index successfully reproduces the deficit. The index provides valuable information for supporting appropriate drought management strategies, including the possibility of directly informing the lake operations about the drought conditions and improving the overall reliability of the irrigation supply system.
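
    W-QEISS couples a multi-objective evolutionary search with extreme learning machines; as a rough single-objective stand-in, the sketch below illustrates only the wrapper idea, scoring candidate variable subsets by the cross-validated skill of a small model and trading accuracy against cardinality through a minimum-gain threshold. All names, model choices, and thresholds are illustrative assumptions.

```python
# Greedy wrapper selection: repeatedly add the variable that most improves
# cross-validated R^2 of a small neural network, stopping when the gain no
# longer justifies enlarging the subset.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

def greedy_wrapper(X, y, min_gain=0.01):
    selected, remaining = [], list(range(X.shape[1]))
    best = -np.inf
    while remaining:
        scored = []
        for j in remaining:
            model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
            score = cross_val_score(model, X[:, selected + [j]], y, cv=3).mean()
            scored.append((score, j))
        score, j = max(scored)
        if score - best < min_gain:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected, best
```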

  18. Physical function metric over measure: An illustration with the Patient-Reported Outcomes Measurement Information System (PROMIS) and the Functional Assessment of Cancer Therapy (FACT).

    PubMed

    Kaat, Aaron J; Schalet, Benjamin D; Rutsohn, Joshua; Jensen, Roxanne E; Cella, David

    2018-01-01

    Measuring patient-reported outcomes (PROs) is becoming an integral component of quality improvement initiatives, clinical care, and research studies in cancer, including comparative effectiveness research. However, the number of PROs limits comparability across studies. Herein, the authors attempted to link the Functional Assessment of Cancer Therapy-General Physical Well-Being (FACT-G PWB) subscale with the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) calibrated item bank. They also sought to augment a subset of the conceptually most similar FACT-G PWB items with PROMIS PF items to improve the linking. Baseline data from 5506 participants in the Measuring Your Health (MY-Health) study were used to identify the optimal items for linking FACT-G PWB with PROMIS PF. A mixed methods approach identified the optimal items for creating the 5-item FACT/PROMIS-PF5 scale. Both the linked and augmented relationships were cross-validated using the follow-up MY-Health data. A 5-item FACT-G PWB item subset was found to be optimal for linking with PROMIS PF. In addition, a 2-item subset, including only items that were conceptually very similar to the PROMIS item bank content, was augmented with 3 PROMIS PF items. This new FACT/PROMIS-PF5 provided superior score recovery. The PROMIS PF metric allows for the evaluation of the extent to which similar questionnaires can be linked and therefore expressed on the same metric. These results allow for the aggregation of existing data and provide an optimal measure for future studies wishing to use the FACT yet also report on the PROMIS PF metric. Cancer 2018;124:153-60. © 2017 American Cancer Society.

  19. Feature extraction through parallel Probabilistic Principal Component Analysis for heart disease diagnosis

    NASA Astrophysics Data System (ADS)

    Shah, Syed Muhammad Saqlain; Batool, Safeera; Khan, Imran; Ashraf, Muhammad Usman; Abbas, Syed Hussnain; Hussain, Syed Adnan

    2017-09-01

    Automatic diagnosis of human diseases is mostly achieved through decision support systems. The performance of these systems is mainly dependent on the selection of the most relevant features. This becomes harder when the dataset contains missing values for the different features. Probabilistic Principal Component Analysis (PPCA) has a reputation for dealing with the problem of missing attribute values. This research presents a methodology which uses the results of medical tests as input, extracts a reduced dimensional feature subset and provides diagnosis of heart disease. The proposed methodology extracts high impact features in a new projection by using Probabilistic Principal Component Analysis (PPCA). PPCA extracts projection vectors which contribute the highest covariance, and these projection vectors are used to reduce the feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The feature subset with the reduced dimension is provided to radial basis function (RBF) kernel based Support Vector Machines (SVM). The RBF-based SVM serves the purpose of classification into two categories, i.e., Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over three UCI datasets, i.e., Cleveland, Switzerland and Hungarian. The statistical results achieved through the proposed technique are presented in comparison to the existing research, showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for the Cleveland, Hungarian and Switzerland datasets, respectively.
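
    The pipeline the abstract describes (a probabilistic PCA projection, with the number of components chosen by Parallel Analysis, feeding an RBF-kernel SVM) can be sketched as follows; scikit-learn's PCA is used here as a stand-in for PPCA (it has a probabilistic interpretation but no built-in handling of missing values), and the component count and data shapes are illustrative assumptions.

```python
# Dimensionality reduction followed by an RBF-kernel SVM classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))     # 13 attributes, as in the UCI heart data
y = rng.integers(0, 2, size=300)   # Heart Patient (1) vs. Normal Subject (0)

model = make_pipeline(StandardScaler(),
                      PCA(n_components=6),        # count would come from PA
                      SVC(kernel="rbf", gamma="scale"))
print(cross_val_score(model, X, y, cv=5).mean())
```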

  20. Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief.

    PubMed

    Douglas, P K; Harris, Sam; Yuille, Alan; Cohen, Mark S

    2011-05-15

    Machine learning (ML) has become a popular tool for mining functional neuroimaging data, and there are now hopes of performing such analyses efficiently in real-time. Towards this goal, we compared accuracy of six different ML algorithms applied to neuroimaging data of persons engaged in a bivariate task, asserting their belief or disbelief of a variety of propositional statements. We performed unsupervised dimension reduction and automated feature extraction using independent component (IC) analysis and extracted IC time courses. Optimization of classification hyperparameters across each classifier occurred prior to assessment. Maximum accuracy was achieved at 92% for Random Forest, followed by 91% for AdaBoost, 89% for Naïve Bayes, 87% for a J48 decision tree, 86% for K*, and 84% for support vector machine. For real-time decoding applications, finding a parsimonious subset of diagnostic ICs might be useful. We used a forward search technique to sequentially add ranked ICs to the feature subspace. For the current data set, we determined that approximately six ICs represented a meaningful basis set for classification. We then projected these six IC spatial maps forward onto a later scanning session within subject. We then applied the optimized ML algorithms to these new data instances, and found that classification accuracy results were reproducible. Additionally, we compared our classification method to our previously published general linear model results on this same data set. The highest ranked IC spatial maps show similarity to brain regions associated with contrasts for belief > disbelief, and disbelief > belief. Copyright © 2010 Elsevier Inc. All rights reserved.
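
    The forward search over ranked components can be sketched as follows; the classifier choice, synthetic data, and ranking are illustrative assumptions (the study compared six algorithms, with Random Forest performing best).

```python
# Forward search: add ICs in rank order and keep the subset size that
# maximizes cross-validated accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))     # trials x 20 ranked IC features (synthetic)
y = rng.integers(0, 2, size=120)   # belief (1) vs. disbelief (0)

accs = []
for k in range(1, X.shape[1] + 1):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    accs.append(cross_val_score(clf, X[:, :k], y, cv=5).mean())
best_k = int(np.argmax(accs)) + 1
print(best_k, round(max(accs), 3))
```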

  1. Optimization of OSEM parameters in myocardial perfusion imaging reconstruction as a function of body mass index: a clinical approach*

    PubMed Central

    de Barros, Pietro Paolo; Metello, Luis F.; Camozzato, Tatiane Sabriela Cagol; Vieira, Domingos Manuel da Silva

    2015-01-01

    Objective The present study is aimed at contributing to identify the most appropriate OSEM parameters to generate myocardial perfusion imaging reconstructions with the best diagnostic quality, correlating them with patients’ body mass index. Materials and Methods The present study included 28 adult patients submitted to myocardial perfusion imaging in a public hospital. The OSEM method was utilized in the image reconstruction with six different combinations of iteration and subset numbers. The images were analyzed by nuclear cardiology specialists taking their diagnostic value into consideration and indicating the most appropriate images in terms of diagnostic quality. Results An overall scoring analysis demonstrated that the combination of four iterations and four subsets generated the most appropriate images in terms of diagnostic quality for all the classes of body mass index; however, the role played by the combination of six iterations and four subsets is highlighted in relation to the higher body mass index classes. Conclusion The use of optimized parameters seems to play a relevant role in the generation of images with better diagnostic quality, ensuring the diagnosis and the consequent appropriate and effective treatment for the patient. PMID:26543282
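
    For context, ordered-subsets expectation maximization updates the image one projection subset at a time; a standard form of the sub-iteration update (the usual Hudson-Larkin formulation, with notation assumed here rather than taken from the article) is:

```latex
% \lambda_j: estimated activity in voxel j; a_{ij}: system matrix element;
% y_i: measured counts in projection bin i; S_m: the m-th ordered subset.
\lambda_j^{(m+1)} \;=\; \frac{\lambda_j^{(m)}}{\sum_{i \in S_m} a_{ij}}
\sum_{i \in S_m} a_{ij}\, \frac{y_i}{\sum_{k} a_{ik}\, \lambda_k^{(m)}}
```

    One full iteration sweeps through all subsets, which is why the product of iteration and subset numbers governs the trade-off between convergence and noise that the study evaluates.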

  2. Learning discriminative functional network features of schizophrenia

    NASA Astrophysics Data System (ADS)

    Gheiratmand, Mina; Rish, Irina; Cecchi, Guillermo; Brown, Matthew; Greiner, Russell; Bashivan, Pouya; Polosecki, Pablo; Dursun, Serdar

    2017-03-01

    Associating schizophrenia with disrupted functional connectivity is a central idea in schizophrenia research. However, identifying neuroimaging-based features that can serve as reliable "statistical biomarkers" of the disease remains a challenging open problem. We argue that generalization accuracy and stability of candidate features ("biomarkers") must be used as additional criteria on top of standard significance tests in order to discover more robust biomarkers. Generalization accuracy refers to the utility of biomarkers for making predictions about individuals, for example discriminating between patients and controls, in novel datasets. Feature stability refers to the reproducibility of the candidate features across different datasets. Here, we extracted functional connectivity network features from fMRI data at both high resolution (voxel level) and a spatially down-sampled lower resolution ("supervoxel" level). At the supervoxel level, we used whole-brain network links, while at the voxel level, due to the intractably large number of features, we sampled a subset of them. We compared statistical significance, stability and discriminative utility of both feature types in a multi-site fMRI dataset, composed of schizophrenia patients and healthy controls. For both feature types, a considerable fraction of features showed significant differences between the two groups. Also, both feature types were similarly stable across multiple data subsets. However, the whole-brain supervoxel functional connectivity features showed a higher cross-validation classification accuracy of 78.7% vs. 72.4% for the voxel-level features. Cross-site variability and heterogeneity in the patient samples in the multi-site FBIRN dataset made the task more challenging compared to single-site studies. The use of the above methodology, in combination with the fully data-driven approach using whole-brain information, has the potential to shed light on "biomarker discovery" in schizophrenia.

  3. Upper bounds on sequential decoding performance parameters

    NASA Technical Reports Server (NTRS)

    Jelinek, F.

    1974-01-01

    This paper presents the best obtainable random coding and expurgated upper bounds on the probabilities of undetectable error, of t-order failure (advance to depth t into an incorrect subset), and of likelihood rise in the incorrect subset, applicable to sequential decoding when the metric bias G is arbitrary. Upper bounds on the Pareto exponent are also presented. The G-values optimizing each of the parameters of interest are determined, and are shown to lie in intervals that in general have nonzero widths. The G-optimal expurgated bound on undetectable error is shown to agree with that for maximum likelihood decoding of convolutional codes, and that on failure agrees with the block code expurgated bound. Included are curves evaluating the bounds for interesting choices of G and SNR for a binary-input quantized-output Gaussian additive noise channel.

  4. Chemical library subset selection algorithms: a unified derivation using spatial statistics.

    PubMed

    Hamprecht, Fred A; Thiel, Walter; van Gunsteren, Wilfred F

    2002-01-01

    If similar compounds have similar activity, rational subset selection becomes superior to random selection in screening for pharmacological lead discovery programs. Traditional approaches to this experimental design problem fall into two classes: (i) a linear or quadratic response function is assumed; (ii) some space-filling criterion is optimized. The assumptions underlying the first approach are clear but not always defensible; the second approach yields more intuitive designs but lacks a clear theoretical foundation. We model activity in a bioassay as the realization of a stochastic process and use the best linear unbiased estimator to construct spatial sampling designs that optimize the integrated mean square prediction error, the maximum mean square prediction error, or the entropy. We argue that our approach constitutes a unifying framework encompassing most proposed techniques as limiting cases and sheds light on their underlying assumptions. In particular, vector quantization is obtained, in dimensions up to eight, in the limiting case of very smooth response surfaces for the integrated mean square error criterion. Closest packing is obtained for very rough surfaces under the integrated mean square error and entropy criteria. We suggest using either the integrated mean square prediction error or the entropy as optimization criteria rather than approximations thereof, and propose a scheme for direct iterative minimization of the integrated mean square prediction error. Finally, we discuss how the quality of chemical descriptors manifests itself and clarify the assumptions underlying the selection of diverse or representative subsets.
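
    In kriging/best-linear-unbiased-prediction notation (the symbols here are assumed for illustration, not taken from the paper), the two mean-square design criteria can be written as:

```latex
% \hat{Y}(x): BLUE prediction of the activity Y(x) at descriptor-space
% point x, computed from the assayed design D; \mathcal{X}: the
% descriptor space spanned by the library.
\mathrm{IMSE}(D) \;=\; \int_{\mathcal{X}} \mathbb{E}\!\left[\big(\hat{Y}(x) - Y(x)\big)^{2}\right]\mathrm{d}x,
\qquad
\mathrm{MMSE}(D) \;=\; \max_{x \in \mathcal{X}} \mathbb{E}\!\left[\big(\hat{Y}(x) - Y(x)\big)^{2}\right]
```

    The entropy criterion instead maximizes the information carried by the observations at the selected design points.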

  5. Dense mesh sampling for video-based facial animation

    NASA Astrophysics Data System (ADS)

    Peszor, Damian; Wojciechowska, Marzena

    2016-06-01

    The paper describes an approach for selecting feature points on a three-dimensional triangle mesh obtained using various techniques from several video recordings. This approach has a dual purpose. First, it minimizes the data stored for the purpose of facial animation, so that instead of storing the position of each vertex in each frame, one could store only a small subset of vertices for each frame and calculate the positions of others based on the subset. The second purpose is to select feature points that could be used for anthropometry-based retargeting of recorded mimicry to another model, with a sampling density beyond that which can be achieved using marker-based performance capture techniques. The developed approach was successfully tested on artificial models, models constructed using a structured light scanner, and models constructed from video footage using stereophotogrammetry.

  6. Evaluating and optimizing the NERSC workload on Knights Landing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnes, T; Cook, B; Deslippe, J

    2017-01-30

    NERSC has partnered with 20 representative application teams to evaluate performance on the Xeon-Phi Knights Landing architecture and develop an application-optimization strategy for the greater NERSC workload on the recently installed Cori system. In this article, we present early case studies and summarized results from a subset of the 20 applications highlighting the impact of important architecture differences between the Xeon-Phi and traditional Xeon processors. We summarize the status of the applications and describe the greater optimization strategy that has formed.

  7. Evaluating and Optimizing the NERSC Workload on Knights Landing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnes, Taylor; Cook, Brandon; Doerfler, Douglas

    2016-01-01

    NERSC has partnered with 20 representative application teams to evaluate performance on the Xeon-Phi Knights Landing architecture and develop an application-optimization strategy for the greater NERSC workload on the recently installed Cori system. In this article, we present early case studies and summarized results from a subset of the 20 applications highlighting the impact of important architecture differences between the Xeon-Phi and traditional Xeon processors. We summarize the status of the applications and describe the greater optimization strategy that has formed.

  8. Differential diagnosis of CT focal liver lesions using texture features, feature selection and ensemble driven classifiers.

    PubMed

    Mougiakakou, Stavroula G; Valavanis, Ioannis K; Nikita, Alexandra; Nikita, Konstantina S

    2007-09-01

    The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed. A number of regions of interest (ROIs) corresponding to C1-C4 were defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first order statistics, spatial gray level dependence matrix, gray level difference method, Laws' texture energy measures, and fractal dimension measurements. Two different ECs were constructed and compared. The first one consists of five multilayer perceptron neural networks (NNs), each using as input one of the computed texture feature sets or its reduced version after genetic algorithm-based feature selection. The second EC comprised five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN, and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was extracted by using appropriate voting schemes, while bootstrap re-sampling was utilized in order to estimate the generalization ability of the CAD architectures based on the available relatively small-sized data set. The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set, and the weighted voting scheme. The fused feature set was obtained after appropriate feature selection applied to specific subsets of the original feature set. The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, fed with identical feature sets obtained after appropriate feature selection and fusion, may result in an accurate system able to assist differential diagnosis of focal liver lesions from non-enhanced CT images.
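
    A minimal sketch of the second ensemble's voting stage, assuming the texture features have already been extracted, selected, and fused into a matrix X; scikit-learn offers no probabilistic neural network, so the member list below (one MLP plus three k-NN classifiers) only approximates the paper's primary classifiers, and the voting weights are illustrative.

```python
# Heterogeneous classifier ensemble combined by weighted soft voting.
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

ensemble = VotingClassifier(
    estimators=[
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)),
        ("knn1", KNeighborsClassifier(n_neighbors=1)),
        ("knn3", KNeighborsClassifier(n_neighbors=3)),
        ("knn5", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",          # average predicted class probabilities
    weights=[2, 1, 1, 1],   # illustrative per-member weights
)
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_test)
```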

  9. A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.

    PubMed

    Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho

    2014-10-01

    Many studies based on microRNA (miRNA) expression profiles have shown a new aspect of cancer classification. Because one characteristic of miRNA expression data is its high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. These feature selection methods have had one shortcoming thus far: they consider only the cases where the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such miRNAs tend to be ranked low by traditional feature selection methods and are removed most of the time. In view of the limited number of miRNAs, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all problems (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to construct cancer classification. Then, we chose the Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with using only high-ranking miRNAs in traditional feature selection methods. Our results demonstrate the positive effect of low-ranking miRNAs in cancer classification when an m:n feature subset is used.

  10. Criteria to Extract High-Quality Protein Data Bank Subsets for Structure Users.

    PubMed

    Carugo, Oliviero; Djinović-Carugo, Kristina

    2016-01-01

    It is often necessary to build subsets of the Protein Data Bank to extract structural trends and average values. For this purpose it is mandatory that the subsets are non-redundant and of high quality. The first problem can be solved relatively easily at the sequence level or at the structural level. The second, on the contrary, needs special attention. It is not sufficient, in fact, to consider the crystallographic resolution; other features must be taken into account: the absence of strings of residues from the electron density maps and from the files deposited in the Protein Data Bank; the B-factor values; the appropriate validation of the structural models; the quality of the electron density maps, which is not uniform; and the temperature of the diffraction experiments. More stringent criteria produce smaller subsets, which can be enlarged with more tolerant selection criteria. The incessant growth of the Protein Data Bank and especially of the number of high-resolution structures is allowing the use of more stringent selection criteria, with a consequent improvement of the quality of the subsets of the Protein Data Bank.

  11. An Overlay Architecture for Throughput Optimal Multipath Routing

    DTIC Science & Technology

    2017-01-14

    An Overlay Architecture for Throughput Optimal Multipath Routing Nathaniel M. Jones, Georgios S. Paschos, Brooke Shrader, and Eytan Modiano...decisions. In this work, we study an overlay architecture for dynamic routing such that only a subset of devices (overlay nodes) need to make dynamic routing...a legacy network. Network overlays are frequently used to deploy new communication architectures in legacy networks [13]. To accomplish this, messages

  12. 'Dressed for success' C-type lectin receptors for the delivery of glyco-vaccines to dendritic cells.

    PubMed

    Unger, Wendy W J; van Kooyk, Yvette

    2011-02-01

    Current strategies in immunotherapy for the treatment of tumors or autoimmunity focus on direct in vivo targeting of antigens to dendritic cells (DC), as these cells are the key regulators of immune responses. Multiple DC subsets can be distinguished in both humans and mice, based on phenotype and location. Moreover, recent data show that these subsets have distinct functions. All these features have implications for the design of DC-targeting vaccines. In this review we integrate recent knowledge on the different DC subsets in human and mice and how DC-expressed C-type lectin receptors (CLR) can be exploited for the induction of either antigen-specific immunity or tolerance. Copyright © 2010 Elsevier Ltd. All rights reserved.

  13. Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships.

    PubMed

    Janet, Jon Paul; Kulik, Heather J

    2017-11-22

    Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-0.005 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.
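
    For orientation, the standard autocorrelation that the RACs revise can be written as below (a textbook Moreau-Broto-style definition with assumed notation; the revised descriptors additionally restrict the starting atoms and scope and also use property differences):

```latex
% P_i: heuristic atomic property of atom i (e.g., electronegativity,
% size, connectivity); d_{ij}: graph (bond-path) distance between atoms
% i and j; the descriptor sums over all atom pairs exactly d bonds apart.
\mathrm{AC}(P, d) \;=\; \sum_{i} \sum_{j} P_i\, P_j\, \delta(d_{ij}, d)
```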

  14. A Novel Protocol for Model Calibration in Biological Wastewater Treatment

    PubMed Central

    Zhu, Ao; Guo, Jianhua; Ni, Bing-Jie; Wang, Shuying; Yang, Qing; Peng, Yongzhen

    2015-01-01

    Activated sludge models (ASMs) have been widely used for process design, operation and optimization in wastewater treatment plants. However, it is still a challenge to achieve an efficient calibration for reliable application by using the conventional approaches. Here, we propose a novel calibration protocol, i.e. the Numerical Optimal Approaching Procedure (NOAP), for the systematic calibration of ASMs. The NOAP consists of three key steps in an iterative scheme flow: i) global factor sensitivity analysis for factor fixing; ii) pseudo-global parameter correlation analysis for detection of non-identifiable factors; and iii) formation of a parameter subset through estimation using a genetic algorithm. The validity and applicability are confirmed using experimental data obtained from two independent wastewater treatment systems, including a sequencing batch reactor and a continuous stirred-tank reactor. The results indicate that the NOAP can effectively determine the optimal parameter subset and successfully perform model calibration and validation for these two different systems. The proposed NOAP is expected to be useful for the automatic calibration of ASMs and to be potentially applicable to other ordinary differential equation models. PMID:25682959

  15. ARResT/AssignSubsets: a novel application for robust subclassification of chronic lymphocytic leukemia based on B cell receptor IG stereotypy.

    PubMed

    Bystry, Vojtech; Agathangelidis, Andreas; Bikos, Vasilis; Sutton, Lesley Ann; Baliakas, Panagiotis; Hadzidimitriou, Anastasia; Stamatopoulos, Kostas; Darzentas, Nikos

    2015-12-01

    An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for ∼30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, and yet unmet, requirement. We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set, and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution. ARResT/AssignSubsets is freely available on the web at http://bat.infspire.org/arrest/assignsubsets/. Contact: nikos.darzentas@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Optimizing procedures for a human genome repository

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nierman, W.C.

    1991-03-01

    Large numbers of clones will be generated during the Human Genome Project. As each is characterized, subsets will be identified which are useful to the scientific community at large. These subsets are most readily distributed through public repositories. The American Type Culture Collection (ATCC) is experienced in repository operation, but before this project had no history in managing clones and associated information in large batches instead of individually. This project permitted the ATCC to develop several procedures for automating and thus reducing the cost of characterizing, preserving, and maintaining information about clones.

  17. Web-Enabled Distributed Health-Care Framework for Automated Malaria Parasite Classification: an E-Health Approach.

    PubMed

    Maity, Maitreya; Dhane, Dhiraj; Mungle, Tushar; Maiti, A K; Chakraborty, Chandan

    2017-10-26

    A web-enabled e-healthcare system or computer-assisted disease diagnosis has the potential to improve the quality and service of the conventional healthcare delivery approach. The article describes the design and development of a web-based distributed healthcare management system for medical information and quantitative evaluation of microscopic images using a machine learning approach for malaria. In the proposed study, all the health-care centres are connected in a distributed computer network. Each peripheral centre manages its own health-care service independently and communicates with the central server for remote assistance. The proposed methodology for automated evaluation of parasites includes pre-processing of blood smear microscopic images followed by erythrocyte segmentation. To differentiate between different parasites, a total of 138 quantitative features characterising colour, morphology, and texture are extracted from segmented erythrocytes. An integrated pattern classification framework is designed where four feature selection methods, viz. Correlation-based Feature Selection (CFS), Chi-square, Information Gain, and RELIEF, are employed with three different classifiers, i.e. Naive Bayes, C4.5, and Instance-Based Learning (IB1), individually. The optimal feature subset with the best classifier is selected for achieving maximum diagnostic precision. The proposed method achieved 99.2% sensitivity and 99.6% specificity by combining CFS and C4.5, in comparison with other methods. Moreover, the web-based tool is entirely designed using open standards like Java for the web application, ImageJ for image processing, and WEKA for data mining, considering its feasibility in rural places with minimal health care facilities.

  18. Quality optimization of H.264/AVC video transmission over noisy environments using a sparse regression framework

    NASA Astrophysics Data System (ADS)

    Pandremmenou, K.; Tziortziotis, N.; Paluri, S.; Zhang, W.; Blekas, K.; Kondi, L. P.; Kumar, S.

    2015-03-01

    We propose the use of the Least Absolute Shrinkage and Selection Operator (LASSO) regression method in order to predict the Cumulative Mean Squared Error (CMSE) incurred by the loss of individual slices in video transmission. We extract a number of quality-relevant features from the H.264/AVC video sequences, which are given as input to the LASSO. This method has the benefit of not only keeping a subset of the features that have the strongest effects on video quality, but also of producing accurate CMSE predictions. In particular, we study LASSO regression through two different architectures: the Global LASSO (G.LASSO) and the Local LASSO (L.LASSO). In G.LASSO, a single regression model is trained for all slice types together, while in L.LASSO, motivated by the fact that the values of some features are closely dependent on the considered slice type, each slice type has its own regression model, in an effort to improve LASSO's prediction capability. Based on the predicted CMSE values, we group the video slices into four priority classes. Additionally, we consider a video transmission scenario over a noisy channel, where Unequal Error Protection (UEP) is applied to all prioritized slices. The provided results demonstrate the efficiency of LASSO in estimating CMSE with high accuracy, using only a few features.
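
    A minimal sketch of the G.LASSO idea under assumed names and shapes: one L1-penalized regression over all slices, where the penalty simultaneously fits the CMSE predictor and zeroes out weakly informative features.

```python
# LASSO regression for CMSE prediction with built-in feature selection.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))           # 30 slice-level features (synthetic)
w = np.zeros(30)
w[:4] = [3.0, -2.0, 1.5, 0.5]            # only 4 features truly matter
y = X @ w + 0.1 * rng.normal(size=500)   # synthetic CMSE target

lasso = LassoCV(cv=5).fit(StandardScaler().fit_transform(X), y)
print("features kept:", np.flatnonzero(lasso.coef_))
```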

  19. Machine Learning Techniques for the Detection of Shockable Rhythms in Automated External Defibrillators

    PubMed Central

    Irusta, Unai; Morgado, Eduardo; Aramendi, Elisabete; Ayala, Unai; Wik, Lars; Kramer-Johansen, Jo; Eftestøl, Trygve; Alonso-Atienza, Felipe

    2016-01-01

    Early recognition of ventricular fibrillation (VF) and electrical therapy are key for the survival of out-of-hospital cardiac arrest (OHCA) patients treated with automated external defibrillators (AED). AED algorithms for VF-detection are customarily assessed using Holter recordings from public electrocardiogram (ECG) databases, which may be different from the ECG seen during OHCA events. This study evaluates VF-detection using data from both OHCA patients and public Holter recordings. ECG-segments of 4-s and 8-s duration were analyzed. For each segment 30 features were computed and fed to state-of-the-art machine learning (ML) algorithms. ML-algorithms with built-in feature selection capabilities were used to determine the optimal feature subsets for both databases. Patient-wise bootstrap techniques were used to evaluate algorithm performance in terms of sensitivity (Se), specificity (Sp) and balanced error rate (BER). Performance was significantly better for public data with a mean Se of 96.6%, Sp of 98.8% and BER 2.2% compared to a mean Se of 94.7%, Sp of 96.5% and BER 4.4% for OHCA data. OHCA data required two times more features than the data from public databases for an accurate detection (6 vs 3). No significant differences in performance were found for different segment lengths; the BER differences were below 0.5 points in all cases. Our results show that VF-detection is more challenging for OHCA data than for data from public databases, and that accurate VF-detection is possible with segments as short as 4-s. PMID:27441719

  20. Machine Learning Techniques for the Detection of Shockable Rhythms in Automated External Defibrillators.

    PubMed

    Figuera, Carlos; Irusta, Unai; Morgado, Eduardo; Aramendi, Elisabete; Ayala, Unai; Wik, Lars; Kramer-Johansen, Jo; Eftestøl, Trygve; Alonso-Atienza, Felipe

    2016-01-01

    Early recognition of ventricular fibrillation (VF) and electrical therapy are key for the survival of out-of-hospital cardiac arrest (OHCA) patients treated with automated external defibrillators (AED). AED algorithms for VF-detection are customarily assessed using Holter recordings from public electrocardiogram (ECG) databases, which may be different from the ECG seen during OHCA events. This study evaluates VF-detection using data from both OHCA patients and public Holter recordings. ECG-segments of 4-s and 8-s duration were analyzed. For each segment 30 features were computed and fed to state-of-the-art machine learning (ML) algorithms. ML-algorithms with built-in feature selection capabilities were used to determine the optimal feature subsets for both databases. Patient-wise bootstrap techniques were used to evaluate algorithm performance in terms of sensitivity (Se), specificity (Sp) and balanced error rate (BER). Performance was significantly better for public data with a mean Se of 96.6%, Sp of 98.8% and BER 2.2% compared to a mean Se of 94.7%, Sp of 96.5% and BER 4.4% for OHCA data. OHCA data required two times more features than the data from public databases for an accurate detection (6 vs 3). No significant differences in performance were found for different segment lengths; the BER differences were below 0.5 points in all cases. Our results show that VF-detection is more challenging for OHCA data than for data from public databases, and that accurate VF-detection is possible with segments as short as 4-s.

  1. Multiobjective optimization of combinatorial libraries.

    PubMed

    Agrafiotis, D K

    2002-01-01

    Combinatorial chemistry and high-throughput screening have caused a fundamental shift in the way chemists contemplate experiments. Designing a combinatorial library is a controversial art that involves a heterogeneous mix of chemistry, mathematics, economics, experience, and intuition. Although there seems to be little agreement as to what constitutes an ideal library, one thing is certain: seldom does a single property or measure define the quality of the design. In most real-world applications, a good experiment requires the simultaneous optimization of several, often conflicting, design objectives, some of which may be vague and uncertain. In this paper, we discuss a class of algorithms for subset selection rooted in the principles of multiobjective optimization. Our approach is to employ an objective function that encodes all of the desired selection criteria, and then use a simulated annealing or evolutionary approach to identify the optimal (or a nearly optimal) subset from among the vast number of possibilities. Many design criteria can be accommodated, including diversity, similarity to known actives, predicted activity and/or selectivity determined by quantitative structure-activity relationship (QSAR) models or receptor binding models, enforcement of certain property distributions, reagent cost and availability, and many others. The method is robust, convergent, and extensible, offers the user full control over the relative significance of the various objectives in the final design, and permits the simultaneous selection of compounds from multiple libraries in full- or sparse-array format.
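
    The selection step can be sketched as a simulated annealing loop over fixed-size subsets, with all design criteria folded into a single objective supplied by the caller; the function below is an illustrative sketch, not the author's implementation, and `objective` would combine, e.g., diversity, predicted activity, and reagent cost with user-chosen weights.

```python
# Simulated annealing over k-element compound subsets: propose swapping
# one selected compound for an unselected one, accept by the Metropolis
# rule, and geometrically cool the temperature.
import math
import random

def anneal_subset(pool_size, k, objective, steps=10000, t0=1.0, cooling=0.999):
    """Select k items from range(pool_size) maximizing objective(subset)."""
    current = set(random.sample(range(pool_size), k))
    score = objective(current)
    best, best_score = set(current), score
    t = t0
    for _ in range(steps):
        out = random.choice(tuple(current))
        inn = random.randrange(pool_size)
        if inn in current:
            continue
        candidate = (current - {out}) | {inn}
        cand_score = objective(candidate)
        delta = cand_score - score
        if delta > 0 or random.random() < math.exp(delta / t):
            current, score = candidate, cand_score
            if score > best_score:
                best, best_score = set(current), score
        t *= cooling
    return best, best_score
```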

  2. Discovering semantic features in the literature: a foundation for building functional associations

    PubMed Central

    Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto

    2006-01-01

    Background Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. Results We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, can be utilized in gene/protein classification, or can even be combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. Conclusion The presented method can create literature profiles from documents relevant to sets of genes. The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data. PMID:16438716
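
    A minimal sketch of the core factorization, assuming each gene is represented by text pooled from its relevant documents; the toy corpus and component count are illustrative assumptions.

```python
# NMF literature profiles: factor a gene-by-term matrix into additive
# "semantic features" (H) and per-gene loadings on those features (W).
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs_per_gene = [
    "kinase signaling phosphorylation cascade receptor",
    "dna repair damage response checkpoint replication",
    "kinase checkpoint phosphorylation damage signaling",
]
A = TfidfVectorizer().fit_transform(docs_per_gene)   # genes x terms
nmf = NMF(n_components=2, init="nndsvda", random_state=0)
W = nmf.fit_transform(A)   # gene loadings on semantic features
H = nmf.components_        # semantic features over vocabulary terms
```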

  3. Method of generating features optimal to a dataset and classifier

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  4. An approach for Ewing test selection to support the clinical assessment of cardiac autonomic neuropathy.

    PubMed

    Stranieri, Andrew; Abawajy, Jemal; Kelarev, Andrei; Huda, Shamsul; Chowdhury, Morshed; Jelinek, Herbert F

    2013-07-01

    This article addresses the problem of determining optimal sequences of tests for the clinical assessment of cardiac autonomic neuropathy (CAN). We investigate the accuracy of using only one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests of the Ewing battery. This is important as not all five Ewing tests can always be applied in each situation in practice. We used a new and unique database of the diabetes screening research initiative project, which is more than ten times larger than the data set used by Ewing in his original investigation of CAN. We utilized decision trees and the optimal decision path finder (ODPF) procedure for identifying optimal sequences of tests. We present experimental results on the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy that can be achieved by adding the remaining tests of the Ewing battery. We found the best sequences of tests for a cost-function equal to the number of tests. The accuracies achieved by the initial segments of the optimal sequences are 80.80, 91.33, 93.97 and 94.14 for 2 categories of CAN; 79.86, 89.29, 91.16 and 91.76 for 3 categories; and 78.90, 86.21, 88.15 and 88.93 for 4 categories. They show significant improvement compared to the sequence considered previously in the literature and the mathematical expectations of the accuracies of a random sequence of tests. The complete outcomes obtained for all subsets of the Ewing features are required for determining optimal sequences of tests for any cost-function with the use of the ODPF procedure. We also found the two most significant additional features that can increase the accuracy when some of the Ewing attributes cannot be obtained. The outcomes obtained can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure. The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation test. Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests and provide an improvement in comparison with the previous ordering of tests or a random sequence. Copyright © 2013 Elsevier B.V. All rights reserved.

  5. Classification Preictal and Interictal Stages via Integrating Interchannel and Time-Domain Analysis of EEG Features.

    PubMed

    Lin, Lung-Chang; Chen, Sharon Chia-Ju; Chiang, Ching-Tai; Wu, Hui-Chuan; Yang, Rei-Cheng; Ouyang, Chen-Sen

    2017-03-01

    The life quality of patients with refractory epilepsy is extremely affected by abrupt and unpredictable seizures. A reliable method for predicting seizures is important in the management of refractory epilepsy. A critical factor in seizure prediction involves the classification of the preictal and interictal stages. This study aimed to develop an efficient, automatic, quantitative, and individualized approach for preictal/interictal stage identification. Five epileptic children, who had experienced at least 2 episodes of seizures during a 24-hour video EEG recording, were included. Artifact-free preictal and interictal EEG epochs were acquired, respectively, and characterized with 216 global feature descriptors. The best subset of 5 discriminative descriptors was identified. The best subsets showed differences among the patients. Statistical analysis revealed most of the 5 descriptors in each subset were significantly different between the preictal and interictal stages for each patient. The proposed approach yielded weighted averages of 97.50% correctness, 96.92% sensitivity, 97.78% specificity, and 95.45% precision on classifying test epochs. Although the case number was limited, this study successfully integrated a new EEG analytical method to classify preictal and interictal EEG segments and might be used further in predicting the occurrence of seizures.
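
    The record does not state how the best 5-descriptor subset was searched. A minimal sketch, assuming a univariate shortlist followed by exhaustive search over 5-descriptor combinations scored by cross-validated accuracy (the data below are synthetic):

      from itertools import combinations
      import numpy as np
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 216))       # 60 EEG epochs x 216 descriptors (toy)
      y = rng.integers(0, 2, size=60)      # preictal = 1 / interictal = 0 (toy)

      # Exhaustive search over all 5-subsets of 216 descriptors is infeasible
      # (~3.9e9 subsets), so shortlist candidates with a univariate score first.
      shortlist = SelectKBest(f_classif, k=12).fit(X, y).get_support(indices=True)

      best_subset, best_score = None, -np.inf
      for subset in combinations(shortlist, 5):
          score = cross_val_score(SVC(), X[:, subset], y, cv=5).mean()
          if score > best_score:
              best_subset, best_score = subset, score

      print(best_subset, round(best_score, 3))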

  6. Detection and Classification of Pole-Like Objects from Mobile Mapping Data

    NASA Astrophysics Data System (ADS)

    Fukano, K.; Masuda, H.

    2015-08-01

    Laser scanners on a vehicle-based mobile mapping system can capture 3D point-clouds of roads and roadside objects. Since roadside objects have to be maintained periodically, their 3D models are useful for planning maintenance tasks. In our previous work, we proposed a method for detecting cylindrical poles and planar plates in a point-cloud. However, it is often required to further classify pole-like objects into utility poles, streetlights, traffic signals and signs, which are managed by different organizations. In addition, our previous method may fail to extract low pole-like objects, which are often observed in urban residential areas. In this paper, we propose new methods for extracting and classifying pole-like objects. In our method, we robustly extract a wide variety of poles by converting point-clouds into wireframe models and calculating cross-sections between wireframe models and horizontal cutting planes. For classifying pole-like objects, we subdivide a pole-like object into five subsets by extracting poles and planes, and calculate feature values of each subset. Then we apply a supervised machine-learning method using the feature values of the subsets. In our experiments, our method achieved excellent results for the detection and classification of pole-like objects.

  7. Neural networks for vertical microcode compaction

    NASA Astrophysics Data System (ADS)

    Chu, Pong P.

    1992-09-01

    Neural networks provide an alternative way to solve complex optimization problems. Instead of performing a program of instructions sequentially as in a traditional computer, a neural network model explores many competing hypotheses simultaneously using its massively parallel net. The paper shows how to use the neural network approach to perform vertical microcode compaction for a micro-programmed control unit. The compaction procedure includes two basic steps. The first step determines the compatibility classes and the second step selects a minimal subset to cover the control signals. Since the selection process is an NP-complete problem, finding an optimal solution is impractical. In this study, we employ a customized neural network to obtain the minimal subset. We first formalize this problem, and then define an `energy function' and map it to a two-layer fully connected neural network. The modified network has two types of neurons and can always obtain a valid solution.

  8. Recognition of emotions using multimodal physiological signals and an ensemble deep learning model.

    PubMed

    Yin, Zhong; Zhao, Mengyuan; Wang, Yongxiong; Yang, Jingdong; Zhang, Jianhua

    2017-03-01

    Using deep-learning methodologies to analyze multimodal physiological signals becomes increasingly attractive for recognizing human emotions. However, conventional deep emotion classifiers suffer from two drawbacks: the expertise required for determining model structure and the oversimplified combination of multimodal feature abstractions. In this study, a multiple-fusion-layer based ensemble classifier of stacked autoencoder (MESAE) is proposed for recognizing emotions, in which the deep structure is identified based on a physiological-data-driven approach. Each SAE consists of three hidden layers to filter unwanted noise in the physiological features and derives stable feature representations. An additional deep model is used to achieve the SAE ensembles. The physiological features are split into several subsets according to different feature extraction approaches, with each subset separately encoded by a SAE. The derived SAE abstractions are combined according to the physiological modality to create six sets of encodings, which are then fed to a three-layer, adjacent-graph-based network for feature fusion. The fused features are used to recognize binary arousal or valence states. The DEAP multimodal database was employed to validate the performance of the MESAE. Compared with the best existing emotion classifier, the mean classification rate and F-score improve by 5.26%. The superiority of the MESAE against the state-of-the-art shallow and deep emotion classifiers has been demonstrated under different sizes of the available physiological instances. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  9. The proposal of architecture for chemical splitting to optimize QSAR models for aquatic toxicity.

    PubMed

    Colombo, Andrea; Benfenati, Emilio; Karelson, Mati; Maran, Uko

    2008-06-01

    One of the challenges in the field of quantitative structure-activity relationship (QSAR) analysis is the correct assignment of a chemical compound to an appropriate model for the prediction of activity. Thus, in previous studies, compounds have been divided into distinct groups according to their mode of action or chemical class. In the current study, theoretical molecular descriptors were used to divide 568 organic substances into subsets, with toxicity measured as the 96-h median lethal concentration for the fathead minnow (Pimephales promelas). Simple constitutional descriptors, such as the number of aliphatic and aromatic rings, and a quantum chemical descriptor, the maximum bond order of a carbon atom, divide the compounds into nine subsets. For each subset of compounds, automatic forward selection of descriptors was applied to construct QSAR models. Significant correlations were achieved for each subset of chemicals and all models were validated with the leave-one-out internal validation procedure (cross-validated R² ≈ 0.80). The results encourage consideration of this alternative approach to toxicity prediction, using QSAR subset models without direct reference to the mechanism of toxic action or the traditional chemical classification.

  10. Intrinsic gene expression subsets of diffuse cutaneous systemic sclerosis are stable in serial skin biopsies

    PubMed Central

    Pendergrass, Sarah A.; Lemaire, Raphael; Francis, Ian; Mahoney, J. Matthew; Lafyatis, Robert; Whitfield, Michael L.

    2012-01-01

    Skin biopsy gene expression was analyzed by DNA microarray from 13 dSSc patients enrolled in an open label study of rituximab, 9 dSSc patients not treated with rituximab, and 9 healthy controls. These data recapitulate the patient ‘intrinsic’ gene expression subsets described previously including proliferation, inflammatory, and normal-like groups. Serial skin biopsies showed consistent and non-progressing gene expression over time, and importantly, the patients in the inflammatory subset do not move to the fibroproliferative subset, and vice versa. We were unable to detect significant differences in gene expression before and after rituximab treatment, consistent with an apparent lack of clinical response. Serial biopsies from each patient stayed within the same gene expression subset regardless of treatment regimen or the time point at which they were taken. Collectively, these data emphasize the heterogeneous nature of SSc and demonstrate that the intrinsic subsets are an inherent, reproducible and stable feature of the disease that is independent of disease duration. Moreover, these data have fundamental importance for the future development of personalized therapy for SSc; drugs targeting inflammation are likely to benefit those patients with an inflammatory signature, whereas drugs targeting fibrosis are likely to benefit those with a fibroproliferative signature. PMID:22318389

  11. Green Power Community News

    EPA Pesticide Factsheets

    This page features news about EPA's Green Power Communities. GPCs are a subset of the Green Power Partnership; municipalities or tribal governments where government, businesses, and residents collectively use enough green power to meet GPP requirements.

  12. UAV Mission Planning under Uncertainty

    DTIC Science & Technology

    2006-06-01

    [List-of-figures excerpts] ...algorithm, adapted from [13]; 4-5: Robust Optimization considers only a subset of the feasible region; 5-1: Overview of simulation with parameter... [Body excerpts] The formulation incorporates the robust optimization method suggested by Bertsimas and Sim [12], and is solved with a standard Branch-and-Cut algorithm. The heuristic methods of Local Search and Simulated Annealing are also considered; with each method, we attempt to give a review of research in the area.

  13. G-STRATEGY: Optimal Selection of Individuals for Sequencing in Genetic Association Studies

    PubMed Central

    Wang, Miaoyan; Jakobsdottir, Johanna; Smith, Albert V.; McPeek, Mary Sara

    2017-01-01

    In a large-scale genetic association study, the number of phenotyped individuals available for sequencing may, in some cases, be greater than the study’s sequencing budget will allow. In that case, it can be important to prioritize individuals for sequencing in a way that optimizes power for association with the trait. Suppose a cohort of phenotyped individuals is available, with some subset of them possibly already sequenced, and one wants to choose an additional fixed-size subset of individuals to sequence in such a way that the power to detect association is maximized. When the phenotyped sample includes related individuals, power for association can be gained by including partial information, such as phenotype data of ungenotyped relatives, in the analysis, and this should be taken into account when assessing whom to sequence. We propose G-STRATEGY, which uses simulated annealing to choose a subset of individuals for sequencing that maximizes the expected power for association. In simulations, G-STRATEGY performs extremely well for a range of complex disease models and outperforms other strategies with, in many cases, relative power increases of 20–40% over the next best strategy, while maintaining correct type 1 error. G-STRATEGY is computationally feasible even for large datasets and complex pedigrees. We apply G-STRATEGY to data on HDL and LDL from the AGES-Reykjavik and REFINE-Reykjavik studies, in which G-STRATEGY is able to closely approximate the power of sequencing the full sample by selecting for sequencing only a small subset of the individuals. PMID:27256766
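
    A minimal sketch of the simulated-annealing subset search, with a placeholder objective standing in for G-STRATEGY's expected-power criterion (which in the paper depends on phenotypes and pedigree structure):

      import math, random

      def anneal_subset(n_candidates, k, objective, steps=5000, t0=1.0,
                        cooling=0.999, seed=0):
          rng = random.Random(seed)
          current = set(rng.sample(range(n_candidates), k))
          cur_val = objective(current)
          best, best_val = set(current), cur_val
          t = t0
          for _ in range(steps):
              # Propose swapping one selected individual for an unselected one.
              out = rng.choice(sorted(current))
              inn = rng.choice([i for i in range(n_candidates) if i not in current])
              proposal = (current - {out}) | {inn}
              val = objective(proposal)
              if val > cur_val or rng.random() < math.exp((val - cur_val) / t):
                  current, cur_val = proposal, val
                  if cur_val > best_val:
                      best, best_val = set(current), cur_val
              t *= cooling                 # geometric cooling schedule
          return best, best_val

      # Placeholder objective: per-individual "information" weights stand in
      # for the expected power of the chosen fixed-size subset.
      weights = [random.Random(i).random() for i in range(200)]
      subset, power = anneal_subset(200, 20, lambda s: sum(weights[i] for i in s))
      print(sorted(subset)[:5], round(power, 2))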

  14. Morphological and physiological analysis of type-5 and other bipolar cells in the Mouse Retina.

    PubMed

    Hellmer, C B; Zhou, Y; Fyk-Kolodziej, B; Hu, Z; Ichinose, T

    2016-02-19

    Retinal bipolar cells are second-order neurons in the visual system, which initiate multiple image feature-based neural streams. Among more than ten types of bipolar cells, type-5 cells are thought to play a role in motion detection pathways. Multiple subsets of type-5 cells have been reported; however, detailed characteristics of each subset have not yet been elucidated. Here, we found that they exhibit distinct morphological features as well as unique voltage-gated channel expression. We have conducted electrophysiological and immunohistochemical analysis of retinal bipolar cells. We defined type-5 cells by their axon terminal ramification in the inner plexiform layer between the border of ON/OFF sublaminae and the ON choline acetyltransferase (ChAT) band. We found three subsets of type-5 cells: XBCs had the widest axon terminals that stratified at a close approximation of the ON ChAT band as well as exhibiting large voltage-gated Na(+) channel activity, type-5-1 cells had compact terminals and no Na(+) channel activity, and type-5-2 cells contained umbrella-shaped terminals as well as large voltage-gated Na(+) channel activity. Hyperpolarization-activated cyclic nucleotide-gated (HCN) currents were also evoked in all type-5 bipolar cells. We found that XBCs and type-5-2 cells exhibited larger HCN currents than type-5-1 cells. Furthermore, the former two types showed stronger HCN1 expression than the latter. Our previous observations (Ichinose et al., 2014) match the current study: low temporal tuning cells that we named 5S corresponded to 5-1 in this study, while high temporal tuning 5f cells from the previous study corresponded to 5-2 cells. Taken together, we found three subsets of type-5 bipolar cells based on their morphologies and physiological features. Copyright © 2015 IBRO. Published by Elsevier Ltd. All rights reserved.

  15. Improving EMG based classification of basic hand movements using EMD.

    PubMed

    Sapsanis, Christos; Georgoulas, George; Tzes, Anthony; Lymberopoulos, Dimitrios

    2013-01-01

    This paper presents a pattern recognition approach for the identification of basic hand movements using surface electromyographic (EMG) data. The EMG signal is decomposed using Empirical Mode Decomposition (EMD) into Intrinsic Mode Functions (IMFs) and subsequently a feature extraction stage takes place. Various combinations of feature subsets are tested using a simple linear classifier for the detection task. Our results suggest that the use of EMD can increase the discrimination ability of the conventional feature sets extracted from the raw EMG signal.
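
    A hedged sketch of the decomposition and feature-extraction stages, assuming the third-party PyEMD package (distributed on PyPI as EMD-signal); the signal and the three time-domain features are illustrative choices, not necessarily the paper's:

      import numpy as np
      from PyEMD import EMD  # pip install EMD-signal

      def features(x):
          mav = np.mean(np.abs(x))                 # mean absolute value
          zc = int(np.sum(x[:-1] * x[1:] < 0))     # zero crossings
          wl = float(np.sum(np.abs(np.diff(x))))   # waveform length
          return [mav, zc, wl]

      fs = 500.0
      t = np.arange(0, 2.0, 1 / fs)
      emg = (np.sin(2 * np.pi * 30 * t)
             + 0.5 * np.random.default_rng(0).normal(size=t.size))  # toy EMG

      imfs = EMD().emd(emg)          # Intrinsic Mode Functions
      vec = features(emg)            # features of the raw signal...
      for imf in imfs[:3]:           # ...augmented with features of early IMFs
          vec.extend(features(imf))
      print(len(vec), "features")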

  16. Time optimal paths for high speed maneuvering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reister, D.B.; Lenhart, S.M.

    Recent theoretical results have completely solved the problem of determining the minimum length path for a vehicle with a minimum turning radius moving from an initial configuration to a final configuration. Time optimal paths for a constant speed vehicle are a subset of the minimum length paths. This paper uses the Pontryagin maximum principle to find time optimal paths for a constant speed vehicle. The time optimal paths consist of sequences of arcs of circles and straight lines. The maximum principle introduces concepts (dual variables, bang-bang solutions, singular solutions, and transversality conditions) that provide important insight into the nature of the time optimal paths. We explore the properties of the optimal paths and present some experimental results for a mobile robot following an optimal path.

  17. Performance Optimizing Adaptive Control with Time-Varying Reference Model Modification

    NASA Technical Reports Server (NTRS)

    Nguyen, Nhan T.; Hashemi, Kelley E.

    2017-01-01

    This paper presents a new adaptive control approach that involves a performance optimization objective. The control synthesis involves the design of a performance optimizing adaptive controller from a subset of control inputs. The resulting effect of the performance optimizing adaptive controller is to modify the initial reference model into a time-varying reference model which satisfies the performance optimization requirement obtained from an optimal control problem. The time-varying reference model modification is accomplished by the real-time solutions of the time-varying Riccati and Sylvester equations coupled with the least-squares parameter estimation of the sensitivities of the performance metric. The effectiveness of the proposed method is demonstrated by an application of maneuver load alleviation control for a flexible aircraft.
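
    The paper solves time-varying Riccati and Sylvester equations in real time; as a hedged illustration on a toy system, the algebraic (steady-state) counterparts can be solved with SciPy (the matrices below are made up, not the paper's aircraft model):

      import numpy as np
      from scipy.linalg import solve_continuous_are, solve_sylvester

      A = np.array([[0.0, 1.0], [-2.0, -0.5]])   # toy plant dynamics
      B = np.array([[0.0], [1.0]])               # control input matrix
      Q = np.eye(2)                              # state weighting
      R = np.array([[1.0]])                      # control weighting

      P = solve_continuous_are(A, B, Q, R)       # algebraic Riccati equation
      K = np.linalg.solve(R, B.T @ P)            # optimal feedback gain

      # Sylvester equation A1 X + X B1 = C1, e.g. for reference-model matching.
      A1, B1, C1 = A - B @ K, np.diag([1.0, 2.0]), np.eye(2)
      X = solve_sylvester(A1, B1, C1)
      print(np.round(K, 3), np.round(X, 3))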

  18. Non-corticosteroid risk factors of symptomatic avascular necrosis of bone in systemic lupus erythematosus: A retrospective case-control study.

    PubMed

    Faezi, Seyedeh Tahereh; Hoseinian, Azam Sadat; Paragomi, Pedram; Akbarian, Mahmood; Esfahanian, Fatemeh; Gharibdoost, Farhad; Akhlaghi, Maassoumeh; Nadji, Abdolhadi; Jamshidi, Ahmad Reza; Shahram, Farhad; Nejadhosseinian, Mohammad; Davatchi, Fereydoun

    2015-07-01

    Avascular necrosis of bone (AVN) is an important complication of systemic lupus erythematosus (SLE). Corticosteroid therapy has been underlined as a main risk factor for osteonecrosis. However, AVN development in patients who have never received corticosteroids, and the absence of AVN in the majority of patients who have, suggest a role for non-corticosteroid risk factors in AVN development. This case-control study included two subsets of patients who attended our Lupus clinic from 1979 to 2009: an oral corticosteroid subset (66 AVN and 248 non-AVN patients) and a pulse-therapy subset (39 AVN and 312 non-AVN patients). Within each subset, patients received a similar cumulative corticosteroid dose, maximum dose and 1-year maximum dose. The demographic data (including sex, age of disease onset, age at the diagnosis of AVN), organ involvement, SLE Disease Activity Index (SLEDAI), Systemic Lupus International Collaborating Clinics/American College of Rheumatology-Damage Index (SLICC/ACR-DI) and number of disease flare-ups were compared between the two subsets. The mean age of SLE onset was younger (P value = 0.04) in the AVN patients. In the oral corticosteroid subset, malar rash (P value < 0.001) and oral ulcer (P value = 0.003) were seen more frequently in non-AVN patients, whereas psychosis (P value = 0.03) was significantly more prevalent in the AVN group. In the corticosteroid pulse subset, no significant difference in clinical features was noted. In the oral corticosteroid subset, younger age of disease onset and psychosis were significantly associated with AVN, whereas malar rash and oral ulcer showed a negative association with AVN.

  19. Automatic exudate detection by fusing multiple active contours and regionwise classification.

    PubMed

    Harangi, Balazs; Hajdu, Andras

    2014-11-01

    In this paper, we propose a method for the automatic detection of exudates in digital fundus images. Our approach can be divided into three stages: candidate extraction, precise contour segmentation and the labeling of candidates as true or false exudates. For candidate detection, we borrow a grayscale morphology-based method to identify possible regions containing these bright lesions. Then, to extract the precise boundary of the candidates, we introduce a complex active contour-based method. Namely, to increase the accuracy of segmentation, we extract additional possible contours by taking advantage of the diverse behavior of different pre-processing methods. After selecting an appropriate combination of the extracted contours, a region-wise classifier is applied to remove the false exudate candidates. For this task, we consider several region-based features, and extract an appropriate feature subset to train a Naïve-Bayes classifier optimized further by an adaptive boosting technique. Regarding experimental studies, the method was tested on publicly available databases both to measure the accuracy of the segmentation of exudate regions and to recognize their presence at image-level. In a proper quantitative evaluation on publicly available datasets the proposed approach outperformed several state-of-the-art exudate detector algorithms. Copyright © 2014 Elsevier Ltd. All rights reserved.
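
    A minimal sketch of the final labeling stage, a Naive Bayes classifier boosted with AdaBoost on region-wise features, as the abstract describes; the feature values below are synthetic stand-ins for the paper's region descriptors:

      import numpy as np
      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.naive_bayes import GaussianNB
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 8))      # 200 candidate regions x 8 features
      y = (X[:, 0] + 0.5 * X[:, 3]
           + rng.normal(scale=0.5, size=200) > 0).astype(int)  # true/false exudate

      # Gaussian Naive Bayes base learner, boosted by AdaBoost.
      clf = AdaBoostClassifier(GaussianNB(), n_estimators=50, random_state=0)
      print("cv accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))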

  20. Hidden Markov Model and Support Vector Machine based decoding of finger movements using Electrocorticography

    PubMed Central

    Wissel, Tobias; Pfeiffer, Tim; Frysch, Robert; Knight, Robert T.; Chang, Edward F.; Hinrichs, Hermann; Rieger, Jochem W.; Rose, Georg

    2013-01-01

    Objective Support Vector Machines (SVM) have developed into a gold standard for accurate classification in Brain-Computer-Interfaces (BCI). The choice of the most appropriate classifier for a particular application depends on several characteristics in addition to decoding accuracy. Here we investigate the implementation of Hidden Markov Models (HMM) for online BCIs and discuss strategies to improve their performance. Approach We compare the SVM, serving as a reference, and HMMs for classifying discrete finger movements obtained from the Electrocorticograms of four subjects doing a finger tapping experiment. The classifier decisions are based on a subset of low-frequency time domain and high gamma oscillation features. Main results We show that differences in decoding performance between the two approaches are due to the way features are extracted and selected, and depend less on the classifier. An additional gain in HMM performance of up to 6% was obtained by introducing model constraints. Comparable accuracies of up to 90% were achieved with both SVM and HMM, with the high gamma cortical response providing the most important decoding information for both techniques. Significance We discuss technical HMM characteristics and adaptations in the context of the presented data as well as for general BCI applications. Our findings suggest that HMMs and their characteristics are promising for efficient online brain-computer interfaces. PMID:24045504

  1. Rule Extraction Based on Extreme Learning Machine and an Improved Ant-Miner Algorithm for Transient Stability Assessment.

    PubMed

    Li, Yang; Li, Guoqing; Wang, Zhenhao

    2015-01-01

    To overcome the poor understandability of pattern recognition-based transient stability assessment (PRTSA) methods, a new rule extraction method based on extreme learning machine (ELM) and an improved Ant-miner (IAM) algorithm is presented in this paper. First, the basic principles of ELM and the Ant-miner algorithm are introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. Finally, a set of classification rules is obtained by the IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted, by the IAM algorithm, from an example sample set generated by the trained ELM-based transient stability assessment model. The effectiveness of the proposed method is shown by the application results on the New England 39-bus power system and a practical power system--the southern power system of Hebei province.
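
    A minimal numpy sketch of the ELM building block, random hidden weights with output weights fitted in closed form, on synthetic stability labels (the rule-extraction stage is not shown):

      import numpy as np

      class ELM:
          def __init__(self, n_hidden=64, seed=0):
              self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

          def fit(self, X, y):
              n_features = X.shape[1]
              self.W = self.rng.normal(size=(n_features, self.n_hidden))
              self.b = self.rng.normal(size=self.n_hidden)
              H = np.tanh(X @ self.W + self.b)      # random feature map
              self.beta = np.linalg.pinv(H) @ y     # closed-form output weights
              return self

          def predict(self, X):
              return np.tanh(X @ self.W + self.b) @ self.beta

      # Toy stability labels (+1 stable / -1 unstable) from synthetic features.
      rng = np.random.default_rng(2)
      X = rng.normal(size=(300, 2))
      y = np.sign(X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=300))
      model = ELM().fit(X[:200], y[:200])
      print((np.sign(model.predict(X[200:])) == y[200:]).mean())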

  2. Bacterial microcompartment assembly: The key role of encapsulation peptides

    DOE PAGES

    Aussignargues, Clément; Paasch, Bradley C.; Gonzalez-Esquer, Raul; ...

    2015-06-23

    Bacterial microcompartments (BMCs) are proteinaceous organelles used by a broad range of bacteria to segregate and optimize metabolic reactions. Their functions are diverse, and can be divided into anabolic (carboxysome) and catabolic (metabolosomes) processes, depending on their cargo enzymes. The assembly pathway for the β-carboxysome has been characterized, revealing that biogenesis proceeds from the inside out. The enzymes coalesce into a procarboxysome, followed by encapsulation in a protein shell that is recruited to the procarboxysome by a short (~17 amino acids) extension on the C-terminus of one of the encapsulated proteins. A similar extension is also found on the N- or C-termini of a subset of metabolosome core enzymes. These encapsulation peptides (EPs) are characterized by a primary structure predicted to form an amphipathic α-helix that interacts with shell proteins. In this study, we review the features, function and widespread occurrence of EPs among metabolosomes, and propose an expanded role for EPs in the assembly of diverse BMCs.

  3. Migration and Tissue Tropism of Innate Lymphoid Cells

    PubMed Central

    Kim, Chang H.; Hashimoto-Hill, Seika; Kim, Myunghoo

    2016-01-01

    Innate lymphoid cell (ILC) subsets differentially populate various barrier and non-barrier tissues, where they play important roles in tissue homeostasis and tissue-specific responses to pathogen attack. Recent findings have provided insight into the molecular mechanisms that guide ILC migration into peripheral tissues, revealing common features among different ILC subsets as well as important distinctions. Recent studies have also highlighted the impact of tissue-specific cues on ILC migration, and the importance of the local immunological milieu. We review these findings here and discuss how the migratory patterns and tissue tropism of different ILC subsets relate to the development and differentiation of these cells, and to ILC-mediated tissue-specific regulation of innate and adaptive immune responses. In this context we outline open questions and important areas of future research. PMID:26708278

  4. Comparative analyses of structural features and scaffold diversity for purchasable compound libraries.

    PubMed

    Shang, Jun; Sun, Huiyong; Liu, Hui; Chen, Fu; Tian, Sheng; Pan, Peichen; Li, Dan; Kong, Dexin; Hou, Tingjun

    2017-04-21

    Large purchasable screening libraries of small molecules afforded by commercial vendors are indispensable sources for virtual screening (VS). Selecting an optimal screening library for a specific VS campaign is quite important to improve the success rates and avoid wasting resources in later experimental phases. Analysis of the structural features and molecular diversity for different screening libraries can provide valuable information to the decision making process when selecting screening libraries for VS. In this study, the structural features and scaffold diversity of eleven purchasable screening libraries and Traditional Chinese Medicine Compound Database (TCMCD) were analyzed and compared. Their scaffold diversity represented by the Murcko frameworks and Level 1 scaffolds was characterized by the scaffold counts and cumulative scaffold frequency plots, and visualized by Tree Maps and SAR Maps. The analysis demonstrates that, based on the standardized subsets with similar molecular weight distributions, Chembridge, ChemicalBlock, Mucle, TCMCD and VitasM are more structurally diverse than the others. Compared with all purchasable screening libraries, TCMCD has the highest structural complexity indeed but more conservative molecular scaffolds. Moreover, we found that some representative scaffolds were important components of drug candidates against different drug targets, such as kinases and guanosine-binding protein coupled receptors, and therefore the molecules containing pharmacologically important scaffolds found in screening libraries might be potential inhibitors against the relevant targets. This study may provide valuable perspective on which purchasable compound libraries are better for you to screen. Graphical abstract Selecting diverse compound libraries with scaffold analyses.
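
    As a hedged illustration of the scaffold analysis, Murcko frameworks can be counted with RDKit; the SMILES strings below are toy examples, not the study's libraries:

      from collections import Counter
      from rdkit import Chem
      from rdkit.Chem.Scaffolds import MurckoScaffold

      smiles = ["c1ccccc1CCN", "c1ccccc1CCO", "C1CCNCC1", "c1ccc2ccccc2c1"]
      scaffolds = Counter()
      for smi in smiles:
          mol = Chem.MolFromSmiles(smi)
          core = MurckoScaffold.GetScaffoldForMol(mol)   # Murcko framework
          scaffolds[Chem.MolToSmiles(core)] += 1

      # Fewer distinct scaffolds per molecule -> more conservative library.
      print(len(scaffolds), "scaffolds for", len(smiles), "molecules")
      print(scaffolds.most_common())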

  5. Time optimal paths for high speed maneuvering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reister, D.B.; Lenhart, S.M.

    1993-01-01

    Recent theoretical results have completely solved the problem of determining the minimum length path for a vehicle with a minimum turning radius moving from an initial configuration to a final configuration. Time optimal paths for a constant speed vehicle are a subset of the minimum length paths. This paper uses the Pontryagin maximum principle to find time optimal paths for a constant speed vehicle. The time optimal paths consist of sequences of arcs of circles and straight lines. The maximum principle introduces concepts (dual variables, bang-bang solutions, singular solutions, and transversality conditions) that provide important insight into the nature of the time optimal paths. We explore the properties of the optimal paths and present some experimental results for a mobile robot following an optimal path.

  6. Systematic wavelength selection for improved multivariate spectral analysis

    DOEpatents

    Thomas, Edward V.; Robinson, Mark R.; Haaland, David M.

    1995-01-01

    Methods and apparatus for determining in a biological material one or more unknown values of at least one known characteristic (e.g. the concentration of an analyte such as glucose in blood or the concentration of one or more blood gas parameters) with a model based on a set of samples with known values of the known characteristics and a multivariate algorithm using several wavelength subsets. The method includes selecting multiple wavelength subsets, from the electromagnetic spectral region appropriate for determining the known characteristic, for use by an algorithm wherein the selection of wavelength subsets improves the model's fitness of the determination for the unknown values of the known characteristic. The selection process utilizes multivariate search methods that select both predictive and synergistic wavelengths within the range of wavelengths utilized. The fitness of the wavelength subsets is determined by the fitness function F = f(cost, performance). The method includes the steps of: (1) using one or more applications of a genetic algorithm to produce one or more count spectra, with multiple count spectra then combined to produce a combined count spectrum; (2) smoothing the count spectrum; (3) selecting a threshold count from a count spectrum to select those wavelength subsets which optimize the fitness function; and (4) eliminating a portion of the selected wavelength subsets. The determination of the unknown values can be made: (1) noninvasively and in vivo; (2) invasively and in vivo; or (3) in vitro.
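
    A minimal sketch of genetic-algorithm wavelength selection, with the fitness F = f(cost, performance) approximated as cross-validated PLS performance penalized by subset size; the spectra are synthetic, and the GA details (truncation selection, single-point crossover, bit-flip mutation) are one simple choice among many:

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_wl = 60
      X = rng.normal(size=(80, n_wl))                    # toy spectra
      y = X[:, 10] + X[:, 25] - X[:, 40] + 0.1 * rng.normal(size=80)

      def fitness(mask):
          if mask.sum() < 2:                             # PLS needs >= 2 columns
              return -np.inf
          perf = cross_val_score(PLSRegression(n_components=2),
                                 X[:, mask.astype(bool)], y, cv=5).mean()
          return perf - 0.005 * mask.sum()               # F = f(cost, performance)

      pop = (rng.random((20, n_wl)) < 0.2).astype(int)   # initial population
      for _ in range(30):
          scores = np.array([fitness(m) for m in pop])
          parents = pop[np.argsort(scores)[-10:]]        # truncation selection
          pairs = rng.integers(0, 10, size=(10, 2))
          cuts = rng.integers(1, n_wl, size=10)
          children = np.array([np.concatenate([parents[i][:c], parents[j][c:]])
                               for (i, j), c in zip(pairs, cuts)])  # crossover
          children ^= (rng.random(children.shape) < 0.02).astype(int)  # mutation
          pop = np.vstack([parents, children])

      best = pop[np.argmax([fitness(m) for m in pop])]
      print("selected wavelengths:", np.flatnonzero(best))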

  7. Computer-aided classification of breast masses using contrast-enhanced digital mammograms

    NASA Astrophysics Data System (ADS)

    Danala, Gopichandh; Aghaei, Faranak; Heidari, Morteza; Wu, Teresa; Patel, Bhavika; Zheng, Bin

    2018-02-01

    By taking advantage of both mammography and breast MRI, contrast-enhanced digital mammography (CEDM) has emerged as a new promising imaging modality to improve the efficacy of breast cancer screening and diagnosis. The primary objective of this study is to develop and evaluate a new computer-aided detection and diagnosis (CAD) scheme of CEDM images to classify between malignant and benign breast masses. A CEDM dataset consisting of 111 patients (33 benign and 78 malignant) was retrospectively assembled. Each case includes two types of images, namely low-energy (LE) and dual-energy subtracted (DES) images. First, the CAD scheme applied a hybrid segmentation method to automatically segment masses depicted on LE and DES images separately. Optimal segmentation results from DES images were also mapped to LE images and vice versa. Next, a set of 109 quantitative image features related to mass shape and density heterogeneity was initially computed. Last, four multilayer perceptron-based machine learning classifiers, integrated with a correlation-based feature subset evaluator and a leave-one-case-out cross-validation method, were built to classify mass regions depicted on LE and DES images, respectively. Initially, when the CAD scheme was applied to the original segmentation of DES and LE images, the areas under ROC curves were 0.7585+/-0.0526 and 0.7534+/-0.0470, respectively. After optimal segmentation mapping from DES to LE images, the AUC value of the CAD scheme significantly increased to 0.8477+/-0.0376 (p<0.01). Since DES images eliminate the overlapping effect of dense breast tissue on lesions, segmentation accuracy was significantly improved compared to regular mammograms. The study demonstrated that computer-aided classification of breast masses using CEDM images yields higher performance.

  8. Image alignment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dowell, Larry Jonathan

    Disclosed is a method and device for aligning at least two digital images. An embodiment may use frequency-domain transforms of small tiles created from each image to identify substantially similar, "distinguishing" features within each of the images, and then align the images together based on the location of the distinguishing features. To accomplish this, an embodiment may create equal sized tile sub-images for each image. A "key" for each tile may be created by performing a frequency-domain transform calculation on each tile. An information-distance difference between each possible pair of tiles on each image may be calculated to identify distinguishing features. From analysis of the information-distance differences of the pairs of tiles, a subset of tiles with high discrimination metrics in relation to other tiles may be located for each image. The subset of distinguishing tiles for each image may then be compared to locate tiles with substantially similar keys and/or information-distance metrics to other tiles of other images. Once similar tiles are located for each image, the images may be aligned in relation to the identified similar tiles.
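
    A hedged sketch of the tiling idea: a frequency-domain key per tile, a distinctiveness score per tile, and key matching between two images. The Euclidean distances below are simple stand-ins for the patent's information-distance metric:

      import numpy as np

      def tile_keys(img, ts=32):
          h, w = img.shape
          keys, positions = [], []
          for r in range(0, h - ts + 1, ts):
              for c in range(0, w - ts + 1, ts):
                  tile = img[r:r + ts, c:c + ts]
                  mag = np.abs(np.fft.fft2(tile))        # frequency-domain key
                  keys.append(mag.ravel() / (np.linalg.norm(mag) + 1e-12))
                  positions.append((r, c))
          return np.array(keys), positions

      def most_distinctive(keys, n=5):
          # Distinctiveness: distance to the nearest other key in the same image.
          d = np.linalg.norm(keys[:, None, :] - keys[None, :, :], axis=2)
          np.fill_diagonal(d, np.inf)
          return np.argsort(d.min(axis=1))[-n:]

      rng = np.random.default_rng(0)
      img1 = rng.random((128, 128))
      img2 = np.roll(img1, shift=(32, 32), axis=(0, 1))  # shifted copy (wraps)

      k1, p1 = tile_keys(img1)
      k2, p2 = tile_keys(img2)
      for i in most_distinctive(k1):
          j = np.argmin(np.linalg.norm(k2 - k1[i], axis=1))  # best match in img2
          print("offset estimate:", np.subtract(p2[j], p1[i]))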

  9. Incorporating High-Frequency Physiologic Data Using Computational Dictionary Learning Improves Prediction of Delayed Cerebral Ischemia Compared to Existing Methods.

    PubMed

    Megjhani, Murad; Terilli, Kalijah; Frey, Hans-Peter; Velazquez, Angela G; Doyle, Kevin William; Connolly, Edward Sander; Roh, David Jinou; Agarwal, Sachin; Claassen, Jan; Elhadad, Noemie; Park, Soojin

    2018-01-01

    Accurate prediction of delayed cerebral ischemia (DCI) after subarachnoid hemorrhage (SAH) can be critical for planning interventions to prevent poor neurological outcome. This paper presents a model using convolution dictionary learning to extract features from physiological data available from bedside monitors. We develop and validate a prediction model for DCI after SAH, demonstrating improved precision over standard methods alone. A total of 488 consecutive SAH admissions from 2006 to 2014 to a tertiary care hospital were included. Models were trained on 80% of the data, while 20% were set aside for validation testing. The Modified Fisher Scale was considered the standard grading scale in clinical use; baseline features also analyzed included age, sex, and the Hunt-Hess and Glasgow Coma Scales. An unsupervised approach using convolution dictionary learning was used to extract features from physiological time series (systolic and diastolic blood pressure, heart rate, respiratory rate, and oxygen saturation). Classifiers (partial least squares and linear and kernel support vector machines) were trained on feature subsets of the derivation dataset. Models were applied to the validation dataset. The performances of the best classifiers on the validation dataset are reported by feature subset. Standard grading scale (mFS): AUC 0.54. Combined demographics and grading scales (baseline features): AUC 0.63. Kernel-derived physiologic features: AUC 0.66. Combined baseline and physiologic features with redundant feature reduction: AUC 0.71 on the derivation dataset and 0.78 on the validation dataset. Current DCI prediction tools rely on admission imaging and are advantageously simple to employ. However, using an agnostic and computationally inexpensive learning approach for high-frequency physiologic time series data, we demonstrated that we could incorporate individual physiologic data to achieve higher classification accuracy.

  10. Optimizing procedures for a human genome repository. Final report, June 1, 1988--November 30, 1990

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nierman, W.C.

    1991-03-01

    Large numbers of clones will be generated during the Human Genome Project. As each is characterized, subsets will be identified which are useful to the scientific community at large. These subsets are most readily distributed through public repositories. The American Type Culture Collection (ATCC) is experienced in repository operation, but before this project had no history in managing clones and associated information in large batches instead of individually. This project permitted the ATCC to develop several procedures for automating and thus reducing the cost of characterizing, preserving, and maintaining information about clones.

  11. Information Fusion for High Level Situation Assessment and Prediction

    DTIC Science & Technology

    2007-03-01

    procedure includes deciding a sensor set that achieves the optimal trade-off between its cost and benefit, activating the identified sensors, integrating...and effective decision can be made by dynamic inference based on selecting a subset of sensors with the optimal trade-off between their cost and...first step is achieved by designing a sensor selection criterion that represents the trade-off between the sensor benefit and sensor cost. This is then

  12. The kinship2 R package for pedigree data.

    PubMed

    Sinnwell, Jason P; Therneau, Terry M; Schaid, Daniel J

    2014-01-01

    The kinship2 package is restructured from the previous kinship package. Existing features are now enhanced and new features added for handling pedigree objects. Pedigree plotting features have been updated to display features on complex pedigrees while adhering to pedigree plotting standards. Kinship matrices can now be calculated for the X chromosome. Other methods have been added to subset and trim pedigrees while maintaining the pedigree structure. We make the kinship2 package available for R on the Comprehensive R Archive Network (CRAN), where data management is built-in and other packages can use the pedigree object.

  13. Hybrid NN/SVM Computational System for Optimizing Designs

    NASA Technical Reports Server (NTRS)

    Rai, Man Mohan

    2009-01-01

    A computational method and system based on a hybrid of an artificial neural network (NN) and a support vector machine (SVM) (see figure) has been conceived as a means of maximizing or minimizing an objective function, optionally subject to one or more constraints. Such maximization or minimization could be performed, for example, to solve a data-regression or data-classification problem or to optimize a design associated with a response function. A response function can be considered as a subset of a response surface, which is a surface in a vector space of design and performance parameters. A typical example of a design problem that the method and system can be used to solve is that of an airfoil, for which a response function could be the spatial distribution of pressure over the airfoil. In this example, the response surface would describe the pressure distribution as a function of the operating conditions and the geometric parameters of the airfoil. The use of NNs to analyze physical objects in order to optimize their responses under specified physical conditions is well known. NN analysis is suitable for multidimensional interpolation of data that lack structure and enables the representation and optimization of a succession of numerical solutions of increasing complexity or increasing fidelity to the real world. NN analysis is especially useful in helping to satisfy multiple design objectives. Feedforward NNs can be used to make estimates based on nonlinear mathematical models. One difficulty associated with use of a feedforward NN arises from the need for nonlinear optimization to determine connection weights among input, intermediate, and output variables. It can be very expensive to train an NN in cases in which it is necessary to model large amounts of information. Less widely known (in comparison with NNs) are support vector machines (SVMs), which were originally applied in statistical learning theory. In terms that are necessarily oversimplified to fit the scope of this article, an SVM can be characterized as an algorithm that (1) effects a nonlinear mapping of input vectors into a higher-dimensional feature space and (2) involves a dual formulation of governing equations and constraints. One advantageous feature of the SVM approach is that an objective function (which one seeks to minimize to obtain coefficients that define an SVM mathematical model) is convex, so that unlike in the cases of many NN models, any local minimum of an SVM model is also a global minimum.

  14. An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records

    PubMed Central

    Kavuluru, Ramakanth; Rios, Anthony; Lu, Yuan

    2015-01-01

    Background Diagnosis codes are assigned to medical records in healthcare facilities by trained coders by reviewing all physician authored documents associated with a patient's visit. This is a necessary and complex task involving coders adhering to coding guidelines and coding all assignable codes. With the popularity of electronic medical records (EMRs), computational approaches to code assignment have been proposed in the recent years. However, most efforts have focused on single and often short clinical narratives, while realistic scenarios warrant full EMR level analysis for code assignment. Objective We evaluate supervised learning approaches to automatically assign international classification of diseases (ninth revision) - clinical modification (ICD-9-CM) codes to EMRs by experimenting with a large realistic EMR dataset. The overall goal is to identify methods that offer superior performance in this task when considering such datasets. Methods We use a dataset of 71,463 EMRs corresponding to in-patient visits with discharge date falling in a two year period (2011–2012) from the University of Kentucky (UKY) Medical Center. We curate a smaller subset of this dataset and also use a third gold standard dataset of radiology reports. We conduct experiments using different problem transformation approaches with feature and data selection components and employing suitable label calibration and ranking methods with novel features involving code co-occurrence frequencies and latent code associations. Results Over all codes with at least 50 training examples we obtain a micro F-score of 0.48. On the set of codes that occur at least in 1% of the two year dataset, we achieve a micro F-score of 0.54. For the smaller radiology report dataset, the classifier chaining approach yields best results. For the smaller subset of the UKY dataset, feature selection, data selection, and label calibration offer best performance. Conclusions We show that datasets at different scale (size of the EMRs, number of distinct codes) and with different characteristics warrant different learning approaches. For shorter narratives pertaining to a particular medical subdomain (e.g., radiology, pathology), classifier chaining is ideal given the codes are highly related with each other. For realistic in-patient full EMRs, feature and data selection methods offer high performance for smaller datasets. However, for large EMR datasets, we observe that the binary relevance approach with learning-to-rank based code reranking offers the best performance. Regardless of the training dataset size, for general EMRs, label calibration to select the optimal number of labels is an indispensable final step. PMID:26054428

  15. An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records.

    PubMed

    Kavuluru, Ramakanth; Rios, Anthony; Lu, Yuan

    2015-10-01

    Diagnosis codes are assigned to medical records in healthcare facilities by trained coders by reviewing all physician authored documents associated with a patient's visit. This is a necessary and complex task involving coders adhering to coding guidelines and coding all assignable codes. With the popularity of electronic medical records (EMRs), computational approaches to code assignment have been proposed in the recent years. However, most efforts have focused on single and often short clinical narratives, while realistic scenarios warrant full EMR level analysis for code assignment. We evaluate supervised learning approaches to automatically assign international classification of diseases (ninth revision) - clinical modification (ICD-9-CM) codes to EMRs by experimenting with a large realistic EMR dataset. The overall goal is to identify methods that offer superior performance in this task when considering such datasets. We use a dataset of 71,463 EMRs corresponding to in-patient visits with discharge date falling in a two year period (2011-2012) from the University of Kentucky (UKY) Medical Center. We curate a smaller subset of this dataset and also use a third gold standard dataset of radiology reports. We conduct experiments using different problem transformation approaches with feature and data selection components and employing suitable label calibration and ranking methods with novel features involving code co-occurrence frequencies and latent code associations. Over all codes with at least 50 training examples we obtain a micro F-score of 0.48. On the set of codes that occur at least in 1% of the two year dataset, we achieve a micro F-score of 0.54. For the smaller radiology report dataset, the classifier chaining approach yields best results. For the smaller subset of the UKY dataset, feature selection, data selection, and label calibration offer best performance. We show that datasets at different scale (size of the EMRs, number of distinct codes) and with different characteristics warrant different learning approaches. For shorter narratives pertaining to a particular medical subdomain (e.g., radiology, pathology), classifier chaining is ideal given the codes are highly related with each other. For realistic in-patient full EMRs, feature and data selection methods offer high performance for smaller datasets. However, for large EMR datasets, we observe that the binary relevance approach with learning-to-rank based code reranking offers the best performance. Regardless of the training dataset size, for general EMRs, label calibration to select the optimal number of labels is an indispensable final step. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Estimation of Symptom Severity During Chemotherapy From Passively Sensed Data: Exploratory Study

    PubMed Central

    Dey, Anind K; Ferreira, Denzil; Kamarck, Thomas; Sun, Weijing; Bae, Sangwon; Doryab, Afsaneh

    2017-01-01

    Background Physical and psychological symptoms are common during chemotherapy in cancer patients, and real-time monitoring of these symptoms can improve patient outcomes. Sensors embedded in mobile phones and wearable activity trackers could be potentially useful in monitoring symptoms passively, with minimal patient burden. Objective The aim of this study was to explore whether passively sensed mobile phone and Fitbit data could be used to estimate daily symptom burden during chemotherapy. Methods A total of 14 patients undergoing chemotherapy for gastrointestinal cancer participated in the 4-week study. Participants carried an Android phone and wore a Fitbit device for the duration of the study and also completed daily severity ratings of 12 common symptoms. Symptom severity ratings were summed to create a total symptom burden score for each day, and ratings were centered on individual patient means and categorized into low, average, and high symptom burden days. Day-level features were extracted from raw mobile phone sensor and Fitbit data and included features reflecting mobility and activity, sleep, phone usage (eg, duration of interaction with phone and apps), and communication (eg, number of incoming and outgoing calls and messages). We used a rotation random forests classifier with cross-validation and resampling with replacement to evaluate population and individual model performance and correlation-based feature subset selection to select nonredundant features with the best predictive ability. Results Across 295 days of data with both symptom and sensor data, a number of mobile phone and Fitbit features were correlated with patient-reported symptom burden scores. We achieved an accuracy of 88.1% for our population model. The subset of features with the best accuracy included sedentary behavior as the most frequent activity, fewer minutes in light physical activity, less variable and average acceleration of the phone, and longer screen-on time and interactions with apps on the phone. Mobile phone features had better predictive ability than Fitbit features. Accuracy of individual models ranged from 78.1% to 100% (mean 88.4%), and subsets of relevant features varied across participants. Conclusions Passive sensor data, including mobile phone accelerometer and usage and Fitbit-assessed activity and sleep, were related to daily symptom burden during chemotherapy. These findings highlight opportunities for long-term monitoring of cancer patients during chemotherapy with minimal patient burden as well as real-time adaptive interventions aimed at early management of worsening or severe symptoms. PMID:29258977

  17. Estimation of Symptom Severity During Chemotherapy From Passively Sensed Data: Exploratory Study.

    PubMed

    Low, Carissa A; Dey, Anind K; Ferreira, Denzil; Kamarck, Thomas; Sun, Weijing; Bae, Sangwon; Doryab, Afsaneh

    2017-12-19

    Physical and psychological symptoms are common during chemotherapy in cancer patients, and real-time monitoring of these symptoms can improve patient outcomes. Sensors embedded in mobile phones and wearable activity trackers could be potentially useful in monitoring symptoms passively, with minimal patient burden. The aim of this study was to explore whether passively sensed mobile phone and Fitbit data could be used to estimate daily symptom burden during chemotherapy. A total of 14 patients undergoing chemotherapy for gastrointestinal cancer participated in the 4-week study. Participants carried an Android phone and wore a Fitbit device for the duration of the study and also completed daily severity ratings of 12 common symptoms. Symptom severity ratings were summed to create a total symptom burden score for each day, and ratings were centered on individual patient means and categorized into low, average, and high symptom burden days. Day-level features were extracted from raw mobile phone sensor and Fitbit data and included features reflecting mobility and activity, sleep, phone usage (eg, duration of interaction with phone and apps), and communication (eg, number of incoming and outgoing calls and messages). We used a rotation random forests classifier with cross-validation and resampling with replacement to evaluate population and individual model performance and correlation-based feature subset selection to select nonredundant features with the best predictive ability. Across 295 days of data with both symptom and sensor data, a number of mobile phone and Fitbit features were correlated with patient-reported symptom burden scores. We achieved an accuracy of 88.1% for our population model. The subset of features with the best accuracy included sedentary behavior as the most frequent activity, fewer minutes in light physical activity, less variable and average acceleration of the phone, and longer screen-on time and interactions with apps on the phone. Mobile phone features had better predictive ability than Fitbit features. Accuracy of individual models ranged from 78.1% to 100% (mean 88.4%), and subsets of relevant features varied across participants. Passive sensor data, including mobile phone accelerometer and usage and Fitbit-assessed activity and sleep, were related to daily symptom burden during chemotherapy. These findings highlight opportunities for long-term monitoring of cancer patients during chemotherapy with minimal patient burden as well as real-time adaptive interventions aimed at early management of worsening or severe symptoms. ©Carissa A Low, Anind K Dey, Denzil Ferreira, Thomas Kamarck, Weijing Sun, Sangwon Bae, Afsaneh Doryab. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 19.12.2017.

  18. Removing ballistocardiogram (BCG) artifact from full-scalp EEG acquired inside the MR scanner with Orthogonal Matching Pursuit (OMP)

    PubMed Central

    Xia, Hongjing; Ruan, Dan; Cohen, Mark S.

    2014-01-01

    Ballistocardiogram (BCG) artifact remains a major challenge that renders electroencephalographic (EEG) signals hard to interpret in simultaneous EEG and functional MRI (fMRI) data acquisition. Here, we propose an integrated learning and inference approach that takes advantage of a commercial high-density EEG cap to estimate the BCG contribution in noisy EEG recordings from inside the MR scanner. To reliably estimate the full-scalp BCG artifacts, a near-optimal subset (20 out of 256) of channels was first identified using a modified recording setup. In subsequent recordings inside the MR scanner, BCG-only signal from this subset of channels was used to generate continuous estimates of the full-scalp BCG artifacts via inference, from which the intended EEG signal was recovered. The reconstruction of the EEG was performed with both a direct subtraction and an optimization scheme. We evaluated the performance on both synthetic and real contaminated recordings, and compared it to the benchmark Optimal Basis Set (OBS) method. In the challenging non-event-related-potential (non-ERP) EEG studies, our reconstruction can yield more than fourteen-fold improvement in reducing the normalized RMS error of EEG signals, compared to OBS. PMID:25120421
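
    A minimal sketch of BCG estimation as a sparse regression on reference channels with scikit-learn's Orthogonal Matching Pursuit; the signals below are synthetic, and the paper's actual inference operates on a learned 20-channel subset of a 256-channel cap:

      import numpy as np
      from sklearn.linear_model import OrthogonalMatchingPursuit

      rng = np.random.default_rng(0)
      n = 2000
      refs = rng.normal(size=(n, 20))            # 20 BCG reference channels
      true_w = np.zeros(20)
      true_w[[2, 7, 11]] = [0.9, -0.4, 0.6]      # sparse mixing weights
      bcg = refs @ true_w                        # target channel's BCG part
      eeg = 0.3 * rng.normal(size=n)             # "intended" EEG (toy)
      contaminated = eeg + bcg

      omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5).fit(refs, contaminated)
      cleaned = contaminated - omp.predict(refs) # subtract the estimated BCG
      print("residual correlation with BCG:",
            np.corrcoef(cleaned, bcg)[0, 1].round(3))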

  19. Optimized probability sampling of study sites to improve generalizability in a multisite intervention trial.

    PubMed

    Kraschnewski, Jennifer L; Keyserling, Thomas C; Bangdiwala, Shrikant I; Gizlice, Ziya; Garcia, Beverly A; Johnston, Larry F; Gustafson, Alison; Petrovic, Lindsay; Glasgow, Russell E; Samuel-Hodge, Carmen D

    2010-01-01

    Studies of type 2 translation, the adaption of evidence-based interventions to real-world settings, should include representative study sites and staff to improve external validity. Sites for such studies are, however, often selected by convenience sampling, which limits generalizability. We used an optimized probability sampling protocol to select an unbiased, representative sample of study sites to prepare for a randomized trial of a weight loss intervention. We invited North Carolina health departments within 200 miles of the research center to participate (N = 81). Of the 43 health departments that were eligible, 30 were interested in participating. To select a representative and feasible sample of 6 health departments that met inclusion criteria, we generated all combinations of 6 from the 30 health departments that were eligible and interested. From the subset of combinations that met inclusion criteria, we selected 1 at random. Of 593,775 possible combinations of 6 counties, 15,177 (3%) met inclusion criteria. Sites in the selected subset were similar to all eligible sites in terms of health department characteristics and county demographics. Optimized probability sampling improved generalizability by ensuring an unbiased and representative sample of study sites.
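
    The protocol is simple to reproduce; a sketch, with a placeholder for the study's actual inclusion criteria:

      import random
      from itertools import combinations
      from math import comb

      sites = list(range(30))            # 30 eligible, interested departments
      print(comb(30, 6))                 # 593,775 possible 6-site combinations

      def meets_criteria(combo):
          # Placeholder: e.g. geographic spread, staffing, population mix.
          return sum(combo) % 37 == 0

      eligible = [c for c in combinations(sites, 6) if meets_criteria(c)]
      random.seed(0)
      selected = random.choice(eligible) # unbiased draw from the eligible subset
      print(len(eligible), selected)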

  20. Application of Sequential Quadratic Programming to Minimize Smart Active Flap Rotor Hub Loads

    NASA Technical Reports Server (NTRS)

    Kottapalli, Sesi; Leyland, Jane

    2014-01-01

    In an analytical study, SMART active flap rotor hub loads have been minimized using nonlinear programming constrained optimization methodology. The recently developed NLPQLP system (Schittkowski, 2010) that employs Sequential Quadratic Programming (SQP) as its core algorithm was embedded into a driver code (NLP10x10) specifically designed to minimize active flap rotor hub loads (Leyland, 2014). Three types of practical constraints on the flap deflections have been considered. To validate the current application, two other optimization methods have been used: i) the standard, linear unconstrained method, and ii) the nonlinear Generalized Reduced Gradient (GRG) method with constraints. The new software code NLP10x10 has been systematically checked out. It has been verified that NLP10x10 is functioning as desired. The following are briefly covered in this paper: relevant optimization theory; implementation of the capability of minimizing a metric of all, or a subset, of the hub loads as well as the capability of using all, or a subset, of the flap harmonics; and finally, solutions for the SMART rotor. The eventual goal is to implement NLP10x10 in a real-time wind tunnel environment.
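
    As a hedged analogue, SciPy's SLSQP solves the same class of SQP problems as NLPQLP; the quadratic hub-load metric, sensitivity matrix and deflection constraint below are toy stand-ins for NLP10x10's objective and flap constraints:

      import numpy as np
      from scipy.optimize import minimize

      rng = np.random.default_rng(0)
      A = rng.normal(size=(6, 4))                # toy hub-load sensitivities
      b = rng.normal(size=6)                     # baseline hub loads

      def hub_load_metric(x):
          # Metric over a subset of hub loads, in the flap harmonic amplitudes x.
          return np.sum((A @ x + b) ** 2)

      cons = [{"type": "ineq", "fun": lambda x: 2.0 - np.sum(x ** 2)}]
      res = minimize(hub_load_metric, x0=np.zeros(4), method="SLSQP",
                     bounds=[(-1.0, 1.0)] * 4, constraints=cons)
      print(res.success, np.round(res.x, 3), round(res.fun, 3))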

  1. Day-Ahead PM2.5 Concentration Forecasting Using WT-VMD Based Decomposition Method and Back Propagation Neural Network Improved by Differential Evolution

    PubMed Central

    Wang, Deyun; Liu, Yanling; Luo, Hongyuan; Yue, Chenqiang; Cheng, Sheng

    2017-01-01

    Accurate PM2.5 concentration forecasting is crucial for protecting public health and the atmospheric environment. However, the intermittent and unstable nature of PM2.5 concentration series makes forecasting very difficult. In order to improve the forecast accuracy of PM2.5 concentration, this paper proposes a hybrid model based on wavelet transform (WT), variational mode decomposition (VMD) and a back propagation (BP) neural network optimized by the differential evolution (DE) algorithm. Firstly, WT is employed to decompose the PM2.5 concentration series into a number of subsets with different frequencies. Secondly, VMD is applied to decompose each subset into a set of variational modes (VMs). Thirdly, the DE-BP model is utilized to forecast all the VMs. Fourthly, the forecast value of each subset is obtained by aggregating the forecast results of all the VMs obtained from the VMD decomposition of that subset. Finally, the final forecast series of PM2.5 concentration is obtained by adding up the forecast values of all subsets. Two PM2.5 concentration series, collected from the Chinese cities of Wuhan and Tianjin, are used to test the effectiveness of the proposed model. The results demonstrate that the proposed model outperforms all the other models considered in this paper. PMID:28704955
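
    The decompose-forecast-aggregate structure is the heart of this hybrid model. The sketch below reproduces it in heavily simplified form: PyWavelets supplies the WT stage, the VMD stage is only indicated in a comment, and a persistence forecast stands in for the DE-tuned BP network.

        import numpy as np
        import pywt  # PyWavelets

        def forecast_component(x):
            # Placeholder for the DE-optimized BP network: persistence forecast.
            return x[-1]

        rng = np.random.default_rng(0)
        series = np.sin(np.linspace(0, 20, 256)) + 0.1 * rng.standard_normal(256)

        # Stage 1: wavelet decomposition into frequency components that sum
        # back (approximately) to the original series.
        coeffs = pywt.wavedec(series, "db4", level=3)
        components = []
        for i in range(len(coeffs)):
            kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
            components.append(pywt.waverec(kept, "db4")[: len(series)])

        # Stages 2-4: VMD would further split each component into variational
        # modes (omitted); every piece is forecast and the forecasts are summed.
        final_forecast = sum(forecast_component(c) for c in components)
        print(final_forecast)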

  2. System and method of cylinder deactivation for optimal engine torque-speed map operation

    DOEpatents

    Sujan, Vivek A; Frazier, Timothy R; Follen, Kenneth; Moon, Suk-Min

    2014-11-11

    This disclosure provides a system and method for determining cylinder deactivation in a vehicle engine to optimize fuel consumption while providing the desired or demanded power. In one aspect, data indicative of terrain variation is utilized in determining a vehicle target operating state. An optimal active cylinder distribution and corresponding fueling are determined from a recommendation, made by a supervisory agent monitoring the vehicle's operating state, of a subset of the total number of cylinders, together with a determination of which number of cylinders provides the optimal fuel consumption. Once the optimal cylinder number is determined, a transmission gear shift recommendation is provided in view of the determined active cylinder distribution and target operating state.

  3. MEN1, MEN4, and Carney Complex: Pathology and Molecular Genetics

    PubMed Central

    Schernthaner-Reiter, Marie Helene; Trivellin, Giampaolo; Stratakis, Constantine A.

    2015-01-01

    Pituitary adenomas are a common feature of a subset of endocrine neoplasia syndromes, which have otherwise highly variable disease manifestations. We provide here a review of the clinical features and human molecular genetics of multiple endocrine neoplasia types 1 and 4 (MEN1 and MEN4, respectively) and Carney complex (CNC). MEN1, MEN4 and CNC are hereditary autosomal dominant syndromes that can present with pituitary adenomas. MEN1 is caused by inactivating mutations in the MEN1 gene, whose product menin is involved in multiple intracellular pathways contributing to transcriptional control and cell proliferation. MEN1 clinical features include primary hyperparathyroidism, pancreatic neuroendocrine tumours, and prolactinomas and other pituitary adenomas. A subset of patients with pituitary adenomas and other MEN1 features have mutations in the CDKN1B gene; their disease has been called MEN type 4 (MEN4). Inactivating mutations in the type 1α regulatory subunit of protein kinase A (PKA) (the PRKAR1A gene), which lead to dysregulation and activation of the PKA pathway, are the main genetic cause of CNC, which is clinically characterised by primary pigmented nodular adrenocortical disease (PPNAD), spotty skin pigmentation (lentigines), cardiac and other myxomas, and acromegaly due to somatotropinomas or somatotrope hyperplasia. PMID:25592387

  4. Density-Dependent Quantized Least Squares Support Vector Machine for Large Data Sets.

    PubMed

    Nan, Shengyu; Sun, Lei; Chen, Badong; Lin, Zhiping; Toh, Kar-Ann

    2017-01-01

    Based on the knowledge that the input data distribution is important for learning, a data density-dependent quantization scheme (DQS) is proposed for sparse input data representation. The usefulness of the representation scheme is demonstrated by using it as a data preprocessing unit attached to the well-known least squares support vector machine (LS-SVM) for application to large data sets. Essentially, the proposed DQS adopts a single shrinkage threshold to obtain a simple quantization scheme that adapts its outputs to the input data density. With this quantization scheme, a large data set is quantized into a small subset, generally yielding a considerable sample-size reduction. In particular, the sample-size reduction can save significant computational cost when using the quantized subset for feature approximation via the Nyström method. Based on the quantized subset, the approximated features are incorporated into LS-SVM to develop a data density-dependent quantized LS-SVM (DQLS-SVM), where an analytic solution is obtained in the primal solution space. The developed DQLS-SVM is evaluated on synthetic and benchmark data with particular emphasis on large data sets. Extensive experimental results show that the learning machine incorporating DQS attains not only high computational efficiency but also good generalization performance.
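
    A minimal sketch of the two stages follows. The quantization rule here (keep a sample only if it lies farther than a fixed threshold from every sample kept so far, which thins dense regions) is a simplified stand-in for the paper's shrinkage scheme; the kept subset then serves as the landmark set for a Nyström feature map.

        import numpy as np
        from sklearn.kernel_approximation import Nystroem

        rng = np.random.default_rng(0)
        X = rng.standard_normal((2000, 5))

        # Single-threshold quantization: dense regions contribute few codes.
        threshold = 1.5
        kept = [X[0]]
        for x in X[1:]:
            if min(np.linalg.norm(x - k) for k in kept) > threshold:
                kept.append(x)
        X_q = np.asarray(kept)
        print(f"quantized {len(X)} samples down to {len(X_q)}")

        # Nyström features built on the quantized subset; these could then
        # feed an LS-SVM solved in the primal space.
        feature_map = Nystroem(kernel="rbf", gamma=0.5, n_components=len(X_q))
        features = feature_map.fit(X_q).transform(X)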

  5. Colon cancer: it's CIN or CIMP.

    PubMed

    Issa, Jean-Pierre

    2008-10-01

    Combined genetic and epigenetic analysis of sporadic colon cancer suggests that it can no longer be viewed as a single disease. There are at least three different subsets with distinct clinico-pathologic features, with important implications for prevention, screening, and therapy.

  6. Automated pixel-wise brain tissue segmentation of diffusion-weighted images via machine learning.

    PubMed

    Ciritsis, Alexander; Boss, Andreas; Rossi, Cristina

    2018-04-26

    The diffusion-weighted (DW) MR signal sampled over a wide range of b-values potentially allows for tissue differentiation in terms of cellularity, microstructure, perfusion, and T2 relaxivity. This study aimed to implement a machine learning algorithm for automatic brain tissue segmentation from DW-MRI datasets, and to determine the optimal subset of features for accurate segmentation. DWI was performed at 3 T in eight healthy volunteers using 15 b-values and 20 diffusion-encoding directions. The pixel-wise signal attenuation, as well as the trace and fractional anisotropy (FA) of the diffusion tensor, were used as features to train a support vector machine classifier for gray matter, white matter, and cerebrospinal fluid classes. The datasets of two volunteers were used for validation. For each subject, tissue classification was also performed on 3D T1-weighted datasets with a probabilistic framework. Confusion matrices were generated for quantitative assessment of image classification accuracy in comparison with the reference method. DWI-based tissue segmentation resulted in an accuracy of 82.1% on the validation dataset and of 82.2% on the training dataset, ruling out relevant model over-fitting. A mean Dice coefficient (DSC) of 0.79 ± 0.08 was found. About 50% of the classification performance was attributable to five features (i.e. the signal measured at b-values of 5/10/500/1200 s/mm² and the FA). This reduced feature set led to almost identical performances for the validation (82.2%) and the training (81.4%) datasets (DSC = 0.79 ± 0.08). Machine learning techniques applied to DWI data allow for accurate brain tissue segmentation based on both morphological and functional information.

  7. Automatic identification of variables in epidemiological datasets using logic regression.

    PubMed

    Lorenz, Matthias W; Abdi, Negin Ashtiani; Scheckenbach, Frank; Pflug, Anja; Bülbül, Alpaslan; Catapano, Alberico L; Agewall, Stefan; Ezhov, Marat; Bots, Michiel L; Kiechl, Stefan; Orth, Andreas

    2017-04-13

    For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve data quality. For semi-automation, high sensitivity in recognizing matching variables is particularly important, because it allows software to present, for each target variable, a choice of source variables from which a user can choose the matching one, with only a low risk of a correct source variable having been missed. For each variable in a set of target variables, a number of simple rules were manually created. With logic regression, an optimal Boolean combination of these rules was searched for every target variable, using a random subset of a large database of epidemiological and clinical cohort data (construction subset). In a second subset of this database (validation subset), these optimal rule combinations were validated. In the construction sample, 41 target variables were allocated on average with a positive predictive value (PPV) of 34% and a negative predictive value (NPV) of 95%. In the validation sample, PPV was 33%, whereas NPV remained at 94%. PPV was 50% or less for 63% of all variables in the construction sample and for 71% in the validation sample. We demonstrated that the application of logic regression to a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, which may require backup strategies.
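
    Scoring one candidate Boolean rule combination against hand-labelled matches is straightforward; the sketch below computes PPV and NPV for an invented combination of two invented predicates over toy (target, source) variable pairs.

        import numpy as np

        rng = np.random.default_rng(1)
        # Toy predicates over 500 candidate (target, source) variable pairs,
        # e.g. "name contains 'age'" or "units agree" in a real pipeline.
        rule_a = rng.random(500) < 0.4
        rule_b = rng.random(500) < 0.3
        is_match = rng.random(500) < 0.2    # hand-labelled ground truth

        # One Boolean combination of the rules, as logic regression would fit.
        predicted = rule_a & ~rule_b

        tp = np.sum(predicted & is_match)
        fp = np.sum(predicted & ~is_match)
        tn = np.sum(~predicted & ~is_match)
        fn = np.sum(~predicted & is_match)
        ppv = tp / (tp + fp)    # how often a flagged pair is a true match
        npv = tn / (tn + fn)    # how often an unflagged pair is a true non-match
        print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")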

  8. Spectral Band Selection for Urban Material Classification Using Hyperspectral Libraries

    NASA Astrophysics Data System (ADS)

    Le Bris, A.; Chehata, N.; Briottet, X.; Paparoditis, N.

    2016-06-01

    In urban areas, information concerning very high resolution land cover, and especially material maps, is necessary for several city modelling or monitoring applications; that is to say, knowledge concerning the roofing materials or the different kinds of ground areas is required. Airborne remote sensing techniques appear to be convenient for providing such information at a large scale. However, results obtained using most traditional processing methods based on the usual red-green-blue-near-infrared multispectral images remain limited for such applications. A possible way to improve classification results is to enhance the imagery's spectral resolution using superspectral or hyperspectral sensors. This study aims to design a superspectral sensor dedicated to urban material classification, focusing in particular on the selection of optimal spectral band subsets for such a sensor. First, reflectance spectral signatures of urban materials were collected from 7 spectral libraries. Then, spectral optimization was performed using this data set. The band selection workflow included two steps: first optimising the number of spectral bands using an incremental method, and then examining several possible optimised band subsets using a stochastic algorithm. The same wrapper relevance criterion, relying on a confidence measure of a Random Forests classifier, was used at both steps. To cope with the limited number of available spectra for several classes, additional synthetic spectra were generated from the collection of reference spectra: intra-class variability was simulated by multiplying reference spectra by a random coefficient. Finally, the selected band subsets were evaluated in terms of the classification quality reached with an RBF SVM classifier. It was confirmed that a limited band subset is sufficient to classify common urban materials. The important contribution of bands from the Short Wave Infra-Red (SWIR) spectral domain (1000-2400 nm) to material classification was also shown.
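
    The incremental step of such a wrapper workflow amounts to forward selection with a classifier-based relevance score. Below is a minimal sketch using cross-validated Random Forest accuracy as the criterion; the spectra, labels, and subset size are all invented.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.standard_normal((300, 20))   # 300 spectra x 20 candidate bands
        y = rng.integers(0, 5, 300)          # 5 material classes (toy labels)

        def wrapper_score(bands):
            # Relevance of a band subset = CV accuracy of an RF trained on it.
            rf = RandomForestClassifier(n_estimators=50, random_state=0)
            return cross_val_score(rf, X[:, bands], y, cv=3).mean()

        selected, remaining = [], list(range(X.shape[1]))
        for _ in range(6):                   # grow the subset one band at a time
            best = max(remaining, key=lambda b: wrapper_score(selected + [b]))
            selected.append(best)
            remaining.remove(best)
        print("selected bands:", selected)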

  9. Optimization of brain PET imaging for a multicentre trial: the French CATI experience.

    PubMed

    Habert, Marie-Odile; Marie, Sullivan; Bertin, Hugo; Reynal, Moana; Martini, Jean-Baptiste; Diallo, Mamadou; Kas, Aurélie; Trébossen, Régine

    2016-12-01

    CATI is a French initiative launched in 2010 to handle the neuroimaging of a large cohort of subjects recruited for an Alzheimer's research program called MEMENTO. This paper presents our test protocol and the results obtained for the 22 PET centres (13 different scanners overall) involved in the MEMENTO cohort. We determined acquisition parameters using phantom experiments prior to patient studies, with the aim of optimizing PET quantitative values to the highest level achievable per site while reducing, where possible, variability across centres. Jaszczak and 3D Hoffman phantom measurements were used to assess image spatial resolution (ISR), recovery coefficients (RC) in hot and cold spheres, and signal-to-noise ratio (SNR). For each centre, the optimal reconstruction parameters were chosen as those maximizing ISR and RC without a noticeable decrease in SNR. Point-spread-function (PSF) modelling reconstructions were discarded. The three figures of merit extracted from the images reconstructed with optimized parameters and with routine schemes were compared, as were volume-of-interest ratios extracted from Hoffman acquisitions. The net effect of the 3D-OSEM reconstruction parameter optimization was investigated on a subset of 18 scanners without PSF modelling reconstruction. Compared to the routine parameters of the 22 PET centres, the average RC in the two smallest hot and cold spheres and the average ISR remained stable or were improved with the optimized reconstruction, at the expense of a slight SNR degradation, while the dispersion of values was reduced. For the subset of scanners without PSF modelling, the mean RC of the smallest hot sphere obtained with the optimized reconstruction was significantly higher than with the routine reconstruction. The putamen and caudate-to-white-matter ratios measured on 3D Hoffman acquisitions of all centres were also significantly improved by the optimization, while the variance was reduced. This study provides guidelines for optimizing quantitative results in multicentric PET neuroimaging trials.

  10. Constrained Optimal Transport

    NASA Astrophysics Data System (ADS)

    Ekren, Ibrahim; Soner, H. Mete

    2018-03-01

    The classical duality theory of Kantorovich (C R (Doklady) Acad Sci URSS (NS) 37:199-201, 1942) and Kellerer (Z Wahrsch Verw Gebiete 67(4):399-432, 1984) for classical optimal transport is generalized to an abstract framework and a characterization of the dual elements is provided. This abstract generalization is set in a Banach lattice X with an order unit. The problem is given as the supremum over a convex subset of the positive unit sphere of the topological dual of X and the dual problem is defined on the bi-dual of X. These results are then applied to several extensions of the classical optimal transport.

  11. Optimization of Feature Weights in the Voting Feature Intervals 5 Algorithm Using the Particle Swarm Optimization Algorithm

    NASA Astrophysics Data System (ADS)

    Hayana Hasibuan, Eka; Mawengkang, Herman; Efendi, Syahril

    2017-12-01

    The Particle Swarm Optimization (PSO) algorithm is used in this research to optimize the feature weights of the Voting Feature Intervals 5 (VFI5) algorithm, yielding a combined PSO-VFI5 model. Optimizing the feature weights on the diabetes and dyspepsia data is considered important because these conditions closely affect many people's lives, and inaccuracy in determining the most dominant feature weights could contribute to fatal misdiagnoses. With the PSO algorithm, accuracy on fold 1 increased from 92.31% to 96.15%, a gain of 3.8%; on fold 2, the VFI5 accuracy of 92.52% was also obtained with PSO, i.e. accuracy was unchanged; and on fold 3, accuracy increased from 85.19% to 96.29%, a gain of 11%. Summed over the three trials, accuracy increased by about 14%. In general, the PSO algorithm succeeded in increasing accuracy on several folds; we therefore conclude that PSO is well suited to optimizing the VFI5 classification algorithm.
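
    A generic PSO loop over feature-weight vectors looks like the sketch below; the swarm parameters are typical textbook values, and a synthetic quadratic stands in for the real fitness (the classification error of VFI5 run with the candidate weights).

        import numpy as np

        rng = np.random.default_rng(0)

        def fitness(w):
            # Placeholder: in the paper this would be the error of the VFI5
            # classifier using feature weights w.
            return np.sum((w - 0.7) ** 2)

        n_particles, n_features, iters = 20, 8, 50
        pos = rng.random((n_particles, n_features))
        vel = np.zeros_like(pos)
        pbest = pos.copy()
        pbest_val = np.array([fitness(p) for p in pos])
        gbest = pbest[pbest_val.argmin()].copy()

        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, n_features))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, 0.0, 1.0)   # keep weights in [0, 1]
            vals = np.array([fitness(p) for p in pos])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
            gbest = pbest[pbest_val.argmin()].copy()

        print("best weights:", np.round(gbest, 2))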

  12. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes an ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and an interpretation for its subset. We developed an efficient estimation procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  13. Optimal Artificial Boundary Condition Configurations for Sensitivity-Based Model Updating and Damage Detection

    DTIC Science & Technology

    2010-09-01

    matrix is used in many methods, like Jacobi or Gauss-Seidel, for solving linear systems. Also, no partial pivoting is necessary for a strictly column... problems that arise during the procedure, which, in general, converges to the solving of a linear system. The most common issue with the solution is the... iterative procedure to find an appropriate subset of parameters that produce an optimal solution, commonly known as forward selection. Then, the

  14. TU-D-207B-01: A Prediction Model for Distinguishing Radiation Necrosis From Tumor Progression After Gamma Knife Radiosurgery Based On Radiomics Features From MR Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Z; MD Anderson Cancer Center, Houston, TX; Ho, A

    Purpose: To develop and validate a prediction model using radiomics features extracted from MR images to distinguish radiation necrosis from tumor progression for brain metastases treated with Gamma Knife radiosurgery. Methods: The images used to develop the model were T1 post-contrast MR scans from 71 patients who had pathologic confirmation of necrosis or progression; 1 lesion was identified per patient (17 necrosis and 54 progression). Radiomics features were extracted from 2 images at 2 time points per patient, both obtained prior to resection. Each lesion was manually contoured on each image, and 282 radiomics features were calculated for each lesion. The correlation for each radiomics feature between the two time points was calculated within each group to identify a subset of features with distinct values between the two groups. The delta of this subset of radiomics features, characterizing changes from the earlier time point to the later one, was included as a covariate to build a prediction model using support vector machines with a cubic polynomial kernel function. The model was evaluated with 10-fold cross-validation. Results: Forty radiomics features were selected based on consistent correlation values of approximately 0 for the necrosis group and >0.2 for the progression group. In performing the 10-fold cross-validation, we narrowed this number down to 11 delta radiomics features for the model. This 11-delta-feature model showed an overall prediction accuracy of 83.1%, with a true positive rate of 58.8% in predicting necrosis and 90.7% for predicting tumor progression. The area under the curve for the prediction model was 0.79. Conclusion: These delta radiomics features extracted from MR scans showed potential for distinguishing radiation necrosis from tumor progression. This tool may be a useful, noninvasive means of determining the status of an enlarging lesion after radiosurgery, aiding decision-making regarding surgical resection versus conservative medical management.
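
    A compact way to reproduce the modelling step is shown below: delta features (later minus earlier time point) feed a cubic-polynomial-kernel SVM scored by 10-fold cross-validation. All feature values are synthetic stand-ins; only the class balance (17 vs 54) mirrors the abstract.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        n_lesions, n_features = 71, 11

        # Delta features: change in each radiomics feature between time points.
        feats_t1 = rng.standard_normal((n_lesions, n_features))
        feats_t2 = feats_t1 + 0.3 * rng.standard_normal((n_lesions, n_features))
        delta = feats_t2 - feats_t1
        labels = np.array([0] * 17 + [1] * 54)   # 0 = necrosis, 1 = progression

        model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
        acc = cross_val_score(model, delta, labels, cv=10).mean()
        print(f"cross-validated accuracy: {acc:.3f}")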

  15. Sparse Zero-Sum Games as Stable Functional Feature Selection

    PubMed Central

    Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel

    2015-01-01

    In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection can therefore not only reduce the dimensionality of the problem, but also reveal knowledge about the functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game that performs stable functional feature selection. In particular, the approach is based on ranking feature subsets with a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268

  16. OPTIMIZING STORMWATER MANAGEMENT RETROFITS BASED ON IMPERVIOUS SURFACE CONNECTIONS TO SEWERS

    EPA Science Inventory

    Although total impervious area (TIA) is often used as an indicator of urban disturbance, recent studies suggest that the subset of impervious surfaces that route stormwater runoff directly to streams via stormwater pipes, called directly connected impervious area (DCIA), may be a...

  17. Genotyping variability of computationally categorized peach microsatellite markers

    USDA-ARS?s Scientific Manuscript database

    Numerous expressed sequence tag (EST) simple sequence repeat (SSR) primers can be easily mined out. The obstacle to develop them into usable markers is how to optimally select downsized subsets of the primers for genotyping, which accordingly reduces amplification failure and monomorphism often occu...

  18. Multivariate Methods for Prediction of Geologic Sample Composition with Laser-Induced Breakdown Spectroscopy

    NASA Technical Reports Server (NTRS)

    Morris, Richard; Anderson, R.; Clegg, S. M.; Bell, J. F., III

    2010-01-01

    Laser-induced breakdown spectroscopy (LIBS) uses pulses of laser light to ablate a material from the surface of a sample and produce an expanding plasma. The optical emission from the plasma produces a spectrum which can be used to classify target materials and estimate their composition. The ChemCam instrument on the Mars Science Laboratory (MSL) mission will use LIBS to rapidly analyze targets remotely, allowing more resource- and time-intensive in-situ analyses to be reserved for targets of particular interest. ChemCam will also be used to analyze samples that are not reachable by the rover's in-situ instruments. Due to these tactical and scientific roles, it is important that ChemCam-derived sample compositions are as accurate as possible. We have compared the results of partial least squares (PLS), multilayer perceptron (MLP) artificial neural networks (ANNs), and cascade correlation (CC) ANNs to determine which technique yields better estimates of quantitative element abundances in rock and mineral samples. The number of hidden nodes in the MLP ANNs was optimized using a genetic algorithm. The influence of two data preprocessing techniques was also investigated: genetic algorithm feature selection and averaging the spectra for each training sample prior to training the PLS and ANN algorithms. We used a ChemCam-like laboratory stand-off LIBS system to collect spectra of 30 pressed powder geostandards and a diverse suite of 196 geologic slab samples of known bulk composition. We tested the performance of PLS and ANNs on a subset of these samples, choosing to focus on silicate rocks and minerals with a loss on ignition of less than 2 percent. This resulted in a set of 22 pressed powder geostandards and 80 geologic samples. Four of the geostandards were used as a validation set and 18 were used as the training set for the algorithms. We found that PLS typically resulted in the lowest average absolute error in its predictions, but that the optimized MLP ANN and the CC ANN often gave results comparable to PLS. Averaging the spectra for each training sample and/or using feature selection to choose a small subset of wavelengths to use for predictions gave mixed results, with degraded performance in some cases and similar or slightly improved performance in other cases. However, training time was significantly reduced for both PLS and ANN methods by implementing feature selection, making this a potentially appealing method for initial, rapid-turn-around analyses necessary for ChemCam's tactical role on MSL. Choice of training samples has a strong influence on the accuracy of predictions. We are currently investigating the use of clustering algorithms (e.g. k-means, neural gas, etc.) to identify training sets that are spectrally similar to the unknown samples being predicted, which may therefore result in improved predictions.
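
    For the regression stage, a PLS model over high-dimensional spectra takes only a few lines with scikit-learn; the sketch below uses invented spectra and element abundances (real LIBS spectra have thousands of channels), predicting several elements jointly.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.standard_normal((100, 500))            # 100 spectra x 500 channels
        coefs = 0.05 * rng.standard_normal((500, 3))   # 3 element abundances
        Y = X @ coefs + 0.1 * rng.standard_normal((100, 3))

        X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2,
                                                  random_state=0)
        pls = PLSRegression(n_components=10).fit(X_tr, Y_tr)
        pred = pls.predict(X_te)
        print("mean absolute error per element:", np.abs(pred - Y_te).mean(axis=0))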

  19. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    PubMed Central

    Zhao, Yu-Xiang; Chou, Chien-Hsing

    2016-01-01

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors, and the proposed NRFS algorithm was designed to exploit this property. In the NRFS algorithm, unselected feature vectors have a high priority of being added to the feature subset if their neighboring feature vectors have been selected; conversely, selected feature vectors have a high priority of being eliminated if their neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two other feature selection algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identify the crucial character regions for recognizing Chinese characters. PMID:27314346
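
    The selection rule can be sketched as a greedy loop in which a feature's priority is its own relevance plus a bonus for each already-selected neighbour. The relevance scores, bonus weight, and "adjacent index = neighbour" convention below are all invented for illustration; the full NRFS also demotes selected features whose neighbours are unselected.

        import numpy as np

        rng = np.random.default_rng(0)
        n_features = 30
        relevance = rng.random(n_features)   # stand-in per-feature relevance
        selected = np.zeros(n_features, dtype=bool)
        bonus = 0.3

        for _ in range(10):                  # pick 10 features
            priority = relevance.copy()
            for i in range(n_features):
                for j in (i - 1, i + 1):     # neighbouring feature vectors
                    if 0 <= j < n_features and selected[j]:
                        priority[i] += bonus
            priority[selected] = -np.inf     # never re-pick a feature
            selected[np.argmax(priority)] = True

        print("selected feature indices:", np.flatnonzero(selected))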

  20. A morphing-based scheme for large deformation analysis with stereo-DIC

    NASA Astrophysics Data System (ADS)

    Genovese, Katia; Sorgente, Donato

    2018-05-01

    A key step in the DIC-based image registration process is the definition of the initial guess for the non-linear optimization routine aimed at finding the parameters describing the pixel subset transformation. This initialization can prove very challenging, and may fail, when dealing with pairs of largely deformed images such as those obtained from two angled views of non-flat objects or from the temporal undersampling of rapidly evolving phenomena. To address this problem, we developed a procedure that generates a sequence of intermediate synthetic images for gradually tracking the pixel subset transformation between the two extreme configurations. To this end, a proper image warping function is defined over the entire image domain through the adoption of a robust feature-based algorithm followed by a NURBS-based interpolation scheme. This allows a fast and reliable estimation of the initial guess of the deformation parameters for the subsequent refinement stage of the DIC analysis. The proposed method is described step by step by illustrating the measurement of the large and heterogeneous deformation of a circular silicone membrane undergoing axisymmetric indentation. A comparative analysis of the results is carried out by taking a standard reference-updating approach as a benchmark. Finally, the morphing scheme is extended to the most general case of the correspondence search between two largely deformed textured 3D geometries. The feasibility of this latter approach is demonstrated on a very challenging case: the full-surface measurement of the severe deformation (>150% strain) suffered by an aluminum sheet blank subjected to a pneumatic bulge test.

  1. Mapping tropical rainforest canopies using multi-temporal spaceborne imaging spectroscopy

    NASA Astrophysics Data System (ADS)

    Somers, Ben; Asner, Gregory P.

    2013-10-01

    The use of imaging spectroscopy for floristic mapping of forests is complicated by the spectral similarity among coexisting species. Here we evaluated an alternative spectral unmixing strategy combining a time series of EO-1 Hyperion images and an automated feature selection strategy in MESMA. Instead of using the same spectral subset to unmix each image pixel, our modified approach allows the spectral subsets to vary on a per-pixel basis, such that each pixel is evaluated using a spectral subset tuned towards maximal separability of its specific endmember class combination or species mixture. The potential of the new approach for floristic mapping of tree species in Hawaiian rainforests was quantitatively demonstrated using both simulated and actual hyperspectral image time series. With a Cohen's Kappa coefficient of 0.65, our approach provided a more accurate tree species map than MESMA (Kappa = 0.54). In addition, through the selection of spectral subsets, our approach was about 90% faster than MESMA. Such flexible, adaptive use of band sets in spectral unmixing provides an interesting avenue for addressing spectral similarity in complex vegetation canopies.

  2. Transcriptional and functional profiling defines human small intestinal macrophage subsets.

    PubMed

    Bujko, Anna; Atlasy, Nader; Landsverk, Ole J B; Richter, Lisa; Yaqub, Sheraz; Horneland, Rune; Øyen, Ole; Aandahl, Einar Martin; Aabakken, Lars; Stunnenberg, Hendrik G; Bækkevold, Espen S; Jahnsen, Frode L

    2018-02-05

    Macrophages (Mfs) are instrumental in maintaining immune homeostasis in the intestine, yet studies on the origin and heterogeneity of human intestinal Mfs are scarce. Here, we identified four distinct Mf subpopulations in human small intestine (SI). Assessment of their turnover in duodenal transplants revealed that all Mf subsets were completely replaced over time; Mf1 and Mf2, phenotypically similar to peripheral blood monocytes (PBMos), were largely replaced within 3 wk, whereas two subsets with features of mature Mfs, Mf3 and Mf4, exhibited significantly slower replacement. Mf3 and Mf4 localized differently in SI; Mf3 formed a dense network in mucosal lamina propria, whereas Mf4 was enriched in submucosa. Transcriptional analysis showed that all Mf subsets were markedly distinct from PBMos and dendritic cells. Compared with PBMos, Mf subpopulations showed reduced responsiveness to proinflammatory stimuli but were proficient at endocytosis of particulate and soluble material. These data provide a comprehensive analysis of human SI Mf population and suggest a precursor-progeny relationship with PBMos.

  3. Automatic Correction Algorithm of Hydrology Feature Attributes in National Geographic Census

    NASA Astrophysics Data System (ADS)

    Li, C.; Guo, P.; Liu, X.

    2017-09-01

    Some attributes of the hydrologic feature data in the national geographic census are not clear; the current solution is manual filling, which is inefficient and error-prone. This paper therefore proposes an automatic correction algorithm for hydrologic feature attributes. Based on an analysis of the structural characteristics and topological relations, we put forward three basic principles of correction: network proximity, structural robustness, and topological ductility. Based on the WJ-III map workstation, we realize the automatic correction of hydrologic features. Finally, practical data are used to validate the method. The results show that our method is reasonable and efficient.

  4. Evidence for mesenchymal-like sub-populations within squamous cell carcinomas possessing chemoresistance and phenotypic plasticity.

    PubMed

    Basu, D; Nguyen, T-T K; Montone, K T; Zhang, G; Wang, L-P; Diehl, J A; Rustgi, A K; Lee, J T; Weinstein, G S; Herlyn, M

    2010-07-22

    Variable drug responses among malignant cells within individual tumors may represent a barrier to their eradication using chemotherapy. Carcinoma cells expressing mesenchymal markers resist conventional and epidermal growth factor receptor (EGFR)-targeted chemotherapy. In this study, we evaluated whether mesenchymal-like sub-populations within human squamous cell carcinomas (SCCs) with predominantly epithelial features contribute to overall therapy resistance. We identified a mesenchymal-like subset expressing low E-cadherin (Ecad-lo) and high vimentin within upper aerodigestive tract SCCs. This subset was isolated from cell lines and was also identified in xenografts and primary clinical specimens. The Ecad-lo subset contained more low-turnover cells, correlating with resistance to the conventional chemotherapeutic paclitaxel in vitro. Epidermal growth factor induced less stimulation of the mitogen-activated protein kinase and phosphatidylinositol-3-kinase pathways in Ecad-lo cells, which was likely due to lower EGFR expression in this subset and correlated with in vivo resistance to the EGFR-targeted antibody cetuximab. The Ecad-lo and high E-cadherin subsets were dynamic in phenotype, showing the capacity to repopulate each other from single-cell clones. Taken together, these results provide evidence for a low-turnover, mesenchymal-like sub-population in SCCs with diminished EGFR pathway function and intrinsic resistance to conventional and EGFR-targeted chemotherapies.

  5. South Australian Scleroderma Register: analysis of deceased patients.

    PubMed

    Bond, C; Pile, K D; McNeil, J D; Ahern, M J; Smith, M D; Cleland, L G; Roberts-Thomson, P J

    1998-11-01

    The demographic, clinical, pathological and serological features of 123 deceased patients with systemic sclerosis have been analysed. These patients comprised all identified patients dying with this disease in South Australia between 1983 and 1996 inclusive. There were 85 females and 38 males, with the ratio of the limited:diffuse:overlap disease subsets being 9:5:1. Disease characteristics revealed that patients with limited disease tended to be female, with a high frequency of the anticentromere autoantibody, while patients with diffuse disease had equal gender representation with the frequent presence of nucleolar, speckled or homogeneous antinuclear antibodies. Mean duration of disease and mean age at death for the limited:diffuse:overlap subsets differed significantly between groups (p < 0.05) and were 16.5, 9.3 and 10.9 years and 71.9, 57.8 and 52.8 years, respectively. Cumulative survival curves for the subsets differed highly significantly: patients with limited disease died more commonly from right heart failure (documented terminally in 25% of the centromere-positive limited subset), cardiovascular disease or cancer, while patients in the diffuse subset died from respiratory failure, renal failure or cardiovascular disease. In conclusion, this retrospective analysis has revealed that scleroderma is a relatively common but clinically heterogeneous disorder. There are important clinical and prognostic implications in defining limited versus diffuse versus overlap disease.

  6. Parallel object-oriented decision tree system

    DOEpatents

    Kamath, Chandrika [Dublin, CA]; Cantu-Paz, Erick [Oakland, CA]

    2006-02-28

    A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each object, and recognizing patterns among the objects based upon those features. The decision tree reads the data, sorts it if necessary, determines the best manner of splitting it into subsets according to some criterion, and performs the split.

  7. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

    NASA Astrophysics Data System (ADS)

    Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

    2016-10-01

    Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and the harmony search algorithm is proposed. The multi-fractal dimension is adopted as the evaluation criterion of feature subsets, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data sets. In addition, the proposed method is used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method obtains competitive results in terms of both prediction accuracy and the number of selected features.
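
    Harmony search adapts naturally to binary feature masks: each "harmony" is a subset, new subsets are assembled bit by bit from the harmony memory with occasional random bits and bit flips, and the worst member of the memory is replaced when beaten. The quality function below is an invented placeholder for the multi-fractal-dimension criterion.

        import numpy as np

        rng = np.random.default_rng(0)

        def subset_quality(mask):
            # Placeholder criterion: reward the (invented) informative
            # features 0-4 and penalize subset size.
            if not mask.any():
                return -np.inf
            return mask[:5].sum() - 0.2 * mask.sum()

        n_features, memory_size, iters = 20, 10, 200
        hmcr, par = 0.9, 0.3     # memory-consideration and pitch-adjust rates
        memory = rng.random((memory_size, n_features)) < 0.5

        for _ in range(iters):
            new = np.empty(n_features, dtype=bool)
            for i in range(n_features):
                if rng.random() < hmcr:      # take the bit from harmony memory
                    new[i] = memory[rng.integers(memory_size), i]
                    if rng.random() < par:   # pitch adjustment = bit flip
                        new[i] = ~new[i]
                else:                        # otherwise randomize the bit
                    new[i] = rng.random() < 0.5
            scores = np.array([subset_quality(m) for m in memory])
            worst = scores.argmin()
            if subset_quality(new) > scores[worst]:
                memory[worst] = new          # replace the worst harmony

        best = max(memory, key=subset_quality)
        print("selected features:", np.flatnonzero(best))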

  8. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases version 10 (ICD-10)] through five automated feature selection schemes, the proposed expert-driven approach, five feature subset sizes, and five machine learning classifiers. Model performance was evaluated using macro-averaged precision, recall and F-measure, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system. Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, using a feature subset size of 30. The proposed system also showed an approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and the four baselines. The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. It assists pathologists in accurately and rapidly determining the underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
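
    The overall pipeline (unigram features, a small selected feature subset, a tree-ensemble classifier) can be sketched in a few lines. Everything below is illustrative: the reports and ICD-10 codes are invented, and chi-squared selection merely stands in for the paper's expert-driven scheme.

        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.pipeline import make_pipeline

        # Invented toy reports and cause-of-death codes.
        reports = ["blunt head trauma after road traffic collision",
                   "drowning with water aspiration in lungs",
                   "thermal burns over torso from house fire",
                   "skull fracture from fall at construction site"] * 10
        labels = ["V89", "W74", "X00", "W12"] * 10

        # Unigram features -> small feature subset -> random forest model.
        model = make_pipeline(CountVectorizer(ngram_range=(1, 1)),
                              SelectKBest(chi2, k=15),
                              RandomForestClassifier(n_estimators=200,
                                                     random_state=0))
        model.fit(reports, labels)
        print(model.predict(["burns sustained in kitchen fire"]))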

  9. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

    PubMed Central

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases version 10 (ICD-10)] through five automated feature selection schemes, the proposed expert-driven approach, five feature subset sizes, and five machine learning classifiers. Model performance was evaluated using macro-averaged precision, recall and F-measure, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, using a feature subset size of 30. The proposed system also showed an approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and the four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. It assists pathologists in accurately and rapidly determining the underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports. PMID:28166263

  10. Towards improving searches for optimal phylogenies.

    PubMed

    Ford, Eric; St John, Katherine; Wheeler, Ward C

    2015-01-01

    Finding the optimal evolutionary history for a set of taxa is a challenging computational problem, even when restricting possible solutions to be "tree-like" and focusing on the maximum-parsimony optimality criterion. This has led to much work on using heuristic tree searches to find approximate solutions. We present an approach for finding exact optimal solutions that employs and complements the current heuristic methods for finding optimal trees. Given a set of taxa and a set of aligned sequences of characters, there may be subsets of characters that are compatible, and for each such subset there is an associated (possibly partially resolved) phylogeny with edges corresponding to each character state change. These perfect phylogenies serve as anchor trees for our constrained search space. We show that, for sequences with compatible sites, the parsimony score of any tree T is at least the parsimony score of the anchor trees plus the number of inferred changes between T and the anchor trees. As the maximum-parsimony optimality score is additive, the sum of the lower bounds over compatible character partitions provides a lower bound on the complete alignment of characters. This yields a region in the space of trees within which the best tree is guaranteed to be found; limiting the search for the optimal tree to this region can significantly reduce the number of trees that must be examined in a search of the space of trees. We analyze this method empirically using four different biological data sets as well as surveying 400 data sets from the TreeBASE repository, demonstrating the effectiveness of our technique in reducing the number of steps in exact heuristic searches for trees under the maximum-parsimony optimality criterion.
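
    In symbols (a paraphrase of the bound stated above, with notation invented here), writing pars(T) for the parsimony score of a tree T, A_1, ..., A_k for the anchor trees of the k compatible character partitions, and d(T, A_i) for the number of inferred changes between T and A_i:

        \[
          \operatorname{pars}(T) \;\ge\; \sum_{i=1}^{k}
            \bigl( \operatorname{pars}(A_i) + d(T, A_i) \bigr),
        \]

    so any tree whose right-hand side already exceeds the best parsimony score found so far can be discarded without evaluation.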

  11. Efficiency Improvements to the Displacement Based Multilevel Structural Optimization Algorithm

    NASA Technical Reports Server (NTRS)

    Plunkett, C. L.; Striz, A. G.; Sobieszczanski-Sobieski, J.

    2001-01-01

    Multilevel Structural Optimization (MSO) continues to be an area of research interest in engineering optimization. In the present project, the weight optimization of beams and trusses using Displacement based Multilevel Structural Optimization (DMSO), a member of the MSO set of methodologies, is investigated. In the DMSO approach, the optimization task is subdivided into a single system-level and multiple subsystems-level optimizations. The system-level optimization minimizes the load unbalance resulting from the use of displacement functions to approximate the structural displacements. The function coefficients are then the design variables. Alternately, the system-level optimization can be solved using the displacements themselves as design variables, as was shown in previous research. Both approaches ensure that the calculated loads match the applied loads. At the subsystems level, the weight of the structure is minimized using the element dimensions as design variables. The approach is expected to be very efficient for large structures, since parallel computing can be utilized in the different levels of the problem. In this paper, the method is applied to a one-dimensional beam and a large three-dimensional truss. The beam was tested to study possible simplifications to the system-level optimization. In previous research, polynomials were used to approximate the global nodal displacements; the number of coefficients of the polynomials equally matched the number of degrees of freedom of the problem. Here, it was desired to see whether it is possible to match only a subset of the degrees of freedom at the system level. This would lead to a simplification of the system level, with a resulting increase in overall efficiency. However, the methods tested for this type of system-level simplification did not yield positive results. The large truss was utilized to test further improvements in the efficiency of DMSO. In previous work, parallel processing was applied to the subsystems level, where the derivative verification feature of the optimizer NPSOL had been utilized in the optimizations; this resulted in large runtimes. In this paper, the optimizations were repeated without using the derivative verification, and the results are compared to those from the previous work. Also, the optimizations were run both on a network of Sun workstations, using the MPICH implementation of the Message Passing Interface (MPI), and on the faster Beowulf cluster at ICASE, NASA Langley Research Center, using the LAM implementation of MPI. The results on both systems were consistent and showed that it is not necessary to verify the derivatives, and that this gives a large increase in the efficiency of the DMSO algorithm.

  12. Comparison of naïve Bayes and logistic regression for computer-aided diagnosis of breast masses using ultrasound imaging

    NASA Astrophysics Data System (ADS)

    Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.

    2012-03-01

    This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features, along with patient age, race, and mammographic BI-RADS category, were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 ± 0.035 to 0.840 ± 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 ± 0.029 to 0.859 ± 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADS, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R² = 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since the probability of malignancy cannot be measured directly, the high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.

  13. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    PubMed

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq): (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which makes LAeq prediction models quite complex and costly in terms of time and resources to apply to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are (i) correlation-based feature-subset selection (CFS) and (ii) wrapper for feature-subset selection (WFS); the data-reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as the regression algorithm provides the best LAeq estimation (R² = 0.94 and mean absolute error (MAE) = 1.14-1.16 dB(A)).

  14. Label-free haemogram using wavelength modulated Raman spectroscopy for identifying immune-cell subset

    NASA Astrophysics Data System (ADS)

    Ashok, Praveen C.; Praveen, Bavishna B.; Campbell, Elaine C.; Dholakia, Kishan; Powis, Simon J.

    2014-03-01

    Leucocytes in the blood of mammals form a powerful protective system against a wide range of dangerous pathogens. There are several types of immune cells, each with a specific role in the whole immune system. The number and type of immune cells alter in disease states, and identifying the type of immune cell provides information about a person's state of health. Several immune cell subsets are essentially morphologically identical and require external labeling to enable discrimination. Here we demonstrate the feasibility of using Wavelength Modulated Raman Spectroscopy (WMRS) with suitable machine learning algorithms as a label-free method to distinguish between closely related immune cell subsets. Principal Component Analysis (PCA) was performed for feature reduction on WMRS data from single cells, obtained using confocal Raman microscopy, followed by a Support Vector Machine (SVM) for binary discrimination of the various cell subsets, which yielded an accuracy >85%. The method successfully discriminated between untouched and unfixed purified populations of CD4+CD3+ and CD8+CD3+ T lymphocyte subsets, and CD56+CD3- natural killer cells, with a high degree of specificity. It also proved sensitive enough to identify unique Raman signatures that allow clear discrimination between dendritic cell subsets, comprising CD303+CD45+ plasmacytoid and CD1c+CD141+ myeloid dendritic cells. The results of this study clearly show that WMRS is highly sensitive and can distinguish between cell types that are morphologically identical.

  15. T Helper 17 Cells Interplay with CD4+CD25highFoxp3+ Tregs in Regulation of Inflammations and Autoimmune Diseases

    PubMed Central

    Mai, Jietang; Wang, Hong; Yang, Xiao-Feng

    2010-01-01

    Interleukin-17 (IL-17)-secreting T helper 17 cells (Th17) are a recently identified CD4+ T helper subset that has been implicated in various inflammatory and autoimmune diseases. Th17, along with CD4+CD25high Foxp3+ regulatory T cells (Tregs) and the other newly emergent T helper subsets Th9 and Tfh, have expanded the Th1-Th2 paradigm. Although this newly proposed six-subset paradigm has significantly improved our understanding of the differentiation of CD4+ T helper cell subsets and the regulation of T helper cells in inflammation and autoimmunity, many questions remain to be answered. In this overview, we briefly review the following issues: a) the old Th1-Th2 paradigm versus the new multi-subset paradigm; b) structural features of IL-17 family cytokines; c) Th17 cells; d) effects of IL-17 on various cell types and tissues; e) the IL-17 receptor and its signaling pathways; f) Th17-mediated inflammation; and g) protective mechanisms of IL-17 in infections. Lastly, we look into the interplay of Th17 cells and Tregs in autoimmune diseases and inflammation. Regulation of autoimmunity and inflammation lies in the interplay of the different T helper subsets; a better understanding of these subsets' interactions with one another would therefore greatly improve our approaches to developing therapies to combat inflammatory and autoimmune diseases. PMID:20515737

  16. EPA Facility Registry Service (FRS): ICIS

    EPA Pesticide Factsheets

    This web feature service contains location and facility identification information from EPA's Facility Registry Service (FRS) for the subset of facilities that link to the Integrated Compliance Information System (ICIS). When complete, ICIS will provide a database that will contain integrated enforcement and compliance information across most of EPA's programs. The vision for ICIS is to replace EPA's independent databases that contain enforcement data with a single repository for that information. Currently, ICIS contains all Federal Administrative and Judicial enforcement actions and a subset of the Permit Compliance System (PCS), which supports the National Pollutant Discharge Elimination System (NPDES). ICIS exchanges non-sensitive enforcement/compliance activities, non-sensitive formal enforcement actions and NPDES information with FRS. This web feature service contains the enforcement/compliance activities and formal enforcement action related facilities; the NPDES facilities are contained in the PCS_NPDES web feature service. FRS identifies and geospatially locates facilities, sites or places subject to environmental regulations or of environmental interest. Using vigorous verification and data management procedures, FRS integrates facility data from EPA's national program systems, other federal agencies, and State and tribal master facility records and provides EPA with a centrally managed, single source of comprehensive and authoritative information on facilities.

  17. Multimodal predictor of neurodevelopmental outcome in newborns with hypoxic-ischaemic encephalopathy.

    PubMed

    Temko, Andriy; Doyle, Orla; Murray, Deirdre; Lightbody, Gordon; Boylan, Geraldine; Marnane, William

    2015-08-01

    Automated multimodal prediction of outcome in newborns with hypoxic-ischaemic encephalopathy is investigated in this work. Routine clinical measures and 1 h EEG and ECG recordings 24 h after birth were obtained from 38 newborns with different grades of HIE. Each newborn was reassessed at 24 months to establish their neurodevelopmental outcome. A set of multimodal features is extracted from the clinical, heart rate and EEG measures and is fed into a support vector machine classifier. The performance is reported with the statistically most unbiased leave-one-patient-out performance assessment routine. A subset of informative features, whose rankings are consistent across all patients, is identified. The best performance is obtained using a subset of 9 EEG, 2 heart rate and 1 clinical feature, leading to an area under the ROC curve of 87% and an accuracy of 84%, which compares favourably to the EEG-based clinical outcome prediction previously reported on the same data. The work presents a promising step towards the use of multimodal data in building an objective decision support tool for clinical prediction of neurodevelopmental outcome in newborns with hypoxic-ischaemic encephalopathy. Copyright © 2015 Elsevier Ltd. All rights reserved.
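
    The leave-one-patient-out assessment mentioned above can be sketched as follows; with one feature vector per newborn, scikit-learn's LeaveOneOut behaves as leave-one-patient-out. Features, outcomes, and dimensions are synthetic stand-ins for the multimodal data.

        # Hedged sketch of leave-one-patient-out evaluation of an SVM classifier.
        import numpy as np
        from sklearn.model_selection import LeaveOneOut
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)
        X = rng.normal(size=(38, 12))        # 38 newborns x 12 multimodal features (synthetic)
        y = rng.integers(0, 2, size=38)      # neurodevelopmental outcome label (placeholder)

        correct = 0
        for train_idx, test_idx in LeaveOneOut().split(X):
            clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
            clf.fit(X[train_idx], y[train_idx])
            correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
        print(f"LOO accuracy: {correct / len(y):.2f}")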

  18. [Pathological features of myositis with myositis-specific autoantibodies].

    PubMed

    Shimizu, Jun; Mimori, Tsuneyo

    2014-01-01

    Myositis is a heterogeneous group of systemic autoimmune disorders characterized by inflammation of skeletal muscle. Historically, myositis has been defined using clinical features including muscle weakness, skin disease, internal organ involvement, and an association with cancer in adults. From a clinicopathologic approach, myositis has been classified into pathologically distinct subsets: polymyositis, dermatomyositis (DM), necrotizing autoimmune myositis, amyopathic DM, and non-specific myositis. Although the characteristic pathological changes are believed to be important in the pathological mechanisms of each subset of myositis, in clinical practice the percentage of patients with typical pathological findings is usually not high. On the other hand, with the recent discovery of new myositis-specific autoantibodies (MSAs), around 60% of patients with idiopathic inflammatory myopathies (IIMs) have been shown to have an MSA, including anti-synthetase, anti-Mi-2, anti-MDA5, anti-TIF1 and anti-SRP antibodies. Because of the striking association between unique MSAs and distinct clinical phenotypes, these antibodies are thought to be important not only for the classification of IIMs, but also as factors involved in the mechanisms underlying their pathogenesis. This review reports recent progress in the understanding of the pathological features of myositis with MSAs.

  19. Developing an A Priori Database for Passive Microwave Snow Water Retrievals Over Ocean

    NASA Astrophysics Data System (ADS)

    Yin, Mengtao; Liu, Guosheng

    2017-12-01

    A physically optimized a priori database is developed for Global Precipitation Measurement Microwave Imager (GMI) snow water retrievals over ocean. The initial snow water content profiles are derived from CloudSat Cloud Profiling Radar (CPR) measurements. A radiative transfer model in which the single-scattering properties of nonspherical snowflakes are based on discrete dipole approximation results is employed to simulate brightness temperatures and their gradients. Snow water content profiles are then optimized through a one-dimensional variational (1D-Var) method. After the 1D-Var optimization, the standard deviations of the difference between observed and simulated brightness temperatures are of similar magnitude to the observation errors defined for the observation error covariance matrix, indicating that the variational method is successful. This optimized database is applied in a Bayesian snow water retrieval algorithm. The retrieval results indicate that the 1D-Var approach has a positive impact on the GMI-retrieved snow water content profiles by improving the physical consistency between snow water content profiles and observed brightness temperatures. The global distribution of snow water contents retrieved from the a priori database is compared with CloudSat CPR estimates. The two estimates show a similar pattern of global distribution, and the difference between their global means is small. In addition, we investigate the impact of using physical parameters to subset the database on snow water retrievals. It is shown that using total precipitable water to subset the database with 1D-Var optimization is beneficial for snow water retrievals.
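
    The 1D-Var step minimizes a cost that balances departure from the a priori profile against misfit to the observed brightness temperatures. The sketch below assumes a linear observation operator and illustrative covariances; none of the values are GMI quantities.

        # Sketch of a 1D-Var update: J(x) = (x-xb)' B^-1 (x-xb) + (y-Hx)' R^-1 (y-Hx).
        import numpy as np
        from scipy.optimize import minimize

        n_state, n_obs = 5, 3
        rng = np.random.default_rng(2)
        H = rng.normal(size=(n_obs, n_state))   # linear observation operator (placeholder)
        B = np.eye(n_state) * 0.5               # background error covariance (illustrative)
        R = np.eye(n_obs) * 0.1                 # observation error covariance (illustrative)
        x_b = np.zeros(n_state)                 # a priori snow water profile (synthetic)
        y_obs = rng.normal(size=n_obs)          # observed brightness temperatures (synthetic)

        def cost(x):
            dx, dy = x - x_b, y_obs - H @ x
            return dx @ np.linalg.solve(B, dx) + dy @ np.linalg.solve(R, dy)

        x_opt = minimize(cost, x_b).x
        print("optimized state:", np.round(x_opt, 3))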

  20. Uncertainty Analysis via Failure Domain Characterization: Polynomial Requirement Functions

    NASA Technical Reports Server (NTRS)

    Crespo, Luis G.; Munoz, Cesar A.; Narkawicz, Anthony J.; Kenny, Sean P.; Giesy, Daniel P.

    2011-01-01

    This paper proposes an uncertainty analysis framework based on the characterization of the uncertain parameter space. This characterization enables the identification of worst-case uncertainty combinations and the approximation of the failure and safe domains with a high level of accuracy. Because these approximations are composed of subsets of readily computable probability, they enable the calculation of arbitrarily tight upper and lower bounds to the failure probability. A Bernstein expansion approach is used to size hyper-rectangular subsets, while a sum of squares programming approach is used to size quasi-ellipsoidal subsets. These methods are applicable to requirement functions whose functional dependency on the uncertainty is a known polynomial. Among the most prominent features of the methodology are the substantial desensitization of the calculations to the assumed uncertainty model (i.e., the probability distribution describing the uncertainty) as well as the accommodation of changes in such a model with a practically insignificant amount of computational effort.
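
    The phrase "subsets of readily computable probability" can be made concrete: under independent marginal distributions, the probability of a hyper-rectangle is a product of one-dimensional interval probabilities, so a rectangle contained in the safe domain immediately bounds the failure probability. The bounds and the standard normal marginals below are hypothetical.

        # Probability of a hyper-rectangle under independent standard normal marginals.
        from scipy.stats import norm

        # A safe-domain hyper-rectangle [a_i, b_i] in a 3-parameter uncertainty space (assumed).
        bounds = [(-1.0, 1.0), (-0.5, 2.0), (0.0, 1.5)]

        p_rect = 1.0
        for a, b in bounds:
            p_rect *= norm.cdf(b) - norm.cdf(a)   # product of interval probabilities

        # If the rectangle lies inside the safe domain, 1 - p_rect upper-bounds P(failure).
        print(f"P(rectangle) = {p_rect:.4f}, failure probability <= {1 - p_rect:.4f}")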

  1. Satellite Level 3 & 4 Data Subsetting at NASA GES DISC

    NASA Technical Reports Server (NTRS)

    Huwe, Paul; Su, Jian; Loeser, Carlee; Ostrenga, Dana; Rui, Hualan; Vollmer, Bruce

    2017-01-01

    Earth Science data are available in many file formats (NetCDF, HDF, GRB, etc.) and in a wide range of sizes, from kilobytes to gigabytes. These properties can pose a challenge to users who are not familiar with these formats or who only want a small region of interest (ROI) from a specific dataset. At NASA Goddard Earth Sciences Data and Information Services Center (GES DISC), we have developed and implemented a multipurpose subset service to ease user access to Earth Science data. Our Level 3 & 4 Regridder is capable of subsetting across multiple parameters (spatially, temporally, by level, and by variable) and offers additional beneficial features (temporal means, regridding to target grids, and file conversion to other data formats). In this presentation, we will demonstrate how users can use this service to better access only the data they need in the form they require.
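
    A minimal sketch of the kind of subsetting the service performs (variable, spatial, and temporal selection, plus temporal means) using xarray on a synthetic gridded dataset; the variable name and grid are placeholders, not a specific GES DISC product.

        # Variable, spatial, and temporal subsetting of a synthetic Level 3-style grid.
        import numpy as np
        import pandas as pd
        import xarray as xr

        time = pd.date_range("2017-01-01", periods=90)
        ds = xr.Dataset(
            {"precipitation": (("time", "lat", "lon"),
                               np.random.default_rng(0).random((90, 18, 36)))},
            coords={"time": time,
                    "lat": np.arange(-85, 90, 10),
                    "lon": np.arange(-175, 180, 10)},
        )

        subset = (
            ds["precipitation"]                               # variable subsetting
            .sel(lat=slice(20, 50), lon=slice(-130, -60))     # spatial region of interest
            .sel(time=slice("2017-01-01", "2017-03-31"))      # temporal subsetting
        )
        monthly_mean = subset.resample(time="1M").mean()      # temporal means
        monthly_mean.to_netcdf("subset_monthly.nc")           # write the reduced file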

  2. Satellite Level 3 & 4 Data Subsetting at NASA GES DISC

    NASA Astrophysics Data System (ADS)

    Huwe, P.; Su, J.; Loeser, C. F.; Ostrenga, D.; Rui, H.; Vollmer, B.

    2017-12-01

    Earth Science data are available in many file formats (NetCDF, HDF, GRB, etc.) and in a wide range of sizes, from kilobytes to gigabytes. These properties can pose a challenge to users who are not familiar with these formats or who only want a small region of interest (ROI) from a specific dataset. At NASA Goddard Earth Sciences Data and Information Services Center (GES DISC), we have developed and implemented a multipurpose subset service to ease user access to Earth Science data. Our Level 3 & 4 Regridder is capable of subsetting across multiple parameters (spatially, temporally, by level, and by variable) and offers additional beneficial features (temporal means, regridding to target grids, and file conversion to other data formats). In this presentation, we will demonstrate how users can use this service to better access only the data they need in the form they require.

  3. How Much Do We Know about Adult-onset Primary Tics? Prevalence, Epidemiology, and Clinical Features.

    PubMed

    Robakis, Daphne

    2017-01-01

    Tic disorders are generally considered to be of pediatric onset; however, reports of adult-onset tics exist in the literature. Tics can be categorized as either primary or secondary, with the latter being the larger group in adults. Primary or idiopathic tics that arise in adulthood make up a subset of tic disorders whose epidemiologic and clinical features have not been well delineated. Articles to be included in this review were identified by searching PubMed, SCOPUS, and Web of Science using the terms adult- and late-onset tics, which resulted in 120 unique articles. Duplicates were removed. Citing references were identified using Google Scholar; all references were reviewed for relevance. The epidemiologic characteristics, clinical phenomenology, and optimal treatment of adult-onset tics have not been ascertained. Twenty-six patients with adult-onset, primary tics were identified from prior case reports. The frequency of psychiatric comorbidities may be lower in adults than in children, and obsessive compulsive disorder was the most common comorbidity. Adult-onset primary tics tend to wax and wane, occur predominantly in males, are often both motor and phonic in the same individual, and are characterized by a poor response to treatment. We know little about adult-onset tic disorders, particularly ones without a secondary association or cause. They are not common, and from the limited data available, appear to share some but not all features with childhood tics. Further research will be important in gaining a better understanding of the epidemiology and clinical manifestations of this disorder.

  4. How Much Do We Know about Adult-onset Primary Tics? Prevalence, Epidemiology, and Clinical Features

    PubMed Central

    Robakis, Daphne

    2017-01-01

    Background Tic disorders are generally considered to be of pediatric onset; however, reports of adult-onset tics exist in the literature. Tics can be categorized as either primary or secondary, with the latter being the larger group in adults. Primary or idiopathic tics that arise in adulthood make up a subset of tic disorders whose epidemiologic and clinical features have not been well delineated. Methods Articles to be included in this review were identified by searching PubMed, SCOPUS, and Web of Science using the terms adult- and late-onset tics, which resulted in 120 unique articles. Duplicates were removed. Citing references were identified using Google Scholar; all references were reviewed for relevance. Results The epidemiologic characteristics, clinical phenomenology, and optimal treatment of adult-onset tics have not been ascertained. Twenty-six patients with adult-onset, primary tics were identified from prior case reports. The frequency of psychiatric comorbidities may be lower in adults than in children, and obsessive compulsive disorder was the most common comorbidity. Adult-onset primary tics tend to wax and wane, occur predominantly in males, are often both motor and phonic in the same individual, and are characterized by a poor response to treatment. Discussion We know little about adult-onset tic disorders, particularly ones without a secondary association or cause. They are not common, and from the limited data available, appear to share some but not all features with childhood tics. Further research will be important in gaining a better understanding of the epidemiology and clinical manifestations of this disorder. PMID:28546883

  5. Feature selection for elderly faller classification based on wearable sensors.

    PubMed

    Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D

    2017-05-30

    Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC, using one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature sets and improved faller classification compared to faller classification without feature selection. CFS and FCBF provided the best feature subset (one posterior pelvis accelerometer feature) for faller classification. However, better sensitivity was achieved by the second best model based on a Relief-F feature subset with three pressure-sensing insole features and seven head accelerometer features. Feature selection should be considered as an important step in faller classification using wearable sensors.
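
    A hedged sketch of filter-style feature selection ahead of faller classification. scikit-learn ships neither CFS, FCBF, nor Relief-F, so a mutual-information filter stands in for those algorithms; the gait features and labels are synthetic.

        # Filter feature selection followed by two of the classifiers named above.
        import numpy as np
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.svm import SVC

        rng = np.random.default_rng(3)
        X = rng.normal(size=(100, 180))     # 100 participants x 180 gait features (synthetic)
        y = rng.integers(0, 2, size=100)    # faller / non-faller label (placeholder)

        # 75:25 stratified holdout, mirroring the evaluation described above.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.25, stratify=y, random_state=0)

        selector = SelectKBest(mutual_info_classif, k=10).fit(X_tr, y_tr)
        for name, clf in [("SVM", SVC()), ("naive Bayes", GaussianNB())]:
            clf.fit(selector.transform(X_tr), y_tr)
            print(name, "accuracy:", clf.score(selector.transform(X_te), y_te))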

  6. EFS: an ensemble feature selection tool implemented as R-package and web-application.

    PubMed

    Neumann, Ursula; Genze, Nikita; Heider, Dominik

    2017-01-01

    Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies demonstrated that single feature selection methods can have specific biases, whereas an ensemble feature selection has the advantage of alleviating and compensating for these biases. The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs into a quantitative ensemble importance. Currently, eight different feature selection methods have been integrated in EFS, which can be used separately or combined in an ensemble. EFS identifies relevant features while compensating for the specific biases of single methods through its ensemble approach. Thereby, EFS can improve the prediction accuracy and interpretability in subsequent binary classification models. EFS can be downloaded as an R-package from CRAN or used via a web application at http://EFS.heiderlab.de.
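
    EFS itself is an R package, so the Python sketch below only mirrors the ensemble idea it describes: run several selectors, min-max normalize each score vector, and average them into one ensemble importance. The choice of the three selectors is illustrative.

        # Ensemble feature importance from normalized outputs of several selectors.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import f_classif, mutual_info_classif

        X, y = make_classification(n_samples=200, n_features=15, random_state=0)

        scores = [
            f_classif(X, y)[0],                              # ANOVA F statistic
            mutual_info_classif(X, y, random_state=0),       # mutual information
            RandomForestClassifier(random_state=0).fit(X, y).feature_importances_,
        ]

        def minmax(s):
            s = np.asarray(s, dtype=float)
            return (s - s.min()) / (s.max() - s.min())

        ensemble_importance = np.mean([minmax(s) for s in scores], axis=0)
        print("top features:", np.argsort(ensemble_importance)[::-1][:5])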

  7. RNA2D3D: a program for generating, viewing, and comparing 3-dimensional models of RNA.

    PubMed

    Martinez, Hugo M; Maizel, Jacob V; Shapiro, Bruce A

    2008-06-01

    Using primary and secondary structure information of an RNA molecule, the program RNA2D3D automatically and rapidly produces a first-order approximation of a 3-dimensional conformation consistent with this information. Applicable to structures of arbitrary branching complexity and pseudoknot content, it features efficient interactive graphical editing for the removal of any overlaps introduced by the initial generating procedure and for making conformational changes favorable to targeted features and subsequent refinement. With emphasis on fast exploration of alternative 3D conformations, one may interactively add or delete base-pairs, adjacent stems can be coaxially stacked or unstacked, single strands can be shaped to accommodate special constraints, and arbitrary subsets can be defined and manipulated as rigid bodies. Compaction, whereby base stacking within stems is optimally extended into connecting single strands, is also available as a means of strategically making the structures more compact and revealing folding motifs. Subsequent refinement of the first-order approximation, of modifications, and for the imposing of tertiary constraints is assisted with standard energy refinement techniques. Previously determined coordinates for any part of the molecule are readily incorporated, and any part of the modeled structure can be output as a PDB or XYZ file. Illustrative applications in the areas of ribozymes, viral kissing loops, viral internal ribosome entry sites, and nanobiology are presented.

  8. Application of recurrence quantification analysis to automatically estimate infant sleep states using a single channel of respiratory data.

    PubMed

    Terrill, Philip I; Wilson, Stephen J; Suresh, Sadasivam; Cooper, David M; Dakin, Carolyn

    2012-08-01

    Previous work has identified that non-linear variables calculated from respiratory data vary between sleep states, and that variables derived from the non-linear analytical tool recurrence quantification analysis (RQA) are accurate infant sleep state discriminators. This study aims to apply these discriminators to automatically classify 30 s epochs of infant sleep as REM, non-REM and wake. Polysomnograms were obtained from 25 healthy infants at 2 weeks, 3, 6 and 12 months of age, and manually sleep staged as wake, REM and non-REM. Inter-breath interval data were extracted from the respiratory inductive plethysmograph, and RQA applied to calculate radius, determinism and laminarity. Time-series statistic and spectral analysis variables were also calculated. A nested cross-validation method was used to identify the optimal feature subset, and to train and evaluate a linear discriminant analysis-based classifier. The RQA features radius and laminarity were reliably selected. Mean agreement was 79.7, 84.9, 84.0 and 79.2% at 2 weeks, 3, 6 and 12 months, and the classifier performed better than a comparison classifier not including RQA variables. The performance of this sleep-staging tool compares favourably with inter-human agreement rates, and improves upon previous systems using only respiratory data. Applications include diagnostic screening and population-based sleep research.
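
    A simplified recurrence quantification sketch in Python: embed the inter-breath-interval series, threshold pairwise distances into a recurrence matrix, and compute the recurrence rate and determinism (the fraction of recurrent points on diagonal lines of length >= l_min). The embedding dimension, radius, and test series are illustrative, and this is a didactic reduction of full RQA.

        # Simplified RQA: recurrence rate and determinism of a 1-D series.
        import numpy as np

        def rqa_measures(x, dim=3, delay=1, radius=0.5, l_min=2):
            n = len(x) - (dim - 1) * delay
            emb = np.column_stack([x[i * delay:i * delay + n] for i in range(dim)])
            rec = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2) < radius
            np.fill_diagonal(rec, False)           # exclude the line of identity
            rr = rec.mean()                        # recurrence rate

            diag_points = 0                        # recurrent points on diagonals >= l_min
            for k in range(-(n - 1), n):
                if k == 0:
                    continue
                run = 0
                for v in np.append(np.diagonal(rec, offset=k), False):
                    if v:
                        run += 1
                    else:
                        if run >= l_min:
                            diag_points += run
                        run = 0
            det = diag_points / max(rec.sum(), 1)  # determinism
            return rr, det

        # Synthetic inter-breath-interval series (sinusoid plus noise).
        ibi = np.sin(np.linspace(0, 20, 300)) + 0.05 * np.random.default_rng(4).normal(size=300)
        print("RR=%.3f DET=%.3f" % rqa_measures(ibi))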

  9. Optimizing tertiary storage organization and access for spatio-temporal datasets

    NASA Technical Reports Server (NTRS)

    Chen, Ling Tony; Rotem, Doron; Shoshani, Arie; Drach, Bob; Louis, Steve; Keating, Meridith

    1994-01-01

    We address in this paper data management techniques for efficiently retrieving requested subsets of large datasets stored on mass storage devices. This problem represents a major bottleneck that can negate the benefits of fast networks, because the time to access a subset from a large dataset stored on a mass storage system is much greater than the time to transmit that subset over a network. This paper focuses on very large spatial and temporal datasets generated by simulation programs in the area of climate modeling, but the techniques developed can be applied to other applications that deal with large multidimensional datasets. The main requirement we have addressed is the efficient access of subsets of information contained within much larger datasets, for the purpose of analysis and interactive visualization. We have developed data partitioning techniques that partition datasets into 'clusters' based on analysis of data access patterns and storage device characteristics. The goal is to minimize the number of clusters read from mass storage systems when subsets are requested. We emphasize in this paper proposed enhancements to current storage server protocols to permit control over physical placement of data on storage devices. We also discuss in some detail the aspects of the interface between the application programs and the mass storage system, as well as a workbench to help scientists to design the best reorganization of a dataset for anticipated access patterns.

  10. Optimizing the use of the thermal integrity system for evaluating auger-cast piles : final report.

    DOT National Transportation Integrated Search

    2016-07-01

    Auger-cast piles (sometimes called auger-cast-in-place, or ACIP, piles) are a subset of the larger : category of deep foundation elements known as bored piles. Although similar to drilled shafts at first : glance, ACIP piles differ in the constructio...

  11. A probabilistic Sperner's theorem, with applications to the problem of retrieving information from a data base

    NASA Technical Reports Server (NTRS)

    Baumert, L. D.; Mceliece, R. J.; Rodemich, E. R.; Rumsey, H., Jr.

    1978-01-01

    The design of an optimal merged keycode data base information retrieval system is detailed. A probability distribution of n-bit binary words that minimized false drops was developed for the case where the set of desired records was a subset of tagged records.

  12. Spatial Context Learning Survives Interference from Working Memory Load

    ERIC Educational Resources Information Center

    Vickery, Timothy J.; Sussman, Rachel S.; Jiang, Yuhong V.

    2010-01-01

    The human visual system is constantly confronted with an overwhelming amount of information, only a subset of which can be processed in complete detail. Attention and implicit learning are two important mechanisms that optimize vision. This study addressed the relationship between these two mechanisms. Specifically we asked, Is implicit learning…

  13. Selective principal component regression analysis of fluorescence hyperspectral image to assess aflatoxin contamination in corn

    USDA-ARS?s Scientific Manuscript database

    Selective principal component regression analysis (SPCR) uses a subset of the original image bands for principal component transformation and regression. For optimal band selection before the transformation, this paper used genetic algorithms (GA). In this case, the GA process used the regression co...

  14. CD21-/low B cells: A Snapshot of a Unique B Cell Subset in Health and Disease.

    PubMed

    Thorarinsdottir, K; Camponeschi, A; Gjertsson, I; Mårtensson, I-L

    2015-09-01

    B cells represent one of the cellular components of the immune system that protects the individual from invading pathogens. In response to the invader, these cells differentiate into plasma cells and produce large amounts of antibodies that bind to and eliminate the pathogen. A hallmark of autoimmune diseases is the production of autoantibodies, i.e. antibodies that recognize self. Those that are considered pathogenic can damage tissues and organs, either by direct binding or when deposited as immune complexes. For decades, B cells have been considered to play a major role in autoimmune diseases through antibody production. However, as pathogenic autoantibodies appear to derive mainly from T cell dependent responses, T cells have been the focus for many years. The successful treatment of patients with autoimmune diseases with either B cell depletion therapy (rituximab) or inhibition of B cell survival (belimumab) suggested that not only the autoantibodies but also other B cell features are important. This has caused a surge of interest in B cells and their biology, resulting in the identification of various subsets, e.g. regulatory B cells and several memory B cell subsets. Also, in other conditions such as chronic viral infections and primary immunodeficiency, several B cell subsets with unique characteristics have been identified. In this review, we will discuss one of these subsets, a subset that is expanded in conditions characterized by chronic immune stimulation. This B cell subset lacks, or expresses low, surface levels of the complement receptor 2 (CD21) and has therefore been termed CD21(-/low) B cells. © 2015 The Foundation for the Scandinavian Journal of Immunology.

  15. Identification of novel gammadelta T-cell subsets following bacterial infection in the absence of Vgamma1+ T cells: homeostatic control of gammadelta T-cell responses to pathogen infection by Vgamma1+ T cells.

    PubMed

    Newton, Darren J; Andrew, Elizabeth M; Dalton, Jane E; Mears, Rainy; Carding, Simon R

    2006-02-01

    Although gammadelta T cells are a common feature of many pathogen-induced immune responses, the factors that influence, promote, or regulate the response of individual gammadelta T-cell subsets to infection are unknown. Here we show that in the absence of Vgamma1+ T cells, novel subsets of gammadelta T cells, expressing T-cell receptor (TCR)-Vgamma chains that normally define TCRgammadelta+ dendritic epidermal T cells (DETCs) (Vgamma5+), intestinal intraepithelial lymphocytes (iIELs) (Vgamma7+), and lymphocytes associated with the vaginal epithelia (Vgamma6+), are recruited to the spleen in response to bacterial infection in TCR-Vgamma1-/- mice. By comparison of the phenotype and structure of TCR-Vgamma chains and/or -Vdelta chains expressed by these novel subsets with those of their epithelium-associated counterparts, the Vgamma6+ T cells elicited in infected Vgamma1-/- mice were shown to be identical to those found in the reproductive tract, from where they are presumably recruited in the absence of Vgamma1+ T cells. By contrast, Vgamma5+ and Vgamma7+ T cells found in infected Vgamma1-/- mice were distinct from Vgamma5+ DETCs and Vgamma7+ iIELs. Functional analyses of the novel gammadelta T-cell subsets identified in infected Vgamma1-/- mice showed that whereas the Vgamma5+ and Vgamma7+ subsets may compensate for the absence of Vgamma1+ T cells by producing similar cytokines, they do not possess cytocidal activity and they cannot replace the macrophage homeostasis function of Vgamma1+ T cells. Collectively, these findings identify novel subsets of gammadelta T cells, the recruitment and activity of which are under the control of Vgamma1+ T cells.

  16. Systems Level Analysis of Systemic Sclerosis Shows a Network of Immune and Profibrotic Pathways Connected with Genetic Polymorphisms

    PubMed Central

    Mahoney, J. Matthew; Taroni, Jaclyn; Martyanov, Viktor; Wood, Tammara A.; Greene, Casey S.; Pioli, Patricia A.; Hinchcliff, Monique E.; Whitfield, Michael L.

    2015-01-01

    Systemic sclerosis (SSc) is a rare systemic autoimmune disease characterized by skin and organ fibrosis. The pathogenesis of SSc and its progression are poorly understood. The SSc intrinsic gene expression subsets (inflammatory, fibroproliferative, normal-like, and limited) are observed in multiple clinical cohorts of patients with SSc. Analysis of longitudinal skin biopsies suggests that a patient's subset assignment is stable over 6–12 months. Genetically, SSc is multi-factorial with many genetic risk loci for SSc generally and for specific clinical manifestations. Here we identify the genes consistently associated with the intrinsic subsets across three independent cohorts, show the relationship between these genes using a gene-gene interaction network, and place the genetic risk loci in the context of the intrinsic subsets. To identify gene expression modules common to three independent datasets from three different clinical centers, we developed a consensus clustering procedure based on mutual information of partitions, an information theory concept, and performed a meta-analysis of these genome-wide gene expression datasets. We created a gene-gene interaction network of the conserved molecular features across the intrinsic subsets and analyzed their connections with SSc-associated genetic polymorphisms. The network is composed of distinct, but interconnected, components related to interferon activation, M2 macrophages, adaptive immunity, extracellular matrix remodeling, and cell proliferation. The network shows extensive connections between the inflammatory- and fibroproliferative-specific genes. The network also shows connections between these subset-specific genes and 30 SSc-associated polymorphic genes including STAT4, BLK, IRF7, NOTCH4, PLAUR, CSK, IRAK1, and several human leukocyte antigen (HLA) genes. Our analyses suggest that the gene expression changes underlying the SSc subsets may be long-lived, but mechanistically interconnected and related to a patient's underlying genetic risk. PMID:25569146

  17. Mucosal Barrier Depletion and Loss of Bacterial Diversity are Primary Abnormalities in Paediatric Ulcerative Colitis

    PubMed Central

    Alipour, Misagh; Zaidi, Deenaz; Valcheva, Rosica; Jovel, Juan; Martínez, Inés; Sergi, Consolato; Walter, Jens; Mason, Andrew L.; Wong, Gane Ka-Shu; Dieleman, Levinus A.; Carroll, Matthew W.; Huynh, Hien Q.

    2016-01-01

    Background and Aims: Ulcerative colitis [UC] is associated with colonic mucosa barrier defects and bacterial dysbiosis, but these features may simply be the result of inflammation. Therefore, we sought to assess whether these features are inherently abrogated in the terminal ileum [TI] of UC patients, where inflammation is absent. Methods: TI biopsies from paediatric inflammatory bowel disease [IBD] subsets [Crohn’s disease [CD; n = 13] and UC [n = 10

  18. Wrappers for Performance Enhancement and Oblivious Decision Graphs

    DTIC Science & Technology

    1995-09-01

    always select all relevant features. We test different search engines to search the space of feature subsets and introduce compound operators to speed...distinct instances from the original dataset appearing in the test set is thus 0.632m. The ε0i accuracy estimate is derived by using bootstrap sample...i for training and the rest of the instances for testing. Given a number b, the number of bootstrap samples, let ε0i be the accuracy estimate for

  19. A diagnosis model for early Tourette syndrome children based on brain structural network characteristics

    NASA Astrophysics Data System (ADS)

    Wen, Hongwei; Liu, Yue; Wang, Jieqiong; Zhang, Jishui; Peng, Yun; He, Huiguang

    2016-03-01

    Tourette syndrome (TS) is a childhood-onset neurobehavioral disorder characterized by the presence of multiple motor and vocal tics. Tic generation has been linked to disturbed networks of brain areas involved in the planning, controlling and execution of action. The aim of our work is to select the topological characteristics of structural networks that are most efficient for estimating classification models to identify early TS children. Here we employed diffusion tensor imaging (DTI) and deterministic tractography to construct the structural networks of 44 TS children and 48 age- and gender-matched healthy children. We calculated four different connection matrices (fiber number, mean FA and averaged fiber length weighted matrices, and binary matrices) and then applied graph theoretical methods to extract the regional nodal characteristics of the structural network. For each weighted or binary network, nodal degree, nodal efficiency and nodal betweenness were selected as features. The Support Vector Machine Recursive Feature Elimination (SVM-RFE) algorithm was used to estimate the best feature subset for classification. An accuracy of 88.26%, evaluated by nested cross validation, was achieved on combining the best feature subsets of each network characteristic. The identified discriminative brain nodes were mostly located in the basal ganglia and frontal cortico-cortical networks involved in TS children and were associated with tic severity. Our study holds promise for early identification and prediction of prognosis of TS children.
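
    The SVM-RFE ranking step can be sketched with scikit-learn's RFE wrapper around a linear SVM, as below; the nodal network features, labels, and subset size are synthetic placeholders.

        # SVM-RFE sketch: recursively drop the lowest-weighted features.
        import numpy as np
        from sklearn.feature_selection import RFE
        from sklearn.svm import SVC

        rng = np.random.default_rng(5)
        X = rng.normal(size=(92, 270))     # 92 children x 270 nodal network features (synthetic)
        y = rng.integers(0, 2, size=92)    # TS vs healthy control label (placeholder)

        # Eliminate 5 features per iteration until 20 remain.
        rfe = RFE(SVC(kernel="linear"), n_features_to_select=20, step=5).fit(X, y)
        print("selected feature indices:", np.where(rfe.support_)[0])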

  20. Histological features of localized scleroderma 'en coup de sabre': a study of 16 cases.

    PubMed

    Taniguchi, T; Asano, Y; Tamaki, Z; Akamata, K; Aozasa, N; Noda, S; Takahashi, T; Ichimura, Y; Toyama, T; Sugita, M; Sumida, H; Kuwano, Y; Miyazaki, M; Yanaba, K; Sato, S

    2014-12-01

    Early lesions of localized scleroderma are histologically characterized by perivascular lymphocytic infiltrate in the reticular dermis and swollen endothelial cells. However, there has been little information regarding histological features other than these findings in localized scleroderma. Since en coup de sabre (ECDS) is a distinct subset of localized scleroderma with a relatively uniform clinical manifestation, we focused on this disease subset and evaluated its histopathological features. A total of 16 patients with ECDS were retrospectively evaluated on the basis of clinical and histological findings. Regardless of clinical manifestations, vacuolar degeneration was found in all of the ECDS patients. Importantly, keratinocyte necroses were restricted to early and active ECDS lesions. In early ECDS patients (disease duration of <3 years), moderate to severe perivascular and/or periappendageal lymphocytic infiltrate and vacuolar changes in the follicular epithelium were more prominent, whereas epidermal atrophy was less frequently observed, than in late ECDS patients (disease duration of ≥6 years). Vacuolar degeneration at the dermoepidermal junction is a common histological feature in ECDS, and perivascular and/or periappendageal lymphocytic infiltrate and vacuolar degeneration of the follicular epithelium are characteristic especially of early ECDS, further supporting the canonical idea that the elimination of mutated epidermal cells by immune surveillance contributes to tissue damage and resultant fibrosis in localized scleroderma. © 2013 European Academy of Dermatology and Venereology.

  1. Data Mining Feature Subset Weighting and Selection Using Genetic Algorithms

    DTIC Science & Technology

    2002-03-01

    seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury...seed-discolor: absent,present,?. 33. seed-size: norm,lt-norm,?. 34. shriveling: absent,present,?. 35. roots: norm,rotted,galls-cysts

  2. Correlative feature analysis on FFDM

    PubMed Central

    Yuan, Yading; Giger, Maryellen L.; Li, Hui; Sennett, Charlene

    2008-01-01

    Identifying the corresponding images of a lesion in different views is an essential step in improving the diagnostic ability of both radiologists and computer-aided diagnosis (CAD) systems. Because of the nonrigidity of the breasts and the 2D projective property of mammograms, this task is not trivial. In this pilot study, we present a computerized framework that differentiates between corresponding images of the same lesion in different views and noncorresponding images, i.e., images of different lesions. A dual-stage segmentation method, which employs an initial radial gradient index (RGI) based segmentation and an active contour model, is applied to extract mass lesions from the surrounding parenchyma. Then various lesion features are automatically extracted from each of the two views of each lesion to quantify the characteristics of density, size, texture and the neighborhood of the lesion, as well as its distance to the nipple. A two-step scheme is employed to estimate the probability that the two lesion images from different mammographic views are of the same physical lesion. In the first step, a correspondence metric for each pairwise feature is estimated by a Bayesian artificial neural network (BANN). Then, these pairwise correspondence metrics are combined using another BANN to yield an overall probability of correspondence. Receiver operating characteristic (ROC) analysis was used to evaluate the performance of the individual features and the selected feature subset in the task of distinguishing corresponding pairs from noncorresponding pairs. Using a FFDM database with 123 corresponding image pairs and 82 noncorresponding pairs, the distance feature yielded an area under the ROC curve (AUC) of 0.81±0.02 with leave-one-out (by physical lesion) evaluation, and the feature metric subset, which included distance, gradient texture, and ROI-based correlation, yielded an AUC of 0.87±0.02. The improvement by using multiple feature metrics was statistically significant compared to single feature performance. PMID:19175108

  3. Formal Semantics and Implementation of BPMN 2.0 Inclusive Gateways

    NASA Astrophysics Data System (ADS)

    Christiansen, David Raymond; Carbone, Marco; Hildebrandt, Thomas

    We present the first direct formalization of the semantics of inclusive gateways as described in the Business Process Modeling Notation (BPMN) 2.0 Beta 1 specification. The formal semantics is given for a minimal subset of BPMN 2.0 containing just the inclusive and exclusive gateways and the start and stop events. By focusing on this subset we achieve a simple graph model that highlights the particular non-local features of the inclusive gateway semantics. We sketch two ways of implementing the semantics using algorithms based on incrementally updated data structures and also discuss distributed communication-based implementations of the two algorithms.

  4. Hyperspectral image classification based on local binary patterns and PCANet

    NASA Astrophysics Data System (ADS)

    Yang, Huizhen; Gao, Feng; Dong, Junyu; Yang, Yang

    2018-04-01

    Hyperspectral image classification has been well acknowledged as one of the challenging tasks of hyperspectral data processing. In this paper, we propose a novel hyperspectral image classification framework based on local binary pattern (LBP) features and PCANet. In the proposed method, linear prediction error (LPE) is first employed to select a subset of informative bands, and LBP is utilized to extract texture features. Then, spectral and texture features are stacked into high-dimensional vectors. Next, the extracted features of a specified position are transformed into a 2-D image. The obtained images of all pixels are fed into PCANet for classification. Experimental results on a real hyperspectral dataset demonstrate the effectiveness of the proposed method.
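
    The LBP texture step can be sketched with scikit-image, as below, on a single (synthetic) band; in the full framework the resulting texture features would be stacked with the spectral vectors before PCANet. The parameters P, R, and the histogram binning are illustrative.

        # LBP texture features from one band of a synthetic hyperspectral cube.
        import numpy as np
        from skimage.feature import local_binary_pattern

        rng = np.random.default_rng(6)
        band = (rng.random((64, 64)) * 255).astype(np.uint8)   # one informative band (synthetic)

        # "uniform" LBP with 8 neighbors at radius 1 yields codes 0..9.
        lbp = local_binary_pattern(band, P=8, R=1, method="uniform")
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        print("LBP histogram:", np.round(hist, 3))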

  5. SU-D-BRA-07: A Phantom Study to Assess the Variability in Radiomics Features Extracted From Cone-Beam CT Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fave, X; Fried, D; UT Health Science Center Graduate School of Biomedical Sciences, Houston, TX

    2015-06-15

    Purpose: Several studies have demonstrated the prognostic potential for texture features extracted from CT images of non-small cell lung cancer (NSCLC) patients. The purpose of this study was to determine if these features could be extracted with high reproducibility from cone-beam CT (CBCT) images in order for features to be easily tracked throughout a patient’s treatment. Methods: Two materials in a radiomics phantom, designed to approximate NSCLC tumor texture, were used to assess the reproducibility of 26 features. This phantom was imaged on 9 CBCT scanners, including Elekta and Varian machines. Thoracic and head imaging protocols were acquired on each machine. CBCT images from 27 NSCLC patients imaged using the thoracic protocol on Varian machines were obtained for comparison. The variance for each texture measured from these patients was compared to the variance in phantom values for different manufacturer/protocol subsets. Levene’s test was used to identify features which had a significantly smaller variance in the phantom scans versus the patient data. Results: Approximately half of the features (13/26 for material1 and 15/26 for material2) had a significantly smaller variance (p<0.05) between Varian thoracic scans of the phantom compared to patient scans. Many of these same features remained significant for the head scans on Varian (12/26 and 8/26). However, when thoracic scans from Elekta and Varian were combined, only a few features were still significant (4/26 and 5/26). Three features (skewness, coarsely filtered mean and standard deviation) were significant in almost all manufacturer/protocol subsets. Conclusion: Texture features extracted from CBCT images of a radiomics phantom are reproducible and show significantly less variation than the same features measured from patient images when images from the same manufacturer or with similar parameters are used. Reproducibility between CBCT scanners may be high enough to allow the extraction of meaningful texture values for patients. This project was funded in part by the Cancer Prevention Research Institute of Texas (CPRIT). Xenia Fave is a recipient of the American Association of Physicists in Medicine Graduate Fellowship.
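
    The variance comparison reported above rests on Levene's test, which is available in SciPy; the phantom and patient feature values below are synthetic stand-ins chosen only to show the call.

        # Levene's test for equality of variances between phantom and patient values.
        import numpy as np
        from scipy.stats import levene

        rng = np.random.default_rng(7)
        phantom_skewness = rng.normal(0.0, 0.05, size=9)    # 9 CBCT scanners (synthetic)
        patient_skewness = rng.normal(0.0, 0.30, size=27)   # 27 NSCLC patients (synthetic)

        stat, p = levene(phantom_skewness, patient_skewness)
        print(f"Levene W={stat:.2f}, p={p:.4f}")  # small p: the two variances differ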

  6. Neuromorphic VLSI Models of Selective Attention: From Single Chip Vision Sensors to Multi-chip Systems

    PubMed Central

    Indiveri, Giacomo

    2008-01-01

    Biological organisms perform complex selective attention operations continuously and effortlessly. These operations allow them to quickly determine the motor actions to take in response to combinations of external stimuli and internal states, and to pay attention to subsets of sensory inputs, suppressing non-salient ones. Selective attention strategies are extremely effective in both natural and artificial systems which have to cope with large amounts of input data and have limited computational resources. One of the main computational primitives used to perform these selection operations is the Winner-Take-All (WTA) network. These types of networks are formed by arrays of coupled computational nodes that selectively amplify the strongest input signals and suppress the weaker ones. Neuromorphic circuits are an optimal medium for constructing WTA networks and for implementing efficient hardware models of selective attention systems. In this paper we present an overview of selective attention systems based on neuromorphic WTA circuits, ranging from single-chip vision sensors for selecting and tracking the position of salient features to multi-chip systems implementing saliency-map-based models of selective attention. PMID:27873818

  7. Neuromorphic VLSI Models of Selective Attention: From Single Chip Vision Sensors to Multi-chip Systems.

    PubMed

    Indiveri, Giacomo

    2008-09-03

    Biological organisms perform complex selective attention operations continuously and effortlessly. These operations allow them to quickly determine the motor actions to take in response to combinations of external stimuli and internal states, and to pay attention to subsets of sensory inputs, suppressing non-salient ones. Selective attention strategies are extremely effective in both natural and artificial systems which have to cope with large amounts of input data and have limited computational resources. One of the main computational primitives used to perform these selection operations is the Winner-Take-All (WTA) network. These types of networks are formed by arrays of coupled computational nodes that selectively amplify the strongest input signals and suppress the weaker ones. Neuromorphic circuits are an optimal medium for constructing WTA networks and for implementing efficient hardware models of selective attention systems. In this paper we present an overview of selective attention systems based on neuromorphic WTA circuits, ranging from single-chip vision sensors for selecting and tracking the position of salient features to multi-chip systems implementing saliency-map-based models of selective attention.
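
    A minimal numerical sketch of the Winner-Take-All primitive: units receive external input, excite themselves, and share a global inhibitory signal, so the strongest input is amplified while the others are suppressed. The rate-based dynamics and parameters are illustrative, not a model of any specific VLSI circuit.

        # Rate-based WTA dynamics with self-excitation and shared global inhibition.
        import numpy as np

        def wta(inputs, steps=200, dt=0.05, self_exc=1.2, inhibition=1.0):
            a = np.zeros_like(inputs, dtype=float)
            for _ in range(steps):
                global_inh = inhibition * a.sum()              # shared inhibitory signal
                da = -a + np.maximum(inputs + self_exc * a - global_inh, 0.0)
                a += dt * da                                   # Euler integration step
            return a

        print(np.round(wta(np.array([0.9, 1.0, 0.8])), 3))     # only the strongest survives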

  8. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.

    PubMed

    Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H

    2017-03-01

    For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, with a support vector machine classifier. The method was applied to four microarray datasets, and the performance was assessed by leave-one-out cross-validation. Comparative performance assessment of the proposed method with other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting fewer genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type. Copyright © 2017. Published by Elsevier Inc.
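
    A greedy MRMR-style sketch of the first stage: relevance is the mutual information between a gene and the class label, redundancy is the average mutual information with already-selected genes, and the difference criterion picks the next gene. The discretized expression matrix is synthetic.

        # Greedy minimum-redundancy maximum-relevance (mRMR) selection sketch.
        import numpy as np
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.metrics import mutual_info_score

        rng = np.random.default_rng(8)
        X = rng.integers(0, 3, size=(80, 50))   # 80 samples x 50 discretized genes (synthetic)
        y = rng.integers(0, 2, size=80)         # cancer class label (placeholder)

        relevance = mutual_info_classif(X, y, discrete_features=True, random_state=0)
        selected = [int(np.argmax(relevance))]
        while len(selected) < 10:
            best, best_score = None, -np.inf
            for j in range(X.shape[1]):
                if j in selected:
                    continue
                redundancy = np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
                score = relevance[j] - redundancy    # mRMR difference criterion
                if score > best_score:
                    best, best_score = j, score
            selected.append(best)
        print("selected genes:", selected)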

  9. Analyzing the "CareGap": assessing gaps in adherence to clinical guidelines in adult soft tissue sarcoma.

    PubMed

    Waks, Zeev; Goldbraich, Esther; Farkash, Ariel; Torresani, Michele; Bertulli, Rossella; Restifo, Nicola; Locatelli, Paolo; Casali, Paolo; Carmeli, Boaz

    2013-01-01

    Clinical decision support systems (CDSSs) are gaining popularity as tools that assist physicians in optimizing medical care. These systems typically comply with evidence-based medicine and are designed with input from domain experts. Nonetheless, deviations from CDSS recommendations are abundant across a broad spectrum of disorders, raising the question as to why this phenomenon exists. Here, we analyze this gap in adherence to a clinical guidelines-based CDSS by examining the physician treatment decisions for 1329 adult soft tissue sarcoma patients in northern Italy using patient-specific parameters. Dubbing this analysis "CareGap", we find that deviations correlate strongly with certain disease features such as local versus metastatic clinical presentation. We also notice that deviations from the guideline-based CDSS suggestions occur more frequently for patients with shorter survival time. Such observations can direct physicians' attention to distinct patient cohorts that are prone to higher deviation levels from clinical practice guidelines. This illustrates the value of CareGap analysis in assessing quality of care for subsets of patients within a larger pathology.

  10. Query engine optimization for the EHR4CR protocol feasibility scenario.

    PubMed

    Soto-Rey, Iñaki; Bache, Richard; Dugas, Martin; Fritz, Fleur

    2013-01-01

    An essential step when recruiting patients for a Clinical Trial (CT) is to determine the number of patients that satisfy the Eligibility Criteria (ECs) for that trial. An innovative feature of the Electronic Health Records for Clinical Research (EHR4CR) platform is that when automatically determining patient counts, it also allows the user to view counts for subsets of the ECs. This is helpful because some combinations of ECs may be so restrictive that they yield very few or zero patients. If we wanted to show all possible combinations of ECs, the number of queries we would have to execute would be 2^n, where n is the total number of ECs. Assuming that an average study has between 20 and 30 ECs, the program would have to execute between 2^20 (1,048,576) and 2^30 (1,073,741,824) queries. This is not only computationally expensive but also impractical to visualise. The purpose of our research is to reduce the possible combinations to a manageable number.

  11. Targeting Super-Enhancers for Disease Treatment and Diagnosis.

    PubMed

    Shin, Ha Youn

    2018-05-10

    The transcriptional regulation of genes determines the fate of animal cell differentiation and subsequent organ development. With the recent progress in genome-wide technologies, the genomic landscapes of enhancers have been broadly explored in mammalian genomes, which led to the discovery of novel specific subsets of enhancers, termed super-enhancers. Super-enhancers are large clusters of enhancers covering long regions of regulatory DNA and are densely occupied by transcription factors, active histone marks, and co-activators. Accumulating evidence points to the critical role that super-enhancers play in cell type-specific development and differentiation, as well as in the development of various diseases. Here, I provide a comprehensive description of the optimal approach for identifying functional units of super-enhancers and their unique chromatin features in normal development and in diseases, including cancers. I also review recent updated knowledge on novel approaches to targeting super-enhancers for the treatment of specific diseases, such as small-molecule inhibitors and potential gene therapy. This review will provide perspectives on using super-enhancers as biomarkers to develop novel disease diagnostic tools and establish new directions in clinical therapeutic strategies.

  12. Identifying the Presence of Prostate Cancer in Individuals with PSA Levels <20 ng ml-1 Using Computational Data Extraction Analysis of High Dimensional Peripheral Blood Flow Cytometric Phenotyping Data.

    PubMed

    Cosma, Georgina; McArdle, Stéphanie E; Reeder, Stephen; Foulds, Gemma A; Hood, Simon; Khan, Masood; Pockley, A Graham

    2017-01-01

    Determining whether an asymptomatic individual with Prostate-Specific Antigen (PSA) levels below 20 ng ml-1 has prostate cancer in the absence of definitive, biopsy-based evidence continues to present a significant challenge to clinicians who must decide whether such individuals have prostate cancer. Herein, we present an advanced computational data extraction approach which can identify the presence of prostate cancer in men with PSA levels <20 ng ml-1 on the basis of peripheral blood immune cell profiles that have been generated using multi-parameter flow cytometry. Statistical analysis of immune phenotyping datasets relating to the presence and prevalence of key leukocyte populations in the peripheral blood, as generated from individuals undergoing routine tests for prostate cancer (including tissue biopsy) using multi-parametric flow cytometric analysis, was unable to identify significant relationships between leukocyte population profiles and the presence of benign disease (no prostate cancer) or prostate cancer. By contrast, a Genetic Algorithm computational approach identified a subset of five flow cytometry features (CD8+CD45RA-CD27-CD28- (CD8+ Effector Memory cells); CD4+CD45RA-CD27-CD28- (CD4+ Terminally Differentiated Effector Memory Cells re-expressing CD45RA); CD3-CD19+ (B cells); CD3+CD56+CD8+CD4+ (NKT cells)) from a set of twenty features, which could potentially discriminate between benign disease and prostate cancer. These features were used to construct a prostate cancer prediction model using the k-Nearest-Neighbor classification algorithm. The proposed model, which takes as input the set of flow cytometry features, outperformed the predictive model which takes PSA values as input. Specifically, the flow cytometry-based model achieved Accuracy = 83.33%, AUC = 83.40%, and optimal ROC points of FPR = 16.13%, TPR = 82.93%, whereas the PSA-based model achieved Accuracy = 77.78%, AUC = 76.95%, and optimal ROC points of FPR = 29.03%, TPR = 82.93%. Combining PSA and flow cytometry predictors achieved Accuracy = 79.17%, AUC = 78.17% and optimal ROC points of FPR = 29.03%, TPR = 85.37%. The results demonstrate the value of computational intelligence-based approaches for interrogating immunophenotyping datasets and show that combining peripheral blood phenotypic profiling with PSA levels improves diagnostic accuracy compared to using the PSA test alone. These studies also demonstrate that the presence of cancer is reflected in changes in the peripheral blood immune phenotype profile which can be identified using computational analysis and interpretation of complex flow cytometry datasets.
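
    The final classification step can be sketched as a k-Nearest-Neighbor model over a handful of selected features, as below; the flow cytometry values are synthetic, and the genetic-algorithm selection stage is assumed to have already chosen the five columns.

        # k-NN classification over a small set of pre-selected features.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(9)
        X = rng.normal(size=(72, 5))        # 72 men x 5 GA-selected features (synthetic)
        y = rng.integers(0, 2, size=72)     # benign vs prostate cancer label (placeholder)

        knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
        print("CV accuracy:", cross_val_score(knn, X, y, cv=5).mean().round(2))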

  13. Computational Prediction of Protein Epsilon Lysine Acetylation Sites Based on a Feature Selection Method.

    PubMed

    Gao, JianZhao; Tao, Xue-Wen; Zhao, Jia; Feng, Yuan-Ming; Cai, Yu-Dong; Zhang, Ning

    2017-01-01

    Lysine acetylation, as one type of post-translational modification (PTM), plays key roles in cellular regulation and can be involved in a variety of human diseases. However, it is often high-cost and time-consuming to use traditional experimental approaches to identify lysine acetylation sites. Therefore, effective computational methods should be developed to predict the acetylation sites. In this study, we developed a position-specific method for epsilon lysine acetylation site prediction. Sequences of acetylated proteins were retrieved from the UniProt database. Various kinds of features, such as the position specific scoring matrix (PSSM), amino acid factors (AAF), and disorder, were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed. Finally, 319 optimal features were selected from a total of 541 features. Using the 319 optimal features to encode peptides, a predictor was constructed based on dagging. As a result, an accuracy of 69.56% with an MCC of 0.2792 was achieved. We analyzed the optimal features, which suggested some important factors determining the lysine acetylation sites. In summary, we developed a position-specific method for epsilon lysine acetylation site prediction and selected a set of optimal features; analysis of these features provided insights into the mechanism of lysine acetylation sites, providing guidance for experimental validation. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
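
    The mRMR-plus-IFS procedure can be outlined as: rank the features, then grow the subset one rank at a time and keep the size with the best cross-validated accuracy. In the sketch below, a mutual-information ranking and a decision tree stand in for the study's mRMR ranking and dagging ensemble; data are synthetic.

        # Incremental Feature Selection (IFS) over a precomputed feature ranking.
        import numpy as np
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(10)
        X = rng.normal(size=(150, 40))      # 150 peptides x 40 encoded features (synthetic)
        y = rng.integers(0, 2, size=150)    # acetylated vs not (placeholder)

        ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]
        best_k, best_acc = 1, 0.0
        for k in range(1, len(ranking) + 1):
            acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                                  X[:, ranking[:k]], y, cv=5).mean()
            if acc > best_acc:
                best_k, best_acc = k, acc
        print(f"optimal subset size: {best_k} (CV accuracy {best_acc:.2f})")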

  14. Sensitivity Analysis for Atmospheric Infrared Sounder (AIRS) CO2 Retrieval

    NASA Technical Reports Server (NTRS)

    Gat, Ilana

    2012-01-01

    The Atmospheric Infrared Sounder (AIRS) is a thermal infrared sensor able to retrieve the daily atmospheric state globally for clear as well as partially cloudy fields-of-view. The AIRS spectrometer has 2378 channels sensing from 15.4 micrometers to 3.7 micrometers, of which a small subset in the 15 micrometers region has been selected, to date, for CO2 retrieval. To improve upon the current retrieval method, we extended the retrieval calculations to include a prior estimate component and developed a channel ranking system to optimize the channels and the number of channels used. The channel ranking system uses a mathematical formalism to rapidly process and assess the retrieval potential of large numbers of channels. Implementing this system, we identified a larger optimized subset of AIRS channels that can decrease retrieval errors and minimize the overall sensitivity to other interfering contributors, such as water vapor, ozone, and atmospheric temperature. This methodology selects channels globally by accounting for the latitudinal, longitudinal, and seasonal dependencies of the subset. The new methodology increases accuracy in AIRS CO2 as well as other retrievals and enables the extension of retrieved CO2 vertical profiles to altitudes ranging from the lower troposphere to the upper stratosphere. The extended method estimates CO2 vertical profiles using a maximum-likelihood estimation method. We use model data to demonstrate the beneficial impact of the extended retrieval method using the new channel ranking system on CO2 retrieval.

  15. Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds.

    PubMed

    Mounet, Nicolas; Gibertini, Marco; Schwaller, Philippe; Campi, Davide; Merkys, Andrius; Marrazzo, Antimo; Sohier, Thibault; Castelli, Ivano Eligio; Cepellotti, Andrea; Pizzi, Giovanni; Marzari, Nicola

    2018-03-01

    Two-dimensional (2D) materials have emerged as promising candidates for next-generation electronic and optoelectronic applications. Yet, only a few dozen 2D materials have been successfully synthesized or exfoliated. Here, we search for 2D materials that can be easily exfoliated from their parent compounds. Starting from 108,423 unique, experimentally known 3D compounds, we identify a subset of 5,619 compounds that appear layered according to robust geometric and bonding criteria. High-throughput calculations using van der Waals density functional theory, validated against experimental structural data and calculated random phase approximation binding energies, further allowed the identification of 1,825 compounds that are either easily or potentially exfoliable. In particular, the subset of 1,036 easily exfoliable cases provides novel structural prototypes and simple ternary compounds as well as a large portfolio of materials to search from for optimal properties. For a subset of 258 compounds, we explore vibrational, electronic, magnetic and topological properties, identifying 56 ferromagnetic and antiferromagnetic systems, including half-metals and half-semiconductors.

  16. Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds

    NASA Astrophysics Data System (ADS)

    Mounet, Nicolas; Gibertini, Marco; Schwaller, Philippe; Campi, Davide; Merkys, Andrius; Marrazzo, Antimo; Sohier, Thibault; Castelli, Ivano Eligio; Cepellotti, Andrea; Pizzi, Giovanni; Marzari, Nicola

    2018-02-01

    Two-dimensional (2D) materials have emerged as promising candidates for next-generation electronic and optoelectronic applications. Yet, only a few dozen 2D materials have been successfully synthesized or exfoliated. Here, we search for 2D materials that can be easily exfoliated from their parent compounds. Starting from 108,423 unique, experimentally known 3D compounds, we identify a subset of 5,619 compounds that appear layered according to robust geometric and bonding criteria. High-throughput calculations using van der Waals density functional theory, validated against experimental structural data and calculated random phase approximation binding energies, further allowed the identification of 1,825 compounds that are either easily or potentially exfoliable. In particular, the subset of 1,036 easily exfoliable cases provides novel structural prototypes and simple ternary compounds as well as a large portfolio of materials in which to search for optimal properties. For a subset of 258 compounds, we explore vibrational, electronic, magnetic and topological properties, identifying 56 ferromagnetic and antiferromagnetic systems, including half-metals and half-semiconductors.

  17. Innate lymphoid cells in graft-versus-host disease.

    PubMed

    Konya, V; Mjösberg, J

    2015-11-01

    Innate lymphoid cells (ILC) are lymphocytes lacking rearranged antigen receptors such as those expressed by T and B cells. ILC are important effector and regulatory cells of the innate immune system, controlling lymphoid organogenesis, tissue inflammation, and homeostasis. The family of ILC consists of cytotoxic NK cells and the more recently described noncytotoxic group 1, 2, and 3 ILC. The classification of noncytotoxic ILC-in many aspects-mirrors that of T helper cells, which is based on the expression of master transcription factors and signature cytokines specific for each subset. The IL-22 producing RORγt(+) ILC3 subset was recently found to be critical in the prevention of intestinal graft-versus-host disease (GVHD) following allogeneic hematopoietic cell transplantation (HCT) via strengthening the intestinal mucosal barrier. In this review, we summarize the current view of the immunological functions of human noncytotoxic ILC subsets and discuss the potentially beneficial features of IL-22 producing ILC3 in improving allo-HCT efficacy by attenuating susceptibility to GVHD. In addition, we explore the possibility of other ILC subsets playing a role in GVHD. © 2015 The Authors. American Journal of Transplantation published by Wiley Periodicals, Inc. on behalf of American Society of Transplant Surgeons.

  18. A Single HIV-1 Cluster and a Skewed Immune Homeostasis Drive the Early Spread of HIV among Resting CD4+ Cell Subsets within One Month Post-Infection

    PubMed Central

    Avettand-Fenoël, Véronique; Nembot, Georges; Mélard, Adeline; Blanc, Catherine; Lascoux-Combe, Caroline; Slama, Laurence; Allegre, Thierry; Allavena, Clotilde; Yazdanpanah, Yazdan; Duvivier, Claudine; Katlama, Christine; Goujard, Cécile; Seksik, Bao Chau Phung; Leplatois, Anne; Molina, Jean-Michel; Meyer, Laurence; Autran, Brigitte; Rouzioux, Christine

    2013-01-01

    Optimizing therapeutic strategies for an HIV cure requires a better understanding of the characteristics of early HIV-1 spread among resting CD4+ cells within the first month of primary HIV-1 infection (PHI). We studied the immune distribution, diversity, and inducibility of total HIV-DNA among the following cell subsets: monocytes, peripheral blood activated and resting CD4 T cells, and long-lived (naive [TN] and central-memory [TCM]) and short-lived (transitional-memory [TTM] and effector-memory [TEM]) resting CD4+ T cells from 12 acutely infected individuals recruited at a median of 36 days from infection. Cells were sorted for total HIV-DNA quantification, phylogenetic analysis and inducibility, all studied in relation to activation status and cell signaling. One month post-infection, a single CCR5-restricted viral cluster was massively distributed in all resting CD4+ subsets in 88% of subjects, while one subject showed slight diversity. High levels of total HIV-DNA were measured among TN cells (median 3.4 log copies/million cells), although 10-fold less (p = 0.0005) than in equally infected TCM (4.5), TTM (4.7) and TEM (4.6) cells. CD3−CD4+ monocytes harbored a low viral burden (median 2.3 log copies/million cells), unlike equally infected resting and activated CD4+ T cells (4.5 log copies/million cells). The skewed distribution of resting CD4 subsets influenced their contribution to the pool of resting infected CD4+ T cells, two thirds of which consisted of the short-lived TTM and TEM subsets, whereas the long-lived TN and TCM subsets contributed the balance. Each resting CD4 subset produced HIV in vitro after stimulation with anti-CD3/anti-CD28+IL-2, with kinetics and magnitude varying according to subset differentiation, while IL-7 preferentially induced virus production from long-lived resting TN cells. In conclusion, within a month of infection, a clonal HIV-1 cluster is massively distributed among resting CD4 T-cell subsets with flexible inducibility, suggesting that subset activation and a skewed immune homeostasis determine the conditions of viral dissemination and the early establishment of the HIV reservoir. PMID:23691172

  19. Multiscale Feature Analysis of Salivary Gland Branching Morphogenesis

    PubMed Central

    Baydil, Banu; Daley, William P.; Larsen, Melinda; Yener, Bülent

    2012-01-01

    Pattern formation in developing tissues involves dynamic spatio-temporal changes in cellular organization and the subsequent evolution of functional adult structures. Branching morphogenesis is a developmental mechanism by which patterns are generated in many developing organs, and it is controlled by underlying molecular pathways. Understanding the relationship between molecular signaling, cellular behavior, and the resulting morphological change requires quantification and categorization of the cellular behavior. In this study, tissue-level and cellular changes in the developing salivary gland in response to disruption of ROCK-mediated signaling are modeled by building cell-graphs to compute mathematical features capturing structural properties at multiple scales. These features were used to generate multiscale cell-graph signatures of untreated and ROCK-signaling-disrupted salivary gland organ explants. From confocal images of mouse submandibular salivary gland organ explants in which epithelial and mesenchymal nuclei were marked, a multiscale feature set capturing global structural, local structural, spectral, and morphological properties of the tissues was derived. Six feature selection algorithms and multiway modeling of the data were applied to identify distinct subsets of cell-graph features that can uniquely classify and differentiate between different cell populations. Multiscale cell-graph analysis was most effective in classifying the tissue state. Cellular and tissue organization, as defined by a multiscale subset of cell-graph features, are both quantitatively distinct in epithelial and mesenchymal cell types, both in the presence and absence of ROCK inhibitors. Whereas tensor analysis demonstrated that epithelial tissue was affected the most by inhibition of ROCK signaling, significant multiscale changes in mesenchymal tissue organization were identified with this analysis that were not identified in previous biological studies. Here we show how to define and calculate a multiscale feature set as an effective computational approach to identify and quantify changes at multiple biological scales and to distinguish between different states in developing tissues. PMID:22403724
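
    To make the cell-graph idea concrete, the following sketch (with hypothetical nuclei coordinates standing in for segmented confocal data) links nuclei that lie within a distance threshold and recomputes global and spectral graph features at several thresholds, which is one simple way to obtain a multiscale signature.

      import numpy as np
      import networkx as nx

      rng = np.random.default_rng(0)
      nuclei = rng.uniform(0, 100, size=(80, 2))   # hypothetical nuclei centroids (um)

      # Link nuclei closer than a threshold; the threshold sets the analysis scale.
      def cell_graph(points, radius):
          g = nx.Graph()
          g.add_nodes_from(range(len(points)))
          for i in range(len(points)):
              for j in range(i + 1, len(points)):
                  if np.linalg.norm(points[i] - points[j]) < radius:
                      g.add_edge(i, j)
          return g

      # Multiscale features: recompute global/spectral properties at several radii.
      for radius in (5.0, 10.0, 20.0):
          g = cell_graph(nuclei, radius)
          degrees = [d for _, d in g.degree()]
          spectrum = nx.laplacian_spectrum(g)      # spectral descriptors
          print(radius, np.mean(degrees), nx.average_clustering(g),
                round(float(np.sort(spectrum)[1]), 4))  # algebraic connectivity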

  20. Optimization of Self-Directed Target Coverage in Wireless Multimedia Sensor Network

    PubMed Central

    Yang, Yang; Wang, Yufei; Pi, Dechang; Wang, Ruchuan

    2014-01-01

    Video and image sensors in wireless multimedia sensor networks (WMSNs) have a directed view and a limited sensing angle, so methods that solve the target coverage problem for traditional sensor networks, which use a circular sensing model, are not suitable for WMSNs. Based on the proposed FoV (field of view) sensing model and FoV disk model, how well a multimedia sensor is expected to cover a target is defined by the deflection angle between the target and the sensor's current orientation and by the distance between the target and the sensor. Target coverage optimization algorithms based on this expected coverage value are then presented separately for the single-sensor single-target, multisensor single-target, and single-sensor multitarget problems. For the multisensor multitarget problem, which is NP-complete, candidate orientations are generated by selecting, for each sensor, the orientations to which it can rotate to cover every target falling within its FoV disk, and a genetic algorithm is used to obtain an approximately minimum subset of sensors that covers all targets in the network. Simulation results show the algorithm's performance and the effect of the number of targets on the resulting subset. PMID:25136667
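
    For the NP-complete multisensor multitarget case, a bitstring genetic algorithm is a natural sketch. The code below is illustrative only: the coverage matrix is randomly generated rather than derived from FoV geometry, and the fitness simply rewards covered targets first and fewer active sensors second.

      import numpy as np

      rng = np.random.default_rng(1)
      n_sensors, n_targets = 20, 12
      # Hypothetical coverage matrix: cover[i, j] = 1 if sensor i (at its best
      # candidate orientation) can cover target j.
      cover = (rng.random((n_sensors, n_targets)) < 0.25).astype(int)
      cover[rng.integers(0, n_sensors, n_targets), np.arange(n_targets)] = 1  # feasible

      def fitness(mask):
          covered = (cover[mask.astype(bool)].sum(axis=0) > 0).sum()
          # Reward full coverage first, then fewer active sensors.
          return covered * n_sensors - mask.sum()

      pop = (rng.random((60, n_sensors)) < 0.5).astype(int)
      for _ in range(200):
          scores = np.array([fitness(m) for m in pop])
          parents = pop[np.argsort(scores)[-30:]]              # truncation selection
          cut = rng.integers(1, n_sensors, 30)
          children = np.array([np.r_[parents[i][:c], parents[(i + 1) % 30][c:]]
                               for i, c in enumerate(cut)])    # one-point crossover
          flip = rng.random(children.shape) < 0.02             # mutation
          children[flip] ^= 1
          pop = np.vstack([parents, children])

      best = pop[np.argmax([fitness(m) for m in pop])]
      print("sensors used:", int(best.sum()),
            "targets covered:", int((cover[best.astype(bool)].sum(axis=0) > 0).sum()))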

  1. Boom Minimization Framework for Supersonic Aircraft Using CFD Analysis

    NASA Technical Reports Server (NTRS)

    Ordaz, Irian; Rallabhandi, Sriram K.

    2010-01-01

    A new framework is presented for shape optimization using analytical shape functions and high-fidelity computational fluid dynamics (CFD) via Cart3D. The focus of the paper is the system-level integration of several key enabling analysis tools and automation methods to perform shape optimization and reduce sonic boom footprint. A boom mitigation case study subject to performance, stability and geometrical requirements is presented to demonstrate a subset of the capabilities of the framework. Lastly, a design space exploration is carried out to assess the key parameters and constraints driving the design.

  2. Home and Community Environmental Features, Activity Performance, and Community Participation among Older Adults with Functional Limitations

    PubMed Central

    Yang, Hsiang-Yu; Sanford, Jon A.

    2012-01-01

    This paper describes relationships among home and community environmental features, activity performance in the home, and community participation potential to support aging in place. A subset of data on older adults with functional limitations (N = 122), 63 with mobility limitations and 59 with other limitations, was drawn from a larger project's subject pool for this study. Results showed significant and positive correlations between environmental barriers, activity dependence and difficulty at home, and less community participation in the mobility limitation group. While kitchen and bathroom features were most limiting to home performance, the bathtub or shower was the only home feature, and the destination social environment the only community feature, that explained community participation. Compared to environmental features, home performance explained community participation to a much greater extent. Study results provide detailed information about environmental features as well as types of home activities that can be prioritized as interventions for aging in place. PMID:22162808

  3. Feature Selection for Ridge Regression with Provable Guarantees.

    PubMed

    Paul, Saurabh; Drineas, Petros

    2016-04-01

    We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets, namely a subset of the TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
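
    Leverage-score sampling can be sketched directly from its definition. The snippet below is an illustrative reading of the idea, not the paper's exact estimator: feature leverage scores are taken from the top-k right singular vectors of the data matrix, features are sampled proportionally with the usual importance-sampling rescaling, and ridge regression is then fit in the sampled space.

      import numpy as np
      from sklearn.linear_model import Ridge

      rng = np.random.default_rng(0)
      n, d, k, r = 200, 500, 20, 80          # samples, features, rank, sampled features
      X = rng.standard_normal((n, d))
      y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(n)

      # Leverage score of feature j: squared column norm over the top-k right
      # singular vectors, normalized to a probability distribution.
      _, _, Vt = np.linalg.svd(X, full_matrices=False)
      scores = (Vt[:k] ** 2).sum(axis=0)
      probs = scores / scores.sum()

      cols = rng.choice(d, size=r, replace=False, p=probs)
      rescale = 1.0 / np.sqrt(r * probs[cols])     # importance-sampling rescaling
      Xs = X[:, cols] * rescale

      print("risk proxy (R^2) in sampled space:",
            Ridge(alpha=1.0).fit(Xs, y).score(Xs, y))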

  4. SVM-based feature extraction and classification of aflatoxin contaminated corn using fluorescence hyperspectral data

    USDA-ARS?s Scientific Manuscript database

    Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...

  5. Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest

    PubMed Central

    Ma, Suliang; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan

    2018-01-01

    Mechanical faults of high-voltage circuit breakers (HVCBs) inevitably occur over long-term operation, so extracting fault features and identifying the fault type have become key issues for ensuring the security and reliability of the power supply. Based on wavelet packet decomposition technology and the random forest algorithm, an effective identification system was developed in this paper. First, because Shannon entropy provides an incomplete description, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variables and optimize the feature space. Finally, the approach was verified on actual HVCB vibration signals covering six typical fault classes. The comparative experimental results show that the classification accuracy of the proposed method with the original feature space reached 93.33%, and reached up to 95.56% with the optimized input feature vector. This indicates that the feature optimization procedure is successful, and the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods. PMID:29659548
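
    The importance-based feature space optimization step can be sketched with a generic random forest, substituting synthetic data for the wavelet packet energy features; the sketch ranks features by impurity-based importance and keeps the prefix size that maximizes cross-validated accuracy.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=300, n_features=60, n_informative=12,
                                 n_classes=6, n_clusters_per_class=1, random_state=0)

      rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
      order = np.argsort(rf.feature_importances_)[::-1]     # rank features

      base = cross_val_score(rf, X, y, cv=5).mean()
      best_k, best_acc = X.shape[1], base
      for k in (10, 20, 30, 40):
          acc = cross_val_score(RandomForestClassifier(n_estimators=200,
                                                       random_state=0),
                                X[:, order[:k]], y, cv=5).mean()
          if acc > best_acc:
              best_k, best_acc = k, acc
      print(f"original space: {base:.3f}; optimized ({best_k} features): {best_acc:.3f}")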

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Zhenhong; Liu, Changzheng; Yin, Yafeng

    The objective of this study is to evaluate the opportunity for public charging for a subset of US cities by using available public parking lot data. The capacity of the parking lots weighted by the daily parking occupancy rate is used as a proxy for daily parking demand. The city's public charging opportunity is defined as the percentage of parking demand covered by chargers on the off-street parking network. We assess this opportunity under the scenario of optimal deployment of public chargers. We use the maximum coverage model to optimally locate those facilities on the public garage network. We compare the optimal results to the actual placement of chargers. These empirical findings are of great interest to policymakers as they showcase the potential of increasing opportunities for charging under optimal charging location planning.
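
    The maximum coverage model is usually solved as an integer program; the greedy heuristic below (with a hypothetical demand vector and lot-to-zone coverage sets) is a common stand-in that carries a (1 - 1/e) approximation guarantee and illustrates the structure of the problem.

      import numpy as np

      rng = np.random.default_rng(2)
      n_lots, n_zones, budget = 30, 100, 5
      # Hypothetical inputs: demand per zone, and which zones each lot's charger
      # placement would cover (e.g., zones within walking distance of the lot).
      demand = rng.integers(10, 200, n_zones)
      covers = [set(rng.choice(n_zones, size=rng.integers(3, 12), replace=False))
                for _ in range(n_lots)]

      chosen, covered = [], set()
      for _ in range(budget):
          # Pick the lot adding the most not-yet-covered demand (greedy step).
          gains = [sum(demand[z] for z in covers[i] - covered) for i in range(n_lots)]
          best = int(np.argmax(gains))
          chosen.append(best)
          covered |= covers[best]

      print("chosen lots:", chosen, "| demand covered: %.1f%%"
            % (100 * sum(demand[list(covered)]) / demand.sum()))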

  7. A genetic algorithm-based framework for wavelength selection on sample categorization.

    PubMed

    Anzanello, Michel J; Yamashita, Gabrielli; Marcelo, Marcelo; Fogliatto, Flávio S; Ortiz, Rafael S; Mariotti, Kristiane; Ferrão, Marco F

    2017-08-01

    In forensic and pharmaceutical scenarios, the application of chemometrics and optimization techniques has unveiled common and peculiar features of seized medicine and drug samples, helping investigative forces to track illegal operations. This paper proposes a novel framework aimed at identifying relevant subsets of attenuated total reflectance Fourier transform infrared (ATR-FTIR) wavelengths for classifying samples into two classes, for example authentic or forged categories in the case of medicines, or salt or base form in cocaine analysis. In the first step of the framework, the ATR-FTIR spectra were partitioned into equidistant intervals and the k-nearest neighbour (KNN) classification technique was applied to each interval to assign samples to the proper classes. In the next step, the selected intervals were refined through the genetic algorithm (GA) by identifying a limited number of wavelengths from the previously selected intervals, aimed at maximizing classification accuracy. When applied to Cialis®, Viagra®, and cocaine ATR-FTIR datasets, the proposed method substantially decreased the number of wavelengths needed for categorization and increased the classification accuracy. From a practical perspective, the proposed method provides investigative forces with valuable information for monitoring the illegal production of drugs and medicines. In addition, focusing on a reduced subset of wavelengths allows the development of portable devices capable of testing the authenticity of samples during police checks, avoiding the need for later laboratory analyses and reducing equipment expenses. Theoretically, the proposed GA-based approach yields more refined solutions than current methods relying on interval approaches, which tend to insert irrelevant wavelengths into the retained intervals. Copyright © 2016 John Wiley & Sons, Ltd.
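
    The first, interval-screening step of the framework can be sketched as follows; the spectra are synthetic stand-ins for ATR-FTIR data, and the GA refinement step is omitted. Each equidistant interval is scored by the cross-validated accuracy of a KNN classifier restricted to that interval.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      rng = np.random.default_rng(3)
      n_samples, n_wavelengths, n_intervals = 120, 600, 20
      X = rng.standard_normal((n_samples, n_wavelengths))   # stand-in spectra
      y = rng.integers(0, 2, n_samples)                     # e.g., authentic vs forged
      X[y == 1, 100:130] += 1.5                             # discriminative band

      # Score each equidistant interval with KNN cross-validated accuracy.
      width = n_wavelengths // n_intervals
      acc = []
      for i in range(n_intervals):
          cols = slice(i * width, (i + 1) * width)
          acc.append(cross_val_score(KNeighborsClassifier(5),
                                     X[:, cols], y, cv=5).mean())

      best = np.argsort(acc)[-3:]       # keep the top intervals for GA refinement
      print("top intervals:", sorted(int(i) for i in best),
            "accuracies:", [round(acc[int(i)], 3) for i in best])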

  8. Early exposure to interleukin-21 limits rapidly generated anti-Epstein-Barr virus T-cell line differentiation.

    PubMed

    Orio, Julie; Carli, Cédric; Janelle, Valérie; Giroux, Martin; Taillefer, Julie; Goupil, Mathieu; Richaud, Manon; Roy, Denis-Claude; Delisle, Jean-Sébastien

    2015-04-01

    The adoptive transfer of ex vivo-expanded Epstein-Barr virus (EBV)-specific T-cell lines is an attractive strategy to treat EBV-related neoplasms. Current evidence suggests that for adoptive immunotherapy in general, clinical responses are superior if the transferred cells have not reached a late or terminal effector differentiation phenotype before infusion. The cytokine interleukin (IL)-21 has shown great promise at limiting late T-cell differentiation in vitro, but this remains to be demonstrated in anti-viral T-cell lines. We adapted a clinically validated protocol to rapidly generate EBV-specific T-cell lines in 12 to 14 days and tested whether the addition of IL-21 at the initiation of the culture would affect T-cell expansion and differentiation. We generated clinical-scale EBV-restricted T-cell line expansion with balanced T-cell subset ratios. The addition of IL-21 at the beginning of the culture decreased both T-cell expansion and effector memory T-cell accumulation, with a relative increase in less-differentiated T cells. Within CD4 T-cell subsets, exogenous IL-21 was notably associated with the cell surface expression of CD27 and high KLF2 transcript levels, further arguing for a role of IL-21 in the control of late T-cell differentiation. Our results show that IL-21 has profound effects on T-cell differentiation in a rapid T-cell line generation protocol and as such should be further explored as a novel approach to program anti-viral T cells with features associated with early differentiation and optimal therapeutic efficacy. Copyright © 2015 International Society for Cellular Therapy. Published by Elsevier Inc. All rights reserved.

  9. Optimization of Causative Factors for Landslide Susceptibility Evaluation Using Remote Sensing and GIS Data in Parts of Niigata, Japan.

    PubMed

    Dou, Jie; Tien Bui, Dieu; Yunus, Ali P; Jia, Kun; Song, Xuan; Revhaug, Inge; Xia, Huan; Zhu, Zhongfan

    2015-01-01

    This paper assesses the potential of certainty factor (CF) models for extracting the most suitable causative factors for landslide susceptibility mapping on Sado Island, Niigata Prefecture, Japan. To test the applicability of CF, a landslide inventory map provided by the National Research Institute for Earth Science and Disaster Prevention (NIED) was split into two subsets: (i) 70% of the landslides in the inventory were used for building the CF-based model; (ii) 30% of the landslides were used for validation. A spatial database with fifteen landslide causative factors was then constructed by processing ALOS satellite images, aerial photos, and topographical and geological maps. The CF model was then applied to select the best subset from the fifteen factors. Using all fifteen factors and the best subset of factors, landslide susceptibility maps were produced using statistical index (SI) and logistic regression (LR) models. The susceptibility maps were validated and compared using the landslide locations in the validation data. The prediction performance of the two susceptibility maps was estimated using the Receiver Operating Characteristic (ROC). The results show that the area under the ROC curve (AUC) for the LR model (AUC = 0.817) is slightly higher than that obtained from the SI model (AUC = 0.801). Further, it is noted that the SI and LR models using the best subset outperform the models using the fifteen original factors. Therefore, we conclude that the optimized factor model using CF is more accurate in predicting landslide susceptibility and yields a more homogeneous classification map. Our findings indicate that in mountainous regions suffering from data scarcity, it is possible to select key factors related to landslide occurrence based on CF models in a GIS platform. Hence, a scenario for future planning of risk mitigation can be developed in an efficient manner.

  10. Optimization of Causative Factors for Landslide Susceptibility Evaluation Using Remote Sensing and GIS Data in Parts of Niigata, Japan

    PubMed Central

    Dou, Jie; Tien Bui, Dieu; P. Yunus, Ali; Jia, Kun; Song, Xuan; Revhaug, Inge; Xia, Huan; Zhu, Zhongfan

    2015-01-01

    This paper assesses the potential of certainty factor (CF) models for extracting the most suitable causative factors for landslide susceptibility mapping on Sado Island, Niigata Prefecture, Japan. To test the applicability of CF, a landslide inventory map provided by the National Research Institute for Earth Science and Disaster Prevention (NIED) was split into two subsets: (i) 70% of the landslides in the inventory were used for building the CF-based model; (ii) 30% of the landslides were used for validation. A spatial database with fifteen landslide causative factors was then constructed by processing ALOS satellite images, aerial photos, and topographical and geological maps. The CF model was then applied to select the best subset from the fifteen factors. Using all fifteen factors and the best subset of factors, landslide susceptibility maps were produced using statistical index (SI) and logistic regression (LR) models. The susceptibility maps were validated and compared using the landslide locations in the validation data. The prediction performance of the two susceptibility maps was estimated using the Receiver Operating Characteristic (ROC). The results show that the area under the ROC curve (AUC) for the LR model (AUC = 0.817) is slightly higher than that obtained from the SI model (AUC = 0.801). Further, it is noted that the SI and LR models using the best subset outperform the models using the fifteen original factors. Therefore, we conclude that the optimized factor model using CF is more accurate in predicting landslide susceptibility and yields a more homogeneous classification map. Our findings indicate that in mountainous regions suffering from data scarcity, it is possible to select key factors related to landslide occurrence based on CF models in a GIS platform. Hence, a scenario for future planning of risk mitigation can be developed in an efficient manner. PMID:26214691
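
    A certainty factor calculation of the kind used in these two records can be sketched as below. The formula shown is a common formulation from the susceptibility-mapping literature (an assumption; CF variants exist), applied to a hypothetical one-factor inventory: PPa is the landslide density within a factor class and PPs the prior density over the study area.

      import numpy as np
      import pandas as pd

      # Hypothetical inventory: one row per map pixel, with a categorical causative
      # factor class and a landslide indicator (True = landslide pixel).
      rng = np.random.default_rng(4)
      df = pd.DataFrame({"slope_class": rng.integers(0, 5, 10_000)})
      df["landslide"] = rng.random(10_000) < 0.02 * (1 + df["slope_class"])

      pp_s = df["landslide"].mean()                   # prior landslide density
      def certainty_factor(pp_a, pp_s):
          # A common CF formulation from the susceptibility-mapping literature.
          if pp_a >= pp_s:
              return (pp_a - pp_s) / (pp_a * (1 - pp_s))
          return (pp_a - pp_s) / (pp_s * (1 - pp_a))

      cf = {c: certainty_factor(g["landslide"].mean(), pp_s)
            for c, g in df.groupby("slope_class")}
      print({c: round(v, 3) for c, v in cf.items()})  # positive CF favors landslides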

  11. Optimal Estimation of Clock Values and Trends from Finite Data

    NASA Technical Reports Server (NTRS)

    Greenhall, Charles

    2005-01-01

    We show how to solve two problems of optimal linear estimation from a finite set of phase data. Clock noise is modeled as a stochastic process with stationary dth increments. The covariance properties of such a process are contained in the generalized autocovariance function (GACV). We set up two principles for optimal estimation: with the help of the GACV, these principles lead to a set of linear equations for the regression coefficients and some auxiliary parameters. The mean square errors of the estimators are easily calculated. The method can be used to check the results of other methods and to find good suboptimal estimators based on a small subset of the available data.
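
    As a sketch of what such optimal linear estimation typically looks like (a generic reconstruction under assumed notation, not the paper's exact equations), the clock value at time t is estimated as a linear combination of the phase data, with polynomial constraints that cancel the nonstationarity of a process with stationary dth increments:

      \hat{x}(t) = \sum_{i=1}^{N} a_i \, z(t_i), \qquad
      \min_a \; E\!\left[\big(\hat{x}(t) - x(t)\big)^2\right]
      \quad \text{subject to} \quad
      \sum_{i=1}^{N} a_i \, t_i^k = t^k, \;\; k = 0, \dots, d-1.

    Introducing Lagrange multipliers \lambda_0, \dots, \lambda_{d-1} leads to the linear system

      \begin{pmatrix} R & P \\ P^{\mathsf{T}} & 0 \end{pmatrix}
      \begin{pmatrix} a \\ \lambda \end{pmatrix}
      =
      \begin{pmatrix} r \\ p \end{pmatrix},

    where the entries of R and r are covariance quantities obtained from the GACV, and P_{ik} = t_i^k. The mean square error of the estimator then follows directly from the solved coefficients.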

  12. Nonlinear optimization simplified by hypersurface deformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stillinger, F.H.; Weber, T.A.

    1988-09-01

    A general strategy is advanced for simplifying nonlinear optimization problems, the ant-lion method. This approach exploits shape modifications of the cost-function hypersurface which distend basins surrounding low-lying minima (including global minima). By intertwining hypersurface deformations with steepest-descent displacements, the search is concentrated on a small relevant subset of all minima. Specific calculations demonstrating the value of this method are reported for the partitioning of two classes of irregular but nonrandom graphs, the prime-factor graphs and the pi graphs. We also indicate how this approach can be applied to the traveling salesman problem and to design layout optimization, and that it may be useful in combination with simulated annealing strategies.

  13. Hierarchical Kohonen net for anomaly detection in network security.

    PubMed

    Sarasamma, Suseela T; Zhu, Qiuming A; Huff, Julie

    2005-04-01

    A novel multilevel hierarchical Kohonen Net (K-Map) for an intrusion detection system is presented. Each level of the hierarchical map is modeled as a simple winner-take-all K-Map. One significant advantage of this multilevel hierarchical K-Map is its computational efficiency. Unlike other statistical anomaly detection methods such as nearest neighbor approach, K-means clustering or probabilistic analysis that employ distance computation in the feature space to identify the outliers, our approach does not involve costly point-to-point computation in organizing the data into clusters. Another advantage is the reduced network size. We use the classification capability of the K-Map on selected dimensions of data set in detecting anomalies. Randomly selected subsets that contain both attacks and normal records from the KDD Cup 1999 benchmark data are used to train the hierarchical net. We use a confidence measure to label the clusters. Then we use the test set from the same KDD Cup 1999 benchmark to test the hierarchical net. We show that a hierarchical K-Map in which each layer operates on a small subset of the feature space is superior to a single-layer K-Map operating on the whole feature space in detecting a variety of attacks in terms of detection rate as well as false positive rate.
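
    A single winner-take-all K-Map level reduces to a few lines of numpy. The sketch below (on random stand-in features rather than KDD Cup records) trains unit weight vectors by moving each sample's best-matching unit toward it, then assigns records to units; labeling clusters with a confidence measure, and stacking such levels hierarchically, is not shown.

      import numpy as np

      rng = np.random.default_rng(5)
      X = rng.standard_normal((1000, 6))        # stand-in feature subset (6 dimensions)

      # A simple winner-take-all Kohonen layer: each unit has a weight vector;
      # the best-matching unit (BMU) moves toward each presented sample.
      n_units, lr = 16, 0.1
      W = rng.standard_normal((n_units, X.shape[1]))
      for epoch in range(10):
          for x in X:
              bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # winner-take-all step
              W[bmu] += lr * (x - W[bmu])
          lr *= 0.8                                          # decay the learning rate

      # Cluster assignment for anomaly labeling: map each record to its BMU.
      assign = np.array([np.argmin(((W - x) ** 2).sum(axis=1)) for x in X])
      print("records per unit:", np.bincount(assign, minlength=n_units))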

  14. OPTIMAL NETWORK TOPOLOGY DESIGN

    NASA Technical Reports Server (NTRS)

    Yuen, J. H.

    1994-01-01

    This program was developed as part of a research study on the topology design and performance analysis for the Space Station Information System (SSIS) network. It uses an efficient algorithm to generate candidate network designs (consisting of subsets of the set of all network components) in increasing order of their total costs, and checks each design to see if it forms an acceptable network. This technique gives the true cost-optimal network, and is particularly useful when the network has many constraints and not too many components. It is intended that this new design technique consider all important performance measures explicitly and take into account the constraints due to various technical feasibilities. In the current program, technical constraints are taken care of by the user properly forming the starting set of candidate components (e.g. nonfeasible links are not included). As subsets are generated, they are tested to see if they form an acceptable network by checking that all requirements are satisfied. Thus the first acceptable subset encountered gives the cost-optimal topology satisfying all given constraints. The user must sort the set of "feasible" link elements in increasing order of their costs. The program prompts the user for the following information for each link: 1) cost, 2) connectivity (number of stations connected by the link), and 3) the stations connected by that link. Unless instructed to stop, the program generates all possible acceptable networks in increasing order of their total costs. The program is written only to generate topologies that are simply connected. Tests on reliability, delay, and other performance measures are discussed in the documentation, but have not been incorporated into the program. This program is written in PASCAL for interactive execution and has been implemented on an IBM PC series computer operating under PC DOS. The disk contains source code only. This program was developed in 1985.
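
    The core generate-and-test idea translates naturally to a heap: enumerate candidate link subsets in nondecreasing total cost and stop at the first subset that connects all stations. The sketch below (a toy 4-station network, not the original PASCAL program) uses the standard extend-or-swap successor rule over cost-sorted links and a union-find connectivity test.

      import heapq
      from itertools import count

      # Candidate links sorted by cost: (cost, station_a, station_b).
      links = sorted([(3, 0, 1), (4, 1, 2), (5, 0, 2), (7, 2, 3), (9, 1, 3)])
      n_stations = 4

      def connected(subset):
          parent = list(range(n_stations))
          def find(a):
              while parent[a] != a:
                  parent[a] = parent[parent[a]]
                  a = parent[a]
              return a
          for _, a, b in subset:
              parent[find(a)] = find(b)
          return len({find(s) for s in range(n_stations)}) == 1

      # Each subset is stored as sorted link indices; from a subset whose largest
      # index is m, generate (subset + {m+1}) and (subset - {m} + {m+1}). With
      # cost-sorted links this yields every subset once, in nondecreasing cost.
      tie = count()
      heap = [(links[0][0], next(tie), [0])]
      while heap:
          cost, _, idx = heapq.heappop(heap)
          if connected([links[i] for i in idx]):
              print("cost-optimal acceptable network:", [links[i] for i in idx])
              break
          last = idx[-1]
          if last + 1 < len(links):
              heapq.heappush(heap, (cost + links[last + 1][0],
                                    next(tie), idx + [last + 1]))
              heapq.heappush(heap, (cost - links[last][0] + links[last + 1][0],
                                    next(tie), idx[:-1] + [last + 1]))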

  15. Atlas ranking and selection for automatic segmentation of the esophagus from CT scans

    NASA Astrophysics Data System (ADS)

    Yang, Jinzhong; Haas, Benjamin; Fang, Raymond; Beadle, Beth M.; Garden, Adam S.; Liao, Zhongxing; Zhang, Lifei; Balter, Peter; Court, Laurence

    2017-12-01

    In radiation treatment planning, the esophagus is an important organ-at-risk that should be spared in patients with head and neck cancer or thoracic cancer who undergo intensity-modulated radiation therapy. However, automatic segmentation of the esophagus from CT scans is extremely challenging because of the structure's inconsistent intensity, low contrast against the surrounding tissues, complex and variable shape and location, and random air bubbles. The goal of this study is to develop an online atlas selection approach to choose a subset of optimal atlases for multi-atlas segmentation to delineate the esophagus automatically. We performed atlas selection in two phases. In the first phase, we used the correlation coefficient of the image content in a cubic region between each atlas and the new image to evaluate their similarity and to rank the atlases in an atlas pool. A subset of atlases based on this ranking was selected, and deformable image registration was performed to generate deformed contours and deformed images in the new image space. In the second phase of atlas selection, we used Kullback-Leibler divergence to measure the similarity of local-intensity histograms between the new image and each of the deformed images, and the measurements were used to rank the previously selected atlases. Deformed contours were overlapped sequentially, from the most to the least similar, and the overlap ratio was examined. We further identified a subset of optimal atlases by analyzing the variation of the overlap ratio versus the number of atlases. The deformed contours from these optimal atlases were fused together using a modified simultaneous truth and performance level estimation algorithm to produce the final segmentation. The approach was validated with promising results using both internal data sets (21 head and neck cancer patients and 15 thoracic cancer patients) and external data sets (30 thoracic patients).
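
    The second-phase ranking can be sketched with normalized intensity histograms and Kullback-Leibler divergence; the arrays below are random stand-ins for the new image's and the deformed atlases' local intensities.

      import numpy as np
      from scipy.stats import entropy

      rng = np.random.default_rng(6)
      new_img = rng.normal(0, 1, 50_000)                    # stand-in intensities
      atlases = [rng.normal(mu, 1, 50_000) for mu in (0.1, 0.5, 1.5)]

      def kl_divergence(a, b, bins=64, lim=(-5, 5)):
          # Normalized local-intensity histograms; small epsilon avoids log(0).
          pa, _ = np.histogram(a, bins=bins, range=lim, density=True)
          pb, _ = np.histogram(b, bins=bins, range=lim, density=True)
          return entropy(pa + 1e-12, pb + 1e-12)

      ranked = sorted(range(len(atlases)),
                      key=lambda i: kl_divergence(new_img, atlases[i]))
      print("atlases ranked most- to least-similar:", ranked)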

  16. Evaluation of verifiability in HAL/S. [programming language for aerospace computers

    NASA Technical Reports Server (NTRS)

    Young, W. D.; Tripathi, A. R.; Good, D. I.; Browne, J. C.

    1979-01-01

    HAL/S provides limited support for writing verifiable programs, a characteristic highly desirable in aerospace applications, since many of its features do not lend themselves to existing verification techniques. The methods of language evaluation are described, along with the means by which language features are evaluated for verifiability. These methods are applied in this study to various features of HAL/S to identify specific areas in which the language fails with respect to verifiability. Some conclusions are drawn for the design of programming languages for aerospace applications, and ongoing work to identify a verifiable subset of HAL/S is described.

  17. Detection of stress/anxiety state from EEG features during video watching.

    PubMed

    Giannakakis, Giorgos; Grigoriadis, Dimitris; Tsiknakis, Manolis

    2015-01-01

    This paper studies the effect of stress/anxiety states on EEG signals during video sessions. The levels of arousal and valence induced in each subject while watching each video were self-rated. These levels were mapped onto stress and relaxed states, and subjects that fulfilled the criteria of an adequate anxiety/stress scale were chosen, leading to a subset of 18 subjects. Then, temporal, spectral, and nonlinear EEG features were evaluated for their ability to accurately represent the states under investigation. Feature selection schemes choose the most significant of these features in order to provide increased ability to discriminate between relaxed and anxiety/stress states.

  18. Intelligibility Evaluation of Pathological Speech through Multigranularity Feature Extraction and Optimization.

    PubMed

    Fang, Chunying; Li, Haifeng; Ma, Lin; Zhang, Mancai

    2017-01-01

    Pathological speech usually refers to speech distortion resulting from illness or other biological insults. The assessment of pathological speech plays an important role in assisting experts, while automatic evaluation of speech intelligibility is difficult because the speech is usually nonstationary and mutational. In this paper, we carry out an independent innovation in feature extraction and reduction, and we describe a multigranularity combined feature scheme optimized by a hierarchical visual method. A novel method of generating the feature set based on the S-transform and chaotic analysis is proposed. The features comprise BAFS (430 basic acoustic features), local spectral characteristics MSCC (84 Mel S-transform cepstrum coefficients), and chaotic features (12). Finally, radar charts and the F-score are used to optimize the features by hierarchical visual fusion. The feature set could be optimized from 526 to 96 dimensions based on the NKI-CCRT corpus and to 104 dimensions based on the SVD corpus. The experimental results show that the new features with a support vector machine (SVM) have the best performance, with a recognition rate of 84.4% on the NKI-CCRT corpus and 78.7% on the SVD corpus. The proposed method is thus shown to be effective and reliable for pathological speech intelligibility evaluation.
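
    The F-score part of the feature optimization can be sketched with the commonly used two-class definition (an assumption; the paper's exact variant may differ), which scores each feature by between-class separation over within-class variance:

      import numpy as np
      from sklearn.datasets import make_classification

      X, y = make_classification(n_samples=300, n_features=30, random_state=0)

      def f_score(X, y):
          # Common two-class F-score for each feature:
          # ((mean+ - mean)^2 + (mean- - mean)^2) / (var+ + var-)
          pos, neg = X[y == 1], X[y == 0]
          num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
          den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
          return num / den

      scores = f_score(X, y)
      keep = np.argsort(scores)[::-1][:10]   # retain the 10 most discriminative
      print("top features:", sorted(int(i) for i in keep))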

  19. An Optimal Mean Based Block Robust Feature Extraction Method to Identify Colorectal Cancer Genes with Integrated Data.

    PubMed

    Liu, Jian; Cheng, Yuhu; Wang, Xuesong; Zhang, Lin; Liu, Hui

    2017-08-17

    Diagnosing colorectal cancer at an early stage is urgent. Some feature genes that are important to colorectal cancer development have been identified. However, for the early stage of colorectal cancer, less is known about the identity of specific cancer genes that are associated with advanced clinical stage. In this paper, we present a feature extraction method named the Optimal Mean based Block Robust Feature Extraction method (OMBRFE) to identify feature genes associated with advanced colorectal cancer in clinical stage by using integrated colorectal cancer data. First, based on the optimal mean and the L2,1-norm, a novel feature extraction method called the Optimal Mean based Robust Feature Extraction method (OMRFE) is proposed to identify feature genes. Then the OMBRFE method, which introduces the block ideology into the OMRFE method, is put forward to process the integrated colorectal cancer data, which includes multiple genomic data types: copy number alterations, somatic mutations, methylation expression alterations, and gene expression changes. Experimental results demonstrate that OMBRFE is more effective than previous methods in identifying the feature genes. Moreover, genes identified by OMBRFE are verified to be closely associated with advanced colorectal cancer in clinical stage.

  20. Mixed phenotype (T/B/myeloid) extramedullary blast crisis as an initial presentation of chronic myelogenous leukemia.

    PubMed

    Qing, Xin; Qing, Annie; Ji, Ping; French, Samuel W; Mason, Holli

    2018-04-01

    Chronic myelogenous leukemia (CML) is a myeloproliferative disorder characterized by the Philadelphia (Ph) chromosome generated by the reciprocal translocation t(9;22)(q34;q11). The natural progression of the disease follows a biphasic or triphasic course. Most cases of CML are diagnosed in the chronic phase. Extramedullary blast crisis rarely occurs during the course of CML, and is extremely rare as the initial presentation of CML. Here, we report the case of a 32-year-old female with enlarged neck lymph nodes and fatigue. She was diagnosed with B-lymphoblastic leukemia/lymphoma with possible mixed phenotype (B/myeloid) by right neck lymph node biopsy at an outside hospital. However, review of her peripheral blood smear and her bone marrow aspirate and biopsy showed features consistent with CML, which was confirmed by PCR and karyotyping. An ultrasound-guided right cervical lymph node core biopsy showed a diffuse infiltrate of blasts, nearly totally replacing the normal lymph node tissue, admixed with some hematopoietic cells including megakaryocytes, erythroid precursors and maturing myeloid cells. By flow cytometry and immunohistochemistry, the blasts expressed CD2, cytoplasmic CD3, CD5, CD7, CD56, TdT, CD10 (weak, subset), CD19 (subset), CD79a, PAX-5 (subset), CD34, CD38, CD117 (subset), HLA-DR (subset), CD11b, CD13 (subset), CD33 (subset), and weak cytoplasmic myeloperoxidase, without co-expression of surface CD3, CD4, CD8, CD20, CD22, CD14, CD15, CD16 and CD64, consistent with blasts with mixed phenotype (T/B/myeloid). A diagnosis of extramedullary blast crisis of CML was made. Chromosomal analysis performed on the lymph node biopsy tissue revealed multiple numerical and structural abnormalities including the Ph chromosome (46-49,XX,add(1)(p34),add(3)(p25),add(5)(q13),-6,t(9;22)(q34;q11.2),+10,-15,add(17)(p11.2),+19, +der(22)t(9;22),+mar[cp8]). After completion of one cycle of combined chemotherapy plus dasatinib treatment, she was transferred to City of Hope National Cancer Institute for bone marrow transplantation. Diagnosis of extramedullary blast crisis should be suspected in patients with leukocytosis and extramedullary blast proliferation. In this case study, we diagnosed extramedullary blast crisis accompanied by chronic-phase CML in the bone marrow. To our knowledge, this is the first reported case of extramedullary blast crisis as the initial presentation of CML with a T/B/myeloid mixed phenotype. Other unusual features associated with this case are also discussed. Copyright © 2018 Elsevier Inc. All rights reserved.

  1. Discovering biclusters in gene expression data based on high-dimensional linear geometries

    PubMed Central

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2008-01-01

    Background: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits a consistent pattern over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting a consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns. Results: In this paper, we present a novel geometric perspective on the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high-dimensional data space. This new perspective views biclusters with different patterns as hyperplanes in a high-dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model, that unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on a human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes. Conclusion: We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high-dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis. PMID:18433477

  2. Simulation of millimeter-wave body images and its application to biometric recognition

    NASA Astrophysics Data System (ADS)

    Moreno-Moreno, Miriam; Fierrez, Julian; Vera-Rodriguez, Ruben; Parron, Josep

    2012-06-01

    One of the emerging applications of millimeter-wave imaging technology is its use in biometric recognition. This is mainly due to properties of millimeter waves such as their ability to penetrate clothing and other occlusions, their low obtrusiveness when collecting the image, and the fact that they are harmless to health. In this work we first describe the generation of a database comprising 1200 synthetic images at 94 GHz obtained from the bodies of 50 people. Then we extract a small set of distance-based features from each image and select the best feature subsets for person recognition using the SFFS feature selection algorithm. Finally, these features are used in body geometry authentication, obtaining promising results.
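
    SFFS itself is available off the shelf: mlxtend's SequentialFeatureSelector with floating=True implements sequential floating forward selection. The sketch below runs it on synthetic stand-ins for the distance-based features, with a KNN classifier as an assumed wrapper model.

      import numpy as np
      from mlxtend.feature_selection import SequentialFeatureSelector as SFS
      from sklearn.datasets import make_classification
      from sklearn.neighbors import KNeighborsClassifier

      # Stand-in for the distance-based body-geometry features (not the real data).
      X, y = make_classification(n_samples=250, n_features=24, n_informative=8,
                                 random_state=0)

      # floating=True turns sequential forward selection into SFFS: after each
      # inclusion, conditionally exclude features if that improves the criterion.
      sffs = SFS(KNeighborsClassifier(3), k_features=8, forward=True,
                 floating=True, scoring='accuracy', cv=5)
      sffs = sffs.fit(X, y)
      print("selected feature indices:", sffs.k_feature_idx_)
      print("cross-validated score: %.3f" % sffs.k_score_)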

  3. geoknife: Reproducible web-processing of large gridded datasets

    USGS Publications Warehouse

    Read, Jordan S.; Walker, Jordan I.; Appling, Alison P.; Blodgett, David L.; Read, Emily K.; Winslow, Luke A.

    2016-01-01

    Geoprocessing of large gridded data according to overlap with irregular landscape features is common to many large-scale ecological analyses. The geoknife R package was created to facilitate reproducible analyses of gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers. Outputs from geoknife include spatial and temporal data subsets, spatially-averaged time series values filtered by user-specified areas of interest, and categorical coverage fractions for various land-use types.

  4. Multi-task feature selection in microarray data by binary integer programming.

    PubMed

    Lan, Liang; Vucetic, Slobodan

    2013-12-20

    A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.

  5. Spatial features register: toward standardization of spatial features

    USGS Publications Warehouse

    Cascio, Janette

    1994-01-01

    As the need to share spatial data increases, more than agreement on a common format is needed to ensure that the data is meaningful to both the importer and the exporter. Effective data transfer also requires common definitions of spatial features. To achieve this, part 2 of the Spatial Data Transfer Standard (SDTS) provides a model for a spatial features data content specification and a glossary of features and attributes that fit this model. The model provides a foundation for standardizing spatial features. The glossary now contains only a limited subset of hydrographic and topographic features. For it to be useful, terms and definitions must be included for other categories, such as base cartographic, bathymetric, cadastral, cultural and demographic, geodetic, geologic, ground transportation, international boundaries, soils, vegetation, water, and wetlands, and the set of hydrographic and topographic features must be expanded. This paper will review the philosophy of the SDTS part 2 and the current plans for creating a national spatial features register as one mechanism for maintaining part 2.

  6. Landscape epidemiology in urban environments: The example of rodent-borne Trypanosoma in Niamey, Niger.

    PubMed

    Rossi, Jean-Pierre; Kadaouré, Ibrahima; Godefroid, Martin; Dobigny, Gauthier

    2017-10-05

    Trypanosomes are protozoan parasites found worldwide, infecting humans and animals. In the past decade, the number of reports on atypical human cases due to Trypanosoma lewisi or T. lewisi-like parasites has increased, urging investigation of the multiple factors driving the disease dynamics, particularly in cities where rodents and humans co-exist at high densities. In the present survey, we used a species distribution model, Maxent, to assess the spatial pattern of Trypanosoma-positive rodents in the city of Niamey. The explanatory variables were landscape metrics describing urban landscape composition and physiognomy computed from 8 land-cover classes. We computed the metrics around each data location using a set of circular buffers of increasing radii (20m, 40m, 60m, 80m and 100m). For each spatial resolution, we determined the optimal combination of feature class and regularization multiplier by fitting Maxent with the full dataset. Since our dataset was small (114 occurrences), we expected considerable uncertainty associated with partitioning the data into calibration and evaluation datasets. We therefore performed 350 independent model runs, each with a training dataset consisting of a random subset of 80% of the occurrences and the optimal Maxent parameters. Each model yielded a map of habitat suitability over Niamey, which was transformed into a binary map by applying a threshold maximizing the sensitivity and the specificity. The resulting binary maps were combined to display the proportion of models that indicated good environmental suitability for Trypanosoma-positive rodents. Maxent performed better with landscape metrics derived from buffers of 80m. Habitat suitability for Trypanosoma-positive rodents exhibited large patches linked to urban features such as patch richness and the proportion of landscape covered by concrete or tarred areas. Such inferences could be helpful for assessing areas at risk, setting up monitoring programs, raising public and medical staff awareness, or even planning vaccination campaigns. Copyright © 2017 Elsevier B.V. All rights reserved.
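
    The sensitivity-plus-specificity threshold corresponds to maximizing Youden's J statistic, which falls out of an ROC curve directly; the scores and labels below are synthetic stand-ins for one Maxent run. Repeating this over the 350 runs and averaging the binary maps yields the consensus proportion described above.

      import numpy as np
      from sklearn.metrics import roc_curve

      rng = np.random.default_rng(7)
      # Stand-ins for Maxent habitat-suitability scores and occurrence labels.
      y_true = rng.integers(0, 2, 500)
      scores = np.clip(0.4 * y_true + rng.normal(0.3, 0.2, 500), 0, 1)

      # Threshold maximizing sensitivity + specificity (Youden's J = TPR - FPR).
      fpr, tpr, thresholds = roc_curve(y_true, scores)
      best = thresholds[np.argmax(tpr - fpr)]
      binary_map = scores >= best                 # one model run's binary map
      print("threshold: %.3f, suitable fraction: %.2f" % (best, binary_map.mean()))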

  7. Origin and Role of a Subset of Tumor-Associated Neutrophils with Antigen-Presenting Cell Features in Early-Stage Human Lung Cancer.

    PubMed

    Singhal, Sunil; Bhojnagarwala, Pratik S; O'Brien, Shaun; Moon, Edmund K; Garfall, Alfred L; Rao, Abhishek S; Quatromoni, Jon G; Stephen, Tom Li; Litzky, Leslie; Deshpande, Charuhas; Feldman, Michael D; Hancock, Wayne W; Conejo-Garcia, Jose R; Albelda, Steven M; Eruslanov, Evgeniy B

    2016-07-11

    Based on studies in mouse tumor models, granulocytes appear to play a tumor-promoting role. However, there are limited data about the phenotype and function of tumor-associated neutrophils (TANs) in humans. Here, we identify a subset of TANs that exhibited characteristics of both neutrophils and antigen-presenting cells (APCs) in early-stage human lung cancer. These APC-like "hybrid neutrophils," which originate from CD11b(+)CD15(hi)CD10(-)CD16(low) immature progenitors, are able to cross-present antigens, as well as trigger and augment anti-tumor T cell responses. Interferon-γ and granulocyte-macrophage colony-stimulating factor are requisite factors in the tumor that, working through the Ikaros transcription factor, synergistically exert their APC-promoting effects on the progenitors. Overall, these data demonstrate the existence of a specialized TAN subset with anti-tumor capabilities in human cancer. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Analysis of Normal Human Mammary Epigenomes Reveals Cell-Specific Active Enhancer States and Associated Transcription Factor Networks.

    PubMed

    Pellacani, Davide; Bilenky, Misha; Kannan, Nagarajan; Heravi-Moussavi, Alireza; Knapp, David J H F; Gakkhar, Sitanshu; Moksa, Michelle; Carles, Annaick; Moore, Richard; Mungall, Andrew J; Marra, Marco A; Jones, Steven J M; Aparicio, Samuel; Hirst, Martin; Eaves, Connie J

    2016-11-15

    The normal adult human mammary gland is a continuous bilayered epithelial system. Bipotent and myoepithelial progenitors are prominent and unique components of the outer (basal) layer. The inner (luminal) layer includes both luminal-restricted progenitors and a phenotypically separable fraction that lacks progenitor activity. We now report an epigenomic comparison of these three subsets with one another, with their associated stromal cells, and with three immortalized, non-tumorigenic human mammary cell lines. Each genome-wide analysis contains profiles for six histone marks, methylated DNA, and RNA transcripts. Analysis of these datasets shows that each cell type has unique features, primarily within genomic regulatory regions, and that the cell lines group together. Analyses of the promoter and enhancer profiles place the luminal progenitors in between the basal cells and the non-progenitor luminal subset. Integrative analysis reveals networks of subset-specific transcription factors. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  9. Texture analysis based on the Hermite transform for image classification and segmentation

    NASA Astrophysics Data System (ADS)

    Estudillo-Romero, Alfonso; Escalante-Ramirez, Boris; Savage-Carmona, Jesus

    2012-06-01

    Texture analysis has become an important task in image processing because it is used as a preprocessing stage in different research areas, including medical image analysis, industrial inspection, segmentation of remotely sensed imagery, and multimedia indexing and retrieval. In order to extract visual texture features, a texture image analysis technique is presented based on the Hermite transform. Psychovisual evidence suggests that Gaussian derivatives fit the receptive field profiles of mammalian visual systems. The Hermite transform describes locally basic texture features in terms of Gaussian derivatives. Multiresolution analysis combined with several analysis orders provides detection of patterns that characterize every texture class. Analysis of the local maximum energy direction and steering of the transformation coefficients increase the method's robustness against texture orientation. This method presents an advantage over classical filter bank design because in the latter a fixed number of orientations for the analysis has to be selected. During the training stage, a subset of the Hermite analysis filters is chosen in order to improve inter-class separability and to reduce the dimensionality of the feature vectors and the computational cost during the classification stage. We exhaustively evaluated the correct classification rate on real, randomly selected training and testing texture subsets using several kinds of commonly used texture features. A comparison between different distance measurements is also presented. Results of unsupervised real texture segmentation using this approach, and comparison with previous approaches, showed the benefits of our proposal.
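
    Since Hermite analysis filters are closely tied to Gaussian derivatives, a small Gaussian-derivative filter bank gives a workable sketch of the feature extraction (an approximation, not the actual Hermite implementation); channel energies over scales and derivative orders serve as texture descriptors.

      import numpy as np
      from scipy import ndimage

      rng = np.random.default_rng(8)
      image = rng.random((128, 128))            # stand-in texture patch

      # Gaussian-derivative responses approximate Hermite transform channels.
      features = []
      for sigma in (1.0, 2.0, 4.0):             # multiresolution analysis
          for order in ((0, 1), (1, 0), (1, 1), (0, 2), (2, 0)):  # analysis orders
              resp = ndimage.gaussian_filter(image, sigma=sigma, order=order)
              features.append((resp ** 2).mean())   # local energy per channel

      print("feature vector length:", len(features))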

  10. Computer vision-based method for classification of wheat grains using artificial neural network.

    PubMed

    Sabanci, Kadir; Kayabasi, Ahmet; Toktas, Abdurrahim

    2017-06-01

    A simplified computer vision-based application using an artificial neural network (ANN) depending on a multilayer perceptron (MLP) for accurately classifying wheat grains into bread or durum is presented. The images of 100 bread and 100 durum wheat grains are taken via a high-resolution camera and subjected to pre-processing. The main visual features of four dimensions, three colors and five textures are acquired using image-processing techniques (IPTs). A total of 21 visual features are reproduced from the 12 main features to diversify the input population for training and testing the ANN model. The data sets of visual features are considered as input parameters of the ANN model. The ANN with four different input data subsets is modelled to classify the wheat grains into bread or durum. The ANN model is trained with 180 grains and its accuracy tested with 20 grains from a total of 200 wheat grains. The seven input parameters that are most effective on the classification results are determined using the correlation-based CfsSubsetEval algorithm to simplify the ANN model. The results of the ANN models are compared in terms of accuracy rate. The best result is achieved with a mean absolute error (MAE) of 9.8 × 10^-6 by the simplified ANN model. This shows that the proposed classifier based on computer vision can be successfully exploited to automatically classify a variety of grains. © 2016 Society of Chemical Industry.

  11. A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study.

    PubMed

    AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku

    2014-05-27

    The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets, which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as strong, regular, or weak. The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier which averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.
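
    The winning ensemble can be approximated with a soft-voting classifier. In the sketch below, voting feature intervals (VFI, a Weka classifier) has no scikit-learn counterpart, so Gaussian naive Bayes is substituted as a stand-in second model on synthetic data.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import VotingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB

      X, y = make_classification(n_samples=400, n_features=20, random_state=0)

      # Soft voting averages predicted probabilities, mirroring the paper's
      # averaging of multinomial logistic regression with a second classifier.
      clf = VotingClassifier(
          estimators=[("lr", LogisticRegression(max_iter=1000)),
                      ("nb", GaussianNB())],   # stand-in for voting feature intervals
          voting="soft")
      print("CV accuracy: %.3f" % cross_val_score(clf, X, y, cv=5).mean())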

  12. TU-AB-BRA-04: Quantitative Radiomics: Sensitivity of PET Textural Features to Image Acquisition and Reconstruction Parameters Implies the Need for Standards

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nyflot, MJ; Yang, F; Byrd, D

    Purpose: Despite increased use of heterogeneity metrics for PET imaging, standards for metrics such as textural features have yet to be developed. We evaluated the quantitative variability caused by image acquisition and reconstruction parameters on PET textural features. Methods: PET images of the NEMA IQ phantom were simulated with realistic image acquisition noise. 35 features based on intensity histograms (IH), co-occurrence matrices (COM), neighborhood-difference matrices (NDM), and zone-size matrices (ZSM) were evaluated within lesions (13, 17, 22, 28, 33 mm diameter). Variability in metrics across 50 independent images was evaluated as percent difference from mean for three phantom girths (850, 1030, 1200 mm) and two OSEM reconstructions (2 iterations, 28 subsets, 5 mm FWHM filtration vs 6 iterations, 28 subsets, 8.6 mm FWHM filtration). Also, patient sample size to detect a clinical effect of 30% with Bonferroni-corrected α=0.001 and 95% power was estimated. Results: As a class, NDM features demonstrated greatest sensitivity in means (5–50% difference for medium girth and reconstruction comparisons and 10–100% for large girth comparisons). Some IH features (standard deviation, energy, entropy) had variability below 10% for all sensitivity studies, while others (kurtosis, skewness) had variability above 30%. COM and ZSM features had complex sensitivities; correlation, energy, entropy (COM) and zone percentage, short-zone emphasis, zone-size non-uniformity (ZSM) had variability less than 5% while other metrics had differences up to 30%. Trends were similar for sample size estimation; for example, coarseness, contrast, and strength required 12, 38, and 52 patients to detect a 30% effect for the small girth case but 38, 88, and 128 patients in the large girth case. Conclusion: The sensitivity of PET textural features to image acquisition and reconstruction parameters is large and feature-dependent. Standards are needed to ensure that prospective trials which incorporate textural features are properly designed to detect clinical endpoints. Supported by NIH grants R01 CA169072, U01 CA148131, NCI Contract (SAIC-Frederick) 24XS036-004, and a research contract from GE Healthcare.
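
    The sketch below reproduces the two computations described, percent difference from the mean across repeated images and a per-group sample size for a 30% effect at α = 0.001 and 95% power, on synthetic feature values. The two-sample z-approximation is an assumed formula; the abstract does not state which one was used.

    ```python
    # Sketch: variability of one texture feature across repeated simulated
    # images, plus a per-group sample size estimate for a 30% effect.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    feature = rng.normal(loc=10.0, scale=1.5, size=50)   # 50 independent images

    pct_diff = 100 * (feature - feature.mean()) / feature.mean()
    print("spread (%% difference from mean): %.1f to %.1f"
          % (pct_diff.min(), pct_diff.max()))

    cv = feature.std(ddof=1) / feature.mean()            # coefficient of variation
    effect = 0.30                                        # 30% change in the mean
    alpha, power = 0.001, 0.95
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_per_group = int(np.ceil(2 * (z * cv / effect) ** 2))  # assumed z-test formula
    print("patients per group:", n_per_group)
    ```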

  13. Characterizing Smartphone Engagement for Schizophrenia: Results of a Naturalist Mobile Health Study.

    PubMed

    Torous, John; Staples, Patrick; Slaters, Linda; Adams, Jared; Sandoval, Luis; Onnela, J P; Keshavan, Matcheri

    2017-08-04

    Despite growing interest in smartphone apps for schizophrenia, little is known about how these apps are utilized in the real world. Understanding how app users engage with these tools outside the confines of traditional clinical studies offers important information on who is most likely to use apps and what type of data they are willing to share. The Schizophrenia and Related Disorders Alliance of America, in partnership with Self Care Catalyst, has created a smartphone app for schizophrenia that is free and publicly available on both the Apple iTunes and Google Android Play stores. We analyzed user engagement data from this app across its medication tracking, mood tracking, and symptom tracking features from August 16th, 2015 to January 1st, 2017 using the R programming language. We included all registered app users with reported ages less than 100. We analyzed a total of 43,451 mood, medication and symptom entries from 622 registered users, and excluded a single patient with a reported age of 114. Seventy-one percent of the 622 users tried the mood-tracking feature at least once, 49% the symptom-tracking feature, and 36% the medication-tracking feature. The mean number of uses of the mood feature was two, the symptom feature 10, and the medication feature 14. However, a small subset of users was very engaged with the app, and the top 10 users for each feature accounted for 35% or more of all entries for that feature. We find that user engagement follows a power-law distribution for each feature, and this fit was largely invariant when stratifying by age or gender. Engagement with this app for schizophrenia was overall low, but similar to prior naturalistic studies of mental health app use in other diseases. The low rate of engagement in naturalistic settings, compared to higher rates of use in clinical studies, suggests the importance of clinical involvement as one factor in driving engagement with mental health apps. The power-law relationships indicate strongly skewed user engagement, with a small subset of users accounting for the majority of engagement. There is a need for further research on app engagement in schizophrenia.
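
    The two engagement summaries referenced above are easy to compute; the sketch below does so on synthetic per-user entry counts, using the standard continuous maximum-likelihood estimator for the power-law exponent with an assumed x_min of 1, which the study does not specify.

    ```python
    # Sketch: share of entries from the top 10 users, and a maximum-likelihood
    # power-law exponent (continuous approximation, x_min = 1, after Clauset
    # et al.), computed on synthetic heavy-tailed usage counts.
    import numpy as np

    rng = np.random.default_rng(3)
    counts = np.sort(rng.pareto(1.2, size=622) + 1)[::-1]  # per-user entry counts

    top10_share = counts[:10].sum() / counts.sum()
    print("top-10 users' share of entries: %.0f%%" % (100 * top10_share))

    x_min = 1.0
    alpha = 1 + len(counts) / np.log(counts / x_min).sum()
    print("estimated power-law exponent: %.2f" % alpha)
    ```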

  14. The Role of Model Complexity in Determining Patterns of Chlorophyll Variability in the Coastal Northwest North Atlantic

    NASA Astrophysics Data System (ADS)

    Kuhn, A. M.; Fennel, K.; Bianucci, L.

    2016-02-01

    A key feature of the North Atlantic Ocean's biological dynamics is the annual phytoplankton spring bloom. In the region comprising the continental shelf and adjacent deep ocean of the northwest North Atlantic, we identified two patterns of bloom development: 1) locations with cold temperatures and deep winter mixed layers, where the spring bloom peaks around April and the annual chlorophyll cycle has a large amplitude, and 2) locations with warmer temperatures and shallow winter mixed layers, where the spring bloom peaks earlier in the year, sometimes indiscernible from the fall bloom. These patterns result from a combination of limiting environmental factors and interactions among planktonic groups with different optimal requirements. Simple models that represent the ecosystem with a single phytoplankton (P) and a single zooplankton (Z) group are challenged to reproduce these ecological interactions. Here we investigate the effect that added complexity has on simulated spatio-temporal chlorophyll variability. We compare two ecosystem models, one that contains one P and one Z group, and one with two P and three Z groups. We consider three types of changes in complexity: 1) added dependencies among variables (e.g., temperature-dependent rates), 2) modified structural pathways, and 3) added pathways. Subsets of the most sensitive parameters are optimized in each model to replicate observations in the region. For computational efficiency, the parameter optimization is performed using 1D surrogates of a 3D model. We evaluate how model complexity affects model skill, and whether the optimized parameter sets found for each model modify the interpretation of ecosystem functioning. Spatial differences in the parameter sets that best represent different areas hint at the existence of different ecological communities or at physical-biological interactions that are not represented in the simplest model. Our methodology emphasizes the combined use of observations, 1D models to help identify patterns, and 3D models able to simulate the environment more realistically, as a means to acquire predictive understanding of the ocean's ecology.

  15. System, apparatus and methods to implement high-speed network analyzers

    DOEpatents

    Ezick, James; Lethin, Richard; Ros-Giralt, Jordi; Szilagyi, Peter; Wohlford, David E

    2015-11-10

    Systems, apparatus and methods for the implementation of high-speed network analyzers are provided. A set of high-level specifications is used to define the behavior of the network analyzer, which is emitted by a compiler. An optimized inline workflow to process regular expressions is presented without sacrificing the semantic capabilities of the processing engine. An optimized packet dispatcher implements a subset of the functions implemented by the network analyzer, providing a fast and slow path workflow used to accelerate specific processing units. Such a dispatcher facility can also be used as a cache of policies: if a policy is found, the packet manipulations associated with that policy can be performed quickly. An optimized method of generating DFA specifications for network signatures is also presented. The method accepts several optimization criteria, such as min-max allocations or optimal allocations based on the probability of occurrence of each signature input bit.
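
    A toy sketch of the fast/slow-path dispatch idea with a policy cache, assuming invented packet fields and policies: the first packet of a flow takes the slow path through full analysis, and the cached policy then serves subsequent packets of the same flow.

    ```python
    # Sketch of fast/slow-path dispatch with a policy cache.  All names and
    # policies are illustrative, not the patent's actual design.
    def full_analysis(flow):
        # Stand-in for the compiled analyzer: expensive per-flow classification.
        return "drop" if flow[1] == 23 else "accept"     # e.g. block telnet

    policy_cache = {}

    def dispatch(packet):
        flow = (packet["src"], packet["dport"])
        if flow in policy_cache:                         # fast path: cache hit
            return policy_cache[flow]
        policy = full_analysis(flow)                     # slow path
        policy_cache[flow] = policy
        return policy

    packets = [{"src": "10.0.0.1", "dport": 23},
               {"src": "10.0.0.1", "dport": 23},         # served from cache
               {"src": "10.0.0.2", "dport": 80}]
    for p in packets:
        print(p, "->", dispatch(p))
    ```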

  16. Acute myeloid/T-lymphoblastic leukaemia (AMTL): a distinct category of acute leukaemias with common pathogenesis in need of improved therapy.

    PubMed

    Gutierrez, Alejandro; Kentsis, Alex

    2018-03-01

    Advances in the classification of acute leukaemias have led to improved outcomes for a substantial fraction of patients. However, chemotherapy resistance remains a major problem for specific subsets of acute leukaemias. Here, we propose that a molecularly distinct subtype of acute leukaemia with shared myeloid and T cell lymphoblastic features, which we term acute myeloid/T-lymphoblastic leukaemia (AMTL), is divided across 3 diagnostic categories owing to variable expression of markers deemed to be defining of myeloid and T-lymphoid lineages, such as myeloperoxidase and CD3. This proposed diagnostic group is supported by (i) retained myeloid differentiation potential during early T cell lymphoid development, (ii) recognition that some cases of acute myeloid leukaemia (AML) harbour hallmarks of T cell development, such as T-cell receptor gene rearrangements and (iii) common gene mutations in subsets of AML and T cell acute lymphoblastic leukaemia (T-ALL), including WT1, PHF6, RUNX1 and BCL11B. This proposed diagnostic entity overlaps with early T cell precursor (ETP) T-ALL and T cell/myeloid mixed phenotype acute leukaemias (MPALs), and also includes a subset of leukaemias currently classified as AML with features of T-lymphoblastic development. The proposed classification of AMTL as a distinct entity would enable more precise prospective diagnosis and permit the development of improved therapies for patients whose treatment is inadequate with current approaches. © 2018 John Wiley & Sons Ltd.

  17. Automatic sleep scoring: a search for an optimal combination of measures.

    PubMed

    Krakovská, Anna; Mezeiová, Kristína

    2011-09-01

    The objective of this study is to find the best set of characteristics of polysomnographic signals for the automatic classification of sleep stages. A selection was made from 74 measures, including linear spectral measures, interdependency measures, and nonlinear measures of complexity that were computed for the all-night polysomnographic recordings of 20 healthy subjects. The adopted multidimensional analysis involved quadratic discriminant analysis, forward selection procedure, and selection by the best subset procedure. Two situations were considered: the use of four polysomnographic signals (EEG, EMG, EOG, and ECG) and the use of the EEG alone. For the given database, the best automatic sleep classifier achieved approximately an 81% agreement with the hypnograms of experts. The classifier was based on the following 14 features of polysomnographic signals: the ratio of powers in the beta and delta frequency range (EEG, channel C3), the fractal exponent (EMG), the variance (EOG), the absolute power in the sigma 1 band (EEG, C3), the relative power in the delta 2 band (EEG, O2), theta/gamma (EEG, C3), theta/alpha (EEG, O1), sigma/gamma (EEG, C4), the coherence in the delta 1 band (EEG, O1-O2), the entropy (EMG), the absolute theta 2 (EEG, Fp1), theta/alpha (EEG, Fp1), the sigma 2 coherence (EEG, O1-C3), and the zero-crossing rate (ECG); however, even with only four features, we could perform sleep scoring with a 74% accuracy, which is comparable to the inter-rater agreement between two independent specialists. We have shown that 4-14 carefully selected polysomnographic features were sufficient for successful sleep scoring. The efficiency of the corresponding automatic classifiers was verified and conclusively demonstrated on all-night recordings from healthy adults. Copyright © 2011 Elsevier B.V. All rights reserved.
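
    A minimal sketch of the forward-selection step, assuming synthetic stand-ins for the 74 measures and five toy sleep stages: features are added greedily by the cross-validated agreement of a quadratic discriminant classifier.

    ```python
    # Sketch: greedy forward selection with quadratic discriminant analysis,
    # mirroring the study's procedure on synthetic data.
    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)
    X = rng.normal(size=(500, 20))                 # 20 candidate measures
    y = rng.integers(0, 5, size=500)               # 5 sleep stages (toy labels)
    X[:, 3] += y                                   # make two measures informative
    X[:, 7] += 0.5 * y

    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(4):                             # pick 4 features, as a minimum
        scores = {j: cross_val_score(QuadraticDiscriminantAnalysis(),
                                     X[:, selected + [j]], y, cv=5).mean()
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        print("added feature %d, cv agreement %.2f" % (best, scores[best]))
    ```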

  18. Sequential projection pursuit for optimised vibration-based damage detection in an experimental wind turbine blade

    NASA Astrophysics Data System (ADS)

    Hoell, Simon; Omenzetter, Piotr

    2018-02-01

    To advance the concept of smart structures in large systems, such as wind turbines (WTs), it is desirable to be able to detect structural damage early while using minimal instrumentation. Data-driven vibration-based damage detection methods can be competitive in that respect because global vibrational responses encompass the entire structure. Multivariate damage sensitive features (DSFs) extracted from acceleration responses make it possible to detect changes in a structure via statistical methods. However, even though such DSFs contain information about the structural state, they may not be optimised for the damage detection task. This paper addresses this shortcoming by exploring a DSF projection technique specialised for statistical structural damage detection. High-dimensional initial DSFs are projected onto a low-dimensional space for improved damage detection performance and a simultaneous reduction of the computational burden. The technique is based on sequential projection pursuit, where the projection vectors are optimised one by one using an advanced evolutionary strategy. The approach is applied to laboratory experiments with a small-scale WT blade under wind-like excitations. Autocorrelation function coefficients calculated from acceleration signals are employed as DSFs. The optimal numbers of projection vectors are identified with the help of a fast forward selection procedure. To benchmark the proposed method, selections of original DSFs as well as principal component analysis scores from these features are additionally investigated. The optimised DSFs are tested for damage detection on previously unseen data from the healthy state and a wide range of damage scenarios. It is demonstrated that using selected subsets of the initial and transformed DSFs improves damage detectability compared to the full set of features. Furthermore, superior results can be achieved by projecting autocorrelation coefficients onto just a single optimised projection vector.
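
    The sketch below illustrates the core idea, optimizing a single projection vector with a simple (1+1) evolution strategy on synthetic healthy/damaged feature sets; the paper's advanced evolutionary strategy and its statistical detection framework are not reproduced here.

    ```python
    # Sketch: optimize one projection vector with a minimal (1+1)-ES,
    # maximizing a crude healthy-vs-damaged separation of projected features.
    import numpy as np

    rng = np.random.default_rng(5)
    healthy = rng.normal(0.0, 1.0, size=(200, 30))    # 30 autocorrelation DSFs
    damaged = rng.normal(0.3, 1.2, size=(200, 30))    # shifted "damaged" state

    def separation(v):
        v = v / np.linalg.norm(v)
        h, d = healthy @ v, damaged @ v
        return abs(h.mean() - d.mean()) / np.sqrt(h.var() + d.var())

    v, step = rng.normal(size=30), 0.5
    for _ in range(500):                              # (1+1)-ES loop
        child = v + step * rng.normal(size=30)
        if separation(child) >= separation(v):
            v, step = child, step * 1.1               # success: widen search
        else:
            step *= 0.98                              # failure: narrow search
    print("optimized separation: %.2f" % separation(v))
    ```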

  19. Clinical features of symptomatic patellofemoral joint osteoarthritis

    PubMed Central

    2012-01-01

    Introduction Patellofemoral joint osteoarthritis (OA) is common and leads to pain and disability. However, current classification criteria do not distinguish between patellofemoral and tibiofemoral joint OA. The objective of this study was to provide empirical evidence of the clinical features of patellofemoral joint OA (PFJOA) and to explore the potential for making a confident clinical diagnosis in the community setting. Methods This was a population-based cross-sectional study of 745 adults aged ≥50 years with knee pain. Information on risk factors and clinical signs and symptoms was gathered by a self-complete questionnaire, and standardised clinical interview and examination. Three radiographic views of the knee were obtained (weight-bearing semi-flexed posteroanterior, supine skyline and lateral) and individuals were classified into four subsets (no radiographic OA, isolated PFJOA, isolated tibiofemoral joint OA, combined patellofemoral/tibiofemoral joint OA) according to two different cut-offs: 'any OA' and 'moderate to severe OA'. A series of binary logistic and multinomial regression functions were performed to compare the clinical features of each subset and their ability in combination to discriminate PFJOA from other subsets. Results Distinctive clinical features of moderate to severe isolated PFJOA included a history of dramatic swelling, valgus deformity, markedly reduced quadriceps strength, and pain on patellofemoral joint compression. Mild isolated PFJOA was barely distinguished from no radiographic OA (AUC 0.71, 95% CI 0.66, 0.76) with only difficulty descending stairs and coarse crepitus marginally informative over age, sex and body mass index. Other cardinal signs of knee OA - the presence of effusion, bony enlargement, reduced flexion range of movement, mediolateral instability and varus deformity - were indicators of tibiofemoral joint OA. Conclusions Early isolated PFJOA is clinically manifest in symptoms and self-reported functional limitation but has fewer clear clinical signs. More advanced disease is indicated by a small number of simple-to-assess signs and the relative absence of classic signs of knee OA, which are predominantly manifestations of tibiofemoral joint OA. Confident diagnosis of even more advanced PFJOA may be limited in the community setting. PMID:22417687

  20. Multiple-Objective Optimal Designs for Studying the Dose Response Function and Interesting Dose Levels

    PubMed Central

    Hyun, Seung Won; Wong, Weng Kee

    2016-01-01

    We construct an optimal design to simultaneously estimate three common interesting features in a dose-finding trial with possibly different emphasis on each feature. These features are (1) the shape of the dose-response curve, (2) the median effective dose and (3) the minimum effective dose level. A main difficulty of this task is that an optimal design for a single objective may not perform well for other objectives. There are optimal designs for dual objectives in the literature but we were unable to find optimal designs for 3 or more objectives to date with a concrete application. A reason for this is that the approach for finding a dual-objective optimal design does not work well for a 3 or more multiple-objective design problem. We propose a method for finding multiple-objective optimal designs that estimate the three features with user-specified higher efficiencies for the more important objectives. We use the flexible 4-parameter logistic model to illustrate the methodology but our approach is applicable to find multiple-objective optimal designs for other types of objectives and models. We also investigate robustness properties of multiple-objective optimal designs to mis-specification in the nominal parameter values and to a variation in the optimality criterion. We also provide computer code for generating tailor made multiple-objective optimal designs. PMID:26565557

  1. Multiple-Objective Optimal Designs for Studying the Dose Response Function and Interesting Dose Levels.

    PubMed

    Hyun, Seung Won; Wong, Weng Kee

    2015-11-01

    We construct an optimal design to simultaneously estimate three common interesting features in a dose-finding trial with possibly different emphasis on each feature. These features are (1) the shape of the dose-response curve, (2) the median effective dose and (3) the minimum effective dose level. A main difficulty of this task is that an optimal design for a single objective may not perform well for other objectives. There are optimal designs for dual objectives in the literature but we were unable to find optimal designs for 3 or more objectives to date with a concrete application. A reason for this is that the approach for finding a dual-objective optimal design does not work well for a 3 or more multiple-objective design problem. We propose a method for finding multiple-objective optimal designs that estimate the three features with user-specified higher efficiencies for the more important objectives. We use the flexible 4-parameter logistic model to illustrate the methodology but our approach is applicable to find multiple-objective optimal designs for other types of objectives and models. We also investigate robustness properties of multiple-objective optimal designs to mis-specification in the nominal parameter values and to a variation in the optimality criterion. We also provide computer code for generating tailor made multiple-objective optimal designs.

  2. FSMRank: feature selection algorithm for learning to rank.

    PubMed

    Lai, Han-Jiang; Pan, Yan; Tang, Yong; Yu, Rong

    2013-06-01

    In recent years, there has been growing interest in learning to rank. The introduction of feature selection into different learning problems has been proven effective. These facts motivate us to investigate the problem of feature selection for learning to rank. We propose a joint convex optimization formulation which minimizes ranking errors while simultaneously conducting feature selection. This optimization formulation provides a flexible framework in which we can easily incorporate various importance measures and similarity measures of the features. To solve this optimization problem, we use Nesterov's approach to derive an accelerated gradient algorithm with a fast convergence rate O(1/T²). We further develop a generalization bound for the proposed optimization problem using the Rademacher complexities. Extensive experimental evaluations are conducted on the public LETOR benchmark datasets. The results demonstrate that the proposed method shows: 1) significant ranking performance gains compared to several feature selection baselines for ranking, and 2) very competitive performance compared to several state-of-the-art learning-to-rank algorithms.
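
    For readers unfamiliar with the O(1/T²) scheme, the sketch below runs Nesterov's accelerated gradient method on a plain least-squares objective; the paper applies the same acceleration to its ranking loss with feature-selection terms, which is not reproduced here.

    ```python
    # Sketch of Nesterov's accelerated gradient scheme (the O(1/T^2) method
    # the paper builds on), applied to a simple least-squares objective.
    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.normal(size=(100, 20))
    b = rng.normal(size=100)

    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    grad = lambda w: A.T @ (A @ w - b)

    w = y_k = np.zeros(20)
    t = 1.0
    for _ in range(200):
        w_next = y_k - grad(y_k) / L         # gradient step at the lookahead point
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y_k = w_next + ((t - 1) / t_next) * (w_next - w)  # momentum extrapolation
        w, t = w_next, t_next

    print("residual norm: %.4f" % np.linalg.norm(A @ w - b))
    ```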

  3. An Optimal Basketball Free Throw

    ERIC Educational Resources Information Center

    Seppala-Holtzman, David

    2012-01-01

    A basketball player attempting a free throw has two parameters under his or her control: the angle of elevation and the force with which the ball is thrown. We compute upper and lower bounds for the initial velocity for suitable values of the angle of elevation, generating a subset of the configuration space of all successful free throws. A…

  4. Controlled and Uncontrolled Subject Descriptions in the CF Database: A Comparison of Optimal Cluster-Based Retrieval Results.

    ERIC Educational Resources Information Center

    Shaw, W. M., Jr.

    1993-01-01

    Describes a study conducted on the cystic fibrosis (CF) database, a subset of MEDLINE, that investigated clustering structure and the effectiveness of cluster-based retrieval as a function of the exhaustivity of the uncontrolled subject descriptions. Results are compared to calculations for controlled descriptions based on Medical Subject Headings…

  5. Social Media: Menagerie of Metrics

    DTIC Science & Technology

    2010-01-27

    In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. Genetic algorithms can help prediction, e.g. via "elitism", which attempts to ensure selection by carrying the best performers forward.

  6. Age at Assessment a Critical Factor When Monitoring Early Communicative Skills in Children with Galactosaemia

    ERIC Educational Resources Information Center

    Lewis, Fiona M.; DeJonge, Shannon M.; Coman, David J.

    2014-01-01

    Sub-optimal language development is associated with the metabolic disorder galactosaemia (GAL). Some children with GAL are identified with language impairment from the initial stages of language learning, but a subset of children may exhibit disrupted developmental gains in speech and language skill after a period of age-appropriate skill…

  7. Water quality parameter measurement using spectral signatures

    NASA Technical Reports Server (NTRS)

    White, P. E.

    1973-01-01

    Regression analysis is applied to the problem of measuring water quality parameters from remote sensing spectral signature data. The equations necessary to perform regression analysis are presented and methods of testing the strength and reliability of a regression are described. An efficient algorithm for selecting an optimal subset of the independent variables available for a regression is also presented.
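
    A minimal sketch of optimal-subset selection for a regression, assuming a small synthetic problem: subsets are enumerated exhaustively and scored by adjusted R². The report describes a more efficient selection algorithm; exhaustive search is only viable for a small number of variables.

    ```python
    # Sketch: exhaustive optimal-subset selection for linear regression,
    # scored by adjusted R^2, on synthetic spectral data.
    import itertools
    import numpy as np

    rng = np.random.default_rng(7)
    X = rng.normal(size=(60, 6))                      # 6 candidate spectral bands
    y = 2 * X[:, 0] - X[:, 3] + rng.normal(0, 0.5, 60)

    def adj_r2(cols):
        cols = list(cols)
        Z = np.column_stack([np.ones(len(y)), X[:, cols]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid.var() / y.var()
        n, p = len(y), len(cols)
        return 1 - (1 - r2) * (n - 1) / (n - p - 1)

    best = max((c for r in range(1, 7)
                for c in itertools.combinations(range(6), r)), key=adj_r2)
    print("best subset:", best, "adjusted R^2: %.3f" % adj_r2(best))
    ```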

  8. Investigating Optimal Learning Moments in U.S. and Finnish Science Classes

    ERIC Educational Resources Information Center

    Schneider, Barbara; Krajcik, Joseph; Lavonen, Jari; Salmela-Aro, Katariina; Broda, Michael; Spicer, Justina; Bruner, Justin; Moeller, Julia; Linnansaari, Janna; Juuti, Kalle; Viljaranta, Jaana

    2016-01-01

    This study explores how often students are engaged in their science classes and their affective states during these times, using an innovative methodology that records these experiences "in situ". Sampling a subset of high schools in the U.S. and Finland, we collected over 7,000 momentary responses from 344 students over the course of a…

  9. Pareto-optimal multi-objective dimensionality reduction deep auto-encoder for mammography classification.

    PubMed

    Taghanaki, Saeid Asgari; Kawahara, Jeremy; Miles, Brandon; Hamarneh, Ghassan

    2017-07-01

    Feature reduction is an essential stage in computer aided breast cancer diagnosis systems. Multilayer neural networks can be trained to extract relevant features by encoding high-dimensional data into low-dimensional codes. Optimizing traditional auto-encoders works well only if the initial weights are close to a proper solution. They are also trained to only reduce the mean squared reconstruction error (MRE) between the encoder inputs and the decoder outputs, but do not address the classification error. The goal of the current work is to test the hypothesis that extending traditional auto-encoders (which only minimize reconstruction error) to multi-objective optimization for finding Pareto-optimal solutions provides more discriminative features that will improve classification performance when compared to single-objective and other multi-objective approaches (i.e. scalarized and sequential). In this paper, we introduce a novel multi-objective optimization of deep auto-encoder networks, in which the auto-encoder optimizes two objectives: MRE and mean classification error (MCE) for Pareto-optimal solutions, rather than just MRE. These two objectives are optimized simultaneously by a non-dominated sorting genetic algorithm. We tested our method on 949 X-ray mammograms categorized into 12 classes. The results show that the features identified by the proposed algorithm allow a classification accuracy of up to 98.45%, demonstrating favourable accuracy over the results of state-of-the-art methods reported in the literature. We conclude that adding the classification objective to the traditional auto-encoder objective and optimizing for finding Pareto-optimal solutions, using evolutionary multi-objective optimization, results in producing more discriminative features. Copyright © 2017 Elsevier B.V. All rights reserved.
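
    The selection step at the heart of this approach is the identification of non-dominated solutions; the sketch below extracts the Pareto-optimal set from synthetic (MRE, MCE) pairs, without reimplementing the genetic algorithm itself.

    ```python
    # Sketch: extract the Pareto-optimal (non-dominated) set from candidate
    # configurations scored by two objectives to be minimized: mean
    # reconstruction error (MRE) and mean classification error (MCE).
    import numpy as np

    rng = np.random.default_rng(8)
    scores = rng.random((30, 2))                       # columns: (MRE, MCE)

    def pareto_front(pts):
        front = []
        for i, p in enumerate(pts):
            dominated = any(np.all(q <= p) and np.any(q < p)
                            for j, q in enumerate(pts) if j != i)
            if not dominated:
                front.append(i)
        return front

    front = pareto_front(scores)
    print("Pareto-optimal configurations:", front)
    print(scores[front])
    ```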

  10. Ant-cuckoo colony optimization for feature selection in digital mammogram.

    PubMed

    Jona, J B; Nagaveni, N

    2014-01-15

    Digital mammography is the only effective screening method for detecting breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram, but not all of them are essential for classification; identifying the relevant features is therefore the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that ants tend to walk paths where the pheromone density is high, which slows the whole process; hence CS is employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used with the ACO to separate normal from abnormal mammograms. Experiments are conducted on the mini-MIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.
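
    A heavily simplified sketch of pheromone-guided subset search in the spirit of ACO, with assumed synthetic data; the cuckoo local search and the SVM-RBF fitness of the paper are replaced by a nearest-neighbour score to keep the example short.

    ```python
    # Sketch: pheromone-guided feature-subset search (ACO-style, simplified).
    # A k-NN cross-validation score stands in for the paper's SVM-RBF fitness.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(9)
    X = rng.normal(size=(200, 15))                    # 15 GLCM-like features
    y = (X[:, 2] + X[:, 9] > 0).astype(int)           # toy normal/abnormal label

    pheromone = np.ones(15)
    best_subset, best_fit = None, -np.inf
    for _ in range(20):                               # iterations
        for _ant in range(10):                        # ants per iteration
            prob = pheromone / pheromone.sum()
            subset = rng.choice(15, size=5, replace=False, p=prob)
            fit = cross_val_score(KNeighborsClassifier(),
                                  X[:, subset], y, cv=3).mean()
            if fit > best_fit:
                best_subset, best_fit = subset, fit
        pheromone *= 0.9                              # evaporation
        pheromone[best_subset] += best_fit            # reinforce the best trail
    print("best subset:", sorted(best_subset), "accuracy %.2f" % best_fit)
    ```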

  11. Identifying marker genes in transcription profiling data using a mixture of feature relevance experts.

    PubMed

    Chow, M L; Moler, E J; Mian, I S

    2001-03-08

    Transcription profiling experiments permit the expression levels of many genes to be measured simultaneously. Given profiling data from two types of samples, genes that most distinguish the samples (marker genes) are good candidates for subsequent in-depth experimental studies and developing decision support systems for diagnosis, prognosis, and monitoring. This work proposes a mixture of feature relevance experts as a method for identifying marker genes and illustrates the idea using published data from samples labeled as acute lymphoblastic and myeloid leukemia (ALL, AML). A feature relevance expert implements an algorithm that calculates how well a gene distinguishes samples, reorders genes according to this relevance measure, and uses a supervised learning method [here, support vector machines (SVMs)] to determine the generalization performances of different nested gene subsets. The mixture of three feature relevance experts examined here implements two existing feature relevance measures and one novel one. For each expert, a gene subset consisting of the top 50 genes distinguished ALL from AML samples as completely as all 7,070 genes. The 125 genes at the union of the top 50s are plausible markers for a prototype decision support system. Chromosomal aberration and other data support the prediction that the three genes at the intersection of the top 50s, cystatin C, azurocidin, and adipsin, are good targets for investigating the basic biology of ALL/AML. The same data were employed to identify markers that distinguish samples based on their labels of T cell/B cell, peripheral blood/bone marrow, and male/female. Selenoprotein W may discriminate T cells from B cells. Results from analysis of transcription profiling data from tumor/nontumor colon adenocarcinoma samples support the general utility of the aforementioned approach. Theoretical issues such as choosing SVM kernels and their parameters, training and evaluating feature relevance experts, and the impact of potentially mislabeled samples on marker identification (feature selection) are discussed.
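
    A sketch of one such feature relevance expert, assuming synthetic expression data: genes are ranked by a generic signal-to-noise measure and nested subsets of the top-ranked genes are evaluated with a cross-validated SVM. The paper's three specific relevance measures differ from the one used here.

    ```python
    # Sketch: rank genes by a signal-to-noise relevance measure, then evaluate
    # nested subsets of top-ranked genes with a cross-validated linear SVM.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(10)
    X = rng.normal(size=(72, 500))                    # 72 samples, 500 genes (toy)
    y = np.array([0] * 47 + [1] * 25)                 # e.g. ALL vs AML labels
    X[y == 1, :20] += 1.0                             # 20 genuinely informative genes

    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    s0, s1 = X[y == 0].std(0), X[y == 1].std(0)
    relevance = np.abs(mu0 - mu1) / (s0 + s1)         # signal-to-noise measure
    ranking = np.argsort(relevance)[::-1]

    for k in (10, 50, 200, 500):                      # nested gene subsets
        acc = cross_val_score(SVC(kernel="linear"),
                              X[:, ranking[:k]], y, cv=5).mean()
        print("top %3d genes: cv accuracy %.2f" % (k, acc))
    ```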

  12. A reference dataset for deformable image registration spatial accuracy evaluation using the COPDgene study archive

    NASA Astrophysics Data System (ADS)

    Castillo, Richard; Castillo, Edward; Fuentes, David; Ahmad, Moiz; Wood, Abbie M.; Ludwig, Michelle S.; Guerrero, Thomas

    2013-05-01

    Landmark point-pairs provide a strategy to assess deformable image registration (DIR) accuracy in terms of the spatial registration of the underlying anatomy depicted in medical images. In this study, we propose to augment a publicly available database (www.dir-lab.com) of medical images with large sets of manually identified anatomic feature pairs between breath-hold computed tomography (BH-CT) images for DIR spatial accuracy evaluation. Ten BH-CT image pairs were randomly selected from the COPDgene study cases. Each patient had received CT imaging of the entire thorax in the supine position at one-fourth dose normal expiration and maximum effort full dose inspiration. Using dedicated in-house software, an imaging expert manually identified large sets of anatomic feature pairs between images. Estimates of inter- and intra-observer spatial variation in feature localization were determined by repeat measurements of multiple observers over subsets of randomly selected features. 7298 anatomic landmark features were manually paired between the 10 sets of images. The number of feature pairs per case ranged from 447 to 1172. Average 3D Euclidean landmark displacements varied substantially among cases, ranging from 12.29 (SD: 6.39) to 30.90 (SD: 14.05) mm. Repeat registration of uniformly sampled subsets of 150 landmarks for each case yielded estimates of observer localization error, which ranged on average from 0.58 (SD: 0.87) to 1.06 (SD: 2.38) mm per case. The additions to the online web database (www.dir-lab.com) described in this work will broaden the applicability of the reference data, providing a freely available common dataset for targeted critical evaluation of DIR spatial accuracy performance in multiple clinical settings. Estimates of observer variance in feature localization suggest consistent spatial accuracy for all observers across both four-dimensional CT and COPDgene patient cohorts.
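
    The two summary statistics reported here are straightforward; the sketch below computes the mean and SD of 3D Euclidean landmark displacement, and an observer-error estimate from repeat localizations, on synthetic coordinates.

    ```python
    # Sketch: mean/SD of 3D Euclidean displacement between paired landmarks,
    # plus observer localization error from repeat picks (synthetic data).
    import numpy as np

    rng = np.random.default_rng(11)
    p_exhale = rng.uniform(0, 300, size=(1000, 3))        # landmark positions (mm)
    p_inhale = p_exhale + rng.normal(12, 5, size=(1000, 3))

    disp = np.linalg.norm(p_inhale - p_exhale, axis=1)
    print("displacement: %.2f (SD: %.2f) mm" % (disp.mean(), disp.std(ddof=1)))

    repeat = p_exhale[:150] + rng.normal(0, 0.6, size=(150, 3))  # re-localization
    err = np.linalg.norm(repeat - p_exhale[:150], axis=1)
    print("observer error: %.2f (SD: %.2f) mm" % (err.mean(), err.std(ddof=1)))
    ```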

  13. Significant CD4, CD8, and CD19 lymphopenia in peripheral blood of sarcoidosis patients correlates with severe disease manifestations.

    PubMed

    Sweiss, Nadera J; Salloum, Rafah; Ghandi, Seema; Alegre, Maria-Luisa; Sawaqed, Ray; Badaracco, Maria; Pursell, Kenneth; Pitrak, David; Baughman, Robert P; Moller, David R; Garcia, Joe G N; Niewold, Timothy B

    2010-02-05

    Sarcoidosis is a poorly understood chronic inflammatory condition. Infiltration of affected organs by lymphocytes is characteristic of sarcoidosis, however previous reports suggest that circulating lymphocyte counts are low in some patients with the disease. The goal of this study was to evaluate lymphocyte subsets in peripheral blood in a cohort of sarcoidosis patients to determine the prevalence, severity, and clinical features associated with lymphopenia in major lymphocyte subsets. Lymphocyte subsets in 28 sarcoid patients were analyzed using flow cytometry to determine the percentage of CD4, CD8, and CD19 positive cells. Greater than 50% of patients had abnormally low CD4, CD8, or CD19 counts (p < 4 × 10⁻¹⁰). Lymphopenia was profound in some cases, and five of the patients had absolute CD4 counts below 200. CD4, CD8, and CD19 lymphocyte subset counts were significantly correlated (Spearman's rho 0.57, p = 0.0017), and 10 patients had low counts in all three subsets. Patients with severe organ system involvement including neurologic, cardiac, ocular, and advanced pulmonary disease had lower lymphocyte subset counts as a group than those patients with less severe manifestations (CD4 p = 0.0043, CD8 p = 0.026, CD19 p = 0.033). No significant relationships were observed between various medical therapies and lymphocyte counts, and lymphopenia was present in patients who were not receiving any medical therapy. Significant lymphopenia involving CD4, CD8, and CD19 positive cells was common in sarcoidosis patients and correlated with disease severity. Our findings suggest that lymphopenia relates more to disease pathology than medical treatment.

  14. Significant CD4, CD8, and CD19 Lymphopenia in Peripheral Blood of Sarcoidosis Patients Correlates with Severe Disease Manifestations

    PubMed Central

    Sweiss, Nadera J.; Salloum, Rafah; Ghandi, Seema; Alegre, Maria-Luisa; Sawaqed, Ray; Badaracco, Maria; Pursell, Kenneth; Pitrak, David; Baughman, Robert P.; Moller, David R.; Garcia, Joe G. N.; Niewold, Timothy B.

    2010-01-01

    Background Sarcoidosis is a poorly understood chronic inflammatory condition. Infiltration of affected organs by lymphocytes is characteristic of sarcoidosis, however previous reports suggest that circulating lymphocyte counts are low in some patients with the disease. The goal of this study was to evaluate lymphocyte subsets in peripheral blood in a cohort of sarcoidosis patients to determine the prevalence, severity, and clinical features associated with lymphopenia in major lymphocyte subsets. Methodology/Principal Findings Lymphocyte subsets in 28 sarcoid patients were analyzed using flow cytometry to determine the percentage of CD4, CD8, and CD19 positive cells. Greater than 50% of patients had abnormally low CD4, CD8, or CD19 counts (p < 4 × 10⁻¹⁰). Lymphopenia was profound in some cases, and five of the patients had absolute CD4 counts below 200. CD4, CD8, and CD19 lymphocyte subset counts were significantly correlated (Spearman's rho 0.57, p = 0.0017), and 10 patients had low counts in all three subsets. Patients with severe organ system involvement including neurologic, cardiac, ocular, and advanced pulmonary disease had lower lymphocyte subset counts as a group than those patients with less severe manifestations (CD4 p = 0.0043, CD8 p = 0.026, CD19 p = 0.033). No significant relationships were observed between various medical therapies and lymphocyte counts, and lymphopenia was present in patients who were not receiving any medical therapy. Conclusions/Significance Significant lymphopenia involving CD4, CD8, and CD19 positive cells was common in sarcoidosis patients and correlated with disease severity. Our findings suggest that lymphopenia relates more to disease pathology than medical treatment. PMID:20140091

  15. Targeted depletion of a MDSC subset unmasks pancreatic ductal adenocarcinoma to adaptive immunity

    PubMed Central

    Stromnes, Ingunn M.; Brockenbrough, Scott; Izeradjene, Kamel; Carlson, Markus A.; Cuevas, Carlos; Simmons, Randi M.; Greenberg, Philip D.; Hingorani, Sunil R.

    2015-01-01

    Objective Pancreatic ductal adenocarcinoma (PDA) is characterized by a robust desmoplasia, including the notable accumulation of immunosuppressive cells that shield neoplastic cells from immune detection. Immune evasion may be further enhanced if the malignant cells fail to express high levels of antigens that are sufficiently immunogenic to engender an effector T cell response. In this report, we investigate the predominant subsets of immunosuppressive cancer-conditioned myeloid cells that chronicle and shape pancreas cancer progression. We show that selective depletion of one subset of myeloid-derived suppressor cells (MDSC) in an autochthonous, genetically engineered mouse model (GEMM) of PDA unmasks the ability of the adaptive immune response to engage and target tumor epithelial cells. Methods A combination of in vivo and in vitro studies was performed employing a GEMM that faithfully recapitulates the cardinal features of human PDA. The predominant cancer-conditioned myeloid cell subpopulation was specifically targeted in vivo and the biological outcomes determined. Results PDA orchestrates the induction of distinct subsets of cancer-associated myeloid cells through the production of factors known to influence myelopoiesis. These immature myeloid cells inhibit the proliferation and induce apoptosis of activated T cells. Targeted depletion of granulocytic MDSC (Gr-MDSC) in autochthonous PDA increases the intratumoral accumulation of activated CD8 T cells and apoptosis of tumor epithelial cells, and also remodels the tumor stroma. Conclusions Neoplastic ductal cells of the pancreas induce distinct myeloid cell subsets that promote tumor cell survival and accumulation. Targeted depletion of a single myeloid subset, the Gr-MDSC, can unmask an endogenous T cell response, revealing an unexpected latent immunity and invoking targeting of Gr-MDSC as a potential strategy to exploit for treating this highly lethal disease. PMID:24555999

  16. Enhanced use of CLIPS at the Los Alamos National Laboratory

    NASA Technical Reports Server (NTRS)

    Duerre, K. H.; Parkinson, W. J.; Osowski, J. J.

    1991-01-01

    Early efforts for producing expert systems for engineering applications used a limited subset of C Language Integrated Production System (CLIPS) features. The implementation details of previous expert systems and of the current expert system, which is used for training operators in the control of the Isotope Separation System, are discussed.

  17. Commercial tree species discrimination using airborne AISA Eagle hyperspectral imagery and partial least squares discriminant analysis (PLS-DA) in KwaZulu-Natal, South Africa

    NASA Astrophysics Data System (ADS)

    Peerbhay, Kabir Yunus; Mutanga, Onisimo; Ismail, Riyad

    2013-05-01

    Discriminating commercial tree species using hyperspectral remote sensing techniques is critical in monitoring the spatial distributions and compositions of commercial forests. However, issues related to data dimensionality and multicollinearity limit the successful application of the technology. The aim of this study was to examine the utility of the partial least squares discriminant analysis (PLS-DA) technique in accurately classifying six exotic commercial forest species (Eucalyptus grandis, Eucalyptus nitens, Eucalyptus smithii, Pinus patula, Pinus elliotii and Acacia mearnsii) using airborne AISA Eagle hyperspectral imagery (393-900 nm). Additionally, the variable importance in the projection (VIP) method was used to identify subsets of bands that could successfully discriminate the forest species. Results indicated that the PLS-DA model that used all the AISA Eagle bands (n = 230) produced an overall accuracy of 80.61% and a kappa value of 0.77, with user's and producer's accuracies ranging from 50% to 100%. In comparison, incorporating the optimal subset of VIP selected wavebands (n = 78) in the PLS-DA model resulted in an improved overall accuracy of 88.78% and a kappa value of 0.87, with user's and producer's accuracies ranging from 70% to 100%. Bands located predominantly within the visible region of the electromagnetic spectrum (393-723 nm) showed the most capability in terms of discriminating between the six commercial forest species. Overall, the research has demonstrated the potential of using PLS-DA for reducing the dimensionality of hyperspectral datasets as well as determining the optimal subset of bands to produce the highest classification accuracies.
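
    A minimal sketch of PLS-DA with VIP-based band selection, assuming synthetic reflectance in place of the AISA Eagle imagery: classes are one-hot encoded for PLSRegression, VIP scores are computed from the fitted weights, scores, and loadings, and the common VIP > 1 rule picks the band subset. The number of components and the VIP threshold are illustrative choices.

    ```python
    # Sketch: PLS-DA via PLSRegression on one-hot labels, with VIP scores
    # used to pick a reduced band subset.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(12)
    X = rng.normal(size=(120, 230))                    # 120 pixels, 230 bands
    y = rng.integers(0, 6, size=120)                   # 6 tree species (toy)
    Y = np.eye(6)[y]                                   # one-hot for PLS-DA
    X[:, 40:60] += Y @ rng.normal(size=(6, 20))        # species-dependent bands

    pls = PLSRegression(n_components=5).fit(X, Y)

    W, T, Q = pls.x_weights_, pls.x_scores_, pls.y_loadings_
    s = (T ** 2).sum(0) * (Q ** 2).sum(0)              # Y-variance per component
    p = X.shape[1]
    vip = np.sqrt(p * ((W / np.linalg.norm(W, axis=0)) ** 2 @ s) / s.sum())

    selected = np.where(vip > 1.0)[0]                  # common VIP > 1 rule
    print("bands with VIP > 1:", len(selected))
    ```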

  18. A method of 3D object recognition and localization in a cloud of points

    NASA Astrophysics Data System (ADS)

    Bielicki, Jerzy; Sitnik, Robert

    2013-12-01

    The method proposed in this article is designed for the analysis of data in the form of point clouds obtained directly from 3D measurements. It is intended for end-user applications and can be integrated directly with 3D scanning software. The method utilizes locally calculated feature vectors (FVs) in point cloud data. Recognition is based on comparison of the analyzed scene with a reference object library. A global descriptor in the form of a set of spatially distributed FVs is created for each reference model. During the detection process, the correlation of subsets of reference FVs with FVs calculated in the scene is computed. Features utilized in the algorithm are based on parameters which qualitatively estimate mean and Gaussian curvatures. Replacing differentiation with averaging in the curvature estimation makes the algorithm more resistant to discontinuities and poor quality of the input data. Utilization of the FV subsets allows detection of partially occluded and cluttered objects in the scene, while the additional spatial information keeps the false positive rate at a reasonably low level.

  19. An automated decision-tree approach to predicting protein interaction hot spots.

    PubMed

    Darnell, Steven J; Page, David; Mitchell, Julie C

    2007-09-01

    Protein-protein interactions can be altered by mutating one or more "hot spots," the subset of residues that account for most of the interface's binding free energy. The identification of hot spots requires a significant experimental effort, highlighting the practical value of hot spot predictions. We present two knowledge-based models that improve the ability to predict hot spots: K-FADE uses shape specificity features calculated by the Fast Atomic Density Evaluation (FADE) program, and K-CON uses biochemical contact features. The combined K-FADE/CON (KFC) model displays better overall predictive accuracy than computational alanine scanning (Robetta-Ala). In addition, because these methods predict different subsets of known hot spots, a large and significant increase in accuracy is achieved by combining KFC and Robetta-Ala. The KFC analysis is applied to the calmodulin (CaM)/smooth muscle myosin light chain kinase (smMLCK) interface, and to the bone morphogenetic protein-2 (BMP-2)/BMP receptor-type I (BMPR-IA) interface. The results indicate a strong correlation between KFC hot spot predictions and mutations that significantly reduce the binding affinity of the interface. 2007 Wiley-Liss, Inc.

  20. RNA-seq Analysis Reveals Unique Transcriptome Signatures in Systemic Lupus Erythematosus Patients with Distinct Autoantibody Specificities

    PubMed Central

    Rai, Richa; Chauhan, Sudhir Kumar; Singh, Vikas Vikram; Rai, Madhukar; Rai, Geeta

    2016-01-01

    Systemic lupus erythematosus (SLE) patients exhibit immense heterogeneity, which is challenging from the diagnostic perspective. Emerging high-throughput sequencing technologies have proved to be a useful platform to understand the complex and dynamic disease processes. SLE patients categorised based on autoantibody specificities are reported to have differential immuno-regulatory mechanisms. Therefore, we performed RNA-seq analysis to identify the transcriptomes of SLE patients with distinct autoantibody specificities. The SLE patients were segregated into three subsets based on the type of autoantibodies present in their sera (anti-dsDNA+ group with anti-dsDNA autoantibody alone; anti-ENA+ group having autoantibodies against extractable nuclear antigens (ENA) only; and anti-dsDNA+ENA+ group having autoantibodies to both dsDNA and ENA). Global transcriptome profiling for each SLE patient subset was performed using the Illumina® HiSeq-2000 platform. The biological relevance of dysregulated transcripts in each SLE subset was assessed with ingenuity pathway analysis (IPA) software. We observed that dysregulation in the transcriptome expression pattern was clearly distinct in each SLE patient subset. IPA analysis of transcripts uniquely expressed in the different SLE groups revealed specific biological pathways to be affected in each SLE subset. Multiple cytokine signaling pathways were specifically dysregulated in anti-dsDNA+ patients, whereas interferon signaling was predominantly dysregulated in anti-ENA+ patients. In anti-dsDNA+ENA+ patients, regulation of actin-based motility by the Rho pathway was significantly affected. The granulocyte gene signature was a common feature of all SLE subsets; however, the anti-dsDNA+ group showed relatively predominant expression of these genes. Dysregulation of plasma cell-related transcripts was higher in anti-dsDNA+ and anti-ENA+ patients as compared to anti-dsDNA+ENA+ patients. The association of specific canonical pathways with the transcripts uniquely expressed in each SLE subgroup indicates that specific immunological disease mechanisms are operative in distinct SLE patient subsets. This ‘sub-grouping’ approach could further be useful for clinical evaluation of SLE patients and for devising targeted therapeutics. PMID:27835693
