Sample records for optimal feature set

  1. Computer-Aided Breast Cancer Diagnosis with Optimal Feature Sets: Reduction Rules and Optimization Techniques.

    PubMed

    Mathieson, Luke; Mendes, Alexandre; Marsden, John; Pond, Jeffrey; Moscato, Pablo

    2017-01-01

    This chapter introduces a new method for knowledge extraction from databases for the purpose of finding a discriminative set of features that is also a robust set for within-class classification. Our method is generic, and we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686-690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified, and then a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: first, a series of extremely powerful reduction techniques, which do not lose the optimal solution, is employed; second, a metaheuristic search identifies the remaining features to be included or disregarded. Two algorithms were tested with a public-domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.

  2. Local Feature Selection for Data Classification.

    PubMed

    Armanfard, Narges; Reilly, James P; Komeili, Majid

    2016-06-01

    Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this paper we propose a novel localized feature selection (LFS) approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary both in membership and size across the sample space. This allows the feature set to optimally adapt to local variations in the sample space. An associated method for measuring the similarities of a query datum to each of the respective classes is also proposed. The proposed method makes no assumptions about the underlying structure of the samples; hence the method is insensitive to the distribution of the data over the sample space. The method is efficiently formulated as a linear programming optimization problem. Furthermore, we demonstrate the method is robust against the over-fitting problem. Experimental results on eleven synthetic and real-world data sets demonstrate the viability of the formulation and the effectiveness of the proposed algorithm. In addition we show several examples where localized feature selection produces better results than a global feature selection method.

  3. A primitive study of voxel feature generation by multiple stacked denoising autoencoders for detecting cerebral aneurysms on MRA

    NASA Astrophysics Data System (ADS)

    Nemoto, Mitsutaka; Hayashi, Naoto; Hanaoka, Shouhei; Nomura, Yukihiro; Miki, Soichiro; Yoshikawa, Takeharu; Ohtomo, Kuni

    2016-03-01

    The purpose of this study is to evaluate the feasibility of a novel feature generation method, based on multiple deep neural networks (DNNs) with boosting, for computer-assisted detection (CADe). Optimizing the hyperparameters of DNNs such as the stacked denoising autoencoder (SdA) is hard and time-consuming. The proposed method allows SdA-based features to be used without the burden of hyperparameter setting. The method was evaluated in an application for detecting cerebral aneurysms on magnetic resonance angiograms (MRA). A baseline CADe process included four components: scaling, candidate area limitation, candidate detection, and candidate classification. The proposed feature generation method was applied to extract the optimal features for candidate classification, and only required setting the range of the hyperparameters for the SdA. The optimal feature set was selected from a large quantity of SdA-based features produced by multiple SdAs, each trained with a different hyperparameter set. Feature selection was performed through the AdaBoost ensemble learning method. Training of the baseline CADe process and of the proposed feature generation was performed with 200 MRA cases, and the evaluation was performed with 100 MRA cases. The proposed method successfully provided SdA-based features by setting only the range of some hyperparameters for the SdA. The CADe process using both the previous voxel features and the SdA-based features had the best performance, with an area under the ROC curve of 0.838 and an ANODE score of 0.312. The results showed that the proposed method was effective in the application for detecting cerebral aneurysms on MRA.

  4. On a distinctive feature of problems of calculating time-average characteristics of nuclear reactor optimal control sets

    NASA Astrophysics Data System (ADS)

    Trifonenkov, A. V.; Trifonenkov, V. P.

    2017-01-01

    This article deals with a feature of problems of calculating time-average characteristics of nuclear reactor optimal control sets. The operation of a nuclear reactor during a threatened period is considered, and the optimal control search problem is analysed. Xenon poisoning places limitations on the possible statements of the problem of calculating time-average characteristics of a set of optimal reactor power-off controls, because the level of xenon poisoning is limited. This raises the problem of choosing an appropriate segment of the time axis so that the optimal control problem remains consistent. Two procedures for estimating the duration of this segment are considered, the two estimates are plotted as functions of the xenon limitation, and the boundaries of the interval of averaging are defined more precisely.

  5. Intelligibility Evaluation of Pathological Speech through Multigranularity Feature Extraction and Optimization.

    PubMed

    Fang, Chunying; Li, Haifeng; Ma, Lin; Zhang, Mancai

    2017-01-01

    Pathological speech usually refers to speech distortion resulting from illness or other biological insults. The assessment of pathological speech plays an important role in assisting experts, while automatic evaluation of speech intelligibility is difficult because such speech is usually nonstationary and mutational. In this paper, we carry out an independent innovation in feature extraction and reduction, and we describe a multigranularity combined feature scheme which is optimized by a hierarchical visual method. A novel method of generating the feature set based on the S-transform and chaotic analysis is proposed; the set comprises basic acoustic features (BAFS, 430), local spectral characteristics (MSCC, 84 Mel S-transform cepstrum coefficients), and chaotic features (12). Finally, a radar chart and the F-score are used to optimize the features by hierarchical visual fusion. The feature set could be optimized from 526 dimensions to 96 dimensions on the NKI-CCRT corpus and to 104 dimensions on the SVD corpus. The experimental results show that the new features with a support vector machine (SVM) give the best performance, with a recognition rate of 84.4% on the NKI-CCRT corpus and 78.7% on the SVD corpus. The proposed method is thus shown to be effective and reliable for pathological speech intelligibility evaluation.
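    Illustrative note: the F-score ranking mentioned in this record can be sketched in a few lines. The snippet below is a minimal, hedged example of per-feature F-score computation on synthetic data; the feature count follows the record, but the data, variable names, and cutoff are assumptions for illustration, not the authors' implementation.

    ```python
    # Minimal sketch: per-feature F-score ranking for a two-class problem.
    # Synthetic data only; not the authors' pipeline.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 526))           # 526 candidate features (as in the record)
    y = rng.integers(0, 2, size=200)          # binary intelligibility label

    def f_score(X, y):
        """F(i) = ((m+ - m)^2 + (m- - m)^2) / (var+ + var-) for each feature i."""
        pos, neg = X[y == 1], X[y == 0]
        m, mp, mn = X.mean(0), pos.mean(0), neg.mean(0)
        num = (mp - m) ** 2 + (mn - m) ** 2
        den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
        return num / (den + 1e-12)

    scores = f_score(X, y)
    top96 = np.argsort(scores)[::-1][:96]     # keep a 96-dimensional subset
    print("selected feature indices:", top96[:10])
    ```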

  6. Optimizing Nanoscale Quantitative Optical Imaging of Subfield Scattering Targets

    PubMed Central

    Henn, Mark-Alexander; Barnes, Bryan M.; Zhou, Hui; Sohn, Martin; Silver, Richard M.

    2016-01-01

    The full 3-D scattered field above finite sets of features has been shown to contain a continuum of spatial frequency information, and with novel optical microscopy techniques and electromagnetic modeling, deep-subwavelength geometrical parameters can be determined. Similarly, by using simulations, scattering geometries and experimental conditions can be established to tailor scattered fields that yield lower parametric uncertainties while decreasing the number of measurements and the area of such finite sets of features. Such optimized conditions are reported through quantitative optical imaging in 193 nm scatterfield microscopy using feature sets up to four times smaller in area than state-of-the-art critical dimension targets. PMID:27805660

  7. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    PubMed

    Zawbaa, Hossam M; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  8. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres—Focus on Feature Selection

    PubMed Central

    Zawbaa, Hossam M.; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205

  9. A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets.

    PubMed

    Li, Der-Chiang; Liu, Chiao-Wen; Hu, Susan C

    2011-05-01

    Medical data sets are usually small and have very high dimensionality. Too many attributes will make the analysis less efficient and will not necessarily increase accuracy, while too few data will decrease the modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small. This paper proposes a fuzzy-based non-linear transformation method to extend classification related information from the original data attribute values for a small data set. Based on the new transformed data set, this study applies principal component analysis (PCA) to extract the optimal subset of features. Finally, we use the transformed data with these optimal features as the input data for a learning tool, a support vector machine (SVM). Six medical data sets: Pima Indians' diabetes, Wisconsin diagnostic breast cancer, Parkinson disease, echocardiogram, BUPA liver disorders dataset, and bladder cancer cases in Taiwan, are employed to illustrate the approach presented in this paper. This research uses the t-test to evaluate the classification accuracy for a single data set; and uses the Friedman test to show the proposed method is better than other methods over the multiple data sets. The experiment results indicate that the proposed method has better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and suggest creating new purpose-related information to improve the analysis performance. This paper has shown that feature extraction is important as a function of feature selection for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the information available produces better results than the PCA and KPCA approaches. Copyright © 2011 Elsevier B.V. All rights reserved.
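    Illustrative note: as a rough sketch of the downstream part of this pipeline (PCA-based feature extraction feeding an SVM), the example below uses scikit-learn on a small synthetic data set. The fuzzy non-linear transformation itself is the paper's contribution and is not reproduced here; all sizes and names are assumptions.

    ```python
    # Minimal sketch of the PCA -> SVM stage on a small synthetic data set.
    # The paper's fuzzy-based transformation is NOT reproduced here.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(80, 30))              # small sample, moderately high dimension
    y = rng.integers(0, 2, size=80)

    clf = make_pipeline(StandardScaler(),
                        PCA(n_components=10),  # extract a compact feature subset
                        SVC(kernel="rbf", C=1.0, gamma="scale"))
    scores = cross_val_score(clf, X, y, cv=5)
    print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
    ```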

  10. Classification of large-scale fundus image data sets: a cloud-computing framework.

    PubMed

    Roychowdhury, Sohini

    2016-08-01

    Large medical image data sets with high dimensionality require substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.

  11. Neonatal Seizure Detection Using Deep Convolutional Neural Networks.

    PubMed

    Ansari, Amir H; Cherian, Perumpillichira J; Caicedo, Alexander; Naulaers, Gunnar; De Vos, Maarten; Van Huffel, Sabine

    2018-04-02

    Identifying a core set of features is one of the most important steps in the development of an automated seizure detector. In most of the published studies describing features and seizure classifiers, the features were hand-engineered, which may not be optimal. The main goal of the present paper is to use deep convolutional neural networks (CNNs) and a random forest to automatically optimize feature selection and classification. The input of the proposed classifier is raw multi-channel EEG and the output is the class label: seizure/nonseizure. By training this network, the required features are optimized while a nonlinear classifier is fitted on the features. After training the network with EEG recordings of 26 neonates, the five end layers performing the classification were replaced with a random forest classifier in order to improve the performance. This resulted in a false alarm rate of 0.9 per hour and a seizure detection rate of 77% on a test set of EEG recordings of 22 neonates that also included dubious seizures. The newly proposed CNN classifier outperformed three data-driven feature-based approaches and performed similarly to a previously developed heuristic method.
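    Illustrative note: the "train a CNN, then swap the classification layers for a random forest" idea can be sketched as below with PyTorch and scikit-learn. The architecture, data shapes, and training schedule are simplified assumptions (a single linear head stands in for the five end layers), not the authors' network.

    ```python
    # Hedged sketch: CNN feature extractor whose classification head is replaced
    # by a random forest, loosely following the record (synthetic data).
    import torch
    import torch.nn as nn
    from sklearn.ensemble import RandomForestClassifier

    # synthetic "EEG": 200 windows, 8 channels, 256 samples each
    X = torch.randn(200, 8, 256)
    y = torch.randint(0, 2, (200,))

    class SeizureCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(8, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
                nn.AdaptiveAvgPool1d(1), nn.Flatten())     # -> 32-dim embedding
            self.head = nn.Linear(32, 2)                   # end layer used only for training

        def forward(self, x):
            return self.head(self.features(x))

    net = SeizureCNN()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(5):                                     # brief training loop
        opt.zero_grad()
        loss = loss_fn(net(X), y)
        loss.backward()
        opt.step()

    # Replace the classification head with a random forest on the learned features.
    with torch.no_grad():
        emb = net.features(X).numpy()
    rf = RandomForestClassifier(n_estimators=100).fit(emb, y.numpy())
    print("train accuracy:", rf.score(emb, y.numpy()))
    ```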

  12. An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.

    PubMed

    Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco

    2017-04-01

    In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets are performed, and point out that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, and this may be a limitation in the solution of large-dimensional problems.

  13. Geospatial Analytics in Retail Site Selection and Sales Prediction.

    PubMed

    Ting, Choo-Yee; Ho, Chiung Ching; Yee, Hui Jia; Matsah, Wan Razali

    2018-03-01

    Studies have shown that certain features from geography, demography, trade area, and environment can play a vital role in retail site selection, largely due to the impact they exert on retail performance. Although the relevant features could be elicited by domain experts, determining the optimal feature set can be an intractable and labor-intensive exercise. The challenges center around (1) how to determine the features that are important to a particular retail business and (2) how to estimate retail sales performance given a new location. The challenges become apparent when the features vary across time. In this light, this study proposed a nonintervening approach employing feature selection algorithms followed by sales prediction through similarity-based methods. The results of the prediction were validated by domain experts. In this study, data sets from different sources were transformed and aggregated before an analytics data set ready for analysis purposes could be obtained. The data sets included data about feature location, population count, property type, education status, and monthly sales from 96 branches of a telecommunication company in Malaysia. The findings suggest that (1) optimal retail performance can only be achieved through fulfillment of specific location features together with the surrounding trade area characteristics and (2) a similarity-based method can provide a solution to retail sales prediction.

  14. The Fisher-Markov selector: fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data.

    PubMed

    Cheng, Qiang; Zhou, Hongbo; Cheng, Jie

    2011-06-01

    Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selector--which we call the Fisher-Markov selector--to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid--dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.

  15. Feature instructions improve face-matching accuracy

    PubMed Central

    Bindemann, Markus

    2018-01-01

    Identity comparisons of photographs of unfamiliar faces are prone to error but important for applied settings, such as person identification at passport control. Finding techniques to improve face-matching accuracy is therefore an important contemporary research topic. This study investigated whether matching accuracy can be improved by instruction to attend to specific facial features. Experiment 1 showed that instruction to attend to the eyebrows enhanced matching accuracy for optimized same-day same-race face pairs but not for other-race faces. By contrast, accuracy was unaffected by instruction to attend to the eyes, and declined with instruction to attend to ears. Experiment 2 replicated the eyebrow-instruction improvement with a different set of same-race faces, comprising both optimized same-day and more challenging different-day face pairs. These findings suggest that instruction to attend to specific features can enhance face-matching accuracy, but feature selection is crucial and generalization across face sets may be limited. PMID:29543822

  16. Remote sensing imagery classification using multi-objective gravitational search algorithm

    NASA Astrophysics Data System (ADS)

    Zhang, Aizhu; Sun, Genyun; Wang, Zhenjie

    2016-10-01

    Simultaneous optimization of different validity measures can capture different data characteristics of remote sensing imagery (RSI), thereby achieving high-quality classification results. In this paper, two conflicting cluster validity indices, the Xie-Beni (XB) index and the fuzzy C-means (FCM) measure (Jm), are integrated with a diversity-enhanced and memory-based multi-objective gravitational search algorithm (DMMOGSA) to present a novel multi-objective optimization based RSI classification method. In this method, the Gabor filter method is first applied to extract texture features of the RSI. The texture features are then combined with the spectral features to construct the spatial-spectral feature set of the RSI. Afterwards, clustering of the spectral-spatial feature set is carried out on the basis of the proposed method: cluster centers are randomly generated initially and then updated and optimized adaptively by the DMMOGSA. Accordingly, a set of non-dominated cluster centers is obtained, so that a number of classification results are produced and users can pick the most promising one according to their problem requirements. To quantitatively and qualitatively validate the effectiveness of the proposed method, it was applied to classify two aerial high-resolution remote sensing images. The obtained classification results are compared with those produced by two methods based on single cluster validity indices and two based on state-of-the-art multi-objective optimization algorithms. The comparison shows that the proposed method can achieve more accurate RSI classification.
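    Illustrative note: the two validity measures named in this record can be written compactly. The sketch below computes the FCM objective Jm and the Xie-Beni index for a given fuzzy partition on synthetic data; it is a hedged illustration with assumed array shapes, not the DMMOGSA method itself.

    ```python
    # Minimal sketch: FCM objective Jm and Xie-Beni (XB) index for a fuzzy partition.
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 5))                 # 300 pixels, 5 spectral-texture features
    c, m = 4, 2.0                                 # number of clusters, fuzzifier
    V = X[rng.choice(len(X), c, replace=False)]   # cluster centers (here: random picks)

    # fuzzy memberships u_ik from distances (standard FCM update for fixed centers)
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12   # shape (n, c)
    U = 1.0 / (d2 ** (1 / (m - 1)))
    U /= U.sum(axis=1, keepdims=True)

    Jm = np.sum((U ** m) * d2)                    # FCM compactness objective
    sep = min(((V[i] - V[j]) ** 2).sum()
              for i in range(c) for j in range(c) if i != j)
    XB = np.sum((U ** 2) * d2) / (len(X) * sep)   # Xie-Beni: compactness / separation
    print("Jm = %.2f, XB = %.3f" % (Jm, XB))
    ```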

  17. Computer-aided diagnosis of psoriasis skin images with HOS, texture and color features: A first comparative study of its kind.

    PubMed

    Shrivastava, Vimal K; Londhe, Narendra D; Sonawane, Rajendra S; Suri, Jasjit S

    2016-04-01

    Psoriasis is an autoimmune skin disease with red and scaly plaques on the skin, affecting about 125 million people worldwide. Currently, dermatologists use visual and haptic methods to diagnose disease severity. This does not help them in stratification and risk assessment of the lesion stage and grade, and it adds complexity during the monitoring and follow-up phase. The current diagnostic tools lead to subjectivity in decision making and are unreliable and laborious. This paper presents a first comparative performance study of its kind using a principal component analysis (PCA) based CADx system for psoriasis risk stratification and image classification utilizing: (i) 11 higher order spectra (HOS) features, (ii) 60 texture features, and (iii) 86 color features, and their seven combinations. In aggregate, 540 image samples (270 healthy and 270 diseased) from 30 psoriasis patients of Indian ethnic origin are used in our database. Machine learning using PCA is used for dominant feature selection, which is then fed to a support vector machine (SVM) classifier to obtain optimized performance. Three different protocols are implemented using the three kinds of feature sets. A reliability index of the CADx is computed. Among all feature combinations, the CADx system shows optimal performance of 100% accuracy, 100% sensitivity and specificity when all three sets of features are combined. Further, our experimental results with increasing data size show that all feature combinations yield a high reliability index throughout the PCA cutoffs, except the color feature set and the combination of the color and texture feature sets. HOS features are powerful in psoriasis disease classification and stratification. Even though all three sets of features (HOS, texture, and color) perform competitively on their own, the machine learning system performs best when they are combined. The system is fully automated, reliable and accurate. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  18. Evolutionary optimization of radial basis function classifiers for data mining applications.

    PubMed

    Buchtala, Oliver; Klimek, Manuel; Sick, Bernhard

    2005-10-01

    In many data mining applications that address classification problems, feature and model selection are considered as key tasks. That is, appropriate input features of the classifier must be selected from a given (and often large) set of possible features and structure parameters of the classifier must be adapted with respect to these features and a given data set. This paper describes an evolutionary algorithm (EA) that performs feature and model selection simultaneously for radial basis function (RBF) classifiers. In order to reduce the optimization effort, various techniques are integrated that accelerate and improve the EA significantly: hybrid training of RBF networks, lazy evaluation, consideration of soft constraints by means of penalty terms, and temperature-based adaptive control of the EA. The feasibility and the benefits of the approach are demonstrated by means of four data mining problems: intrusion detection in computer networks, biometric signature verification, customer acquisition with direct marketing methods, and optimization of chemical production processes. It is shown that, compared to earlier EA-based RBF optimization techniques, the runtime is reduced by up to 99% while error rates are lowered by up to 86%, depending on the application. The algorithm is independent of specific applications so that many ideas and solutions can be transferred to other classifier paradigms.

  19. Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms.

    PubMed

    Lin, Kuan-Cheng; Hsieh, Yi-Hsiu

    2015-10-01

    The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a problem of feature subset selection, such as combination optimization problems. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to problems of optimization in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.

  20. Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes

    NASA Astrophysics Data System (ADS)

    Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie

    2017-03-01

    Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) according to histologic subtype has been well documented over the past decade. In parallel, quantitative image biomarkers have recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADC and 110 SqCC patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers: support vector machines with a radial basis function kernel (RBF-SVM), random forest (RF), K-nearest neighbor (KNN), and RUSBoost. To improve the classifiers' performance, an optimal feature subset is selected from the original feature set by using an iterative forward inclusion and backward elimination algorithm. Extensive experimental results demonstrate that the radiomics features achieve encouraging classification results on both the complete feature set (AUC = 0.89) and the optimal feature subset (AUC = 0.91).
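    Illustrative note: the general idea of forward feature-subset selection wrapped around an RBF-SVM can be sketched as below; scikit-learn's SequentialFeatureSelector is used as a stand-in for the iterative forward-inclusion/backward-elimination search the record describes, and the data and sizes are synthetic assumptions.

    ```python
    # Minimal sketch: forward feature-subset selection around an RBF-SVM,
    # as a stand-in for the iterative inclusion/elimination search in the record.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(434, 52))            # 52 radiomics-like features per nodule
    y = rng.integers(0, 2, size=434)          # ADC vs SqCC (synthetic labels)

    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
    sfs = SequentialFeatureSelector(svm, n_features_to_select=10,
                                    direction="forward", scoring="roc_auc", cv=5)
    sfs.fit(X, y)
    X_opt = sfs.transform(X)
    auc = cross_val_score(svm, X_opt, y, cv=5, scoring="roc_auc").mean()
    print("selected features:", np.flatnonzero(sfs.get_support()))
    print("CV AUC on optimal subset: %.3f" % auc)
    ```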

  1. Genetic algorithm for the optimization of features and neural networks in ECG signals classification

    NASA Astrophysics Data System (ADS)

    Li, Hongqiang; Yuan, Danyang; Ma, Xiangdong; Cui, Dianyin; Cao, Lu

    2017-01-01

    Feature extraction and classification of electrocardiogram (ECG) signals are necessary for the automatic diagnosis of cardiac diseases. In this study, a novel method based on genetic algorithm-back propagation neural network (GA-BPNN) for classifying ECG signals with feature extraction using wavelet packet decomposition (WPD) is proposed. WPD combined with the statistical method is utilized to extract the effective features of ECG signals. The statistical features of the wavelet packet coefficients are calculated as the feature sets. GA is employed to decrease the dimensions of the feature sets and to optimize the weights and biases of the back propagation neural network (BPNN). Thereafter, the optimized BPNN classifier is applied to classify six types of ECG signals. In addition, an experimental platform is constructed for ECG signal acquisition to supply the ECG data for verifying the effectiveness of the proposed method. The GA-BPNN method with the MIT-BIH arrhythmia database achieved a dimension reduction of nearly 50% and produced good classification results with an accuracy of 97.78%. The experimental results based on the established acquisition platform indicated that the GA-BPNN method achieved a high classification accuracy of 99.33% and could be efficiently applied in the automatic identification of cardiac arrhythmias.
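    Illustrative note: the wavelet-packet statistical features described in this record can be sketched as follows with PyWavelets on a synthetic signal. The wavelet, decomposition level, and statistics are assumptions for illustration; the GA-optimized BPNN classifier is not reproduced here.

    ```python
    # Minimal sketch: statistical features from wavelet packet decomposition (WPD)
    # of an ECG-like signal, as candidate inputs for a downstream classifier.
    import numpy as np
    import pywt

    rng = np.random.default_rng(4)
    sig = rng.normal(size=1024)                       # stand-in for one ECG segment

    wp = pywt.WaveletPacket(data=sig, wavelet="db4", mode="symmetric", maxlevel=3)
    features = []
    for node in wp.get_level(3, order="natural"):     # 2^3 = 8 terminal nodes
        c = np.asarray(node.data)
        features.extend([c.mean(), c.std(), np.sum(c ** 2), np.max(np.abs(c))])

    features = np.array(features)                     # 8 nodes x 4 statistics = 32 features
    print("WPD feature vector length:", features.shape[0])
    ```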

  2. Real estate value prediction using multivariate regression models

    NASA Astrophysics Data System (ADS)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly based on a large number of factors; hence it is one of the prime fields in which to apply machine learning to optimize and predict prices with high accuracy. In this paper, we therefore present various important features to use when predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction; often a set of features (multiple regression) or polynomial regression (applying various powers to the features) is used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
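    Illustrative note: a minimal sketch of the polynomial-feature plus ridge-regression idea mentioned here is shown below with scikit-learn on synthetic housing-style data; the feature meanings and sizes are assumptions.

    ```python
    # Minimal sketch: polynomial feature expansion + ridge regression to curb overfitting.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    X = rng.normal(size=(500, 4))               # e.g. area, rooms, age, distance (synthetic)
    y = 3 * X[:, 0] + 0.5 * X[:, 1] ** 2 - X[:, 2] + rng.normal(scale=0.5, size=500)

    model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                          StandardScaler(),
                          Ridge(alpha=1.0))     # L2 penalty reduces overfitting
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print("cross-validated MSE: %.3f" % mse)
    ```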

  3. Hybrid feature selection for supporting lightweight intrusion detection systems

    NASA Astrophysics Data System (ADS)

    Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin

    2017-08-01

    Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. These features slow down the process of training and testing in network traffic classification. Therefore, a hybrid feature selection approach combining wrapper and filter selection is designed in this paper to build a lightweight intrusion detection system. Two main phases are involved in this method. The first phase conducts a preliminary search for an optimal subset of features, using chi-square feature selection. The set of features selected in the first phase is further refined in the second phase in a wrapper manner, in which a Random Forest (RF) is used to guide the selection process and retain an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. The experimental results on the NSL-KDD datasets show that our approach results in higher detection accuracy as well as faster training and testing processes.
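    Illustrative note: a hedged sketch of such a two-phase filter-then-wrapper scheme is given below with scikit-learn (a chi-square filter followed by refinement using random-forest importances). The thresholds, sizes, and data are illustrative assumptions, not the authors' exact procedure.

    ```python
    # Minimal sketch: hybrid feature selection -- chi-square filter, then
    # refinement by random-forest importance, then an RF detection model.
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(6)
    X = rng.normal(size=(1000, 41))               # 41 NSL-KDD-like traffic features
    y = rng.integers(0, 2, size=1000)             # normal vs attack (synthetic)

    # Phase 1: chi-square filter (requires non-negative inputs, hence scaling)
    X_pos = MinMaxScaler().fit_transform(X)
    filt = SelectKBest(chi2, k=20).fit(X_pos, y)
    idx1 = np.flatnonzero(filt.get_support())

    # Phase 2: wrapper-style refinement via random-forest importances
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, idx1], y)
    order = np.argsort(rf.feature_importances_)[::-1]
    idx2 = idx1[order[:10]]                       # keep the 10 most useful features

    acc = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                          X[:, idx2], y, cv=5).mean()
    print("selected features:", idx2, "CV accuracy: %.3f" % acc)
    ```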

  4. Combining heterogeneous features for colonic polyp detection in CTC based on semi-definite programming

    NASA Astrophysics Data System (ADS)

    Wang, Shijun; Yao, Jianhua; Petrick, Nicholas A.; Summers, Ronald M.

    2009-02-01

    Colon cancer is the second leading cause of cancer-related deaths in the United States. Computed tomographic colonography (CTC) combined with a computer-aided detection system provides a feasible approach for improving colonic polyp detection and increasing the use of CTC for colon cancer screening. To distinguish true polyps from false positives, various features extracted from polyp candidates have been proposed. Most of these features try to capture the shape information of polyp candidates or neighborhood knowledge about the surrounding structures (fold, colon wall, etc.). In this paper, we propose a new set of shape descriptors for polyp candidates based on statistical curvature information. These features, called histogram of curvature features, are rotation, translation and scale invariant, and can be treated as complementing our existing feature set. Then, in order to make full use of the traditional features (defined as group A) and the new features (group B), which are highly heterogeneous, we employed a multiple kernel learning method based on semi-definite programming to identify an optimized classification kernel from the combined set of features. We performed a leave-one-patient-out test on a CTC dataset which contained scans from 50 patients (with 90 6-9 mm polyp detections). Experimental results show that a support vector machine (SVM) based on the combined feature set and the semi-definite optimization kernel achieved higher FROC performance compared to SVMs using the two groups of features separately. At a rate of 7 false positives per patient, the sensitivity on 6-9 mm polyps using the combined features improved from 0.78 (group A) and 0.73 (group B) to 0.82 (p<=0.01).

  5. Intersubject variability and intrasubject reproducibility of 12-lead ECG metrics: Implications for human verification.

    PubMed

    Jekova, Irena; Krasteva, Vessela; Leber, Remo; Schmid, Ramun; Twerenbold, Raphael; Müller, Christian; Reichlin, Tobias; Abächerli, Roger

    Electrocardiogram (ECG) biometrics is an advanced technology, not yet covered by guidelines on criteria, features and leads for maximal authentication accuracy. This study aims to define the minimal set of morphological metrics in 12-lead ECG by optimization towards high reliability and security, and validation in a person verification model across a large population. A standard 12-lead resting ECG database from 574 non-cardiac patients with two remote recordings (>1year apart) was used. A commercial ECG analysis module (Schiller AG) measured 202 morphological features, including lead-specific amplitudes, durations, ST-metrics, and axes. Coefficient of variation (CV, intersubject variability) and percent-mean-absolute-difference (PMAD, intrasubject reproducibility) defined the optimization (PMAD/CV→min) and restriction (CV<30%) criteria for selection of the most stable and distinctive features. Linear discriminant analysis (LDA) validated the non-redundant feature set for person verification. Maximal LDA verification sensitivity (85.3%) and specificity (86.4%) were validated for 11 optimal features: R-amplitude (I,II,V1,V2,V3,V5), S-amplitude (V1,V2), Tnegative-amplitude (aVR), and R-duration (aVF,V1). Copyright © 2016 Elsevier Inc. All rights reserved.
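    Illustrative note: the selection criteria described in this record (intersubject variability CV and intrasubject reproducibility PMAD, with PMAD/CV minimized under the restriction CV < 30%) can be sketched as below on synthetic paired recordings; the array names, shapes, and scaling are assumptions for illustration.

    ```python
    # Minimal sketch: rank ECG features by PMAD/CV with a CV < 30% restriction,
    # given two remote recordings per subject (synthetic data).
    import numpy as np

    rng = np.random.default_rng(7)
    n_subj, n_feat = 574, 202
    rec1 = rng.normal(loc=5.0, scale=1.0, size=(n_subj, n_feat))   # first recording
    rec2 = rec1 + rng.normal(scale=0.2, size=(n_subj, n_feat))     # repeat, >1 year later

    both = np.concatenate([rec1, rec2])
    mean_abs = np.abs(both.mean(axis=0)) + 1e-12
    cv = 100 * both.std(axis=0) / mean_abs                 # intersubject variability (%)
    pmad = 100 * np.abs(rec1 - rec2).mean(axis=0) / mean_abs   # intrasubject difference (%)

    score = pmad / (cv + 1e-12)                  # optimization criterion: PMAD/CV -> min
    eligible = cv < 30                           # restriction criterion
    ranked = np.argsort(np.where(eligible, score, np.inf))
    print("best 11 candidate features:", ranked[:11])
    ```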

  6. Optimal number of features as a function of sample size for various classification rules.

    PubMed

    Hua, Jianping; Xiong, Zixiang; Lowey, James; Suh, Edward; Dougherty, Edward R

    2005-04-15

    Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as resource for those working with small-sample classification. For the companion website, please visit http://public.tgen.org/tamu/ofs/ e-dougherty@ee.tamu.edu.

  7. Filter Selection for Optimizing the Spectral Sensitivity of Broadband Multispectral Cameras Based on Maximum Linear Independence.

    PubMed

    Li, Sui-Xian

    2018-05-07

    Previous research has shown the effectiveness of selecting filter sets from among a large set of commercial broadband filters by a vector analysis method based on maximum linear independence (MLI). However, the traditional MLI approach is suboptimal because it predefines the first filter of the selected set to be the one with the maximum ℓ₂ norm among all available filters. An exhaustive imaging simulation with every single filter serving as the first filter is conducted to investigate the features of the most competent filter set. From the simulation, the characteristics of the most competent filter set are discovered. Besides minimization of the condition number, the geometric features of the best-performing filter set comprise a distinct transmittance peak along the wavelength axis for the first filter, a generally uniform distribution of the peaks of the filters, and substantial overlap of the transmittance curves of adjacent filters. Therefore, the best-performing filter sets can be recognized intuitively by simple vector analysis and just a few experimental verifications. A practical two-step framework for selecting the optimal filter set is recommended, which guarantees a significant enhancement of the performance of the system. This work should be useful for optimizing the spectral sensitivity of broadband multispectral imaging sensors.
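    Illustrative note: a hedged sketch of condition-number-driven filter selection is shown below. Starting from a chosen first filter, filters are added greedily to keep the condition number of the stacked transmittance matrix small, and every filter is tried as the first one (the exhaustive idea in the record). The transmittance curves are synthetic Gaussians; this is an illustrative vector-analysis toy, not the paper's exact MLI procedure.

    ```python
    # Minimal sketch: greedy filter-set selection that keeps the condition number
    # of the stacked transmittance matrix small (synthetic filter curves).
    import numpy as np

    rng = np.random.default_rng(8)
    wavelengths = np.linspace(400, 700, 61)           # nm

    def gaussian_filter(center, width=40):
        return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

    # a "catalog" of 30 broadband filters with random peak positions
    catalog = np.array([gaussian_filter(c) for c in rng.uniform(410, 690, size=30)])

    def select_filters(catalog, first, k=6):
        chosen = [first]
        while len(chosen) < k:
            best, best_cond = None, np.inf
            for j in range(len(catalog)):
                if j in chosen:
                    continue
                cond = np.linalg.cond(catalog[chosen + [j]])
                if cond < best_cond:
                    best, best_cond = j, cond
            chosen.append(best)
        return chosen, np.linalg.cond(catalog[chosen])

    # try every filter as the first one, keep the best-conditioned set
    results = [select_filters(catalog, f) for f in range(len(catalog))]
    best_set, best_cond = min(results, key=lambda r: r[1])
    print("best filter set:", best_set, "condition number: %.2f" % best_cond)
    ```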

  8. An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications.

    PubMed

    Ye, Fei; Lou, Xin Yuan; Sun, Lin Fu

    2017-01-01

    This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy, which simultaneously performs parameter tuning for the SVM and feature selection. In the improved FOA, a chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, the algorithm is successfully applied in an SVM to perform both parameter tuning and feature selection for real-world classification problems. This method, called the chaotic fruit fly optimization algorithm (CIFOA)-SVM, has been shown to be a more robust and effective optimization method than other well-known methods, particularly in solving the medical diagnosis problem and the credit card problem.

  9. An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications

    PubMed Central

    Lou, Xin Yuan; Sun, Lin Fu

    2017-01-01

    This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy, which simultaneously performs parameter tuning for the SVM and feature selection. In the improved FOA, a chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, the algorithm is successfully applied in an SVM to perform both parameter tuning and feature selection for real-world classification problems. This method, called the chaotic fruit fly optimization algorithm (CIFOA)-SVM, has been shown to be a more robust and effective optimization method than other well-known methods, particularly in solving the medical diagnosis problem and the credit card problem. PMID:28369096

  10. Fishing for Features

    ScienceCinema

    Heredia-Langner, Alejandro; Cort, John; Bailey, Vanessa

    2018-01-16

    The Fishing for Features Signature Discovery project developed a framework for discovering signature features in challenging environments involving large and complex data sets or where phenomena may be poorly characterized or understood. Researchers at PNNL have applied the framework to the optimization of biofuels blending and to discover signatures of climate change on microbial soil communities.

  11. Fishing for Features

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heredia-Langner, Alejandro; Cort, John; Bailey, Vanessa

    2016-07-21

    The Fishing for Features Signature Discovery project developed a framework for discovering signature features in challenging environments involving large and complex data sets or where phenomena may be poorly characterized or understood. Researchers at PNNL have applied the framework to the optimization of biofuels blending and to discover signatures of climate change on microbial soil communities.

  12. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    PubMed

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance on classification and prediction and is widely used in disease diagnosis and medical assistance. However, the SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.
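    Illustrative note: a hedged sketch of SVM-RFE feature selection followed by (C, γ) tuning is shown below with scikit-learn. A plain grid search stands in for the Taguchi design, and the data are synthetic, so this is an illustration of the general recipe rather than the authors' exact procedure.

    ```python
    # Minimal sketch: SVM-RFE feature ranking, then (C, gamma) tuning for the classifier.
    # Grid search is used here as a simple stand-in for the Taguchi design.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.feature_selection import RFE
    from sklearn.model_selection import GridSearchCV, cross_val_score

    rng = np.random.default_rng(9)
    X = rng.normal(size=(366, 33))                 # Dermatology-like: 33 features
    y = rng.integers(0, 6, size=366)               # 6 classes (synthetic labels)

    # RFE needs a linear kernel so feature weights (coef_) are available for ranking
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=15, step=1).fit(X, y)
    X_sel = rfe.transform(X)

    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]},
                        cv=5)
    grid.fit(X_sel, y)
    acc = cross_val_score(grid.best_estimator_, X_sel, y, cv=5).mean()
    print("best params:", grid.best_params_, "CV accuracy: %.3f" % acc)
    ```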

  13. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    PubMed Central

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance on classification and prediction and is widely used in disease diagnosis and medical assistance. However, the SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306

  14. Fast detection of vascular plaque in optical coherence tomography images using a reduced feature set

    NASA Astrophysics Data System (ADS)

    Prakash, Ammu; Ocana Macias, Mariano; Hewko, Mark; Sowa, Michael; Sherif, Sherif

    2018-03-01

    Optical coherence tomography (OCT) images are capable of revealing vascular plaque when the full set of 26 Haralick textural features and a standard K-means clustering algorithm are used. However, using the full set of 26 textural features is computationally expensive and may not be feasible for real-time implementation. In this work, we identified a reduced set of 3 textural features which characterizes vascular plaque and used a generalized fuzzy C-means clustering algorithm. Our work involves three steps: 1) reduction of the full set of 26 textural features to a reduced set of 3 textural features by using a genetic algorithm (GA) optimization method, 2) implementation of an unsupervised generalized clustering algorithm (fuzzy C-means) on the reduced feature space, and 3) validation of our results using histology and actual photographic images of vascular plaque. Our results show an excellent match with histology and actual photographic images of vascular tissue. Therefore, our results could provide an efficient pre-clinical tool for the detection of vascular plaque in real-time OCT imaging.
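    Illustrative note: a minimal fuzzy C-means sketch on a reduced 3-feature space is given below in plain NumPy. Standard FCM is used as a simplified stand-in for the generalized variant, the GA-based reduction of Haralick features is not reproduced, and all shapes and parameter values are assumptions.

    ```python
    # Minimal sketch: fuzzy C-means clustering on a reduced 3-feature space.
    import numpy as np

    rng = np.random.default_rng(10)
    X = rng.normal(size=(500, 3))                 # 3 texture features per OCT region (synthetic)
    c, m, n_iter = 2, 2.0, 50                     # plaque vs non-plaque, fuzzifier, iterations

    V = X[rng.choice(len(X), c, replace=False)]   # initial cluster centers
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12
        U = 1.0 / (d2 ** (1 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)         # membership update
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]    # center update

    labels = U.argmax(axis=1)                     # hard assignment for inspection
    print("cluster sizes:", np.bincount(labels))
    ```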

  15. Toward optimal feature and time segment selection by divergence method for EEG signals classification.

    PubMed

    Wang, Jie; Feng, Zuren; Lu, Na; Luo, Jing

    2018-06-01

    Feature selection plays an important role in EEG-based motor imagery pattern classification. It is a process that aims to select an optimal feature subset from the original set. Two significant advantages are involved: lowering the computational burden so as to speed up the learning procedure, and removing redundant and irrelevant features so as to improve the classification performance. Therefore, feature selection is widely employed in the classification of EEG signals in practical brain-computer interface systems. In this paper, we present a novel statistical model that selects the optimal feature subset based on the Kullback-Leibler divergence measure and automatically selects the optimal subject-specific time segment. The proposed method comprises four successive stages: broad frequency band filtering and common spatial pattern enhancement as preprocessing; feature extraction by an autoregressive model and log-variance; Kullback-Leibler divergence based optimal feature and time segment selection; and linear discriminant analysis classification. More importantly, this paper provides a potential framework for combining other feature extraction models and classification algorithms with the proposed method for EEG signal classification. Experiments on single-trial EEG signals from two public competition datasets not only demonstrate that the proposed method is effective in selecting discriminative features and time segments, but also show that it yields relatively better classification results in comparison with other competitive methods. Copyright © 2018 Elsevier Ltd. All rights reserved.
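    Illustrative note: as a hedged illustration of divergence-based feature ranking, the sketch below scores each feature by the symmetrized Kullback-Leibler divergence between class-conditional Gaussian fits and keeps the top-scoring ones. The data are synthetic, and the CSP/AR extraction and subject-specific time-segment search from the record are not reproduced.

    ```python
    # Minimal sketch: rank features by symmetrized KL divergence between
    # class-conditional Gaussian fits, then keep the most discriminative ones.
    import numpy as np

    rng = np.random.default_rng(11)
    X = rng.normal(size=(240, 32))                # 32 log-variance/AR features per trial
    y = rng.integers(0, 2, size=240)              # left vs right motor imagery (synthetic)

    def kl_gauss(m1, s1, m2, s2):
        """KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
        return np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

    A, B = X[y == 0], X[y == 1]
    mA, sA = A.mean(0), A.std(0) + 1e-12
    mB, sB = B.mean(0), B.std(0) + 1e-12
    score = kl_gauss(mA, sA, mB, sB) + kl_gauss(mB, sB, mA, sA)   # symmetrized divergence

    top = np.argsort(score)[::-1][:8]             # optimal feature subset (8 kept here)
    print("selected features:", top)
    ```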

  16. Computational Prediction of Protein Epsilon Lysine Acetylation Sites Based on a Feature Selection Method.

    PubMed

    Gao, JianZhao; Tao, Xue-Wen; Zhao, Jia; Feng, Yuan-Ming; Cai, Yu-Dong; Zhang, Ning

    2017-01-01

    Lysine acetylation, as one type of post-translational modifications (PTM), plays key roles in cellular regulations and can be involved in a variety of human diseases. However, it is often high-cost and time-consuming to use traditional experimental approaches to identify the lysine acetylation sites. Therefore, effective computational methods should be developed to predict the acetylation sites. In this study, we developed a position-specific method for epsilon lysine acetylation site prediction. Sequences of acetylated proteins were retrieved from the UniProt database. Various kinds of features such as position specific scoring matrix (PSSM), amino acid factors (AAF), and disorders were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed. Finally, 319 optimal features were selected from total 541 features. Using the 319 optimal features to encode peptides, a predictor was constructed based on dagging. As a result, an accuracy of 69.56% with MCC of 0.2792 was achieved. We analyzed the optimal features, which suggested some important factors determining the lysine acetylation sites. We developed a position-specific method for epsilon lysine acetylation site prediction. A set of optimal features was selected. Analysis of the optimal features provided insights into the mechanism of lysine acetylation sites, providing guidance of experimental validation. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
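    Illustrative note: the mRMR-plus-IFS procedure named in this record can be sketched roughly as follows. Relevance is measured with mutual information against the label, redundancy with the mean absolute correlation to already-selected features (a common simplification), and IFS then grows the feature set one feature at a time while tracking cross-validated accuracy. Everything below (data, sizes, the correlation proxy, the classifier) is an assumption for illustration, not the authors' pipeline.

    ```python
    # Minimal sketch: mRMR-style ranking (MI relevance minus correlation redundancy)
    # followed by incremental feature selection (IFS) with a simple classifier.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(12)
    X = rng.normal(size=(600, 60))                    # 60 candidate sequence features
    y = rng.integers(0, 2, size=600)                  # acetylated vs not (synthetic)

    relevance = mutual_info_classif(X, y, random_state=0)
    corr = np.abs(np.corrcoef(X, rowvar=False))       # redundancy proxy

    selected = [int(np.argmax(relevance))]
    while len(selected) < 20:                         # mRMR-style greedy ranking
        rest = [j for j in range(X.shape[1]) if j not in selected]
        mrmr = [relevance[j] - corr[j, selected].mean() for j in rest]
        selected.append(rest[int(np.argmax(mrmr))])

    # IFS: add features in ranked order and keep the best-performing prefix
    best_k, best_acc = 1, 0.0
    for k in range(1, len(selected) + 1):
        acc = cross_val_score(LogisticRegression(max_iter=1000),
                              X[:, selected[:k]], y, cv=5).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    print("optimal feature count: %d, CV accuracy: %.3f" % (best_k, best_acc))
    ```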

  17. Hybrid ANN optimized artificial fish swarm algorithm based classifier for classification of suspicious lesions in breast DCE-MRI

    NASA Astrophysics Data System (ADS)

    Janaki Sathya, D.; Geetha, K.

    2017-12-01

    Automatic mass or lesion classification systems are developed to aid in distinguishing between malignant and benign lesions present in breast DCE-MR images. To be successful for clinical use, such systems need to improve both the sensitivity and the specificity of DCE-MR image interpretation. A new classifier (a set of features together with a classification method) based on artificial neural networks trained using the artificial fish swarm optimization (AFSO) algorithm is proposed in this paper. The basic idea behind the proposed classifier is to use the AFSO algorithm to search for the best combination of synaptic weights for the neural network. An optimal set of features based on statistical textural features is presented. The experimental outcomes of the proposed suspicious-lesion classifier confirm that it performs better than other such classifiers reported in the literature, and demonstrate that improvement in both sensitivity and specificity is possible through automated image analysis.

  18. A Unified Fisher's Ratio Learning Method for Spatial Filter Optimization.

    PubMed

    Li, Xinyang; Guan, Cuntai; Zhang, Haihong; Ang, Kai Keng

    To detect the mental task of interest, spatial filtering has been widely used to enhance the spatial resolution of electroencephalography (EEG). However, the effectiveness of spatial filtering is undermined due to the significant nonstationarity of EEG. Based on regularization, most of the conventional stationary spatial filter design methods address the nonstationarity at the cost of the interclass discrimination. Moreover, spatial filter optimization is inconsistent with feature extraction when EEG covariance matrices could not be jointly diagonalized due to the regularization. In this paper, we propose a novel framework for a spatial filter design. With Fisher's ratio in feature space directly used as the objective function, the spatial filter optimization is unified with feature extraction. Given its ratio form, the selection of the regularization parameter could be avoided. We evaluate the proposed method on a binary motor imagery data set of 16 subjects, who performed the calibration and test sessions on different days. The experimental results show that the proposed method yields improvement in classification performance for both single broadband and filter bank settings compared with conventional nonunified methods. We also provide a systematic attempt to compare different objective functions in modeling data nonstationarity with simulation studies.
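
    The objective described here can be sketched directly: for a candidate spatial filter w, compute the log-variance feature of each band-pass-filtered trial and evaluate the Fisher's ratio of that feature across the two classes. The trial dimensions and the random filter below are illustrative assumptions, not the authors' implementation.

        import numpy as np

        def fishers_ratio(w, trials_a, trials_b):
            """Fisher's ratio of the log-variance feature induced by spatial filter w.
            trials_a, trials_b: arrays of shape (n_trials, n_channels, n_samples)."""
            f_a = np.array([np.log(np.var(w @ x)) for x in trials_a])
            f_b = np.array([np.log(np.var(w @ x)) for x in trials_b])
            between = (f_a.mean() - f_b.mean()) ** 2
            within = f_a.var() + f_b.var()
            return between / within

        # With the ratio as an objective, any gradient-free optimizer can search over w.
        rng = np.random.default_rng(1)
        trials_a = rng.normal(size=(40, 16, 250))        # 40 trials, 16 channels, 250 samples
        trials_b = rng.normal(size=(40, 16, 250)) * 1.5
        w0 = rng.normal(size=16)
        print(fishers_ratio(w0, trials_a, trials_b))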

  19. Mapping of High Value Crops Through AN Object-Based Svm Model Using LIDAR Data and Orthophoto in Agusan del Norte Philippines

    NASA Astrophysics Data System (ADS)

    Candare, Rudolph Joshua; Japitana, Michelle; Cubillas, James Earl; Ramirez, Cherry Bryan

    2016-06-01

    This research describes the methods involved in the mapping of different high value crops in Agusan del Norte, Philippines using LiDAR. The project is part of the Phil-LiDAR 2 Program, which aims to conduct a nationwide resource assessment using LiDAR. Because of the high resolution data involved, the methodology described here utilizes object-based image analysis and optimal features derived from the LiDAR data and orthophotos. Object-based classification was primarily done by developing rule-sets in eCognition, and several features from the LiDAR data and orthophotos were used in the development of these rule-sets. In general, object classes cannot be separated by simple thresholds on individual features, which makes it difficult to develop a rule-set. To resolve this problem, the image-objects were subjected to Support Vector Machine learning. SVMs have gained popularity because of their ability to generalize well given a limited number of training samples. However, SVMs also suffer from parameter assignment issues that can significantly affect the classification results. More specifically, the regularization parameter C in a linear SVM has to be optimized through cross validation to increase the overall accuracy. After performing the segmentation in eCognition, the optimization procedure as well as the extraction of the equations of the hyper-planes was done in Matlab. The learned hyper-planes separating one class from another in the multi-dimensional feature space can be thought of as super-features, which were then used in developing the classifier rule set in eCognition. In this study, we report an overall classification accuracy of greater than 90% in different areas.
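
    The cross-validated tuning of the regularization parameter C can be sketched as follows; the synthetic object features and the search grid are assumptions, and scikit-learn stands in for the eCognition/Matlab workflow used in the study.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import LinearSVC

        # Synthetic stand-in for per-object LiDAR/orthophoto features.
        X, y = make_classification(n_samples=500, n_features=12, n_informative=6, random_state=0)

        # Cross-validated search over the regularization parameter C of a linear SVM.
        grid = GridSearchCV(LinearSVC(dual=False, max_iter=10000),
                            param_grid={"C": np.logspace(-3, 3, 13)},
                            cv=5, scoring="accuracy")
        grid.fit(X, y)
        print("best C:", grid.best_params_["C"], "CV accuracy:", grid.best_score_)

        # The separating hyper-plane (w, b) can then be exported as a "super-feature".
        w, b = grid.best_estimator_.coef_.ravel(), grid.best_estimator_.intercept_[0]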

  20. Fish swarm intelligent to optimize real time monitoring of chips drying using machine vision

    NASA Astrophysics Data System (ADS)

    Hendrawan, Y.; Hawa, L. C.; Damayanti, R.

    2018-03-01

    This study applied a machine vision-based monitoring system to optimise the drying process of cassava chips. The objective is to propose a fish swarm intelligence (FSI) optimization algorithm to find the most significant set of image features for predicting the water content of cassava chips during drying using an artificial neural network (ANN) model. Feature selection entails choosing the feature subset that maximizes the prediction accuracy of the ANN. Multi-objective optimization (MOO) was used in this study, consisting of prediction accuracy maximization and feature-subset size minimization. The results showed that the best feature subset comprised grey mean, L(Lab) mean, a(Lab) energy, red entropy, hue contrast, and grey homogeneity. This feature subset was tested successfully in the ANN model to describe the relationship between image features and water content of cassava chips during drying, with an R2 between real and predicted data of 0.9.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heredia-Langner, Alejandro; Amidan, Brett G.; Matzner, Shari

    We present results from the optimization of a re-identification process using two sets of biometric data obtained from the Civilian American and European Surface Anthropometry Resource Project (CAESAR) database. The datasets contain real measurements of features for 2378 individuals in a standing (43 features) and seated (16 features) position. A genetic algorithm (GA) was used to search a large combinatorial space where different features are available between the probe (seated) and gallery (standing) datasets. Results show that optimized model predictions obtained using less than half of the 43 gallery features and data from roughly 16% of the individuals available produce better re-identification rates than two other approaches that use all the information available.

  2. Set covering algorithm, a subprogram of the scheduling algorithm for mission planning and logistic evaluation

    NASA Technical Reports Server (NTRS)

    Chang, H.

    1976-01-01

    A computer program using Lemke, Salkin and Spielberg's Set Covering Algorithm (SCA) to optimize a traffic model problem in the Scheduling Algorithm for Mission Planning and Logistics Evaluation (SAMPLE) was documented. SCA forms a submodule of SAMPLE and provides for input and output, subroutines, and an interactive feature for performing the optimization and arranging the results in a readily understandable form for output.

  3. Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE.

    PubMed

    Chen, Qi; Meng, Zhaopeng; Liu, Xinyi; Jin, Qianguo; Su, Ran

    2018-06-15

    Feature selection, which identifies a set of the most informative features from the original feature space, has been widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective in data dimension reduction and efficiency increase. A ranking of features, as well as candidate subsets with the corresponding accuracy, is produced through RFE. The subset with the highest accuracy (HA) or a preset number of features (PreNum) is often used as the final subset. However, this may lead to a large number of features being selected, and if there is no prior knowledge about an appropriate preset number, the final subset selection is often ambiguous and subjective. A proper decision variant is in high demand to automatically determine the optimal subset. In this study, we conduct pioneering work to explore the decision variant after obtaining a list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants to automatically select the optimal feature subset. A random forest (RF)-recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two very different molecular biology datasets, one for a toxicogenomic study and the other for protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
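
    A hedged sketch of RF-RFE with an automatic choice of subset size is given below; it uses scikit-learn's RFECV, which keeps the subset size with the best cross-validated score, rather than the voting strategy proposed in the paper, and the data are a synthetic placeholder.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import RFECV

        # Synthetic data standing in for a molecular-biology feature matrix.
        X, y = make_classification(n_samples=300, n_features=60, n_informative=10, random_state=0)

        selector = RFECV(estimator=RandomForestClassifier(n_estimators=200, random_state=0),
                         step=1, cv=5, scoring="accuracy")
        selector.fit(X, y)

        print("optimal number of features:", selector.n_features_)
        print("selected feature indices:", list(selector.get_support(indices=True)))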

  4. Curved planar reformation and optimal path tracing (CROP) method for false positive reduction in computer-aided detection of pulmonary embolism in CTPA

    NASA Astrophysics Data System (ADS)

    Zhou, Chuan; Chan, Heang-Ping; Guo, Yanhui; Wei, Jun; Chughtai, Aamer; Hadjiiski, Lubomir M.; Sundaram, Baskaran; Patel, Smita; Kuriakose, Jean W.; Kazerooni, Ella A.

    2013-03-01

    The curved planar reformation (CPR) method re-samples the vascular structures along the vessel centerline to generate longitudinal cross-section views. The CPR technique has been commonly used in coronary CTA workstation to facilitate radiologists' visual assessment of coronary diseases, but has not yet been used for pulmonary vessel analysis in CTPA due to the complicated tree structures and the vast network of pulmonary vasculature. In this study, a new curved planar reformation and optimal path tracing (CROP) method was developed to facilitate feature extraction and false positive (FP) reduction and improve our PE detection system. PE candidates are first identified in the segmented pulmonary vessels at prescreening. Based on Dijkstra's algorithm, the optimal path (OP) is traced from the pulmonary trunk bifurcation point to each PE candidate. The traced vessel is then straightened and a reformatted volume is generated using CPR. Eleven new features that characterize the intensity, gradient, and topology are extracted from the PE candidate in the CPR volume and combined with the previously developed 9 features to form a new feature space for FP classification. With IRB approval, CTPA of 59 PE cases were retrospectively collected from our patient files (UM set) and 69 PE cases from the PIOPED II data set with access permission. 595 and 800 PEs were manually marked by experienced radiologists as reference standard for the UM and PIOPED set, respectively. At a test sensitivity of 80%, the average FP rate was improved from 18.9 to 11.9 FPs/case with the new method for the PIOPED set when the UM set was used for training. The FP rate was improved from 22.6 to 14.2 FPs/case for the UM set when the PIOPED set was used for training. The improvement in the free response receiver operating characteristic (FROC) curves was statistically significant (p<0.05) by JAFROC analysis, indicating that the new features extracted from the CROP method are useful for FP reduction.

  5. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-06

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
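
    As a simplified illustration of the rank-product idea adapted to missing values (not the authors' improved implementation), ranks can be computed within each replicate over the features observed there and combined by a geometric mean over the available replicates only:

        import numpy as np
        from scipy.stats import rankdata

        def rank_product(log_ratios):
            """log_ratios: float array (n_features, n_replicates) of log fold-changes,
            NaN where a feature was not detected in a replicate."""
            n_feat, n_rep = log_ratios.shape
            ranks = np.full_like(log_ratios, np.nan)
            for j in range(n_rep):
                observed = ~np.isnan(log_ratios[:, j])
                # rank 1 = strongest up-regulation within this replicate
                ranks[observed, j] = rankdata(-log_ratios[observed, j])
                ranks[observed, j] /= observed.sum()     # normalise to (0, 1]
            # geometric mean of the available (normalised) ranks per feature
            rp = np.exp(np.nanmean(np.log(ranks), axis=1))
            return rp   # smaller values = more consistently up-regulated features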

  6. [Research on spectra recognition method for cabbages and weeds based on PCA and SIMCA].

    PubMed

    Zu, Qin; Deng, Wei; Wang, Xiu; Zhao, Chun-Jiang

    2013-10-01

    In order to improve the accuracy and efficiency of weed identification, the difference in spectral reflectance was employed to distinguish between crops and weeds. Firstly, different combinations of Savitzky-Golay (SG) convolutional derivation and the multiplicative scattering correction (MSC) method were applied to preprocess the raw spectral data. Then the clustering analysis of the various types of plants was completed using the principal component analysis (PCA) method, and the feature wavelengths that were sensitive for classifying the various types of plants were extracted according to the corresponding loading plots of the optimal principal components in the PCA results. Finally, setting the feature wavelengths as the input variables, the soft independent modeling of class analogy (SIMCA) classification method was used to identify the various types of plants. The experimental results of classifying cabbages and weeds showed that, on the basis of the optimal pretreatment (a combination of MSC and SG convolutional derivation with the SG parameters set to a 1st-order derivative, 3rd-degree polynomial and 51 smoothing points), 23 feature wavelengths were extracted in accordance with the top three principal components in the PCA results. When the SIMCA method was used for classification with the previously selected 23 feature wavelengths as the input variables, the classification rates of the modeling set and the prediction set were up to 98.6% and 100%, respectively.
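
    The preprocessing and feature-wavelength steps of this pipeline can be sketched as below; the window length, polynomial degree, derivative order and number of selected wavelengths mirror the values reported in the abstract, while the spectra are synthetic and the MSC and SIMCA stages are omitted.

        import numpy as np
        from scipy.signal import savgol_filter
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        spectra = rng.normal(size=(60, 600))        # 60 samples x 600 wavelengths (synthetic)

        # Savitzky-Golay: 51-point window, 3rd-degree polynomial, 1st derivative.
        deriv = savgol_filter(spectra, window_length=51, polyorder=3, deriv=1, axis=1)

        # PCA clustering view; loadings of the top components point to feature wavelengths.
        pca = PCA(n_components=3)
        scores = pca.fit_transform(deriv)
        loadings = pca.components_                  # shape (3, 600)
        feature_wavelengths = np.argsort(np.abs(loadings).max(axis=0))[::-1][:23]
        print(sorted(feature_wavelengths))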

  7. Impact of Chaos Functions on Modern Swarm Optimizers.

    PubMed

    Emary, E; Zawbaa, Hossam M

    2016-01-01

    Exploration and exploitation are two essential components of any optimization algorithm. Too much exploration slows down the optimization algorithm, while too much exploitation leads to premature convergence and can leave the optimizer stuck in local minima. Therefore, balancing the rates of exploration and exploitation over the optimization lifetime is a challenge. This study evaluates the impact of using chaos-based control of exploration/exploitation rates against using the systematic native control. Three modern algorithms were used in the study, namely the grey wolf optimizer (GWO), the antlion optimizer (ALO) and the moth-flame optimizer (MFO), in the domain of machine learning for feature selection. Results on a set of standard machine learning datasets, evaluated with a set of assessment indicators, show improved optimization performance when using chaotically varying, repeated periods of declining exploration rates rather than systematically decreased exploration rates.
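
    The chaos-based control being evaluated can be illustrated with a logistic-map sequence replacing the usual linearly decreasing exploration coefficient; the mapping onto a GWO-style coefficient and the declining envelope below are assumptions made for the example.

        def chaotic_exploration_schedule(n_iter, x0=0.7, a_max=2.0):
            """Yield an exploration coefficient per iteration, driven by the
            logistic map x <- 4 x (1 - x) instead of a plain linear decay."""
            x = x0
            for t in range(n_iter):
                x = 4.0 * x * (1.0 - x)                 # chaotic sequence in (0, 1)
                envelope = a_max * (1.0 - t / n_iter)   # overall declining trend
                yield envelope * x                      # chaotic variation within the trend

        # Example: compare against the systematic linear schedule a = 2 -> 0.
        for t, a in enumerate(chaotic_exploration_schedule(10)):
            print(t, round(a, 3))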

  8. Application of quantum-behaved particle swarm optimization to motor imagery EEG classification.

    PubMed

    Hsu, Wei-Yen

    2013-12-01

    In this study, we propose a recognition system for single-trial analysis of motor imagery (MI) electroencephalogram (EEG) data. Applied to event-related brain potential (ERP) data acquired from the sensorimotor cortices, the system chiefly consists of automatic artifact elimination, feature extraction, feature selection and classification. In addition to the use of independent component analysis, a similarity measure is proposed to further remove the electrooculographic (EOG) artifacts automatically. Several potential features, such as wavelet-fractal features, are then extracted for subsequent classification. Next, quantum-behaved particle swarm optimization (QPSO) is used to select features from the feature combination. Finally, the selected sub-features are classified by a support vector machine (SVM). Compared with systems without artifact elimination, with feature selection using a genetic algorithm (GA), and with feature classification using Fisher's linear discriminant (FLD), on MI data from two data sets for eight subjects, the results indicate that the proposed method is promising for brain-computer interface (BCI) applications.

  9. Subpixel Mapping of Hyperspectral Image Based on Linear Subpixel Feature Detection and Object Optimization

    NASA Astrophysics Data System (ADS)

    Liu, Zhaoxin; Zhao, Liaoying; Li, Xiaorun; Chen, Shuhan

    2018-04-01

    Owing to the limited spatial resolution of the imaging sensor and the variability of ground surfaces, mixed pixels are widespread in hyperspectral imagery. Traditional subpixel mapping algorithms treat all mixed pixels as boundary-mixed pixels while ignoring the existence of linear subpixels. To address this problem, this paper proposes a new subpixel mapping method based on linear subpixel feature detection and object optimization. Firstly, the fraction value of each class is obtained by spectral unmixing. Secondly, the linear subpixel features are pre-determined based on the hyperspectral characteristics, and the linear subpixel features in the remaining mixed pixels are detected based on maximum linearization index analysis. The classes of linear subpixels are determined using a template matching method. Finally, the whole subpixel mapping result is iteratively optimized by a binary particle swarm optimization algorithm. The performance of the proposed subpixel mapping method is evaluated via experiments based on simulated and real hyperspectral data sets. The experimental results demonstrate that the proposed method can improve the accuracy of subpixel mapping.

  10. Combining Statistical and Geometric Features for Colonic Polyp Detection in CTC Based on Multiple Kernel Learning

    PubMed Central

    Wang, Shijun; Yao, Jianhua; Petrick, Nicholas; Summers, Ronald M.

    2010-01-01

    Colon cancer is the second leading cause of cancer-related deaths in the United States. Computed tomographic colonography (CTC) combined with a computer-aided detection system provides a feasible approach for improving colonic polyp detection and increasing the use of CTC for colon cancer screening. To distinguish true polyps from false positives, various features extracted from polyp candidates have been proposed. Most of these traditional features try to capture the shape information of polyp candidates or neighborhood knowledge about the surrounding structures (fold, colon wall, etc.). In this paper, we propose a new set of shape descriptors for polyp candidates based on statistical curvature information. These features, called histograms of curvature features, are rotation, translation and scale invariant and can be treated as complementary to the existing feature set. Then, in order to make full use of the traditional geometric features (defined as group A) and the new statistical features (group B), which are highly heterogeneous, we employed a multiple kernel learning method based on semi-definite programming to learn an optimized classification kernel from the two groups of features. We conducted a leave-one-patient-out test on a CTC dataset which contained scans from 66 patients. Experimental results show that a support vector machine (SVM) based on the combined feature set and the semi-definite optimization kernel achieved higher FROC performance compared to SVMs using the two groups of features separately. At a rate of 5 false positives per scan, the sensitivity of the SVM using the combined features improved from 0.77 (Group A) and 0.73 (Group B) to 0.83 (p ≤ 0.01). PMID:20953299

  11. Using multiscale texture and density features for near-term breast cancer risk analysis

    PubMed Central

    Sun, Wenqing; Tseng, Tzu-Liang (Bill); Qian, Wei; Zhang, Jianying; Saltzstein, Edward C.; Zheng, Bin; Lure, Fleming; Yu, Hui; Zhou, Shi

    2015-01-01

    Purpose: To help improve efficacy of screening mammography by eventually establishing a new optimal personalized screening paradigm, the authors investigated the potential of using the quantitative multiscale texture and density feature analysis of digital mammograms to predict near-term breast cancer risk. Methods: The authors’ dataset includes digital mammograms acquired from 340 women. Among them, 141 were positive and 199 were negative/benign cases. The negative digital mammograms acquired from the “prior” screening examinations were used in the study. Based on the intensity value distributions, five subregions at different scales were extracted from each mammogram. Five groups of features, including density and texture features, were developed and calculated on every one of the subregions. Sequential forward floating selection was used to search for the effective combinations. Using the selected features, a support vector machine (SVM) was optimized using a tenfold validation method to predict the risk of each woman having image-detectable cancer in the next sequential mammography screening. The area under the receiver operating characteristic curve (AUC) was used as the performance assessment index. Results: From a total number of 765 features computed from multiscale subregions, an optimal feature set of 12 features was selected. Applying this feature set, a SVM classifier yielded performance of AUC = 0.729 ± 0.021. The positive predictive value was 0.657 (92 of 140) and the negative predictive value was 0.755 (151 of 200). Conclusions: The study results demonstrated a moderately high positive association between risk prediction scores generated by the quantitative multiscale mammographic image feature analysis and the actual risk of a woman having an image-detectable breast cancer in the next subsequent examinations. PMID:26127038
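
    A hedged stand-in for the selection-plus-SVM pipeline is shown below, using scikit-learn's SequentialFeatureSelector (plain forward selection rather than the floating variant used in the study) with AUC as the selection criterion and a synthetic feature matrix in place of the 765 mammographic features.

        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Synthetic stand-in for the multiscale texture/density feature matrix.
        X, y = make_classification(n_samples=340, n_features=100, n_informative=12, random_state=0)

        svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        sfs = SequentialFeatureSelector(svm, n_features_to_select=12,
                                        direction="forward", scoring="roc_auc", cv=5)
        sfs.fit(X, y)
        print("selected feature indices:", list(sfs.get_support(indices=True)))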

  12. Parameter Optimization for Feature and Hit Generation in a General Unknown Screening Method-Proof of Concept Study Using a Design of Experiment Approach for a High Resolution Mass Spectrometry Procedure after Data Independent Acquisition.

    PubMed

    Elmiger, Marco P; Poetzsch, Michael; Steuer, Andrea E; Kraemer, Thomas

    2018-03-06

    High resolution mass spectrometry and modern data independent acquisition (DIA) methods enable the creation of general unknown screening (GUS) procedures. However, even when DIA is used, its potential is far from being exploited, because the untargeted acquisition is often followed by a targeted search. Applying an actual GUS (including untargeted screening) produces an immense amount of data that must be dealt with. An optimization of the parameters regulating the feature detection and hit generation algorithms of the data processing software could significantly reduce the amount of unnecessary data and thereby the workload. Design of experiment (DoE) approaches allow a simultaneous optimization of multiple parameters. In a first step, parameters are evaluated (crucial or noncrucial). Second, crucial parameters are optimized. The aim in this study was to reduce the number of hits without missing analytes. The parameter settings obtained from the optimization were compared to the standard settings by analyzing a test set of blood samples spiked with 22 relevant analytes as well as 62 authentic forensic cases. The optimization led to a marked reduction of workload (12.3 to 1.1% and 3.8 to 1.1% hits for the test set and the authentic cases, respectively) while simultaneously increasing the identification rate (68.2 to 86.4% and 68.8 to 88.1%, respectively). This proof of concept study emphasizes the great potential of DoE approaches to master the data overload resulting from modern data independent acquisition methods used for general unknown screening procedures by optimizing software parameters.

  13. Advanced Interactive Display Formats for Terminal Area Traffic Control

    NASA Technical Reports Server (NTRS)

    Grunwald, Arthur J.; Shaviv, G. E.

    1999-01-01

    This research project deals with an on-line dynamic method for automated viewing parameter management in perspective displays. Perspective images are optimized such that a human observer will perceive relevant spatial geometrical features with minimal errors. In order to compute the errors at which observers reconstruct spatial features from perspective images, a visual spatial-perception model was formulated. The model was employed as the basis of an optimization scheme aimed at seeking the optimal projection parameter setting. These ideas are implemented in the context of an air traffic control (ATC) application. A concept, referred to as an active display system, was developed. This system uses heuristic rules to identify relevant geometrical features of the three-dimensional air traffic situation. Agile, on-line optimization was achieved by a specially developed and custom-tailored genetic algorithm (GA), which was to deal with the multi-modal characteristics of the objective function and exploit its time-evolving nature.

  14. The effect of feature selection methods on computer-aided detection of masses in mammograms

    NASA Astrophysics Data System (ADS)

    Hupse, Rianne; Karssemeijer, Nico

    2010-05-01

    In computer-aided diagnosis (CAD) research, feature selection methods are often used to improve generalization performance of classifiers and shorten computation times. In an application that detects malignant masses in mammograms, we investigated the effect of using a selection criterion that is similar to the final performance measure we are optimizing, namely the mean sensitivity of the system in a predefined range of the free-response receiver operating characteristics (FROC). To obtain the generalization performance of the selected feature subsets, a cross validation procedure was performed on a dataset containing 351 abnormal and 7879 normal regions, each region providing a set of 71 mass features. The same number of noise features, not containing any information, were added to investigate the ability of the feature selection algorithms to distinguish between useful and non-useful features. It was found that significantly higher performances were obtained using feature sets selected by the general test statistic Wilks' lambda than using feature sets selected by the more specific FROC measure. Feature selection leads to better performance when compared to a system in which all features were used.
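
    For a single feature, the Wilks' lambda criterion mentioned here reduces to the ratio of the within-class to the total sum of squares (smaller values indicate better separation); a minimal sketch, with binary region labels assumed:

        import numpy as np

        def wilks_lambda(x, y):
            """Univariate Wilks' lambda: within-class SS divided by total SS.
            x: feature values, y: class labels. Smaller values = better separation."""
            grand_mean = x.mean()
            ss_total = ((x - grand_mean) ** 2).sum()
            ss_within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in np.unique(y))
            return ss_within / ss_total

        def rank_by_wilks(X, y):
            """Rank the columns of X (regions x features) by ascending lambda."""
            lambdas = np.array([wilks_lambda(X[:, j], y) for j in range(X.shape[1])])
            return np.argsort(lambdas)   # most discriminative features first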

  15. Learner, Patient, and Supervisor Features Are Associated With Different Types of Cognitive Load During Procedural Skills Training: Implications for Teaching and Instructional Design.

    PubMed

    Sewell, Justin L; Boscardin, Christy K; Young, John Q; Ten Cate, Olle; O'Sullivan, Patricia S

    2017-11-01

    Cognitive load theory, focusing on limits of the working memory, is relevant to medical education; however, factors associated with cognitive load during procedural skills training are not well characterized. The authors sought to determine how features of learners, patients/tasks, settings, and supervisors were associated with three types of cognitive load among learners performing a specific procedure, colonoscopy, to identify implications for procedural teaching. Data were collected through an electronically administered survey sent to 1,061 U.S. gastroenterology fellows during the 2014-2015 academic year; 477 (45.0%) participated. Participants completed the survey immediately following a colonoscopy. Using multivariable linear regression analyses, the authors identified sets of features associated with intrinsic, extraneous, and germane loads. Features associated with intrinsic load included learners (prior experience and year in training negatively associated, fatigue positively associated) and patient/tasks (procedural complexity positively associated, better patient tolerance negatively associated). Features associated with extraneous load included learners (fatigue positively associated), setting (queue order positively associated), and supervisors (supervisor engagement and confidence negatively associated). Only one feature, supervisor engagement, was (positively) associated with germane load. These data support practical recommendations for teaching procedural skills through the lens of cognitive load theory. To optimize intrinsic load, level of experience and competence of learners should be balanced with procedural complexity; part-task approaches and scaffolding may be beneficial. To reduce extraneous load, teachers should remain engaged, and factors within the procedural setting that may interfere with learning should be minimized. To optimize germane load, teachers should remain engaged.

  16. SU-F-R-10: Selecting the Optimal Solution for Multi-Objective Radiomics Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Z; Folkert, M; Wang, J

    2016-06-15

    Purpose: To develop an evidential reasoning approach for selecting the optimal solution from a Pareto solution set obtained by a multi-objective radiomics model for predicting distant failure in lung SBRT. Methods: In the multi-objective radiomics model, both sensitivity and specificity are considered as objective functions simultaneously. A Pareto solution set with many feasible solutions results from the multi-objective optimization. In this work, an optimal solution Selection methodology for Multi-Objective radiomics Learning model using the Evidential Reasoning approach (SMOLER) was proposed to select the optimal solution from the Pareto solution set. The proposed SMOLER method used the evidential reasoning approach to calculate the utility of each solution based on pre-set optimal solution selection rules. The solution with the highest utility was chosen as the optimal solution. In SMOLER, an optimal learning model coupled with a clonal selection algorithm was used to optimize model parameters. In this study, PET and CT image features and clinical parameters were utilized for predicting distant failure in lung SBRT. Results: A total of 126 solution sets were generated by adjusting predictive model parameters. Each Pareto set contains 100 feasible solutions. The solution selected by SMOLER within each Pareto set was compared to the manually selected optimal solution. Five-fold cross-validation was used to evaluate the optimal solution selection accuracy of SMOLER. The selection accuracies for the five folds were 80.00%, 69.23%, 84.00%, 84.00%, and 80.00%, respectively. Conclusion: An optimal solution selection methodology for a multi-objective radiomics learning model using the evidential reasoning approach (SMOLER) was proposed. Experimental results show that the optimal solution can be found in approximately 80% of cases.

  17. EEG-based recognition of video-induced emotions: selecting subject-independent feature set.

    PubMed

    Kortelainen, Jukka; Seppänen, Tapio

    2013-01-01

    Emotions are fundamental for everyday life, affecting our communication, learning, perception, and decision making. Including emotions in human-computer interaction (HCI) could be seen as a significant step forward, offering great potential for developing advanced future technologies. Since the electrical activity of the brain is affected by emotions, the electroencephalogram (EEG) offers an interesting channel for improving the HCI. In this paper, the selection of a subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying a person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves the classification rate reported in the literature. In the future, further analysis of the video-induced EEG changes, including topographical differences in the spectral features, is needed.

  18. Feature selection gait-based gender classification under different circumstances

    NASA Astrophysics Data System (ADS)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes gender classification based on human gait features and investigates the problem of two variations, clothing (wearing coats) and carrying a bag, in addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different feature sets are proposed in this method. The first, spatio-temporal distance, deals with the distances between different parts of the human body (such as feet, knees, hands, shoulders and body height) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation coefficients of the human body, respectively. To extract these two feature sets, we divided the human body into two parts, the upper and lower body, based on the golden ratio proportion. In this paper, we have adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize its discriminating significance. Finally, k-nearest neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.
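
    A minimal sketch of the final two stages (Fisher-score feature ranking followed by k-nearest-neighbour classification) is given below; the wavelet feature extraction and golden-ratio body split are not reproduced, and the number of retained features is an assumption.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        def fisher_score(X, y):
            """Fisher score per feature: between-class scatter over within-class variance."""
            classes = np.unique(y)
            overall_mean = X.mean(axis=0)
            num = sum((y == c).sum() * (X[y == c].mean(axis=0) - overall_mean) ** 2 for c in classes)
            den = sum((y == c).sum() * X[y == c].var(axis=0) for c in classes)
            return num / (den + 1e-12)

        def select_and_classify(X_train, y_train, X_test, k_features=20, k_neighbors=3):
            keep = np.argsort(fisher_score(X_train, y_train))[::-1][:k_features]
            clf = KNeighborsClassifier(n_neighbors=k_neighbors)
            clf.fit(X_train[:, keep], y_train)
            return clf.predict(X_test[:, keep])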

  19. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology

    PubMed Central

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767

  20. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.

    PubMed

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.

  1. Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.

    PubMed

    Sriraam, N; Raghu, S

    2017-09-02

    Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients with pharmacoresistant focal epilepsy. The electroencephalogram (EEG) is a significant measurement benchmark for assessing patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize focal and non-focal epileptic seizures in recordings from pharmacoresistant focal epilepsy patients in the Bern Barcelona database. From the dataset, five different classification tasks were formed. A total of 26 features were extracted from focal and non-focal EEG. Significant features were selected using the Wilcoxon rank-sum test, with the p-value (p < 0.05) and z-score (z < -1.96 or z > 1.96) thresholds set at the 95% confidence level. The hypothesis was that removing outliers improves the classification accuracy. Tukey's range test was adopted for pruning outliers from the feature set. Finally, 21 features were classified using an optimized support vector machine (SVM) classifier with 10-fold cross validation. A Bayesian optimization technique was adopted to minimize the cross-validation loss. From the simulation results, the highest sensitivity, specificity, and classification accuracy were 94.56%, 89.74%, and 92.15%, respectively, which is better than the state-of-the-art approaches. Further, it was observed that the classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics confirm the suitability of the proposed multi-features with the optimized SVM classifier. It can be concluded that the proposed approach can be applied for recognition of focal EEG signals to localize epileptogenic zones.
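
    The statistical filtering step can be sketched as follows: a feature is kept only when the Wilcoxon rank-sum test separates the focal and non-focal groups at the 5% level, i.e. |z| > 1.96; the feature matrix below is a synthetic placeholder for the 26 extracted features.

        import numpy as np
        from scipy.stats import ranksums

        def wilcoxon_filter(X, y, alpha=0.05, z_crit=1.96):
            """Keep features whose rank-sum test rejects equality of the two groups."""
            keep = []
            for j in range(X.shape[1]):
                z, p = ranksums(X[y == 0, j], X[y == 1, j])
                if p < alpha and abs(z) > z_crit:
                    keep.append(j)
            return np.array(keep)

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 26))    # 26 candidate features, focal vs non-focal EEG
        y = rng.integers(0, 2, size=300)
        X[y == 1, :5] += 0.8              # make the first five features informative
        print(wilcoxon_filter(X, y))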

  2. Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.

    PubMed

    Liu, Liang; Cai, Yudong; Lu, Wencong; Feng, Kaiyan; Peng, Chunrong; Niu, Bing

    2009-03-06

    Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection frame, this paper presents a computational system to predict the PPIs (protein-protein interactions) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poor-performed and/or redundant features, resulting in 103 remaining features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall accurate prediction rate of 76.18%, evaluated by 10-fold cross-validation test, which is 1.46% higher than using the initial 114 features and is 6.51% higher than the 20 features, coded by amino acid compositions. The PPIs predictor, developed for this research, is available for public use at http://chemdata.shu.edu.cn/ppi.

  3. Feature Selection for Chemical Sensor Arrays Using Mutual Information

    PubMed Central

    Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.

    2014-01-01

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
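
    A compact illustration of the filter approach described here: score every sensor feature by its mutual information with the chemical identity and keep the top-scoring ones. The data are a synthetic placeholder for the metal-oxide sensor array responses, and the number of retained features is an assumption.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import mutual_info_classif

        # Synthetic stand-in for metal-oxide sensor array responses.
        X, y = make_classification(n_samples=400, n_features=32, n_informative=8,
                                   n_classes=4, n_clusters_per_class=1, random_state=0)

        mi = mutual_info_classif(X, y, random_state=0)
        top = np.argsort(mi)[::-1][:8]    # features sharing the most information with the label
        print("selected features:", sorted(top))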

  4. Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W

    2016-10-01

    This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF) using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinically important because it increases the possibility of electrically stabilizing the heart and preventing the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and the classifier parameters are optimized simultaneously using an optimization procedure based on a genetic algorithm (GA). Both the full feature set and a statistically significant feature subset are optimized by the GA. For the statistically significant feature subset, the Mann-Whitney U test is used to filter out features that do not pass the statistical test at the 20% significance level. The final stage of our predictor is the classifier, which is based on a support vector machine (SVM). A 10-fold cross-validation is applied in the performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minute HRV segments. This accuracy is comparable to that achieved by existing methods that use 30-minute HRV segments, most of which achieve an accuracy of around 80%. More importantly, our method significantly outperforms those that applied segments shorter than 30 minutes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  5. Control strategy optimization of HVAC plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Facci, Andrea Luigi; Zanfardino, Antonella; Martini, Fabrizio

    In this paper we present a methodology to optimize the operating conditions of heating, ventilation and air conditioning (HVAC) plants to achieve higher energy efficiency in use. Semi-empirical numerical models of the plant components are used to predict their performance as a function of their set-points and the environmental and occupied-space conditions. The optimization is performed through a graph-based algorithm that finds the set-points of the system components that minimize energy consumption and/or energy costs while matching the user energy demands. The resulting model can be used with systems of almost any complexity, featuring both HVAC components and energy systems, and is sufficiently fast to be applicable in a real-time setting.

  6. Convolutional neural network approach for enhanced capture of breast parenchymal complexity patterns associated with breast cancer risk

    NASA Astrophysics Data System (ADS)

    Oustimov, Andrew; Gastounioti, Aimilia; Hsieh, Meng-Kang; Pantalone, Lauren; Conant, Emily F.; Kontos, Despina

    2017-03-01

    We assess the feasibility of a parenchymal texture feature fusion approach, utilizing a convolutional neural network (ConvNet) architecture, to benefit breast cancer risk assessment. We hypothesize that, by capturing sparse, subtle interactions between localized motifs present in two-dimensional texture feature maps derived from mammographic images, a multitude of texture feature descriptors can be optimally reduced to five meta-features capable of serving as a basis on which a linear classifier, such as logistic regression, can efficiently assess breast cancer risk. We combine this methodology with our previously validated lattice-based strategy for parenchymal texture analysis and evaluate the feasibility of this approach in a case-control study with 424 digital mammograms. In a randomized split-sample setting, we optimize our framework in training/validation sets (N=300) and evaluate its discriminatory performance in an independent test set (N=124). The discriminatory capacity is assessed in terms of the area under the receiver operating characteristic (ROC) curve (AUC). The resulting meta-features exhibited strong classification capability in the test dataset (AUC = 0.90), outperforming conventional, non-fused texture analysis, which previously resulted in an AUC of 0.85 on the same case-control dataset. Our results suggest that informative interactions between localized motifs exist and can be extracted and summarized via a fairly simple ConvNet architecture.

  7. Discriminative spatial-frequency-temporal feature extraction and classification of motor imagery EEG: A sparse regression and Weighted Naïve Bayesian Classifier-based approach.

    PubMed

    Miao, Minmin; Zeng, Hong; Wang, Aimin; Zhao, Changsen; Liu, Feixiang

    2017-02-15

    Common spatial pattern (CSP) is most widely used in motor imagery based brain-computer interface (BCI) systems. In conventional CSP algorithm, pairs of the eigenvectors corresponding to both extreme eigenvalues are selected to construct the optimal spatial filter. In addition, an appropriate selection of subject-specific time segments and frequency bands plays an important role in its successful application. This study proposes to optimize spatial-frequency-temporal patterns for discriminative feature extraction. Spatial optimization is implemented by channel selection and finding discriminative spatial filters adaptively on each time-frequency segment. A novel Discernibility of Feature Sets (DFS) criteria is designed for spatial filter optimization. Besides, discriminative features located in multiple time-frequency segments are selected automatically by the proposed sparse time-frequency segment common spatial pattern (STFSCSP) method which exploits sparse regression for significant features selection. Finally, a weight determined by the sparse coefficient is assigned for each selected CSP feature and we propose a Weighted Naïve Bayesian Classifier (WNBC) for classification. Experimental results on two public EEG datasets demonstrate that optimizing spatial-frequency-temporal patterns in a data-driven manner for discriminative feature extraction greatly improves the classification performance. The proposed method gives significantly better classification accuracies in comparison with several competing methods in the literature. The proposed approach is a promising candidate for future BCI systems. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Classification Influence of Features on Given Emotions and Its Application in Feature Selection

    NASA Astrophysics Data System (ADS)

    Xing, Yin; Chen, Chuang; Liu, Li-Long

    2018-04-01

    In order to address the large amount of redundant data in high-dimensional speech emotion features, we analyze the extracted speech emotion features in depth and select the better ones. Firstly, a given emotion is classified by each feature individually. Secondly, the recognition rates are ranked in descending order. Then, the optimal feature threshold is determined by a recognition-rate criterion. Finally, the better features are obtained. When applied to the Berlin and Chinese emotional data sets, the experimental results show that the proposed feature selection method outperforms other traditional methods.

  9. Sparse Substring Pattern Set Discovery Using Linear Programming Boosting

    NASA Astrophysics Data System (ADS)

    Kashihara, Kazuaki; Hatano, Kohei; Bannai, Hideo; Takeda, Masayuki

    In this paper, we consider finding a small set of substring patterns which classifies the given documents well. We formulate the problem as a 1-norm soft margin optimization problem where each dimension corresponds to a substring pattern. Then we solve this problem by using LPBoost and an optimal substring discovery algorithm. Since the problem is a linear program, the resulting solution is likely to be sparse, which is useful for feature selection. We evaluate the proposed method on real data such as movie reviews.

  10. Chaotic particle swarm optimization with mutation for classification.

    PubMed

    Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza

    2015-01-01

    In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allow us to overcome the problem of early convergence into local minima associated with particle swarm optimization algorithms; that is, the mutation operator sharpens the convergence and tunes the best possible solution. Furthermore, to remove irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using a binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of the proposed classifier, mutation-based classifier particle swarm optimization, it is evaluated on three classification data sets, namely Wisconsin diagnostic breast cancer, Wisconsin breast cancer and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms, including k-nearest neighbor as a conventional classifier, and particle swarm classifier, genetic algorithm, and imperialist competitive algorithm classifier as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms.

  11. Single and Multiple Object Tracking Using a Multi-Feature Joint Sparse Representation.

    PubMed

    Hu, Weiming; Li, Wei; Zhang, Xiaoqin; Maybank, Stephen

    2015-04-01

    In this paper, we propose a tracking algorithm based on a multi-feature joint sparse representation. The templates for the sparse representation can include pixel values, textures, and edges. In the multi-feature joint optimization, noise or occlusion is dealt with using a set of trivial templates. A sparse weight constraint is introduced to dynamically select the relevant templates from the full set of templates. A variance ratio measure is adopted to adaptively adjust the weights of different features. The multi-feature template set is updated adaptively. We further propose an algorithm for tracking multi-objects with occlusion handling based on the multi-feature joint sparse reconstruction. The observation model based on sparse reconstruction automatically focuses on the visible parts of an occluded object by using the information in the trivial templates. The multi-object tracking is simplified into a joint Bayesian inference. The experimental results show the superiority of our algorithm over several state-of-the-art tracking algorithms.

  12. Combining functional and structural tests improves the diagnostic accuracy of relevance vector machine classifiers

    PubMed Central

    Racette, Lyne; Chiou, Christine Y.; Hao, Jiucang; Bowd, Christopher; Goldbaum, Michael H.; Zangwill, Linda M.; Lee, Te-Won; Weinreb, Robert N.; Sample, Pamela A.

    2009-01-01

    Purpose To investigate whether combining optic disc topography and short-wavelength automated perimetry (SWAP) data improves the diagnostic accuracy of relevance vector machine (RVM) classifiers for detecting glaucomatous eyes compared with using each test alone. Methods One eye of 144 glaucoma patients and 68 healthy controls from the Diagnostic Innovations in Glaucoma Study were included. RVMs were trained and tested with cross-validation on optimized (backward elimination) SWAP features (thresholds plus age; pattern deviation (PD); total deviation (TD)) and on Heidelberg Retina Tomograph II (HRT) optic disc topography features, independently and in combination. RVM performance was also compared to two HRT linear discriminant functions (LDFs) and to SWAP mean deviation (MD) and pattern standard deviation (PSD). Classifier performance was measured by the area under the receiver operating characteristic curve (AUROC) generated for each feature set and by the sensitivities at set specificities of 75%, 90%, and 96%. Results RVMs trained on combined HRT and SWAP thresholds plus age had a significantly higher AUROC (0.93) than RVMs trained on HRT (0.88) or SWAP (0.76) alone. AUROCs for the SWAP global indices (MD: 0.68; PSD: 0.72) offered no advantage over SWAP thresholds plus age, while the LDF AUROCs were significantly lower than those of RVMs trained on the combined SWAP and HRT feature set and on the HRT feature set alone. Conclusions Training RVMs on combined optimized HRT and SWAP data improved diagnostic accuracy compared with training on SWAP or HRT parameters alone. Future research may identify other combinations of tests and classifiers that can further improve diagnostic accuracy. PMID:19528827

  13. Impact of experimental design on PET radiomics in predicting somatic mutation status.

    PubMed

    Yip, Stephen S F; Parmar, Chintan; Kim, John; Huynh, Elizabeth; Mak, Raymond H; Aerts, Hugo J W L

    2017-12-01

    PET-based radiomic features have demonstrated great promise in predicting genetic data. However, various experimental parameters can influence the feature extraction pipeline and, hence, the resulting feature values. Here, we investigated how experimental settings affect the performance of radiomic features in predicting somatic mutation status in non-small cell lung cancer (NSCLC) patients. 348 NSCLC patients with somatic mutation testing and diagnostic PET images were included in our analysis. Radiomic feature extraction was analyzed for varying voxel sizes, filters, and bin widths. 66 radiomic features were evaluated. The performance of features in predicting mutation status was assessed using the area under the receiver-operating-characteristic curve (AUC). The influence of experimental parameters on feature predictability was quantified as the relative difference between the minimum and maximum AUC (δ). The large majority of features (n=56, 85%) were significantly predictive of EGFR mutation status (AUC≥0.61). 29 radiomic features significantly predicted EGFR mutations and were robust to experimental settings, with δOverall <5%. The overall influence (δOverall) of the voxel size, filter, and bin width across all features ranged from 5% to 15%. For all features, none of the experimental designs could discriminate KRAS+ from KRAS- (AUC≤0.56). The predictability of 29 radiomic features was robust to the choice of experimental settings; however, these settings need to be carefully chosen for all other features. The combined effect of the investigated processing methods can be substantial and must be considered. Optimized settings that maximize the predictive performance of individual radiomic features should be investigated in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
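
    A minimal sketch of the robustness measure follows: compute a feature's AUC under each experimental setting and take the relative difference between the minimum and maximum AUC. The normalization by the maximum AUC and the placeholder settings and feature values are assumptions, not taken from the study.

        # Sketch: per-setting AUCs for one feature and the relative min-max difference (delta); placeholder data.
        import numpy as np
        from sklearn.metrics import roc_auc_score

        mutation = np.random.randint(0, 2, 348)                 # 1 = EGFR+, 0 = EGFR- (placeholder labels)
        settings = {"voxel_1mm": np.random.rand(348),           # one radiomic feature,
                    "voxel_4mm": np.random.rand(348),           # recomputed under each setting
                    "butterworth": np.random.rand(348)}

        aucs = np.array([max(roc_auc_score(mutation, v),        # orient AUC to be >= 0.5
                             1 - roc_auc_score(mutation, v)) for v in settings.values()])
        delta = 100 * (aucs.max() - aucs.min()) / aucs.max()    # normalization by max AUC is an assumption
        print("AUC range:", aucs.round(2), "delta = %.1f%%" % delta)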

  14. Implementation of Chaotic Gaussian Particle Swarm Optimization for Optimize Learning-to-Rank Software Defect Prediction Model Construction

    NASA Astrophysics Data System (ADS)

    Buchari, M. A.; Mardiyanto, S.; Hendradjaya, B.

    2018-03-01

    Finding software defects as early as possible is the purpose of research on software defect prediction. Software defect prediction is required not only to state the existence of defects, but also to give a prioritized list of which modules require more intensive testing, so that test resources can be allocated efficiently. Learning to rank is one of the approaches that can provide defect-module ranking data for software testing purposes. In this study, we propose a meta-heuristic chaotic Gaussian particle swarm optimization to improve the accuracy of the learning-to-rank software defect prediction approach. We used 11 public benchmark data sets as experimental data. Our overall results demonstrate that the prediction models constructed using chaotic Gaussian particle swarm optimization achieve better accuracy on 5 data sets, tie on 5 data sets, and perform worse on 1 data set. Thus, we conclude that applying chaotic Gaussian particle swarm optimization in the learning-to-rank approach can improve the accuracy of the defect-module ranking on data sets that have high-dimensional features.

  15. Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection

    NASA Astrophysics Data System (ADS)

    Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline

    2014-07-01

    Feature selection is a very important aspect of machine learning. It entails the search for an optimal subset from a very large data set with a high-dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of features also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced for the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using the discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strengths of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as the filter method. As the wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNN model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
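
    The first two stages described above, wavelet decomposition and a PCA-based filter step, can be sketched as follows; the harmony search wrapper and the wavelet neural network classifier are omitted, and the wavelet family, decomposition level, summary statistics, and variance threshold are assumptions.

        # Sketch of DWT feature extraction plus a PCA filter step; placeholder EEG data.
        import numpy as np
        import pywt
        from sklearn.decomposition import PCA

        def dwt_features(signal, wavelet="db4", level=4):
            """Summary statistics of the wavelet coefficients at each decomposition level."""
            coeffs = pywt.wavedec(signal, wavelet, level=level)
            feats = []
            for c in coeffs:                   # approximation + detail coefficient arrays
                feats += [c.mean(), c.std(), np.abs(c).max(), (c**2).sum()]
            return np.array(feats)

        eeg = np.random.randn(100, 4096)       # (n_epochs, n_samples) placeholder EEG segments
        X = np.vstack([dwt_features(epoch) for epoch in eeg])

        # Filter step: keep the principal components explaining 95% of the variance
        X_reduced = PCA(n_components=0.95).fit_transform(X)
        print(X.shape, "->", X_reduced.shape)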

  16. Identifying the optimal segmentors for mass classification in mammograms

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Tomuro, Noriko; Furst, Jacob; Raicu, Daniela S.

    2015-03-01

    In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors used in a Computer-Aided Diagnosis (CADx) system that classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we applied various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. After shape features are computed from the segmented contours, the final classification model is built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from an ensemble mix of weak segmentors. For our purpose, optimal segmentors are those in the ensemble mix that contribute the most to the overall classification rather than those that produce high-precision segmentations. To measure the segmentors' contribution, we examined the weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The results showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.

  17. Voxel classification based airway tree segmentation

    NASA Astrophysics Data System (ADS)

    Lo, Pechin; de Bruijne, Marleen

    2008-03-01

    This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.

  18. Group sparse multiview patch alignment framework with view consistency for image classification.

    PubMed

    Gui, Jie; Tao, Dacheng; Sun, Zhenan; Luo, Yong; You, Xinge; Tang, Yuan Yan

    2014-07-01

    No single feature can satisfactorily characterize the semantic concepts of an image. Multiview learning aims to unify different kinds of features to produce a consensual and efficient representation. This paper redefines part optimization in the patch alignment framework (PAF) and develops a group sparse multiview patch alignment framework (GSM-PAF). The new part optimization considers not only the complementary properties of different views, but also view consistency. In particular, view consistency models the correlations between all possible combinations of any two kinds of views. In contrast to conventional dimensionality reduction algorithms that perform feature extraction and feature selection independently, GSM-PAF enjoys joint feature extraction and feature selection by exploiting the l2,1-norm on the projection matrix to achieve row sparsity, which leads to the simultaneous selection of relevant features and learning of the transformation, and thus makes the algorithm more discriminative. Experiments on two real-world image data sets demonstrate the effectiveness of GSM-PAF for image classification.

  19. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

    PubMed

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-05

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
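
    The general recipe described in this record, ranking features by random forest importance and training an SVM on the top-ranked subset, can be sketched as below. This is not the DHSpred implementation: the feature matrix, labels, the cut-off of 174 features, and the SVM settings are placeholders.

        # Sketch: random-forest importance ranking followed by an SVM; placeholder features and labels.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        X = np.random.rand(500, 600)           # e.g. nucleotide-composition and physicochemical features
        y = np.random.randint(0, 2, 500)       # 1 = DHS, 0 = non-DHS (placeholder labels)

        rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
        top = np.argsort(rf.feature_importances_)[::-1][:174]   # keep the 174 best-ranked features

        svm = SVC(kernel="rbf", C=1.0, gamma="scale")
        scores = cross_val_score(svm, X[:, top], y, cv=5, scoring="matthews_corrcoef")
        print("MCC (5-fold CV):", scores.mean().round(3))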

  20. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

    PubMed Central

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-01

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html PMID:29416743

  1. RED: a set of molecular descriptors based on Renyi entropy.

    PubMed

    Delgado-Soler, Laura; Toral, Raul; Tomás, M Santos; Rubio-Martinez, Jaime

    2009-11-01

    New molecular descriptors, RED (Renyi entropy descriptors), based on the generalized entropies introduced by Renyi are presented. Topological descriptors based on molecular features have proven to be useful for describing molecular profiles. Renyi entropy is used as a variability measure to contract a feature-pair distribution composing the descriptor vector. The performance of RED descriptors was tested for the analysis of different sets of molecular distances, virtual screening, and pharmacological profiling. A free parameter of the Renyi entropy has been optimized for all the considered applications.

  2. Chaotic Particle Swarm Optimization with Mutation for Classification

    PubMed Central

    Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza

    2015-01-01

    In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allow us to overcome the problem of early convergence into a local minimum associated with particle swarm optimization algorithms; that is, the mutation operator sharpens the convergence and tunes the best possible solution. Furthermore, to remove irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using a binary version of the proposed particle swarm optimization is introduced. To demonstrate the effectiveness of the proposed classifier, mutation-based classifier particle swarm optimization, it is tested on three classification data sets, namely Wisconsin diagnostic breast cancer, Wisconsin breast cancer, and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms, including k-nearest neighbor as a conventional classifier, and particle swarm-classifier, genetic algorithm, and imperialist competitive algorithm-classifier as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity, and Matthews correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms. PMID:25709937

  3. Urinary bladder cancer T-staging from T2-weighted MR images using an optimal biomarker approach

    NASA Astrophysics Data System (ADS)

    Wang, Chuang; Udupa, Jayaram K.; Tong, Yubing; Chen, Jerry; Venigalla, Sriram; Odhner, Dewey; Guzzo, Thomas J.; Christodouleas, John; Torigian, Drew A.

    2018-02-01

    Magnetic resonance imaging (MRI) is often used in clinical practice to stage patients with bladder cancer to help plan treatment. However, qualitative assessment of MR images is prone to inaccuracies, adversely affecting patient outcomes. In this paper, T2-weighted MR image-based quantitative features were extracted from the bladder wall in 65 patients with bladder cancer to classify them into two primary tumor (T) stage groups: group 1 - T stage < T2, with primary tumor locally confined to the bladder, and group 2 - T stage ≥ T2, with primary tumor locally extending beyond the bladder. The bladder was divided into 8 sectors in the axial plane, where each sector has a corresponding reference-standard T stage based on expert radiology qualitative MR image review and histopathologic results. The performance of the classification for correct assignment of the T stage grouping was then evaluated at both the patient level and the sector level. Each bladder sector was divided into 3 shells (inner, middle, and outer), and 15,834 features, including intensity features and texture features from local binary patterns and gray-level co-occurrence matrices, were extracted from the 3 shells of each sector. An optimal feature set was selected from all features using an optimal biomarker approach. Nine optimal biomarker features were derived based on texture properties from the middle shell, with areas under the ROC curve (AUC) at the sector and patient level of 0.813 and 0.806, respectively.

  4. Extracting physicochemical features to predict protein secondary structure.

    PubMed

    Huang, Yin-Fu; Chen, Shu-Ying

    2013-01-01

    We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features, including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure exhibits better performance.
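
    A sketch of the per-residue feature construction implied above is given below: a sliding window over the PSSM profile concatenated with scalar physicochemical values for the centre residue. The window size, zero padding, and the physicochemical table are assumptions, and the SVM training and filter refinement steps are not shown.

        # Sketch: windowed PSSM features plus per-residue physicochemical values; placeholder protein data.
        import numpy as np

        def window_features(pssm, physchem, window=13):
            """pssm: (L, 20) PSSM profile; physchem: (L, k) per-residue physicochemical properties."""
            half = window // 2
            padded = np.vstack([np.zeros((half, pssm.shape[1])), pssm,
                                np.zeros((half, pssm.shape[1]))])
            rows = []
            for i in range(pssm.shape[0]):
                win = padded[i:i + window].ravel()           # 20 * window PSSM values around residue i
                rows.append(np.concatenate([win, physchem[i]]))
            return np.array(rows)

        L = 120                                              # protein length (placeholder)
        pssm = np.random.randn(L, 20)
        physchem = np.random.rand(L, 4)                      # conformation, charge, hydrophobicity, mass
        X = window_features(pssm, physchem)
        print(X.shape)                                       # (L, 20*13 + 4)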

  5. Extracting Physicochemical Features to Predict Protein Secondary Structure

    PubMed Central

    Chen, Shu-Ying

    2013-01-01

    We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features, including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure exhibits better performance. PMID:23766688

  6. Feature selection and classification of multiparametric medical images using bagging and SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
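
    A rough sketch of the ensemble idea, bagged SVM base classifiers whose parameters are chosen by maximizing ROC AUC, is given below. The study tunes parameters on the left-out samples of the bootstrap resampling; here an ordinary cross-validated grid search stands in for that step, and the parameter grids and placeholder data are assumptions.

        # Sketch: bagged SVMs with AUC-scored grid search standing in for the paper's bootstrap-based tuning.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.ensemble import BaggingClassifier
        from sklearn.model_selection import GridSearchCV

        X = np.random.rand(200, 50)            # selected multi-parametric image features (placeholder)
        y = np.random.randint(0, 2, 200)

        bagged_svm = BaggingClassifier(SVC(kernel="rbf", probability=True),
                                       n_estimators=25, random_state=0)
        # Note: older scikit-learn versions name the inner estimator "base_estimator",
        # so the grid keys would then be "base_estimator__C" and "base_estimator__gamma".
        grid = GridSearchCV(bagged_svm,
                            {"estimator__C": [0.1, 1, 10],
                             "estimator__gamma": ["scale", 0.01, 0.1]},
                            scoring="roc_auc", cv=5)
        grid.fit(X, y)
        print("best AUC:", round(grid.best_score_, 3), grid.best_params_)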

  7. Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking.

    PubMed

    Yu, Jun; Yang, Xiaokang; Gao, Fei; Tao, Dacheng

    2017-12-01

    How do we retrieve images accurately? Also, how do we rank a group of images precisely and efficiently for specific queries? These problems are critical for researchers and engineers developing a novel image search engine. First, it is important to obtain an appropriate description that effectively represents the images. In this paper, multimodal features are considered for describing images. The images' unique properties are reflected by visual features, which are correlated to each other. However, semantic gaps always exist between images' visual features and semantics. Therefore, we utilize click features to reduce the semantic gap. The second key issue is learning an appropriate distance metric to combine these multimodal features. This paper develops a novel deep multimodal distance metric learning (Deep-MDML) method. A structured ranking model is adopted to utilize both visual and click features in distance metric learning (DML). Specifically, images and their related ranking results are first collected to form the training set. Multimodal features, including click and visual features, are collected with these images. Next, a group of autoencoders is applied to obtain an initial distance metric in the different visual spaces, and an MDML method is used to assign optimal weights to the different modalities. Then, we conduct alternating optimization to train the ranking model, which is used for ranking new queries with click features. Compared with existing image ranking methods, the proposed method adopts a new ranking model to use multimodal features, including click features and visual features, in DML. We conducted experiments to analyze the proposed Deep-MDML on two benchmark data sets, and the results validate the effectiveness of the method.

  8. Enhancement of multimodality texture-based prediction models via optimization of PET and MR image acquisition protocols: a proof of concept

    NASA Astrophysics Data System (ADS)

    Vallières, Martin; Laberge, Sébastien; Diamant, André; El Naqa, Issam

    2017-11-01

    Texture-based radiomic models constructed from medical images have the potential to support cancer treatment management via personalized assessment of tumour aggressiveness. While the identification of texture features that are stable under varying imaging settings is crucial for the translation of radiomics analysis into routine clinical practice, we hypothesize in this work that a complementary optimization of image acquisition parameters prior to texture feature extraction could enhance the predictive performance of texture-based radiomic models. As a proof of concept, we evaluated the possibility of enhancing a model constructed for the early prediction of lung metastases in soft-tissue sarcomas by optimizing PET and MR image acquisition protocols via computerized simulations of image acquisitions with varying parameters. Simulated PET images from 30 STS patients were acquired by varying the extent of axial data combined per slice ('span'). Simulated T1-weighted and T2-weighted MR images were acquired by varying the repetition time and echo time in a spin-echo pulse sequence, respectively. We analyzed the impact of the variations of PET and MR image acquisition parameters on individual textures, and we investigated how these variations could enhance the global response and the predictive properties of a texture-based model. Our results suggest that it is feasible to identify an optimal set of image acquisition parameters to improve prediction performance. The model constructed with textures extracted from simulated images acquired with a standard clinical set of acquisition parameters reached an average AUC of 0.84 ± 0.01 in bootstrap testing experiments. In comparison, the model performance significantly increased using an optimal set of image acquisition parameters (p = 0.04), with an average AUC of 0.89 ± 0.01. Ultimately, specific acquisition protocols optimized to generate superior radiomics measurements for a given clinical problem could be developed and standardized via dedicated computer simulations and thereafter validated using clinical scanners.

  9. Decision-theoretic saliency: computational principles, biological plausibility, and implications for neurophysiology and psychophysics.

    PubMed

    Gao, Dashan; Vasconcelos, Nuno

    2009-01-01

    A decision-theoretic formulation of visual saliency, first proposed for top-down processing (object recognition) (Gao & Vasconcelos, 2005a), is extended to the problem of bottom-up saliency. Under this formulation, optimality is defined in the minimum probability of error sense, under a constraint of computational parsimony. The saliency of the visual features at a given location of the visual field is defined as the power of those features to discriminate between the stimulus at the location and a null hypothesis. For bottom-up saliency, this is the set of visual features that surround the location under consideration. Discrimination is defined in an information-theoretic sense and the optimal saliency detector derived for a class of stimuli that complies with known statistical properties of natural images. It is shown that under the assumption that saliency is driven by linear filtering, the optimal detector consists of what is usually referred to as the standard architecture of V1: a cascade of linear filtering, divisive normalization, rectification, and spatial pooling. The optimal detector is also shown to replicate the fundamental properties of the psychophysics of saliency: stimulus pop-out, saliency asymmetries for stimulus presence versus absence, disregard of feature conjunctions, and Weber's law. Finally, it is shown that the optimal saliency architecture can be applied to the solution of generic inference problems. In particular, for the class of stimuli studied, it performs the three fundamental operations of statistical inference: assessment of probabilities, implementation of Bayes decision rule, and feature selection.

  10. Classification of malignant and benign liver tumors using a radiomics approach

    NASA Astrophysics Data System (ADS)

    Starmans, Martijn P. A.; Miclea, Razvan L.; van der Voort, Sebastian R.; Niessen, Wiro J.; Thomeer, Maarten G.; Klein, Stefan

    2018-03-01

    Correct diagnosis of the liver tumor phenotype is crucial for treatment planning, especially the distinction between malignant and benign lesions. Clinical practice includes manual scoring of the tumors on Magnetic Resonance (MR) images by a radiologist. As this is challenging and subjective, it is often followed by a biopsy. In this study, we propose a radiomics approach as an objective and non-invasive alternative for distinguishing between malignant and benign phenotypes. T2-weighted (T2w) MR sequences of 119 patients from multiple centers were collected. We developed an efficient semi-automatic segmentation method, which was used by a radiologist to delineate the tumors. Within these regions, features quantifying tumor shape, intensity, texture, heterogeneity, and orientation were extracted. Patient characteristics and semantic features were added for a total of 424 features. Classification was performed using Support Vector Machines (SVMs). The performance was evaluated using internal random-split cross-validation. On the training set within each iteration, feature selection and hyperparameter optimization were performed. To this end, another cross-validation was performed by splitting the training sets into training and validation parts. The optimal settings were then evaluated on the independent test sets. Manual scoring by a radiologist was also performed. The radiomics approach resulted in 95% confidence intervals of [0.75, 0.92] for the AUC, [0.76, 0.96] for specificity, and [0.52, 0.82] for sensitivity. These approach the performance of the radiologist, who achieved an AUC of 0.93, a specificity of 0.70, and a sensitivity of 0.93. Hence, radiomics has the potential to predict liver tumor benignity in an objective and non-invasive manner.
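
    The nested evaluation scheme described above can be sketched as follows: hyperparameters (and, in the study, also the selected features) are tuned inside each training split and scored on the corresponding held-out split. The splits, the parameter grid, the placeholder feature matrix, and the percentile-based confidence interval are assumptions.

        # Sketch of nested cross-validation: inner grid search, outer random splits; placeholder radiomic data.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score

        X = np.random.rand(119, 424)           # radiomic + semantic features per lesion (placeholder)
        y = np.random.randint(0, 2, 119)       # 1 = malignant, 0 = benign

        inner = GridSearchCV(SVC(kernel="rbf"),
                             {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.001]},
                             scoring="roc_auc", cv=5)
        outer = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)   # random-split CV
        aucs = cross_val_score(inner, X, y, cv=outer, scoring="roc_auc")
        print("approximate AUC 95% interval:", np.percentile(aucs, [2.5, 97.5]).round(2))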

  11. Feature Selection for Classification of Polar Regions Using a Fuzzy Expert System

    NASA Technical Reports Server (NTRS)

    Penaloza, Mauel A.; Welch, Ronald M.

    1996-01-01

    Labeling, feature selection, and the choice of classifier are critical elements for the classification of scenes and for image understanding. This study examines several methods for feature selection in polar regions, including the use of a fuzzy logic-based expert system for further refinement of a set of selected features. Six Advanced Very High Resolution Radiometer (AVHRR) Local Area Coverage (LAC) arctic scenes are classified into nine classes: water, snow/ice, ice cloud, land, thin stratus, stratus over water, cumulus over water, textured snow over water, and snow-covered mountains. Sixty-seven spectral and textural features are computed and analyzed by the feature selection algorithms. The divergence, histogram analysis, and discriminant analysis approaches are intercompared for their effectiveness in feature selection. The fuzzy expert system method is used not only to determine the effectiveness of each approach in classifying polar scenes, but also to further reduce the features into a more optimal set. For each selection method, features are ranked from best to worst, and the best half of the features are selected. Then, rules using these selected features are defined. The results of running the fuzzy expert system with these rules show that the divergence method produces the best set of features: not only does it produce the highest classification accuracy, but it also has the lowest computational requirements. A reduction of the set of features produced by the divergence method using the fuzzy expert system results in an overall classification accuracy of over 95%. However, this increase in accuracy comes at a high computational cost.

  12. FPFH-based graph matching for 3D point cloud registration

    NASA Astrophysics Data System (ADS)

    Zhao, Jiapeng; Li, Chen; Tian, Lihua; Zhu, Jihua

    2018-04-01

    Correspondence detection is a vital step in point cloud registration and can help obtain a reliable initial alignment. In this paper, we put forward an advanced point feature-based graph matching algorithm to solve the initial alignment problem of rigid 3D point cloud registration with partial overlap. Specifically, Fast Point Feature Histograms are first used to determine the initial possible correspondences. Next, a new objective function is provided to make the graph matching more suitable for partially overlapping point clouds. The objective function is optimized by the simulated annealing algorithm to obtain the final group of correct correspondences. Finally, we present a novel set partitioning method that can transform the NP-hard optimization problem into an O(n^3)-solvable one. Experiments on the Stanford and UWA public data sets indicate that our method can obtain better results in terms of both accuracy and time cost compared with other point cloud registration methods.

  13. Correlation pattern recognition: optimal parameters for quality standards control of chocolate marshmallow candy

    NASA Astrophysics Data System (ADS)

    Flores, Jorge L.; García-Torales, G.; Ponce Ávila, Cristina

    2006-08-01

    This paper describes an in situ image recognition system designed to inspect the quality standards of chocolate pops during their production. The essence of the recognition system is the localization of events (i.e., defects) in the input images that affect the quality standards of the pops. To this end, processing modules based on correlation filtering and image segmentation are employed to measure the quality standards. We therefore designed the correlation filter and defined a set of features from the correlation plane. The desired values for these parameters are obtained by exploiting information about objects to be rejected, in order to find the optimal discrimination capability of the system. Based on this set of features, a pop can be correctly classified. The efficacy of the system has been tested thoroughly under laboratory conditions using at least 50 images containing 3 different types of possible defects.

  14. Statistical Inference on Optimal Points to Evaluate Multi-State Classification Systems

    DTIC Science & Technology

    2014-09-18

    A classification system maps an event set, E = (ε1, ε2, ..., εk), to k distinct elements of a label set, L = (l1, l2, ..., lk); these partitions may be referred to as classes. Each observation is described by a set of features, F = (f1, f2, ..., fm), which are then used to assign the different elements of E to the respective labels in L.

  15. Parameter optimization of parenchymal texture analysis for prediction of false-positive recalls from screening mammography

    NASA Astrophysics Data System (ADS)

    Ray, Shonket; Keller, Brad M.; Chen, Jinbo; Conant, Emily F.; Kontos, Despina

    2016-03-01

    This work details a methodology to obtain optimal parameter values for a locally-adaptive texture analysis algorithm that extracts mammographic texture features representative of breast parenchymal complexity for predicting false-positive (FP) recalls from breast cancer screening with digital mammography. The algorithm has two components: (1) adaptive selection of localized regions of interest (ROIs) and (2) Haralick texture feature extraction via Gray-Level Co-Occurrence Matrices (GLCM). The following parameters were systematically varied: the mammographic views used, the upper limit of the ROI window size used for adaptive ROI selection, the GLCM distance offsets, and the gray levels (binning) used for feature extraction. For each parameter set, logistic regression with stepwise feature selection was performed on a clinical screening cohort of 474 non-recalled women and 68 FP-recalled women; FP recall prediction was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC), and associations between the extracted features and FP recall were assessed via odds ratios (OR). A default instance of the mediolateral oblique (MLO) view, an upper ROI size limit of 143.36 mm (2048 pixels), a GLCM distance offset combination range of 0.07 to 0.84 mm (1 to 12 pixels), and 16 GLCM gray levels was set. The highest ROC performance value of AUC=0.77 [95% confidence interval: 0.71-0.83] was obtained at three specific instances: the default instance, an upper ROI window equal to 17.92 mm (256 pixels), and gray levels set to 128. The texture feature sum average was chosen as a statistically significant (p<0.05) predictor and was associated with higher odds of FP recall for 12 out of 14 total instances.
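
    The GLCM stage of such a pipeline can be sketched with scikit-image as below (older releases spell the functions greycomatrix/greycoprops). The ROI selection, the particular offsets and gray levels, and the downstream stepwise logistic regression are not reproduced; all values shown are assumptions.

        # Sketch of GLCM texture extraction for one ROI; offsets, angles and gray levels are assumed values.
        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        roi = np.random.randint(0, 256, (64, 64), dtype=np.uint8)    # placeholder ROI

        levels = 16                                                   # gray levels (binning)
        roi_q = (roi // (256 // levels)).astype(np.uint8)             # quantize to 16 levels

        distances = [1, 2, 4, 8, 12]                                  # pixel offsets (assumed)
        angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
        glcm = graycomatrix(roi_q, distances=distances, angles=angles,
                            levels=levels, symmetric=True, normed=True)

        features = {prop: graycoprops(glcm, prop).mean()              # average over offsets and angles
                    for prop in ["contrast", "homogeneity", "energy", "correlation"]}
        print(features)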

  16. Incorporating prior knowledge induced from stochastic differential equations in the classification of stochastic observations.

    PubMed

    Zollanvari, Amin; Dougherty, Edward R

    2016-12-01

    In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical. The prior distribution is formed by mapping a set of mathematical relations among the features and labels, the prior knowledge, into a distribution governing the probability mass across the uncertainty class. In this paper, we consider prior knowledge in the form of stochastic differential equations (SDEs). We consider a vector SDE in integral form involving a drift vector and dispersion matrix. Having constructed the prior, we develop the optimal Bayesian classifier between two models and examine, via synthetic experiments, the effects of uncertainty in the drift vector and dispersion matrix. We apply the theory to a set of SDEs for the purpose of differentiating the evolutionary history between two species.

  17. Differential diagnosis of CT focal liver lesions using texture features, feature selection and ensemble driven classifiers.

    PubMed

    Mougiakakou, Stavroula G; Valavanis, Ioannis K; Nikita, Alexandra; Nikita, Konstantina S

    2007-09-01

    The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed. A number of regions of interest (ROIs) corresponding to C1-C4 were defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first-order statistics, the spatial gray level dependence matrix, the gray level difference method, Laws' texture energy measures, and fractal dimension measurements. Two different ECs were constructed and compared. The first consists of five multilayer perceptron neural networks (NNs), each using as input one of the computed texture feature sets or its reduced version after genetic algorithm-based feature selection. The second EC comprises five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN, and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was extracted using appropriate voting schemes, while bootstrap re-sampling was utilized to estimate the generalization ability of the CAD architectures based on the available, relatively small-sized data set. The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set and the weighted voting scheme. The fused feature set was obtained after appropriate feature selection applied to specific subsets of the original feature set. The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, fed with identical feature sets obtained after appropriate feature selection and fusion, may result in an accurate system able to assist differential diagnosis of focal liver lesions from non-enhanced CT images.
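
    The second ensemble described above can be approximated with a weighted soft-voting combination of a multilayer perceptron and three k-nearest neighbor classifiers; the probabilistic neural network has no direct scikit-learn counterpart and is omitted here. Feature values, voting weights, and the values of k are placeholders.

        # Sketch: weighted soft-voting ensemble on fused texture features; placeholder data and weights.
        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.ensemble import VotingClassifier
        from sklearn.model_selection import cross_val_score

        X = np.random.rand(147, 40)            # fused texture features per ROI (placeholder)
        y = np.random.randint(0, 4, 147)       # C1..C4 tissue classes

        ensemble = VotingClassifier(
            estimators=[("mlp", MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000)),
                        ("knn1", KNeighborsClassifier(n_neighbors=1)),
                        ("knn3", KNeighborsClassifier(n_neighbors=3)),
                        ("knn5", KNeighborsClassifier(n_neighbors=5))],
            voting="soft", weights=[2, 1, 1, 1])   # weighted voting scheme (weights assumed)

        print("accuracy:", cross_val_score(ensemble, X, y, cv=5).mean().round(3))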

  18. Optimizing computer-aided colonic polyp detection for CT colonography by evolving the Pareto front

    PubMed Central

    Li, Jiang; Huang, Adam; Yao, Jack; Liu, Jiamin; Van Uitert, Robert L.; Petrick, Nicholas; Summers, Ronald M.

    2009-01-01

    A multiobjective genetic algorithm is designed to optimize a computer-aided detection (CAD) system for identifying colonic polyps. Colonic polyps appear as elliptical protrusions on the inner surface of the colon. Curvature-based features for colonic polyp detection have proved successful in several CT colonography (CTC) CAD systems. Our CTC CAD program uses a sequential classifier to form initial polyp detections on the colon surface. The classifier utilizes a set of thresholds on curvature-based features to cluster suspicious colon surface regions into polyp candidates. The thresholds were previously chosen experimentally using feature histograms. The chosen thresholds were effective for detecting polyps 10 mm or larger in diameter. However, many medium-sized polyps, 6-9 mm in diameter, were missed in the initial detection procedure. In this paper, we formulated the task of finding optimal thresholds as a multiobjective optimization problem and utilized a genetic algorithm to solve it by evolving the Pareto front of the Pareto-optimal set. The new CTC CAD system was tested on 792 patients. The sensitivities of the optimized system improved significantly, from 61.68% to 74.71%, an increase of 13.03% (95% CI [6.57%, 19.5%], p=7.78×10−5), for the size category of 6-9 mm polyps; from 65.02% to 77.4%, an increase of 12.38% (95% CI [6.23%, 18.53%], p=7.95×10−5), for polyps 6 mm or larger; and from 82.2% to 90.58%, an increase of 8.38% (95% CI [0.75%, 16%], p=0.03), for polyps 8 mm or larger, at comparable false positive rates. The sensitivities of the optimized system are nearly equivalent to those of expert radiologists. PMID:19235388
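
    The Pareto-front bookkeeping at the core of such a method can be sketched as follows: given candidate threshold settings scored by sensitivity and false positives per scan, keep the non-dominated ones. The genetic search that evolves this front and the curvature-based CAD features themselves are not reproduced, and the candidate scores are placeholders.

        # Sketch: extract the non-dominated (Pareto-optimal) threshold settings; placeholder candidate scores.
        import numpy as np

        def pareto_front(scores):
            """scores[:, 0] = sensitivity (maximize), scores[:, 1] = false positives per scan (minimize)."""
            keep = []
            for i, (sens_i, fp_i) in enumerate(scores):
                dominated = any((sens_j >= sens_i) and (fp_j <= fp_i) and
                                ((sens_j > sens_i) or (fp_j < fp_i))
                                for j, (sens_j, fp_j) in enumerate(scores) if j != i)
                if not dominated:
                    keep.append(i)
            return np.array(keep)

        candidates = np.array([[0.62, 2.1], [0.75, 4.8], [0.70, 3.0],
                               [0.68, 3.2], [0.80, 7.5], [0.74, 4.9]])
        front = pareto_front(candidates)
        print("non-dominated threshold settings:", front, candidates[front])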

  19. Improvement to the scanning electron microscope image adaptive Canny optimization colorization by pseudo-mapping.

    PubMed

    Lo, T Y; Sim, K S; Tso, C P; Nia, M E

    2014-01-01

    An improvement to the previously proposed adaptive Canny optimization technique for scanning electron microscope image colorization is reported. The additional feature, called the pseudo-mapping technique, temporarily maps grayscale markings to a set of pre-defined pseudo-colors as a means to instill color information for grayscale colors in the chrominance channels. This allows the presence of grayscale markings to be identified; hence, optimized colorization of grayscale colors is made possible. This additional feature enhances the flexibility of scanning electron microscope image colorization by providing a wider range of possible color enhancement. Furthermore, the nature of this technique also allows users to adjust the luminance intensities of a selected region from the original image within a certain extent. © 2014 Wiley Periodicals, Inc.

  20. Phase-space interference in extensive and nonextensive quantum heat engines

    NASA Astrophysics Data System (ADS)

    Hardal, Ali Ü. C.; Paternostro, Mauro; Müstecaplıoǧlu, Özgür E.

    2018-04-01

    Quantum interference is at the heart of what sets the quantum and classical worlds apart. We demonstrate that quantum interference effects involving a many-body working medium are responsible for genuinely nonclassical features in the performance of a quantum heat engine. The way in which quantum interference manifests itself in the work output of the engine depends strongly on the extensive nature of the working medium. While identifying the class of work substances that optimize the performance of the engine, our results shed light on the optimal size of such media of quantum workers for maximizing the work output and efficiency of quantum energy machines.

  1. Web-based newborn screening system for metabolic diseases: machine learning versus clinicians.

    PubMed

    Chen, Wei-Hsin; Hsieh, Sheau-Ling; Hsu, Kai-Ping; Chen, Han-Ping; Su, Xing-Yu; Tseng, Yi-Ju; Chien, Yin-Hsiu; Hwu, Wuh-Liang; Lai, Feipei

    2013-05-23

    A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. The datasets collected in 2011 were used as the prediction cases. The feature selection strategies were implemented and the optimal markers for PKU, hypermethioninemia, and 3-MCC deficiency were obtained. The results of the machine learning approach were compared with the cutoff scheme. The number of false positive cases was reduced from 21 to 2 for PKU, from 30 to 10 for hypermethioninemia, and from 209 to 46 for 3-MCC deficiency. This SOA Web service-based newborn screening system can accelerate screening procedures effectively and efficiently. An SVM learning methodology for classification of the PKU, hypermethioninemia, and 3-MCC deficiency metabolic diseases, including optimal feature selection strategies, is presented. By adopting the results of this study, the number of suspected cases could be reduced dramatically.
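
    The feature-selection and classification stage described above can be sketched as below: rank the analytes by an F score, keep the top 20, and cross-validate an SVM. scikit-learn's ANOVA f_classif stands in for the F score used in the study, and the data and labels are placeholders for the dried-blood-spot analytes; the HL7/SOA system itself is not represented.

        # Sketch: F-score feature ranking plus an SVM with 5-fold cross-validation; placeholder analyte data.
        import numpy as np
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.svm import SVC
        from sklearn.pipeline import Pipeline
        from sklearn.model_selection import cross_val_score

        X = np.random.rand(1777, 35)           # 35 analytes per newborn (placeholder values)
        y = np.random.randint(0, 2, 1777)      # 1 = confirmed case, 0 = suspected only

        pipe = Pipeline([("select", SelectKBest(f_classif, k=20)),
                         ("svm", SVC(kernel="rbf", class_weight="balanced"))])
        scores = cross_val_score(pipe, X, y, cv=5, scoring="recall")
        print("5-fold sensitivity:", scores.mean().round(3))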

  2. Web-Based Newborn Screening System for Metabolic Diseases: Machine Learning Versus Clinicians

    PubMed Central

    Chen, Wei-Hsin; Hsu, Kai-Ping; Chen, Han-Ping; Su, Xing-Yu; Tseng, Yi-Ju; Chien, Yin-Hsiu; Hwu, Wuh-Liang; Lai, Feipei

    2013-01-01

    Background A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. Objective The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. Methods The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. The datasets collected in 2011 were used as the prediction cases. Results The feature selection strategies were implemented and the optimal markers for PKU, hypermethioninemia, and 3-MCC deficiency were obtained. The results of the machine learning approach were compared with the cutoff scheme. The number of false positive cases was reduced from 21 to 2 for PKU, from 30 to 10 for hypermethioninemia, and from 209 to 46 for 3-MCC deficiency. Conclusions This SOA Web service–based newborn screening system can accelerate screening procedures effectively and efficiently. An SVM learning methodology for classification of the PKU, hypermethioninemia, and 3-MCC deficiency metabolic diseases, including optimal feature selection strategies, is presented. By adopting the results of this study, the number of suspected cases could be reduced dramatically. PMID:23702487

  3. Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment

    PubMed Central

    DeBlasio, Dan

    2013-01-01

    Abstract We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for “feature-based accuracy estimator”), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality. PMID:23489379

  4. Quantifying and visualizing variations in sets of images using continuous linear optimal transport

    NASA Astrophysics Data System (ADS)

    Kolouri, Soheil; Rohde, Gustavo K.

    2014-03-01

    Modern advancements in imaging devices have enabled us to explore the subcellular structure of living organisms and extract vast amounts of information. However, interpreting the biological information mined in the captured images is not a trivial task. Utilizing predetermined numerical features is usually the only hope for quantifying this information. Nonetheless, direct visual or biological interpretation of results obtained from these selected features is non-intuitive and difficult. In this paper, we describe an automatic method for modeling visual variations in a set of images, which allows for direct visual interpretation of the most significant differences, without the need for predefined features. The method is based on a linearized version of the continuous optimal transport (OT) metric, which provides a natural linear embedding for the image data set, in which linear combination of images leads to a visually meaningful image. This enables us to apply linear geometric data analysis techniques such as principal component analysis and linear discriminant analysis in the linearly embedded space and visualize the most prominent modes, as well as the most discriminant modes of variations, in the dataset. Using the continuous OT framework, we are able to analyze variations in shape and texture in a set of images utilizing each image at full resolution, that otherwise cannot be done by existing methods. The proposed method is applied to a set of nuclei images segmented from Feulgen stained liver tissues in order to investigate the major visual differences in chromatin distribution of Fetal-Type Hepatoblastoma (FHB) cells compared to the normal cells.

  5. Integrated feature extraction and selection for neuroimage classification

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Shen, Dinggang

    2009-02-01

    Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.

  6. Attention-based image similarity measure with application to content-based information retrieval

    NASA Astrophysics Data System (ADS)

    Stentiford, Fred W. M.

    2003-01-01

    Whilst storage and capture technologies are able to cope with huge numbers of images, image retrieval is in danger of rendering many repositories valueless because of the difficulty of access. This paper proposes a similarity measure that imposes only very weak assumptions on the nature of the features used in the recognition process. This approach does not make use of a pre-defined set of feature measurements which are extracted from a query image and used to match those from database images, but instead generates features on a trial and error basis during the calculation of the similarity measure. This has the significant advantage that features that determine similarity can match whatever image property is important in a particular region whether it be a shape, a texture, a colour or a combination of all three. It means that effort is expended searching for the best feature for the region rather than expecting that a fixed feature set will perform optimally over the whole area of an image and over every image in a database. The similarity measure is evaluated on a problem of distinguishing similar shapes in sets of black and white symbols.

  7. Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.

    PubMed

    Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing

    2015-01-01

    Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of an SVM classifier depend strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset, the Wisconsin Diagnostic Breast Cancer (WDBC) set, revealed how the accuracy of an SVM classifier varies with the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called SVM-RFE(C), with a correlation filter, was implemented to quantify feature and parameter information based on cross validation. The proposed algorithm is first shown to outperform SVM-RFE on WDBC. Then the best accuracy of 94.643% and sensitivity of 94.595% were achieved when using SVM-RFE(C) to test 57 new PAT data from 19 patients. The experimental results show that the classifier constructed with the SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.
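
    As a concrete, simplified illustration of the SVM-RFE stage, the sketch below ranks and prunes features of the WDBC data with a linear-kernel SVM and then scores the reduced set with an RBF-kernel SVM; the correlation filter and tuning ensemble of SVM-RFE(C) are not reproduced here.

        # Minimal SVM-RFE sketch on WDBC (linear kernel for the elimination step).
        from sklearn.datasets import load_breast_cancer
        from sklearn.feature_selection import RFE
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = load_breast_cancer(return_X_y=True)          # the WDBC dataset mentioned above
        selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=10, step=1).fit(X, y)
        X_sel = selector.transform(X)                        # keep only the surviving features

        acc = cross_val_score(SVC(kernel="rbf"), X_sel, y, cv=5).mean()
        print("selected feature indices:", selector.support_.nonzero()[0])
        print("cross-validated accuracy:", round(acc, 3))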

  8. a Fully Automated Pipeline for Classification Tasks with AN Application to Remote Sensing

    NASA Astrophysics Data System (ADS)

    Suzuki, K.; Claesen, M.; Takeda, H.; De Moor, B.

    2016-06-01

    Deep learning has recently been in the spotlight owing to its victories at major competitions, pushing `shallow' machine learning methods, the comparatively simple and convenient algorithms commonly used by industrial engineers, into the background despite advantages such as the small amount of time and data they require for training. Taking a practical point of view, we use shallow learning algorithms to construct a learning pipeline that lets operators apply machine learning without special expertise, an expensive computation environment, or a large amount of labelled data. The proposed pipeline automates the whole classification process: feature selection, feature weighting, and the choice of the most suitable classifier with optimized hyperparameters. The configuration relies on particle swarm optimization, a well-known metaheuristic that is generally fast and accurate, which lets us not only optimize (hyper)parameters but also determine the features and classifier appropriate to the problem, choices that have conventionally been made a priori from domain knowledge or handled with naive procedures such as grid search. In experiments on the MNIST and CIFAR-10 datasets, common computer vision benchmarks for character and object recognition respectively, the automated approach achieves high performance given its simple, dataset-agnostic setting, small amount of training data, and practical training time. Moreover, unlike deep learning, the performance remains robust with almost no modification even on a remote sensing object recognition problem, which suggests that the approach can contribute to general classification problems.
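
    To make the role of particle swarm optimization concrete, the sketch below runs a tiny PSO over the (log10 C, log10 gamma) hyperparameters of an RBF SVM, scored by cross-validated accuracy. It is a much reduced stand-in for the pipeline (which also searches over features and classifiers); the swarm size, inertia, and acceleration constants are arbitrary illustrative choices.

        # Toy PSO over two SVM hyperparameters, maximizing cross-validated accuracy.
        import numpy as np
        from sklearn.datasets import load_digits
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = load_digits(return_X_y=True)
        X, y = X[:500], y[:500]                               # small subset keeps the demo fast
        rng = np.random.RandomState(0)

        def fitness(p):                                       # mean CV accuracy of one particle
            return cross_val_score(SVC(C=10.0 ** p[0], gamma=10.0 ** p[1]), X, y, cv=3).mean()

        n_particles, dims, iters = 8, 2, 10
        pos = rng.uniform(-3, 3, (n_particles, dims))
        vel = np.zeros((n_particles, dims))
        pbest = pos.copy()
        pbest_val = np.array([fitness(p) for p in pos])
        gbest = pbest[pbest_val.argmax()].copy()

        for _ in range(iters):
            r1, r2 = rng.rand(n_particles, dims), rng.rand(n_particles, dims)
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, -3, 3)
            vals = np.array([fitness(p) for p in pos])
            better = vals > pbest_val
            pbest[better], pbest_val[better] = pos[better], vals[better]
            gbest = pbest[pbest_val.argmax()].copy()

        print("best (log10 C, log10 gamma):", gbest)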

  9. Dynamic detection-rate-based bit allocation with genuine interval concealment for binary biometric representation.

    PubMed

    Lim, Meng-Hui; Teoh, Andrew Beng Jin; Toh, Kar-Ann

    2013-06-01

    Biometric discretization is a key component in biometric cryptographic key generation. It converts an extracted biometric feature vector into a binary string via typical steps such as segmentation of each feature element into a number of labeled intervals, mapping of each interval-captured feature element onto a binary space, and concatenation of the resulting binary output of all feature elements into a binary string. Currently, the detection rate optimized bit allocation (DROBA) scheme is one of the most effective biometric discretization schemes in terms of its capability to assign binary bits dynamically to user-specific features with respect to their discriminability. However, we learn that DROBA suffers from potential discriminative feature misdetection and underdiscretization in its bit allocation process. This paper highlights such drawbacks and improves upon DROBA based on a novel two-stage algorithm: 1) a dynamic search method to efficiently recapture such misdetected features and to optimize the bit allocation of underdiscretized features and 2) a genuine interval concealment technique to alleviate crucial information leakage resulting from the dynamic search. Improvements in classification accuracy on two popular face data sets vindicate the feasibility of our approach compared with DROBA.

  10. Research on Optimization of GLCM Parameter in Cell Classification

    NASA Astrophysics Data System (ADS)

    Zhang, Xi-Kun; Hou, Jie; Hu, Xin-Hua

    2016-05-01

    Real-time classification of biological cells according to their 3D morphology is highly desired in a flow cytometer setting. A gray level co-occurrence matrix (GLCM) algorithm has been developed to extract feature parameters from measured diffraction images, but its heavy computational load makes it difficult to integrate into a real-time system. An optimization of the GLCM algorithm is provided here, based on correlation analysis of the GLCM parameters. The results of GLCM analysis and subsequent classification demonstrate that the optimized method lowers the time complexity significantly without loss of classification accuracy.
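
    For readers unfamiliar with GLCM features, the sketch below extracts a few standard GLCM properties with scikit-image; the random array stands in for a diffraction image, and the distance/angle settings are illustrative rather than the optimized parameters studied in the paper (scikit-image releases before 0.19 spell the functions greycomatrix/greycoprops).

        # GLCM texture features from a grayscale image with scikit-image.
        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        image = (np.random.rand(64, 64) * 255).astype(np.uint8)   # stand-in for a diffraction image
        glcm = graycomatrix(image, distances=[1, 2], angles=[0, np.pi / 2],
                            levels=256, symmetric=True, normed=True)
        features = {prop: graycoprops(glcm, prop).ravel()
                    for prop in ("contrast", "correlation", "energy", "homogeneity")}
        print(features)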

  11. Optimized extreme learning machine for urban land cover classification using hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Su, Hongjun; Tian, Shufang; Cai, Yue; Sheng, Yehua; Chen, Chen; Najafian, Maryam

    2017-12-01

    This work presents a new urban land cover classification framework using the firefly algorithm (FA) optimized extreme learning machine (ELM). FA is adopted to optimize the regularization coefficient C and the Gaussian kernel σ for the kernel ELM. Additionally, the effectiveness of spectral features derived from an FA-based band selection algorithm is studied for the proposed classification task. Three hyperspectral data sets were recorded using different sensors, namely HYDICE, HyMap, and AVIRIS. Our study shows that the proposed method outperforms traditional classification algorithms such as SVM and reduces computational cost significantly.
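
    The two quantities tuned by the firefly algorithm, the regularization coefficient C and the Gaussian kernel width σ, enter the kernel ELM in closed form. The sketch below shows that closed form with fixed values; the FA search itself is omitted and the function names are illustrative, not the authors' implementation.

        # Kernel extreme learning machine: beta = (K + I/C)^-1 Y, prediction by K(test, train) beta.
        import numpy as np

        def rbf_kernel(A, B, sigma):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * sigma ** 2))

        def kelm_fit(X, y, C=10.0, sigma=1.0):
            Y = np.eye(int(y.max()) + 1)[y]                    # one-hot targets, labels 0..k-1
            K = rbf_kernel(X, X, sigma)
            return np.linalg.solve(K + np.eye(len(X)) / C, Y)  # output weights beta

        def kelm_predict(X_train, beta, X_test, sigma=1.0):
            return rbf_kernel(X_test, X_train, sigma).dot(beta).argmax(axis=1)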

  12. Basis set construction for molecular electronic structure theory: natural orbital and Gauss-Slater basis for smooth pseudopotentials.

    PubMed

    Petruzielo, F R; Toulouse, Julien; Umrigar, C J

    2011-02-14

    A simple yet general method for constructing basis sets for molecular electronic structure calculations is presented. These basis sets consist of atomic natural orbitals from a multiconfigurational self-consistent field calculation supplemented with primitive functions, chosen such that the asymptotics are appropriate for the potential of the system. Primitives are optimized for the homonuclear diatomic molecule to produce a balanced basis set. Two general features that facilitate this basis construction are demonstrated. First, weak coupling exists between the optimal exponents of primitives with different angular momenta. Second, the optimal primitive exponents for a chosen system depend weakly on the particular level of theory employed for optimization. The explicit case considered here is a basis set appropriate for the Burkatzki-Filippi-Dolg pseudopotentials. Since these pseudopotentials are finite at nuclei and have a Coulomb tail, the recently proposed Gauss-Slater functions are the appropriate primitives. Double- and triple-zeta bases are developed for elements hydrogen through argon. These new bases offer significant gains over the corresponding Burkatzki-Filippi-Dolg bases at various levels of theory. Using a Gaussian expansion of the basis functions, these bases can be employed in any electronic structure method. Quantum Monte Carlo provides an added benefit: expansions are unnecessary since the integrals are evaluated numerically.

  13. A random forest model based classification scheme for neonatal amplitude-integrated EEG.

    PubMed

    Chen, Weiting; Wang, Yu; Cao, Guitao; Chen, Guoqiang; Gu, Qiufang

    2014-01-01

    Modern medical advances have greatly increased the survival rate of infants, yet they remain in a higher-risk group for neurological problems later in life. For infants with encephalopathy or seizures, identification of the extent of brain injury is clinically challenging. Continuous amplitude-integrated electroencephalography (aEEG) monitoring offers a possibility to directly monitor the brain functional state of newborns over hours, and has seen increasing application in neonatal intensive care units (NICUs). This paper presents a novel combined feature set for aEEG and applies the random forest (RF) method to classify aEEG tracings. To that end, a series of experiments were conducted on 282 aEEG tracing cases (209 normal and 73 abnormal ones). Basic features, statistic features and segmentation features were extracted from both the tracing as a whole and the segmented recordings, and then combined into a single feature set. All the features were sent to a classifier afterwards. The significance of features, the data segmentation, the optimization of RF parameters, and the problem of imbalanced datasets were examined through experiments. Experiments were also done to evaluate the performance of RF on aEEG signal classification, compared with several other widely used classifiers including SVM-Linear, SVM-RBF, ANN, Decision Tree (DT), Logistic Regression (LR), ML, and LDA. The combined feature set characterizes aEEG signals better than the basic features, statistic features and segmentation features do individually. With the combined feature set, the proposed RF-based aEEG classification system achieved a correct rate of 92.52% and a high F1-score of 95.26%. Among all of the seven classifiers examined in our work, the RF method got the highest correct rate, sensitivity, specificity, and F1-score, which means that RF outperforms all of the other classifiers considered here. The results show that the proposed RF-based aEEG classification system with the combined feature set is efficient and helpful for better detecting brain disorders in newborns.
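
    The final classification stage can be sketched compactly. The snippet below trains a random forest on a combined feature matrix and reports cross-validated accuracy and F1; the variable names are placeholders, and class_weight="balanced" is one simple way to address the 209/73 imbalance noted above (not necessarily the authors' choice).

        # Random forest on the combined aEEG feature set with 10-fold cross-validation.
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_validate

        def evaluate_rf(X_combined, y):
            # X_combined: (n_tracings, n_features) basic + statistic + segmentation features
            # y: 0 = normal tracing, 1 = abnormal tracing
            rf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
            scores = cross_validate(rf, X_combined, y, cv=10,
                                    scoring=("accuracy", "f1", "recall", "precision"))
            return {name: values.mean() for name, values in scores.items() if name.startswith("test_")}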

  14. Optimizing Radiometric Processing and Feature Extraction of Drone Based Hyperspectral Frame Format Imagery for Estimation of Yield Quantity and Quality of a Grass Sward

    NASA Astrophysics Data System (ADS)

    Näsi, R.; Viljanen, N.; Oliveira, R.; Kaivosoja, J.; Niemeläinen, O.; Hakala, T.; Markelin, L.; Nezami, S.; Suomalainen, J.; Honkavaara, E.

    2018-04-01

    Light-weight 2D format hyperspectral imagers operable from unmanned aerial vehicles (UAV) have become common in various remote sensing tasks in recent years. Using these technologies, the area of interest is covered by multiple overlapping hypercubes, in other words multiview hyperspectral photogrammetric imagery, and each object point appears in many, even tens of, individual hypercubes. The common practice is to calculate hyperspectral orthomosaics utilizing only the most nadir areas of the images. However, the redundancy of the data offers the potential for much more versatile and thorough feature extraction. We investigated various options for extracting spectral features in the grass sward quantity evaluation task. In addition to the various sets of spectral features, we used photogrammetry-based ultra-high density point clouds to extract features describing the canopy 3D structure. A machine learning technique based on the Random Forest algorithm was used to estimate the fresh biomass. Results showed high accuracies for all investigated feature sets. The estimation results using multiview data were approximately 10 % better than those using the most nadir orthophotos. The utilization of the photogrammetric 3D features improved estimation accuracy by approximately 40 % compared to approaches where only spectral features were applied. The best estimation RMSE of 239 kg/ha (6.0 %) was obtained with the multiview anisotropy corrected data set and the 3D features.
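
    A minimal version of the estimation step, assuming the spectral and 3D canopy features have already been assembled into one matrix (the variable names below are placeholders, not the authors' code), is a Random Forest regression scored by cross-validated RMSE.

        # Fresh-biomass regression with a random forest and cross-validated RMSE.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import cross_val_predict

        def biomass_rmse(X_spectral_3d, biomass_kg_ha):
            rf = RandomForestRegressor(n_estimators=500, random_state=0)
            pred = cross_val_predict(rf, X_spectral_3d, biomass_kg_ha, cv=5)
            return float(np.sqrt(np.mean((pred - biomass_kg_ha) ** 2)))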

  15. L1-norm kernel discriminant analysis via Bayes error bound optimization for robust feature extraction.

    PubMed

    Zheng, Wenming; Lin, Zhouchen; Wang, Haixian

    2014-04-01

    A novel discriminant analysis criterion is derived in this paper under the theoretical framework of Bayes optimality. In contrast to the conventional Fisher's discriminant criterion, the major novelty of the proposed one is the use of the L1 norm rather than the L2 norm, which makes it less sensitive to outliers. With the L1-norm discriminant criterion, we propose a new linear discriminant analysis (L1-LDA) method for the linear feature extraction problem. To solve the L1-LDA optimization problem, we propose an efficient iterative algorithm, in which a novel surrogate convex function is introduced such that the optimization problem in each iteration reduces to a convex programming problem with a guaranteed closed-form solution. Moreover, we also generalize the L1-LDA method to deal with nonlinear robust feature extraction problems via the kernel trick, and hence propose the L1-norm kernel discriminant analysis (L1-KDA) method. Extensive experiments on simulated and real data sets are conducted to evaluate the effectiveness of the proposed method in comparison with state-of-the-art methods.

  16. Hash Bit Selection for Nearest Neighbor Search.

    PubMed

    Xianglong Liu; Junfeng He; Shih-Fu Chang

    2017-11-01

    To overcome the barrier of storage and computation when dealing with gigantic-scale data sets, compact hashing has been studied extensively to approximate the nearest neighbor search. Despite the recent advances, critical design issues remain open in how to select the right features, hashing algorithms, and/or parameter settings. In this paper, we address these by posing an optimal hash bit selection problem, in which an optimal subset of hash bits is selected from a pool of candidate bits generated by different features, algorithms, or parameters. Inspired by the optimization criteria used in existing hashing algorithms, we adopt bit reliability and complementarity as the selection criteria, which can be carefully tailored to hashing performance in different tasks. Then, the bit selection solution is discovered by finding the best tradeoff between search accuracy and time using a modified dynamic programming method. To further reduce the computational complexity, we employ the pairwise relationship among hash bits to approximate the high-order independence property, and formulate it as an efficient quadratic programming method that is theoretically equivalent to the normalized dominant set problem in a vertex- and edge-weighted graph. Extensive large-scale experiments have been conducted under several important application scenarios of hash techniques, where our bit selection framework achieves superior performance over both naive selection methods and state-of-the-art hashing algorithms, with significant relative accuracy gains ranging from 10% to 50%.

  17. Combining feature extraction and classification for fNIRS BCIs by regularized least squares optimization.

    PubMed

    Heger, Dominic; Herff, Christian; Schultz, Tanja

    2014-01-01

    In this paper, we show that multiple operations of the typical pattern recognition chain of an fNIRS-based BCI, including feature extraction and classification, can be unified by solving a convex optimization problem. We formulate a regularized least squares problem that learns a single affine transformation of raw HbO(2) and HbR signals. We show that this transformation can achieve competitive results in an fNIRS BCI classification task, as it significantly improves recognition of different levels of workload over previously published results on a publicly available n-back data set. Furthermore, we visualize the learned models and analyze their spatio-temporal characteristics.
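
    The core idea, a single affine transform of the raw signals learned by regularized least squares, can be written in a few lines. The sketch below is a generic ridge-style version with binary labels in {-1, +1}; it is not the authors' exact formulation or regularizer.

        # Regularized least squares for an affine map from raw signals to class targets.
        import numpy as np

        def fit_affine_rls(X, y, lam=1.0):
            # X: (n_trials, n_features) concatenated HbO2/HbR samples, y: labels in {-1, +1}
            Xa = np.hstack([X, np.ones((len(X), 1))])          # bias column makes the map affine
            w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(Xa.shape[1]), Xa.T @ y)
            return w

        def predict_affine_rls(X, w):
            Xa = np.hstack([X, np.ones((len(X), 1))])
            return np.sign(Xa @ w)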

  18. Ordinal feature selection for iris and palmprint recognition.

    PubMed

    Sun, Zhenan; Wang, Libin; Tan, Tieniu

    2014-09-01

    Ordinal measures have been demonstrated to be an effective feature representation model for iris and palmprint recognition. However, ordinal measures are a general concept of image analysis, and numerous variants with different parameter settings, such as location, scale, orientation, and so on, can be derived to construct a huge feature space. This paper proposes a novel optimization formulation for ordinal feature selection with successful applications to both iris and palmprint recognition. The objective function of the proposed feature selection method has two parts, i.e., the misclassification error of intra- and interclass matching samples and the weighted sparsity of ordinal feature descriptors. The feature selection therefore aims to achieve an accurate and sparse representation of ordinal measures. The optimization is subject to a number of linear inequality constraints, which require that all intra- and interclass matching pairs be well separated with a large margin. Ordinal feature selection is formulated as a linear programming (LP) problem so that a solution can be efficiently obtained even on a large-scale feature pool and training database. Extensive experimental results demonstrate that the proposed LP formulation is advantageous over existing feature selection methods, such as mRMR, ReliefF, Boosting, and Lasso for biometric recognition, reporting state-of-the-art accuracy on the CASIA and PolyU databases.

  19. Optimizing spatial patterns with sparse filter bands for motor-imagery based brain-computer interface.

    PubMed

    Zhang, Yu; Zhou, Guoxu; Jin, Jing; Wang, Xingyu; Cichocki, Andrzej

    2015-11-30

    Common spatial pattern (CSP) has been most popularly applied to motor-imagery (MI) feature extraction for classification in brain-computer interface (BCI) applications. Successful application of CSP depends to a large degree on the filter band selection. However, the most proper band is typically subject-specific and can hardly be determined manually. This study proposes a sparse filter band common spatial pattern (SFBCSP) for optimizing the spatial patterns. SFBCSP estimates CSP features on multiple signals that are filtered from raw EEG data at a set of overlapping bands. The filter bands that result in significant CSP features are then selected in a supervised way by exploiting sparse regression. A support vector machine (SVM) is implemented on the selected features for MI classification. Two public EEG datasets (BCI Competition III dataset IVa and BCI Competition IV IIb) are used to validate the proposed SFBCSP method. Experimental results demonstrate that SFBCSP helps improve the classification performance of MI. The spatial patterns optimized by SFBCSP give overall better MI classification accuracy in comparison with several competing methods. The proposed SFBCSP is a potential method for improving the performance of MI-based BCI. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Quantitative analysis of adipose tissue on chest CT to predict primary graft dysfunction in lung transplant recipients: a novel optimal biomarker approach

    NASA Astrophysics Data System (ADS)

    Tong, Yubing; Udupa, Jayaram K.; Wang, Chuang; Wu, Caiyun; Pednekar, Gargi; Restivo, Michaela D.; Lederer, David J.; Christie, Jason D.; Torigian, Drew A.

    2018-02-01

    In this study, patients who underwent lung transplantation are categorized into two groups of successful (positive) or failed (negative) transplantations according to primary graft dysfunction (PGD), i.e., acute lung injury within 72 hours of lung transplantation. Obesity or being underweight is associated with an increased risk of PGD. Adipose quantification and characterization via computed tomography (CT) imaging is an evolving topic of interest. However, very little research on PGD prediction using adipose quantity or characteristics derived from medical images has been performed. The aim of this study is to explore image-based features of thoracic adipose tissue on pre-operative chest CT to distinguish between the above two groups of patients. 140 unenhanced chest CT images from three lung transplant centers (Columbia, Penn, and Duke) are included in this study. 124 patients are in the successful group and 16 in the failure group. Chest CT slices at the T7 and T8 vertebral levels are captured to represent the thoracic fat burden by using a standardized anatomic space (SAS) approach. Fat (subcutaneous adipose tissue (SAT)/visceral adipose tissue (VAT)) intensity and texture properties (1142 in total) for each patient are collected, and then an optimal feature set is selected to maximize feature independence and separation between the two groups. Leave-one-out and leave-ten-out cross-validation strategies are adopted to test the prediction ability based on the selected features, all of which came from VAT texture properties. Accuracy of prediction (ACC), sensitivity (SEN), specificity (SPE), and area under the curve (AUC) of 0.87/0.97, 0.87/0.97, 0.88/1.00, and 0.88/0.99, respectively, are achieved by the method. The optimal feature set includes only 5 features (also all from VAT), which suggests that thoracic VAT may play a more important role than SAT in predicting PGD in lung transplant recipients.

  1. Learning relevant features of data with multi-scale tensor networks

    NASA Astrophysics Data System (ADS)

    Miles Stoudenmire, E.

    2018-07-01

    Inspired by coarse-graining approaches used in physics, we show how similar algorithms can be adapted for data. The resulting algorithms are based on layered tree tensor networks and scale linearly with both the dimension of the input and the training set size. Computing most of the layers with an unsupervised algorithm, then optimizing just the top layer for supervised classification of the MNIST and fashion MNIST data sets gives very good results. We also discuss mixing a prior guess for supervised weights together with an unsupervised representation of the data, yielding a smaller number of features nevertheless able to give good performance.

  2. Insights into multimodal imaging classification of ADHD

    PubMed Central

    Colby, John B.; Rudie, Jeffrey D.; Brown, Jesse A.; Douglas, Pamela K.; Cohen, Mark S.; Shehzad, Zarrar

    2012-01-01

    Attention deficit hyperactivity disorder (ADHD) currently is diagnosed in children by clinicians via subjective ADHD-specific behavioral instruments and by reports from the parents and teachers. Considering its high prevalence and large economic and societal costs, a quantitative tool that aids in diagnosis by characterizing underlying neurobiology would be extremely valuable. This provided motivation for the ADHD-200 machine learning (ML) competition, a multisite collaborative effort to investigate imaging classifiers for ADHD. Here we present our ML approach, which used structural and functional magnetic resonance imaging data, combined with demographic information, to predict diagnostic status of individuals with ADHD from typically developing (TD) children across eight different research sites. Structural features included quantitative metrics from 113 cortical and non-cortical regions. Functional features included Pearson correlation functional connectivity matrices, nodal and global graph theoretical measures, nodal power spectra, voxelwise global connectivity, and voxelwise regional homogeneity. We performed feature ranking for each site and modality using the multiple support vector machine recursive feature elimination (SVM-RFE) algorithm, and feature subset selection by optimizing the expected generalization performance of a radial basis function kernel SVM (RBF-SVM) trained across a range of the top features. Site-specific RBF-SVMs using these optimal feature sets from each imaging modality were used to predict the class labels of an independent hold-out test set. A voting approach was used to combine these multiple predictions and assign final class labels. With this methodology we were able to predict diagnosis of ADHD with 55% accuracy (versus a 39% chance level in this sample), 33% sensitivity, and 80% specificity. This approach also allowed us to evaluate predictive structural and functional features giving insight into abnormal brain circuitry in ADHD. PMID:22912605
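
    The final voting step can be illustrated very simply: given the class labels predicted by the per-site, per-modality SVMs, take the majority label for each subject. The helper below is an illustrative stand-in, not the competition code.

        # Majority vote over the predictions of several classifiers.
        import numpy as np
        from scipy.stats import mode

        def majority_vote(predictions):
            # predictions: (n_models, n_subjects) array of predicted class labels
            return mode(np.asarray(predictions), axis=0).mode.ravel()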

  3. Segmentation and texture analysis of structural biomarkers using neighborhood-clustering-based level set in MRI of the schizophrenic brain.

    PubMed

    Latha, Manohar; Kavitha, Ganesan

    2018-02-03

    Schizophrenia (SZ) is a psychiatric disorder that especially affects individuals during their adolescence. There is a need to study the subanatomical regions of SZ brain on magnetic resonance images (MRI) based on morphometry. In this work, an attempt was made to analyze alterations in structure and texture patterns in images of the SZ brain using the level-set method and Laws texture features. T1-weighted MRI of the brain from Center of Biomedical Research Excellence (COBRE) database were considered for analysis. Segmentation was carried out using the level-set method. Geometrical and Laws texture features were extracted from the segmented brain stem, corpus callosum, cerebellum, and ventricle regions to analyze pattern changes in SZ. The level-set method segmented multiple brain regions, with higher similarity and correlation values compared with an optimized method. The geometric features obtained from regions of the corpus callosum and ventricle showed significant variation (p < 0.00001) between normal and SZ brain. Laws texture feature identified a heterogeneous appearance in the brain stem, corpus callosum and ventricular regions, and features from the brain stem were correlated with Positive and Negative Syndrome Scale (PANSS) score (p < 0.005). A framework of geometric and Laws texture features obtained from brain subregions can be used as a supplement for diagnosis of psychiatric disorders.

  4. An Energy-Based Approach for Detection and Characterization of Subtle Entities Within Laser Scanning Point-Clouds

    NASA Astrophysics Data System (ADS)

    Arav, Reuma; Filin, Sagi

    2016-06-01

    Airborne laser scans present an optimal tool to describe geomorphological features in natural environments. However, a challenge arises in the detection of such phenomena, as they are embedded in the topography, tend to blend into their surroundings, and leave only a subtle signature within the data. Most object-recognition studies address mainly urban environments and follow a general pipeline where the data are partitioned into segments with uniform properties. These approaches are restricted to man-made domains and can handle only a limited set of features that conform to a well-defined geometric form. As natural environments present a more complex set of features, interpretation of the data is still largely manual. In this paper, we propose a data-aware detection scheme, unbound to specific domains or shapes. We define the recognition question as an energy optimization problem, solved by variational means. Our approach, based on the level-set method, geometrically characterizes local surfaces within the data and uses these characteristics as a potential field for the minimization. The main advantage here is that it allows topological changes of the evolving curves, such as merging and breaking. We demonstrate the proposed methodology on the detection of collapse sinkholes.

  5. A Novel Multilayer Correlation Maximization Model for Improving CCA-Based Frequency Recognition in SSVEP Brain-Computer Interface.

    PubMed

    Jiao, Yong; Zhang, Yu; Wang, Yu; Wang, Bei; Jin, Jing; Wang, Xingyu

    2018-05-01

    Multiset canonical correlation analysis (MsetCCA) has been successfully applied to optimize the reference signals by extracting common features from multiple sets of electroencephalogram (EEG) data for steady-state visual evoked potential (SSVEP) recognition in brain-computer interface applications. To avoid extracting possible noise components as common features, this study proposes a sophisticated extension of MsetCCA, called the multilayer correlation maximization (MCM) model, for further improving SSVEP recognition accuracy. MCM combines the advantages of both CCA and MsetCCA by carrying out three layers of correlation maximization. The first layer extracts the stimulus frequency-related information using CCA between EEG samples and sine-cosine reference signals. The second layer learns reference signals by extracting the common features with MsetCCA. The third layer re-optimizes the reference signal set using CCA with the sine-cosine reference signals again. An experimental study is conducted to validate the effectiveness of the proposed MCM model in comparison with the standard CCA and MsetCCA algorithms. The superior performance of MCM demonstrates its promising potential for the development of an improved SSVEP-based brain-computer interface.
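
    The first correlation-maximization layer can be shown in isolation: run CCA between an EEG segment and sine-cosine reference signals for each candidate stimulus frequency and keep the frequency with the largest canonical correlation. The sketch below is that standard CCA step only; the MsetCCA layer and the re-optimization layer are not included.

        # CCA-based SSVEP scoring of one candidate stimulus frequency.
        import numpy as np
        from sklearn.cross_decomposition import CCA

        def ssvep_cca_score(eeg, freq, fs, n_harmonics=2):
            # eeg: (n_samples, n_channels) segment; freq in Hz; fs = sampling rate in Hz
            t = np.arange(eeg.shape[0]) / fs
            ref = np.column_stack([f(2 * np.pi * h * freq * t)
                                   for h in range(1, n_harmonics + 1)
                                   for f in (np.sin, np.cos)])
            u, v = CCA(n_components=1).fit(eeg, ref).transform(eeg, ref)
            return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

        # recognized_freq = max(candidate_freqs, key=lambda f: ssvep_cca_score(eeg, f, fs))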

  6. Recovery of sparse translation-invariant signals with continuous basis pursuit

    PubMed Central

    Ekanadham, Chaitanya; Tranchina, Daniel; Simoncelli, Eero

    2013-01-01

    We consider the problem of decomposing a signal into a linear combination of features, each a continuously translated version of one of a small set of elementary features. Although these constituents are drawn from a continuous family, most current signal decomposition methods rely on a finite dictionary of discrete examples selected from this family (e.g., shifted copies of a set of basic waveforms), and apply sparse optimization methods to select and solve for the relevant coefficients. Here, we generate a dictionary that includes auxiliary interpolation functions that approximate translates of features via adjustment of their coefficients. We formulate a constrained convex optimization problem, in which the full set of dictionary coefficients represents a linear approximation of the signal, the auxiliary coefficients are constrained so as to only represent translated features, and sparsity is imposed on the primary coefficients using an L1 penalty. The basis pursuit denoising (BP) method may be seen as a special case, in which the auxiliary interpolation functions are omitted, and we thus refer to our methodology as continuous basis pursuit (CBP). We develop two implementations of CBP for a one-dimensional translation-invariant source, one using a first-order Taylor approximation, and another using a form of trigonometric spline. We examine the tradeoff between sparsity and signal reconstruction accuracy in these methods, demonstrating empirically that trigonometric CBP substantially outperforms Taylor CBP, which in turn offers substantial gains over ordinary BP. In addition, the CBP bases can generally achieve equally good or better approximations with much coarser sampling than BP, leading to a reduction in dictionary dimensionality. PMID:24352562

  7. Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data.

    PubMed

    Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei

    2006-06-23

    Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.
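
    One loose, greedy reading of the relevance-redundancy trade-off controlled by a single DDP-like exponent alpha is sketched below; the scoring formula and correlation measures are illustrative assumptions, not the authors' exact criterion.

        # Greedy feature selection weighting relevance against redundancy by an exponent alpha.
        import numpy as np

        def ddp_like_select(X, y, k, alpha=0.5):
            rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
            selected = [int(rel.argmax())]                     # start from the most relevant gene
            while len(selected) < k:
                best_j, best_score = None, -np.inf
                for j in range(X.shape[1]):
                    if j in selected:
                        continue
                    red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected])
                    score = (rel[j] ** alpha) * ((1.0 - red) ** (1.0 - alpha))
                    if score > best_score:
                        best_j, best_score = j, score
                selected.append(best_j)
            return selected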

  8. DCT-based iris recognition.

    PubMed

    Monro, Donald M; Rakshit, Soumyadip; Zhang, Dexin

    2007-04-01

    This paper presents a novel iris coding method based on differences of discrete cosine transform (DCT) coefficients of overlapped angular patches from normalized iris images. The feature extraction capabilities of the DCT are optimized on the two largest publicly available iris image data sets, 2,156 images of 308 eyes from the CASIA database and 2,955 images of 150 eyes from the Bath database. On this data, we achieve 100 percent Correct Recognition Rate (CRR) and perfect Receiver-Operating Characteristic (ROC) Curves with no registered false accepts or rejects. Individual feature bit and patch position parameters are optimized for matching through a product-of-sum approach to Hamming distance calculation. For verification, a variable threshold is applied to the distance metric and the False Acceptance Rate (FAR) and False Rejection Rate (FRR) are recorded. A new worst-case metric is proposed for predicting practical system performance in the absence of matching failures, and the worst case theoretical Equal Error Rate (EER) is predicted to be as low as 2.59 x 10(-4) on the available data sets.

  9. Pareto-Optimal Multi-objective Inversion of Geophysical Data

    NASA Astrophysics Data System (ADS)

    Schnaidt, Sebastian; Conway, Dennis; Krieger, Lars; Heinson, Graham

    2018-01-01

    In the process of modelling geophysical properties, jointly inverting different data sets can greatly improve model results, provided that the data sets are compatible, i.e., sensitive to similar features. Such a joint inversion requires a relationship between the different data sets, which can either be analytic or structural. Classically, the joint problem is expressed as a scalar objective function that combines the misfit functions of multiple data sets and a joint term which accounts for the assumed connection between the data sets. This approach suffers from two major disadvantages: first, it can be difficult to assess the compatibility of the data sets and second, the aggregation of misfit terms introduces a weighting of the data sets. We present a pareto-optimal multi-objective joint inversion approach based on an existing genetic algorithm. The algorithm treats each data set as a separate objective, avoiding forced weighting and generating curves of the trade-off between the different objectives. These curves are analysed by their shape and evolution to evaluate data set compatibility. Furthermore, the statistical analysis of the generated solution population provides valuable estimates of model uncertainty.

  10. Statistical process control using optimized neural networks: a case study.

    PubMed

    Addeh, Jalil; Ebrahimzadeh, Ata; Azarbad, Milad; Ranaee, Vahid

    2014-09-01

    The most common statistical process control (SPC) tools employed for monitoring process changes are control charts. A control chart indicates that the process has changed by generating an out-of-control signal. This study investigates the design of an accurate system for control chart pattern (CCP) recognition in two respects. First, an efficient system is introduced that includes two main modules: a feature extraction module and a classifier module. In the feature extraction module, a proper set of shape features and statistical features is proposed as efficient characteristics of the patterns. In the classifier module, several neural networks, such as the multilayer perceptron, probabilistic neural network, and radial basis function network, are investigated. Based on an experimental study, the best classifier is chosen to recognize the CCPs. Second, a hybrid heuristic recognition system is introduced based on the cuckoo optimization algorithm (COA) to improve the generalization performance of the classifier. The simulation results show that the proposed algorithm has high recognition accuracy. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.

  11. Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection

    PubMed Central

    Thatte, Gautam; Li, Ming; Lee, Sangwon; Emken, B. Adar; Annavaram, Murali; Narayanan, Shrikanth; Spruijt-Metz, Donna; Mitra, Urbashi

    2011-01-01

    The optimal allocation of samples for physical activity detection in a wireless body area network for health-monitoring is considered. The number of biometric samples collected at the mobile device fusion center, from both device-internal and external Bluetooth heterogeneous sensors, is optimized to minimize the transmission power for a fixed number of samples, and to meet a performance requirement defined using the probability of misclassification between multiple hypotheses. A filter-based feature selection method determines an optimal feature set for classification, and a correlated Gaussian model is considered. Using experimental data from overweight adolescent subjects, it is found that allocating a greater proportion of samples to sensors which better discriminate between certain activity levels can result in either a lower probability of error or energy-savings ranging from 18% to 22%, in comparison to equal allocation of samples. The current activity of the subjects and the performance requirements do not significantly affect the optimal allocation, but employing personalized models results in improved energy-efficiency. As the number of samples is an integer, an exhaustive search to determine the optimal allocation is typical, but computationally expensive. To this end, an alternate, continuous-valued vector optimization is derived which yields approximately optimal allocations and can be implemented on the mobile fusion center due to its significantly lower complexity. PMID:21796237

  12. Criticism of generally accepted fundamentals and methodologies of traffic and transportation theory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kerner, Boris S.

    It is explained why the set of the fundamental empirical features of traffic breakdown (a transition from free flow to congested traffic) should be the empirical basis for any traffic and transportation theory that can be reliably used for control and optimization in traffic networks. It is shown that generally accepted fundamentals and methodologies of traffic and transportation theory are not consistent with the set of the fundamental empirical features of traffic breakdown at a highway bottleneck. To these fundamentals and methodologies of traffic and transportation theory belong (i) Lighthill-Whitham-Richards (LWR) theory, (ii) the General Motors (GM) model class (for example, Herman, Gazis et al. GM model, Gipps’s model, Payne’s model, Newell’s optimal velocity (OV) model, Wiedemann’s model, Bando et al. OV model, Treiber’s IDM, Krauß’s model), (iii) the understanding of highway capacity as a particular stochastic value, and (iv) principles for traffic and transportation network optimization and control (for example, Wardrop’s user equilibrium (UE) and system optimum (SO) principles). Alternatively to these generally accepted fundamentals and methodologies of traffic and transportation theory, we discuss three-phase traffic theory as the basis for traffic flow modeling as well as briefly consider the network breakdown minimization (BM) principle for the optimization of traffic and transportation networks with road bottlenecks.

  13. Robust Feature Selection Technique using Rank Aggregation.

    PubMed

    Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep

    2014-01-01

    Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique which produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases while classifiers exploit different statistical properties of data for evaluation. In numerous situations this puts researchers in a dilemma as to which feature selection method and classifier to choose from a vast range of options. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop a better solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable towards achieving similar and ideally higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets of different dimensions, including Arrythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c, and Embryonal Tumor, and a real-world data set, namely Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared to other techniques, it improves classification accuracy by approximately 3-4% (in data sets with fewer than 500 features) and by more than 5% (in data sets with more than 500 features), across a wide range of classifiers.
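
    The aggregation idea can be illustrated with a Borda-style consensus of feature ranks produced by two off-the-shelf filters; the choice of filters and the averaging rule below are illustrative, not the specific ensemble used in the paper.

        # Aggregate feature ranks from several selection criteria and keep the top-k consensus features.
        import numpy as np
        from sklearn.feature_selection import f_classif, mutual_info_classif

        def aggregate_ranks(X, y, k=10):
            scores = [mutual_info_classif(X, y, random_state=0), f_classif(X, y)[0]]
            ranks = [(-s).argsort().argsort() for s in scores]   # rank 0 = best under each criterion
            consensus = np.mean(ranks, axis=0)                   # Borda-style average rank
            return consensus.argsort()[:k]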

  14. Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis

    NASA Astrophysics Data System (ADS)

    Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan

    2017-10-01

    This research paper aims to propose a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as a feature selection method for choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper also discusses the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using parametric statistical significance tests. The evaluation process has statistically shown that the ACO-KNN algorithm improves significantly on the baseline algorithms. In addition, the experimental results have shown that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that can represent the actual data in customer review datasets.

  15. Evolutionary Algorithm Based Feature Optimization for Multi-Channel EEG Classification.

    PubMed

    Wang, Yubo; Veluvolu, Kalyana C

    2017-01-01

    Most BCI systems that rely on EEG signals employ Fourier-based methods for time-frequency decomposition and feature extraction. The band-limited multiple Fourier linear combiner is well suited for such band-limited signals due to its real-time applicability. Despite the improved performance of these techniques in two-channel settings, their application to multiple-channel EEG is not straightforward and remains challenging. As more channels become available, a spatial filter is required to eliminate noise and preserve the useful information. Moreover, multiple-channel EEG also adds high dimensionality to the frequency feature space. Feature selection is then required to stabilize the performance of the classifier. In this paper, we develop a new method based on an Evolutionary Algorithm (EA) to solve these two problems simultaneously. The real-valued EA encodes both the spatial filter estimates and the feature selection into its solution and optimizes it with respect to the classification error. Three Fourier-based designs are tested in this paper. Our results show that the combination of the Fourier-based method with the covariance matrix adaptation evolution strategy (CMA-ES) has the best overall performance.

  16. Multi-scale textural feature extraction and particle swarm optimization based model selection for false positive reduction in mammography.

    PubMed

    Zyout, Imad; Czajkowska, Joanna; Grzegorzek, Marcin

    2015-12-01

    The high number of false positives and the resulting number of avoidable breast biopsies are the major problems faced by current mammography Computer Aided Detection (CAD) systems. False positive reduction is not only a requirement for mass but also for calcification CAD systems which are currently deployed for clinical use. This paper tackles two problems related to reducing the number of false positives in the detection of all lesions and masses, respectively. Firstly, textural patterns of breast tissue have been analyzed using several multi-scale textural descriptors based on wavelet and gray level co-occurrence matrix. The second problem addressed in this paper is the parameter selection and performance optimization. For this, we adopt a model selection procedure based on Particle Swarm Optimization (PSO) for selecting the most discriminative textural features and for strengthening the generalization capacity of the supervised learning stage based on a Support Vector Machine (SVM) classifier. For evaluating the proposed methods, two sets of suspicious mammogram regions have been used. The first one, obtained from Digital Database for Screening Mammography (DDSM), contains 1494 regions (1000 normal and 494 abnormal samples). The second set of suspicious regions was obtained from database of Mammographic Image Analysis Society (mini-MIAS) and contains 315 (207 normal and 108 abnormal) samples. Results from both datasets demonstrate the efficiency of using PSO based model selection for optimizing both classifier hyper-parameters and parameters, respectively. Furthermore, the obtained results indicate the promising performance of the proposed textural features and more specifically, those based on co-occurrence matrix of wavelet image representation technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling.

    PubMed

    Tsipouras, Markos G; Exarchos, Themis P; Fotiadis, Dimitrios I; Kotsia, Anna P; Vakalis, Konstantinos V; Naka, Katerina K; Michalis, Lampros K

    2008-07-01

    A fuzzy rule-based decision support system (DSS) is presented for the diagnosis of coronary artery disease (CAD). The system is automatically generated from an initial annotated dataset, using a four stage methodology: 1) induction of a decision tree from the data; 2) extraction of a set of rules from the decision tree, in disjunctive normal form and formulation of a crisp model; 3) transformation of the crisp set of rules into a fuzzy model; and 4) optimization of the parameters of the fuzzy model. The dataset used for the DSS generation and evaluation consists of 199 subjects, each one characterized by 19 features, including demographic and history data, as well as laboratory examinations. Tenfold cross validation is employed, and the average sensitivity and specificity obtained is 62% and 54%, respectively, using the set of rules extracted from the decision tree (first and second stages), while the average sensitivity and specificity increase to 80% and 65%, respectively, when the fuzzification and optimization stages are used. The system offers several advantages since it is automatically generated, it provides CAD diagnosis based on easily and noninvasively acquired features, and is able to provide interpretation for the decisions made.
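
    The first two stages, inducing a decision tree and extracting a crisp rule set, can be sketched with scikit-learn; the fuzzification and parameter optimization stages are not reproduced here.

        # Induce a shallow decision tree and print its crisp if/then rules.
        from sklearn.tree import DecisionTreeClassifier, export_text

        def crisp_rules(X, y, feature_names, max_depth=3):
            tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
            return export_text(tree, feature_names=list(feature_names))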

  18. Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships.

    PubMed

    Janet, Jon Paul; Kulik, Heather J

    2017-11-22

    Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.
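
    As a small illustration of one of the direct feature selection routes mentioned above, the sketch below applies cross-validated LASSO to a descriptor matrix and keeps the columns with nonzero weights; X_rac and y_splitting are placeholder names for the RAC matrix and spin-splitting targets.

        # LASSO-based selection of descriptors with nonzero regression weight.
        import numpy as np
        from sklearn.linear_model import LassoCV

        def lasso_selected_features(X_rac, y_splitting):
            model = LassoCV(cv=5, random_state=0).fit(X_rac, y_splitting)
            return np.nonzero(model.coef_)[0]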

  19. Discriminative region extraction and feature selection based on the combination of SURF and saliency

    NASA Astrophysics Data System (ADS)

    Deng, Li; Wang, Chunhong; Rao, Changhui

    2011-08-01

    The objective of this paper is to provide a possible optimization of the salient region algorithm, which is extensively used in recognizing and learning object categories. The salient region algorithm offers intra-class tolerance, a global scoring of features, and automatic selection of the most prominent scale within a certain range. However, its major limitation is computational performance, which is what we attempt to improve. By reducing the number of pixels involved in the saliency calculation, the algorithm can be accelerated. We use interest points detected by fast-Hessian, the detector of SURF, as candidate features for the saliency operation, rather than the whole set of pixels in the image. This implementation is thereby called Saliency based Optimization over SURF (SOSU for short). Experiments show that introducing such a fast detector significantly speeds up the algorithm. Meanwhile, robustness to intra-class diversity preserves object recognition accuracy.

  20. Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.

    PubMed

    Daberdaku, Sebastian; Ferrari, Carlo

    2018-02-06

    The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.

  1. Improving performance of breast cancer risk prediction using a new CAD-based region segmentation scheme

    NASA Astrophysics Data System (ADS)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Qiu, Yuchen; Zheng, Bin

    2018-02-01

    Objective of this study is to develop and test a new computer-aided detection (CAD) scheme with improved region of interest (ROI) segmentation combined with an image feature extraction framework to improve performance in predicting short-term breast cancer risk. A dataset involving 570 sets of "prior" negative mammography screening cases was retrospectively assembled. In the next sequential "current" screening, 285 cases were positive and 285 cases remained negative. A CAD scheme was applied to all 570 "prior" negative images to stratify cases into the high and low risk case group of having cancer detected in the "current" screening. First, a new ROI segmentation algorithm was used to automatically remove useless area of mammograms. Second, from the matched bilateral craniocaudal view images, a set of 43 image features related to frequency characteristics of ROIs was initially computed from the discrete cosine transform and spatial domain of the images. Third, a support vector machine based machine learning classifier was used to optimally classify the selected optimal image features to build a CAD-based risk prediction model. The classifier was trained using a leave-one-case-out based cross-validation method. Applying this improved CAD scheme to the testing dataset yielded an area under the ROC curve of AUC = 0.70+/-0.04, which was significantly higher than extracting features directly from the dataset without the improved ROI segmentation step (AUC = 0.63+/-0.04). This study demonstrated that the proposed approach could improve accuracy in predicting short-term breast cancer risk, which may play an important role in helping eventually establish an optimal personalized breast cancer paradigm.
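
    A hedged sketch of the leave-one-case-out training loop described above, with a synthetic stand-in for the 43-feature matrix and case labels; it illustrates the validation scheme rather than the study's exact classifier settings.

        # Leave-one-case-out cross-validation with an SVM risk classifier (synthetic data).
        import numpy as np
        from sklearn.model_selection import LeaveOneOut
        from sklearn.svm import SVC
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(1)
        X = rng.normal(size=(570, 43))          # 43 frequency/spatial-domain features per "prior" case
        y = np.r_[np.ones(285), np.zeros(285)]  # positive vs. negative at the "current" screening

        scores = np.empty(len(y))
        for train_idx, test_idx in LeaveOneOut().split(X):
            clf = SVC(kernel="rbf", probability=True).fit(X[train_idx], y[train_idx])
            scores[test_idx] = clf.predict_proba(X[test_idx])[:, 1]

        print("AUC:", roc_auc_score(y, scores))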

  2. Characterization of coronary plaque regions in intravascular ultrasound images using a hybrid ensemble classifier.

    PubMed

    Hwang, Yoo Na; Lee, Ju Hwan; Kim, Ga Young; Shin, Eun Seok; Kim, Sung Min

    2018-01-01

    The purpose of this study was to propose a hybrid ensemble classifier to characterize coronary plaque regions in intravascular ultrasound (IVUS) images. Pixels were allocated to one of four tissues (fibrous tissue (FT), fibro-fatty tissue (FFT), necrotic core (NC), and dense calcium (DC)) through processes of border segmentation, feature extraction, feature selection, and classification. Grayscale IVUS images and their corresponding virtual histology images were acquired from 11 patients with known or suspected coronary artery disease using a 20 MHz catheter. A total of 102 hybrid textural features including first order statistics (FOS), gray level co-occurrence matrix (GLCM), extended gray level run-length matrix (GLRLM), Laws, local binary pattern (LBP), intensity, and discrete wavelet features (DWF) were extracted from IVUS images. To select optimal feature sets, a genetic algorithm was implemented. A hybrid ensemble classifier based on histogram and texture information was then used for plaque characterization in this study. The optimal feature set was used as input to this ensemble classifier. After tissue characterization, parameters including sensitivity, specificity, and accuracy were calculated to validate the proposed approach. A ten-fold cross validation approach was used to determine the statistical significance of the proposed method. Our experimental results showed that the proposed method had reliable performance for tissue characterization in IVUS images. The hybrid ensemble classification method outperformed other existing methods by achieving characterization accuracies of 81% for FFT and 75% for NC. In addition, this study showed that Laws features (SSV and SAV) were key indicators for coronary tissue characterization. The proposed method had high clinical applicability for image-based tissue characterization. Copyright © 2017 Elsevier B.V. All rights reserved.
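
    A minimal sketch of genetic-algorithm feature-subset selection in the spirit of the selection step above (it mirrors the idea, not the study's implementation); the feature matrix, labels, classifier, and GA settings are all placeholders.

        # Toy GA feature selection: binary chromosomes, CV accuracy as fitness.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(2)
        X = rng.normal(size=(300, 102))          # 102 hybrid textural features (synthetic)
        y = rng.integers(0, 4, size=300)         # FT / FFT / NC / DC tissue labels (synthetic)

        def fitness(mask):
            if not mask.any():
                return 0.0
            clf = RandomForestClassifier(n_estimators=50, random_state=0)
            return cross_val_score(clf, X[:, mask], y, cv=3).mean()

        pop = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)
        for generation in range(10):
            scores = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(scores)[::-1][:10]]     # truncation selection
            children = []
            for _ in range(10):
                a, b = parents[rng.integers(0, 10, size=2)]
                cut = rng.integers(1, X.shape[1])
                child = np.r_[a[:cut], b[cut:]]              # one-point crossover
                flip = rng.random(X.shape[1]) < 0.01         # bit-flip mutation
                children.append(child ^ flip)
            pop = np.vstack([parents, children])

        best = pop[np.argmax([fitness(ind) for ind in pop])]
        print("selected feature indices:", np.flatnonzero(best))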

  3. A ℓ2, 1 norm regularized multi-kernel learning for false positive reduction in Lung nodule CAD.

    PubMed

    Cao, Peng; Liu, Xiaoli; Zhang, Jian; Li, Wei; Zhao, Dazhe; Huang, Min; Zaiane, Osmar

    2017-03-01

    The aim of this paper is to describe a novel algorithm for False Positive Reduction in lung nodule Computer Aided Detection (CAD). We describe a new CT lung CAD method which aims to detect solid nodules. Specifically, we propose a multi-kernel classifier with an ℓ2,1 norm regularizer for heterogeneous feature fusion and selection at the feature subset level, and design two efficient strategies to optimize the kernel weights in the non-smooth ℓ2,1 regularized multiple kernel learning algorithm. The first optimization algorithm adapts a proximal gradient method for solving the ℓ2,1 norm of the kernel weights and uses an accelerated method based on FISTA; the second employs an iterative scheme based on an approximate gradient descent method. The results demonstrate that the FISTA-style accelerated proximal descent method is efficient for the ℓ2,1 norm formulation of multiple kernel learning, with a theoretical guarantee on the convergence rate. Moreover, the experimental results demonstrate the effectiveness of the proposed methods in terms of Geometric mean (G-mean) and Area under the ROC curve (AUC), and show that they significantly outperform the competing methods. The proposed approach exhibits some remarkable advantages in both the heterogeneous feature subset fusion and classification phases. Compared with feature-level and decision-level fusion strategies, the proposed ℓ2,1 norm multi-kernel learning algorithm is able to accurately fuse the complementary and heterogeneous feature sets, and automatically prune the irrelevant and redundant feature subsets to form a more discriminative feature set, leading to promising classification performance. Moreover, the proposed algorithm consistently outperforms the comparable classification approaches in the literature. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
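
    The proximal step that makes the non-smooth ℓ2,1 regularizer tractable is group (block) soft-thresholding. The sketch below shows only that operator, assuming hypothetical group boundaries over a weight vector; the FISTA acceleration and kernel machinery around it are omitted.

        # Proximal operator of lam * sum_g ||w_g||_2 (block soft-thresholding).
        import numpy as np

        def prox_l21(w, groups, lam):
            """Shrink each group of weights toward zero; zero it out if its norm <= lam."""
            out = w.copy()
            for g in groups:
                norm = np.linalg.norm(w[g])
                out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[g]
            return out

        w = np.array([0.8, -0.2, 0.05, 0.01, 1.5, 0.3])
        groups = [slice(0, 2), slice(2, 4), slice(4, 6)]   # three hypothetical feature subsets
        print(prox_l21(w, groups, lam=0.1))                # middle group is pruned entirely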

  4. xMSanalyzer: automated pipeline for improved feature detection and downstream analysis of large-scale, non-targeted metabolomics data.

    PubMed

    Uppal, Karan; Soltow, Quinlyn A; Strobel, Frederick H; Pittard, W Stephen; Gernert, Kim M; Yu, Tianwei; Jones, Dean P

    2013-01-16

    Detection of low abundance metabolites is important for de novo mapping of metabolic pathways related to diet, microbiome or environmental exposures. Multiple algorithms are available to extract m/z features from liquid chromatography-mass spectral data in a conservative manner, which tends to preclude detection of low abundance chemicals and chemicals found in small subsets of samples. The present study provides software to enhance such algorithms for feature detection, quality assessment, and annotation. xMSanalyzer is a set of utilities for automated processing of metabolomics data. The utilities can be classified into four main modules to: 1) improve feature detection for replicate analyses by systematic re-extraction with multiple parameter settings and data merger to optimize the balance between sensitivity and reliability, 2) evaluate sample quality and feature consistency, 3) detect feature overlap between datasets, and 4) characterize high-resolution m/z matches to small molecule metabolites and biological pathways using multiple chemical databases. The package was tested with plasma samples and shown to more than double the number of features extracted while improving quantitative reliability of detection. MS/MS analysis of a random subset of peaks that were exclusively detected using xMSanalyzer confirmed that the optimization scheme improves detection of real metabolites. xMSanalyzer is a package of utilities for data extraction, quality control assessment, detection of overlapping and unique metabolites in multiple datasets, and batch annotation of metabolites. The program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.

  5. Weighted score-level feature fusion based on Dempster-Shafer evidence theory for action recognition

    NASA Astrophysics Data System (ADS)

    Zhang, Guoliang; Jia, Songmin; Li, Xiuzhi; Zhang, Xiangyin

    2018-01-01

    The majority of human action recognition methods use a multi-feature fusion strategy to improve classification performance, but the contribution of different features to a specific action has not received enough attention. We present an extendible and universal weighted score-level feature fusion method using the Dempster-Shafer (DS) evidence theory based on the pipeline of bag-of-visual-words. First, the partially distinctive samples in the training set are selected to construct the validation set. Then, local spatiotemporal features and pose features are extracted from these samples to obtain evidence information. The DS evidence theory and the proposed rule of survival of the fittest are employed to achieve evidence combination and calculate optimal weight vectors for every feature type belonging to each action class. Finally, the recognition results are deduced via the weighted summation strategy. The performance of the established recognition framework is evaluated on the Penn Action dataset and a subset of the joint-annotated human motion database (sub-JHMDB). The experimental results demonstrate that the proposed feature fusion method can adequately exploit the complementarity among multiple features and improve upon most of the state-of-the-art algorithms on the Penn Action and sub-JHMDB datasets.
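
    As a hedged illustration of the evidence-combination idea above, the sketch below applies Dempster's rule to two feature channels, assuming each channel assigns basic probability masses only to singleton action classes; the scores and class count are made up.

        # Dempster's rule of combination for singleton-only mass vectors.
        import numpy as np

        def dempster_combine(m1, m2):
            joint = m1 * m2                   # agreeing singleton intersections
            conflict = 1.0 - joint.sum()      # mass assigned to conflicting class pairs
            return joint / (1.0 - conflict)   # renormalize by (1 - K)

        scores_stip = np.array([0.6, 0.3, 0.1])   # spatio-temporal feature evidence for 3 classes
        scores_pose = np.array([0.5, 0.4, 0.1])   # pose feature evidence
        fused = dempster_combine(scores_stip, scores_pose)
        print(fused, "-> predicted class", fused.argmax())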

  6. Paroxysmal atrial fibrillation prediction based on HRV analysis and non-dominated sorting genetic algorithm III.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B

    2018-01-01

    This paper presents a method that is able to predict paroxysmal atrial fibrillation (PAF). The method uses shorter heart rate variability (HRV) signals compared to existing methods, and achieves good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinically important because it increases the possibility of electrically stabilizing the heart and preventing the onset of atrial arrhythmias with different pacing techniques. We propose a multi-objective optimization algorithm based on the non-dominated sorting genetic algorithm III for optimizing the baseline PAF prediction system, which consists of the stages of pre-processing, HRV feature extraction, and a support vector machine (SVM) model. The pre-processing stage comprises heart rate correction, interpolation, and signal detrending. After that, time-domain, frequency-domain, and non-linear HRV features are extracted from the pre-processed data in the feature extraction stage. Then, these features are used as input to the SVM for predicting the PAF event. The proposed optimization algorithm is used to optimize the parameters and settings of the various HRV feature extraction algorithms, select the best feature subsets, and tune the SVM parameters simultaneously for maximum prediction performance. The proposed method achieves an accuracy rate of 87.7%, which significantly outperforms most of the previous works. This accuracy rate is achieved even with the HRV signal length being reduced from the typical 30 min to just 5 min (a reduction of 83%). Furthermore, another significant result is that the sensitivity rate, which is considered more important than other performance metrics in this paper, can be improved at the cost of lower specificity. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Validation of the SimSET simulation package for modeling the Siemens Biograph mCT PET scanner

    NASA Astrophysics Data System (ADS)

    Poon, Jonathan K.; Dahlbom, Magnus L.; Casey, Michael E.; Qi, Jinyi; Cherry, Simon R.; Badawi, Ramsey D.

    2015-02-01

    Monte Carlo simulation provides a valuable tool in performance assessment and optimization of system design parameters for PET scanners. SimSET is a popular Monte Carlo simulation toolkit that features fast simulation time, as well as variance reduction tools to further enhance computational efficiency. However, SimSET has lacked the ability to simulate block detectors until its most recent release. Our goal is to validate new features of SimSET by developing a simulation model of the Siemens Biograph mCT PET scanner and comparing the results to a simulation model developed in the GATE simulation suite and to experimental results. We used the NEMA NU-2 2007 scatter fraction, count rates, and spatial resolution protocols to validate the SimSET simulation model and its new features. The SimSET model overestimated the experimental results of the count rate tests by 11-23% and the spatial resolution test by 13-28%, which is comparable to previous validation studies of other PET scanners in the literature. The difference between the SimSET and GATE simulation was approximately 4-8% for the count rate test and approximately 3-11% for the spatial resolution test. In terms of computational time, SimSET performed simulations approximately 11 times faster than GATE simulations. The new block detector model in SimSET offers a fast and reasonably accurate simulation toolkit for PET imaging applications.

  8. Real-time parameter optimization based on neural network for smart injection molding

    NASA Astrophysics Data System (ADS)

    Lee, H.; Liau, Y.; Ryu, K.

    2018-03-01

    The manufacturing industry has been facing several challenges, including sustainability, performance, and quality of production. Manufacturers attempt to enhance their competitiveness by implementing CPS (Cyber-Physical Systems) through the convergence of IoT (Internet of Things) and ICT (Information & Communication Technology) at the manufacturing process level. The injection molding process has a short cycle time and high productivity, features that make it suitable for mass production. In addition, this process is used to produce precise parts in various industry fields such as automobiles, optics, and medical devices. The injection molding process involves a mixture of discrete and continuous variables. In order to optimize quality, the variables generated in the injection molding process must be considered. Furthermore, optimal parameter setting is time-consuming work for predicting the optimum quality of the product, since the process parameters cannot be easily corrected during process execution. In this research, we propose a neural-network-based real-time process parameter optimization methodology that sets optimal process parameters by using mold data, molding machine data, and response data. This paper is expected to make an academic contribution as a novel study of parameter optimization during production, compared with the pre-production parameter optimization of typical studies.

  9. A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

    PubMed

    Ni, Qianwu; Chen, Lei

    2017-01-01

    Correct prediction of protein structural class is beneficial to investigation of protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, given the variety of available features, it is still a great challenge to select a proper classification algorithm and extract the essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features from the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The predicted classes yielded by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. From the algorithm list, algorithms were taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to models that only adopt the feature selection procedure or the algorithm selection procedure. The feature selection and algorithm selection procedures are genuinely helpful for building an ensemble prediction model that yields better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
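
    A minimal sketch of the incremental feature-set idea described above, with univariate mutual information used as a simple stand-in for mRMR ranking and a single classifier scored on each growing prefix of the ranked list; data, labels, and classifier choice are placeholders.

        # Rank features, then evaluate classifiers on incrementally larger feature sets.
        import numpy as np
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(3)
        X = rng.normal(size=(200, 40))            # composition / physicochemical features (synthetic)
        y = rng.integers(0, 4, size=200)          # four structural classes (synthetic)

        ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]
        for k in range(1, 11):
            subset = ranking[:k]
            acc = cross_val_score(RandomForestClassifier(random_state=0), X[:, subset], y, cv=3).mean()
            print(f"top-{k} features -> CV accuracy {acc:.3f}")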

  10. Going Deeper With Contextual CNN for Hyperspectral Image Classification.

    PubMed

    Lee, Hyungtae; Kwon, Heesung

    2017-10-01

    In this paper, we describe a novel deep convolutional neural network (CNN) that is deeper and wider than other existing deep networks for hyperspectral image classification. Unlike current state-of-the-art approaches in CNN-based hyperspectral image classification, the proposed network, called contextual deep CNN, can optimally explore local contextual interactions by jointly exploiting local spatio-spectral relationships of neighboring individual pixel vectors. The joint exploitation of the spatio-spectral information is achieved by a multi-scale convolutional filter bank used as an initial component of the proposed CNN pipeline. The initial spatial and spectral feature maps obtained from the multi-scale filter bank are then combined together to form a joint spatio-spectral feature map. The joint feature map representing rich spectral and spatial properties of the hyperspectral image is then fed through a fully convolutional network that eventually predicts the corresponding label of each pixel vector. The proposed approach is tested on three benchmark data sets: the Indian Pines data set, the Salinas data set, and the University of Pavia data set. Performance comparison shows enhanced classification performance of the proposed approach over the current state-of-the-art on the three data sets.

  11. ECG based Myocardial Infarction detection using Hybrid Firefly Algorithm.

    PubMed

    Kora, Padmavathi

    2017-12-01

    Myocardial Infarction (MI) is one of the most frequent diseases, and can cause death, disability, and monetary loss in patients who suffer from cardiovascular disorders. The diagnostic methods used by physicians for this ailment are typically invasive, yet they do not achieve the required detection accuracy. Recent feature extraction methods, for example Auto Regressive (AR) modelling, Magnitude Squared Coherence (MSC), and Wavelet Coherence (WTC) applied to the Physionet database, yield a huge collection of features. A large number of these features may be inconsequential, containing redundant and non-discriminative components that add computational burden and degrade performance. Therefore, hybrid Firefly and Particle Swarm Optimization (FFPSO) is used directly to optimize the raw ECG signal instead of extracting features using the above techniques. The results in this paper show that, for detection of the MI class, the FFPSO algorithm with an ANN gives 99.3% accuracy, sensitivity of 99.97%, and specificity of 98.7% on the MIT-BIH database, with the NSR database also included. The proposed approach shows that methods based on feature optimization of ECG signals are well suited to diagnosing the condition of cardiac patients. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Modeling OPC complexity for design for manufacturability

    NASA Astrophysics Data System (ADS)

    Gupta, Puneet; Kahng, Andrew B.; Muddu, Swamy; Nakagawa, Sam; Park, Chul-Hong

    2005-11-01

    Increasing design complexity in sub-90nm designs results in increased mask complexity and cost. Resolution enhancement techniques (RET) such as assist feature addition, phase shifting (attenuated PSM) and aggressive optical proximity correction (OPC) help in preserving feature fidelity in silicon but increase mask complexity and cost. The increase in data volume with rising mask complexity is becoming prohibitive for manufacturing. Mask cost is determined by mask write time and mask inspection time, which are directly related to the complexity of features printed on the mask. Aggressive RETs increase complexity by adding assist features and by modifying existing features. Passing design intent to OPC has been identified as a solution for reducing mask complexity and cost in several recent works. The goal of design-aware OPC is to relax OPC tolerances of layout features to minimize mask cost, without sacrificing parametric yield. To convey optimal OPC tolerances for manufacturing, design optimization should drive OPC tolerance optimization using models of mask cost for devices and wires. Design optimization should be aware of the impact of OPC correction levels on the mask cost and performance of the design. This work introduces mask cost characterization (MCC), which quantifies OPC complexity, measured in terms of fracture count of the mask, for different OPC tolerances. MCC with different OPC tolerances is a critical step in linking design and manufacturing. In this paper, we present an MCC methodology that provides models of fracture count of standard cells and wire patterns for use in design optimization. MCC cannot be performed by designers as they do not have access to foundry OPC recipes and RET tools. To build a fracture count model, we perform OPC and fracturing on a limited set of standard cells and wire configurations with all tolerance combinations. Separately, we identify the characteristics of the layout that impact fracture count. Based on the fracture count (FC) data from OPC and mask data preparation runs, we build models of FC as a function of OPC tolerances and layout parameters.

  13. Paving the COWpath: data-driven design of pediatric order sets

    PubMed Central

    Zhang, Yiye; Padman, Rema; Levin, James E

    2014-01-01

    Objective Evidence indicates that users incur significant physical and cognitive costs in the use of order sets, a core feature of computerized provider order entry systems. This paper develops data-driven approaches for automating the construction of order sets that match closely with user preferences and workflow while minimizing physical and cognitive workload. Materials and methods We developed and tested optimization-based models embedded with clustering techniques using physical and cognitive click cost criteria. By judiciously learning from users’ actual actions, our methods identify items for constituting order sets that are relevant according to historical ordering data and grouped on the basis of order similarity and ordering time. We evaluated performance of the methods using 47 099 orders from the year 2011 for asthma, appendectomy and pneumonia management in a pediatric inpatient setting. Results In comparison with existing order sets, those developed using the new approach significantly reduce the physical and cognitive workload associated with usage by 14–52%. This approach is also capable of accommodating variations in clinical conditions that affect order set usage and development. Discussion There is a critical need to investigate the cognitive complexity imposed on users by complex clinical information systems, and to design their features according to ‘human factors’ best practices. Optimizing order set generation using cognitive cost criteria introduces a new approach that can potentially improve ordering efficiency, reduce unintended variations in order placement, and enhance patient safety. Conclusions We demonstrate that data-driven methods offer a promising approach for designing order sets that are generalizable, data-driven, condition-based, and up to date with current best practices. PMID:24674844

  14. A ranking algorithm for spacelab crew and experiment scheduling

    NASA Technical Reports Server (NTRS)

    Grone, R. D.; Mathis, F. H.

    1980-01-01

    The problem of obtaining an optimal or near optimal schedule for scientific experiments to be performed on Spacelab missions is addressed. The current capabilities in this regard are examined and a method of ranking experiments in order of difficulty is developed to support the existing software. Experimental data is obtained from applying this method to the sets of experiments corresponding to Spacelab mission 1, 2, and 3. Finally, suggestions are made concerning desirable modifications and features of second generation software being developed for this problem.

  15. Optimized spatio-temporal descriptors for real-time fall detection: comparison of support vector machine and Adaboost-based classification

    NASA Astrophysics Data System (ADS)

    Charfi, Imen; Miteran, Johel; Dubois, Julien; Atri, Mohamed; Tourki, Rached

    2013-10-01

    We propose a supervised approach to detect falls in a home environment using an optimized descriptor adapted to real-time tasks. We introduce a realistic dataset of 222 videos, a new metric allowing evaluation of fall detection performance in a video stream, and an automatically optimized set of spatio-temporal descriptors which feeds a supervised classifier. We build the initial spatio-temporal descriptor named STHF using several combinations of transformations of geometrical features (height and width of the human body bounding box, the user's trajectory with her/his orientation, projection histograms, and moments of orders 0, 1, and 2). We study combinations of the usual transformations of the features (Fourier transform, wavelet transform, first and second derivatives), and we show experimentally that it is possible to achieve high performance using support vector machine and Adaboost classifiers. Automatic feature selection shows that the best tradeoff between classification performance and processing time is obtained by combining the original low-level features with their first derivative. We also evaluate the robustness of the fall detection with respect to location changes. We propose a realistic and pragmatic protocol that enables performance to be improved by updating the training in the current location with records of normal activities.
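
    A minimal sketch of the best-performing combination reported above: per-frame low-level descriptors augmented with their first temporal derivative before being fed to a classifier. The frame features below are synthetic placeholders.

        # Augment per-frame features with their first temporal derivative.
        import numpy as np

        rng = np.random.default_rng(4)
        frames = rng.normal(size=(500, 14))                 # e.g., bounding-box height/width, centroid, moments
        first_derivative = np.diff(frames, axis=0, prepend=frames[:1])
        sthf_like = np.hstack([frames, first_derivative])   # descriptor fed to the SVM / AdaBoost classifier
        print(sthf_like.shape)                               # (500, 28)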

  16. Policy tree optimization for adaptive management of water resources systems

    NASA Astrophysics Data System (ADS)

    Herman, Jonathan; Giuliani, Matteo

    2017-04-01

    Water resources systems must cope with irreducible uncertainty in supply and demand, requiring policy alternatives capable of adapting to a range of possible future scenarios. Recent studies have developed adaptive policies based on "signposts" or "tipping points" that suggest the need of updating the policy. However, there remains a need for a general method to optimize the choice of the signposts to be used and their threshold values. This work contributes a general framework and computational algorithm to design adaptive policies as a tree structure (i.e., a hierarchical set of logical rules) using a simulation-optimization approach based on genetic programming. Given a set of feature variables (e.g., reservoir level, inflow observations, inflow forecasts), the resulting policy defines both the optimal reservoir operations and the conditions under which such operations should be triggered. We demonstrate the approach using Folsom Reservoir (California) as a case study, in which operating policies must balance the risk of both floods and droughts. Numerical results show that the tree-based policies outperform the ones designed via Dynamic Programming. In addition, they display good adaptive capacity to the changing climate, successfully adapting the reservoir operations across a large set of uncertain climate scenarios.

  17. Multi-probe-based resonance-frequency electrical impedance spectroscopy for detection of suspicious breast lesions: improving performance using partial ROC optimization

    NASA Astrophysics Data System (ADS)

    Lederman, Dror; Zheng, Bin; Wang, Xingwei; Wang, Xiao Hui; Gur, David

    2011-03-01

    We have developed a multi-probe resonance-frequency electrical impedance spectroscope (REIS) system to detect breast abnormalities. Based on assessing asymmetry in REIS signals acquired between left and right breasts, we developed several machine learning classifiers to classify younger women (i.e., under 50YO) into two groups of having high and low risk for developing breast cancer. In this study, we investigated a new method to optimize performance based on the area under a selected partial receiver operating characteristic (ROC) curve when optimizing an artificial neural network (ANN), and tested whether it could improve classification performance. From an ongoing prospective study, we selected a dataset of 174 cases for whom we have both REIS signals and diagnostic status verification. The dataset includes 66 "positive" cases recommended for biopsy due to detection of highly suspicious breast lesions and 108 "negative" cases determined by imaging based examinations. A set of REIS-based feature differences, extracted from the two breasts using a mirror-matched approach, was computed and constituted an initial feature pool. Using a leave-one-case-out cross-validation method, we applied a genetic algorithm (GA) to train the ANN with an optimal subset of features. Two optimization criteria were separately used in GA optimization, namely the area under the entire ROC curve (AUC) and the partial area under the ROC curve, up to a predetermined threshold (i.e., 90% specificity). The results showed that although the ANN optimized using the entire AUC yielded higher overall performance (AUC = 0.83 versus 0.76), the ANN optimized using the partial ROC area criterion achieved substantially higher operational performance (i.e., increasing sensitivity level from 28% to 48% at 95% specificity and/ or from 48% to 58% at 90% specificity).
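
    A hedged sketch contrasting the two optimization criteria named above: the full AUC versus the partial AUC restricted to the high-specificity end of the ROC curve (specificity >= 90%, i.e. false-positive rate <= 0.1). The scores and labels are synthetic; scikit-learn's standardized partial AUC is used as a stand-in for the criterion driving the GA.

        # Full AUC vs. partial AUC (FPR <= 0.1) on synthetic classifier scores.
        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(5)
        y_true = np.r_[np.ones(66), np.zeros(108)]            # biopsy-recommended vs. negative cases
        y_score = np.clip(y_true * 0.3 + rng.normal(0.4, 0.25, size=174), 0, 1)

        full_auc = roc_auc_score(y_true, y_score)
        partial_auc = roc_auc_score(y_true, y_score, max_fpr=0.1)   # McClish-standardized partial AUC
        print(f"full AUC {full_auc:.3f}, partial AUC (FPR<=0.1) {partial_auc:.3f}")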

  18. Proportional plus integral MIMO controller for regulation and tracking with anti-wind-up features

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Puleston, P.F.; Mantz, R.J.

    1993-11-01

    A proportional plus integral matrix control structure for MIMO systems is proposed. Based on a standard optimal control structure with integral action, it permits a greater degree of independence of the design and tuning of the regulating and tracking features, without considerably increasing the controller complexity. Fast recovery from load disturbances is achieved, while large overshoots associated with set-point changes and reset wind-up problems can be reduced. A simple effective procedure for practical tuning is introduced.

  19. Hypergraph Based Feature Selection Technique for Medical Diagnosis.

    PubMed

    Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar

    2016-11-01

    The impact of the internet and information systems across various domains has resulted in the generation of substantial multidimensional datasets. The use of data mining and knowledge discovery techniques to extract the original information contained in these multidimensional datasets plays a significant role in exploiting the full benefit they provide. The presence of a large number of features in high-dimensional datasets incurs high computational cost in terms of computing power and time. Hence, feature selection techniques have commonly been used to build robust machine learning models by selecting a subset of relevant features that retains the maximal information content of the original dataset. In this paper, a novel Rough Set based K-Helly feature selection technique (RSKHT), which hybridizes Rough Set Theory (RST) and the K-Helly property of hypergraph representation, is designed to identify the optimal feature subset, or reduct, for medical diagnostic applications. Experiments carried out using medical datasets from the UCI repository demonstrate the superiority of RSKHT over other feature selection techniques with respect to reduct size, classification accuracy, and time complexity. The performance of RSKHT was validated using the WEKA tool, showing that RSKHT is computationally attractive and flexible for massive datasets.

  20. Automatic tissue characterization from ultrasound imagery

    NASA Astrophysics Data System (ADS)

    Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.

    1993-08-01

    In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.

  1. PrAS: Prediction of amidation sites using multiple feature extraction.

    PubMed

    Wang, Tong; Zheng, Wei; Wuyun, Qiqige; Wu, Zhenfeng; Ruan, Jishou; Hu, Gang; Gao, Jianzhao

    2017-02-01

    Amidation plays an important role in a variety of pathological processes and serious diseases like neural dysfunction and hypertension. However, identification of protein amidation sites through traditional experimental methods is time consuming and expensive. In this paper, we proposed a novel predictor for Prediction of Amidation Sites (PrAS), which is the first software package for academic users. The method incorporated four representative feature types, which are position-based features, physicochemical and biochemical properties features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection was proposed to optimize features. PrAS achieved AUC of 0.96, accuracy of 92.1%, sensitivity of 81.2%, specificity of 94.9% and MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Evolutionary computation applied to the reconstruction of 3-D surface topography in the SEM.

    PubMed

    Kodama, Tetsuji; Li, Xiaoyuan; Nakahira, Kenji; Ito, Dai

    2005-10-01

    A genetic algorithm has been applied to the line profile reconstruction from the signals of the standard secondary electron (SE) and/or backscattered electron detectors in a scanning electron microscope. This method solves the topographical surface reconstruction problem as one of combinatorial optimization. To extend this optimization approach for three-dimensional (3-D) surface topography, this paper considers the use of a string coding where a 3-D surface topography is represented by a set of coordinates of vertices. We introduce the Delaunay triangulation, which attains the minimum roughness for any set of height data to capture the fundamental features of the surface being probed by an electron beam. With this coding, the strings are processed with a class of hybrid optimization algorithms that combine genetic algorithms and simulated annealing algorithms. Experimental results on SE images are presented.

  3. DCTune Perceptual Optimization of Compressed Dental X-Rays

    NASA Technical Reports Server (NTRS)

    Watson, Andrew B.; Null, Cynthia H. (Technical Monitor)

    1997-01-01

    In current dental practice, x-rays of completed dental work are often sent to the insurer for verification. It is faster and cheaper to instead transmit digital scans of the x-rays. Further economies result if the images are sent in compressed form. DCTune is a technology for optimizing DCT quantization matrices to yield maximum perceptual quality for a given bit-rate, or minimum bit-rate for a given perceptual quality. In addition, the technology provides a means of setting the perceptual quality of compressed imagery in a systematic way. The purpose of this research was, with respect to dental x-rays: (1) to verify the advantage of DCTune over standard JPEG; (2) to verify the quality control feature of DCTune; and (3) to discover regularities in the optimized matrices of a set of images. Additional information is contained in the original extended abstract.

  4. Support Vector Data Description Model to Map Specific Land Cover with Optimal Parameters Determined from a Window-Based Validation Set.

    PubMed

    Zhang, Jinshui; Yuan, Zhoumiqi; Shuai, Guanyuan; Pan, Yaozhong; Zhu, Xiufang

    2017-04-26

    This paper developed an approach, the window-based validation set for support vector data description (WVS-SVDD), to determine optimal parameters for the support vector data description (SVDD) model to map specific land cover by integrating training and window-based validation sets. Compared to the conventional approach, where the validation set included target and outlier pixels selected visually and randomly, the validation set derived from WVS-SVDD constructed a tightened hypersphere because of the compact constraint imposed by the outlier pixels, which were located neighboring the target class in the spectral feature space. The overall accuracies achieved for wheat and bare land were as high as 89.25% and 83.65%, respectively. However, the target class was underestimated because the validation set covered only a small fraction of the heterogeneous spectra of the target class. Different window sizes were then tested to acquire more wheat pixels for the validation set. The results showed that classification accuracy increased with increasing window size, and the overall accuracies were higher than 88% at all window sizes. Moreover, WVS-SVDD showed much less sensitivity to the untrained classes than the multi-class support vector machine (SVM) method. Therefore, the developed method showed its merits in using the optimal parameters, the tradeoff coefficient (C) and kernel width (s), for mapping homogeneous specific land cover.
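
    A hedged sketch of tuning the one-class model's trade-off and kernel-width parameters against a labelled validation window, using scikit-learn's OneClassSVM as a stand-in for SVDD (the two coincide for an RBF kernel). The spectra, window pixels, and parameter grid are synthetic placeholders.

        # Grid-search (nu, gamma) of a one-class model on a window-based validation set.
        import numpy as np
        from sklearn.svm import OneClassSVM
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(6)
        X_train = rng.normal(0, 1, size=(200, 6))                      # spectral bands of target-class pixels
        X_val = np.vstack([rng.normal(0, 1, size=(50, 6)),             # target pixels inside the window
                           rng.normal(3, 1, size=(50, 6))])            # neighbouring outlier pixels
        y_val = np.r_[np.ones(50, dtype=int), -np.ones(50, dtype=int)] # +1 inlier, -1 outlier

        best = max(
            ((nu, gamma, accuracy_score(y_val, OneClassSVM(nu=nu, gamma=gamma).fit(X_train).predict(X_val)))
             for nu in (0.01, 0.05, 0.1, 0.2) for gamma in (0.01, 0.1, 1.0)),
            key=lambda t: t[2])
        print("best (nu, gamma, validation accuracy):", best)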

  5. Structure and weights optimisation of a modified Elman network emotion classifier using hybrid computational intelligence algorithms: a comparative study

    NASA Astrophysics Data System (ADS)

    Sheikhan, Mansour; Abbasnezhad Arabi, Mahdi; Gharavian, Davood

    2015-10-01

    Artificial neural networks are efficient models in pattern recognition applications, but their performance is dependent on employing suitable structure and connection weights. This study used a hybrid method for obtaining the optimal weight set and architecture of a recurrent neural emotion classifier based on gravitational search algorithm (GSA) and its binary version (BGSA), respectively. By considering the features of speech signal that were related to prosody, voice quality, and spectrum, a rich feature set was constructed. To select more efficient features, a fast feature selection method was employed. The performance of the proposed hybrid GSA-BGSA method was compared with similar hybrid methods based on particle swarm optimisation (PSO) algorithm and its binary version, PSO and discrete firefly algorithm, and hybrid of error back-propagation and genetic algorithm that were used for optimisation. Experimental tests on Berlin emotional database demonstrated the superior performance of the proposed method using a lighter network structure.

  6. Improving the Computational Effort of Set-Inversion-Based Prandial Insulin Delivery for Its Integration in Insulin Pumps

    PubMed Central

    León-Vargas, Fabian; Calm, Remei; Bondia, Jorge; Vehí, Josep

    2012-01-01

    Objective Set-inversion-based prandial insulin delivery is a new model-based bolus advisor for postprandial glucose control in type 1 diabetes mellitus (T1DM). It automatically coordinates the values of basal–bolus insulin to be infused during the postprandial period so as to achieve some predefined control objectives. However, the method requires an excessive computation time to compute the solution set of feasible insulin profiles, which impedes its integration into an insulin pump. In this work, a new algorithm is presented, which reduces computation time significantly and enables the integration of this new bolus advisor into current processing features of smart insulin pumps. Methods A new strategy was implemented that focused on finding the combined basal–bolus solution of interest rather than an extensive search of the feasible set of solutions. Analysis of interval simulations, inclusion of physiological assumptions, and search domain contractions were used. Data from six real patients with T1DM were used to compare the performance between the optimized and the conventional computations. Results In all cases, the optimized version yielded the basal–bolus combination recommended by the conventional method and in only 0.032% of the computation time. Simulations show that the mean number of iterations for the optimized computation requires approximately 3.59 s at 20 MHz processing power, in line with current features of smart pumps. Conclusions A computationally efficient method for basal–bolus coordination in postprandial glucose control has been presented and tested. The results indicate that an embedded algorithm within smart insulin pumps is now feasible. Nonetheless, we acknowledge that a clinical trial will be needed in order to justify this claim. PMID:23294789

  7. Spectral feature design in high dimensional multispectral data

    NASA Technical Reports Server (NTRS)

    Chen, Chih-Chien Thomas; Landgrebe, David A.

    1988-01-01

    The High resolution Imaging Spectrometer (HIRIS) is designed to acquire images simultaneously in 192 spectral bands in the 0.4 to 2.5 micrometers wavelength region. It will make possible the collection of essentially continuous reflectance spectra at a spectral resolution sufficient to extract significantly enhanced amounts of information from return signals as compared to existing systems. The advantages of such high dimensional data come at a cost of increased system and data complexity. For example, since the finer the spectral resolution, the higher the data rate, it becomes impractical to design the sensor to be operated continuously. It is essential to find new ways to preprocess the data which reduce the data rate while at the same time maintaining the information content of the high dimensional signal produced. Four spectral feature design techniques are developed from the Weighted Karhunen-Loeve Transforms: (1) non-overlapping band feature selection algorithm; (2) overlapping band feature selection algorithm; (3) Walsh function approach; and (4) infinite clipped optimal function approach. The infinite clipped optimal function approach is chosen since the features are easiest to find and their classification performance is the best. After the preprocessed data has been received at the ground station, canonical analysis is further used to find the best set of features under the criterion that maximal class separability is achieved. Both 100 dimensional vegetation data and 200 dimensional soil data were used to test the spectral feature design system. It was shown that the infinite clipped versions of the first 16 optimal features had excellent classification performance. The overall probability of correct classification is over 90 percent while providing for a reduced downlink data rate by a factor of 10.
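
    As a hedged illustration of the overall idea above, the sketch below projects high-dimensional spectra onto a small set of Karhunen-Loeve (principal-component) features before classification; the weighting, infinite clipping, and canonical-analysis refinements described in the record are omitted, and the spectra and labels are synthetic.

        # KLT/PCA feature extraction followed by a simple classifier.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(7)
        spectra = rng.normal(size=(300, 192))      # 192-band reflectance spectra (synthetic)
        labels = rng.integers(0, 5, size=300)      # vegetation / soil classes (synthetic)

        pipe = make_pipeline(PCA(n_components=16), LinearDiscriminantAnalysis())
        print(cross_val_score(pipe, spectra, labels, cv=5).mean())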

  8. Deploying response surface methodology (RSM) and glowworm swarm optimization (GSO) in optimizing warpage on a mobile phone cover

    NASA Astrophysics Data System (ADS)

    Lee, X. N.; Fathullah, M.; Shayfull, Z.; Nasir, S. M.; Hazwan, M. H. M.; Shazzuan, S.

    2017-09-01

    Plastic injection moulding is a popular manufacturing method because it is not only reliable but also efficient and cost saving. It is able to produce plastic parts with detailed features and complex geometry. However, defects in the injection moulding process degrade the quality and aesthetics of the injection moulded product. The most common defect occurring in the process is warpage. Inappropriate process parameter settings of the injection moulding machine are one of the reasons that lead to the occurrence of warpage. The aims of this study were to improve the quality of the injection moulded part by investigating the optimal parameters for minimizing warpage using Response Surface Methodology (RSM) and Glowworm Swarm Optimization (GSO). Subsequently, the most significant parameter was identified and the recommended parameter setting was compared with the optimized parameter settings obtained from RSM and GSO. In this research, a mobile phone case was selected as the case study. The mould temperature, melt temperature, packing pressure, packing time, and cooling time were selected as variables, whereas warpage in the y-direction was selected as the response. The simulation was carried out using Autodesk Moldflow Insight 2012. In addition, RSM was performed using Design Expert 7.0, whereas GSO was implemented in MATLAB. The warpage in the y-direction recommended by RSM was reduced by 70%. The warpage recommended by GSO was decreased by 61% in the y-direction. The resulting warpages under the optimal parameter settings from RSM and GSO were validated by simulation in AMI 2012. RSM performed better than GSO in solving the warpage issue.

  9. Design of an Adaptive Human-Machine System Based on Dynamical Pattern Recognition of Cognitive Task-Load.

    PubMed

    Zhang, Jianhua; Yin, Zhong; Wang, Rubin

    2017-01-01

    This paper developed a cognitive task-load (CTL) classification algorithm and allocation strategy to sustain the optimal operator CTL levels over time in safety-critical human-machine integrated systems. An adaptive human-machine system is designed based on a non-linear dynamic CTL classifier, which maps a set of electroencephalogram (EEG) and electrocardiogram (ECG) related features to a few CTL classes. The least-squares support vector machine (LSSVM) is used as dynamic pattern classifier. A series of electrophysiological and performance data acquisition experiments were performed on seven volunteer participants under a simulated process control task environment. The participant-specific dynamic LSSVM model is constructed to classify the instantaneous CTL into five classes at each time instant. The initial feature set, comprising 56 EEG and ECG related features, is reduced to a set of 12 salient features (including 11 EEG-related features) by using the locality preserving projection (LPP) technique. An overall correct classification rate of about 80% is achieved for the 5-class CTL classification problem. Then the predicted CTL is used to adaptively allocate the number of process control tasks between operator and computer-based controller. Simulation results showed that the overall performance of the human-machine system can be improved by using the adaptive automation strategy proposed.

  10. Systematic feature selection improves accuracy of methylation-based forensic age estimation in Han Chinese males.

    PubMed

    Feng, Lei; Peng, Fuduan; Li, Shanfei; Jiang, Li; Sun, Hui; Ji, Anquan; Zeng, Changqing; Li, Caixia; Liu, Fan

    2018-03-23

    Estimating individual age from biomarkers may provide key information facilitating forensic investigations. Recent progress has shown DNA methylation at age-associated CpG sites as the most informative biomarkers for estimating the individual age of an unknown donor. Optimal feature selection plays a critical role in determining the performance of the final prediction model. In this study we investigate methylation levels at 153 age-associated CpG sites from 21 previously reported genomic regions using the EpiTYPER system for their predictive power on individual age in 390 Han Chinese males ranging from 15 to 75 years of age. We conducted a systematic feature selection using a stepwise backward multiple linear regression analysis as well as an exhaustive searching algorithm. Both approaches identified the same subset of 9 CpG sites, which in linear combination provided the optimal model fit with a mean absolute deviation (MAD) of 2.89 years and explained variance (R²) of 0.92. The final model was validated in two independent Han Chinese male samples (validation set 1, N = 65, MAD = 2.49, R² = 0.95, and validation set 2, N = 62, MAD = 3.36, R² = 0.89). Other competing models such as support vector machine and artificial neural network did not outperform the linear model to any noticeable degree. Validation set 1 was additionally analyzed using pyrosequencing technology for cross-platform validation and was termed validation set 3. Directly applying our model, in which the methylation levels were detected by the EpiTYPER system, to the data from pyrosequencing technology showed, however, less accurate results in terms of MAD (validation set 3, N = 65 Han Chinese males, MAD = 4.20, R² = 0.93), suggesting the presence of a batch effect between different data generation platforms. This batch effect could be partially overcome by a z-score transformation (MAD = 2.76, R² = 0.93). Overall, our systematic feature selection identified 9 CpG sites as the optimal subset for forensic age estimation, and the prediction model consisting of these 9 markers demonstrated high potential in forensic practice. An age estimator implementing our prediction model and allowing missing markers is freely available at http://liufan.big.ac.cn/AgePrediction. Copyright © 2018 Elsevier B.V. All rights reserved.
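
    A minimal sketch of the cross-platform z-score adjustment mentioned above: per-CpG methylation levels from each platform are standardized before fitting and applying the linear age model, which removes a simple linear batch shift. The data, the 9-CpG model, and its coefficients are synthetic placeholders, not the published estimator.

        # Per-CpG z-scoring before linear age regression (synthetic illustration).
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(8)
        age = rng.uniform(15, 75, size=390)
        beta_epityper = 0.3 + 0.006 * age[:, None] + rng.normal(0, 0.03, size=(390, 9))
        beta_pyro = beta_epityper[:65] * 1.1 + 0.05          # batch-shifted measurements of the first 65 donors

        def zscore(m):
            return (m - m.mean(axis=0)) / m.std(axis=0)

        model = LinearRegression().fit(zscore(beta_epityper), age)
        pred = model.predict(zscore(beta_pyro))              # cross-platform prediction after z-scoring
        print("MAD:", np.mean(np.abs(pred - age[:65])))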

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuo, J; Su, K; Department of Radiology, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, Ohio

    Purpose: Accurate and robust photon attenuation derived from MR is essential for PET/MR and MR-based radiation treatment planning applications. Although the fuzzy C-means (FCM) algorithm has been applied for pseudo-CT generation, the input feature combination and the number of clusters have not been optimized. This study aims to optimize both for clinically practical pseudo-CT generation. Methods: Nine volunteers were recruited. A 190-second, single-acquisition UTE-mDixon with 25% (angular) sampling and 3D radial readout was performed to acquire three primitive MR features at TEs of 0.1, 1.5, and 2.8 ms: the free-induction-decay (FID), the first and the second echo images. Three derived images, the Dixon-fat and Dixon-water images generated by two-point Dixon water/fat separation, and the R2* (1/T2*) map, were also created. To identify informative inputs for generating a pseudo-CT image volume, all 63 combinations, choosing one to six of the feature images, were used as inputs to FCM for pseudo-CT generation. Further, the number of clusters was varied from four to seven to find the optimal approach. Mean prediction deviation (MPD), mean absolute prediction deviation (MAPD), and correlation coefficient (R) of different combinations were compared for feature selection. Results: Among the 63 feature combinations, the four that resulted in the best MAPD and R were further compared along with the set containing all six features. The results suggested that R2* and Dixon-water are the most informative features. Further, including FID also improved the performance of pseudo-CT generation. Consequently, the set containing FID, Dixon-water, and R2* resulted in the most accurate, robust pseudo-CT when the number of clusters equals five (5C). The clusters were interpreted as air, fat, bone, brain, and fluid. The six-cluster result additionally included bone marrow. Conclusion: The results suggested that FID, Dixon-water, and R2* are the most important features. The findings can be used to facilitate pseudo-CT generation by unsupervised clustering. Please note that the project was completed with partial funding from the Ohio Department of Development grant TECH 11-063 and a sponsored research agreement with Philips Healthcare that is managed by Case Western Reserve University. As noted in the affiliations, some of the authors are Philips employees.
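
    A minimal fuzzy C-means sketch of the voxel-clustering step described above, applied to a synthetic three-feature vector per voxel (standing in for FID, Dixon-water, R2*) with five clusters; it illustrates the algorithm, not the study's pipeline.

        # Plain fuzzy C-means clustering of voxel feature vectors.
        import numpy as np

        def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
            rng = np.random.default_rng(seed)
            U = rng.random((len(X), c))
            U /= U.sum(axis=1, keepdims=True)
            for _ in range(iters):
                W = U ** m
                centers = (W.T @ X) / W.sum(axis=0)[:, None]              # weighted cluster centroids
                dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                U = 1.0 / (dist ** (2.0 / (m - 1.0)))                     # standard FCM membership update
                U /= U.sum(axis=1, keepdims=True)
            return centers, U

        rng = np.random.default_rng(9)
        voxels = rng.normal(size=(5000, 3))        # columns ~ FID, Dixon-water, R2* (synthetic)
        centers, memberships = fuzzy_cmeans(voxels, c=5)
        labels = memberships.argmax(axis=1)        # hard assignment, e.g., air / fat / bone / brain / fluid
        print(np.bincount(labels))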

  12. Cellular neural network-based hybrid approach toward automatic image registration

    NASA Astrophysics Data System (ADS)

    Arun, Pattathal VijayaKumar; Katiyar, Sunil Kumar

    2013-01-01

    Image registration is a key component of various image processing operations that involve the analysis of different image data sets. Automatic image registration domains have witnessed the application of many intelligent methodologies over the past decade; however, the inability to properly model object shape as well as contextual information has limited the attainable accuracy. A framework for accurate feature shape modeling and adaptive resampling using advanced techniques such as vector machines, cellular neural network (CNN), scale invariant feature transform (SIFT), coreset, and cellular automata is proposed. The CNN has been found to be effective in improving the feature matching and resampling stages of registration, and the complexity of the approach has been considerably reduced using coreset optimization. The salient features of this work are cellular neural network approach-based SIFT feature point optimization, adaptive resampling, and intelligent object modelling. The developed methodology has been compared with contemporary methods using different statistical measures. Investigations over various satellite images revealed that considerable success was achieved with the approach. This system dynamically uses spectral and spatial information for representing contextual knowledge using a CNN-Prolog approach. The methodology is also illustrated to be effective in providing intelligent interpretation and adaptive resampling.

  13. Vehicle Mode and Driving Activity Detection Based on Analyzing Sensor Data of Smartphones.

    PubMed

    Lu, Dang-Nhac; Nguyen, Duc-Nhan; Nguyen, Thi-Hau; Nguyen, Ha-Nam

    2018-03-29

    In this paper, we present a flexible combined system, namely the Vehicle mode-driving Activity Detection System (VADS), that is capable of detecting either the current vehicle mode or the current driving activity of travelers. Our proposed system is designed to be lightweight in computation and very fast in response to changes of travelers' vehicle modes or driving events. The vehicle mode detection module is responsible for recognizing both motorized vehicles, such as cars, buses, and motorbikes, and non-motorized ones, for instance, walking and biking. It relies only on accelerometer data in order to minimize the energy consumption of smartphones. By contrast, the driving activity detection module uses the data collected from the accelerometer, gyroscope, and magnetometer of a smartphone to detect various driving activities, i.e., stopping, going straight, turning left, and turning right. Furthermore, we propose a method to compute the optimized data window size and the optimized overlapping ratio for each vehicle mode and each driving event from the training datasets. The experimental results show that this strategy significantly increases the overall prediction accuracy. Additionally, numerous experiments are carried out to compare the impact of different feature sets (time domain features, frequency domain features, Hjorth features) as well as the impact of various classification algorithms (Random Forest, Naïve Bayes, Decision tree J48, K Nearest Neighbor, Support Vector Machine) on the prediction accuracy. Our system achieves an average accuracy of 98.33% in detecting the vehicle modes and an average accuracy of 98.95% in recognizing the driving events of motorcyclists when using the Random Forest classifier and a feature set containing time domain features, frequency domain features, and Hjorth features. Moreover, on a public dataset from HTC in New Taipei, Taiwan, our framework obtains an overall accuracy of 97.33%, which is considerably higher than that of the state of the art.

  14. CIRCAL-2 - General-purpose on-line circuit design.

    NASA Technical Reports Server (NTRS)

    Dertouzos, M. L.; Jessel, G. P.; Stinger, J. R.

    1972-01-01

    CIRCAL-2 is a second-generation general-purpose on-line circuit-design program with the following main features: (1) multiple-analysis capability; (2) uniform and general data structures for handling text editing, network representations, and output results, regardless of analysis; (3) special techniques and structures for minimizing and controlling user-program interaction; (4) use of functionals for the description of hysteresis and heat effects; and (5) ability to define optimization procedures that 'replace' the user. The paper discusses the organization of CIRCAL-2, the aforementioned main features, and their consequences, such as a set of network elements and models general enough for most analyses and a set of functions tailored to circuit-design requirements. The presentation is descriptive, concentrating on conceptual rather than on program implementation details.

  15. Constrained clusters of gene expression profiles with pathological features.

    PubMed

    Sese, Jun; Kurokawa, Yukinori; Monden, Morito; Kato, Kikuya; Morishita, Shinichi

    2004-11-22

    Gene expression profiles should be useful in distinguishing variations in disease, since they reflect accurately the status of cells. The primary clustering of gene expression reveals the genotypes that are responsible for the proximity of members within each cluster, while further clustering elucidates the pathological features of the individual members of each cluster. However, since the first clustering process and the second classification step, in which the features are associated with clusters, are performed independently, the initial set of clusters may omit genes that are associated with pathologically meaningful features. Therefore, it is important to devise a way of identifying gene expression clusters that are associated with pathological features. We present the novel technique of 'itemset constrained clustering' (IC-Clustering), which computes the optimal cluster that maximizes the interclass variance of gene expression between groups, which are divided according to the restriction that only divisions that can be expressed using common features are allowed. This constraint automatically labels each cluster with a set of pathological features which characterize that cluster. When applied to liver cancer datasets, IC-Clustering revealed informative gene expression clusters, which could be annotated with various pathological features, such as 'tumor' and 'man', or 'except tumor' and 'normal liver function'. In contrast, the k-means method overlooked these clusters.
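
    A minimal sketch of the objective behind such constrained clustering is given below: it scores a candidate split of the samples, defined by a shared set of pathological features, by the between-group (interclass) variance of expression summed over genes. This is only an illustration of the criterion named in the abstract, assuming NumPy; the actual IC-Clustering search over feature itemsets is not reproduced.

      import numpy as np

      def interclass_variance(expr, in_group):
          # expr: (n_samples, n_genes) expression matrix;
          # in_group: boolean mask of samples carrying the candidate feature set.
          a, b = expr[in_group], expr[~in_group]
          if len(a) == 0 or len(b) == 0:
              return 0.0
          mu = expr.mean(axis=0)
          n = len(expr)
          between = (len(a) * (a.mean(axis=0) - mu) ** 2 +
                     len(b) * (b.mean(axis=0) - mu) ** 2) / n
          return float(between.sum())   # larger = better-separated split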

  16. Personalized features for attention detection in children with Attention Deficit Hyperactivity Disorder.

    PubMed

    Fahimi, Fatemeh; Guan, Cuntai; Wooi Boon Goh; Kai Keng Ang; Choon Guan Lim; Tih Shih Lee

    2017-07-01

    Measuring attention from electroencephalogram (EEG) has found applications in the treatment of Attention Deficit Hyperactivity Disorder (ADHD). It is of great interest to understand which features in EEG are most representative of attention. Intensive research has been done in the past and it has been proven that frequency band powers and their ratios are effective features for detecting attention. However, unanswered questions remain: which features in EEG are most discriminative between attentive and non-attentive states? Are these features common among all subjects, or are they subject-specific and must they be optimized for each subject? Using Mutual Information (MI) to perform subject-specific feature selection on a large data set including 120 ADHD children, we found that besides the theta/beta ratio (TBR), which is commonly used in attention detection and neurofeedback, the relative beta power and theta/(alpha+beta) (TBAR) are equally significant and informative for attention detection. Interestingly, we found that the relative theta power (which is also commonly used) may not carry sufficient discriminative information by itself (it is informative for only 3.26% of ADHD children). We have also demonstrated that although these features (relative beta power, TBR and TBAR) are the most important measures for detecting attention on average, different subjects have different sets of most discriminative features.
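
    A brief sketch of the quantities discussed above is given below, assuming per-epoch absolute band powers are already available as NumPy arrays and that scikit-learn is used for the mutual-information ranking; the preprocessing and exact MI estimator of the study are not reproduced.

      import numpy as np
      from sklearn.feature_selection import mutual_info_classif

      def band_ratio_features(theta, alpha, beta):
          # Per-epoch features: relative theta, relative beta, TBR, TBAR.
          total = theta + alpha + beta
          return np.column_stack([theta / total,
                                  beta / total,
                                  theta / beta,              # TBR
                                  theta / (alpha + beta)])   # TBAR

      # Hypothetical subject-specific ranking: y marks attentive (1) vs.
      # non-attentive (0) epochs for a single child.
      # X = band_ratio_features(theta, alpha, beta)
      # mi = mutual_info_classif(X, y)
      # ranking = np.argsort(mi)[::-1]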

  17. Hybrid machine learning technique for forecasting Dhaka stock market timing decisions.

    PubMed

    Banik, Shipra; Khodadad Khan, A F M; Anwer, Mohammad

    2014-01-01

    Forecasting the stock market has been a difficult task for applied researchers owing to the noisy and time-varying nature of the data. Nevertheless, several empirical studies have shown that machine learning techniques can be applied effectively to stock market forecasting. This paper studies stock prediction for the use of investors, who often incur losses because of uncertain investment objectives and poorly understood assets. It proposes a rough set model, a neural network model, and a hybrid neural network and rough set model to find optimal buy and sell times for a share on the Dhaka stock exchange. Experimental findings demonstrate that the proposed hybrid model has higher precision than either the single rough set model or the neural network model. We believe these findings will help stock investors decide on the optimal time to buy and/or sell on the Dhaka stock exchange.

  18. Hybrid Machine Learning Technique for Forecasting Dhaka Stock Market Timing Decisions

    PubMed Central

    Banik, Shipra; Khodadad Khan, A. F. M.; Anwer, Mohammad

    2014-01-01

    Forecasting the stock market has been a difficult task for applied researchers owing to the noisy and time-varying nature of the data. Nevertheless, several empirical studies have shown that machine learning techniques can be applied effectively to stock market forecasting. This paper studies stock prediction for the use of investors, who often incur losses because of uncertain investment objectives and poorly understood assets. It proposes a rough set model, a neural network model, and a hybrid neural network and rough set model to find optimal buy and sell times for a share on the Dhaka stock exchange. Experimental findings demonstrate that the proposed hybrid model has higher precision than either the single rough set model or the neural network model. We believe these findings will help stock investors decide on the optimal time to buy and/or sell on the Dhaka stock exchange. PMID:24701205

  19. Health Care Human Factors/Ergonomics Fieldwork in Home and Community Settings

    PubMed Central

    Valdez, Rupa S.; Holden, Richard J.

    2017-01-01

    Feature at a glance Designing innovations aligned with patients’ needs and workflow requires human factors and ergonomics (HF/E) fieldwork in home and community settings. Fieldwork in these extra-institutional settings is challenged by a need to balance the occasionally competing priorities of patient and informal caregiver participants, study team members, and the overall project. We offer several strategies that HF/E professionals can use before, during, and after home and community site visits to optimize fieldwork and mitigate challenges in these settings. Strategies include interacting respectfully with participants, documenting the visit, managing the study team-participant relationship, and engaging in dialogue with institutional review boards. PMID:28781512

  20. An accelerated training method for back propagation networks

    NASA Technical Reports Server (NTRS)

    Shelton, Robert O. (Inventor)

    1993-01-01

    The principal objective is to provide a training procedure for a feed-forward, back propagation neural network which greatly accelerates the training process. A set of orthogonal singular vectors is determined from the input matrix such that the standard deviations of the projections of the input vectors along these singular vectors, as a set, are substantially maximized, thus providing an optimal means of presenting the input data. The novelty lies in the method of extracting from the input data a set of features which can represent the input data in a simplified manner, thus greatly reducing the time and expense of training the system.
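
    The sketch below illustrates the kind of preprocessing described above: the right singular vectors of the (centered) input matrix are the orthogonal directions along which the projections of the input vectors have maximal spread, and the network is then trained on the reduced projections. This is a generic NumPy illustration rather than the patented procedure itself; the choice of k is an assumption.

      import numpy as np

      def singular_projection(X, k):
          # Rows of X are input vectors; returns their coordinates along the
          # top-k right singular vectors (directions of largest spread).
          Xc = X - X.mean(axis=0)
          U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
          basis = Vt[:k].T                 # (n_features, k) orthonormal basis
          return Xc @ basis, basis

      # X_reduced, basis = singular_projection(X_train, k=10)
      # ...then train the back-propagation network on X_reduced instead of X_train.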

  1. The pre-image problem in kernel methods.

    PubMed

    Kwok, James Tin-yau; Tsang, Ivor Wai-hung

    2004-11-01

    In this paper, we address the problem of finding the pre-image of a feature vector in the feature space induced by a kernel. This is of central importance in some kernel applications, such as using kernel principal component analysis (PCA) for image denoising. Unlike the traditional method, which relies on nonlinear optimization, our proposed method directly finds the location of the pre-image based on distance constraints in the feature space. It is noniterative, involves only linear algebra, and does not suffer from numerical instability or local minimum problems. Evaluations on performing kernel PCA and kernel clustering on the USPS data set show much improved performance.

  2. A Probabilistic Feature Map-Based Localization System Using a Monocular Camera.

    PubMed

    Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun

    2015-08-31

    Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in simulated and real environments.
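
    To make the refinement step above concrete, the sketch below evaluates the cost that would be minimized over the camera pose: the sum of squared Mahalanobis distances between map features projected into the image and the matched query-image features, each weighted by the 2x2 covariance propagated from the probabilistic map. The projection function and the pose optimizer are omitted, and the interfaces are assumptions rather than the authors' code.

      import numpy as np

      def mahalanobis_sq(projected_uv, observed_uv, cov_uv):
          # Squared Mahalanobis distance of one 3D-to-2D correspondence.
          r = np.asarray(observed_uv, float) - np.asarray(projected_uv, float)
          return float(r @ np.linalg.solve(cov_uv, r))

      def pose_cost(projections, observations, covariances):
          # Objective evaluated for a candidate pose: sum over all matches.
          return sum(mahalanobis_sq(p, o, c)
                     for p, o, c in zip(projections, observations, covariances))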

  3. A Probabilistic Feature Map-Based Localization System Using a Monocular Camera

    PubMed Central

    Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun

    2015-01-01

    Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in simulated and real environments. PMID:26404284

  4. On the Impact of Local Taxes in a Set Cover Game

    NASA Astrophysics Data System (ADS)

    Escoffier, Bruno; Gourvès, Laurent; Monnot, Jérôme

    Given a collection C of weighted subsets of a ground set E, the set cover problem is to find a minimum weight subset of C which covers all elements of E. We study a strategic game defined upon this classical optimization problem. Every element of E is a player which chooses one set of C where it appears. Following a public tax function, every player is charged a fraction of the weight of the set that it has selected. Our motivation is to design a tax function having the following features: it can be implemented in a distributed manner, existence of an equilibrium is guaranteed and the social cost for these equilibria is minimized.
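
    For readers unfamiliar with the underlying optimization problem, the sketch below shows the classical greedy approximation for weighted set cover (repeatedly pick the set with the lowest weight per newly covered element). It is only background for the game described above, not the tax mechanism itself, and it assumes the instance is feasible.

      def greedy_set_cover(universe, subsets, weights):
          # universe: set of elements; subsets: list of sets; weights: list of floats.
          uncovered = set(universe)
          chosen = []
          while uncovered:
              best = min((i for i, s in enumerate(subsets) if s & uncovered),
                         key=lambda i: weights[i] / len(subsets[i] & uncovered))
              chosen.append(best)
              uncovered -= subsets[best]
          return chosen

      # Example: greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {3, 4}, {4, 5}], [3.0, 1.0, 2.0])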

  5. Differential evolution-based multi-objective optimization for the definition of a health indicator for fault diagnostics and prognostics

    NASA Astrophysics Data System (ADS)

    Baraldi, P.; Bonfanti, G.; Zio, E.

    2018-03-01

    The identification of the current degradation state of an industrial component and the prediction of its future evolution is a fundamental step for the development of condition-based and predictive maintenance approaches. The objective of the present work is to propose a general method for extracting a health indicator to measure the amount of component degradation from a set of signals measured during operation. The proposed method is based on the combined use of feature extraction techniques, such as Empirical Mode Decomposition and Auto-Associative Kernel Regression, and a multi-objective Binary Differential Evolution (BDE) algorithm for selecting the subset of features optimal for the definition of the health indicator. The objectives of the optimization are desired characteristics of the health indicator, such as monotonicity, trendability and prognosability. A case study is considered, concerning the prediction of the remaining useful life of turbofan engines. The obtained results confirm that the method is capable of extracting health indicators suitable for accurate prognostics.
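
    One commonly used form of the monotonicity objective mentioned above is sketched below; the exact definitions of monotonicity, trendability and prognosability used in the paper may differ, so this is an assumption-laden illustration rather than the authors' formulation.

      import numpy as np

      def monotonicity(hi):
          # hi: health-indicator sequence for one run-to-failure trajectory.
          # Returns 1.0 when the sequence always moves in the same direction,
          # 0.0 when increases and decreases cancel out.
          d = np.diff(np.asarray(hi, dtype=float))
          if d.size == 0:
              return 0.0
          return abs(np.sum(d > 0) - np.sum(d < 0)) / d.size

      # A candidate feature subset proposed by the BDE search could be scored by
      # averaging this metric over all training trajectories, alongside the
      # trendability and prognosability objectives.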

  6. Search asymmetries: parallel processing of uncertain sensory information.

    PubMed

    Vincent, Benjamin T

    2011-08-01

    What is the mechanism underlying search phenomena such as search asymmetry? Two-stage models such as Feature Integration Theory and Guided Search propose parallel pre-attentive processing followed by serial post-attentive processing. They claim search asymmetry effects are indicative of finding pairs of features, one processed in parallel, the other in serial. An alternative proposal is that a 1-stage parallel process is responsible, and search asymmetries occur when one stimulus has greater internal uncertainty associated with it than another. While the latter account is simpler, only a few studies have set out to empirically test its quantitative predictions, and many researchers still subscribe to the 2-stage account. This paper examines three separate parallel models (Bayesian optimal observer, max rule, and a heuristic decision rule). All three parallel models can account for search asymmetry effects and I conclude that either people can optimally utilise the uncertain sensory data available to them, or are able to select heuristic decision rules which approximate optimal performance. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. A Generic multi-dimensional feature extraction method using multiobjective genetic programming.

    PubMed

    Zhang, Yang; Rockett, Peter I

    2009-01-01

    In this paper, we present a generic feature extraction method for pattern classification using multiobjective genetic programming. This not only evolves the (near-)optimal set of mappings from a pattern space to a multi-dimensional decision space, but also simultaneously optimizes the dimensionality of that decision space. The presented framework evolves vector-to-vector feature extractors that maximize class separability. We demonstrate the efficacy of our approach by making statistically-founded comparisons with a wide variety of established classifier paradigms over a range of datasets and find that for most of the pairwise comparisons, our evolutionary method delivers statistically smaller misclassification errors. At very worst, our method displays no statistical difference in a few pairwise comparisons with established classifier/dataset combinations; crucially, none of the misclassification results produced by our method is worse than any comparator classifier. Although principally focused on feature extraction, feature selection is also performed as an implicit side effect; we show that both feature extraction and selection are important to the success of our technique. The presented method has the practical consequence of obviating the need to exhaustively evaluate a large family of conventional classifiers when faced with a new pattern recognition problem in order to attain a good classification accuracy.

  8. A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization.

    PubMed

    Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein

    2016-06-01

    This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter based on the Fisher criterion, which reduces the initial genes and hence the search space and time complexity. Then, a wrapper approach based on cellular learning automata (CLA) optimized with the ant colony method (ACO) is used to find the set of features which improve the classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The selected features from the last phase are evaluated using the ROC curve, and the most effective yet smallest feature subset is determined. The classifiers evaluated in the proposed framework are K-nearest neighbor, support vector machine, and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
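
    The filter stage named above can be illustrated with the two-class Fisher criterion, which scores each gene by the squared difference of class means over the sum of class variances. The sketch below assumes NumPy and a binary label vector; it shows the filtering step only, not the CLA/ACO wrapper.

      import numpy as np

      def fisher_scores(X, y):
          # X: (n_samples, n_genes) expression matrix; y: binary labels (0/1).
          X, y = np.asarray(X, float), np.asarray(y)
          c0, c1 = X[y == 0], X[y == 1]
          num = (c0.mean(axis=0) - c1.mean(axis=0)) ** 2
          den = c0.var(axis=0) + c1.var(axis=0) + 1e-12   # avoid division by zero
          return num / den

      # Keep only the top-m genes before the wrapper search (m is illustrative):
      # keep = np.argsort(fisher_scores(X, y))[::-1][:m]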

  9. Reconfigurable Model Execution in the OpenMDAO Framework

    NASA Technical Reports Server (NTRS)

    Hwang, John T.

    2017-01-01

    NASA's OpenMDAO framework facilitates constructing complex models and computing their derivatives for multidisciplinary design optimization. Decomposing a model into components that follow a prescribed interface enables OpenMDAO to assemble multidisciplinary derivatives from the component derivatives using what amounts to the adjoint method, direct method, chain rule, global sensitivity equations, or any combination thereof, using the MAUD architecture. OpenMDAO also handles the distribution of processors among the disciplines by hierarchically grouping the components, and it automates the data transfer between components that are on different processors. These features have made OpenMDAO useful for applications in aircraft design, satellite design, wind turbine design, and aircraft engine design, among others. This paper presents new algorithms for OpenMDAO that enable reconfigurable model execution. This concept refers to dynamically changing, during execution, one or more of: the variable sizes, solution algorithm, parallel load balancing, or set of variables-i.e., adding and removing components, perhaps to switch to a higher-fidelity sub-model. Any component can reconfigure at any point, even when running in parallel with other components, and the reconfiguration algorithm presented here performs the synchronized updates to all other components that are affected. A reconfigurable software framework for multidisciplinary design optimization enables new adaptive solvers, adaptive parallelization, and new applications such as gradient-based optimization with overset flow solvers and adaptive mesh refinement. Benchmarking results demonstrate the time savings for reconfiguration compared to setting up the model again from scratch, which can be significant in large-scale problems. Additionally, the new reconfigurability feature is applied to a mission profile optimization problem for commercial aircraft where both the parametrization of the mission profile and the time discretization are adaptively refined, resulting in computational savings of roughly 10% and the elimination of oscillations in the optimized altitude profile.

  10. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.

    PubMed

    Ma, Li; Fan, Suohai

    2017-03-14

    The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness against overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that incorporating Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithm effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out-of-bag (OOB) error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, the hybrid genetic-random forests algorithm, the hybrid particle swarm-random forests algorithm and the hybrid fish swarm-random forests algorithm, can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithms' F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
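
    The sketch below shows the core idea of using the out-of-bag error as the objective for random-forest parameter tuning, with a plain grid search standing in for the genetic, particle swarm, and fish swarm searches of the paper. It assumes scikit-learn; the parameter grids are placeholders.

      from itertools import product
      from sklearn.ensemble import RandomForestClassifier

      def oob_error(X, y, n_estimators, max_features):
          # Out-of-bag misclassification rate for one parameter setting.
          rf = RandomForestClassifier(n_estimators=n_estimators,
                                      max_features=max_features,
                                      oob_score=True, bootstrap=True,
                                      random_state=0).fit(X, y)
          return 1.0 - rf.oob_score_

      def search_params(X, y,
                        n_grid=(100, 300, 500),
                        mtry_grid=("sqrt", "log2", 0.5)):
          # Stand-in for the metaheuristic: minimize the OOB error over the grid.
          return min(product(n_grid, mtry_grid),
                     key=lambda p: oob_error(X, y, *p))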

  11. Interpretable Decision Sets: A Joint Framework for Description and Prediction

    PubMed Central

    Lakkaraju, Himabindu; Bach, Stephen H.; Leskovec, Jure

    2016-01-01

    One of the most important obstacles to deploying predictive models is the fact that humans do not understand and trust them. Knowing which variables are important in a model’s prediction and how they are combined can be very powerful in helping people understand and trust automatic decision making systems. Here we propose interpretable decision sets, a framework for building predictive models that are highly accurate, yet also highly interpretable. Decision sets are sets of independent if-then rules. Because each rule can be applied independently, decision sets are simple, concise, and easily interpretable. We formalize decision set learning through an objective function that simultaneously optimizes accuracy and interpretability of the rules. In particular, our approach learns short, accurate, and non-overlapping rules that cover the whole feature space and pay attention to small but important classes. Moreover, we prove that our objective is a non-monotone submodular function, which we efficiently optimize to find a near-optimal set of rules. Experiments show that interpretable decision sets are as accurate at classification as state-of-the-art machine learning techniques. They are also three times smaller on average than rule-based models learned by other methods. Finally, results of a user study show that people are able to answer multiple-choice questions about the decision boundaries of interpretable decision sets and write descriptions of classes based on them faster and more accurately than with other rule-based models that were designed for interpretability. Overall, our framework provides a new approach to interpretable machine learning that balances accuracy, interpretability, and computational efficiency. PMID:27853627

  12. Learning in data-limited multimodal scenarios: Scandent decision forests and tree-based features.

    PubMed

    Hor, Soheil; Moradi, Mehdi

    2016-12-01

    Incomplete and inconsistent datasets often pose difficulties in multimodal studies. We introduce the concept of scandent decision trees to tackle these difficulties. Scandent trees are decision trees that optimally mimic the partitioning of the data determined by another decision tree, and crucially, use only a subset of the feature set. We show how scandent trees can be used to enhance the performance of decision forests trained on a small number of multimodal samples when we have access to larger datasets with vastly incomplete feature sets. Additionally, we introduce the concept of tree-based feature transforms in the decision forest paradigm. When combined with scandent trees, the tree-based feature transforms enable us to train a classifier on a rich multimodal dataset, and use it to classify samples with only a subset of features of the training data. Using this methodology, we build a model trained on MRI and PET images of the ADNI dataset, and then test it on cases with only MRI data. We show that this is significantly more effective in staging of cognitive impairments compared to a similar decision forest model trained and tested on MRI only, or one that uses other kinds of feature transform applied to the MRI data. Copyright © 2016. Published by Elsevier B.V.

  13. Margin-maximizing feature elimination methods for linear and nonlinear kernel-based discriminant functions.

    PubMed

    Aksu, Yaman; Miller, David J; Kesidis, George; Yang, Qing X

    2010-05-01

    Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains by optimizing only two degrees of freedom, the hyperplane's intercept and its squared 2-norm, with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
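
    For context, the RFE baseline discussed above can be run in a few lines with scikit-learn and a linear SVM, as sketched below; the margin-based MFE variants proposed in the paper are not implemented here, and the number of retained features is a placeholder.

      from sklearn.feature_selection import RFE
      from sklearn.svm import SVC

      def svm_rfe(X, y, n_features=50):
          # Recursive feature elimination with a linear SVM: repeatedly drop the
          # features with the smallest squared weights until n_features remain.
          selector = RFE(SVC(kernel="linear", C=1.0),
                         n_features_to_select=n_features, step=1).fit(X, y)
          return selector.support_      # boolean mask of the retained features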

  14. Texture feature extraction based on wavelet transform and gray-level co-occurrence matrices applied to osteosarcoma diagnosis.

    PubMed

    Hu, Shan; Xu, Chao; Guan, Weiqiao; Tang, Yong; Liu, Yana

    2014-01-01

    Osteosarcoma is the most common malignant bone tumor among children and adolescents. In this study, image texture analysis was used to extract texture features from bone CR images to evaluate the recognition rate of osteosarcoma. To obtain the optimal set of features, Sym4 and Db4 wavelet transforms and gray-level co-occurrence matrices were applied to the images, with statistical methods being used to maximize the feature selection. To evaluate the performance of these methods, a support vector machine algorithm was used. The experimental results demonstrated that the Sym4 wavelet had a higher classification accuracy (93.44%) than the Db4 wavelet with respect to osteosarcoma occurrence in the epiphysis, whereas the Db4 wavelet had a higher classification accuracy (96.25%) for osteosarcoma occurrence in the diaphysis. Accuracy, sensitivity, specificity and ROC results obtained using the wavelets were all higher than those obtained using the features derived from the GLCM method. It is concluded that a set of texture features can be extracted from the wavelets and used in computer-aided osteosarcoma diagnosis systems. In addition, this study confirms that multi-resolution analysis is a useful tool for texture feature extraction during bone CR image processing.
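
    A rough sketch of the two feature families compared above is given below, assuming scikit-image (which spells the functions graycomatrix/graycoprops in recent releases) and PyWavelets; the specific statistics and decomposition depth used in the study may differ.

      import numpy as np
      import pywt
      from skimage.feature import graycomatrix, graycoprops

      def glcm_features(img_u8):
          # Contrast, correlation, energy and homogeneity of an 8-bit image.
          glcm = graycomatrix(img_u8, distances=[1], angles=[0, np.pi / 2],
                              levels=256, symmetric=True, normed=True)
          props = ("contrast", "correlation", "energy", "homogeneity")
          return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

      def wavelet_energies(img, wavelet="sym4", level=2):
          # Energy of each detail sub-band of a 2-D decomposition (Sym4 or Db4).
          coeffs = pywt.wavedec2(np.asarray(img, float), wavelet=wavelet, level=level)
          details = [band for lvl in coeffs[1:] for band in lvl]
          return np.array([np.sum(band ** 2) for band in details])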

  15. Learning Spatio-Temporal Representations for Action Recognition: A Genetic Programming Approach.

    PubMed

    Liu, Li; Shao, Ling; Li, Xuelong; Lu, Ke

    2016-01-01

    Extracting discriminative and robust features from video sequences is the first and most critical step in human action recognition. In this paper, instead of using handcrafted features, we automatically learn spatio-temporal motion features for action recognition. This is achieved via an evolutionary method, i.e., genetic programming (GP), which evolves the motion feature descriptor on a population of primitive 3D operators (e.g., 3D-Gabor and wavelet). In this way, scale- and shift-invariant features can be effectively extracted from both color and optical flow sequences. We intend to learn data-adaptive descriptors for different datasets with multiple layers, which makes full use of the knowledge to mimic the physical structure of the human visual cortex for action recognition and simultaneously reduces the GP search space to effectively accelerate the convergence of optimal solutions. In our evolutionary architecture, the average cross-validation classification error, which is calculated by a support-vector-machine classifier on the training set, is adopted as the evaluation criterion for the GP fitness function. After the entire evolution procedure finishes, the best-so-far solution selected by GP is regarded as the (near-)optimal action descriptor obtained. The GP-evolving feature extraction method is evaluated on four popular action datasets, namely KTH, HMDB51, UCF YouTube, and Hollywood2. Experimental results show that our method significantly outperforms other types of features, either hand-designed or machine-learned.

  16. Computer-aided diagnosis for classifying benign versus malignant thyroid nodules based on ultrasound images: A comparison with radiologist-based assessments.

    PubMed

    Chang, Yongjun; Paul, Anjan Kumar; Kim, Namkug; Baek, Jung Hwan; Choi, Young Jun; Ha, Eun Ju; Lee, Kang Dae; Lee, Hyoung Shin; Shin, DaeSeock; Kim, Nakyoung

    2016-01-01

    To develop a semiautomated computer-aided diagnosis (CAD) system for thyroid cancer using two-dimensional ultrasound images that can be used to yield a second opinion in the clinic to differentiate malignant and benign lesions. A total of 118 ultrasound images that included axial and longitudinal images from patients with biopsy-confirmed malignant (n = 30) and benign (n = 29) nodules were collected. Thyroid CAD software was developed to extract quantitative features from these images based on thyroid nodule segmentation in which adaptive diffusion flow for active contours was used. Various features, including histogram, intensity differences, elliptical fit, gray-level co-occurrence matrixes, and gray-level run-length matrixes, were evaluated for each region imaged. Based on these imaging features, a support vector machine (SVM) classifier was used to differentiate benign and malignant nodules. Leave-one-out cross-validation with sequential forward feature selection was performed to evaluate the overall accuracy of this method. Additionally, analyses with contingency tables and receiver operating characteristic (ROC) curves were performed to compare the performance of CAD with visual inspection by expert radiologists based on established gold standards. Most univariate features for this proposed CAD system attained accuracies that ranged from 78.0% to 83.1%. When optimal SVM parameters that were established using a grid search method with features that radiologists use for visual inspection were employed, the authors could attain rates of accuracy that ranged from 72.9% to 84.7%. Using leave-one-out cross-validation results in a multivariate analysis of various features, the highest accuracy achieved using the proposed CAD system was 98.3%, whereas visual inspection by radiologists reached 94.9% accuracy. To obtain the highest accuracies, "axial ratio" and "max probability" in axial images were most frequently included in the optimal feature sets for the authors' proposed CAD system, while "shape" and "calcification" in longitudinal images were most frequently included in the optimal feature sets for visual inspection by radiologists. The computed areas under curves in the ROC analysis were 0.986 and 0.979 for the proposed CAD system and visual inspection by radiologists, respectively; no significant difference was detected between these groups. The use of thyroid CAD to differentiate malignant from benign lesions shows accuracy similar to that obtained via visual inspection by radiologists. Thyroid CAD might be considered a viable way to generate a second opinion for radiologists in clinical practice.
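
    The parameter search described above (an SVM tuned by a grid search and evaluated with leave-one-out cross-validation) can be sketched as follows with scikit-learn; the grid values and the RBF kernel are assumptions rather than the authors' exact settings.

      from sklearn.model_selection import GridSearchCV, LeaveOneOut
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      def loocv_svm_grid_search(X, y):
          # Pick SVM parameters (C, gamma) by leave-one-out accuracy.
          pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
          grid = {"svc__C": [0.1, 1, 10, 100],
                  "svc__gamma": [1e-3, 1e-2, 1e-1, 1]}
          search = GridSearchCV(pipe, grid, cv=LeaveOneOut(), scoring="accuracy")
          search.fit(X, y)
          return search.best_params_, search.best_score_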

  17. Computer-aided diagnosis for classifying benign versus malignant thyroid nodules based on ultrasound images: A comparison with radiologist-based assessments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Yongjun; Paul, Anjan Kumar; Kim, Namkug, E-mail: namkugkim@gmail.com

    Purpose: To develop a semiautomated computer-aided diagnosis (CAD) system for thyroid cancer using two-dimensional ultrasound images that can be used to yield a second opinion in the clinic to differentiate malignant and benign lesions. Methods: A total of 118 ultrasound images that included axial and longitudinal images from patients with biopsy-confirmed malignant (n = 30) and benign (n = 29) nodules were collected. Thyroid CAD software was developed to extract quantitative features from these images based on thyroid nodule segmentation in which adaptive diffusion flow for active contours was used. Various features, including histogram, intensity differences, elliptical fit, gray-level co-occurrence matrixes, and gray-level run-length matrixes, were evaluated for each region imaged. Based on these imaging features, a support vector machine (SVM) classifier was used to differentiate benign and malignant nodules. Leave-one-out cross-validation with sequential forward feature selection was performed to evaluate the overall accuracy of this method. Additionally, analyses with contingency tables and receiver operating characteristic (ROC) curves were performed to compare the performance of CAD with visual inspection by expert radiologists based on established gold standards. Results: Most univariate features for this proposed CAD system attained accuracies that ranged from 78.0% to 83.1%. When optimal SVM parameters that were established using a grid search method with features that radiologists use for visual inspection were employed, the authors could attain rates of accuracy that ranged from 72.9% to 84.7%. Using leave-one-out cross-validation results in a multivariate analysis of various features, the highest accuracy achieved using the proposed CAD system was 98.3%, whereas visual inspection by radiologists reached 94.9% accuracy. To obtain the highest accuracies, “axial ratio” and “max probability” in axial images were most frequently included in the optimal feature sets for the authors’ proposed CAD system, while “shape” and “calcification” in longitudinal images were most frequently included in the optimal feature sets for visual inspection by radiologists. The computed areas under curves in the ROC analysis were 0.986 and 0.979 for the proposed CAD system and visual inspection by radiologists, respectively; no significant difference was detected between these groups. Conclusions: The use of thyroid CAD to differentiate malignant from benign lesions shows accuracy similar to that obtained via visual inspection by radiologists. Thyroid CAD might be considered a viable way to generate a second opinion for radiologists in clinical practice.

  18. A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study

    PubMed Central

    Puthiyedth, Nisha; Riveros, Carlos; Berretta, Regina; Moscato, Pablo

    2015-01-01

    Background: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. Methods: We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. Results: Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease. PMID:26106884

  19. Learning optimal features for visual pattern recognition

    NASA Astrophysics Data System (ADS)

    Labusch, Kai; Siewert, Udo; Martinetz, Thomas; Barth, Erhardt

    2007-02-01

    The optimal coding hypothesis proposes that the human visual system has adapted to the statistical properties of the environment by the use of relatively simple optimality criteria. We here (i) discuss how the properties of different models of image coding, i.e. sparseness, decorrelation, and statistical independence, are related to each other, (ii) propose to evaluate the different models by verifiable performance measures, and (iii) analyse the classification performance on images of handwritten digits (MNIST database). We first employ the SPARSENET algorithm (Olshausen, 1998) to derive a local filter basis (on 13 × 13 pixel windows). We then filter the images in the database (28 × 28 pixel images of digits) and reduce the dimensionality of the resulting feature space by selecting the locally maximal filter responses. We then train a support vector machine on a training set to classify the digits and report results obtained on a separate test set. Currently, the best state-of-the-art result on the MNIST database has an error rate of 0.4%. This result, however, has been obtained by using explicit knowledge that is specific to the data (an elastic distortion model for digits). We here obtain an error rate of 0.55%, which is second best but does not use explicit data-specific knowledge. In particular, it outperforms by far all methods that do not use data-specific knowledge.

  20. Optimal filter design with progressive genetic algorithm for local damage detection in rolling bearings

    NASA Astrophysics Data System (ADS)

    Wodecki, Jacek; Michalak, Anna; Zimroz, Radoslaw

    2018-03-01

    Harsh industrial conditions present in underground mining cause many difficulties for local damage detection in heavy-duty machinery. For vibration signals, one of the most intuitive approaches to obtaining a signal with the expected properties, such as clearly visible informative features, is prefiltration with an appropriately prepared filter. The design of such a filter is a very broad field of research in its own right. In this paper the authors propose a novel approach to dedicated optimal filter design using a progressive genetic algorithm. The presented method is fully data-driven and requires no prior knowledge of the signal. It has been tested against a set of real and simulated data, and its effectiveness has been proven for both the healthy and the damaged case. A termination criterion for the evolution process was developed, and a diagnostic decision-making feature has been proposed for determining the final result.

  1. Enabling Quantitative Optical Imaging for In-die-capable Critical Dimension Targets

    PubMed Central

    Barnes, B.M.; Henn, M.-A.; Sohn, M. Y.; Zhou, H.; Silver, R. M.

    2017-01-01

    Dimensional scaling trends will eventually bring semiconductor critical dimensions (CDs) down to only a few atoms in width. New optical techniques are required to address the measurement and variability for these CDs using sufficiently small in-die metrology targets. Recently, Qin et al. [Light Sci Appl, 5, e16038 (2016)] demonstrated quantitative model-based measurements of finite sets of lines with features as small as 16 nm using 450 nm wavelength light. This paper uses simulation studies, augmented with experiments at 193 nm wavelength, to adapt and optimize the finite sets of features that work as in-die-capable metrology targets with minimal increases in parametric uncertainty. A finite element based solver for time-harmonic Maxwell's equations yields two- and three-dimensional simulations of the electromagnetic scattering for optimizing the design of such targets as functions of reduced line lengths, fewer number of lines, fewer focal positions, smaller critical dimensions, and shorter illumination wavelength. Metrology targets that exceeded performance requirements are as short as 3 μm for 193 nm light, feature as few as eight lines, and are extensible to sub-10 nm CDs. Target areas measured at 193 nm can be fifteen times smaller in area than current state-of-the-art scatterometry targets described in the literature. This new methodology is demonstrated to be a promising alternative for optical model-based in-die CD metrology. PMID:28757674

  2. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    NASA Astrophysics Data System (ADS)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2017-04-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of a robust feature vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden imposed by using many features. Selecting an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem: for n features, the total number of possible subsets is 2^n, so the selection of an optimal subset of features belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets of features using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from the benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. From the experimental results, it is observed that the performance of the PSO-based and BBO-based algorithms in selecting an optimal subset of features for classifying MCCs as benign or malignant is better than that of the GA-based algorithm.
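
    All three search strategies compared above share the same ingredients: a bit-vector encoding of the candidate feature subset and a fitness equal to the classification rate of an SVM trained on the selected features. The sketch below shows only that fitness function (with cross-validation standing in for the study's exact evaluation protocol), assuming scikit-learn; the GA/PSO/BBO update rules are omitted.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      def subset_fitness(mask, X, y, cv=5):
          # mask: 0/1 vector over the 50 extracted features; empty subsets score 0.
          mask = np.asarray(mask, dtype=bool)
          if not mask.any():
              return 0.0
          return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=cv).mean()

      # A GA, PSO, or BBO loop would evolve a population of such masks and keep
      # the one with the highest subset_fitness(...).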

  3. GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

    PubMed

    Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann

    2012-09-24

    Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding similar or better classification results to what has been reported for these data sets with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More important, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.

  4. Prediction of Skin Sensitization with a Particle Swarm Optimized Support Vector Machine

    PubMed Central

    Yuan, Hua; Huang, Jianping; Cao, Chenzhong

    2009-01-01

    Skin sensitization is the most commonly reported occupational illness, causing much suffering to a wide range of people. Identification and labeling of environmental allergens is urgently required to protect people from skin sensitization. The guinea pig maximization test (GPMT) and murine local lymph node assay (LLNA) are the two most important in vivo models for identification of skin sensitizers. In order to reduce the number of animal tests, quantitative structure-activity relationships (QSARs) are strongly encouraged in the assessment of skin sensitization of chemicals. This paper has investigated the skin sensitization potential of 162 compounds with LLNA results and 92 compounds with GPMT results using a support vector machine. A particle swarm optimization algorithm was implemented for feature selection from a large number of molecular descriptors calculated by Dragon. For the LLNA data set, the classification accuracies are 95.37% and 88.89% for the training and the test sets, respectively. For the GPMT data set, the classification accuracies are 91.80% and 90.32% for the training and the test sets, respectively. The classification performances were greatly improved compared to those reported in the literature, indicating that the support vector machine optimized by particle swarm in this paper is competent for the identification of skin sensitizers. PMID:19742136

  5. An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification.

    PubMed

    Esfahani, Mohammad Shahrokh; Dougherty, Edward R

    2015-01-01

    Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete nor regulatory, with no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways into prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Since the Dirichlet distribution is the conjugate prior of the Multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: the mammalian cell cycle and a p53 pathway model.

  6. Automatic Segmenting Structures in MRI's Based on Texture Analysis and Fuzzy Logic

    NASA Astrophysics Data System (ADS)

    Kaur, Mandeep; Rattan, Munish; Singh, Pushpinder

    2017-12-01

    The purpose of this paper is to present a variational method for geometric contours that keeps the level set function close to a signed distance function, thereby removing the need for an expensive re-initialization procedure. The level set method is applied to magnetic resonance images (MRI) to track irregularities in them, since medical imaging plays a substantial part in the treatment, therapy, and diagnosis of various organs, tumors, and abnormalities, and offers patients faster and more decisive disease control with fewer side effects. The geometrical shape, the tumor's size, and abnormal tissue growth can be calculated from the segmentation of a particular image. Automatic segmentation in medical imaging remains a great challenge for researchers. Based on texture analysis, different images are processed by optimizing the level set segmentation. Traditionally, this optimization was manual for every image, with each parameter selected one after another. By applying fuzzy logic, the segmentation is driven by texture features, making it automatic and more effective. No parameter initialization is required and the approach behaves like an intelligent system: it segments different MRI images without tuning the level set parameters and gives optimized results for all of them.

  7. Predicting protein-protein interactions by combining various sequence-derived features into the general form of Chou's Pseudo amino acid composition.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2012-05-01

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machinery of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.
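
    The PCA-then-SVM pipeline described above can be sketched in a few lines with scikit-learn; the 67 components come from the abstract, while the scaling step, kernel and parameters are assumptions.

      from sklearn.decomposition import PCA
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      def build_ppi_classifier(n_components=67):
          # Reduce the 930 sequence-derived features to 67 principal components,
          # then classify interacting vs. non-interacting protein pairs.
          return make_pipeline(StandardScaler(),
                               PCA(n_components=n_components),
                               SVC(kernel="rbf", C=1.0, gamma="scale"))

      # model = build_ppi_classifier().fit(X_train, y_train)
      # accuracy = model.score(X_test, y_test)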

  8. Convex Hull Aided Registration Method (CHARM).

    PubMed

    Fan, Jingfan; Yang, Jian; Zhao, Yitian; Ai, Danni; Liu, Yonghuai; Wang, Ge; Wang, Yongtian

    2017-09-01

    Non-rigid registration finds many applications such as photogrammetry, motion tracking, model retrieval, and object recognition. In this paper we propose a novel convex hull aided registration method (CHARM) to match two point sets subject to a non-rigid transformation. First, two convex hulls are extracted from the source and target respectively. Then, all points of the point sets are projected onto the reference plane through each triangular facet of the hulls. From these projections, invariant features are extracted and matched optimally. The matched feature point pairs are mapped back onto the triangular facets of the convex hulls to remove outliers that are outside any relevant triangular facet. The rigid transformation from the source to the target is robustly estimated by the random sample consensus (RANSAC) scheme through minimizing the distance between the matched feature point pairs. Finally, these feature points are utilized as the control points to achieve non-rigid deformation in the form of thin-plate spline of the entire source point set towards the target one. The experimental results based on both synthetic and real data show that the proposed algorithm outperforms several state-of-the-art ones with respect to sampling, rotational angle, and data noise. In addition, the proposed CHARM algorithm also shows higher computational efficiency compared to these methods.

  9. A novel automated spike sorting algorithm with adaptable feature extraction.

    PubMed

    Bestel, Robert; Daus, Andreas W; Thielemann, Christiane

    2012-10-15

    To study the electrophysiological properties of neuronal networks, in vitro studies based on microelectrode arrays have become a viable tool for analysis. Although in constant progress, a challenging task still remains in this area: the development of an efficient spike sorting algorithm that allows an accurate signal analysis at the single-cell level. Most sorting algorithms currently available only extract a specific feature type, such as the principal components or wavelet coefficients of the measured spike signals, in order to separate different spike shapes generated by different neurons. However, due to the great variety in the obtained spike shapes, the derivation of an optimal feature set is still a very complex issue that current algorithms struggle with. To address this problem, we propose a novel algorithm that (i) extracts a variety of geometric, wavelet and principal component-based features and (ii) automatically derives a feature subset most suitable for sorting an individual set of spike signals. The new approach evaluates the probability distribution of the obtained spike features and consequently determines the candidates most suitable for the actual spike sorting. These candidates can be formed into an individually adjusted set of spike features, allowing a separation of the various shapes present in the obtained neuronal signal by a subsequent expectation maximisation clustering algorithm. Test results with simulated data files and data obtained from chick embryonic neurons cultured on microelectrode arrays showed an excellent classification result, indicating the superior performance of the described algorithm approach. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Variable Selection for Road Segmentation in Aerial Images

    NASA Astrophysics Data System (ADS)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For the extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classifying superpixels into road and non-road, after which the classification results were refined using minimum cost paths and non-local optimization methods. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in Photogrammetry and Remote Sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of the training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but extended them with additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even reduce it, on a challenging dataset.

  11. Classification of focal liver lesions on ultrasound images by extracting hybrid textural features and using an artificial neural network.

    PubMed

    Hwang, Yoo Na; Lee, Ju Hwan; Kim, Ga Young; Jiang, Yuan Yuan; Kim, Sung Min

    2015-01-01

    This paper focuses on improving the diagnostic accuracy of focal liver lesions by quantifying the key features of cysts, hemangiomas, and malignant lesions on ultrasound images. The focal liver lesions were divided into 29 cysts, 37 hemangiomas, and 33 malignancies. A total of 42 hybrid textural features, composed of 5 first-order statistics, 18 gray level co-occurrence matrix features, 18 Laws' features, and echogenicity, were extracted. A total of 29 key features selected by principal component analysis were used as the set of inputs for a feed-forward neural network. For each lesion, the diagnostic performance was evaluated using the positive predictive value, negative predictive value, sensitivity, specificity, and accuracy. The experimental results indicate that the proposed method performs well, with a diagnostic accuracy of over 96% across all focal liver lesion groups (cyst vs. hemangioma, cyst vs. malignant, and hemangioma vs. malignant) on ultrasound images. The accuracy increased slightly when echogenicity was included in the optimal feature set. These results indicate that it is possible for the proposed method to be applied clinically.
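
    A minimal sketch of this kind of pipeline, assuming scikit-image (0.19+ naming) and scikit-learn, with placeholder 8-bit ROIs: gray-level co-occurrence properties plus a few first-order statistics are fed to a small feed-forward neural network. The Laws' features and echogenicity measure used in the paper are omitted.

    ```python
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops
    from sklearn.neural_network import MLPClassifier

    def texture_features(roi):
        """First-order statistics + GLCM properties for one 8-bit ultrasound ROI."""
        glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2], levels=256,
                            symmetric=True, normed=True)
        props = [graycoprops(glcm, p).mean()
                 for p in ("contrast", "homogeneity", "energy", "correlation")]
        first_order = [roi.mean(), roi.std(), np.median(roi)]
        return np.array(first_order + props)

    rng = np.random.default_rng(0)
    rois = rng.integers(0, 256, size=(60, 64, 64), dtype=np.uint8)  # placeholder lesion ROIs
    y = rng.integers(0, 3, size=60)                                 # cyst / hemangioma / malignant
    X = np.stack([texture_features(r) for r in rois])
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
    ```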

  12. Multi-modal classification of neurodegenerative disease by progressive graph-based transductive learning

    PubMed Central

    Wang, Zhengxia; Zhu, Xiaofeng; Adeli, Ehsan; Zhu, Yingying; Nie, Feiping; Munsell, Brent

    2018-01-01

    Graph-based transductive learning (GTL) is a powerful machine learning technique that is used when sufficient training data is not available. In particular, conventional GTL approaches first construct a fixed inter-subject relation graph that is based on similarities in voxel intensity values in the feature domain, which can then be used to propagate the known phenotype data (i.e., clinical scores and labels) from the training data to the testing data in the label domain. However, this type of graph is exclusively learned in the feature domain, and primarily due to outliers in the observed features, may not be optimal for label propagation in the label domain. To address this limitation, a progressive GTL (pGTL) method is proposed that gradually finds an intrinsic data representation that more accurately aligns imaging features with the phenotype data. In general, optimal feature-to-phenotype alignment is achieved using an iterative approach that: (1) refines inter-subject relationships observed in the feature domain by using the learned intrinsic data representation in the label domain, (2) updates the intrinsic data representation from the refined inter-subject relationships, and (3) verifies the intrinsic data representation on the training data to guarantee an optimal classification when applied to testing data. Additionally, the iterative approach is extended to multi-modal imaging data to further improve pGTL classification accuracy. Using Alzheimer’s disease and Parkinson’s disease study data, the classification accuracy of the proposed pGTL method is compared to several state-of-the-art classification methods, and the results show pGTL can more accurately identify subjects, even at different progression stages, in these two study data sets. PMID:28551556

  13. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine

    PubMed Central

    Manavalan, Balachandran; Shin, Tae H.; Lee, Gwang

    2018-01-01

    Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html. PMID:29616000

  14. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine.

    PubMed

    Manavalan, Balachandran; Shin, Tae H; Lee, Gwang

    2018-01-01

    Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.

  15. Firefly as a novel swarm intelligence variable selection method in spectroscopy.

    PubMed

    Goodarzi, Mohammad; dos Santos Coelho, Leandro

    2014-12-10

    A critical step in multivariate calibration is wavelength selection, which is used to build models with better prediction performance when applied to spectral data. Many feature selection techniques have been developed to date. Among them, those based on swarm intelligence optimization are particularly interesting, since they mimic collective animal and insect behavior, e.g., finding the shortest path between a food source and the nest. Decisions are made by the swarm as a whole, leading to more robust models that are less prone to falling into local minima during the optimization cycle. This paper presents a novel feature selection approach for spectroscopic data, leading to more robust calibration models. The performance of the firefly algorithm, a swarm intelligence paradigm, was evaluated and compared with the genetic algorithm and particle swarm optimization. All three techniques were coupled with partial least squares (PLS) and applied to three spectroscopic data sets. They demonstrated improved prediction results compared with a PLS model built using all wavelengths. The results show that the firefly algorithm, as a novel swarm paradigm, selects fewer wavelengths while the prediction performance of the resulting PLS model stays the same. Copyright © 2014. Published by Elsevier B.V.
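
    A sketch of the evaluation step shared by firefly, GA, and PSO wavelength selection: a candidate subset of wavelengths is scored by the cross-validated error of a PLS model restricted to those wavelengths. The firefly update rule itself is not reproduced; a random search stands in as a placeholder optimizer, the spectra are synthetic, and scikit-learn is assumed.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    def subset_rmse(X, y, mask, n_components=5):
        """Cross-validated RMSE of PLS using only the wavelengths where mask is True."""
        if mask.sum() < n_components:
            return np.inf
        pls = PLSRegression(n_components=n_components)
        scores = cross_val_score(pls, X[:, mask], y, cv=5,
                                 scoring="neg_root_mean_squared_error")
        return -scores.mean()

    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 400))          # placeholder spectra (80 samples x 400 wavelengths)
    y = X[:, 50] * 2 + rng.normal(size=80)  # synthetic response

    best_mask, best_err = None, np.inf
    for _ in range(50):                      # placeholder for the swarm-intelligence search
        mask = rng.random(X.shape[1]) < 0.1
        err = subset_rmse(X, y, mask)
        if err < best_err:
            best_mask, best_err = mask, err
    print(best_mask.sum(), round(best_err, 3))
    ```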

  16. Stimulus information contaminates summation tests of independent neural representations of features

    NASA Technical Reports Server (NTRS)

    Shimozaki, Steven S.; Eckstein, Miguel P.; Abbey, Craig K.

    2002-01-01

    Many models of visual processing assume that visual information is analyzed into separable and independent neural codes, or features. A common psychophysical test of independent features is known as a summation study, which measures performance in a detection, discrimination, or visual search task as the number of proposed features increases. Improvement in human performance with an increasing number of available features is typically attributed to the summation, or combination, of information across independent neural coding of the features. In many instances, however, increasing the number of available features also increases the stimulus information in the task, as assessed by an optimal observer that does not include the independent neural codes. In a visual search task with spatial frequency and orientation as the component features, a particular set of stimuli was chosen so that all searches had equivalent stimulus information, regardless of the number of features. In this case, human performance did not improve with an increasing number of features, implying that the improvement observed with additional features may be due to stimulus information and not to the combination of information across independent features.

  17. Predicting Protein-Protein Interactions by Combining Various Sequence-Derived Features.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2011-09-20

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machinery of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. The resulting 67-dimensional feature vectors are then fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least serve as a useful supplementary tool to existing methods.

  18. Numerical optimization of three-dimensional coils for NSTX-U

    DOE PAGES

    Lazerson, S. A.; Park, J. -K.; Logan, N.; ...

    2015-09-03

    A tool for the calculation of optimal three-dimensional (3D) perturbative magnetic fields in tokamaks has been developed. The IPECOPT code builds upon the stellarator optimization code STELLOPT to allow for optimization of the linear ideal magnetohydrodynamic perturbed equilibrium (IPEC). This tool has been applied to NSTX-U equilibria, addressing which fields are the most effective at driving NTV torques. The NTV torque calculation is performed by the PENT code. Optimization of the normal field spectrum shows that fields with n = 1 character can drive a large core torque. It is also shown that fields with n = 3 features are capable of driving edge torque and some core torque. Coil current optimization (using the planned in-vessel and existing RWM coils) on NSTX-U suggests the planned coil set is adequate for core and edge torque control. In conclusion, comparison between error field correction experiments on DIII-D and the optimizer shows good agreement.

  19. Fixed versus mixed RSA: Explaining visual representations by fixed and mixed feature sets from shallow and deep computational models.

    PubMed

    Khaligh-Razavi, Seyed-Mahdi; Henriksson, Linda; Kay, Kendrick; Kriegeskorte, Nikolaus

    2017-02-01

    Studies of the primate visual system have begun to test a wide range of complex computational object-vision models. Realistic models have many parameters, which in practice cannot be fitted using the limited amounts of brain-activity data typically available. Task performance optimization (e.g. using backpropagation to train neural networks) provides major constraints for fitting parameters and discovering nonlinear representational features appropriate for the task (e.g. object classification). Model representations can be compared to brain representations in terms of the representational dissimilarities they predict for an image set. This method, called representational similarity analysis (RSA), enables us to test the representational feature space as is (fixed RSA) or to fit a linear transformation that mixes the nonlinear model features so as to best explain a cortical area's representational space (mixed RSA). Like voxel/population-receptive-field modelling, mixed RSA uses a training set (different stimuli) to fit one weight per model feature and response channel (voxels here), so as to best predict the response profile across images for each response channel. We analysed response patterns elicited by natural images, which were measured with functional magnetic resonance imaging (fMRI). We found that early visual areas were best accounted for by shallow models, such as a Gabor wavelet pyramid (GWP). The GWP model performed similarly with and without mixing, suggesting that the original features already approximated the representational space, obviating the need for mixing. However, a higher ventral-stream visual representation (lateral occipital region) was best explained by the higher layers of a deep convolutional network and mixing of its feature set was essential for this model to explain the representation. We suspect that mixing was essential because the convolutional network had been trained to discriminate a set of 1000 categories, whose frequencies in the training set did not match their frequencies in natural experience or their behavioural importance. The latter factors might determine the representational prominence of semantic dimensions in higher-level ventral-stream areas. Our results demonstrate the benefits of testing both the specific representational hypothesis expressed by a model's original feature space and the hypothesis space generated by linear transformations of that feature space.
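
    A hedged sketch of the fixed-versus-mixed RSA contrast described above, with made-up array shapes and scikit-learn/SciPy assumed: "fixed" compares the model RDM as-is to the brain RDM, while "mixed" first fits a linear map (one weight per model feature and voxel) on a separate training image set and then compares the RDM of the predicted responses.

    ```python
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr
    from sklearn.linear_model import Ridge

    def rdm(patterns):
        """Representational dissimilarity: 1 - Pearson correlation across image pairs."""
        return pdist(patterns, metric="correlation")

    rng = np.random.default_rng(0)
    feats_train, feats_test = rng.normal(size=(60, 200)), rng.normal(size=(40, 200))  # model features
    vox_train, vox_test = rng.normal(size=(60, 500)), rng.normal(size=(40, 500))      # fMRI voxels

    # Fixed RSA: the model feature space is taken as-is.
    fixed_r = spearmanr(rdm(feats_test), rdm(vox_test))[0]

    # Mixed RSA: a linear remixing of model features is fitted on the training images.
    mapping = Ridge(alpha=1.0).fit(feats_train, vox_train)
    mixed_r = spearmanr(rdm(mapping.predict(feats_test)), rdm(vox_test))[0]
    print(fixed_r, mixed_r)
    ```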

  20. Breaking the polar-nonpolar division in solvation free energy prediction.

    PubMed

    Wang, Bao; Wang, Chengzhang; Wu, Kedi; Wei, Guo-Wei

    2018-02-05

    Implicit solvent models divide solvation free energies into polar and nonpolar additive contributions, whereas polar and nonpolar interactions are inseparable and nonadditive. We present a feature functional theory (FFT) framework to break this ad hoc division. The essential ideas of FFT are as follows: (i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecule from another; (ii) feature-function relationship assumption: the macroscopic features of a molecule, including its solvation free energy, are functionals of the microscopic feature vectors; and (iii) similarity assumption: molecules with similar microscopic features have similar macroscopic properties, such as solvation free energies. Based on these assumptions, solvation free energy prediction is carried out in the following protocol. First, we construct a molecular microscopic feature vector that is efficient in characterizing the solvation process using quantum mechanics and Poisson-Boltzmann theory. Microscopic feature vectors are combined with macroscopic features, that is, physical observables, to form extended feature vectors. Additionally, we partition a solvation dataset into queries according to molecular compositions. Moreover, for each target molecule, we adopt a machine learning algorithm for its nearest neighbor search, based on the selected microscopic feature vectors. Finally, from the extended feature vectors of the obtained nearest neighbors, we construct a functional of solvation free energy, which is employed to predict the solvation free energy of the target molecule. The proposed FFT model has been extensively validated via a large dataset of 668 molecules. The leave-one-out test gives an optimal root-mean-square error (RMSE) of 1.05 kcal/mol. FFT predictions of the SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 challenge sets deliver RMSEs of 0.61, 1.86, 1.64, 0.86, and 1.14 kcal/mol, respectively. Using a test set of 94 molecules and its associated training set, the present approach was carefully compared with a classic solvation model based on weighted solvent accessible surface area. © 2017 Wiley Periodicals, Inc.

  1. A general optimality criteria algorithm for a class of engineering optimization problems

    NASA Astrophysics Data System (ADS)

    Belegundu, Ashok D.

    2015-05-01

    An optimality criteria (OC)-based algorithm for optimization of a general class of nonlinear programming (NLP) problems is presented. The algorithm is only applicable to problems where the objective and constraint functions satisfy certain monotonicity properties. For multiply constrained problems which satisfy these assumptions, the algorithm is attractive compared with existing NLP methods as well as prevalent OC methods, as the latter involve computationally expensive active set and step-size control strategies. The fixed point algorithm presented here is applicable not only to structural optimization problems but also to certain problems as occur in resource allocation and inventory models. Convergence aspects are discussed. The fixed point update or resizing formula is given physical significance, which brings out a strength and trim feature. The number of function evaluations remains independent of the number of variables, allowing the efficient solution of problems with a large number of variables.

  2. System and Method for Modeling the Flow Performance Features of an Object

    NASA Technical Reports Server (NTRS)

    Jorgensen, Charles (Inventor); Ross, James (Inventor)

    1997-01-01

    The method and apparatus includes a neural network for generating a model of an object in a wind tunnel from performance data on the object. The network is trained from test input signals (e.g., leading edge flap position, trailing edge flap position, angle of attack, and other geometric configurations, and power settings) and test output signals (e.g., lift, drag, pitching moment, or other performance features). In one embodiment, the neural network training method employs a modified Levenberg-Marquardt optimization technique. The model can be generated 'real time' as wind tunnel testing proceeds. Once trained, the model is used to estimate performance features associated with the aircraft given geometric configuration and/or power setting input. The invention can also be applied in other similar static flow modeling applications in aerodynamics, hydrodynamics, fluid dynamics, and other such disciplines. For example, the static testing of cars, sails, and foils, propellers, keels, rudders, turbines, fins, and the like, in a wind tunnel, water trough, or other flowing medium.

  3. Study of wavelet packet energy entropy for emotion classification in speech and glottal signals

    NASA Astrophysics Data System (ADS)

    He, Ling; Lech, Margaret; Zhang, Jing; Ren, Xiaomei; Deng, Lihua

    2013-07-01

    Automatic speech emotion recognition has important applications in human-machine communication. The majority of current research in this area focuses on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates the energy entropy of values within selected wavelet packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set, annotated with three different emotions (angry, neutral and soft), and the Berlin Emotional Speech (BES) database, annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
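
    A hedged interpretation of the proposed feature, assuming PyWavelets is available: decompose the signal into wavelet-packet bands, then compute the Shannon entropy of the frame-energy distribution inside each band. The frame length, wavelet, and depth below are illustrative, not the paper's settings.

    ```python
    import numpy as np
    import pywt

    def wp_energy_entropy(signal, wavelet="db4", level=4, frame=32):
        feats = []
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
        for node in wp.get_level(level, order="freq"):
            coeffs = node.data
            n = (len(coeffs) // frame) * frame
            frames = coeffs[:n].reshape(-1, frame)
            energy = (frames ** 2).sum(axis=1)
            p = energy / (energy.sum() + 1e-12)
            feats.append(-(p * np.log2(p + 1e-12)).sum())    # energy entropy of this band
        return np.array(feats)

    x = np.random.default_rng(0).normal(size=4096)           # placeholder speech/glottal segment
    print(wp_energy_entropy(x).shape)                        # one entropy value per band (2**level)
    ```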

  4. Anomaly Detection of Electromyographic Signals.

    PubMed

    Ijaz, Ahsan; Choi, Jongeun

    2018-04-01

    In this paper, we provide a robust framework to detect anomalous electromyographic (EMG) signals and identify contamination types. As a first step for feature selection, an optimally selected Lawton wavelet transform is applied. Robust principal component analysis (rPCA) is then performed on these wavelet coefficients to obtain features in a lower dimension. The rPCA-based features are used to construct a self-organizing map (SOM). Finally, hierarchical clustering is applied to the SOM, which separates anomalous signals residing in the smaller clusters and breaks them into logical units for contamination identification. The proposed methodology is tested using synthetic and real-world EMG signals. The synthetic EMG signals are generated using a heteroscedastic process mimicking the desired experimental setups. A subset of these synthetic signals is introduced with anomalies. These results are followed by real EMG signals with synthetic anomalies. Finally, a heterogeneous real-world data set with known quality issues is used under an unsupervised setting. The framework provides a recall of 90% (±3.3) and a precision of 99% (±0.4).

  5. User-customized brain computer interfaces using Bayesian optimization

    NASA Astrophysics Data System (ADS)

    Bashashati, Hossein; Ward, Rabab K.; Bashashati, Ali

    2016-04-01

    Objective. The brain characteristics of different people are not the same. Brain computer interfaces (BCIs) should thus be customized for each individual person. In motor-imagery based synchronous BCIs, a number of parameters (referred to as hyper-parameters) including the EEG frequency bands, the channels and the time intervals from which the features are extracted should be pre-determined based on each subject’s brain characteristics. Approach. To determine the hyper-parameter values, previous work has relied on manual or semi-automatic methods that are not applicable to high-dimensional search spaces. In this paper, we propose a fully automatic, scalable and computationally inexpensive algorithm that uses Bayesian optimization to tune these hyper-parameters. We then build different classifiers trained on the sets of hyper-parameter values proposed by the Bayesian optimization. A final classifier aggregates the results of the different classifiers. Main Results. We have applied our method to 21 subjects from three BCI competition datasets. We have conducted rigorous statistical tests, and have shown the positive impact of hyper-parameter optimization in improving the accuracy of BCIs. Furthermore, we have compared our results to those reported in the literature. Significance. Unlike the best reported results in the literature, which are based on more sophisticated feature extraction and classification methods, and rely on prestudies to determine the hyper-parameter values, our method has the advantage of being fully automated, uses less sophisticated feature extraction and classification methods, and yields similar or superior results compared to the best performing designs in the literature.
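
    A minimal, self-contained sketch of the idea (not the authors' pipeline), assuming scikit-optimize and scikit-learn and using synthetic band-power features: the frequency band and classifier regularization are treated as hyper-parameters and Bayesian optimization picks them from cross-validated accuracy.

    ```python
    import numpy as np
    from skopt import gp_minimize
    from skopt.space import Integer, Real
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    power = rng.normal(size=(120, 40))            # placeholder band-power features, 1-40 Hz bins
    y = rng.integers(0, 2, size=120)              # placeholder motor-imagery labels
    power[y == 1, 7:13] += 0.8                    # inject a discriminative mu-band pattern

    def objective(params):
        f_lo, f_hi, log_c = params
        if f_hi <= f_lo:
            return 1.0                             # infeasible band
        X = power[:, f_lo:f_hi]
        acc = cross_val_score(SVC(C=10 ** log_c), X, y, cv=5).mean()
        return 1.0 - acc                           # gp_minimize minimizes

    res = gp_minimize(objective, [Integer(1, 20), Integer(5, 40), Real(-2, 2)],
                      n_calls=30, random_state=0)
    print(res.x, 1 - res.fun)
    ```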

  6. Stabilizing l1-norm prediction models by supervised feature grouping.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2016-02-01

    Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since much of the information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. However, in the presence of correlated features, these methods select features that change considerably with small changes in the data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that, in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Automatic Removal of Physiological Artifacts in EEG: The Optimized Fingerprint Method for Sports Science Applications

    PubMed Central

    Stone, David B.; Tamburro, Gabriella; Fiedler, Patrique; Haueisen, Jens; Comani, Silvia

    2018-01-01

    Data contamination due to physiological artifacts such as those generated by eyeblinks, eye movements, and muscle activity continues to be a central concern in the acquisition and analysis of electroencephalographic (EEG) data. This issue is further compounded in EEG sports science applications where the presence of artifacts is notoriously difficult to control because behaviors that generate these interferences are often the behaviors under investigation. Therefore, there is a need to develop effective and efficient methods to identify physiological artifacts in EEG recordings during sports applications so that they can be isolated from cerebral activity related to the activities of interest. We have developed an EEG artifact detection model, the Fingerprint Method, which identifies different spatial, temporal, spectral, and statistical features indicative of physiological artifacts and uses these features to automatically classify artifactual independent components in EEG based on a machine learning approach. Here, we optimized our method using artifact-rich training data and a procedure to determine which features were best suited to identify eyeblinks, eye movements, and muscle artifacts. We then applied our model to an experimental dataset collected during endurance cycling. Results reveal that unique sets of features are suitable for the detection of distinct types of artifacts and that the Optimized Fingerprint Method was able to correctly identify over 90% of the artifactual components with physiological origin present in the experimental data. These results represent a significant advancement in the search for effective means to address artifact contamination in EEG sports science applications. PMID:29618975

  8. Automatic Removal of Physiological Artifacts in EEG: The Optimized Fingerprint Method for Sports Science Applications.

    PubMed

    Stone, David B; Tamburro, Gabriella; Fiedler, Patrique; Haueisen, Jens; Comani, Silvia

    2018-01-01

    Data contamination due to physiological artifacts such as those generated by eyeblinks, eye movements, and muscle activity continues to be a central concern in the acquisition and analysis of electroencephalographic (EEG) data. This issue is further compounded in EEG sports science applications where the presence of artifacts is notoriously difficult to control because behaviors that generate these interferences are often the behaviors under investigation. Therefore, there is a need to develop effective and efficient methods to identify physiological artifacts in EEG recordings during sports applications so that they can be isolated from cerebral activity related to the activities of interest. We have developed an EEG artifact detection model, the Fingerprint Method, which identifies different spatial, temporal, spectral, and statistical features indicative of physiological artifacts and uses these features to automatically classify artifactual independent components in EEG based on a machine learning approach. Here, we optimized our method using artifact-rich training data and a procedure to determine which features were best suited to identify eyeblinks, eye movements, and muscle artifacts. We then applied our model to an experimental dataset collected during endurance cycling. Results reveal that unique sets of features are suitable for the detection of distinct types of artifacts and that the Optimized Fingerprint Method was able to correctly identify over 90% of the artifactual components with physiological origin present in the experimental data. These results represent a significant advancement in the search for effective means to address artifact contamination in EEG sports science applications.

  9. Training set size, scale, and features in Geographic Object-Based Image Analysis of very high resolution unmanned aerial vehicle imagery

    NASA Astrophysics Data System (ADS)

    Ma, Lei; Cheng, Liang; Li, Manchun; Liu, Yongxue; Ma, Xiaoxue

    2015-04-01

    Unmanned Aerial Vehicles (UAVs) have been used increasingly for natural resource applications in recent years due to their greater availability and the miniaturization of sensors. In addition, Geographic Object-Based Image Analysis (GEOBIA) has received more attention as a novel paradigm for remote sensing earth observation data. However, GEOBIA introduces some new problems compared with pixel-based methods. In this study, we developed a strategy for the semi-automatic optimization of object-based classification, which involves an area-based accuracy assessment that analyzes the relationship between scale and the training set size. We found that the Overall Accuracy (OA) increased as the training set ratio (the proportion of segmented objects used for training) increased when the Segmentation Scale Parameter (SSP) was fixed. The OA increased more slowly as the training set ratio became larger, and a similar rule was obtained for pixel-based image analysis. The OA decreased as the SSP increased when the training set ratio was fixed. Consequently, the SSP should not be too large during classification using a small training set ratio. By contrast, a large training set ratio is required if classification is performed with a high SSP. In addition, we suggest that the optimal SSP for each class has a high positive correlation with the mean area obtained by manual interpretation, which can be summarized by a linear correlation equation. We expect that these results will be applicable to UAV imagery classification to determine the optimal SSP for each class.

  10. An Expert Opinion on Advanced Insulin Pump Use in Youth with Type 1 Diabetes.

    PubMed

    Bode, Bruce W; Kaufman, Francine R; Vint, Nan

    2017-03-01

    Among children and adolescents with type 1 diabetes mellitus, the use of insulin pump therapy has increased since its introduction in the early 1980s. Optimal management of type 1 diabetes mellitus depends on sufficient understanding by patients, their families, and healthcare providers on how to use pump technology. The goal for the use of insulin pump therapy should be to advance proficiency over time from the basics taught at the initiation of pump therapy to utilizing advanced settings to obtain optimal glycemic control. However, this goal is often not met, and appropriate understanding of the full features of pump technology can be lacking. The objective of this review is to provide an expert perspective on the advanced features and use of insulin pump therapy, including practical guidelines for the successful use of insulin pump technology, and other considerations specific to patients and healthcare providers.

  11. A boosted optimal linear learner for retinal vessel segmentation

    NASA Astrophysics Data System (ADS)

    Poletti, E.; Grisan, E.

    2014-03-01

    Ocular fundus images provide important information about retinal degeneration, which may be related to acute pathologies or to early signs of systemic diseases. An automatic and quantitative assessment of vessel morphological features, such as diameters and tortuosity, can improve clinical diagnosis and evaluation of retinopathy. At variance with available methods, we propose a data-driven approach, in which the system learns a set of optimal discriminative convolution kernels (linear learner). The set is progressively built based on an ADA-boost sample weighting scheme, providing seamless integration between linear learner estimation and classification. In order to capture the vessel appearance changes at different scales, the kernels are estimated on a pyramidal decomposition of the training samples. The set is employed as a rotating bank of matched filters, whose response is used by the boosted linear classifier to provide a classification of each image pixel into the two classes of interest (vessel/background). We tested the approach on fundus images available from the DRIVE dataset. We show that the segmentation performance yields an accuracy of 0.94.

  12. Detection of Foreign Matter in Transfusion Solution Based on Gaussian Background Modeling and an Optimized BP Neural Network

    PubMed Central

    Zhou, Fuqiang; Su, Zhen; Chai, Xinghua; Chen, Lipeng

    2014-01-01

    This paper proposes a new method to detect and identify foreign matter mixed in a plastic bottle filled with transfusion solution. A spin-stop mechanism and mixed illumination style are applied to obtain high contrast images between moving foreign matter and a static transfusion background. The Gaussian mixture model is used to model the complex background of the transfusion image and to extract moving objects. A set of features of moving objects are extracted and selected by the ReliefF algorithm, and optimal feature vectors are fed into the back propagation (BP) neural network to distinguish between foreign matter and bubbles. The mind evolutionary algorithm (MEA) is applied to optimize the connection weights and thresholds of the BP neural network to obtain a higher classification accuracy and faster convergence rate. Experimental results show that the proposed method can effectively detect visible foreign matter in 250-mL transfusion bottles. The misdetection rate and false alarm rate are low, and the detection accuracy and detection speed are satisfactory. PMID:25347581

  13. Optimal feature selection using a modified differential evolution algorithm and its effectiveness for prediction of heart disease.

    PubMed

    Vivekanandan, T; Sriman Narayana Iyengar, N Ch

    2017-11-01

    Enormous data growth in multiple domains has posed a great challenge for data processing and analysis techniques. In particular, the traditional record maintenance strategy has been replaced in the healthcare system. It is vital to develop a model that is able to handle the huge amount of e-healthcare data efficiently. In this paper, the challenging tasks of selecting critical features from the enormous set of available features and diagnosing heart disease are carried out. Feature selection is one of the most widely used pre-processing steps in classification problems. A modified differential evolution (DE) algorithm is used to perform feature selection for cardiovascular disease and optimization of selected features. Of the 10 available strategies for the traditional DE algorithm, the seventh strategy, which is represented by DE/rand/2/exp, is considered for comparative study. The performance analysis of the developed modified DE strategy is given in this paper. With the selected critical features, prediction of heart disease is carried out using fuzzy AHP and a feed-forward neural network. Various performance measures of integrating the modified differential evolution algorithm with fuzzy AHP and a feed-forward neural network in the prediction of heart disease are evaluated in this paper. The accuracy of the proposed hybrid model is 83%, which is higher than that of some other existing models. In addition, the prediction time of the proposed hybrid model is also evaluated and has shown promising results. Copyright © 2017 Elsevier Ltd. All rights reserved.
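
    A sketch of differential-evolution feature selection in the spirit described above (not the authors' modified DE), assuming SciPy and scikit-learn and synthetic data: each candidate is a real vector thresholded at 0.5 to pick features, scored by the cross-validated accuracy of a simple classifier. SciPy's 'rand2exp' strategy mirrors the DE/rand/2/exp variant mentioned in the abstract.

    ```python
    import numpy as np
    from scipy.optimize import differential_evolution
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 13))                     # placeholder: 13 clinical attributes
    y = (X[:, 0] + X[:, 3] > 0).astype(int)            # synthetic "heart disease" label

    def neg_accuracy(weights):
        mask = weights > 0.5
        if not mask.any():
            return 1.0
        clf = LogisticRegression(max_iter=1000)
        return 1.0 - cross_val_score(clf, X[:, mask], y, cv=5).mean()

    res = differential_evolution(neg_accuracy, bounds=[(0, 1)] * X.shape[1],
                                 strategy="rand2exp", maxiter=30, seed=0, polish=False)
    print((res.x > 0.5).nonzero()[0], 1 - res.fun)
    ```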

  14. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata

    PubMed Central

    Liu, Aiming; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-01-01

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validity of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems. PMID:29117100

  15. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata.

    PubMed

    Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-11-08

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validity of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.

  16. Coherent optimal control of photosynthetic molecules

    NASA Astrophysics Data System (ADS)

    Caruso, F.; Montangero, S.; Calarco, T.; Huelga, S. F.; Plenio, M. B.

    2012-04-01

    We demonstrate theoretically that open-loop quantum optimal control techniques can provide efficient tools for the verification of various quantum coherent transport mechanisms in natural and artificial light-harvesting complexes under realistic experimental conditions. To assess the feasibility of possible biocontrol experiments, we introduce the main settings and derive optimally shaped and robust laser pulses that allow for the faithful preparation of specified initial states (such as localized excitation or coherent superposition, i.e., propagating and nonpropagating states) of the photosystem and probe efficiently the subsequent dynamics. With these tools, different transport pathways can be discriminated, which should facilitate the elucidation of genuine quantum dynamical features of photosystems and therefore enhance our understanding of the role that coherent processes may play in actual biological complexes.

  17. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images.

    PubMed

    Wang, Hongkai; Zhou, Zongwei; Li, Yingci; Chen, Zhonghua; Lu, Peiou; Wang, Wenzhi; Liu, Wanyu; Yu, Lijuan

    2017-12-01

    This study aimed to compare one state-of-the-art deep learning method and four classical machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer (NSCLC) from 18 F-FDG PET/CT images. Another objective was to compare the discriminative power of the recently popular PET/CT texture features with the widely used diagnostic features such as tumor size, CT value, SUV, image contrast, and intensity standard deviation. The four classical machine learning methods were random forests, support vector machines, adaptive boosting, and an artificial neural network. The deep learning method was a convolutional neural network (CNN). The five methods were evaluated using 1397 lymph nodes collected from PET/CT images of 168 patients, with corresponding pathology analysis results as the gold standard. The comparison was conducted using 10 times 10-fold cross-validation based on the criteria of sensitivity, specificity, accuracy (ACC), and area under the ROC curve (AUC). For each classical method, different input features were compared to select the optimal feature set. Based on the optimal feature set, the classical methods were compared with the CNN, as well as with human doctors from our institute. For the classical methods, the diagnostic features resulted in 81~85% ACC and 0.87~0.92 AUC, which were significantly higher than the results of the texture features. The CNN's sensitivity, specificity, ACC, and AUC were 84%, 88%, 86%, and 0.91, respectively. There was no significant difference between the results of the CNN and the best classical method. The sensitivity, specificity, and ACC of the human doctors were 73%, 90%, and 82%, respectively. All five machine learning methods had higher sensitivities but lower specificities than the human doctors. The present study shows that the performance of the CNN is not significantly different from that of the best classical methods and human doctors for classifying mediastinal lymph node metastasis of NSCLC from PET/CT images. Because the CNN does not need tumor segmentation or feature calculation, it is more convenient and more objective than the classical methods. However, the CNN does not make use of the important diagnostic features, which have been proven more discriminative than the texture features for classifying small-sized lymph nodes. Therefore, incorporating the diagnostic features into the CNN is a promising direction for future research.
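
    A sketch of the evaluation protocol only (not the study's data): several classical classifiers compared by AUC under 10x10-fold cross-validation on a small diagnostic feature table. The features and labels below are synthetic placeholders and scikit-learn is assumed.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 6))                 # placeholder: size, CT value, SUV, contrast, ...
    y = (X[:, 2] + 0.5 * X[:, 0] > 0).astype(int) # synthetic metastasis label

    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
    models = {
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "SVM": SVC(probability=True, random_state=0),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    }
    for name, model in models.items():
        auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        print(f"{name}: AUC={auc:.3f}")
    ```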

  18. iTOUGH2: A multiphysics simulation-optimization framework for analyzing subsurface systems

    NASA Astrophysics Data System (ADS)

    Finsterle, S.; Commer, M.; Edmiston, J. K.; Jung, Y.; Kowalsky, M. B.; Pau, G. S. H.; Wainwright, H. M.; Zhang, Y.

    2017-11-01

    iTOUGH2 is a simulation-optimization framework for the TOUGH suite of nonisothermal multiphase flow models and related simulators of geophysical, geochemical, and geomechanical processes. After appropriate parameterization of subsurface structures and their properties, iTOUGH2 runs simulations for multiple parameter sets and analyzes the resulting output for parameter estimation through automatic model calibration, local and global sensitivity analyses, data-worth analyses, and uncertainty propagation analyses. Development of iTOUGH2 is driven by scientific challenges and user needs, with new capabilities continually added to both the forward simulator and the optimization framework. This review article provides a summary description of methods and features implemented in iTOUGH2, and discusses the usefulness and limitations of an integrated simulation-optimization workflow in support of the characterization and analysis of complex multiphysics subsurface systems.

  19. A Novel Approach for Lie Detection Based on F-Score and Extreme Learning Machine

    PubMed Central

    Gao, Junfeng; Wang, Zhao; Yang, Yong; Zhang, Wenjia; Tao, Chunyi; Guan, Jinan; Rao, Nini

    2013-01-01

    A new machine learning method, referred to as F-score_ELM, was proposed to classify lying and truth-telling using electroencephalogram (EEG) signals from 28 guilty and innocent subjects. Thirty-one features were extracted from the probe responses of these subjects. Then, a recently developed classifier called the extreme learning machine (ELM) was combined with the F-score, a simple but effective feature selection method, to jointly optimize the number of hidden nodes of the ELM and the feature subset by a grid-searching training procedure. The method was compared to two classification models combining principal component analysis with back-propagation network and support vector machine classifiers. We thoroughly assessed the performance of these classification models, including the training and testing time, sensitivity and specificity on the training and testing sets, as well as network size. The experimental results showed that the number of hidden nodes can be effectively optimized by the proposed method. Also, F-score_ELM obtained the best classification accuracy and required the shortest training and testing time. PMID:23755136
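
    An illustrative sketch (synthetic data, NumPy only): an F-score ranking of features paired with a tiny extreme learning machine (random hidden layer plus least-squares output weights). The joint grid search over feature-subset size and hidden-node count used in the paper is reduced here to fixed values.

    ```python
    import numpy as np

    def f_score(X, y):
        """Per-feature F-score for a binary label vector y (Chen & Lin style)."""
        pos, neg = X[y == 1], X[y == 0]
        num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
        den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
        return num / (den + 1e-12)

    def elm_fit_predict(Xtr, ytr, Xte, n_hidden=20, rng=np.random.default_rng(0)):
        W = rng.normal(size=(Xtr.shape[1], n_hidden))
        b = rng.normal(size=n_hidden)
        H_tr, H_te = np.tanh(Xtr @ W + b), np.tanh(Xte @ W + b)
        beta = np.linalg.pinv(H_tr) @ ytr           # least-squares output weights
        return (H_te @ beta > 0.5).astype(int)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(56, 31)); y = rng.integers(0, 2, size=56)   # placeholder probe features
    top = np.argsort(f_score(X, y))[::-1][:10]                       # keep the 10 best features
    pred = elm_fit_predict(X[:40, top], y[:40], X[40:, top])
    print((pred == y[40:]).mean())
    ```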

  20. T-ray relevant frequencies for osteosarcoma classification

    NASA Astrophysics Data System (ADS)

    Withayachumnankul, W.; Ferguson, B.; Rainsford, T.; Findlay, D.; Mickan, S. P.; Abbott, D.

    2006-01-01

    We investigate the classification of the T-ray response of normal human bone cells and human osteosarcoma cells, grown in culture. Given the magnitude and phase responses within a reliable spectral range as features for input vectors, a trained support vector machine can correctly classify the two cell types to some extent. The performance of the support vector machine is degraded by the curse of dimensionality, resulting from the comparatively large number of features in the input vectors. Feature subset selection methods are used to select only an optimal number of relevant features as inputs. As a result, an improvement in generalization performance is attainable, and the selected frequencies can be used to further describe the different mechanisms by which the cells respond to T-rays. We demonstrate a consistent classification accuracy of 89.6%, while only one fifth of the original features is retained in the data set.

  1. Angular description for 3D scattering centers

    NASA Astrophysics Data System (ADS)

    Bhalla, Rajan; Raynal, Ann Marie; Ling, Hao; Moore, John; Velten, Vincent J.

    2006-05-01

    The electromagnetic scattered field from an electrically large target can often be well modeled as if it emanates from a discrete set of scattering centers (see Fig. 1). In the scattering center extraction tool we developed previously, based on the shooting and bouncing ray technique, no correspondence is maintained amongst the 3D scattering centers extracted at adjacent angles. In this paper we present a multi-dimensional clustering algorithm to track the angular and spatial behaviors of 3D scattering centers and group them into features. The extracted features for the Slicy and backhoe targets are presented. We also describe two metrics for measuring the angular persistence and spatial mobility of the 3D scattering centers that make up these features, in order to gain insight into target physics and feature stability. We find that the features that are most persistent are also the most mobile, and we discuss the implications for optimal SAR imaging.

  2. Method of generating features optimal to a dataset and classifier

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  3. Binding Free Energy Calculations for Lead Optimization: Assessment of Their Accuracy in an Industrial Drug Design Context.

    PubMed

    Homeyer, Nadine; Stoll, Friederike; Hillisch, Alexander; Gohlke, Holger

    2014-08-12

    Correctly ranking compounds according to their computed relative binding affinities will be of great value for decision making in the lead optimization phase of industrial drug discovery. However, the performance of existing computationally demanding binding free energy calculation methods in this context is largely unknown. We analyzed the performance of the molecular mechanics continuum solvent, the linear interaction energy (LIE), and the thermodynamic integration (TI) approach for three sets of compounds from industrial lead optimization projects. The data sets pose challenges typical for this early stage of drug discovery. None of the methods was sufficiently predictive when applied out of the box without considering these challenges. Detailed investigations of failures revealed critical points that are essential for good binding free energy predictions. When data set-specific features were considered accordingly, predictions valuable for lead optimization could be obtained for all approaches but LIE. Our findings lead to clear recommendations for when to use which of the above approaches. Our findings also stress the important role of expert knowledge in this process, not least for estimating the accuracy of prediction results by TI, using indicators such as the size and chemical structure of exchanged groups and the statistical error in the predictions. Such knowledge will be invaluable when it comes to the question which of the TI results can be trusted for decision making.

  4. Specific features of goal setting in road traffic safety

    NASA Astrophysics Data System (ADS)

    Kolesov, V. I.; Danilov, O. F.; Petrov, A. I.

    2017-10-01

    Road traffic safety (RTS) management is inherently a branch of cybernetics and therefore requires clear formalization of the task. The paper aims at identifying the specific features of goal setting in RTS management under the system approach. The paper presents the results of cybernetic modeling of the cause-and-effect mechanism of a road traffic accident (RTA); here, the mechanism itself is viewed as a complex system. A designed management goal function is focused on minimizing the difficulty of achieving the target goal. Optimization of the target goal has been performed using the Lagrange principle. The created working algorithms have passed soft testing. The key role of the obtained solution in tactical and strategic RTS management is considered. The dynamics of the management effectiveness indicator has been analyzed based on ten-year statistics for Russia.

  5. Lead identification and optimization of novel collagenase inhibitors; pharmacophore and structure based studies

    PubMed Central

    Kalva, Sukesh; Vadivelan, S; Sanam, Ramadevi; Jagarlapudi, Sarma ARP; Saleena, Lilly M

    2012-01-01

    In this study, chemical feature-based pharmacophore models of MMP-1, MMP-8 and MMP-13 inhibitors have been developed with the aid of the HypoGen module within the Catalyst program package. In MMP-1 and MMP-13, all the compounds in the training set mapped HBA and RA, while in MMP-8, the training set mapped HBA and HY. These features were found to be responsible for the high molecular bioactivity, and they were further used as a three-dimensional query to screen the knowledge-based designed molecules. These pharmacophore models for collagenases picked up some potent and novel inhibitors. Subsequently, docking studies were performed for the potent molecules, and novel hits were suggested for further studies based on the docking score and active-site interactions in MMP-1, MMP-8 and MMP-13. PMID:22553386

  6. Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection.

    PubMed

    Cheng, Tiejun; Li, Qingliang; Wang, Yanli; Bryant, Stephen H

    2011-02-28

    Aqueous solubility is recognized as a critical parameter in both early- and late-stage drug discovery. Therefore, in silico modeling of solubility has attracted extensive interest in recent years. Most previous studies have been limited to relatively small data sets with limited diversity, which in turn limits the predictability of the derived models. In this work, we present a support vector machines model for the binary classification of solubility by taking advantage of the largest known public data set, which contains over 46 000 compounds with experimental solubility. Our model was optimized in combination with a reduction and recombination feature selection strategy. The best model demonstrated robust performance in both cross-validation and prediction of two independent test sets, indicating it could be a practical tool to select soluble compounds for screening, purchasing, and synthesizing. Moreover, our work may serve as a basis for the comparative evaluation of solubility classification studies, owing to its use of completely public resources.

  7. Online tracking of outdoor lighting variations for augmented reality with moving cameras.

    PubMed

    Liu, Yanli; Granier, Xavier

    2012-04-01

    In augmented reality, one of the key tasks in achieving a convincing visual appearance consistency between virtual objects and video scenes is to maintain coherent illumination along the whole sequence. As outdoor illumination is largely dependent on the weather, the lighting condition may change from frame to frame. In this paper, we propose a fully image-based approach for online tracking of outdoor illumination variations from videos captured with moving cameras. Our key idea is to estimate the relative intensities of sunlight and skylight via a sparse set of planar feature points extracted from each frame. To address the inevitable feature misalignments, a set of constraints is introduced to select the most reliable ones. Exploiting the spatial and temporal coherence of illumination, the relative intensities of sunlight and skylight are finally estimated using an optimization process. We validate our technique on a set of real-life videos and show that the results obtained with our estimations are visually coherent along the video sequences.

  8. A principled approach to setting optimal diagnostic thresholds: where ROC and indifference curves meet.

    PubMed

    Irwin, R John; Irwin, Timothy C

    2011-06-01

    Making clinical decisions on the basis of diagnostic tests is an essential feature of medical practice and the choice of the decision threshold is therefore crucial. A test's optimal diagnostic threshold is the threshold that maximizes expected utility. It is given by the product of the prior odds of a disease and a measure of the importance of the diagnostic test's sensitivity relative to its specificity. Choosing this threshold is the same as choosing the point on the Receiver Operating Characteristic (ROC) curve whose slope equals this product. We contend that a test's likelihood ratio is the canonical decision variable and contrast diagnostic thresholds based on likelihood ratio with two popular rules of thumb for choosing a threshold. The two rules are appealing because they have clear graphical interpretations, but they yield optimal thresholds only in special cases. The optimal rule can be given similar appeal by presenting indifference curves, each of which shows a set of equally good combinations of sensitivity and specificity. The indifference curve is tangent to the ROC curve at the optimal threshold. Whereas ROC curves show what is feasible, indifference curves show what is desirable. Together they show what should be chosen. Copyright © 2010 European Federation of Internal Medicine. Published by Elsevier B.V. All rights reserved.
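    In the standard expected-utility formulation of this result, the optimal threshold is the operating point where the ROC slope equals the product described above. A compact statement (our notation, with p the disease prevalence and U the outcome utilities; the prior-odds convention here is odds against disease, which may differ from the authors' phrasing) is:

```latex
% S: ROC slope at the optimal operating point; p: disease prevalence; U_{..}: outcome utilities.
S \;=\; \frac{1-p}{p}\cdot\frac{U_{TN}-U_{FP}}{U_{TP}-U_{FN}},
\qquad
\text{decide positive} \iff \mathrm{LR}(x) \;=\; \frac{f(x \mid D^{+})}{f(x \mid D^{-})} \;\ge\; S .
```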

  9. Identifying Patients with Atrioventricular Septal Defect in Down Syndrome Populations by Using Self-Normalizing Neural Networks and Feature Selection.

    PubMed

    Pan, Xiaoyong; Hu, Xiaohua; Zhang, Yu Hang; Feng, Kaiyan; Wang, Shao Peng; Chen, Lei; Huang, Tao; Cai, Yu Dong

    2018-04-12

    Atrioventricular septal defect (AVSD) is a clinically significant subtype of congenital heart disease (CHD) that severely influences the health of babies during birth and is associated with Down syndrome (DS). Thus, exploring the differences in functional genes in DS samples with and without AVSD is a critical way to investigate the complex association between AVSD and DS. In this study, we present a computational method to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN). First, each patient was encoded by using the copy number of probes on chromosome 21. The encoded features were ranked by the reliable Monte Carlo feature selection (MCFS) method to obtain a ranked feature list. Based on this feature list, we used a two-stage incremental feature selection to construct two series of feature subsets and applied SNNs to build classifiers to identify optimal features. Results show that 2737 optimal features were obtained, and the corresponding optimal SNN classifier constructed on these features yielded a Matthews correlation coefficient (MCC) value of 0.748. For comparison, random forest was also used to build classifiers and uncover optimal features; this method reached an optimal MCC value of 0.582 when the top 132 features were utilized. Finally, we analyzed some key features among the optimal features identified by the SNN and found literature support that further reveals their essential roles.
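    A rough sketch (not the authors' code) of incremental feature selection over a pre-ranked feature list scored by MCC; a random forest stands in for the self-normalizing neural network, and a crude univariate ranking stands in for the MCFS output:

```python
# Illustrative sketch of incremental feature selection on a ranked feature list, scored by MCC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import matthews_corrcoef

def incremental_feature_selection(X, y, ranked_idx, step=10):
    """Evaluate growing prefixes of the ranked feature list and return the best-scoring one."""
    best_mcc, best_subset = -1.0, None
    for k in range(step, len(ranked_idx) + 1, step):
        subset = ranked_idx[:k]
        clf = RandomForestClassifier(n_estimators=200, random_state=0)  # stand-in for the SNN
        pred = cross_val_predict(clf, X[:, subset], y, cv=5)
        mcc = matthews_corrcoef(y, pred)
        if mcc > best_mcc:
            best_mcc, best_subset = mcc, subset
    return best_subset, best_mcc

# Synthetic data and a placeholder ranking (stand-in for the MCFS ranking)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
ranking = np.argsort(-np.abs(np.corrcoef(X.T, y)[-1, :-1]))  # crude univariate ranking
subset, mcc = incremental_feature_selection(X, y, ranking)
print(len(subset), round(mcc, 3))
```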

  10. A novel scheme for abnormal cell detection in Pap smear images

    NASA Astrophysics Data System (ADS)

    Zhao, Tong; Wachman, Elliot S.; Farkas, Daniel L.

    2004-07-01

    Finding malignant cells in Pap smear images is a "needle in a haystack"-type problem, tedious, labor-intensive and error-prone. It is therefore desirable to have an automatic screening tool in order that human experts can concentrate on the evaluation of the more difficult cases. Most research on automatic cervical screening tries to extract morphometric and texture features at the cell level, in accordance with the NIH "The Bethesda System" rules. Due to variances in image quality and features, such as brightness, magnification and focus, morphometric and texture analysis is insufficient to provide robust cervical cancer detection. Using a microscopic spectral imaging system, we have produced a set of multispectral Pap smear images with wavelengths from 400 nm to 690 nm, containing both spectral signatures and spatial attributes. We describe a novel scheme that combines spatial information (including texture and morphometric features) with spectral information to significantly improve abnormal cell detection. Three kinds of wavelet features, orthogonal, bi-orthogonal and non-orthogonal, are carefully chosen to optimize recognition performance. Multispectral feature sets are then extracted in the wavelet domain. Using a Back-Propagation Neural Network classifier that greatly decreases the influence of spurious events, we obtain a classification error rate of 5%. Cell morphometric features, such as area and shape, are then used to eliminate most remaining small artifacts. We report initial results from 149 cells from 40 separate image sets, in which only one abnormal cell was missed (TPR = 97.6%) and one normal cell was falsely classified as cancerous (FPR = 1%).

  11. Deep 3D Convolutional Encoder Networks With Shortcuts for Multiscale Feature Integration Applied to Multiple Sclerosis Lesion Segmentation.

    PubMed

    Brosch, Tom; Tang, Lisa Y W; Youngjin Yoo; Li, David K B; Traboulsee, Anthony; Tam, Roger

    2016-05-01

    We propose a novel segmentation approach based on deep 3D convolutional encoder networks with shortcut connections and apply it to the segmentation of multiple sclerosis (MS) lesions in magnetic resonance images. Our model is a neural network that consists of two interconnected pathways, a convolutional pathway, which learns increasingly more abstract and higher-level image features, and a deconvolutional pathway, which predicts the final segmentation at the voxel level. The joint training of the feature extraction and prediction pathways allows for the automatic learning of features at different scales that are optimized for accuracy for any given combination of image types and segmentation task. In addition, shortcut connections between the two pathways allow high- and low-level features to be integrated, which enables the segmentation of lesions across a wide range of sizes. We have evaluated our method on two publicly available data sets (MICCAI 2008 and ISBI 2015 challenges) with the results showing that our method performs comparably to the top-ranked state-of-the-art methods, even when only relatively small data sets are available for training. In addition, we have compared our method with five freely available and widely used MS lesion segmentation methods (EMS, LST-LPA, LST-LGA, Lesion-TOADS, and SLS) on a large data set from an MS clinical trial. The results show that our method consistently outperforms these other methods across a wide range of lesion sizes.

  12. Forensic SNP genotyping with SNaPshot: Technical considerations for the development and optimization of multiplexed SNP assays.

    PubMed

    Fondevila, M; Børsting, C; Phillips, C; de la Puente, M; Consortium, Euroforen-NoE; Carracedo, A; Morling, N; Lareu, M V

    2017-01-01

    This review explores the key factors that influence the optimization, routine use, and profile interpretation of the SNaPshot single-base extension (SBE) system applied to forensic single-nucleotide polymorphism (SNP) genotyping. Despite being mainly a DNA genotyping technique complementary to routine STR profiling, SNaPshot is an important part of the development of SNP sets for a wide range of forensic applications with these markers, from genotyping highly degraded DNA with very short amplicons to the introduction of SNPs to ascertain the ancestry and physical characteristics of an unidentified contact trace donor. However, this technology, as resourceful as it is, displays several features that depart far enough from routine STR genotyping to demand a certain degree of expertise from the forensic analyst before tackling the complex casework on which SNaPshot application provides an advantage. In order to provide the basis for developing such expertise, we cover in this paper the most challenging aspects of the SNaPshot technology, focusing on the steps taken to design primer sets, optimize the PCR and single-base extension chemistries, and the important features of the peak patterns observed in typical forensic SNP profiles using SNaPshot. With that purpose in mind, we provide guidelines and troubleshooting for multiplex-SNaPshot-oriented primer design and the resulting capillary electrophoresis (CE) profile interpretation (covering the most commonly observed artifacts and expected departures from the ideal conditions). Copyright © 2017 Central Police University.

  13. Feature Selection and Classifier Development for Radio Frequency Device Identification

    DTIC Science & Technology

    2015-12-01

    adds important background knowledge for this research. Four leading RF-based device identification methods have been proposed: Radio... appropriate level of dimensionality. Both qualitative and quantitative DRA dimensionality assessment methods are possible. Prior RF-DNA DRA research, e.g. ... Employing experimental designs to find optimal algorithm settings has been seen in hyperspectral anomaly detection research, cf. [513–520], but not

  14. Pyomo v5.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woodruff, David; Hackebeil, Gabe; Laird, Carl Damon

    Pyomo supports the formulation and analysis of mathematical models for complex optimization applications. This capability is commonly associated with algebraic modeling languages (AMLs), which support the description and analysis of mathematical models with a high-level language. Although most AMLs are implemented in custom modeling languages, Pyomo's modeling objects are embedded within Python, a full-featured high-level programming language that contains a rich set of supporting libraries.
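    As a small illustration of the kind of algebraic model Pyomo embeds in Python (a generic two-variable linear program written for this note, not taken from the report; solving it requires a separately installed solver such as GLPK):

```python
# Minimal Pyomo model: maximize 3x + 2y subject to x + y <= 4 and x <= 3, with x, y >= 0.
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, maximize, SolverFactory, value)

m = ConcreteModel()
m.x = Var(domain=NonNegativeReals)
m.y = Var(domain=NonNegativeReals)
m.obj = Objective(expr=3 * m.x + 2 * m.y, sense=maximize)
m.c1 = Constraint(expr=m.x + m.y <= 4)
m.c2 = Constraint(expr=m.x <= 3)

SolverFactory('glpk').solve(m)          # assumes the GLPK solver is installed
print(value(m.x), value(m.y))           # optimal solution: x = 3, y = 1
```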

  15. AutoSite: an automated approach for pseudo-ligands prediction—from ligand-binding sites identification to predicting key ligand atoms

    PubMed Central

    Ravindranath, Pradeep Anand; Sanner, Michel F.

    2016-01-01

    Motivation: The identification of ligand-binding sites from a protein structure facilitates computational drug design and optimization, and protein function assignment. We introduce AutoSite: an efficient software tool for identifying ligand-binding sites and predicting a pseudo-ligand corresponding to each binding site identified. Binding sites are reported as clusters of 3D points called fills, in which every point is labelled as hydrophobic or as hydrogen bond donor or acceptor. From these fills AutoSite derives feature points: a set of putative positions of hydrophobic- and hydrogen-bond-forming ligand atoms. Results: We show that AutoSite identifies ligand-binding sites with higher accuracy than other leading methods, and produces fills that better match the ligand shape and properties than the fills obtained with a software program with similar capabilities, AutoLigand. In addition, we demonstrate that for the Astex Diverse Set, the feature points identify 79% of hydrophobic ligand atoms, and 81% and 62% of the hydrogen-bond acceptor and donor ligand atoms interacting with the receptor, and predict 81.2% of water molecules mediating interactions between ligand and receptor. Finally, we illustrate potential uses of the predicted feature points in the context of lead optimization in drug discovery projects. Availability and Implementation: http://adfr.scripps.edu/AutoDockFR/autosite.html Contact: sanner@scripps.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27354702

  16. Deep learning with word embeddings improves biomedical named entity recognition.

    PubMed

    Habibi, Maryam; Weber, Leon; Neves, Mariana; Wiegandt, David Luis; Leser, Ulf

    2017-07-15

    Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/ . habibima@informatik.hu-berlin.de. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  17. Deep learning with word embeddings improves biomedical named entity recognition

    PubMed Central

    Habibi, Maryam; Weber, Leon; Neves, Mariana; Wiegandt, David Luis; Leser, Ulf

    2017-01-01

    Abstract Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. Results: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. Availability and implementation: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/. Contact: habibima@informatik.hu-berlin.de PMID:28881963

  18. Exploring Sampling in the Detection of Multicategory EEG Signals

    PubMed Central

    Siuly, Siuly; Kabir, Enamul; Wang, Hua; Zhang, Yanchun

    2015-01-01

    The paper presents a structure based on sampling and machine learning techniques for the detection of multicategory EEG signals, in which random sampling (RS) and optimal allocation sampling (OS) are explored. In the proposed framework, before applying the RS and OS schemes, the entire EEG signal of each class is partitioned into several groups based on a particular time period. The RS and OS schemes are used in order to obtain representative observations from each group of each category of EEG data. All of the samples selected by RS from the groups of each category are then combined into one set, named the RS set; in a similar way, an OS set is obtained for the OS scheme. Eleven statistical features are then extracted from the RS and OS sets separately. Finally, this study employs three well-known classifiers: k-nearest neighbor (k-NN), multinomial logistic regression with a ridge estimator (MLR), and support vector machine (SVM) to evaluate the performance of the RS and OS feature sets. The experimental outcomes demonstrate that the RS scheme represents the EEG signals well and that the k-NN with RS is the optimum choice for detection of multicategory EEG signals. PMID:25977705
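    A minimal sketch of optimal allocation in the usual (Neyman) sense, which distributes a fixed total sample across groups in proportion to group size times within-group variability; the paper's exact OS scheme may differ in detail:

```python
# Optimal (Neyman) allocation: groups with more observations and higher variability get more samples.
import numpy as np

def optimal_allocation(group_sizes, group_stds, n_total):
    weights = np.asarray(group_sizes, float) * np.asarray(group_stds, float)
    return np.round(n_total * weights / weights.sum()).astype(int)

# Example: three groups of EEG segments with differing within-group standard deviations
print(optimal_allocation([400, 400, 200], [1.0, 2.5, 0.5], n_total=100))  # -> [27 67  7]
```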

  19. Applying a new unequally weighted feature fusion method to improve CAD performance of classifying breast lesions

    NASA Astrophysics Data System (ADS)

    Zargari Khuzani, Abolfazl; Danala, Gopichandh; Heidari, Morteza; Du, Yue; Mashhadi, Najmeh; Qiu, Yuchen; Zheng, Bin

    2018-02-01

    Higher recall rates are a major challenge in mammography screening. Thus, developing a computer-aided diagnosis (CAD) scheme to classify between malignant and benign breast lesions can play an important role in improving the efficacy of mammography screening. The objective of this study is to develop and test a unique image feature fusion framework to improve performance in classifying suspicious mass-like breast lesions depicted on mammograms. The image dataset consists of 302 suspicious masses detected on both craniocaudal and mediolateral-oblique view images; 151 were malignant and 151 were benign. The study consists of the following three image processing and feature analysis steps. First, an adaptive region growing segmentation algorithm was used to automatically segment mass regions. Second, a set of 70 image features related to spatial and frequency characteristics of the mass regions was initially computed. Third, a generalized linear regression model (GLM) based machine learning classifier combined with a bat optimization algorithm was used to optimally fuse the selected image features based on a predefined assessment performance index. The area under the ROC curve (AUC) was used as the performance assessment index. Applying the CAD scheme to the testing dataset, the AUC was 0.75+/-0.04, which was significantly higher than using a single best feature (AUC=0.69+/-0.05) or the classifier with equally weighted features (AUC=0.73+/-0.05). This study demonstrated that, compared to the conventional equal-weighted approach, an unequal-weighted feature fusion approach has the potential to significantly improve accuracy in classifying between malignant and benign breast masses.

  20. Intrapartum fetal heart rate classification from trajectory in Sparse SVM feature space.

    PubMed

    Spilka, J; Frecon, J; Leonarduzzi, R; Pustelnik, N; Abry, P; Doret, M

    2015-01-01

    Intrapartum fetal heart rate (FHR) constitutes a prominent source of information for the assessment of fetal reactions to stress events during delivery. Yet, early detection of fetal acidosis remains a challenging signal processing task. The originality of the present contribution is three-fold: multiscale representations and wavelet-leader-based multifractal analysis are used to quantify FHR variability; supervised classification is achieved by means of a sparse SVM that jointly aims to achieve optimal detection performance and to select relevant features in a multivariate setting; and trajectories in the feature space, accounting for the evolution of features over time as labor progresses, are involved in the construction of indices quantifying fetal health. The classification performance permitted by this combination of tools is quantified on a large intrapartum FHR database (≃ 1250 subjects) collected at a French academic public hospital.

  1. An impatient evolutionary algorithm with probabilistic tabu search for unified solution of some NP-hard problems in graph and set theory via clique finding.

    PubMed

    Guturu, Parthasarathy; Dantu, Ram

    2008-06-01

    Many graph- and set-theoretic problems, because of their tremendous application potential and theoretical appeal, have been well investigated by the researchers in complexity theory and were found to be NP-hard. Since the combinatorial complexity of these problems does not permit exhaustive searches for optimal solutions, only near-optimal solutions can be explored using either various problem-specific heuristic strategies or metaheuristic global-optimization methods, such as simulated annealing, genetic algorithms, etc. In this paper, we propose a unified evolutionary algorithm (EA) to the problems of maximum clique finding, maximum independent set, minimum vertex cover, subgraph and double subgraph isomorphism, set packing, set partitioning, and set cover. In the proposed approach, we first map these problems onto the maximum clique-finding problem (MCP), which is later solved using an evolutionary strategy. The proposed impatient EA with probabilistic tabu search (IEA-PTS) for the MCP integrates the best features of earlier successful approaches with a number of new heuristics that we developed to yield a performance that advances the state of the art in EAs for the exploration of the maximum cliques in a graph. Results of experimentation with the 37 DIMACS benchmark graphs and comparative analyses with six state-of-the-art algorithms, including two from the smaller EA community and four from the larger metaheuristics community, indicate that the IEA-PTS outperforms the EAs with respect to a Pareto-lexicographic ranking criterion and offers competitive performance on some graph instances when individually compared to the other heuristic algorithms. It has also successfully set a new benchmark on one graph instance. On another benchmark suite called Benchmarks with Hidden Optimal Solutions, IEA-PTS ranks second, after a very recent algorithm called COVER, among its peers that have experimented with this suite.
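    The unification rests on reductions such as the following: a maximum independent set of G is exactly a maximum clique of the complement of G, and its complement in the vertex set is a minimum vertex cover. The sketch below (using networkx's exact clique enumeration in place of the paper's evolutionary search) illustrates the mapping only:

```python
# Reduce maximum independent set to maximum clique via the complement graph.
import networkx as nx

def max_independent_set_via_clique(G):
    H = nx.complement(G)
    best = max(nx.find_cliques(H), key=len)   # cliques of the complement = independent sets of G
    return set(best)

G = nx.cycle_graph(5)                          # 5-cycle: largest independent set has size 2
mis = max_independent_set_via_clique(G)
print(mis, set(G.nodes) - mis)                 # the complement of the MIS is a minimum vertex cover
```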

  2. Feature-based Alignment of Volumetric Multi-modal Images

    PubMed Central

    Toews, Matthew; Zöllei, Lilla; Wells, William M.

    2014-01-01

    This paper proposes a method for aligning image volumes acquired from different imaging modalities (e.g. MR, CT) based on 3D scale-invariant image features. A novel method for encoding invariant feature geometry and appearance is developed, based on the assumption of locally linear intensity relationships, providing a solution to poor repeatability of feature detection in different image modalities. The encoding method is incorporated into a probabilistic feature-based model for multi-modal image alignment. The model parameters are estimated via a group-wise alignment algorithm, that iteratively alternates between estimating a feature-based model from feature data, then realigning feature data to the model, converging to a stable alignment solution with few pre-processing or pre-alignment requirements. The resulting model can be used to align multi-modal image data with the benefits of invariant feature correspondence: globally optimal solutions, high efficiency and low memory usage. The method is tested on the difficult RIRE data set of CT, T1, T2, PD and MP-RAGE brain images of subjects exhibiting significant inter-subject variability due to pathology. PMID:24683955

  3. Portable automatic text classification for adverse drug reaction detection via multi-corpus training.

    PubMed

    Sarker, Abeed; Gonzalez, Graciela

    2015-02-01

    Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media-where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

    PubMed Central

    Gonzalez, Graciela

    2014-01-01

    Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. PMID:25451103

  5. Frequency Optimization for Enhancement of Surface Defect Classification Using the Eddy Current Technique

    PubMed Central

    Fan, Mengbao; Wang, Qi; Cao, Binghua; Ye, Bo; Sunny, Ali Imam; Tian, Guiyun

    2016-01-01

    Eddy current testing is quite a popular non-contact and cost-effective method for nondestructive evaluation of product quality and structural integrity. Excitation frequency is one of the key performance factors for defect characterization. In the literature, there are many interesting papers dealing with wide spectral content and optimal frequency in terms of detection sensitivity. However, research activity on frequency optimization with respect to characterization performances is lacking. In this paper, an investigation into optimum excitation frequency has been conducted to enhance surface defect classification performance. The influences of excitation frequency for a group of defects were revealed in terms of detection sensitivity, contrast between defect features, and classification accuracy using kernel principal component analysis (KPCA) and a support vector machine (SVM). It is observed that probe signals are the most sensitive on the whole for a group of defects when excitation frequency is set near the frequency at which maximum probe signals are retrieved for the largest defect. After the use of KPCA, the margins between the defect features are optimum from the perspective of the SVM, which adopts optimal hyperplanes for structure risk minimization. As a result, the best classification accuracy is obtained. The main contribution is that the influences of excitation frequency on defect characterization are interpreted, and experiment-based procedures are proposed to determine the optimal excitation frequency for a group of defects rather than a single defect with respect to optimal characterization performances. PMID:27164112

  6. Frequency Optimization for Enhancement of Surface Defect Classification Using the Eddy Current Technique.

    PubMed

    Fan, Mengbao; Wang, Qi; Cao, Binghua; Ye, Bo; Sunny, Ali Imam; Tian, Guiyun

    2016-05-07

    Eddy current testing is quite a popular non-contact and cost-effective method for nondestructive evaluation of product quality and structural integrity. Excitation frequency is one of the key performance factors for defect characterization. In the literature, there are many interesting papers dealing with wide spectral content and optimal frequency in terms of detection sensitivity. However, research activity on frequency optimization with respect to characterization performances is lacking. In this paper, an investigation into optimum excitation frequency has been conducted to enhance surface defect classification performance. The influences of excitation frequency for a group of defects were revealed in terms of detection sensitivity, contrast between defect features, and classification accuracy using kernel principal component analysis (KPCA) and a support vector machine (SVM). It is observed that probe signals are the most sensitive on the whole for a group of defects when excitation frequency is set near the frequency at which maximum probe signals are retrieved for the largest defect. After the use of KPCA, the margins between the defect features are optimum from the perspective of the SVM, which adopts optimal hyperplanes for structure risk minimization. As a result, the best classification accuracy is obtained. The main contribution is that the influences of excitation frequency on defect characterization are interpreted, and experiment-based procedures are proposed to determine the optimal excitation frequency for a group of defects rather than a single defect with respect to optimal characterization performances.

  7. Large Margin Multi-Modal Multi-Task Feature Extraction for Image Classification.

    PubMed

    Yong Luo; Yonggang Wen; Dacheng Tao; Jie Gui; Chao Xu

    2016-01-01

    The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple modalities. We, therefore, propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features, so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for problem optimization, and each subproblem can be efficiently solved. Experiments on two challenging real-world image data sets demonstrate the effectiveness and superiority of the proposed method.

  8. A DFT-Based Method of Feature Extraction for Palmprint Recognition

    NASA Astrophysics Data System (ADS)

    Choge, H. Kipsang; Karungaru, Stephen G.; Tsuge, Satoru; Fukumi, Minoru

    Over the last quarter century, research in biometric systems has developed at a breathtaking pace and what started with the focus on the fingerprint has now expanded to include face, voice, iris, and behavioral characteristics such as gait. Palmprint is one of the most recent additions, and is currently the subject of great research interest due to its inherent uniqueness, stability, user-friendliness and ease of acquisition. This paper describes an effective and procedurally simple method of palmprint feature extraction specifically for palmprint recognition, although verification experiments are also conducted. This method takes advantage of the correspondences that exist between prominent palmprint features or objects in the spatial domain with those in the frequency or Fourier domain. Multi-dimensional feature vectors are formed by extracting a GA-optimized set of points from the 2-D Fourier spectrum of the palmprint images. The feature vectors are then used for palmprint recognition, before and after dimensionality reduction via the Karhunen-Loeve Transform (KLT). Experiments performed using palmprint images from the ‘PolyU Palmprint Database’ indicate that using a compact set of DFT coefficients, combined with KLT and data preprocessing, produces a recognition accuracy of more than 98% and can provide a fast and effective technique for personal identification.
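    A minimal sketch of forming a feature vector from selected 2-D DFT coefficients of a palmprint image. The fixed low-frequency positions below are a stand-in for the GA-optimized point set, and the KLT compression step is omitted.

```python
# Form a palmprint feature vector from the magnitudes of selected 2-D DFT coefficients.
import numpy as np

def dft_features(image, points):
    spectrum = np.fft.fftshift(np.fft.fft2(image))   # centre the zero frequency
    return np.array([np.abs(spectrum[r, c]) for r, c in points])

img = np.random.rand(128, 128)                       # stand-in for a preprocessed palmprint image
centre = 64
points = [(centre + dr, centre + dc) for dr in range(-3, 4) for dc in range(-3, 4)]
vec = dft_features(img, points)
print(vec.shape)                                     # (49,) feature vector
```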

  9. A flexible data-driven comorbidity feature extraction framework.

    PubMed

    Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid

    2016-06-01

    Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP) which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset that includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to have significant gains between 10.7-22.1% in predictive accuracy for CHF severity of condition prediction and 4.65-5.75% in diabetes readmission prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.
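    An illustrative sketch (not the authors' pipeline) of the general idea: build a code-by-code co-occurrence matrix from patients' diagnosis codes, cluster the codes, and use per-cluster membership counts as patient features. The toy code lists and the use of k-means are stand-ins for the data and clustering actually used.

```python
# Cluster diagnosis codes by co-occurrence and use cluster counts as patient features.
import numpy as np
from sklearn.cluster import KMeans

patients = [{"I50", "N18", "E11"}, {"I50", "I10"}, {"E11", "E78", "I10"}, {"N18", "I50"}]
codes = sorted(set().union(*patients))
idx = {c: i for i, c in enumerate(codes)}

# Code-by-code co-occurrence counts across patients
cooc = np.zeros((len(codes), len(codes)))
for p in patients:
    for a in p:
        for b in p:
            if a != b:
                cooc[idx[a], idx[b]] += 1

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(cooc)

# Patient features: how many of each patient's codes fall in each cluster
features = np.array([[sum(clusters[idx[c]] == k for c in p) for k in range(2)] for p in patients])
print(dict(zip(codes, clusters)))
print(features)
```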

  10. GeneratorSE: A Sizing Tool for Variable-Speed Wind Turbine Generators

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sethuraman, Latha; Dykes, Katherine L

    This report documents a set of analytical models employed by the optimization algorithms within the GeneratorSE framework. The initial values and boundary conditions employed for the generation of the various designs and initial estimates for basic design dimensions, masses, and efficiency for the four different models of generators are presented and compared with empirical data collected from previous studies and some existing commercial turbines. These models include designs applicable for variable-speed, high-torque application featuring direct-drive synchronous generators and low-torque application featuring induction generators. In all of the four models presented, the main focus of optimization is electromagnetic design, with the exception of permanent-magnet and wire-wound synchronous generators, wherein the structural design is also optimized. Thermal design is accommodated in GeneratorSE as a secondary attribute by limiting the winding current densities to acceptable limits. A preliminary validation of electromagnetic design was carried out by comparing the optimized magnetic loading against those predicted by numerical simulation in FEMM4.2, a finite-element software for analyzing electromagnetic and thermal physics problems for electrical machines. For direct-drive synchronous generators, the analytical models for the structural design are validated by static structural analysis in ANSYS.

  11. Extensions of algebraic image operators: An approach to model-based vision

    NASA Technical Reports Server (NTRS)

    Lerner, Bao-Ting; Morelli, Michael V.

    1990-01-01

    Researchers extend their previous research on a highly structured and compact algebraic representation of grey-level images which can be viewed as fuzzy sets. Addition and multiplication are defined for the set of all grey-level images, which can then be described as polynomials of two variables. Utilizing this new algebraic structure, researchers devised an innovative, efficient edge detection scheme. An accurate method for deriving gradient component information from this edge detector is presented. Based upon this new edge detection system researchers developed a robust method for linear feature extraction by combining the techniques of a Hough transform and a line follower. The major advantage of this feature extractor is its general, object-independent nature. Target attributes, such as line segment lengths, intersections, angles of intersection, and endpoints are derived by the feature extraction algorithm and employed during model matching. The algebraic operators are global operations which are easily reconfigured to operate on any size or shape region. This provides a natural platform from which to pursue dynamic scene analysis. A method for optimizing the linear feature extractor which capitalizes on the spatially reconfigurable nature of the edge detector/gradient component operator is discussed.

  12. Automatic parameter selection for feature-based multi-sensor image registration

    NASA Astrophysics Data System (ADS)

    DelMarco, Stephen; Tom, Victor; Webb, Helen; Chao, Alan

    2006-05-01

    Accurate image registration is critical for applications such as precision targeting, geo-location, change-detection, surveillance, and remote sensing. However, the increasing volume of image data is exceeding the current capacity of human analysts to perform manual registration. This image data glut necessitates the development of automated approaches to image registration, including algorithm parameter value selection. Proper parameter value selection is crucial to the success of registration techniques. The appropriate algorithm parameters can be highly scene and sensor dependent. Therefore, robust algorithm parameter value selection approaches are a critical component of an end-to-end image registration algorithm. In previous work, we developed a general framework for multisensor image registration which includes feature-based registration approaches. In this work we examine the problem of automated parameter selection. We apply the automated parameter selection approach of Yitzhaky and Peli to select parameters for feature-based registration of multisensor image data. The approach consists of generating multiple feature-detected images by sweeping over parameter combinations and using these images to generate estimated ground truth. The feature-detected images are compared to the estimated ground truth images to generate ROC points associated with each parameter combination. We develop a strategy for selecting the optimal parameter set by choosing the parameter combination corresponding to the optimal ROC point. We present numerical results showing the effectiveness of the approach using registration of collected SAR data to reference EO data.

  13. Efficient methods for overlapping group lasso.

    PubMed

    Yuan, Lei; Liu, Jun; Ye, Jieping

    2013-09-01

    The group Lasso is an extension of the Lasso for feature selection on (predefined) nonoverlapping groups of features. The nonoverlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation where groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of the gradient descent type of algorithms for the optimization. Our methods and theoretical results are then generalized to tackle the general overlapping group Lasso formulation based on the l(q) norm. We further extend our algorithm to solve a nonconvex overlapping group Lasso formulation based on the capped norm regularization, which reduces the estimation bias introduced by the convex penalty. We have performed empirical evaluations using both a synthetic and the breast cancer gene expression dataset, which consists of 8,141 genes organized into (overlapping) gene sets. Experimental results show that the proposed algorithm is more efficient than existing state-of-the-art algorithms. Results also demonstrate the effectiveness of the nonconvex formulation for overlapping group Lasso.
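    For reference, a sketch of the proximal operator in the simpler non-overlapping case (block soft-thresholding), which the paper generalizes; the overlapping penalty no longer decomposes group by group and is instead computed through the smooth convex dual problem described above.

```python
# Proximal operator of the non-overlapping group lasso penalty: block soft-thresholding.
import numpy as np

def group_lasso_prox(v, groups, lam):
    """prox_{lam * sum_g ||v_g||_2}(v) for disjoint index groups."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * v[g]
    return out

v = np.array([3.0, 4.0, 0.1, -0.2, 2.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
print(group_lasso_prox(v, groups, lam=1.0))   # first group shrunk, second zeroed, third halved
```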

  14. RET selection on state-of-the-art NAND flash

    NASA Astrophysics Data System (ADS)

    Lafferty, Neal V.; He, Yuan; Pei, Jinhua; Shao, Feng; Liu, QingWei; Shi, Xuelong

    2015-03-01

    We present results generated using a new gauge-based Resolution Enhancement Technique (RET) Selection flow during the technology setup phase of a 3x-node NAND Flash product. As a test case, we consider a challenging critical level for this flash product. The RET solutions include inverse lithography technology (ILT) optimized masks with sub-resolution assist features (SRAF) and companion illumination sources developed using a new pixel based Source Mask Optimization (SMO) tool that uses measurement gauges as a primary input. The flow includes verification objectives which allow tolerancing of particular measurement gauges based on lithographic criteria. Relative importance for particular gauges may also be set, to aid in down-selection from several candidate sources. The end result is a sensitive, objective score of RET performance. Using these custom-defined importance metrics, decisions on the final RET style can be made in an objective way.

  15. Adaptive feature selection using v-shaped binary particle swarm optimization.

    PubMed

    Teng, Xuyang; Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers.
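    A minimal sketch of the V-shaped update step described above (not the authors' code): the velocity is passed through a common V-shaped transfer function, |tanh(x)|, and each bit of the particle is flipped with that probability. The inertia and acceleration constants are generic defaults, not values from the paper.

```python
# One V-shaped binary PSO step for a single particle encoding a candidate feature subset.
import numpy as np

rng = np.random.default_rng(0)

def bpso_step(position, velocity, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    r1, r2 = rng.random(position.shape), rng.random(position.shape)
    velocity = w * velocity + c1 * r1 * (pbest - position) + c2 * r2 * (gbest - position)
    flip = rng.random(position.shape) < np.abs(np.tanh(velocity))   # V-shaped transfer function
    position = np.where(flip, 1 - position, position)               # flip the selected bits
    return position, velocity

pos = rng.integers(0, 2, size=20).astype(float)   # one particle: 20 candidate features
vel = np.zeros(20)
pos, vel = bpso_step(pos, vel, pbest=pos.copy(), gbest=rng.integers(0, 2, 20).astype(float))
print(pos.astype(int))
```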

  16. Adaptive feature selection using v-shaped binary particle swarm optimization

    PubMed Central

    Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850

  17. General subspace learning with corrupted training data via graph embedding.

    PubMed

    Bao, Bing-Kun; Liu, Guangcan; Hong, Richang; Yan, Shuicheng; Xu, Changsheng

    2013-11-01

    We address the following subspace learning problem: supposing we are given a set of labeled, corrupted training data points, how to learn the underlying subspace, which contains three components: an intrinsic subspace that captures certain desired properties of a data set, a penalty subspace that fits the undesired properties of the data, and an error container that models the gross corruptions possibly existing in the data. Given a set of data points, these three components can be learned by solving a nuclear norm regularized optimization problem, which is convex and can be efficiently solved in polynomial time. Using the method as a tool, we propose a new discriminant analysis (i.e., supervised subspace learning) algorithm called Corruptions Tolerant Discriminant Analysis (CTDA), in which the intrinsic subspace is used to capture the features with high within-class similarity, the penalty subspace takes the role of modeling the undesired features with high between-class similarity, and the error container takes charge of fitting the possible corruptions in the data. We show that CTDA can well handle the gross corruptions possibly existing in the training data, whereas previous linear discriminant analysis algorithms arguably fail in such a setting. Extensive experiments conducted on two benchmark human face data sets and one object recognition data set show that CTDA outperforms the related algorithms.

  18. How well does the standard body mass index or variations with a different exponent predict human lifespan?

    PubMed

    Foster, Dean; Karloff, Howard; Shirley, Kenneth E

    2016-02-01

    The objective was twofold: (1) to estimate for each individual the body mass index (BMI) which is associated with the lowest risk of death, and (2) to study variants of the BMI formula to determine which gives the best predictions of death. Treating BMI = mass/height^2 as a continuous variable and estimating its interaction effects with several other variables, this study analyzed the NIH-AARP study data set of approximately 566,000 individuals and fit Cox proportional hazards models to the survival times. For each individual, a "personalized optimal BMI," the BMI for that individual which, according to the model, is associated with the lowest risk of death, is estimated. The average personalized optimal BMI is approximately 26, which is in the current "overweight" category. In fact, mass/height is a better predictor of death on the data set than BMI itself. The model suggests that an individual's "optimal" BMI depends on his or her features; "one-size-fits-all" recommendations may not be best. © 2016 The Obesity Society.
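    A worked example (values invented) of the exponent variants the abstract compares; only the exponent changes, the index family is otherwise the same:

```python
# Invented example individual; compares mass/height^p for several exponents.
mass_kg, height_m = 80.0, 1.75
for p in (1.0, 1.5, 2.0):
    print(f"mass/height^{p:g} = {mass_kg / height_m ** p:.1f}")
# mass/height^1 = 45.7   mass/height^1.5 = 34.6   mass/height^2 = 26.1 (the standard BMI)
```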

  19. Flagging threshold optimization for manual blood smear review in primary care laboratory.

    PubMed

    Bihl, Pierre-Adrien

    2018-04-01

    Manual blood smear review is required when an anomaly detected by the automated hematologic analyzer triggers a flag. Our aim in this study is to optimize these flagging thresholds for manual slide review in order to limit the workload while preserving clinical care by introducing no extra false negatives. The flagging causes of 4,373 samples were investigated by manual slide review after the samples had been run on an ADVIA 2120i. A set of 6 user adjustments is proposed. By implementing all of the recommendations that we made, the false-positive rate falls from 81.8% to 58.6%, while the PPV increases from 18.2% to 23.7%. Hence, the use of such optimized thresholds enables us to maximize efficiency without altering clinical care, but each laboratory should establish its own criteria to take local distinctive features into consideration.
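    A small worked example of the two quantities quoted above. The counts below are invented to reproduce the pre-optimization figures and are not from the study; the calculation assumes the reported false-positive rate is computed among flagged samples, which is consistent with the two figures summing to 100%.

```python
# Invented counts chosen only to reproduce the pre-optimization figures in the abstract.
def flag_metrics(true_positives, false_positives):
    flagged = true_positives + false_positives
    return false_positives / flagged, true_positives / flagged  # (false-positive rate, PPV)

fpr, ppv = flag_metrics(true_positives=180, false_positives=810)
print(f"FPR = {fpr:.1%}, PPV = {ppv:.1%}")   # FPR = 81.8%, PPV = 18.2%
```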

  20. Warpage optimization on a mobile phone case using response surface methodology (RSM)

    NASA Astrophysics Data System (ADS)

    Lee, X. N.; Fathullah, M.; Shayfull, Z.; Nasir, S. M.; Hazwan, M. H. M.; Shazzuan, S.

    2017-09-01

    Plastic injection moulding is a popular manufacturing method because it is not only reliable but also efficient and cost-saving, and it is able to produce plastic parts with detailed features and complex geometry. However, defects arising in the injection moulding process degrade the quality and aesthetics of the injection-moulded product. The most common defect occurring in the process is warpage, and inappropriate process parameter settings of the injection moulding machine are one of the reasons that lead to its occurrence. The aims of this study were to improve the quality of the injection-moulded part by investigating the optimal parameters for minimizing warpage using Response Surface Methodology (RSM). Subsequent to this, the most significant parameter was identified, and the recommended parameter setting was compared with the optimized parameter setting obtained from RSM. In this research, a mobile phone case was selected as the case study. The mould temperature, melt temperature, packing pressure, packing time and cooling time were selected as variables, whereas warpage in the y-direction was selected as the response. The simulation was carried out using Autodesk Moldflow Insight 2012, and the RSM was performed using Design Expert 7.0. The warpage in the y-direction recommended by RSM was reduced by 70%. RSM performed well in solving the warpage issue.

  1. Bayesian segmentation of atrium wall using globally-optimal graph cuts on 3D meshes.

    PubMed

    Veni, Gopalkrishna; Fu, Zhisong; Awate, Suyash P; Whitaker, Ross T

    2013-01-01

    Efficient segmentation of the left atrium (LA) wall from delayed enhancement MRI is challenging due to inconsistent contrast, combined with noise, and high variation in atrial shape and size. We present a surface-detection method that is capable of extracting the atrial wall by computing an optimal a-posteriori estimate. This estimation is done on a set of nested meshes, constructed from an ensemble of segmented training images, and graph cuts on an associated multi-column, proper-ordered graph. The graph/mesh is a part of a template/model that has an associated set of learned intensity features. When this mesh is overlaid onto a test image, it produces a set of costs which lead to an optimal segmentation. The 3D mesh has an associated weighted, directed multi-column graph with edges that encode smoothness and inter-surface penalties. Unlike previous graph-cut methods that impose hard constraints on the surface properties, the proposed method follows from a Bayesian formulation resulting in soft penalties on spatial variation of the cuts through the mesh. The novelty of this method also lies in the construction of proper-ordered graphs on complex shapes for choosing among distinct classes of base shapes for automatic LA segmentation. We evaluate the proposed segmentation framework on simulated and clinical cardiac MRI.

  2. Determination of the mobility profile in GaAs-MESFETs. Thesis

    NASA Technical Reports Server (NTRS)

    Prost, W.

    1985-01-01

    A process for measuring charge carrier mobility for gallium-arsenide metal semiconductor field effect transistors is described in an attempt to optimize the relationship between this factor and production. The measuring procedure allows an actual determination of local mobility in the channel. The physical basis for the process and features of the measuring room are outlined. The measuring technique is described and recommendations are made for setting measuring parameters.

  3. Navigating neurites utilize cellular topography of Schwann cell somas and processes for optimal guidance

    PubMed Central

    Lopez-Fagundo, Cristina; Mitchel, Jennifer A.; Ramchal, Talisha D.; Dingle, Yu-Ting L.; Hoffman-Kim, Diane

    2013-01-01

    The path created by aligned Schwann cells (SCs) after nerve injury underlies peripheral nerve regeneration. We developed geometric bioinspired substrates to extract key information needed for axon guidance by deconstructing the topographical cues presented by SCs. We have previously reported materials that directly replicate SC topography with micro- and nanoscale resolution, but a detailed explanation of the means of directed axon extension on SC topography has not yet been described. Here, using neurite tracing and time-lapse microscopy, we analyzed the SC features that influence axon guidance. Novel poly(dimethylsiloxane) materials, fabricated via photolithography, incorporated bioinspired topographical components with the shapes and sizes of aligned SCs, namely somas and processes, where the length of the processes was varied but the soma geometry and dimensions were kept constant. Rat dorsal root ganglia neurites aligned to all materials presenting bioinspired topography after 5 days in culture, and to bioinspired materials presenting soma and process features after only 17 hours in culture. The key finding of this study was that the neurite response to underlying bioinspired topographical features was time dependent: at 5 days, neurites aligned most strongly to materials presenting combinations of soma and process features with higher than average density of either process or soma features, but at 17 hours they aligned more strongly to materials presenting average densities of soma and process features and to materials presenting process features only. These studies elucidate the influence of SC topography on axon guidance in a time-dependent setting and have implications for the optimization of nerve regeneration strategies. PMID:23557939

  4. Optimally managing water resources in large river basins for an uncertain future

    USGS Publications Warehouse

    Roehl, Edwin A.; Conrads, Paul

    2014-01-01

    One of the challenges of basin management is the optimization of water use through ongoing regional economic development, droughts, and climate change. This paper describes a model of the Savannah River Basin designed to continuously optimize regulated flow to meet prioritized objectives set by resource managers and stakeholders. The model was developed from historical data by using machine learning, making it more accurate and adaptable to changing conditions than traditional models. The model is coupled to an optimization routine that computes the daily flow needed to most efficiently meet the water-resource management objectives. The model and optimization routine are packaged in a decision support system that makes it easy for managers and stakeholders to use. Simulation results show that flow can be regulated to substantially reduce salinity intrusions in the Savannah National Wildlife Refuge while conserving more water in the reservoirs. A method for using the model to assess the effectiveness of the flow-alteration features after the deepening also is demonstrated.

  5. Parameterization of Shape and Compactness in Object-based Image Classification Using Quickbird-2 Imagery

    NASA Astrophysics Data System (ADS)

    Tonbul, H.; Kavzoglu, T.

    2016-12-01

    In recent years, object based image analysis (OBIA) has spread out and become a widely accepted technique for the analysis of remotely sensed data. OBIA deals with grouping pixels into homogeneous objects based on spectral, spatial and textural features of contiguous pixels in an image. The first stage of OBIA, image segmentation, is the most prominent part of object recognition. In this study, multiresolution segmentation, which is a region-based approach, was employed to construct image objects. In the application of multiresolution segmentation, three parameters, namely shape, compactness and scale, must be set by the analyst. Segmentation quality remarkably influences the fidelity of the thematic maps and accordingly the classification accuracy. Therefore, it is of great importance to search for and set optimal values of the segmentation parameters. In the literature, the main focus has been on the definition of the scale parameter, assuming that the effect of the shape and compactness parameters on classification accuracy is limited. The aim of this study is to analyze in depth the influence of the shape/compactness parameters by varying their values while using the optimal scale parameter determined with the Estimation of Scale Parameter (ESP-2) approach. A pansharpened Quickbird-2 image covering Trabzon, Turkey was employed to investigate the objectives of the study. For this purpose, six different combinations of shape/compactness were utilized to make deductions on the behavior of the shape and compactness parameters and on the optimal setting of all parameters as a whole. Objects were assigned to classes using the nearest neighbor classifier in all segmentation runs, and equal numbers of pixels were randomly selected to calculate accuracy metrics. The highest overall accuracy (92.3%) was achieved by setting the shape/compactness criteria to 0.3/0.3. The results of this study indicate that the shape/compactness parameters can have a significant effect on classification accuracy, with a 4% change in overall accuracy. The statistical significance of the differences in accuracy was also tested using McNemar's test, and the difference between the poor and optimal settings of the shape/compactness parameters was found to be statistically significant, suggesting a search for optimal parameterization instead of default settings.
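
    For the final significance check, a minimal sketch of McNemar's test is given below; it assumes only the two discordant counts from a cross-tabulation of the two classifications against the reference data, and the counts shown are hypothetical rather than the study's.

        from scipy.stats import chi2

        def mcnemar(b: int, c: int, correction: bool = True) -> tuple[float, float]:
            """McNemar's test from the two discordant counts:
            b = samples correct only under setting A, c = correct only under setting B.
            Returns the chi-square statistic and its p-value (1 degree of freedom)."""
            num = (abs(b - c) - 1) ** 2 if correction else (b - c) ** 2
            stat = num / (b + c)
            return stat, chi2.sf(stat, df=1)

        # Hypothetical discordant counts between a default and a tuned shape/compactness setting
        print(mcnemar(b=95, c=52))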

  6. Policy Tree Optimization for Adaptive Management of Water Resources Systems

    NASA Astrophysics Data System (ADS)

    Herman, J. D.; Giuliani, M.

    2016-12-01

    Water resources systems must cope with irreducible uncertainty in supply and demand, requiring policy alternatives capable of adapting to a range of possible future scenarios. Recent studies have developed adaptive policies based on "signposts" or "tipping points", which are threshold values of indicator variables that signal a change in policy. However, there remains a need for a general method to optimize the choice of indicators and their threshold values in a way that is easily interpretable for decision makers. Here we propose a conceptual framework and computational algorithm to design adaptive policies as a tree structure (i.e., a hierarchical set of logical rules) using a simulation-optimization approach based on genetic programming. We demonstrate the approach using Folsom Reservoir, California as a case study, in which operating policies must balance the risk of both floods and droughts. Given a set of feature variables, such as reservoir level, inflow observations and forecasts, and time of year, the resulting policy defines the conditions under which flood control and water supply hedging operations should be triggered. Importantly, the tree-based rule sets are easy to interpret for decision making, and can be compared to historical operating policies to understand the adaptations needed under possible climate change scenarios. Several remaining challenges are discussed, including the empirical convergence properties of the method, and extensions to irreversible decisions such as infrastructure. Policy tree optimization, and corresponding open-source software, provide a generalizable, interpretable approach to designing adaptive policies under uncertainty for water resources systems.

  7. Metric Learning for Hyperspectral Image Segmentation

    NASA Technical Reports Server (NTRS)

    Bue, Brian D.; Thompson, David R.; Gilmore, Martha S.; Castano, Rebecca

    2011-01-01

    We present a metric learning approach to improve the performance of unsupervised hyperspectral image segmentation. Unsupervised spatial segmentation can assist both user visualization and automatic recognition of surface features. Analysts can use spatially-continuous segments to decrease noise levels and/or localize feature boundaries. However, existing segmentation methods use task-agnostic measures of similarity. Here we learn task-specific similarity measures from training data, improving segment fidelity to classes of interest. Multiclass Linear Discriminant Analysis produces a linear transform that optimally separates a labeled set of training classes. This transform defines a distance metric that generalizes to new scenes, enabling graph-based segmentation that emphasizes key spectral features. We describe tests based on data from the Compact Reconnaissance Imaging Spectrometer (CRISM) in which learned metrics improve segment homogeneity with respect to mineralogical classes.
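
    A minimal sketch of the underlying idea (learning a class-separating linear transform and measuring distances in the transformed space), using scikit-learn's LDA on synthetic data rather than the authors' CRISM pipeline; all data and dimensions are assumptions.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        # Hypothetical labeled spectra: rows are pixels, columns are spectral bands
        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 50))
        y = rng.integers(0, 3, size=300)

        lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)

        def learned_distance(a, b):
            """Distance in the LDA-transformed space, emphasizing class-separating directions."""
            ta, tb = lda.transform([a])[0], lda.transform([b])[0]
            return float(np.linalg.norm(ta - tb))

        print(learned_distance(X[0], X[1]))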

  8. Registration of prone and supine CT colonography scans using correlation optimized warping and canonical correlation analysis

    PubMed Central

    Wang, Shijun; Yao, Jianhua; Liu, Jiamin; Petrick, Nicholas; Van Uitert, Robert L.; Periaswamy, Senthil; Summers, Ronald M.

    2009-01-01

    Purpose: In computed tomographic colonography (CTC), a patient will be scanned twice—once supine and once prone—to improve the sensitivity for polyp detection. To assist radiologists in CTC reading, in this paper we propose an automated method for colon registration from supine and prone CTC scans. Methods: We propose a new colon centerline registration method for prone and supine CTC scans using correlation optimized warping (COW) and canonical correlation analysis (CCA) based on the anatomical structure of the colon. Four anatomical salient points on the colon are first automatically distinguished. Then correlation optimized warping is applied to the segments defined by the anatomical landmarks to improve the global registration based on local correlation of segments. The COW method was modified by embedding canonical correlation analysis to allow multiple features along the colon centerline to be used in our implementation. Results: We tested the COW algorithm on a CTC data set of 39 patients with 39 polyps (19 training and 20 test cases) to verify the effectiveness of the proposed COW registration method. Experimental results on the test set show that the COW method significantly reduces the average estimation error in a polyp location between supine and prone scans by 67.6%, from 46.27±52.97 mm to 14.98±11.41 mm, compared to the normalized distance along the colon centerline algorithm (p<0.01). Conclusions: The proposed COW algorithm is more accurate for the colon centerline registration compared to the normalized distance along the colon centerline method and the dynamic time warping method. Comparison results showed that the feature combination of z-coordinate and curvature achieved the lowest registration error compared to the other feature combinations used by COW. The proposed method is tolerant to centerline errors because anatomical landmarks help prevent the propagation of errors across the entire colon centerline. PMID:20095272

  9. Registration of prone and supine CT colonography scans using correlation optimized warping and canonical correlation analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang Shijun; Yao Jianhua; Liu Jiamin

    Purpose: In computed tomographic colonography (CTC), a patient will be scanned twice--once supine and once prone--to improve the sensitivity for polyp detection. To assist radiologists in CTC reading, in this paper we propose an automated method for colon registration from supine and prone CTC scans. Methods: We propose a new colon centerline registration method for prone and supine CTC scans using correlation optimized warping (COW) and canonical correlation analysis (CCA) based on the anatomical structure of the colon. Four anatomical salient points on the colon are first automatically distinguished. Then correlation optimized warping is applied to the segments defined by the anatomical landmarks to improve the global registration based on local correlation of segments. The COW method was modified by embedding canonical correlation analysis to allow multiple features along the colon centerline to be used in our implementation. Results: We tested the COW algorithm on a CTC data set of 39 patients with 39 polyps (19 training and 20 test cases) to verify the effectiveness of the proposed COW registration method. Experimental results on the test set show that the COW method significantly reduces the average estimation error in a polyp location between supine and prone scans by 67.6%, from 46.27±52.97 mm to 14.98±11.41 mm, compared to the normalized distance along the colon centerline algorithm (p<0.01). Conclusions: The proposed COW algorithm is more accurate for the colon centerline registration compared to the normalized distance along the colon centerline method and the dynamic time warping method. Comparison results showed that the feature combination of z-coordinate and curvature achieved the lowest registration error compared to the other feature combinations used by COW. The proposed method is tolerant to centerline errors because anatomical landmarks help prevent the propagation of errors across the entire colon centerline.

  10. Two Topics in Data Analysis: Sample-based Optimal Transport and Analysis of Turbulent Spectra from Ship Track Data

    NASA Astrophysics Data System (ADS)

    Kuang, Simeng Max

    This thesis contains two topics in data analysis. The first topic consists of the introduction of algorithms for sample-based optimal transport and barycenter problems. In chapter 1, a family of algorithms is introduced to solve both the L2 optimal transport problem and the Wasserstein barycenter problem. Starting from a theoretical perspective, the new algorithms are motivated from a key characterization of the barycenter measure, which suggests an update that reduces the total transportation cost and stops only when the barycenter is reached. A series of general theorems is given to prove the convergence of all the algorithms. We then extend the algorithms to solve sample-based optimal transport and barycenter problems, in which only finite sample sets are available instead of underlying probability distributions. A unique feature of the new approach is that it compares sample sets in terms of the expected values of a set of feature functions, which at the same time induce the function space of optimal maps and can be chosen by users to incorporate their prior knowledge of the data. All the algorithms are implemented and applied to various synthetic example and practical applications. On synthetic examples it is found that both the SOT algorithm and the SCB algorithm are able to find the true solution and often converge in a handful of iterations. On more challenging applications including Gaussian mixture models, color transfer and shape transform problems, the algorithms give very good results throughout despite the very different nature of the corresponding datasets. In chapter 2, a preconditioning procedure is developed for the L2 and more general optimal transport problems. The procedure is based on a family of affine map pairs, which transforms the original measures into two new measures that are closer to each other, while preserving the optimality of solutions. It is proved that the preconditioning procedure minimizes the remaining transportation cost among all admissible affine maps. The procedure can be used on both continuous measures and finite sample sets from distributions. In numerical examples, the procedure is applied to multivariate normal distributions, to a two-dimensional shape transform problem and to color transfer problems. For the second topic, we present an extension to anisotropic flows of the recently developed Helmholtz and wave-vortex decomposition method for one-dimensional spectra measured along ship or aircraft tracks in Buhler et al. (J. Fluid Mech., vol. 756, 2014, pp. 1007-1026). While in the original method the flow was assumed to be homogeneous and isotropic in the horizontal plane, we allow the flow to have a simple kind of horizontal anisotropy that is chosen in a self-consistent manner and can be deduced from the one-dimensional power spectra of the horizontal velocity fields and their cross-correlation. The key result is that an exact and robust Helmholtz decomposition of the horizontal kinetic energy spectrum can be achieved in this anisotropic flow setting, which then also allows the subsequent wave-vortex decomposition step. The new method is developed theoretically and tested with encouraging results on challenging synthetic data as well as on ocean data from the Gulf Stream.
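
    As a small aside on the sample-based L2 optimal transport problem: for two equal-size sample sets with uniform weights it reduces to a linear assignment problem, which the sketch below solves with SciPy on synthetic Gaussian samples. This is a generic illustration, not the thesis's feature-function-based SOT/SCB algorithms; the data are assumptions.

        import numpy as np
        from scipy.optimize import linear_sum_assignment
        from scipy.spatial.distance import cdist

        rng = np.random.default_rng(1)
        x = rng.normal(loc=0.0, size=(200, 2))   # sample set from the source measure
        y = rng.normal(loc=3.0, size=(200, 2))   # sample set from the target measure

        cost = cdist(x, y, metric="sqeuclidean")   # pairwise squared Euclidean distances
        rows, cols = linear_sum_assignment(cost)   # optimal one-to-one coupling of samples
        print("empirical squared-L2 transport cost:", cost[rows, cols].mean())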

  11. Optimization of chiral lattice based metastructures for broadband vibration suppression using genetic algorithms

    NASA Astrophysics Data System (ADS)

    Abdeljaber, Osama; Avci, Onur; Inman, Daniel J.

    2016-05-01

    One of the major challenges in civil, mechanical, and aerospace engineering is to develop vibration suppression systems with high efficiency and low cost. Recent studies have shown that high damping performance at broadband frequencies can be achieved by incorporating periodic inserts with tunable dynamic properties as internal resonators in structural systems. Structures featuring these kinds of inserts are referred to as metamaterials inspired structures or metastructures. Chiral lattice inserts exhibit unique characteristics such as frequency bandgaps which can be tuned by varying the parameters that define the lattice topology. Recent analytical and experimental investigations have shown that broadband vibration attenuation can be achieved by including chiral lattices as internal resonators in beam-like structures. However, these studies have suggested that the performance of chiral lattice inserts can be maximized by utilizing an efficient optimization technique to obtain the optimal topology of the inserted lattice. In this study, an automated optimization procedure based on a genetic algorithm is applied to obtain the optimal set of parameters that will result in chiral lattice inserts tuned properly to reduce the global vibration levels of a finite-sized beam. Genetic algorithms are considered in this study due to their capability of dealing with complex and insufficiently understood optimization problems. In the optimization process, the basic parameters that govern the geometry of periodic chiral lattices including the number of circular nodes, the thickness of the ligaments, and the characteristic angle are considered. Additionally, a new set of parameters is introduced to enable the optimization process to explore non-periodic chiral designs. Numerical simulations are carried out to demonstrate the efficiency of the optimization process.

  12. Optimized SIFTFlow for registration of whole-mount histology to reference optical images

    PubMed Central

    Shojaii, Rushin; Martel, Anne L.

    2016-01-01

    Abstract. The registration of two-dimensional histology images to reference images from other modalities is an important preprocessing step in the reconstruction of three-dimensional histology volumes. This is a challenging problem because of the differences in the appearances of histology images and other modalities, and the presence of large nonrigid deformations which occur during slide preparation. This paper shows the feasibility of using densely sampled scale-invariant feature transform (SIFT) features and a SIFTFlow deformable registration algorithm for coregistering whole-mount histology images with blockface optical images. We present a method for jointly optimizing the regularization parameters used by the SIFTFlow objective function and use it to determine the most appropriate values for the registration of breast lumpectomy specimens. We demonstrate that tuning the regularization parameters results in significant improvements in accuracy and we also show that SIFTFlow outperforms a previously described edge-based registration method. The accuracy of the histology images to blockface images registration using the optimized SIFTFlow method was assessed using an independent test set of images from five different lumpectomy specimens and the mean registration error was 0.32±0.22  mm. PMID:27774494

  13. Wide field imaging - I. Applications of neural networks to object detection and star/galaxy classification

    NASA Astrophysics Data System (ADS)

    Andreon, S.; Gargiulo, G.; Longo, G.; Tagliaferri, R.; Capuano, N.

    2000-12-01

    Astronomical wide-field imaging performed with new large-format CCD detectors poses data reduction problems of unprecedented scale, which are difficult to deal with using traditional interactive tools. We present here NExt (Neural Extractor), a new neural network (NN) based package capable of detecting objects and performing both deblending and star/galaxy classification in an automatic way. Traditionally, in astronomical images, objects are first distinguished from the noisy background by searching for sets of connected pixels having brightnesses above a given threshold; they are then classified as stars or as galaxies through diagnostic diagrams having variables chosen according to the astronomer's taste and experience. In the extraction step, assuming that images are well sampled, NExt requires only the simplest a priori definition of `what an object is' (i.e. it keeps all structures composed of more than one pixel) and performs the detection via an unsupervised NN, approaching detection as a clustering problem that has been thoroughly studied in the artificial intelligence literature. The first part of the NExt procedure consists of an optimal compression of the redundant information contained in the pixels via a mapping from pixel intensities to a subspace individualized through principal component analysis. At magnitudes fainter than the completeness limit, stars are usually almost indistinguishable from galaxies, and therefore the parameters characterizing the two classes do not lie in disconnected subspaces, thus preventing the use of unsupervised methods. We therefore adopted a supervised NN (i.e. a NN that first finds the rules to classify objects from examples and then applies them to the whole data set). In practice, each object is classified depending on its membership of the regions mapping the input feature space in the training set. In order to obtain an objective and reliable classification, instead of using an arbitrarily defined set of features we use a NN to select the most significant features among the large number of measured ones, and then we use these selected features to perform the classification task. In order to optimize the performance of the system, we implemented and tested several different models of NN. The comparison of the NExt performance with that of the best detection and classification package known to the authors (SExtractor) shows that NExt is at least as effective as the best traditional packages.
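
    The compress-then-classify idea (principal component compression followed by a supervised network) can be sketched with scikit-learn on synthetic object features as below; this is not the NExt package itself, and the data, layer sizes and component count are assumptions.

        from sklearn.datasets import make_classification
        from sklearn.decomposition import PCA
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import train_test_split

        # Synthetic stand-in for per-object measured features (e.g. star/galaxy candidates)
        X, y = make_classification(n_samples=1000, n_features=30, n_informative=8, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        model = make_pipeline(PCA(n_components=8),                    # compress redundant inputs
                              MLPClassifier(hidden_layer_sizes=(16,),
                                            max_iter=1000, random_state=0))
        model.fit(X_tr, y_tr)
        print("test accuracy:", model.score(X_te, y_te))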

  14. Optimization of Feature Weights in the Voting Feature Intervals 5 Algorithm Using the Particle Swarm Optimization Algorithm

    NASA Astrophysics Data System (ADS)

    Hayana Hasibuan, Eka; Mawengkang, Herman; Efendi, Syahril

    2017-12-01

    The Particle Swarm Optimization (PSO) algorithm is used in this research to optimize the feature weights of the Voting Feature Intervals 5 (VFI5) algorithm, yielding a combined PSO-VFI5 model. Optimizing the feature weights on diabetes and dyspepsia data is considered important because it is closely related to patients' lives: any inaccuracy in determining the most dominant feature weights could have fatal consequences. Using the PSO algorithm, the accuracy on fold 1 increased from 92.31% to 96.15% (a gain of 3.8%); on fold 2 the VFI5 accuracy of 92.52% was unchanged by PSO; and on fold 3 the accuracy increased from 85.19% to 96.29%, a gain of 11%. Across the three trials, the total accuracy increased by 14%. In general, the Particle Swarm Optimization algorithm succeeded in increasing the accuracy on several folds, so it can be concluded that the PSO algorithm is well suited to optimizing the VFI5 classification algorithm.
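
    A minimal sketch of the general idea (PSO searching for feature weights that maximize cross-validated accuracy) is given below; it weights features for a 1-nearest-neighbour classifier on a public dataset rather than for VFI5 on the diabetes/dyspepsia data, so the classifier, dataset and PSO constants are all assumptions.

        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        X, y = load_breast_cancer(return_X_y=True)
        rng = np.random.default_rng(0)

        def fitness(w):
            """Cross-validated accuracy of a 1-NN classifier on feature-weighted data."""
            return cross_val_score(KNeighborsClassifier(n_neighbors=1), X * w, y, cv=3).mean()

        n_particles, dim = 10, X.shape[1]
        pos = rng.random((n_particles, dim))          # particle positions = feature weights in [0, 1]
        vel = np.zeros_like(pos)
        pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
        gbest = pbest[pbest_fit.argmax()].copy()

        for _ in range(15):                           # a few PSO iterations
            r1, r2 = rng.random((2, n_particles, dim))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, 0.0, 1.0)
            fit = np.array([fitness(p) for p in pos])
            improved = fit > pbest_fit
            pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
            gbest = pbest[pbest_fit.argmax()].copy()

        print("best weighted 1-NN accuracy:", pbest_fit.max())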

  15. Testing of Haar-Like Feature in Region of Interest Detection for Automated Target Recognition (ATR) System

    NASA Technical Reports Server (NTRS)

    Zhang, Yuhan; Lu, Dr. Thomas

    2010-01-01

    The objectives of this project were to develop an ROI (Region of Interest) detector using Haar-like features, similar to the face detector in Intel's OpenCV library, implement it in Matlab code, and test the performance of the new ROI detector against the existing ROI detector that uses an Optimal Trade-off Maximum Average Correlation Height (OTMACH) filter. The ROI detector consisted of three parts: (1) automated Haar-like feature selection, finding a small set of the most relevant Haar-like features for detecting ROIs that contain a target; (2) given the small set of Haar-like features from the previous step, training a neural network to recognize ROIs with targets by taking the Haar-like features as inputs; and (3) using the trained neural network, developing a filtering method to process the neural network responses into a small set of regions of interest. All three parts were coded in Matlab, and the parameters in the detector were trained by machine learning and tested with specific datasets. Since the OpenCV library and Haar-like features were not available in Matlab, the Haar-like feature calculation had to be implemented in Matlab. The code for Adaptive Boosting and max/min filters in Matlab could be found on the Internet but needed to be integrated to serve the purpose of this project. The performance of the new detector was tested by comparing its accuracy and speed against the existing OTMACH detector. The speed was measured as the average time to find the regions of interest in an image, and the accuracy was measured by the number of false positives (false alarms) at the same detection rate for the two detectors.
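
    The Haar-like feature computation mentioned above is conventionally done with an integral image, so that any rectangle sum costs four lookups. The sketch below shows a generic two-rectangle feature in Python (the project itself was in Matlab); the window position and the test image are assumptions.

        import numpy as np

        def integral_image(img):
            """Cumulative sums over rows and columns; enables O(1) rectangle sums."""
            return img.cumsum(axis=0).cumsum(axis=1)

        def rect_sum(ii, r0, c0, r1, c1):
            """Sum of img[r0:r1, c0:c1] using the integral image ii."""
            total = ii[r1 - 1, c1 - 1]
            if r0 > 0:
                total -= ii[r0 - 1, c1 - 1]
            if c0 > 0:
                total -= ii[r1 - 1, c0 - 1]
            if r0 > 0 and c0 > 0:
                total += ii[r0 - 1, c0 - 1]
            return total

        def haar_two_rect(ii, r, c, h, w):
            """Left-minus-right two-rectangle feature at (r, c), height h, total width 2*w."""
            left = rect_sum(ii, r, c, r + h, c + w)
            right = rect_sum(ii, r, c + w, r + h, c + 2 * w)
            return left - right

        img = np.arange(64, dtype=float).reshape(8, 8)   # toy image
        print(haar_two_rect(integral_image(img), 0, 0, 4, 2))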

  16. Using statistical correlation to compare geomagnetic data sets

    NASA Astrophysics Data System (ADS)

    Stanton, T.

    2009-04-01

    The major features of data curves are often matched, to a first order, by bump and wiggle matching to arrive at an offset between data sets. This poster describes a simple statistical correlation program that has proved useful during this stage by determining the optimal correlation between geomagnetic curves using a variety of fixed and floating windows. Its utility is suggested by the fact that it is simple to run, yet generates meaningful data comparisons, often when data noise precludes the obvious matching of curve features. Data sets can be scaled, smoothed, normalised and standardised, before all possible correlations are carried out between selected overlapping portions of each curve. Best-fit offset curves can then be displayed graphically. The program was used to cross-correlate directional and palaeointensity data from Holocene lake sediments (Stanton et al., submitted) and Holocene lava flows. Some example curve matches are shown, including some that illustrate the potential of this technique when examining particularly sparse data sets. Stanton, T., Snowball, I., Zillén, L. and Wastegård, S., submitted. Detecting potential errors in varve chronology and 14C ages using palaeosecular variation curves, lead pollution history and statistical correlation. Quaternary Geochronology.
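
    The core of such a program can be reduced to scanning candidate offsets and scoring the Pearson correlation over the overlapping portion of the two (already scaled and smoothed) curves; the sketch below illustrates this on synthetic series and is not the poster's actual code, so the window and lag settings are assumptions.

        import numpy as np

        def best_offset(a, b, max_lag=50, min_overlap=30):
            """Lag of b relative to a that maximizes Pearson correlation over the overlap."""
            best_lag, best_r = None, -np.inf
            for lag in range(-max_lag, max_lag + 1):
                if lag >= 0:
                    x, y = a[lag:], b[:len(a) - lag]
                else:
                    x, y = a[:len(a) + lag], b[-lag:]
                n = min(len(x), len(y))
                if n < min_overlap:
                    continue
                r = np.corrcoef(x[:n], y[:n])[0, 1]
                if r > best_r:
                    best_lag, best_r = lag, r
            return best_lag, best_r

        # Synthetic demonstration: b is a noisy copy of a, shifted by 12 samples
        rng = np.random.default_rng(2)
        base = np.sin(np.linspace(0, 20, 500))
        a = base[:400]
        b = base[12:412] + 0.05 * rng.normal(size=400)
        print(best_offset(a, b))   # expected lag close to 12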

  17. In silico modelling and molecular dynamics simulation studies of thiazolidine based PTP1B inhibitors.

    PubMed

    Mahapatra, Manoj Kumar; Bera, Krishnendu; Singh, Durg Vijay; Kumar, Rajnish; Kumar, Manoj

    2018-04-01

    Protein tyrosine phosphatase 1B (PTP1B) has been identified as a negative regulator of insulin and leptin signalling pathway; hence, it can be considered as a new therapeutic target of intervention for the treatment of type 2 diabetes. Inhibition of this molecular target takes care of both diabetes and obesity, i.e. diabestiy. In order to get more information on identification and optimization of lead, pharmacophore modelling, atom-based 3D QSAR, docking and molecular dynamics studies were carried out on a set of ligands containing thiazolidine scaffold. A six-point pharmacophore model consisting of three hydrogen bond acceptor (A), one negative ionic (N) and two aromatic rings (R) with discrete geometries as pharmacophoric features were developed for a predictive 3D QSAR model. The probable binding conformation of the ligands within the active site was studied through molecular docking. The molecular interactions and the structural features responsible for PTP1B inhibition and selectivity were further supplemented by molecular dynamics simulation study for a time scale of 30 ns. The present investigation has identified some of the indispensible structural features of thiazolidine analogues which can further be explored to optimize PTP1B inhibitors.

  18. Predicting the amount of coke deposition on catalyst pellets through image analysis and soft computing

    NASA Astrophysics Data System (ADS)

    Zhang, Jingqiong; Zhang, Wenbiao; He, Yuting; Yan, Yong

    2016-11-01

    The amount of coke deposition on catalyst pellets is one of the most important indexes of catalytic property and service life. As a result, it is essential to measure it and analyze the active state of the catalysts during a continuous production process. This paper proposes a new method to predict the amount of coke deposition on catalyst pellets based on image analysis and soft computing. An image acquisition system consisting of a flatbed scanner and an opaque cover is used to obtain catalyst images. After image processing and feature extraction, twelve effective features are selected and the two best feature sets are determined through prediction tests. A neural network optimized by a particle swarm optimization algorithm is used to establish the prediction model of the coke amount based on various datasets. The root mean square errors of the predictions are all below 0.021 and the coefficients of determination R2 of the models are all above 78.71%. Therefore, a feasible, effective and precise method is demonstrated, which may be applied to realize real-time measurement of coke deposition based on on-line sampling and fast image analysis.
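
    For reference, the two reported figures of merit can be computed as below; the prediction values shown are hypothetical, not the paper's data.

        import numpy as np

        def rmse_r2(y_true, y_pred):
            """Root mean square error and coefficient of determination R^2."""
            y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
            rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
            ss_res = np.sum((y_true - y_pred) ** 2)
            ss_tot = np.sum((y_true - y_true.mean()) ** 2)
            return rmse, 1.0 - ss_res / ss_tot

        # Hypothetical coke-amount predictions (arbitrary units)
        print(rmse_r2([0.10, 0.15, 0.22, 0.30], [0.11, 0.16, 0.20, 0.31]))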

  19. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak.

    PubMed

    Donoho, David; Jin, Jiashun

    2008-09-30

    In important application fields today - genomics and proteomics are examples - selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, ..., p, let π_i denote the two-sided P-value associated with the ith feature Z-score and π_(i) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p - π_(i)) / sqrt((i/p)(1 - i/p)). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.
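
    A minimal sketch of HC thresholding as defined above (a direct, unrestricted reading of the objective, without the refinements discussed in the paper) on simulated rare/weak Z-scores; the simulation parameters are assumptions.

        import numpy as np
        from scipy.stats import norm

        def hc_threshold(z_scores):
            """Higher-criticism threshold: the |Z| whose two-sided P-value maximizes
            (i/p - pi_(i)) / sqrt((i/p)(1 - i/p)) over the sorted P-values."""
            z = np.asarray(z_scores, float)
            p = len(z)
            pvals = 2.0 * norm.sf(np.abs(z))        # two-sided P-values
            order = np.argsort(pvals)
            pi_sorted = pvals[order]
            i = np.arange(1, p)                     # exclude i = p, where the denominator vanishes
            frac = i / p
            hc = (frac - pi_sorted[:p - 1]) / np.sqrt(frac * (1.0 - frac))
            i_star = int(np.argmax(hc))
            return float(np.abs(z[order][i_star]))

        rng = np.random.default_rng(3)
        z = np.concatenate([rng.normal(size=950), rng.normal(loc=3.0, size=50)])  # rare, weak signals
        print(hc_threshold(z))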

  20. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

    PubMed Central

    Donoho, David; Jin, Jiashun

    2008-01-01

    In important application fields today—genomics and proteomics are examples—selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, …, p, let πi denote the two-sided P-value associated with the ith feature Z-score and π(i) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p − π(i))/√((i/p)(1 − i/p)). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT. PMID:18815365

  1. Classification of Alzheimer's disease patients with hippocampal shape wrapper-based feature selection and support vector machine

    NASA Astrophysics Data System (ADS)

    Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien

    2012-02-01

    It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD), and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and predict conversion of mild cognitive impairment patients to AD. However, rates of atrophy are not uniform across the hippocampus, making shape analysis a potentially more accurate biomarker. This study examines the hippocampi of 226 healthy controls, 148 AD patients and 330 MCI patients, obtained from T1-weighted structural MRI images in the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as a weighted sum of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper-based feature selection method was used, as this considers the utility of features in combination for discriminating between classes, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier that is used. The leave-one-out cross-validated accuracy obtained on training data is 88.6% for classifying AD vs controls and 74% for classifying MCI-converters vs MCI-stable with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data, indicating that the feature selection is sensitive to the data used; however, feature ensemble methods may overcome this.
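
    As a generic illustration of wrapper-based feature selection around an SVM (not the SPHARM/ADNI pipeline used in the study), scikit-learn's sequential selector can be wrapped around a linear SVC as below; the dataset and number of selected features are assumptions.

        from sklearn.datasets import load_breast_cancer
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = load_breast_cancer(return_X_y=True)   # stand-in for shape-coefficient features
        svc = SVC(kernel="linear")

        # Wrapper selection: candidate features are scored by how well the SVM itself
        # classifies with them in combination, not by individual filter scores.
        selector = SequentialFeatureSelector(svc, n_features_to_select=5, cv=5)
        model = make_pipeline(StandardScaler(), selector, svc)
        print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())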

  2. Blended near-optimal tools for flexible water resources decision making

    NASA Astrophysics Data System (ADS)

    Rosenberg, David

    2015-04-01

    State-of-the-art systems analysis techniques focus on efficiently finding optimal solutions. Yet an optimal solution is optimal only for the static modelled issues and managers often seek near-optimal alternatives that address un-modelled or changing objectives, preferences, limits, uncertainties, and other issues. Early on, Modelling to Generate Alternatives (MGA) formalized near-optimal as performance within a tolerable deviation from the optimal objective function value and identified a few maximally-different alternatives that addressed select un-modelled issues. This paper presents new stratified, Monte Carlo Markov Chain sampling and parallel coordinate plotting tools that generate and communicate the structure and full extent of the near-optimal region to an optimization problem. Plot controls allow users to interactively explore region features of most interest. Controls also streamline the process to elicit un-modelled issues and update the model formulation in response to elicited issues. Use for a single-objective water quality management problem at Echo Reservoir, Utah identifies numerous and flexible practices to reduce the phosphorus load to the reservoir and maintain close-to-optimal performance. Compared to MGA, the new blended tools generate more numerous alternatives faster, more fully show the near-optimal region, help elicit a larger set of un-modelled issues, and offer managers greater flexibility to cope in a changing world.

  3. Automatic trajectory measurement of large numbers of crowded objects

    NASA Astrophysics Data System (ADS)

    Li, Hui; Liu, Ye; Chen, Yan Qiu

    2013-06-01

    Complex motion patterns of natural systems, such as fish schools, bird flocks, and cell groups, have attracted great attention from scientists for years. Trajectory measurement of individuals is vital for quantitative and high-throughput study of their collective behaviors. However, such data are rare mainly due to the challenges of detection and tracking of large numbers of objects with similar visual features and frequent occlusions. We present an automatic and effective framework to measure trajectories of large numbers of crowded oval-shaped objects, such as fish and cells. We first use a novel dual ellipse locator to detect the coarse position of each individual and then propose a variance minimization active contour method to obtain the optimal segmentation results. For tracking, cost matrix of assignment between consecutive frames is trainable via a random forest classifier with many spatial, texture, and shape features. The optimal trajectories are found for the whole image sequence by solving two linear assignment problems. We evaluate the proposed method on many challenging data sets.

  4. Social-aware data dissemination in opportunistic mobile social networks

    NASA Astrophysics Data System (ADS)

    Yang, Yibo; Zhao, Honglin; Ma, Jinlong; Han, Xiaowei

    Opportunistic Mobile Social Networks (OMSNs), formed by mobile users with social relationships and characteristics, enhance spontaneous communication among users that opportunistically encounter each other. Such networks can be exploited to improve the performance of data forwarding. Discovering optimal relay nodes is one of the important issues for efficient data propagation in OMSNs. Although traditional centrality definitions identify node features in a network, they cannot effectively identify the influential nodes for data dissemination in OMSNs. Existing protocols take advantage of spatial contact frequency and social characteristics to enhance transmission performance, but they have not fully exploited the benefits of the relations and interactions between geographical information, social features and user interests. In this paper, we first evaluate these three characteristics of users and then design a routing protocol called Geo-Social-Interest (GSI) to select optimal relay nodes. We compare the performance of GSI using the real INFOCOM06 data sets. The experimental results demonstrate that GSI outperforms the other protocols, with the highest data delivery ratio and low communication overhead.

  5. A General-Purpose Optimization Engine for Multi-Disciplinary Design Applications

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Hopkins, Dale A.; Berke, Laszlo

    1996-01-01

    A general purpose optimization tool for multidisciplinary applications, which in the literature is known as COMETBOARDS, is being developed at NASA Lewis Research Center. The modular organization of COMETBOARDS includes several analyzers and state-of-the-art optimization algorithms along with their cascading strategy. The code structure allows quick integration of new analyzers and optimizers. The COMETBOARDS code reads input information from a number of data files, formulates a design as a set of multidisciplinary nonlinear programming problems, and then solves the resulting problems. COMETBOARDS can be used to solve a large problem which can be defined through multiple disciplines, each of which can be further broken down into several subproblems. Alternatively, a small portion of a large problem can be optimized in an effort to improve an existing system. Some of the other unique features of COMETBOARDS include design variable formulation, constraint formulation, subproblem coupling strategy, global scaling technique, analysis approximation, use of either sequential or parallel computational modes, and so forth. The special features and unique strengths of COMETBOARDS assist convergence and reduce the amount of CPU time used to solve the difficult optimization problems of aerospace industries. COMETBOARDS has been successfully used to solve a number of problems, including structural design of space station components, design of nozzle components of an air-breathing engine, configuration design of subsonic and supersonic aircraft, mixed flow turbofan engines, wave rotor topped engines, and so forth. This paper introduces the COMETBOARDS design tool and its versatility, which is illustrated by citing examples from structures, aircraft design, and air-breathing propulsion engine design.

  6. Building Extraction Based on an Optimized Stacked Sparse Autoencoder of Structure and Training Samples Using LIDAR DSM and Optical Images.

    PubMed

    Yan, Yiming; Tan, Zhichao; Su, Nan; Zhao, Chunhui

    2017-08-24

    In this paper, a building extraction method is proposed based on a stacked sparse autoencoder with an optimized structure and training samples. Building extraction plays an important role in urban construction and planning. However, some negative effects reduce the accuracy of extraction, such as exceeding resolution, bad correction and terrain influence. Data collected by multiple sensors, such as light detection and ranging (LIDAR) and optical sensors, are used to improve the extraction. Using the digital surface model (DSM) obtained from LIDAR data together with optical images, traditional methods can improve the extraction to a certain extent, but there are some defects in feature extraction. Since a stacked sparse autoencoder (SSAE) neural network can learn the essential characteristics of the data in depth, SSAE was employed to extract buildings from the combined DSM data and optical images. A better strategy for setting the SSAE network structure is given, and an approach for setting the number and proportion of training samples for better training of the SSAE is presented. The optical data and DSM were combined as input to the optimized SSAE, and after training with the optimized samples, the appropriate network structure can extract buildings with great accuracy and good robustness.

  7. Medium power amplifiers covering 90 - 130 GHz for telescope local oscillators

    NASA Technical Reports Server (NTRS)

    Samoska, Lorene A.; Bryerton, Eric; Pukala, David; Peralta, Alejandro; Hu, Ming; Schmitz, Adele

    2005-01-01

    This paper describes a set of power amplifier (PA) modules containing InP High Electron Mobility Transistor (HEMT) Monolithic Millimeter-wave Integrated Circuit (MMIC) chips. The chips were designed and optimized for local oscillator sources in the 90-130 GHz band for the Atacama Large Millimeter Array telescope. The modules feature 20-45 mW of output power, to date the highest power from solid state HEMT MMIC modules above 110 GHz.

  8. MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.

    PubMed

    Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra

    2011-01-01

    Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity due to the lack of a gold standard of negative examples, of miRNA-targeting-site context-specific relevant features and of an efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to distribute the true positive predictions preferentially at the top of the ranked list; hence the algorithms become unreliable to biologists. In addition, these algorithms fail to obtain a considerable combination of precision and recall for the target transcripts that are translationally repressed at the protein level. In this article, we introduce an efficient miRNA-target prediction system, MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high quality negative examples and selecting biologically relevant miRNA-targeting-site context-specific features. The features are selected by using a novel feature selection technique, AMOSA-SVM, that integrates the multi-objective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) and SVM. MultiMiTar is found to achieve a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 compared to the other target prediction methods on a completely independent test data set. The MCC and ACA values of these algorithms range from -0.269 to 0.155 and 0.321 to 0.582, respectively. Moreover, MultiMiTar shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set compared to all the other existing methods. An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list, which makes MultiMiTar reliable for biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
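
    The two headline metrics can be reproduced from a set of predictions as below, reading ACA as the mean of the per-class recalls; the labels shown are hypothetical.

        import numpy as np
        from sklearn.metrics import matthews_corrcoef, recall_score

        y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])   # hypothetical target / non-target labels
        y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 1, 1, 0])

        mcc = matthews_corrcoef(y_true, y_pred)
        aca = recall_score(y_true, y_pred, average="macro")  # average class-wise accuracy
        print(f"MCC = {mcc:.3f}, ACA = {aca:.3f}")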

  9. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

    PubMed

    Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel

    2011-05-09

    Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net.We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone.Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error.Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters.The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'.We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.

  10. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    PubMed Central

    2011-01-01

    Background Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Results Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets. PMID:21554689

  11. Six-degree-of-freedom program to optimize simulated trajectories (6D POST). Volume 1: Formulation manual

    NASA Technical Reports Server (NTRS)

    Brauer, G. L.; Habeger, A. R.; Stevenson, R.

    1974-01-01

    The basic equations and models used in a computer program (6D POST) to optimize simulated trajectories with six degrees of freedom were documented. The 6D POST program was conceived as a direct extension of the program POST, which dealt with point masses, and considers the general motion of a rigid body with six degrees of freedom. It may be used to solve a wide variety of atmospheric flight mechanics and orbital transfer problems for powered or unpowered vehicles operating near a rotating oblate planet. Its principal features are: an easy to use NAMELIST type input procedure, an integrated set of Flight Control System (FCS) modules, and a general-purpose discrete parameter targeting and optimization capability. It was written in FORTRAN 4 for the CDC 6000 series computers.

  12. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

    PubMed

    Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I

    2017-01-01

    A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.

  13. Adaptive road crack detection system by pavement classification.

    PubMed

    Gavilán, Miguel; Balcones, David; Marcos, Oscar; Llorca, David F; Sotelo, Miguel A; Parra, Ignacio; Ocaña, Manuel; Aliseda, Pedro; Yarza, Pedro; Amírola, Alejandro

    2011-01-01

    This paper presents a road distress detection system involving the phases needed to properly deal with fully automatic road distress assessment. A vehicle equipped with line scan cameras, laser illumination and acquisition HW-SW is used to store the digital images that will be further processed to identify road cracks. Pre-processing is first carried out to both smooth the texture and enhance the linear features. Non-crack feature detection is then applied to mask areas of the images with joints, sealed cracks and white painting, which usually generate false positive crack detections. A seed-based approach is proposed to deal with road crack detection, combining Multiple Directional Non-Minimum Suppression (MDNMS) with a symmetry check. Seeds are linked by computing the paths with the lowest cost that meet the symmetry restrictions. The whole detection process involves the use of several parameters, and a correct setting becomes essential to get optimal results without manual intervention. A fully automatic approach is therefore proposed, by means of a linear SVM-based classifier ensemble able to distinguish between up to 10 different types of pavement that appear in Spanish roads. The optimal feature vector includes different texture-based features, and the parameters are then tuned depending on the output provided by the classifier. Regarding non-crack feature detection, results show that the introduction of this module reduces the impact of false positives due to non-crack features by up to a factor of 2. In addition, the observed performance of the crack detection system is significantly boosted by adapting the parameters to the type of pavement.

  14. Adaptive Road Crack Detection System by Pavement Classification

    PubMed Central

    Gavilán, Miguel; Balcones, David; Marcos, Oscar; Llorca, David F.; Sotelo, Miguel A.; Parra, Ignacio; Ocaña, Manuel; Aliseda, Pedro; Yarza, Pedro; Amírola, Alejandro

    2011-01-01

    This paper presents a road distress detection system involving the phases needed to properly deal with fully automatic road distress assessment. A vehicle equipped with line scan cameras, laser illumination and acquisition HW-SW is used to store the digital images that will be further processed to identify road cracks. Pre-processing is firstly carried out to both smooth the texture and enhance the linear features. Non-crack features detection is then applied to mask areas of the images with joints, sealed cracks and white painting, that usually generate false positive cracking. A seed-based approach is proposed to deal with road crack detection, combining Multiple Directional Non-Minimum Suppression (MDNMS) with a symmetry check. Seeds are linked by computing the paths with the lowest cost that meet the symmetry restrictions. The whole detection process involves the use of several parameters. A correct setting becomes essential to get optimal results without manual intervention. A fully automatic approach by means of a linear SVM-based classifier ensemble able to distinguish between up to 10 different types of pavement that appear in the Spanish roads is proposed. The optimal feature vector includes different texture-based features. The parameters are then tuned depending on the output provided by the classifier. Regarding non-crack features detection, results show that the introduction of such a module reduces the impact of false positives due to non-crack features up to a factor of 2. In addition, the observed performance of the crack detection system is significantly boosted by adapting the parameters to the type of pavement. PMID:22163717

  15. S3D: An interactive surface grid generation tool

    NASA Technical Reports Server (NTRS)

    Luh, Raymond Ching-Chung; Pierce, Lawrence E.; Yip, David

    1992-01-01

    S3D, an interactive software tool for surface grid generation, is described. S3D provides the means with which a geometry definition based either on a discretized curve set or a rectangular set can be quickly processed towards the generation of a surface grid for computational fluid dynamics (CFD) applications. This is made possible as a result of implementing commonly encountered surface gridding tasks in an environment with a highly efficient and user friendly graphical interface. Some of the more advanced features of S3D include surface-surface intersections, optimized surface domain decomposition and recomposition, and automated propagation of edge distributions to surrounding grids.

  16. Decision Tree Repository and Rule Set Based Mingjiang River Estuarine Wetlands Classification

    NASA Astrophysics Data System (ADS)

    Zhang, W.; Li, X.; Xiao, W.

    2018-05-01

    Increasing urbanization and industrialization have led to wetland losses in the estuarine area of the Mingjiang River over the past three decades, and growing attention is being given to producing wetland inventories with remote sensing and GIS technology. Because training sites and training samples are inconsistent, traditional pixel-based image classification methods cannot achieve comparable results across different organizations. Object-oriented image classification shows great potential to solve this problem, and Landsat moderate-resolution remote sensing images are widely used for this purpose. Firstly, standardized atmospheric correction and spectrally high-fidelity texture feature enhancement were carried out before implementing the object-oriented wetland classification method in eCognition. Secondly, a multi-scale segmentation procedure was performed, taking the scale, hue, shape, compactness and smoothness of the image into account to obtain appropriate parameters; using a region-merging algorithm starting from the single-pixel level, the optimal segmentation scale for each type of land cover was determined. The segmented objects were then used as classification units, and the spectral information (mean, maximum, minimum, brightness and normalized values), spatial features (area, length, tightness and shape rule) and texture features (mean, variance and entropy) of each object were computed as classification features for the training samples. Based on reference images and field sampling points, typical training samples were selected uniformly and randomly for each ground-object type, and the value ranges of the spectral, texture and spatial characteristics of each class in each feature layer were used to create the decision tree repository. Finally, a field investigation based on random sampling and high-resolution reference images yielded an overall accuracy of 90.31 % and a Kappa coefficient of 0.88. The classification based on the decision tree threshold values and rule set developed from the repository outperforms the traditional methodology, making the proposed decision-tree-repository and rule-set based object-oriented technique an effective method for producing comparable and consistent wetland data sets.
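
    To make the rule-set idea concrete, the following sketch (Python with NumPy; the thresholds and class names are invented for illustration and are not taken from the study) computes per-object statistics of the kind listed above and applies a small threshold-based rule fragment.

```python
import numpy as np

def object_features(pixels):
    """Per-object statistics of the kind listed in the abstract (mean, max, min,
    variance, entropy); `pixels` is the flat array of values in one segmented object."""
    p = np.asarray(pixels, dtype=float)
    counts, _ = np.histogram(p, bins=32)
    prob = counts[counts > 0] / counts.sum()
    return {"mean": p.mean(), "max": p.max(), "min": p.min(),
            "variance": p.var(), "entropy": float(-(prob * np.log2(prob)).sum())}

def classify(feat):
    """A hypothetical two-rule fragment of a threshold-based decision tree;
    real thresholds would come from the training-sample value ranges."""
    if feat["mean"] < 0.2:
        return "open water"
    if feat["variance"] > 0.05 and feat["entropy"] > 3.0:
        return "tidal marsh"
    return "other"

print(classify(object_features(np.random.default_rng(0).random(500))))
```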

  17. Multiple target sound quality balance for hybrid electric powertrain noise

    NASA Astrophysics Data System (ADS)

    Mosquera-Sánchez, J. A.; Sarrazin, M.; Janssens, K.; de Oliveira, L. P. R.; Desmet, W.

    2018-01-01

    The integration of the electric motor to the powertrain in hybrid electric vehicles (HEVs) presents acoustic stimuli that elicit new perceptions. The large number of spectral components, as well as the wider bandwidth of this kind of noise, poses new challenges to current noise, vibration and harshness (NVH) approaches. This paper presents a framework for enhancing the sound quality (SQ) of the hybrid electric powertrain noise perceived inside the passenger compartment. Compared with current active sound quality control (ASQC) schemes, where the SQ improvement is just an effect of the control actions, the proposed technique features an optimization stage, which enables the NVH specialist to actively implement the amplitude balance of the tones that better fits into the auditory expectations. Since Loudness, Roughness, Sharpness and Tonality are the most relevant SQ metrics for interior HEV noise, they are used as performance metrics in the concurrent optimization analysis, which, eventually, drives the control design method. Thus, multichannel active sound profiling systems that feature cross-channel compensation schemes are guided by the multi-objective optimization stage, by means of optimal sets of amplitude gain factors that can be implemented at each single sensor location, while minimizing cross-channel effects that can either degrade the original SQ condition, or even hinder the implementation of independent SQ targets. The proposed framework is verified experimentally, with realistic stationary hybrid electric powertrain noise, showing SQ enhancement for multiple locations within a scaled vehicle mock-up. The results show total success rates in excess of 90%, which indicate that the proposed method is promising, not only for the improvement of the SQ of HEV noise, but also for a variety of periodic disturbances with similar features.

  18. S-Genius, a universal software platform with versatile inverse problem resolution for scatterometry

    NASA Astrophysics Data System (ADS)

    Fuard, David; Troscompt, Nicolas; El Kalyoubi, Ismael; Soulan, Sébastien; Besacier, Maxime

    2013-05-01

    S-Genius is a new universal scatterometry platform that gathers all the LTM-CNRS know-how on rigorous electromagnetic computation and several inverse-problem solvers. The software platform is built to be a user-friendly, light, swift, accurate, user-oriented scatterometry tool, compatible with any ellipsometric measurements to fit and any type of pattern. It combines a set of inverse-problem solver capabilities — an adapted Levenberg-Marquardt optimization, Kriging, and neural network solutions — that greatly improve the reliability and speed of the solution determination. Furthermore, as the model solution is mainly sensitive to the optical properties of the materials, S-Genius may be coupled with an innovative material refractive index determination. This paper focuses in more detail on the modified Levenberg-Marquardt optimization, one of the indirect-method solvers developed by the authors in parallel with the overall S-Genius software. This modified Levenberg-Marquardt optimization corresponds to a Newton algorithm whose damping parameter is adapted to the definition domains of the optimized parameters. Currently, S-Genius is technically ready for scientific collaboration: it is Python-powered, multi-platform (Windows/Linux/macOS), multi-core, ready for 2D (infinite features along the direction perpendicular to the incident plane), conical and 3D feature computation, compatible with input data from any ellipsometer (angle- or wavelength-resolved) or reflectometer, and widely used in our laboratory for resist-trimming studies, etching feature characterization (such as complex stacks) and nano-imprint lithography measurements, for instance. The work on the kriging solver, the neural network solver and the material refractive index determination is being carried out by other LTM members and is about to be integrated into the S-Genius platform.
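
    The damping-adapted Levenberg-Marquardt idea can be sketched as follows. This is a minimal Python/NumPy illustration, not the S-Genius implementation: the damping factor grows or shrinks depending on whether a step reduces the residual, and the parameters are clipped to their definition domains as a simple stand-in for the domain-aware damping described above.

```python
import numpy as np

def lm_fit(residuals, p0, bounds, lam=1e-2, n_iter=50):
    """Minimal Levenberg-Marquardt sketch with parameters clipped to their
    definition domains. `residuals(p)` returns the vector of data-model misfits."""
    p = np.asarray(p0, dtype=float)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    for _ in range(n_iter):
        r = residuals(p)
        # Finite-difference Jacobian of the residual vector
        J = np.empty((r.size, p.size))
        eps = 1e-6
        for j in range(p.size):
            dp = np.zeros_like(p); dp[j] = eps
            J[:, j] = (residuals(p + dp) - r) / eps
        A = J.T @ J + lam * np.eye(p.size)     # damped normal equations
        step = np.linalg.solve(A, -J.T @ r)
        p_new = np.clip(p + step, lo, hi)      # respect the definition domains
        if np.sum(residuals(p_new) ** 2) < np.sum(r ** 2):
            p, lam = p_new, lam * 0.5          # accept step, relax damping
        else:
            lam *= 2.0                         # reject step, increase damping
    return p

# Toy usage: fit amplitude and decay rate of an exponential to noisy samples
x = np.linspace(0, 1, 30)
y = 2.0 * np.exp(-3.0 * x) + 0.01 * np.random.randn(30)
fit = lm_fit(lambda p: p[0] * np.exp(-p[1] * x) - y, [1.0, 1.0], ([0, 0], [10, 10]))
print(fit)  # close to [2, 3]
```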

  19. Signal Analysis of Helicopter Blade-Vortex-Interaction Acoustic Noise Data

    NASA Technical Reports Server (NTRS)

    Rogers, James C.; Dai, Renshou

    1998-01-01

    Blade-Vortex-Interaction (BVI) produces annoying high-intensity impulsive noise. NASA Ames collected several sets of BVI noise data during in-flight and wind tunnel tests. The goal of this work is to extract the essential features of the BVI signals from the in-flight data and examine the feasibility of extracting those features from BVI noise recorded inside a large wind tunnel. BVI noise generating mechanisms and BVI radiation patterns are considered and a simple mathematical-physical model is presented. It allows the construction of simple synthetic BVI events that are comparable to free flight data. The boundary effects of the wind tunnel floor and ceiling are identified and more complex synthetic BVI events are constructed to account for features observed in the wind tunnel data. It is demonstrated that improved recording of BVI events can be attained by changing the geometry of the rotor hub, floor, ceiling and microphone. The Euclidean distance measure is used to align BVI events from each blade and improved BVI signals are obtained by time-domain averaging the aligned data. The differences between BVI events for individual blades are then apparent. Removal of wind tunnel background noise by optimal Wiener-filtering is shown to be effective provided representative noise-only data have been recorded. Elimination of wind tunnel reflections by cepstral and optimal filtering deconvolution is examined. It is seen that the cepstral method is not applicable but that a pragmatic optimal filtering approach gives encouraging results. Recommendations for further work include: altering measurement geometry, real-time data observation and evaluation, examining reflection signals (particularly those from the ceiling) and performing further analysis of expected BVI signals for flight conditions of interest so that microphone placement can be optimized for each condition.
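
    A minimal sketch of the alignment-and-averaging step described above is given below (Python/NumPy, with toy synthetic pulses standing in for the recorded BVI events): each event is shifted to the lag that minimizes its Euclidean distance to a template, and the aligned events are averaged to suppress uncorrelated noise.

```python
import numpy as np

def align_and_average(events, template):
    """Align each event to a template by the lag that minimizes the Euclidean
    distance, then average the aligned events. A simplified sketch; the
    abstract's per-blade handling is omitted."""
    aligned = []
    n = len(template)
    for ev in events:
        # Try every admissible shift and keep the one with the smallest distance
        lags = range(len(ev) - n + 1)
        best = min(lags, key=lambda k: np.linalg.norm(ev[k:k + n] - template))
        aligned.append(ev[best:best + n])
    return np.mean(aligned, axis=0)

# Toy usage: noisy copies of an impulsive waveform with random delays
rng = np.random.default_rng(0)
pulse = np.exp(-np.linspace(-3, 3, 64) ** 2)
events = [np.concatenate([np.zeros(d), pulse, np.zeros(20 - d)])
          + 0.1 * rng.standard_normal(84)
          for d in rng.integers(0, 20, size=8)]
avg = align_and_average(events, pulse)
```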

  20. Multivariate Protein Signatures of Pre-Clinical Alzheimer's Disease in the Alzheimer's Disease Neuroimaging Initiative (ADNI) Plasma Proteome Dataset

    PubMed Central

    Johnstone, Daniel; Milward, Elizabeth A.; Berretta, Regina; Moscato, Pablo

    2012-01-01

    Background Recent Alzheimer's disease (AD) research has focused on finding biomarkers to identify disease at the pre-clinical stage of mild cognitive impairment (MCI), allowing treatment to be initiated before irreversible damage occurs. Many studies have examined brain imaging or cerebrospinal fluid but there is also growing interest in blood biomarkers. The Alzheimer's Disease Neuroimaging Initiative (ADNI) has generated data on 190 plasma analytes in 566 individuals with MCI, AD or normal cognition. We conducted independent analyses of this dataset to identify plasma protein signatures predicting pre-clinical AD. Methods and Findings We focused on identifying signatures that discriminate cognitively normal controls (n = 54) from individuals with MCI who subsequently progress to AD (n = 163). Based on p value, apolipoprotein E (APOE) showed the strongest difference between these groups (p = 2.3×10−13). We applied a multivariate approach based on combinatorial optimization ((α,β)-k Feature Set Selection), which retains information about individual participants and maintains the context of interrelationships between different analytes, to identify the optimal set of analytes (signature) to discriminate these two groups. We identified 11-analyte signatures achieving values of sensitivity and specificity between 65% and 86% for both MCI and AD groups, depending on whether APOE was included and other factors. Classification accuracy was improved by considering “meta-features,” representing the difference in relative abundance of two analytes, with an 8-meta-feature signature consistently achieving sensitivity and specificity both over 85%. Generating signatures based on longitudinal rather than cross-sectional data further improved classification accuracy, returning sensitivities and specificities of approximately 90%. Conclusions Applying these novel analysis approaches to the powerful and well-characterized ADNI dataset has identified sets of plasma biomarkers for pre-clinical AD. While studies of independent test sets are required to validate the signatures, these analyses provide a starting point for developing a cost-effective and minimally invasive test capable of diagnosing AD in its pre-clinical stages. PMID:22485168
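
    The meta-features mentioned above are pairwise differences of analyte abundances. A minimal sketch (Python/NumPy, with a hypothetical toy matrix of plasma measurements) of how such meta-features could be constructed:

```python
import itertools
import numpy as np

def meta_features(X, names):
    """Build 'meta-features' as the difference in relative abundance between
    every pair of analytes. X is a samples x analytes matrix."""
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    M = np.column_stack([X[:, i] - X[:, j] for i, j in pairs])
    labels = [f"{names[i]}-{names[j]}" for i, j in pairs]
    return M, labels

# Hypothetical toy data: 4 subjects x 3 plasma analytes
X = np.array([[1.2, 0.8, 2.0],
              [1.0, 0.9, 1.8],
              [0.7, 1.1, 1.5],
              [0.6, 1.3, 1.4]])
M, labels = meta_features(X, ["APOE", "analyte_B", "analyte_C"])
print(labels)  # ['APOE-analyte_B', 'APOE-analyte_C', 'analyte_B-analyte_C']
```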

  1. Invariant-feature-based adaptive automatic target recognition in obscured 3D point clouds

    NASA Astrophysics Data System (ADS)

    Khuon, Timothy; Kershner, Charles; Mattei, Enrico; Alverio, Arnel; Rand, Robert

    2014-06-01

    Target recognition and classification in a 3D point cloud is a non-trivial process due to the nature of the data collected from a sensor system. The signal can be corrupted by noise from the environment, electronic system, A/D converter, etc. Therefore, an adaptive system with a desired tolerance is required to perform classification and recognition optimally. The feature-based pattern recognition algorithm architecture described below is devised in particular for solving single-sensor classification non-parametrically. A feature set is extracted from an input point cloud, normalized, and fed to a neural network classifier. For instance, automatic target recognition in an urban area would require different feature sets from one in a dense foliage area. The figure above (see manuscript) illustrates the architecture of the feature-based adaptive signature extraction of 3D point clouds including LIDAR, RADAR, and electro-optical data. This network takes a 3D cluster and classifies it into a specific class. The algorithm is a supervised and adaptive classifier with two modes: the training mode and the performing mode. For the training mode, a number of novel patterns are selected from actual or artificial data. A particular 3D cluster is input to the network as shown above for the decision class output. The network consists of three sequential functional modules. The first module performs feature extraction, reducing the input cluster to a set of singular-value features (a feature vector). The feature vector is then passed to the feature normalization module to normalize and balance it before being fed to the neural net classifier for the classification. The neural net can be trained by actual or artificial novel data until each trained output reaches the declared output within the defined tolerance. If new novel data are added after the neural net has been trained, training resumes until the network has incrementally learned the new data. The associative memory capability of the neural net enables the incremental learning. The back propagation algorithm or support vector machine can be utilized for the classification and recognition.
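
    The singular-value feature extraction step can be illustrated with a short sketch (Python/NumPy; the clusters are synthetic stand-ins, not sensor data): the singular values of the centered point matrix summarize the spatial extent of a cluster along its principal axes and are normalized before classification.

```python
import numpy as np

def singular_value_features(cluster):
    """Extract a rotation-invariant feature vector from a 3D point cluster:
    the singular values of the centered point matrix, normalized so they
    sum to one (a sketch of the 'singular value features' named above)."""
    pts = np.asarray(cluster, dtype=float)
    centered = pts - pts.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)   # three singular values
    return s / (s.sum() + 1e-12)                    # normalize before classification

# Toy clusters: an elongated pole-like object vs. a roughly spherical blob
rng = np.random.default_rng(1)
pole = rng.standard_normal((200, 3)) * [0.1, 0.1, 2.0]
blob = rng.standard_normal((200, 3))
print(singular_value_features(pole), singular_value_features(blob))
```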

  2. Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief.

    PubMed

    Douglas, P K; Harris, Sam; Yuille, Alan; Cohen, Mark S

    2011-05-15

    Machine learning (ML) has become a popular tool for mining functional neuroimaging data, and there are now hopes of performing such analyses efficiently in real-time. Towards this goal, we compared accuracy of six different ML algorithms applied to neuroimaging data of persons engaged in a bivariate task, asserting their belief or disbelief of a variety of propositional statements. We performed unsupervised dimension reduction and automated feature extraction using independent component (IC) analysis and extracted IC time courses. Optimization of classification hyperparameters across each classifier occurred prior to assessment. Maximum accuracy was achieved at 92% for Random Forest, followed by 91% for AdaBoost, 89% for Naïve Bayes, 87% for a J48 decision tree, 86% for K*, and 84% for support vector machine. For real-time decoding applications, finding a parsimonious subset of diagnostic ICs might be useful. We used a forward search technique to sequentially add ranked ICs to the feature subspace. For the current data set, we determined that approximately six ICs represented a meaningful basis set for classification. We then projected these six IC spatial maps forward onto a later scanning session within subject. We then applied the optimized ML algorithms to these new data instances, and found that classification accuracy results were reproducible. Additionally, we compared our classification method to our previously published general linear model results on this same data set. The highest ranked IC spatial maps show similarity to brain regions associated with contrasts for belief > disbelief, and disbelief < belief. Copyright © 2010 Elsevier Inc. All rights reserved.
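
    A rough sketch of the forward-search idea, assuming the ICs have already been ranked by some criterion (the data, ranking and classifier below are synthetic stand-ins, not the study's): ranked IC features are added one at a time and the subset size with the best cross-validated accuracy is kept.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def forward_ic_search(X, y, ranked_ics, max_ics=10):
    """Sequentially add ranked IC features and keep the subset with the best
    cross-validated accuracy (a sketch of a forward search; the ranking
    criterion is assumed to be supplied by the caller)."""
    best_subset, best_acc = [], 0.0
    for k in range(1, min(max_ics, len(ranked_ics)) + 1):
        cols = ranked_ics[:k]
        acc = cross_val_score(
            RandomForestClassifier(n_estimators=100, random_state=0),
            X[:, cols], y, cv=5).mean()
        if acc > best_acc:
            best_subset, best_acc = cols, acc
    return best_subset, best_acc

# Hypothetical data: trials x IC features, binary belief/disbelief labels
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 20))
y = (X[:, 3] + 0.5 * X[:, 7] > 0).astype(int)   # signal carried by ICs 3 and 7
print(forward_ic_search(X, y, ranked_ics=[3, 7, 1, 5, 0]))
```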

  3. Computer-aided diagnosis of prostate cancer in the peripheral zone using multiparametric MRI

    NASA Astrophysics Data System (ADS)

    Niaf, Emilie; Rouvière, Olivier; Mège-Lechevallier, Florence; Bratan, Flavie; Lartizien, Carole

    2012-06-01

    This study evaluated a computer-assisted diagnosis (CADx) system for determining a likelihood measure of prostate cancer presence in the peripheral zone (PZ) based on multiparametric magnetic resonance (MR) imaging, including T2-weighted, diffusion-weighted and dynamic contrast-enhanced MRI at 1.5 T. Based on a feature set derived from grey-level images, including first-order statistics, Haralick features, gradient features, semi-quantitative and quantitative (pharmacokinetic modelling) dynamic parameters, four kinds of classifiers were trained and compared: nonlinear support vector machine (SVM), linear discriminant analysis, k-nearest neighbours and naïve Bayes classifiers. A set of feature selection methods based on t-test, mutual information and minimum-redundancy-maximum-relevancy criteria were also compared. The aim was to discriminate between the relevant features as well as to create an efficient classifier using these features. The diagnostic performances of these different CADx schemes were evaluated based on a receiver operating characteristic (ROC) curve analysis. The evaluation database consisted of 30 sets of multiparametric MR images acquired from radical prostatectomy patients. Using histologic sections as the gold standard, both cancer and nonmalignant (but suspicious) tissues were annotated in consensus on all MR images by two radiologists, a histopathologist and a researcher. Benign tissue regions of interest (ROIs) were also delineated in the remaining prostate PZ. This resulted in a series of 42 cancer ROIs, 49 benign but suspicious ROIs and 124 nonsuspicious benign ROIs. From the outputs of all evaluated feature selection methods on the test bench, a restrictive set of about 15 highly informative features coming from all MR sequences was identified, thus confirming the validity of the multiparametric approach. Quantitative evaluation of the diagnostic performance yielded a maximal area under the ROC curve (AUC) of 0.89 (0.81-0.94) for the discrimination of the malignant versus nonmalignant tissues and 0.82 (0.73-0.90) for the discrimination of the malignant versus suspicious tissues when combining the t-test feature selection approach with a SVM classifier. A preliminary comparison showed that the optimal CADx scheme mimicked, in terms of AUC, the human experts in differentiating malignant from suspicious tissues, thus demonstrating its potential for assisting cancer identification in the PZ.
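
    As a simplified illustration of one of the evaluated pipelines (t-test feature ranking followed by an SVM, scored by ROC AUC), the sketch below uses synthetic stand-in data rather than the study's MR features; in practice the selection step should be nested inside the cross-validation to avoid optimistic bias.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

# Hypothetical stand-in data: ROIs x multiparametric features, malignant (1) vs. not (0)
rng = np.random.default_rng(0)
X = rng.standard_normal((90, 40))
y = rng.integers(0, 2, 90)
X[y == 1, :5] += 1.0                       # a few informative features

# Rank features by two-sample t-test p-value and keep the 15 most significant
p_values = ttest_ind(X[y == 1], X[y == 0], axis=0).pvalue
top = np.argsort(p_values)[:15]

# SVM on the selected features, evaluated by ROC AUC over cross-validated scores
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
scores = cross_val_predict(clf, X[:, top], y, cv=5, method="predict_proba")[:, 1]
print("AUC:", roc_auc_score(y, scores))
```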

  4. Precision thermometry and the quantum speed limit

    NASA Astrophysics Data System (ADS)

    Campbell, Steve; Genoni, Marco G.; Deffner, Sebastian

    2018-04-01

    We assess precision thermometry for an arbitrary single quantum system. For a d-dimensional harmonic system we show that the gap sets a single temperature that can be optimally estimated. Furthermore, we establish a simple linear relationship between the gap and this temperature, and show that the precision exhibits a quadratic relationship. We extend our analysis to explore systems with arbitrary spectra, showing that exploiting anharmonicity and degeneracy can greatly enhance the precision of thermometry. Finally, we critically assess the dynamical features of two thermometry protocols for a two level system. By calculating the quantum speed limit we find that, despite the gap fixing a preferred temperature to probe, there is no evidence of this emerging in the dynamical features.

  5. Real time groove characterization combining partial least squares and SVR strategies: application to eddy current testing

    NASA Astrophysics Data System (ADS)

    Ahmed, S.; Salucci, M.; Miorelli, R.; Anselmi, N.; Oliveri, G.; Calmon, P.; Reboud, C.; Massa, A.

    2017-10-01

    A quasi real-time inversion strategy is presented for groove characterization in a conductive non-ferromagnetic tube structure by exploiting the eddy current testing (ECT) signal. The inversion problem is formulated within a non-iterative Learning-by-Examples (LBE) strategy. Within the framework of LBE, an efficient training strategy is adopted that combines feature extraction with a customized version of output space filling (OSF) adaptive sampling in order to obtain an optimal training set during the offline phase. Partial Least Squares (PLS) and Support Vector Regression (SVR) are exploited for feature extraction and prediction, respectively, to achieve robust and accurate real-time inversion during the online phase.
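
    A minimal offline/online sketch of the PLS-plus-SVR idea using scikit-learn (the eddy-current signals and groove depths below are synthetic stand-ins): PLS compresses each signal to a few latent features and an SVR maps those features to the groove parameter, so the online inversion reduces to one projection and one regression evaluation.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in: simulated ECT signals (samples x time points) and groove depths
rng = np.random.default_rng(0)
signals = rng.standard_normal((200, 500))
depth = signals[:, :10].sum(axis=1) * 0.05 + 1.0   # synthetic target parameter

# Offline phase: learn the PLS latent features and train the SVR on them
model = make_pipeline(PLSRegression(n_components=5), SVR(kernel="rbf", C=10.0))
model.fit(signals, depth)

# Online phase: prediction is a single projection followed by one SVR evaluation
print(model.predict(signals[:3]))
```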

  6. Hierarchical Recurrent Neural Hashing for Image Retrieval With Hierarchical Convolutional Features.

    PubMed

    Lu, Xiaoqiang; Chen, Yaxiong; Li, Xuelong

    Hashing has been an important and effective technology in image retrieval due to its computational efficiency and fast search speed. The traditional hashing methods usually learn hash functions to obtain binary codes by exploiting hand-crafted features, which cannot optimally represent the information of the sample. Recently, deep learning methods can achieve better performance, since deep learning architectures can learn more effective image representation features. However, these methods only use semantic features to generate hash codes by shallow projection but ignore texture details. In this paper, we propose a novel hashing method, namely hierarchical recurrent neural hashing (HRNH), to exploit a hierarchical recurrent neural network to generate effective hash codes. There are three contributions of this paper. First, a deep hashing method is proposed to extensively exploit both spatial details and semantic information, in which we leverage hierarchical convolutional features to construct an image pyramid representation. Second, our proposed deep network can directly exploit convolutional feature maps as input to preserve the spatial structure of convolutional feature maps. Finally, we propose a new loss function that considers the quantization error of binarizing the continuous embeddings into the discrete binary codes, and simultaneously maintains the semantic similarity and balanceable property of hash codes. Experimental results on four widely used data sets demonstrate that the proposed HRNH can achieve superior performance over other state-of-the-art hashing methods.

  7. Multiview alignment hashing for efficient image search.

    PubMed

    Liu, Li; Yu, Mengyang; Shao, Ling

    2015-03-01

    Hashing is a popular and efficient method for nearest neighbor search in large-scale data spaces by embedding high-dimensional feature descriptors into a similarity preserving Hamming space with a low dimension. For most hashing methods, the performance of retrieval heavily depends on the choice of the high-dimensional feature descriptor. Furthermore, a single type of feature cannot be descriptive enough for different images when it is used for hashing. Thus, how to combine multiple representations for learning effective hashing functions is an imminent task. In this paper, we present a novel unsupervised multiview alignment hashing approach based on regularized kernel nonnegative matrix factorization, which can find a compact representation uncovering the hidden semantics and simultaneously respecting the joint probability distribution of data. In particular, we aim to seek a matrix factorization to effectively fuse the multiple information sources meanwhile discarding the feature redundancy. Since the raised problem is regarded as nonconvex and discrete, our objective function is then optimized via an alternate way with relaxation and converges to a locally optimal solution. After finding the low-dimensional representation, the hashing functions are finally obtained through multivariable logistic regression. The proposed method is systematically evaluated on three data sets: 1) Caltech-256; 2) CIFAR-10; and 3) CIFAR-20, and the results show that our method significantly outperforms the state-of-the-art multiview hashing techniques.

  8. Automated diagnosis of focal liver lesions using bidirectional empirical mode decomposition features.

    PubMed

    Acharya, U Rajendra; Koh, Joel En Wei; Hagiwara, Yuki; Tan, Jen Hong; Gertych, Arkadiusz; Vijayananthan, Anushya; Yaakup, Nur Adura; Abdullah, Basri Johan Jeet; Bin Mohd Fabell, Mohd Kamil; Yeong, Chai Hong

    2018-03-01

    Liver is the heaviest internal organ of the human body and performs many vital functions. Prolonged cirrhosis and fatty liver disease may lead to the formation of benign or malignant lesions in this organ, and an early and reliable evaluation of these conditions can improve treatment outcomes. Ultrasound imaging is a safe, non-invasive, and cost-effective way of diagnosing liver lesions. However, this technique has limited performance in determining the nature of the lesions. This study initiates a computer-aided diagnosis (CAD) system to aid radiologists in an objective and more reliable interpretation of ultrasound images of liver lesions. In this work, we have employed radon transform and bi-directional empirical mode decomposition (BEMD) to extract features from the focal liver lesions. The extracted features were then subjected to a particle swarm optimization (PSO) technique to select an optimized feature set for classification. Our automated CAD system can differentiate normal, malignant, and benign liver lesions using machine learning algorithms. It was trained using 78 normal, 26 benign and 36 malignant focal lesions of the liver. The accuracy, sensitivity, and specificity of lesion classification were 92.95%, 90.80%, and 97.44%, respectively. The proposed CAD system is fully automatic as no segmentation of region-of-interest (ROI) is required. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. Nursing Student Experiences Regarding Safe Use of Electronic Health Records: A Pilot Study of the Safety and Assurance Factors for EHR Resilience Guides.

    PubMed

    Whitt, Karen J; Eden, Lacey; Merrill, Katreena Collette; Hughes, Mckenna

    2017-01-01

    Previous research has linked improper electronic health record configuration and use with adverse patient events. In response to this problem, the US Office of the National Coordinator for Health Information Technology developed the Safety and Assurance Factors for EHR Resilience guides to evaluate electronic health records for optimal use and safety features. During the course of their education, nursing students are exposed to a variety of clinical practice settings and electronic health records. This descriptive study evaluated 108 undergraduate and 51 graduate nursing students' ratings of electronic health record features and safe practices, as well as what they learned from utilizing the computerized provider order entry and clinician communication Safety and Assurance Factors for EHR Resilience guide checklists. More than 80% of the undergraduate and 70% of the graduate students reported that they experienced user problems with electronic health records in the past. More than 50% of the students felt that electronic health records contribute to adverse patient outcomes. Students reported that many of the features assessed were not fully implemented in their electronic health record. These findings highlight areas where electronic health records can be improved to optimize patient safety. The majority of students reported that utilizing the Safety and Assurance Factors for EHR Resilience guides increased their understanding of electronic health record features.

  10. Predicting protein amidation sites by orchestrating amino acid sequence features

    NASA Astrophysics Data System (ADS)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications, which plays an important role in physiological and pathological processes. Identifying amidation sites can help us understand amidation and recognize the original causes of many kinds of diseases. However, traditional experimental methods for identifying amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector able to capture not only the physicochemical properties but also position-related information of the amino acids. An extremely randomized trees algorithm is applied to choose the optimal features, removing redundancy and dependence among components of the feature vector in a supervised fashion. Finally, the support vector machine classifier is used to label the amidation sites. When tested on an independent data set, the proposed method performs better than all previous ones, with a prediction accuracy of 0.962, a Matthews correlation coefficient of 0.89 and an area under the curve of 0.964.
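
    A compact sketch of the selection-plus-classification chain described above, using scikit-learn's extremely randomized trees for supervised feature selection followed by an SVM (the sequence-derived features and labels are synthetic stand-ins, not the study's data):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for sequence-derived feature vectors and amidation labels
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 120))
y = (X[:, 10] - X[:, 55] + 0.3 * rng.standard_normal(300) > 0).astype(int)

# Supervised selection with extremely randomized trees, then an SVM on the kept features
pipe = make_pipeline(
    SelectFromModel(ExtraTreesClassifier(n_estimators=300, random_state=0)),
    SVC(kernel="rbf"),
)
print(cross_val_score(pipe, X, y, cv=5).mean())
```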

  11. Fault diagnosis for analog circuits utilizing time-frequency features and improved VVRKFA

    NASA Astrophysics Data System (ADS)

    He, Wei; He, Yigang; Luo, Qiwu; Zhang, Chaolong

    2018-04-01

    This paper proposes a novel scheme for analog circuit fault diagnosis utilizing features extracted from the time-frequency representations of signals and an improved vector-valued regularized kernel function approximation (VVRKFA). First, the cross-wavelet transform is employed to yield the energy-phase distribution of the fault signals over the time and frequency domain. Since the distribution is high-dimensional, a supervised dimensionality reduction technique—the bilateral 2D linear discriminant analysis—is applied to build a concise feature set from the distributions. Finally, VVRKFA is utilized to locate the fault. In order to improve the classification performance, the quantum-behaved particle swarm optimization technique is employed to gradually tune the learning parameter of the VVRKFA classifier. The experimental results for the analog circuit faults classification have demonstrated that the proposed diagnosis scheme has an advantage over other approaches.

  12. Computer-aided detection of bladder mass within non-contrast-enhanced region of CT Urography (CTU)

    NASA Astrophysics Data System (ADS)

    Cha, Kenny H.; Hadjiiski, Lubomir M.; Chan, Heang-Ping; Caoili, Elaine M.; Cohan, Richard H.; Weizer, Alon; Zhou, Chuan

    2016-03-01

    We are developing a computer-aided detection system for bladder cancer in CT urography (CTU). We have previously developed methods for detection of bladder masses within the contrast-enhanced region of the bladder. In this study, we investigated methods for detection of bladder masses within the non-contrast enhanced region. The bladder was first segmented using a newly developed deep-learning convolutional neural network in combination with level sets. The non-contrast-enhanced region was separated from the contrast-enhanced region with a maximum-intensity-projection-based method. The non-contrast region was smoothed and a gray level threshold was employed to segment the bladder wall and potential masses. The bladder wall was transformed into a straightened thickness profile, which was analyzed to identify lesion candidates as a prescreening step. The lesion candidates were segmented using our autoinitialized cascaded level set (AI-CALS) segmentation method, and 27 morphological features were extracted for each candidate. Stepwise feature selection with simplex optimization and leave-one-case-out resampling were used for training and validation of a false positive (FP) classifier. In each leave-one-case-out cycle, features were selected from the training cases and a linear discriminant analysis (LDA) classifier was designed to merge the selected features into a single score for classification of the left-out test case. A data set of 33 cases with 42 biopsy-proven lesions in the non-contrast enhanced region was collected. During prescreening, the system obtained 83.3% sensitivity at an average of 2.4 FPs/case. After feature extraction and FP reduction by LDA, the system achieved 81.0% sensitivity at 2.0 FPs/case, and 73.8% sensitivity at 1.5 FPs/case.

  13. Optimally managing water resources in large river basins for an uncertain future

    USGS Publications Warehouse

    Edwin A. Roehl, Jr.; Conrads, Paul

    2014-01-01

    Managers of large river basins face conflicting needs for water resources such as wildlife habitat, water supply, wastewater assimilative capacity, flood control, hydroelectricity, and recreation. The Savannah River Basin, for example, has experienced three major droughts since 2000 that resulted in record low water levels in its reservoirs, impacting local economies for years. The Savannah River Basin’s coastal area contains municipal water intakes and the ecologically sensitive freshwater tidal marshes of the Savannah National Wildlife Refuge. The Port of Savannah is the fourth busiest in the United States, and modifications to the harbor have caused saltwater to migrate upstream, reducing the freshwater marsh’s acreage more than 50 percent since the 1970s. There is a planned deepening of the harbor that includes flow-alteration features to minimize further migration of salinity. The effectiveness of the flow-alteration features will only be known after they are constructed. One of the challenges of basin management is the optimization of water use through ongoing development, droughts, and climate change. This paper describes a model of the Savannah River Basin designed to continuously optimize regulated flow to meet prioritized objectives set by resource managers and stakeholders. The model was developed from historical data by using machine learning, making it more accurate and adaptable to changing conditions than traditional models. The model is coupled to an optimization routine that computes the daily flow needed to most efficiently meet the water-resource management objectives. The model and optimization routine are packaged in a decision support system that makes it easy for managers and stakeholders to use. Simulation results show that flow can be regulated to significantly reduce salinity intrusions in the Savannah National Wildlife Refuge while conserving more water in the reservoirs. A method for using the model to assess the effectiveness of the flow-alteration features after the deepening is also demonstrated.

  14. Blended near-optimal alternative generation, visualization, and interaction for water resources decision making

    NASA Astrophysics Data System (ADS)

    Rosenberg, David E.

    2015-04-01

    State-of-the-art systems analysis techniques focus on efficiently finding optimal solutions. Yet an optimal solution is optimal only for the modeled issues and managers often seek near-optimal alternatives that address unmodeled objectives, preferences, limits, uncertainties, and other issues. Early on, Modeling to Generate Alternatives (MGA) formalized near-optimal as performance within a tolerable deviation from the optimal objective function value and identified a few maximally different alternatives that addressed some unmodeled issues. This paper presents new stratified, Monte-Carlo Markov Chain sampling and parallel coordinate plotting tools that generate and communicate the structure and extent of the near-optimal region to an optimization problem. Interactive plot controls allow users to explore region features of most interest. Controls also streamline the process to elicit unmodeled issues and update the model formulation in response to elicited issues. Use for an example, single-objective, linear water quality management problem at Echo Reservoir, Utah, identifies numerous and flexible practices to reduce the phosphorus load to the reservoir and maintain close-to-optimal performance. Flexibility is upheld by further interactive alternative generation, transforming the formulation into a multiobjective problem, and relaxing the tolerance parameter to expand the near-optimal region. Compared to MGA, the new blended tools generate more numerous alternatives faster, more fully show the near-optimal region, and help elicit a larger set of unmodeled issues.
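
    The core of near-optimal alternative generation is to constrain the objective to lie within a tolerable deviation of the optimum and then search that region. The sketch below (SciPy's linprog on a toy load-reduction problem; it uses random objective directions as a simple stand-in for the paper's stratified Markov chain Monte Carlo sampling) illustrates the mechanics.

```python
import numpy as np
from scipy.optimize import linprog

# Toy stand-in for a linear management model: minimize cost c @ x subject to A_ub x <= b_ub
c = np.array([2.0, 3.0, 1.5])
A_ub = np.array([[-1.0, -1.0, -1.0],    # meet a total load-reduction requirement
                 [ 1.0,  0.0,  0.0]])   # capacity limit on practice 1
b_ub = np.array([-10.0, 6.0])

opt = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
tolerance = 1.10                        # accept solutions within 10% of the optimal cost

# Generate near-optimal alternatives: random objective directions, constrained to the
# near-optimal region defined by cost <= tolerance * optimal cost
rng = np.random.default_rng(0)
alternatives = []
for _ in range(5):
    d = rng.standard_normal(3)
    near_opt = linprog(d,
                       A_ub=np.vstack([A_ub, c]),
                       b_ub=np.append(b_ub, tolerance * opt.fun),
                       bounds=[(0, None)] * 3)
    alternatives.append(near_opt.x)
print(np.round(alternatives, 2))
```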

  15. On the ground state of Yang-Mills theory

    NASA Astrophysics Data System (ADS)

    Bakry, Ahmed S.; Leinweber, Derek B.; Williams, Anthony G.

    2011-08-01

    We investigate the overlap of the ground state meson potential with sets of mesonic-trial wave functions corresponding to different gluonic distributions. We probe the transverse structure of the flux tube through the creation of non-uniform smearing profiles for the string of glue connecting two color sources in the Wilson loop operator. The non-uniformly UV-regulated flux-tube operators are found to optimize the overlap with the ground state and display interesting features in the ground state overlap.

  16. Horror Image Recognition Based on Context-Aware Multi-Instance Learning.

    PubMed

    Li, Bing; Xiong, Weihua; Wu, Ou; Hu, Weiming; Maybank, Stephen; Yan, Shuicheng

    2015-12-01

    Horror content sharing on the Web is a growing phenomenon that can interfere with our daily life and affect the mental health of those involved. As an important form of expression, horror images have their own characteristics that can evoke extreme emotions. In this paper, we present a novel context-aware multi-instance learning (CMIL) algorithm for horror image recognition. The CMIL algorithm identifies horror images and picks out the regions that cause the sensation of horror in these horror images. It obtains contextual cues among adjacent regions in an image using a random walk on a contextual graph. Borrowing the strength of the fuzzy support vector machine (FSVM), we define a heuristic optimization procedure based on the FSVM to search for the optimal classifier for the CMIL. To improve the initialization of the CMIL, we propose a novel visual saliency model based on the tensor analysis. The average saliency value of each segmented region is set as its initial fuzzy membership in the CMIL. The advantage of the tensor-based visual saliency model is that it not only adaptively selects features, but also dynamically determines fusion weights for saliency value combination from different feature subspaces. The effectiveness of the proposed CMIL model is demonstrated by its use in horror image recognition on two large-scale image sets collected from the Internet.

  17. Hypoglycemia alarm enhancement using data fusion.

    PubMed

    Skladnev, Victor N; Tarnavskii, Stanislav; McGregor, Thomas; Ghevondian, Nejhdeh; Gourlay, Steve; Jones, Timothy W

    2010-01-01

    The acceptance of closed-loop blood glucose (BG) control using continuous glucose monitoring systems (CGMS) is likely to improve with enhanced performance of their integral hypoglycemia alarms. This article presents an in silico analysis (based on clinical data) of a modeled CGMS alarm system with trained thresholds on type 1 diabetes mellitus (T1DM) patients that is augmented by sensor fusion from a prototype hypoglycemia alarm system (HypoMon). This prototype alarm system is based on largely independent autonomic nervous system (ANS) response features. Alarm performance was modeled using overnight BG profiles recorded previously on 98 T1DM volunteers. These data included the corresponding ANS response features detected by HypoMon (AiMedics Pty. Ltd.) systems. CGMS data and alarms were simulated by applying a probabilistic model to these overnight BG profiles. The probabilistic model developed used a mean response delay of 7.1 minutes, measurement error offsets on each sample of +/- standard deviation (SD) = 4.5 mg/dl (0.25 mmol/liter), and vertical shifts (calibration offsets) of +/- SD = 19.8 mg/dl (1.1 mmol/liter). Modeling produced 90 to 100 simulated measurements per patient. Alarm systems for all analyses were optimized on a training set of 46 patients and evaluated on the test set of 56 patients. The split between the sets was based on enrollment dates. Optimization was based on detection accuracy but not time to detection for these analyses. The contribution of this form of data fusion to hypoglycemia alarm performance was evaluated by comparing the performance of the trained CGMS and fused data algorithms on the test set under the same evaluation conditions. The simulated addition of HypoMon data produced an improvement in CGMS hypoglycemia alarm performance of 10% at equal specificity. Sensitivity improved from 87% (CGMS as stand-alone measurement) to 97% for the enhanced alarm system. Specificity was maintained constant at 85%. Positive predictive values on the test set improved from 61 to 66% with negative predictive values improving from 96 to 99%. These enhancements were stable within sensitivity analyses. Sensitivity analyses also suggested larger performance increases at lower CGMS alarm performance levels. Autonomic nervous system response features provide complementary information suitable for fusion with CGMS data to enhance nocturnal hypoglycemia alarms. 2010 Diabetes Technology Society.

  18. Deep learning architecture for iris recognition based on optimal Gabor filters and deep belief network

    NASA Astrophysics Data System (ADS)

    He, Fei; Han, Ye; Wang, Han; Ji, Jinchao; Liu, Yuanning; Ma, Zhiqiang

    2017-03-01

    Gabor filters are widely utilized to detect iris texture information in several state-of-the-art iris recognition systems. However, the proper Gabor kernels and the generative pattern of iris Gabor features need to be predetermined in application. The traditional empirical Gabor filters and shallow iris encoding ways are incapable of dealing with such complex variations in iris imaging including illumination, aging, deformation, and device variations. Thereby, an adaptive Gabor filter selection strategy and deep learning architecture are presented. We first employ the particle swarm optimization approach and its binary version to define a set of data-driven Gabor kernels for fitting the most informative filtering bands, and then capture complex patterns from the optimal Gabor filtered coefficients by a trained deep belief network. A succession of comparative experiments validates that our optimal Gabor filters produce more distinctive Gabor coefficients and that our deep iris representations are more robust and stable than traditional iris Gabor codes. Furthermore, the depth and scales of the deep learning architecture are also discussed.

  19. Prediction of chemical biodegradability using support vector classifier optimized with differential evolution.

    PubMed

    Cao, Qi; Leung, K M

    2014-09-22

    Reliable computer models for the prediction of chemical biodegradability from molecular descriptors and fingerprints are very important for making health and environmental decisions. Coupling of the differential evolution (DE) algorithm with the support vector classifier (SVC) in order to optimize the main parameters of the classifier resulted in an improved classifier called the DE-SVC, which is introduced in this paper for use in chemical biodegradability studies. The DE-SVC was applied to predict the biodegradation of chemicals on the basis of extensive sample data sets and known structural features of molecules. Our optimization experiments showed that DE can efficiently find the proper parameters of the SVC. The resulting classifier possesses strong robustness and reliability compared with grid search, genetic algorithm, and particle swarm optimization methods. The classification experiments conducted here showed that the DE-SVC exhibits better classification performance than models previously used for such studies. It is a more effective and efficient prediction model for chemical biodegradability.
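
    A minimal sketch of coupling differential evolution with an SVC, using SciPy and scikit-learn on synthetic stand-in data: DE searches over (log C, log gamma) and the objective is the negative cross-validated accuracy.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for molecular-descriptor data and biodegradability labels
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
y = (X[:, 0] + X[:, 5] ** 2 > 1).astype(int)

def objective(params):
    """Negative cross-validated accuracy of an RBF SVC for given (log10 C, log10 gamma)."""
    C, gamma = 10.0 ** params[0], 10.0 ** params[1]
    return -cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

# DE searches the (log C, log gamma) box for the best-performing classifier
result = differential_evolution(objective, bounds=[(-2, 3), (-4, 1)],
                                maxiter=10, popsize=10, seed=0)
print("best log10(C), log10(gamma):", result.x, "accuracy:", -result.fun)
```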

  20. Proceedings of the 2016 Clinical Nutrition Week Research Workshop-The Optimal Dose of Protein Provided to Critically Ill Patients.

    PubMed

    Heyland, Daren K; Rooyakers, Olav; Mourtzakis, Marina; Stapleton, Renee D

    2017-02-01

    Recent literature has created considerable confusion about the optimal amount of protein/amino acids that should be provided to the critically ill patient. In fact, the evidentiary basis that directly tries to answer this question is relatively small. As a clinical nutrition research community, there is an urgent need to develop the optimal methods to assess the impact of exogenous protein/amino acid administration in the intensive care unit setting. That assessment can be conducted at various levels: (1) impact on stress response pathways, (2) impact on muscle synthesis and protein balance, (3) impact on muscle mass and function, and (4) impact on the patient's recovery. The objective of this research workshop was to review current literature relating to protein/amino acid administration for the critically ill patient and clinical outcomes and to discuss the key measurement and methodological features of future studies that should be done to inform the optimal protein/amino acid dose provided to critically ill patients.

  1. Fast imaging of live organisms with sculpted light sheets

    NASA Astrophysics Data System (ADS)

    Chmielewski, Aleksander K.; Kyrsting, Anders; Mahou, Pierre; Wayland, Matthew T.; Muresan, Leila; Evers, Jan Felix; Kaminski, Clemens F.

    2015-04-01

    Light-sheet microscopy is an increasingly popular technique in the life sciences due to its fast 3D imaging capability of fluorescent samples with low phototoxicity compared to confocal methods. In this work we present a new, fast, flexible and simple to implement method to optimize the illumination light-sheet to the requirement at hand. A telescope composed of two electrically tuneable lenses enables us to define the thickness and position of the light-sheet independently and accurately within milliseconds, and therefore to optimize the image quality of the features of interest interactively. We demonstrated the practical benefit of this technique by 1) assembling large fields of view from tiled single exposures, each with individually optimized illumination settings; and 2) sculpting the light-sheet to trace complex sample shapes within single exposures. This technique proved compatible with confocal line scanning detection, further improving image contrast and resolution. Finally, we determined the effect of light-sheet optimization in the context of scattering tissue, devising procedures for balancing image quality, field of view and acquisition speed.

  2. Insights from Classifying Visual Concepts with Multiple Kernel Learning

    PubMed Central

    Binder, Alexander; Nakajima, Shinichi; Kloft, Marius; Müller, Christina; Samek, Wojciech; Brefeld, Ulf; Müller, Klaus-Robert; Kawanabe, Motoaki

    2012-01-01

    Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques allow one to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unfortunately, 1-norm regularized MKL variants are often observed to be outperformed by an unweighted sum kernel. The main contributions of this paper are the following: we apply a recently developed non-sparse MKL variant to state-of-the-art concept recognition tasks from the application domain of computer vision. We provide insights on benefits and limits of non-sparse MKL and compare it against its direct competitors, the sum-kernel SVM and sparse MKL. We report empirical results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo Annotation challenge data sets. Data sets (kernel matrices) as well as further information are available at http://doc.ml.tu-berlin.de/image_mkl/ (Accessed 2012 Jun 25). PMID:22936970

  3. Classification of interstitial lung disease patterns with topological texture features

    NASA Astrophysics Data System (ADS)

    Huber, Markus B.; Nagarajan, Mahesh; Leinsinger, Gerda; Ray, Lawrence A.; Wismüller, Axel

    2010-03-01

    Topological texture features were compared in their ability to classify morphological patterns known as 'honeycombing' that are considered indicative for the presence of fibrotic interstitial lung diseases in high-resolution computed tomography (HRCT) images. For 14 patients with known occurrence of honeycombing, a stack of 70 axial, lung kernel reconstructed images was acquired from HRCT chest exams. A set of 241 regions of interest of both healthy and pathological (89) lung tissue was identified by an experienced radiologist. Texture features were extracted using six properties calculated from gray-level co-occurrence matrices (GLCM), Minkowski Dimensions (MDs), and three Minkowski Functionals (MFs, e.g. MF.euler). A k-nearest-neighbor (k-NN) classifier and a Multilayer Radial Basis Functions Network (RBFN) were optimized in a 10-fold cross-validation for each texture vector, and the classification accuracy was calculated on independent test sets as a quantitative measure of automated tissue characterization. A Wilcoxon signed-rank test was used to compare two accuracy distributions and the significance thresholds were adjusted for multiple comparisons by the Bonferroni correction. The best classification results were obtained by the MF features, which performed significantly better than all the standard GLCM and MD features (p < 0.005) for both classifiers. The highest accuracy was found for MF.euler (97.5%, 96.6%; for the k-NN and RBFN classifier, respectively). The best standard texture features were the GLCM features 'homogeneity' (91.8%, 87.2%) and 'absolute value' (90.2%, 88.5%). The results indicate that advanced topological texture features can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods.
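
    A brief sketch of the GLCM-plus-k-NN baseline (assuming scikit-image's graycomatrix/graycoprops and scikit-learn are available; the ROIs below are synthetic stand-ins, not HRCT data):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neighbors import KNeighborsClassifier

def glcm_features(roi, levels=32):
    """Gray-level co-occurrence features (a subset of the six GLCM properties
    mentioned in the abstract) for one region of interest."""
    q = (roi / roi.max() * (levels - 1)).astype(np.uint8)   # quantize gray levels
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("homogeneity", "contrast", "energy")])

# Hypothetical toy ROIs: smooth 'healthy' patches vs. noisy 'honeycombing-like' patches
rng = np.random.default_rng(0)
rois = [rng.random((32, 32)) * 0.2 + 0.4 for _ in range(20)] + \
       [rng.random((32, 32)) for _ in range(20)]
labels = [0] * 20 + [1] * 20
features = np.array([glcm_features(r) for r in rois])
knn = KNeighborsClassifier(n_neighbors=3).fit(features, labels)
```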

  4. Updated Magmatic Flux Rate Estimates for the Hawaii Plume

    NASA Astrophysics Data System (ADS)

    Wessel, P.

    2013-12-01

    Several studies have estimated the magmatic flux rate along the Hawaiian-Emperor Chain using a variety of methods and arriving at different results. These flux rate estimates have weaknesses because of incomplete data sets and different modeling assumptions, especially for the youngest portion of the chain (<3 Ma). While they generally agree on the 1st order features, there is less agreement on the magnitude and relative size of secondary flux variations. Some of these differences arise from the use of different methodologies, but the significance of this variability is difficult to assess due to a lack of confidence bounds on the estimates obtained with these disparate methods. All methods introduce some error, but to date there has been little or no quantification of error estimates for the inferred melt flux, making an assessment problematic. Here we re-evaluate the melt flux for the Hawaii plume with the latest gridded data sets (SRTM30+ and FAA 21.1) using several methods, including the optimal robust separator (ORS) and directional median filtering techniques (DiM). We also compute realistic confidence limits on the results. In particular, the DiM technique was specifically developed to aid in the estimation of surface loads that are superimposed on wider bathymetric swells and it provides error estimates on the optimal residuals. Confidence bounds are assigned separately for the estimated surface load (obtained from the ORS regional/residual separation techniques) and the inferred subsurface volume (from gravity-constrained isostasy and plate flexure optimizations). These new and robust estimates will allow us to assess which secondary features in the resulting melt flux curve are significant and should be incorporated when correlating melt flux variations with other geophysical and geochemical observations.

  5. Update on HCDstruct - A Tool for Hybrid Wing Body Conceptual Design and Structural Optimization

    NASA Technical Reports Server (NTRS)

    Gern, Frank H.

    2015-01-01

    HCDstruct is a Matlab® based software tool to rapidly build a finite element model for structural optimization of hybrid wing body (HWB) aircraft at the conceptual design level. The tool uses outputs from a Flight Optimization System (FLOPS) performance analysis together with a conceptual outer mold line of the vehicle, e.g. created by Vehicle Sketch Pad (VSP), to generate a set of MSC Nastran® bulk data files. These files can readily be used to perform a structural optimization and weight estimation using Nastran’s® Solution 200 multidisciplinary optimization solver. Initially developed at NASA Langley Research Center to perform increased fidelity conceptual level HWB centerbody structural analyses, HCDstruct has grown into a complete HWB structural sizing and weight estimation tool, including a fully flexible aeroelastic loads analysis. Recent upgrades to the tool include the expansion to a full wing tip-to-wing tip model for asymmetric analyses like engine out conditions and dynamic overswings, as well as a fully actuated trailing edge, featuring up to 15 independently actuated control surfaces and twin tails. Several example applications of the HCDstruct tool are presented.

  6. Towards Efficient Decoding of Multiple Classes of Motor Imagery Limb Movements Based on EEG Spectral and Time Domain Descriptors.

    PubMed

    Samuel, Oluwarotimi Williams; Geng, Yanjuan; Li, Xiangxin; Li, Guanglin

    2017-10-28

    To control multiple degrees of freedom (MDoF) upper limb prostheses, pattern recognition (PR) of electromyogram (EMG) signals has been successfully applied. This technique requires amputees to provide sufficient EMG signals to decode their limb movement intentions (LMIs). However, amputees with neuromuscular disorders or high-level amputations often cannot provide sufficient EMG control signals, and thus the applicability of the EMG-PR technique is limited for this category of amputees. As an alternative approach, electroencephalograph (EEG) signals recorded non-invasively from the brain have been utilized to decode the LMIs of humans. However, most existing EEG-based limb movement decoding methods focus on identifying a limited number of classes of upper limb movements. In addition, EEG feature extraction methods for decoding multiple classes of LMIs have rarely been investigated. Therefore, 32 EEG feature extraction methods (12 spectral domain descriptors (SDDs) and 20 time domain descriptors (TDDs)) were used to decode multiple classes of motor imagery patterns associated with different upper limb movements based on 64-channel EEG recordings. From the obtained experimental results, the best individual TDD achieved an accuracy of 67.05 ± 3.12%, compared with 87.03 ± 2.26% for the best SDD. By applying a linear feature combination technique, an optimal set of combined TDDs recorded an average accuracy of 90.68%, while that of the SDDs achieved an accuracy of 99.55%; both were significantly higher than those of the individual TDDs and SDDs (p < 0.05). Our findings suggest that optimal feature set combination yields a relatively high decoding accuracy that may improve the clinical robustness of MDoF neuroprostheses. The study was approved by the ethics committee of the Institutional Review Board of Shenzhen Institutes of Advanced Technology, and the reference number is SIAT-IRB-150515-H0077.
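
    The 'linear feature combination technique' is described only at a high level; one plausible, hedged reading is to concatenate several per-channel descriptors into a single feature vector and score it with a cross-validated linear classifier, as sketched below (the synthetic data, descriptor choices, and LDA classifier are assumptions, not the authors' pipeline).

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials, n_channels, n_samples, n_classes = 120, 64, 256, 4

# Synthetic stand-in for segmented EEG epochs (trials x channels x samples).
X_raw = rng.standard_normal((n_trials, n_channels, n_samples))
y = rng.integers(0, n_classes, n_trials)

def descriptor_set(epochs):
    """Combine simple time-domain and spectral descriptors per channel."""
    mav = np.mean(np.abs(epochs), axis=2)              # mean absolute value
    var = np.var(epochs, axis=2)                       # variance
    psd = np.abs(np.fft.rfft(epochs, axis=2)) ** 2
    band_power = psd[:, :, 2:16].mean(axis=2)          # crude band power
    return np.concatenate([mav, var, band_power], axis=1)

X = descriptor_set(X_raw)
clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, y, cv=5)
print("mean CV accuracy: %.3f" % scores.mean())
```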

  7. An Efficient Method Coupling Kernel Principal Component Analysis with Adjoint-Based Optimal Control and Its Goal-Oriented Extensions

    NASA Astrophysics Data System (ADS)

    Thimmisetty, C.; Talbot, C.; Tong, C. H.; Chen, X.

    2016-12-01

    The representativeness of available data poses a significant fundamental challenge to the quantification of uncertainty in geophysical systems. Furthermore, the successful application of machine learning methods to geophysical problems involving data assimilation is inherently constrained by the extent to which obtainable data represent the problem considered. We show how the adjoint method, coupled with optimization based on methods of machine learning, can facilitate the minimization of an objective function defined on a space of significantly reduced dimension. By considering uncertain parameters as constituting a stochastic process, the Karhunen-Loeve expansion and its nonlinear extensions furnish an optimal basis with respect to which optimization using L-BFGS can be carried out. In particular, we demonstrate that kernel PCA can be coupled with adjoint-based optimal control methods to successfully determine the distribution of material parameter values for problems in the context of channelized deformable media governed by the equations of linear elasticity. Since certain subsets of the original data are characterized by different features, the convergence rate of the method in part depends on, and may be limited by, the observations used to furnish the kernel principal component basis. By determining appropriate weights for realizations of the stochastic random field, then, one may accelerate the convergence of the method. To this end, we present a formulation of weighted PCA combined with a gradient-based approach using automatic differentiation to iteratively re-weight observations concurrently with the determination of an optimal reduced set of control variables in the feature space. We demonstrate how improvements in the accuracy and computational efficiency of the weighted linear method can be achieved over existing unweighted kernel methods, and discuss nonlinear extensions of the algorithm.
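
    The coupling of a kernel PCA basis with gradient-based optimization can be illustrated with a toy sketch: the objective is minimized over the low-dimensional KPCA coefficients rather than over the full parameter field. The misfit function and synthetic fields below are placeholders; the actual method evaluates a PDE misfit and its adjoint-computed gradient.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Training realizations of a (flattened) material-parameter field.
n_realizations, n_cells = 200, 400
fields = rng.standard_normal((n_realizations, n_cells)).cumsum(axis=1)

# Kernel PCA furnishes a reduced basis; inverse_transform maps reduced
# coefficients back to an approximate full field (pre-image).
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=1e-3,
                 fit_inverse_transform=True)
kpca.fit(fields)

target = fields[0]                      # pretend this is the "true" field

def misfit(z):
    """Placeholder objective: data misfit of the reconstructed field.
    In the real method this would be the adjoint-computed PDE misfit."""
    field = kpca.inverse_transform(z.reshape(1, -1)).ravel()
    return 0.5 * np.sum((field - target) ** 2)

z0 = np.zeros(10)
res = minimize(misfit, z0, method="L-BFGS-B")
print("reduced-space optimum reached, misfit =", res.fun)
```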

  8. A data mining framework for time series estimation.

    PubMed

    Hu, Xiao; Xu, Peng; Wu, Shaozhi; Asgari, Shadnaz; Bergsneider, Marvin

    2010-04-01

    Time series estimation techniques are usually employed in biomedical research to derive variables less accessible from a set of related and more accessible variables. These techniques are traditionally built from systems modeling approaches including simulation, blind deconvolution, and state estimation. In this work, we define the target time series (TTS) and its related time series (RTS) as the output and input of a time series estimation process, respectively. We then propose a novel data mining framework for time series estimation when TTS and RTS represent different sets of observed variables from the same dynamic system. This is made possible by mining a database of instances of TTS, its simultaneously recorded RTS, and the input/output dynamic models between them. The key mining strategy is to formulate a mapping function for each TTS-RTS pair in the database that translates a feature vector extracted from RTS to the dissimilarity between the true TTS and its estimate from the dynamic model associated with the same TTS-RTS pair. At run time, a feature vector is extracted from an inquiry RTS and supplied to the mapping function associated with each TTS-RTS pair to calculate a dissimilarity measure. An optimal TTS-RTS pair is then selected by analyzing these dissimilarity measures. The associated input/output model of the selected TTS-RTS pair is then used to simulate the TTS given the inquiry RTS as an input. An exemplary implementation was built to address the biomedical problem of noninvasive intracranial pressure assessment. The performance of the proposed method was superior to that of a simple training-free approach of finding the optimal TTS-RTS pair by a conventional similarity-based search on RTS features. 2009 Elsevier Inc. All rights reserved.
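
    The run-time selection step of the framework can be summarized in a short sketch: each stored pair's mapping function scores the query RTS feature vector, the pair with the smallest predicted dissimilarity is chosen, and its input/output model simulates the TTS. The feature extractor, mapping functions, and dynamic models below are placeholders.

```python
import numpy as np

class TTSRTSPair:
    """One database entry: a mapping function f(features) -> predicted
    dissimilarity, plus an input/output dynamic model for simulation."""
    def __init__(self, mapping_fn, dynamic_model):
        self.mapping_fn = mapping_fn
        self.dynamic_model = dynamic_model

def extract_features(rts):
    # Placeholder RTS feature vector (mean, std, dominant frequency index).
    spectrum = np.abs(np.fft.rfft(rts))
    return np.array([rts.mean(), rts.std(), float(spectrum.argmax())])

def estimate_tts(query_rts, database):
    feats = extract_features(query_rts)
    # Predict the dissimilarity of each stored pair's model for this query.
    scores = [pair.mapping_fn(feats) for pair in database]
    best = database[int(np.argmin(scores))]
    # Simulate the target time series with the selected dynamic model.
    return best.dynamic_model(query_rts)

# Toy database: two pairs with hand-written mapping and dynamic models.
database = [
    TTSRTSPair(lambda f: abs(f[0]),       lambda r: 0.8 * r),
    TTSRTSPair(lambda f: abs(f[0] - 1.0), lambda r: 0.5 * r + 1.0),
]
query = np.sin(np.linspace(0, 10, 200)) + 1.0
tts_estimate = estimate_tts(query, database)
print(tts_estimate[:5])
```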

  9. Data classification using metaheuristic Cuckoo Search technique for Levenberg Marquardt back propagation (CSLM) algorithm

    NASA Astrophysics Data System (ADS)

    Nawi, Nazri Mohd.; Khan, Abdullah; Rehman, M. Z.

    2015-05-01

    Nature-inspired metaheuristic techniques provide derivative-free solutions to complex problems. One of the latest additions to this group of nature-inspired optimization procedures is the Cuckoo Search (CS) algorithm. Artificial Neural Network (ANN) training is an optimization task, since the goal is to find an optimal weight set for the network during the training process. Traditional training algorithms have limitations such as getting trapped in local minima and slow convergence rates. This study proposes a new technique, CSLM, which combines Cuckoo Search with Levenberg-Marquardt back-propagation (LM-BP) to improve the convergence speed of ANN training and to avoid the local minima problem. Selected benchmark classification datasets are used for simulation. The experimental results show that the proposed Cuckoo Search with Levenberg-Marquardt algorithm performs better than the other algorithms used in this study.
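
    A minimal sketch of the Cuckoo Search component (Levy-flight moves plus abandonment of the worst nests) on a generic weight-vector objective is given below; the coupling to Levenberg-Marquardt fine-tuning and the actual ANN training error are not reproduced, and the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def levy_step(size, beta=1.5):
    """Mantegna's algorithm for Levy-distributed step lengths."""
    from math import gamma, sin, pi
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, size)
    v = rng.normal(0, 1, size)
    return u / np.abs(v) ** (1 / beta)

def sphere(w):
    # Placeholder objective standing in for the ANN training error.
    return float(np.sum(w ** 2))

def cuckoo_search(obj, dim=10, n_nests=15, pa=0.25, iters=200, alpha=0.01):
    nests = rng.uniform(-1, 1, (n_nests, dim))
    fitness = np.array([obj(n) for n in nests])
    best = nests[fitness.argmin()].copy()
    for _ in range(iters):
        # Generate new solutions via Levy flights around the current nests.
        for i in range(n_nests):
            candidate = nests[i] + alpha * levy_step(dim) * (nests[i] - best)
            f = obj(candidate)
            if f < fitness[i]:
                nests[i], fitness[i] = candidate, f
        # Abandon a fraction pa of the worst nests and rebuild them randomly.
        n_drop = int(pa * n_nests)
        worst = fitness.argsort()[-n_drop:]
        nests[worst] = rng.uniform(-1, 1, (n_drop, dim))
        fitness[worst] = [obj(n) for n in nests[worst]]
        best = nests[fitness.argmin()].copy()
    return best, fitness.min()

best_w, best_f = cuckoo_search(sphere)
print("best objective:", best_f)
```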

  10. Automated detection of neovascularization for proliferative diabetic retinopathy screening.

    PubMed

    Roychowdhury, Sohini; Koozekanani, Dara D; Parhi, Keshab K

    2016-08-01

    Neovascularization is the primary manifestation of proliferative diabetic retinopathy (PDR) that can lead to acquired blindness. This paper presents a novel method that classifies neovascularizations in the 1-optic disc (OD) diameter region (NVD) and elsewhere (NVE) separately to achieve low false positive rates of neovascularization classification. First, the OD region and blood vessels are extracted. Next, the major blood vessel segments in the 1-OD diameter region are classified for NVD, and minor blood vessel segments elsewhere are classified for NVE. For NVD and NVE classifications, optimal region-based feature sets of 10 and 6 features, respectively, are used. The proposed method achieves classification sensitivity, specificity and accuracy for NVD and NVE of 74%, 98.2%, 87.6%, and 61%, 97.5%, 92.1%, respectively. Also, the proposed method achieves 86.4% sensitivity and 76% specificity for screening images with PDR from public and local data sets. Thus, the proposed NVD and NVE detection methods can play a key role in automated screening and prioritization of patients with diabetic retinopathy.

  11. An approximation algorithm for the Noah's Ark problem with random feature loss.

    PubMed

    Hickey, Glenn; Blanchette, Mathieu; Carmi, Paz; Maheshwari, Anil; Zeh, Norbert

    2011-01-01

    The phylogenetic diversity (PD) of a set of species is a measure of their evolutionary distinctness based on a phylogenetic tree. PD is increasingly being adopted as an index of biodiversity in ecological conservation projects. The Noah's Ark Problem (NAP) is an NP-Hard optimization problem that abstracts a fundamental conservation challenge in asking to maximize the expected PD of a set of taxa given a fixed budget, where each taxon is associated with a cost of conservation and a probability of extinction. Only simplified instances of the problem, where one or more parameters are fixed as constants, have been addressed in the literature to date. Furthermore, it has been argued that PD is not an appropriate metric for models that allow information to be lost along paths in the tree. We therefore generalize the NAP to incorporate a proposed model of feature loss according to an exponential distribution and term this problem NAP with Loss (NAPL). In this paper, we present a pseudopolynomial time approximation scheme for NAPL.

  12. Spectral-Spatial Shared Linear Regression for Hyperspectral Image Classification.

    PubMed

    Haoliang Yuan; Yuan Yan Tang

    2017-04-01

    Classification of the pixels in a hyperspectral image (HSI) is an important task and has been applied in many practical applications. Its major challenge is the high-dimensionality, small-sample-size problem. To deal with this problem, many subspace learning (SL) methods have been developed to reduce the dimension of the pixels while preserving the important discriminant information. Motivated by the ridge linear regression (RLR) framework for SL, we propose a spectral-spatial shared linear regression method (SSSLR) for extracting the feature representation. Compared with RLR, our proposed SSSLR has the following two advantages. First, we utilize a convex set to explore the spatial structure for computing the linear projection matrix. Second, we utilize a shared structure learning model, formed by the original data space and a hidden feature space, to learn a more discriminant linear projection matrix for classification. To optimize our proposed method, an efficient iterative algorithm is proposed. Experimental results on two popular HSI data sets, i.e., Indian Pines and Salinas, demonstrate that our proposed methods outperform many SL methods.

  13. Feature selection using angle modulated simulated Kalman filter for peak classification of EEG signals.

    PubMed

    Adam, Asrul; Ibrahim, Zuwairie; Mokhtar, Norrima; Shapiai, Mohd Ibrahim; Mubin, Marizan; Saad, Ismail

    2016-01-01

    In existing research on electroencephalogram (EEG) signal peak classification, the available models, such as the Dumpala, Acir, Liu, and Dingle peak models, employ different sets of features. However, none of these models offers good performance across all applications; their performance is found to be problem dependent. Therefore, the objective of this study is to combine all the associated features from the existing models and then select the best combination of features. A new optimization algorithm, the angle modulated simulated Kalman filter (AMSKF), is employed as the feature selector, and the neural network random weight method is utilized in the proposed AMSKF technique as the classifier. In the conducted experiment, 11,781 peak candidate samples are employed for validation. The samples are collected from three different peak event-related EEG signals of 30 healthy subjects: (1) single eye blink, (2) double eye blink, and (3) eye movement signals. The experimental results show that the proposed AMSKF feature selector is able to find the best combination of features and performs on par with the existing related studies of epileptic EEG events classification.

  14. Hyperspectral remote sensing image retrieval system using spectral and texture features.

    PubMed

    Zhang, Jing; Geng, Wenhao; Liang, Xi; Li, Jiafeng; Zhuo, Li; Zhou, Qianlan

    2017-06-01

    Although many content-based image retrieval systems have been developed, few studies have focused on hyperspectral remote sensing images. In this paper, a hyperspectral remote sensing image retrieval system based on spectral and texture features is proposed. The main contributions are fourfold: (1) considering the "mixed pixel" problem in hyperspectral images, endmembers are extracted as spectral features by an improved automatic pixel purity index algorithm, and texture features are then extracted with the gray-level co-occurrence matrix; (2) a similarity measurement is designed for the retrieval system, in which the similarity of spectral features is measured with a mixed measure of spectral information divergence and spectral angle match, and the similarity of texture features is measured with the Euclidean distance; (3) considering the limited ability of the human visual system, the retrieval results are returned after synthesizing true-color images based on the hyperspectral image characteristics; (4) the retrieval results are optimized by adjusting the feature weights of the similarity measurements according to the user's relevance feedback. The experimental results on NASA data sets show that our system achieves retrieval performance comparable or superior to existing hyperspectral analysis schemes.
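
    The two spectral similarity measures named in contribution (2), the spectral angle match (SAM) and the spectral information divergence (SID), have standard closed forms; a short sketch of both, together with one common way of mixing them into a single score, is given below (whether the paper uses exactly this mixed form is an assumption).

```python
import numpy as np

def spectral_angle(x, y):
    """Spectral angle (radians) between two spectra."""
    cos_a = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def spectral_information_divergence(x, y):
    """Symmetric KL-type divergence between spectra viewed as distributions."""
    p = x / (x.sum() + 1e-12)
    q = y / (y.sum() + 1e-12)
    p, q = np.clip(p, 1e-12, None), np.clip(q, 1e-12, None)
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def mixed_dissimilarity(x, y):
    """One common way to mix SID and SAM into a single dissimilarity score."""
    return spectral_information_divergence(x, y) * np.sin(spectral_angle(x, y))

a = np.array([0.10, 0.22, 0.35, 0.41, 0.38])
b = np.array([0.12, 0.20, 0.33, 0.45, 0.36])
print(spectral_angle(a, b), spectral_information_divergence(a, b), mixed_dissimilarity(a, b))
```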

  15. Image processing based automatic diagnosis of glaucoma using wavelet features of segmented optic disc from fundus image.

    PubMed

    Singh, Anushikha; Dutta, Malay Kishore; ParthaSarathi, M; Uher, Vaclav; Burget, Radim

    2016-02-01

    Glaucoma is a disease of the retina which is one of the most common causes of permanent blindness worldwide. This paper presents an automatic image processing based method for glaucoma diagnosis from the digital fundus image. In this paper wavelet feature extraction has been followed by optimized genetic feature selection combined with several learning algorithms and various parameter settings. Unlike the existing research works where the features are considered from the complete fundus or a sub image of the fundus, this work is based on feature extraction from the segmented and blood vessel removed optic disc to improve the accuracy of identification. The experimental results presented in this paper indicate that the wavelet features of the segmented optic disc image are clinically more significant in comparison to features of the whole or sub fundus image in the detection of glaucoma from fundus image. Accuracy of glaucoma identification achieved in this work is 94.7% and a comparison with existing methods of glaucoma detection from fundus image indicates that the proposed approach has improved accuracy of classification. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  16. Parallel computation of GA search for the artery shape determinants with CFD

    NASA Astrophysics Data System (ADS)

    Himeno, M.; Noda, S.; Fukasaku, K.; Himeno, R.

    2010-06-01

    We studied which factors play an important role in determining the shape of arteries at the carotid artery bifurcation by performing multi-objective optimization with computational fluid dynamics (CFD) and the genetic algorithm (GA). The most difficult problem in doing so is how to reduce the turn-around time of the GA optimization with 3D unsteady computation of blood flow. We devised a two-level parallel computation method with the following features: level 1, parallel CFD computation with an appropriate number of cores; level 2, parallel jobs generated by a "master" process, which quickly finds an available job queue and dispatches jobs, to reduce turn-around time. As a result, the turn-around time of one GA trial, which would have taken 462 days with one core, was reduced to less than two days on the RIKEN supercomputer system, RICC, with 8192 cores. We performed a multi-objective optimization to minimize the maximum mean WSS and to minimize the sum of circumference for four different shapes and obtained a set of trade-off solutions for each shape. In addition, we found that the carotid bulb has the feature of minimum local mean WSS and minimum local radius. We confirmed that our method is effective for examining the determinants of artery shapes.

  17. Single Neuron Optimization as a Basis for Accurate Biophysical Modeling: The Case of Cerebellar Granule Cells.

    PubMed

    Masoli, Stefano; Rizza, Martina F; Sgritta, Martina; Van Geit, Werner; Schürmann, Felix; D'Angelo, Egidio

    2017-01-01

    In realistic neuronal modeling, once the ionic channel complement has been defined, the maximum ionic conductance (G i-max ) values need to be tuned in order to match the firing pattern revealed by electrophysiological recordings. Recently, selection/mutation genetic algorithms have been proposed to efficiently and automatically tune these parameters. Nonetheless, since similar firing patterns can be achieved through different combinations of G i-max values, it is not clear how well these algorithms approximate the corresponding properties of real cells. Here we have evaluated the issue by exploiting a unique opportunity offered by the cerebellar granule cell (GrC), which is electrotonically compact and has therefore allowed the direct experimental measurement of ionic currents. Previous models were constructed using empirical tuning of G i-max values to match the original data set. Here, by using repetitive discharge patterns as a template, the optimization procedure yielded models that closely approximated the experimental G i-max values. These models, in addition to repetitive firing, captured additional properties, including inward rectification, near-threshold oscillations, and resonance, which were not used as optimization features. Thus, parameter optimization using genetic algorithms provided an efficient modeling strategy for reconstructing the biophysical properties of neurons and for the subsequent reconstruction of large-scale neuronal network models.

  18. Texture classification of normal tissues in computed tomography using Gabor filters

    NASA Astrophysics Data System (ADS)

    Dettori, Lucia; Bashir, Alia; Hasemann, Julie

    2007-03-01

    The research presented in this article is aimed at developing an automated imaging system for the classification of normal tissues in medical images obtained from Computed Tomography (CT) scans. Texture features based on a bank of Gabor filters are used to classify the following tissues of interest: liver, spleen, kidney, aorta, trabecular bone, lung, muscle, IP fat, and SQ fat. The approach consists of three steps: convolution of the regions of interest with a bank of 32 Gabor filters (4 frequencies and 8 orientations), extraction of two Gabor texture features per filter (mean and standard deviation), and creation of a Classification and Regression Tree-based classifier that automatically identifies the various tissues. The data set used consists of approximately 1000 DICOM images from normal chest and abdominal CT scans of five patients. The regions of interest were labeled by expert radiologists. Optimal trees were generated using two techniques: 10-fold cross-validation and splitting of the data set into a training and a testing set. In both cases, perfect classification rules were obtained provided enough images were available for training (~65%). All performance measures (sensitivity, specificity, precision, and accuracy) for all regions of interest were at 100%. This significantly improves previous results that used Wavelet, Ridgelet, and Curvelet texture features, which yielded accuracy values in the 85%-98% range. The Gabor filters' ability to isolate features at different frequencies and orientations allows for a multi-resolution analysis of texture, which is essential when dealing with, at times, very subtle differences in the texture of tissues in CT scans.
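
    The Gabor feature extraction step (mean and standard deviation of the response of a 4-frequency x 8-orientation filter bank) can be sketched as follows, assuming scikit-image's gabor filter; the frequency values are illustrative.

```python
import numpy as np
from skimage.filters import gabor

def gabor_feature_vector(roi, frequencies=(0.05, 0.1, 0.2, 0.4), n_orient=8):
    """Mean and standard deviation of the Gabor magnitude response for a
    bank of 4 frequencies x 8 orientations (32 filters, 64 features)."""
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            real, imag = gabor(roi, frequency=f, theta=theta)
            mag = np.hypot(real, imag)
            feats.extend([mag.mean(), mag.std()])
    return np.asarray(feats)

roi = np.random.rand(64, 64)              # stand-in for a tissue region of interest
print(gabor_feature_vector(roi).shape)    # (64,)
```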

  19. Pilot study of a novel tool for input-free automated identification of transition zone prostate tumors using T2- and diffusion-weighted signal and textural features.

    PubMed

    Stember, Joseph N; Deng, Fang-Ming; Taneja, Samir S; Rosenkrantz, Andrew B

    2014-08-01

    To present results of a pilot study to develop software that identifies regions suspicious for prostate transition zone (TZ) tumor, free of user input. Eight patients with TZ tumors were used to develop the model. In the training cases, the software tiled the TZ with 4 × 4-voxel "supervoxels," 80% of which were used to train a Naïve Bayes classifier to detect tumors based on a selection of the most accurate predictors among various signal and textural features on T2-weighted imaging (T2WI) and apparent diffusion coefficient (ADC) maps. Features tested as inputs were: average signal, signal standard deviation, energy, contrast, correlation, homogeneity, and entropy (all defined on T2WI), and average ADC. A forward selection scheme was used on the remaining 20% of training-set supervoxels to identify important inputs. Each of 100 iterations selected T2WI energy and average ADC, which were therefore deemed the optimal model inputs. The trained two-feature model was tested on a different set of ten patients, half with TZ tumors, and was applied blindly, again without operator input of suspicious foci. The software correctly predicted the presence or absence of TZ tumor in all test patients. Furthermore, the locations of predicted tumors corresponded spatially with the locations of biopsies that had confirmed their presence. Preliminary findings suggest that this tool has the potential to accurately predict TZ tumor presence and location without operator input. © 2013 Wiley Periodicals, Inc.

  20. Comparison of SVM, RF and ELM on an Electronic Nose for the Intelligent Evaluation of Paraffin Samples.

    PubMed

    Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan; Liu, Jingjing

    2018-01-18

    Paraffin odor intensity is an important quality indicator in paraffin inspection. Currently, paraffin odor level assessment mainly depends on artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) classifiers were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to those based on RF (accuracy rate of 98.33-100%) and ELM (accuracy rate of 98.01-100%). For level assessment, the R² for the training set was above 0.97 and the R² for the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016-0.3494, lower than the 0.5-1.0 error measured by industry-standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level.
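
    A hedged sketch of the overall pipeline, dimensionality reduction followed by SVM and RF classification with cross-validation, is shown below on synthetic stand-in data; the ELM model and the PLS variant are omitted because they have no standard scikit-learn implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for e-nose sensor responses: 4 paraffin classes.
X = rng.standard_normal((120, 32))
y = np.repeat(np.arange(4), 30)
X += y[:, None] * 0.8                     # give the classes some separation

for name, clf in [("SVM", SVC(kernel="rbf", C=10.0)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0))]:
    pipe = make_pipeline(StandardScaler(), PCA(n_components=5), clf)
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```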

  1. Bayes classification of interferometric TOPSAR data

    NASA Technical Reports Server (NTRS)

    Michel, T. R.; Rodriguez, E.; Houshmand, B.; Carande, R.

    1995-01-01

    We report the Bayes classification of terrain types at different sites using airborne interferometric synthetic aperture radar (INSAR) data. A Gaussian maximum likelihood classifier was applied to multidimensional observations derived from the SAR intensity, the terrain elevation model, and the magnitude of the interferometric correlation. Training sets for forested, urban, agricultural, or bare areas were obtained either by selecting samples with known ground truth, or by k-means clustering of random sets of samples uniformly distributed across all sites and subsequent assignment of these clusters using ground truth. The accuracy of the classifier was used to optimize the discriminating efficiency of the chosen set of features. The most important features include the SAR intensity, a canopy penetration depth model, and the terrain slope. We demonstrate the classifier's performance across sites using a unique set of training classes for the four main terrain categories. The scenes examined include San Francisco (CA) (predominantly urban and water), Mount Adams (WA) (forested with clear cuts), Pasadena (CA) (urban with mountains), and Antioch Hills (CA) (water, swamps, fields). Issues related to the effects of image calibration and the robustness of the classification to calibration errors are explored. The relative performance of single-polarization interferometric data classification is contrasted against classification schemes based on polarimetric SAR data.

  2. Sensitivity computation of the ℓ1 minimization problem and its application to dictionary design of ill-posed problems

    NASA Astrophysics Data System (ADS)

    Horesh, L.; Haber, E.

    2009-09-01

    The ℓ1 minimization problem has been studied extensively in the past few years. Recently, there has been a growing interest in its application to inverse problems. Most studies have concentrated on devising ways for sparse representation of a solution using a given prototype dictionary. Very few studies have addressed the more challenging problem of optimal dictionary construction, and even these were primarily devoted to the simplistic sparse coding application. In this paper, sensitivity analysis of the inverse solution with respect to the dictionary is presented. This analysis reveals some of the salient features and intrinsic difficulties associated with the dictionary design problem. Equipped with these insights, we propose an optimization strategy that alleviates these hurdles while utilizing the derived sensitivity relations for the design of a locally optimal dictionary. Our optimality criterion is based on local minimization of the Bayesian risk, given a set of training models. We present a mathematical formulation and an algorithmic framework to achieve this goal. The proposed framework offers the design of dictionaries for inverse problems that incorporate non-trivial, non-injective observation operators, where the data and the recovered parameters may reside in different spaces. We test our algorithm and show that it yields improved dictionaries for a diverse set of inverse problems in geophysics and medical imaging.

  3. Method of interplanetary trajectory optimization for the spacecraft with low thrust and swing-bys

    NASA Astrophysics Data System (ADS)

    Konstantinov, M. S.; Thein, M.

    2017-07-01

    The method developed to avoid the complexity of solving the multipoint boundary value problem while optimizing interplanetary trajectories of a spacecraft with electric propulsion and a sequence of swing-bys is presented in the paper. This method is based on the use of preliminary problem solutions for impulsive trajectories. The preliminary problem analyzed at the first stage of the study is formulated so that the analysis and optimization of a particular flight path is considered as an unconstrained minimum in the space of the selectable parameters. The existing methods can effectively solve this problem and make it possible to identify rational flight paths (the sequence of swing-bys) and to obtain an initial approximation for the main characteristics of the flight path (dates, values of the hyperbolic excess velocity, etc.). These characteristics can be used to optimize the trajectory of the spacecraft with electric propulsion. The special feature of the work is the introduction of a second (intermediate) stage of the research. At this stage some characteristics of the analyzed flight path (e.g. dates of swing-bys) are fixed and the problem is formulated so that the trajectory of the spacecraft with electric propulsion is optimized on selected sites of the flight path. The end-to-end optimization is carried out at the third (final) stage of the research. The distinctive feature of this stage is the analysis of the full set of optimality conditions for the considered flight path. The analysis of the characteristics of the optimal flight trajectories to Jupiter with Earth, Venus, and Mars swing-bys for the spacecraft with electric propulsion is presented. The paper shows that a spacecraft weighing more than 7150 kg can be delivered into the vicinity of Jupiter along a trajectory with two Earth swing-bys by use of a space transportation system based on the "Angara A5" rocket launcher, the chemical upper stage "KVTK", and an electric propulsion system with an input electrical power of 100 kW.

  4. Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

    DOE PAGES

    Zhang, Qian; Jun, Se -Ran; Leuze, Michael; ...

    2017-01-19

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.

  5. Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Qian; Jun, Se -Ran; Leuze, Michael

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.

  6. Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer

    PubMed Central

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat

    2017-01-01

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. PMID:28102365
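
    The k-mer feature representation and the Shannon diversity index used in the three-step strategy can be sketched as follows; the toy sequences, distance metric, and linkage choice are illustrative assumptions and do not reproduce the paper's exact feature-frequency profiles.

```python
import numpy as np
from itertools import product
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

def kmer_profile(seq, k):
    """Relative frequency vector over all 4**k possible k-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        idx = index.get(seq[i:i + k])
        if idx is not None:              # skip k-mers with ambiguous bases
            counts[idx] += 1
    return counts / max(counts.sum(), 1)

def shannon_diversity(profile):
    p = profile[profile > 0]
    return float(-np.sum(p * np.log(p)))

genomes = {"virA": "ATGCGTACGTTAGC" * 30,
           "virB": "ATGCGTACCTTAGC" * 30,
           "virC": "TTGACGGATCAAGC" * 30}
k = 4
profiles = np.array([kmer_profile(s, k) for s in genomes.values()])
print("Shannon diversity per genome:",
      [round(shannon_diversity(p), 3) for p in profiles])

# Hierarchical clustering of the k-mer profiles into a dendrogram.
Z = linkage(pdist(profiles, metric="euclidean"), method="average")
tree = dendrogram(Z, labels=list(genomes.keys()), no_plot=True)
print("leaf order:", tree["ivl"])
```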

  7. Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties

    PubMed Central

    2011-01-01

    Background Existing methods of predicting DNA-binding proteins used valuable features of physicochemical properties to design support vector machine (SVM) based classifiers. Generally, the selection of physicochemical properties and the determination of their corresponding feature vectors rely mainly on known properties of the binding mechanism and the experience of designers. However, a troublesome problem for designers is that some distinct physicochemical properties have similar vectors for representing the 20 amino acids, while some closely related physicochemical properties have dissimilar vectors. Results This study proposes a systematic approach (named Auto-IDPCPs) to automatically identify a set of physicochemical and biochemical properties in the AAindex database to design SVM-based classifiers for predicting and analyzing DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) utilizing an efficient genetic-algorithm-based optimization method, IBCGA, to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties which may affect the binding mechanism of DNA-binding domains/proteins. The proposed Auto-IDPCPs identified m=22 features of properties belonging to five clusters for predicting DNA-binding domains with a five-fold cross-validation accuracy of 87.12%, which is promising compared with the accuracy of 86.62% of the existing method PSSM-400. For predicting DNA-binding sequences, an accuracy of 75.50% was obtained using m=28 features, where PSSM-400 has an accuracy of 74.22%. Auto-IDPCPs and PSSM-400 have accuracies of 80.73% and 82.81%, respectively, when applied to an independent test data set of DNA-binding domains. Some typical physicochemical properties discovered are hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized Van Der Waals volume, pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)), etc. Conclusions The proposed Auto-IDPCPs approach helps designers to investigate informative physicochemical and biochemical properties by considering both prediction accuracy and analysis of the binding mechanism simultaneously. The Auto-IDPCPs approach is also applicable to predicting and analyzing other protein functions from sequences. PMID:21342579

  8. Fast, Safe, Propellant-Efficient Spacecraft Motion Planning Under Clohessy-Wiltshire-Hill Dynamics

    NASA Technical Reports Server (NTRS)

    Starek, Joseph A.; Schmerling, Edward; Maher, Gabriel D.; Barbee, Brent W.; Pavone, Marco

    2016-01-01

    This paper presents a sampling-based motion planning algorithm for real-time and propellant-optimized autonomous spacecraft trajectory generation in near-circular orbits. Specifically, this paper leverages recent algorithmic advances in the field of robot motion planning to the problem of impulsively actuated, propellant- optimized rendezvous and proximity operations under the Clohessy-Wiltshire-Hill dynamics model. The approach calls upon a modified version of the FMT* algorithm to grow a set of feasible trajectories over a deterministic, low-dispersion set of sample points covering the free state space. To enforce safety, the tree is only grown over the subset of actively safe samples, from which there exists a feasible one-burn collision-avoidance maneuver that can safely circularize the spacecraft orbit along its coasting arc under a given set of potential thruster failures. Key features of the proposed algorithm include 1) theoretical guarantees in terms of trajectory safety and performance, 2) amenability to real-time implementation, and 3) generality, in the sense that a large class of constraints can be handled directly. As a result, the proposed algorithm offers the potential for widespread application, ranging from on-orbit satellite servicing to orbital debris removal and autonomous inspection missions.

  9. AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

    PubMed

    Sun, Lei; Wang, Jun; Wei, Jinmao

    2017-03-14

    The Receiver Operator Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and identify disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset because they lack an effective means of reducing the redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by measuring the distances between misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can achieve the minimal balanced error rate with a small number of significant features. Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.
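
    The complementarity idea, measuring how far instances misclassified on one feature sit from their nearest misses along another feature, can be sketched as follows; the single-threshold classifier and the averaging into one score are simplifying assumptions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def misclassified_by_threshold(x, y):
    """Instances misclassified by a simple median-threshold rule on feature x."""
    thr = np.median(x)
    pred = (x > thr).astype(int)
    if (pred == y).mean() < 0.5:          # allow the flipped decision
        pred = 1 - pred
    return np.where(pred != y)[0]

def complementarity(X, y, a, b):
    """Average distance, along feature b, between instances misclassified
    on feature a and their nearest miss (closest other-class instance on a)."""
    idx = misclassified_by_threshold(X[:, a], y)
    dists = []
    for i in idx:
        others = np.where(y != y[i])[0]
        nm = others[np.argmin(np.abs(X[others, a] - X[i, a]))]  # nearest miss on a
        dists.append(abs(X[i, b] - X[nm, b]))
    return float(np.mean(dists)) if dists else 0.0

# Two weak features whose combination carries the class information.
n = 200
y = rng.integers(0, 2, n)
X = np.c_[rng.standard_normal(n) + 0.3 * y, rng.standard_normal(n) + 0.3 * y]
print("complementarity(f0, f1) =", complementarity(X, y, 0, 1))
```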

  10. Developing a clinical utility framework to evaluate prediction models in radiogenomics

    NASA Astrophysics Data System (ADS)

    Wu, Yirong; Liu, Jie; Munoz del Rio, Alejandro; Page, David C.; Alagoz, Oguzhan; Peissig, Peggy; Onitilo, Adedayo A.; Burnside, Elizabeth S.

    2015-03-01

    Combining imaging and genetic information to predict disease presence and behavior is being codified into an emerging discipline called "radiogenomics." Optimal evaluation methodologies for radiogenomics techniques have not been established. We aim to develop a clinical decision framework based on utility analysis to assess prediction models for breast cancer. Our data comes from a retrospective case-control study, collecting Gail model risk factors, genetic variants (single nucleotide polymorphisms-SNPs), and mammographic features in Breast Imaging Reporting and Data System (BI-RADS) lexicon. We first constructed three logistic regression models built on different sets of predictive features: (1) Gail, (2) Gail+SNP, and (3) Gail+SNP+BI-RADS. Then, we generated ROC curves for three models. After we assigned utility values for each category of findings (true negative, false positive, false negative and true positive), we pursued optimal operating points on ROC curves to achieve maximum expected utility (MEU) of breast cancer diagnosis. We used McNemar's test to compare the predictive performance of the three models. We found that SNPs and BI-RADS features augmented the baseline Gail model in terms of the area under ROC curve (AUC) and MEU. SNPs improved sensitivity of the Gail model (0.276 vs. 0.147) and reduced specificity (0.855 vs. 0.912). When additional mammographic features were added, sensitivity increased to 0.457 and specificity to 0.872. SNPs and mammographic features played a significant role in breast cancer risk estimation (p-value < 0.001). Our decision framework comprising utility analysis and McNemar's test provides a novel framework to evaluate prediction models in the realm of radiogenomics.
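
    The maximum-expected-utility step can be sketched directly from an ROC curve: for each operating point, the expected utility is the utility-weighted mix of true/false positive and negative rates given the disease prevalence. The utility values, prevalence, and toy risk scores below are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
scores = y_true * 0.8 + rng.standard_normal(500)      # toy risk scores

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Placeholder per-case utilities and disease prevalence.
U_TP, U_FN, U_TN, U_FP = 1.0, -2.0, 0.1, -0.1
prevalence = y_true.mean()

expected_utility = (prevalence * (tpr * U_TP + (1 - tpr) * U_FN) +
                    (1 - prevalence) * ((1 - fpr) * U_TN + fpr * U_FP))
best = expected_utility.argmax()
print(f"MEU = {expected_utility[best]:.3f} at threshold {thresholds[best]:.2f} "
      f"(sens={tpr[best]:.2f}, spec={1 - fpr[best]:.2f})")
```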

  11. Real-time evaluation of polyphenol oxidase (PPO) activity in lychee pericarp based on weighted combination of spectral data and image features as determined by fuzzy neural network.

    PubMed

    Yang, Yi-Chao; Sun, Da-Wen; Wang, Nan-Nan; Xie, Anguo

    2015-07-01

    A novel method of using the hyperspectral imaging technique with a weighted combination of spectral data and image features determined by a fuzzy neural network (FNN) was proposed for real-time prediction of polyphenol oxidase (PPO) activity in lychee pericarp. Lychee images were obtained by a hyperspectral reflectance imaging system operating in the range of 400-1000 nm. A support vector machine-recursive feature elimination (SVM-RFE) algorithm was applied to eliminate variables with little or no information for the prediction from all bands, resulting in a reduced set of optimal wavelengths. Spectral information at the optimal wavelengths and image color features were then used respectively to develop calibration models for the prediction of PPO in pericarp during storage, and the results of the two models were compared. In order to improve the prediction accuracy, a decision strategy was developed based on a weighted combination of spectral data and image features, in which the weights were determined by FNN for a better estimation of PPO activity. The results showed that the combined decision model was the best among all of the calibration models, with high R² values of 0.9117 and 0.9072 and low RMSEs of 0.45% and 0.459% for calibration and prediction, respectively. These results demonstrate that the proposed weighted combined decision method has great potential for improving model performance. The proposed technique could be used for a better prediction of other internal and external quality attributes of fruits. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. Rational manipulation of digital EEG: pearls and pitfalls.

    PubMed

    Seneviratne, Udaya

    2014-12-01

    The advent of digital EEG has provided greater flexibility and more opportunities in data analysis to optimize the diagnostic yield. Changing the filter settings, sensitivity, montages, and time-base are possible rational manipulations to achieve this goal. The options to use polygraphy, video, and quantification are additional useful features. Aliasing and loss of data are potential pitfalls in the use of digital EEG. This review illustrates some common clinical scenarios where rational manipulations can enhance the diagnostic EEG yield and potential pitfalls in the process.

  13. Incorporation of local structure into kriging models for the prediction of atomistic properties in the water decamer.

    PubMed

    Davie, Stuart J; Di Pasquale, Nicodemo; Popelier, Paul L A

    2016-10-15

    Machine learning algorithms have been demonstrated to predict atomistic properties approaching the accuracy of quantum chemical calculations at significantly less computational cost. Difficulties arise, however, when attempting to apply these techniques to large systems, or systems possessing excessive conformational freedom. In this article, the machine learning method kriging is applied to predict both the intra-atomic and interatomic energies, as well as the electrostatic multipole moments, of the atoms of a water molecule at the center of a 10 water molecule (decamer) cluster. Unlike previous work, where the properties of small water clusters were predicted using a molecular local frame, and where training set inputs (features) were based on atomic index, a variety of feature definitions and coordinate frames are considered here to increase prediction accuracy. It is shown that, for a water molecule at the center of a decamer, no single method of defining features or coordinate schemes is optimal for every property. However, explicitly accounting for the structure of the first solvation shell in the definition of the features of the kriging training set, and centring the coordinate frame on the atom-of-interest will, in general, return better predictions than models that apply the standard methods of feature definition, or a molecular coordinate frame. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

  14. A Novel Unsupervised Segmentation Quality Evaluation Method for Remote Sensing Images

    PubMed Central

    Tang, Yunwei; Jing, Linhai; Ding, Haifeng

    2017-01-01

    The segmentation of a high spatial resolution remote sensing image is a critical step in geographic object-based image analysis (GEOBIA). Evaluating the performance of segmentation without ground truth data, i.e., unsupervised evaluation, is important for the comparison of segmentation algorithms and the automatic selection of optimal parameters. This unsupervised strategy currently faces several challenges in practice, such as difficulties in designing effective indicators and limitations of the spectral values in the feature representation. This study proposes a novel unsupervised evaluation method to quantitatively measure the quality of segmentation results to overcome these problems. In this method, multiple spectral and spatial features of images are first extracted simultaneously and then integrated into a feature set to improve the quality of the feature representation of ground objects. The indicators designed for spatial stratified heterogeneity and spatial autocorrelation are included to estimate the properties of the segments in this integrated feature set. These two indicators are then combined into a global assessment metric as the final quality score. The trade-offs of the combined indicators are accounted for using a strategy based on the Mahalanobis distance, which can be exhibited geometrically. The method is tested on two segmentation algorithms and three testing images. The proposed method is compared with two existing unsupervised methods and a supervised method to confirm its capabilities. Through comparison and visual analysis, the results verified the effectiveness of the proposed method and demonstrated the reliability and improvements of this method with respect to other methods. PMID:29064416
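
    The Mahalanobis-distance trade-off between the two indicators can be sketched as follows: candidate segmentations are scored by their Mahalanobis distance from an assumed ideal point, which accounts for the scale and correlation of the indicators (the ideal point and the direction of 'better' are assumptions).

```python
import numpy as np

# Each row: (spatial stratified heterogeneity, spatial autocorrelation)
# for one candidate segmentation; higher is assumed better for both here.
indicators = np.array([[0.62, 0.55],
                       [0.71, 0.48],
                       [0.58, 0.66],
                       [0.80, 0.40]])

cov = np.cov(indicators, rowvar=False)
cov_inv = np.linalg.inv(cov + 1e-9 * np.eye(2))

ideal = indicators.max(axis=0)            # assumed "best possible" point

def mahalanobis(x, ref):
    d = x - ref
    return float(np.sqrt(d @ cov_inv @ d))

# Final quality score: a smaller distance to the ideal point is better.
scores = np.array([mahalanobis(row, ideal) for row in indicators])
print("best segmentation index:", scores.argmin())
```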

  15. Detection of pigment network in dermatoscopy images using texture analysis

    PubMed Central

    Anantha, Murali; Moss, Randy H.; Stoecker, William V.

    2011-01-01

    Dermatoscopy, also known as dermoscopy or epiluminescence microscopy (ELM), is a non-invasive, in vivo technique, which permits visualization of features of pigmented melanocytic neoplasms that are not discernable by examination with the naked eye. ELM offers a completely new range of visual features. One such prominent feature is the pigment network. Two texture-based algorithms are developed for the detection of pigment network. These methods are applicable to various texture patterns in dermatoscopy images, including patterns that lack fine lines such as cobblestone, follicular, or thickened network patterns. Two texture algorithms, Laws energy masks and the neighborhood gray-level dependence matrix (NGLDM) large number emphasis, were optimized on a set of 155 dermatoscopy images and compared. Results suggest superiority of Laws energy masks for pigment network detection in dermatoscopy images. For both methods, a texel width of 10 pixels or approximately 0.22 mm is found for dermatoscopy images. PMID:15249068

  16. Study on Huizhou architecture of point cloud registration based on optimized ICP algorithm

    NASA Astrophysics Data System (ADS)

    Zhang, Runmei; Wu, Yulu; Zhang, Guangbin; Zhou, Wei; Tao, Yuqian

    2018-03-01

    Current point cloud registration software has high hardware requirements and a heavy workload, requires multiple interactive definitions, and the source code of software with better processing performance is not openly available. In view of this, a two-step registration method based on a normal-vector distribution feature and a coarse-feature-based iterative closest point (ICP) algorithm is proposed in this paper. This method combines the fast point feature histogram (FPFH) algorithm, defines the adjacency region of the point cloud and the calculation model of the distribution of normal vectors, sets up a local coordinate system for each key point, and obtains the transformation matrix to complete the rough registration; the rough registration results of the two stations are then accurately registered by using the ICP algorithm. Experimental results show that, compared with the traditional ICP algorithm, the method used in this paper has obvious time and precision advantages for large point clouds.
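
    The fine-registration stage is standard point-to-point ICP; a compact sketch (nearest-neighbour correspondences via a k-d tree followed by the SVD-based rigid transform) is given below. The FPFH-based coarse alignment of the paper is not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, target, iters=30, tol=1e-6):
    tree = cKDTree(target)
    src = source.copy()
    prev_err = np.inf
    for _ in range(iters):
        dists, idx = tree.query(src)               # nearest-neighbour matches
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
        err = dists.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return src, err

# Toy example: recover a small rotation plus translation.
rng = np.random.default_rng(0)
target = rng.random((500, 3))
angle = np.deg2rad(8)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0],
               [np.sin(angle),  np.cos(angle), 0],
               [0, 0, 1]])
source = (target - 0.05) @ Rz.T
aligned, err = icp(source, target)
print("mean residual after ICP:", err)
```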

  17. Fuel spill identification using solid-phase extraction and solid-phase microextraction. 1. Aviation turbine fuels.

    PubMed

    Lavine, B K; Brzozowski, D M; Ritter, J; Moores, A J; Mayfield, H T

    2001-12-01

    The water-soluble fraction of aviation jet fuels is examined using solid-phase extraction and solid-phase microextraction. Gas chromatographic profiles of solid-phase extracts and solid-phase microextracts of the water-soluble fraction of kerosene- and nonkerosene-based jet fuels reveal that each jet fuel possesses a unique profile. Pattern recognition analysis reveals fingerprint patterns within the data characteristic of fuel type. By using a novel genetic algorithm (GA) that emulates human pattern recognition through machine learning, it is possible to identify features characteristic of the chromatographic profile of each fuel class. The pattern recognition GA identifies a set of features that optimize the separation of the fuel classes in a plot of the two largest principal components of the data. Because principal components maximize variance, the bulk of the information encoded by the selected features is primarily about the differences between the fuel classes.

  18. An Individual Finger Gesture Recognition System Based on Motion-Intent Analysis Using Mechanomyogram Signal

    PubMed Central

    Ding, Huijun; He, Qing; Zhou, Yongjin; Dan, Guo; Cui, Song

    2017-01-01

    Motion-intent-based finger gesture recognition systems are crucial for many applications such as prosthesis control, sign language recognition, wearable rehabilitation systems, and human–computer interaction. In this article, a motion-intent-based finger gesture recognition system is designed to correctly identify the tapping of every finger for the first time. Two auto-event annotation algorithms are first applied and evaluated for detecting the finger-tapping frame. Based on the truncated signals, the wavelet packet transform (WPT) coefficients are calculated and compressed as the features, followed by a feature selection method that is able to improve the performance by optimizing the feature set. Finally, three popular classifiers, including naive Bayes (NBC), K-nearest neighbor (KNN), and support vector machine (SVM), are applied and evaluated. A recognition accuracy of up to 94% is achieved. The design and the architecture of the system are presented with full system characterization results. PMID:29167655

  19. Estimating local scaling properties for the classification of interstitial lung disease patterns

    NASA Astrophysics Data System (ADS)

    Huber, Markus B.; Nagarajan, Mahesh B.; Leinsinger, Gerda; Ray, Lawrence A.; Wismueller, Axel

    2011-03-01

    Local scaling properties of texture regions were compared in their ability to classify morphological patterns known as 'honeycombing' that are considered indicative of fibrotic interstitial lung diseases in high-resolution computed tomography (HRCT) images. For 14 patients with known occurrence of honeycombing, a stack of 70 axial, lung kernel reconstructed images was acquired from HRCT chest exams. A total of 241 regions of interest of both healthy and pathological (89) lung tissue were identified by an experienced radiologist. Texture features were extracted using six properties calculated from gray-level co-occurrence matrices (GLCM), Minkowski Dimensions (MDs), and the estimation of local scaling properties with the Scaling Index Method (SIM). A k-nearest-neighbor (k-NN) classifier and a Multilayer Radial Basis Functions Network (RBFN) were optimized in a 10-fold cross-validation for each texture vector, and the classification accuracy was calculated on independent test sets as a quantitative measure of automated tissue characterization. A Wilcoxon signed-rank test was used to compare two accuracy distributions, with Bonferroni correction for multiple comparisons. The best classification results were obtained by the set of SIM features, which performed significantly better than all the standard GLCM and MD features (p < 0.005) for both classifiers, with the highest accuracy (94.1%, 93.7%; for the k-NN and RBFN classifier, respectively). The best standard texture features were the GLCM features 'homogeneity' (91.8%, 87.2%) and 'absolute value' (90.2%, 88.5%). The results indicate that advanced texture features using local scaling properties can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods.

  20. Sparse feature learning for instrument identification: Effects of sampling and pooling methods.

    PubMed

    Han, Yoonchang; Lee, Subin; Nam, Juhan; Lee, Kyogu

    2016-05-01

    Feature learning for music applications has recently received considerable attention from many researchers. This paper reports on a sparse feature learning algorithm for musical instrument identification and, in particular, focuses on the effects of the frame sampling techniques for dictionary learning and the pooling methods for feature aggregation. To this end, two frame sampling techniques are examined: fixed and proportional random sampling. Furthermore, the effect of using onset frames was analyzed for both of the proposed sampling methods. Regarding summarization of the feature activations, a standard deviation pooling method is used and compared with the commonly used max- and average-pooling techniques. Using more than 47 000 recordings of 24 instruments from various performers, playing styles, and dynamics, a number of tuning parameters are examined, including the analysis frame size, the dictionary size, and the type of frequency scaling, as well as the different sampling and pooling methods. The results show that the combination of proportional sampling and standard deviation pooling achieves the best overall performance of 95.62%, while the optimal parameter set varies among the instrument classes.
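
    The three pooling strategies compared (max, average, and standard-deviation pooling over frame-level activations) each amount to a single reduction over the time axis, as the minimal sketch below shows; the random activations stand in for learned sparse codes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Frame-level sparse feature activations for one recording:
# shape (n_frames, dictionary_size).
activations = np.abs(rng.standard_normal((400, 128)))

max_pooled = activations.max(axis=0)     # max pooling
avg_pooled = activations.mean(axis=0)    # average pooling
std_pooled = activations.std(axis=0)     # standard-deviation pooling

# Clip-level feature vector; any of the three (or a concatenation) can be
# passed to the downstream instrument classifier.
clip_feature = std_pooled
print(max_pooled.shape, avg_pooled.shape, std_pooled.shape)
```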

  1. Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

    PubMed

    Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

    2012-10-01

    The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extraction methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature-weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  2. A Ground-Based 2-Micron DIAL System to Profile Tropospheric CO2 and Aerosol Distributions for Atmospheric Studies

    NASA Technical Reports Server (NTRS)

    Ismail, Syed; Koch, Grady; Abedin, Nurul; Refaat, Tamer; Rubio, Manuel; Davis, Kenneth; Miller, Charles; Singh, Upendra

    2006-01-01

    The system will operate at a temperature-insensitive CO2 line (2050.967 nm) with side-line tuning and offset locking. An order-of-magnitude improvement has been demonstrated in the laser line locking needed for high-precision measurements, side-line operation, and simultaneous double pulsing and line locking. Detector testing of the phototransistor has demonstrated sensitivity to aerosol features over long distances in the atmosphere and the ability to resolve features at approximately 100 m. Optical systems that collect light onto small-area detectors work well. Receiver optical designs are being optimized and data acquisition systems developed. CO2 line parameter characterization is in progress, and in situ sensor calibration is under way for validation of the DIAL CO2 system.

  3. SVM based colon polyps classifier in a wireless active stereo endoscope.

    PubMed

    Ayoub, J; Granado, B; Mhanna, Y; Romain, O

    2010-01-01

    This work focuses on the recognition of three-dimensional colon polyps captured by an active stereo vision sensor. The detection algorithm consists of an SVM classifier trained on robust feature descriptors. The study is related to Cyclope, a prototype sensor that allows real-time 3D object reconstruction and continues to be optimized technically to improve its classification task of differentiating between hyperplastic and adenomatous polyps. Experimental results were encouraging and show a correct classification rate of approximately 97%. The work contains detailed statistics about the detection rate and the computing complexity. Inspired by the intensity histogram, the work presents a new approach that extracts a set of features based on the depth histogram and combines stereo measurements with SVM classifiers to correctly classify benign and malignant polyps.
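
    A minimal sketch of the histogram-feature idea (NumPy and scikit-learn assumed; the stereo reconstruction and the actual descriptor design are not reproduced, and the patches below are synthetic stand-ins): each polyp patch is summarized by a normalized depth histogram and classified with an SVM.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def depth_histogram(depth_patch, bins=16):
        """Normalized histogram of depth values over a reconstructed polyp patch."""
        hist, _ = np.histogram(depth_patch, bins=bins, range=(0.0, 1.0))
        return hist / max(hist.sum(), 1)

    rng = np.random.default_rng(0)
    # Toy data: "flat" patches vs. "protruding" patches with shifted depth statistics.
    flat = [rng.normal(0.3, 0.05, size=(24, 24)).clip(0, 1) for _ in range(40)]
    protruding = [rng.normal(0.6, 0.10, size=(24, 24)).clip(0, 1) for _ in range(40)]
    X = np.vstack([depth_histogram(p) for p in flat + protruding])
    y = np.array([0] * 40 + [1] * 40)

    print("cross-validated accuracy:", cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
    ```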

  4. Predicting Presynaptic and Postsynaptic Neurotoxins by Developing Feature Selection Technique

    PubMed Central

    Yang, Yunchun; Zhang, Chunmei; Chen, Rong; Huang, Po

    2017-01-01

    Presynaptic and postsynaptic neurotoxins are proteins which act at the presynaptic and postsynaptic membranes, respectively. Correctly predicting presynaptic and postsynaptic neurotoxins will provide important clues for drug-target discovery and drug design. In this study, we developed a theoretical method to discriminate presynaptic neurotoxins from postsynaptic neurotoxins. A strict and objective benchmark dataset was constructed to train and test our proposed model. The dipeptide composition was used to formulate the neurotoxin samples. Analysis of variance (ANOVA) was used to find the optimal feature set that produces the maximum accuracy. In the jackknife cross-validation test, an overall accuracy of 94.9% was achieved. We believe that the proposed model will provide important information for the study of neurotoxins. PMID:28303250
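
    A minimal sketch of this kind of pipeline (scikit-learn assumed; the sequences and labels below are random placeholders, not the benchmark dataset, and a standard ANOVA F-score ranking stands in for the paper's exact selection procedure): compute the 400-dimensional dipeptide composition of each sequence, then keep the k features with the highest F-scores.

    ```python
    import numpy as np
    from itertools import product
    from sklearn.feature_selection import SelectKBest, f_classif

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
    INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}

    def dipeptide_composition(sequence):
        """Relative frequency of each of the 400 dipeptides in a protein sequence."""
        counts = np.zeros(len(DIPEPTIDES))
        for i in range(len(sequence) - 1):
            dp = sequence[i:i + 2]
            if dp in INDEX:
                counts[INDEX[dp]] += 1
        return counts / max(len(sequence) - 1, 1)

    # Toy sequences standing in for presynaptic (0) and postsynaptic (1) neurotoxins.
    rng = np.random.default_rng(0)
    seqs = ["".join(rng.choice(list(AMINO_ACIDS), size=300)) for _ in range(60)]
    X = np.vstack([dipeptide_composition(s) for s in seqs])
    y = rng.integers(0, 2, size=60)

    selector = SelectKBest(score_func=f_classif, k=50).fit(X, y)
    print("selected dipeptide feature indices:", selector.get_support(indices=True)[:10])
    ```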

  5. Clustering methods for the optimization of atomic cluster structure

    NASA Astrophysics Data System (ADS)

    Bagattini, Francesco; Schoen, Fabio; Tigli, Luca

    2018-04-01

    In this paper, we propose a revised global optimization method and apply it to large-scale cluster conformation problems. In the 1990s, the so-called clustering methods were considered among the most efficient general-purpose global optimization techniques; however, their usage has quickly declined in recent years, mainly due to the inherent difficulties of clustering approaches in high-dimensional spaces. Inspired by the machine learning literature, we redesigned clustering methods in order to deal with molecular structures in a reduced feature space. Our aim is to show that by suitably choosing a good set of geometrical features coupled with a very efficient descent method, an effective optimization tool is obtained which is capable of finding, with a very high success rate, all known putative optima for medium-size clusters without any prior information, for both Lennard-Jones and Morse potentials. The main result is that, beyond being a reliable approach, the proposed method, based on the idea of starting a computationally expensive deep local search only when it seems worth doing so, is capable of saving a huge number of searches with respect to an analogous algorithm which does not employ a clustering phase. In this paper, we are not claiming the superiority of the proposed method compared to specific, refined, state-of-the-art procedures, but rather indicating a quite straightforward way to save local searches by means of a clustering scheme working in a reduced variable space, which might prove useful when included in many modern methods.
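
    A minimal sketch of one building block of such methods (SciPy assumed; this is only a single local descent on the Lennard-Jones potential from a random start, not the authors' clustering scheme or reduced feature space):

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def lj_energy(flat_coords):
        """Total Lennard-Jones energy (reduced units) of an atomic cluster."""
        pos = flat_coords.reshape(-1, 3)
        energy = 0.0
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                r = np.linalg.norm(pos[i] - pos[j])
                energy += 4.0 * (r ** -12 - r ** -6)
        return energy

    rng = np.random.default_rng(0)
    start = rng.uniform(-1.2, 1.2, size=(7, 3)).ravel()   # random 7-atom starting geometry
    result = minimize(lj_energy, start, method="L-BFGS-B")
    print("locally minimized cluster energy:", round(result.fun, 4))
    ```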

  6. Learning-based 3D surface optimization from medical image reconstruction

    NASA Astrophysics Data System (ADS)

    Wei, Mingqiang; Wang, Jun; Guo, Xianglin; Wu, Huisi; Xie, Haoran; Wang, Fu Lee; Qin, Jing

    2018-04-01

    Mesh optimization has mostly been studied from the graphics point of view, focusing on 3D surfaces obtained by optical and laser scanners, despite the fact that isosurfaced meshes from medical image reconstruction suffer from both staircase artifacts and noise: isotropic filters lead to shape distortion, while anisotropic ones retain pseudo-features. We present a data-driven method for automatically removing these medical artifacts while not introducing additional ones. We consider mesh optimization as a combination of vertex filtering and facet filtering in two stages: offline training and runtime optimization. Specifically, we first detect staircases based on the scanning direction of CT/MRI scanners and design a staircase-sensitive Laplacian filter (vertex-based) to remove them; we then design a unilateral filtered facet normal descriptor (uFND) for measuring the geometry features around each facet of a given mesh, and learn regression functions from a set of medical meshes and their high-resolution reference counterparts for mapping the uFNDs to the facet normals of the reference meshes (facet-based). At runtime, we first apply the staircase-sensitive Laplacian filter to an input MC (Marching Cubes) mesh, then filter the mesh facet normal field using the learned regression functions, and finally deform the mesh to match the new normal field, obtaining a compact approximation of the high-resolution reference model. Tests show that our algorithm achieves higher-quality results than previous approaches regarding surface smoothness and surface accuracy.

  7. Toward Optimal Manifold Hashing via Discrete Locally Linear Embedding.

    PubMed

    Rongrong Ji; Hong Liu; Liujuan Cao; Di Liu; Yongjian Wu; Feiyue Huang

    2017-11-01

    Binary code learning, also known as hashing, has received increasing attention in large-scale visual search. By transforming high-dimensional features into binary codes, the original Euclidean distance is approximated via the Hamming distance. More recently, it has been advocated that it is the manifold distance, rather than the Euclidean distance, that should be preserved in the Hamming space. However, it remains an open problem to directly preserve the manifold structure by hashing. In particular, one first needs to build the locally linear embedding in the original feature space and then quantize such an embedding to binary codes. Such two-step coding is problematic and suboptimal. Moreover, the offline learning is extremely time- and memory-consuming, since it needs to calculate the similarity matrix of the original data. In this paper, we propose a novel hashing algorithm, termed discrete locally linear embedding hashing (DLLH), which addresses the above challenges. DLLH directly reconstructs the manifold structure in the Hamming space, learning optimal hash codes that maintain the local linear relationships of data points. To learn discrete locally linear embedding codes, we further propose a discrete optimization algorithm with an iterative parameter-updating scheme. Moreover, an anchor-based acceleration scheme, termed Anchor-DLLH, is introduced, which approximates the large similarity matrix by the product of two low-rank matrices. Experimental results on three widely used benchmark data sets, i.e., CIFAR10, NUS-WIDE, and YouTube Face, show the superior performance of the proposed DLLH over state-of-the-art approaches.

  8. Automated 3D Phenotype Analysis Using Data Mining

    PubMed Central

    Plyusnin, Ilya; Evans, Alistair R.; Karme, Aleksis; Gionis, Aristides; Jernvall, Jukka

    2008-01-01

    The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the techniques of data mining to the study of 3D biological shapes to bring the analyses of phenomes closer to the efficiency of studying genomes. We compiled five training sets of highly variable morphologies of mammalian teeth from the MorphoBrowser database. Samples were labeled either by dietary class or by conventional dental types (e.g. carnassial, selenodont). We automatically extracted a multitude of topological attributes using Geographic Information Systems (GIS)-like procedures that were then used in several combinations of feature selection schemes and probabilistic classification models to build and optimize classifiers for predicting the labels of the training sets. In terms of classification accuracy, computational time and size of the feature sets used, non-repeated best-first search combined with 1-nearest neighbor classifier was the best approach. However, several other classification models combined with the same searching scheme proved practical. The current study represents a first step in the automatic analysis of 3D phenotypes, which will be increasingly valuable with the future increase in 3D morphology and phenomics databases. PMID:18320060
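
    A minimal sketch of the wrapper idea evaluated above (scikit-learn 0.24+ assumed; greedy forward selection is used here as a simpler stand-in for the paper's non-repeated best-first search, and the data are a synthetic placeholder for the GIS-like shape attributes): features are added one at a time according to cross-validated 1-nearest-neighbor accuracy.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.neighbors import KNeighborsClassifier

    # Toy stand-in for topological attributes of tooth surfaces.
    X, y = make_classification(n_samples=150, n_features=30, n_informative=6,
                               n_classes=3, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=1)
    sfs = SequentialFeatureSelector(knn, n_features_to_select=6,
                                    direction="forward", cv=5)
    sfs.fit(X, y)
    print("selected attribute indices:", np.flatnonzero(sfs.get_support()))
    ```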

  9. Delineation and geometric modeling of road networks

    NASA Astrophysics Data System (ADS)

    Poullis, Charalambos; You, Suya

    In this work we present a novel vision-based system for the automatic detection and extraction of complex road networks from various sensor resources such as aerial photographs, satellite images, and LiDAR. Uniquely, the proposed system is an integrated solution that merges the power of perceptual grouping theory (Gabor filtering, tensor voting) and optimized segmentation techniques (global optimization using graph cuts) into a unified framework to address the challenging problems of geospatial feature detection and classification. First, the local precision of the Gabor filters is combined with the global context of the tensor voting to produce an accurate classification of the geospatial features; in addition, the tensorial representation used for encoding the data eliminates the need for any thresholds, thereby removing any data dependencies. Second, a novel orientation-based segmentation is presented which incorporates the classification from the perceptual grouping and results in segmentations with better-defined boundaries and continuous linear segments. Finally, a set of Gaussian-based filters is applied to automatically extract centerline information (magnitude, width and orientation). This information is then used for creating road segments and transforming them into their polygonal representations.

  10. The Optimization of an eHealth Solution (Thought Spot) with Transition-Aged Youth in Postsecondary Settings: Participatory Design Research

    PubMed Central

    VanHeerwaarden, Nicole; Abi-Jaoude, Alexxa; Johnson, Andrew; Hollenberg, Elisa; Chaim, Gloria; Cleverley, Kristin; Eysenbach, Gunther; Henderson, Joanna; Levinson, Andrea; Robb, Janine; Sharpe, Sarah; Voineskos, Aristotle

    2018-01-01

    Background Seventy percent of lifetime cases of mental illness emerge before the age of 24 years, but many youth are unable to access the support and services they require in a timely and appropriate way. With most youth using the internet, electronic health (eHealth) interventions are promising tools for reaching this population. Through participatory design research (PDR) engagement methods, Thought Spot, a Web- and mobile-based platform, was redeveloped to facilitate access to mental health services by transition-aged youth (aged 16-29 years) in postsecondary settings. Objective The aim of this study was to describe the process of engaging with postsecondary students through the PDR approaches, with the ultimate goal of optimizing the Thought Spot platform. Methods Consistent with the PDR approaches, five student-led workshops, attended by 41 individuals, were facilitated to obtain feedback regarding the platform’s usability and functionality and its potential value in a postsecondary setting. Various creative engagement activities were delivered to gather experiences and opinions, including semistructured focus groups, questionnaires, personas, journey mapping, and a world café. Innovative technological features and refinements were also brainstormed during the workshops. Results By using PDR methods of engagement, participants knew that their ideas and recommendations would be applied. There was also an overall sense of respect and care integrated into each group, which facilitated an exchange of ideas and suggestions. Conclusions The process of engaging with students to redesign the Thought Spot platform through PDR has been effective. Findings from these workshops will significantly inform new technological features within the app to enable positive help-seeking behaviors among students. These behaviors will be further explored in the second phase that involves a randomized controlled trial. PMID:29510970

  11. Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest

    PubMed Central

    Ma, Suliang; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan

    2018-01-01

    Mechanical faults of high-voltage circuit breakers (HVCBs) inevitably occur over long-term operation, so extracting fault features and identifying the fault type have become key issues for ensuring the security and reliability of the power supply. Based on wavelet packet decomposition technology and the random forest algorithm, an effective identification system was developed in this paper. First, in contrast to the incomplete description given by Shannon entropy, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variables, and optimize the feature space. Finally, the approach was verified on actual HVCB vibration signals covering six typical fault classes. The comparative experimental results show that the classification accuracy of the proposed method reached 93.33% with the original feature space and up to 95.56% with the optimized input feature vector. This indicates that the feature optimization procedure is successful and that the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods. PMID:29659548
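
    A minimal sketch of the feature-space optimization step (scikit-learn assumed; the data are a synthetic placeholder, since the WTFER extraction from vibration signals is not reproduced): train a random forest, rank features by impurity-based importance, and retrain on the top-ranked subset.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Toy stand-in for wavelet-packet energy-rate features of HVCB vibration signals.
    X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                               n_classes=6, n_clusters_per_class=1, random_state=0)

    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    ranking = np.argsort(forest.feature_importances_)[::-1]

    full = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                           X, y, cv=5).mean()
    top = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                          X[:, ranking[:10]], y, cv=5).mean()
    print(f"accuracy with all 40 features: {full:.3f}; with top 10: {top:.3f}")
    ```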

  12. Combination of minimum enclosing balls classifier with SVM in coal-rock recognition.

    PubMed

    Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

    2017-01-01

    Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The optimal combination of feature variables can be decided automatically using multi-class F-score (MF-Score) feature selection. To handle the nonlinear mapping in this real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm combined with a support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier for coal-rock recognition, where the data exhibit inherently complex distributions. The proposed method is examined on UCI data sets and on the caving dataset, and compared with several state-of-the-art SVM classifiers. We conduct experiments with accuracy and the Friedman test for comparison of multiple classifiers over multiple UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show better performance, which points to promising feature selection and multi-class recognition in coal-rock recognition.
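
    A minimal sketch of the feature-selection-plus-SVM part (scikit-learn assumed; the standard ANOVA F-score is used here as a stand-in for the paper's multi-class MF-Score, the MEB component is not reproduced, and the data are a synthetic placeholder for the 10-feature, 3-class caving dataset):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Toy stand-in for the vibration/acoustic caving dataset.
    X, y = make_classification(n_samples=240, n_features=10, n_informative=5,
                               n_classes=3, n_clusters_per_class=1, random_state=0)

    model = make_pipeline(StandardScaler(),
                          SelectKBest(score_func=f_classif, k=5),
                          SVC(kernel="rbf"))
    print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
    ```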

  13. Combination of minimum enclosing balls classifier with SVM in coal-rock recognition

    PubMed Central

    Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

    2017-01-01

    Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The optimal combination of feature variables can be decided automatically using multi-class F-score (MF-Score) feature selection. To handle the nonlinear mapping in this real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm combined with a support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier for coal-rock recognition, where the data exhibit inherently complex distributions. The proposed method is examined on UCI data sets and on the caving dataset, and compared with several state-of-the-art SVM classifiers. We conduct experiments with accuracy and the Friedman test for comparison of multiple classifiers over multiple UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show better performance, which points to promising feature selection and multi-class recognition in coal-rock recognition. PMID:28937987

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eads, Damian Ryan; Rosten, Edward; Helmbold, David

    The authors present BEAMER: a new spatially exploitative approach to learning object detectors which shows excellent results when applied to the task of detecting objects in greyscale aerial imagery in the presence of ambiguous and noisy data. There are four main contributions used to produce these results. First, they introduce a grammar-guided feature extraction system, enabling the exploration of a richer feature space while constraining the features to a useful subset. This is specified with a rule-based generative grammar crafted by a human expert. Second, they learn a classifier on these data using a newly proposed variant of AdaBoost which takes into account the spatially correlated nature of the data. Third, they perform another round of training to optimize the method of converting the pixel classifications generated by boosting into a high-quality set of (x,y) locations. Lastly, they carefully define three common problems in object detection and define two evaluation criteria that are tightly matched to these problems. Major strengths of this approach are: (1) a way of randomly searching a broad feature space, (2) its performance when evaluated on well-matched evaluation criteria, and (3) its use of the location prediction domain to learn object detectors as well as to generate detections that perform well on several tasks: object counting, tracking, and target detection. They demonstrate the efficacy of BEAMER with a comprehensive experimental evaluation on a challenging data set.

  15. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

    PubMed

    Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

    2017-09-01

    Imbalanced classification refers to problems that have an uneven distribution among classes. In addition, when instances are located in the overlapping areas, correct modeling of the problem becomes harder. Current solutions for both issues often focus on the binary case, as multi-class datasets require additional effort to be addressed. In this research, we overcome these problems by combining feature and instance selection. Feature selection simplifies the overlapping areas, easing the generation of rules to distinguish among the classes, while selection of instances from all classes addresses the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embed the search for both parameters in a multi-objective evolutionary algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme provides a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted with several state-of-the-art solutions for imbalanced classification, showing excellent results in both binary and multi-class problems.

  16. TU-D-207B-02: Delta-Radiomics: The Prognostic Value of Therapy-Induced Changes in Radiomics Features for Stage III Non-Small Cell Lung Cancer Patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fave, X; Court, L; UT Health Science Center, Graduate School of Biomedical Sciences, Houston, TX

    Purpose: To determine how radiomics features change during radiation therapy and whether those changes (delta-radiomics features) can improve prognostic models built with clinical factors. Methods: 62 radiomics features, including histogram, co-occurrence, run-length, gray-tone difference, and shape features, were calculated from pretreatment and weekly intra-treatment CTs for 107 stage III NSCLC patients (5–9 images per patient). Image preprocessing for each feature was determined using the set of pretreatment images: bit-depth resampling and/or a smoothing filter were tested for their impact on volume correlation and on the significance of each feature in univariate Cox regression models, to maximize their information content. Next, the optimized features were calculated from the intra-treatment images and tested in linear mixed-effects models to determine which features changed significantly with dose fraction. The slopes of these significant features were defined as delta-radiomics features. To test their prognostic potential, multivariate Cox regression models were fitted, first using only clinical features and then clinical plus delta-radiomics features, for overall survival, local recurrence, and distant metastases. Leave-one-out cross-validation was used for model fitting and patient predictions. Concordance indices (c-index) and p-values for the log-rank test with patients stratified at the median were calculated. Results: Approximately one half of the 62 optimized features required no preprocessing, one fourth required smoothing, and one fourth required smoothing and resampling. Of these, 54 changed significantly during treatment. For overall survival, the c-index improved from 0.52 for clinical factors alone to 0.62 for clinical plus delta-radiomics features. For distant metastases, the c-index improved from 0.53 to 0.58, while for local recurrence it did not improve. Patient stratification improved significantly (p-value < 0.05) for overall survival and distant metastases when delta-radiomics features were included. The delta-radiomics versions of autocorrelation, kurtosis, and compactness were selected most frequently in the leave-one-out iterations. Conclusion: Weekly changes in radiomics features can potentially be used to evaluate treatment response and predict patient outcomes. High-risk patients could be recommended for dose escalation or consolidation chemotherapy. This project was funded in part by grants from the National Cancer Institute (NCI) and the Cancer Prevention Research Institute of Texas (CPRIT).
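
    A minimal sketch of the "delta" computation (NumPy only; the weekly values below are hypothetical toy numbers, not the study's data): for each patient, the slope of a feature across weekly time points is taken as the delta-radiomics feature.

    ```python
    import numpy as np

    def delta_feature(weeks, values):
        """Slope of a radiomics feature over treatment weeks (least-squares fit)."""
        slope, _intercept = np.polyfit(weeks, values, deg=1)
        return slope

    # Hypothetical weekly kurtosis values for two patients (5-9 images per patient).
    patients = {
        "patient_A": ([0, 1, 2, 3, 4, 5], [3.1, 3.0, 2.8, 2.7, 2.5, 2.4]),
        "patient_B": ([0, 1, 2, 3, 4],    [2.9, 2.9, 3.0, 3.1, 3.1]),
    }
    for pid, (weeks, vals) in patients.items():
        print(pid, "delta-kurtosis per week:", round(delta_feature(weeks, vals), 3))
    ```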

  17. An Optimal Mean Based Block Robust Feature Extraction Method to Identify Colorectal Cancer Genes with Integrated Data.

    PubMed

    Liu, Jian; Cheng, Yuhu; Wang, Xuesong; Zhang, Lin; Liu, Hui

    2017-08-17

    It is urgent to diagnose colorectal cancer at an early stage. Some feature genes that are important to colorectal cancer development have been identified. However, for the early stage of colorectal cancer, less is known about the identity of the specific cancer genes that are associated with advanced clinical stage. In this paper, we present a feature extraction method named the Optimal Mean based Block Robust Feature Extraction method (OMBRFE) to identify feature genes associated with advanced clinical-stage colorectal cancer using integrated colorectal cancer data. First, based on the optimal mean and the L2,1-norm, a novel feature extraction method called the Optimal Mean based Robust Feature Extraction method (OMRFE) is proposed to identify feature genes. Then the OMBRFE method, which introduces the block ideology into the OMRFE method, is put forward to process the integrated colorectal cancer data, which include multiple genomic data types: copy number alterations, somatic mutations, methylation expression alterations, and gene expression changes. Experimental results demonstrate that OMBRFE is more effective than previous methods in identifying the feature genes. Moreover, genes identified by OMBRFE are verified to be closely associated with advanced clinical-stage colorectal cancer.
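
    For reference, the L2,1-norm of a matrix is the sum of the Euclidean norms of its rows; it encourages row-sparse solutions and is the regularizer referenced above. A minimal NumPy sketch (only the norm itself, not the authors' full optimization):

    ```python
    import numpy as np

    def l21_norm(W):
        """Sum of the Euclidean (L2) norms of the rows of W."""
        return np.sum(np.sqrt(np.sum(W ** 2, axis=1)))

    W = np.array([[3.0, 4.0],    # row norm 5
                  [0.0, 0.0],    # row norm 0 (a "switched off" feature)
                  [1.0, 2.0]])   # row norm sqrt(5)
    print(l21_norm(W))           # 5 + 0 + 2.236... ~= 7.236
    ```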

  18. Optimizing methods for linking cinematic features to fMRI data.

    PubMed

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neuroscience using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than of story-driven films, new methods need to be developed for the analysis of less story-driven content. To optimize the linkage between our fMRI data, collected during viewing of the deliberately non-narrative silent film 'At Land' by Maya Deren (1944), and its annotated content, we combined elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time series were fitted with the time series of a total of 36 binary-valued annotations and one real-valued tactile annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of the regressors; the results were compared against both partial least-squares (PLS) regression and un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of the regression. We found statistically significant correlation between the annotation model and 9 of the 40 ICs. The regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net-based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved to be a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method lies, in comparison to hypothesis-driven manual pre-selection and observation of individual regressors biased by choice, in applying a data-driven approach to all content features simultaneously. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
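
    A minimal sketch of the regularized regression step (scikit-learn assumed; the regressors and the ROI time series below are hypothetical toy data, not the actual film annotations or fMRI recordings): an ROI time series is fitted against correlated annotation regressors with cross-validated elastic-net penalties.

    ```python
    import numpy as np
    from sklearn.linear_model import ElasticNetCV

    rng = np.random.default_rng(0)
    n_timepoints, n_annotations = 300, 37          # e.g. 36 binary + 1 real-valued regressor
    X = rng.integers(0, 2, size=(n_timepoints, n_annotations)).astype(float)
    X[:, -1] = rng.random(n_timepoints)            # one real-valued annotation
    X[:, 1] = X[:, 0]                              # deliberately collinear regressors
    roi_ts = 0.8 * X[:, 0] - 0.5 * X[:, 5] + 0.1 * rng.standard_normal(n_timepoints)

    model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, roi_ts)
    print("non-zero annotation weights:", np.flatnonzero(model.coef_))
    ```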

  19. MRM-Lasso: A Sparse Multiview Feature Selection Method via Low-Rank Analysis.

    PubMed

    Yang, Wanqi; Gao, Yang; Shi, Yinghuan; Cao, Longbing

    2015-11-01

    Learning from multiview data arises in many applications, such as video understanding, image classification, and social media. However, when the data dimension increases dramatically, it becomes important but very challenging to remove redundant features in multiview feature selection. In this paper, we propose a novel feature selection algorithm, multiview rank minimization-based Lasso (MRM-Lasso), which jointly utilizes Lasso for sparse feature selection and rank minimization for learning relevant patterns across views. Instead of simply integrating multiple Lasso models at the view level, we focus on sample-level performance (sample significance) and introduce pattern-specific weights into MRM-Lasso. The weights are utilized to measure the contribution of each sample to the labels in the current view. In addition, the latent correlation across different views is successfully captured by learning a low-rank matrix consisting of the pattern-specific weights. The alternating direction method of multipliers is applied to optimize the proposed MRM-Lasso. Experiments on four real-life data sets show that features selected by MRM-Lasso have better multiview classification performance than the baselines. Moreover, pattern-specific weights are demonstrated to be significant for learning from multiview data, compared with view-specific weights.

  20. Diagnosing and ranking retinopathy disease level using diabetic fundus image recuperation approach.

    PubMed

    Somasundaram, K; Rajendran, P Alli

    2015-01-01

    Retinal fundus images are widely used in diagnosing different types of eye diseases. Existing methods such as Feature Based Macular Edema Detection (FMED) and the Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudate detection, respectively. However, these methods did not incorporate a more detailed feature selection technique into the system for the detection of diabetic retinopathy. To categorize the exudates, the Diabetic Fundus Image Recuperation (DFIR) method, based on a sliding-window approach, is developed in this work to select the features of the optic cup in digital retinal fundus images. The DFIR feature selection uses a collection of sliding windows with varying range to obtain features based on the histogram value using a Group Sparsity Nonoverlapping Function. Using a support vector model in the second phase, the DFIR method based on a Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a promising result for developing a practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method evaluates factors such as sensitivity, ranking efficiency, and feature selection time.

  1. Neighborhood Structural Similarity Mapping for the Classification of Masses in Mammograms.

    PubMed

    Rabidas, Rinku; Midya, Abhishek; Chakraborty, Jayasree

    2018-05-01

    In this paper, two novel feature extraction methods using neighborhood structural similarity (NSS) are proposed for the characterization of mammographic masses as benign or malignant. Since the gray-level distribution of pixels differs between benign and malignant masses, with more regular and homogeneous patterns visible in benign masses than in malignant ones, the proposed method exploits the similarity between neighboring regions of masses by designing two new features, namely NSS-I and NSS-II, which capture global similarity at different scales. Complementary to these global features, uniform local binary patterns are computed to enhance classification efficiency when combined with the proposed features. The performance of the features is evaluated using images from the mini-mammographic image analysis society (mini-MIAS) and digital database for screening mammography (DDSM) databases, where a tenfold cross-validation technique is incorporated with Fisher linear discriminant analysis after selecting the optimal set of features using a stepwise logistic regression method. The best area under the receiver operating characteristic curve is 0.98 for the mini-MIAS database and 0.93 for the DDSM database.

  2. Diagnosing and Ranking Retinopathy Disease Level Using Diabetic Fundus Image Recuperation Approach

    PubMed Central

    Somasundaram, K.; Alli Rajendran, P.

    2015-01-01

    Retinal fundus images are widely used in diagnosing different types of eye diseases. Existing methods such as Feature Based Macular Edema Detection (FMED) and the Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudate detection, respectively. However, these methods did not incorporate a more detailed feature selection technique into the system for the detection of diabetic retinopathy. To categorize the exudates, the Diabetic Fundus Image Recuperation (DFIR) method, based on a sliding-window approach, is developed in this work to select the features of the optic cup in digital retinal fundus images. The DFIR feature selection uses a collection of sliding windows with varying range to obtain features based on the histogram value using a Group Sparsity Nonoverlapping Function. Using a support vector model in the second phase, the DFIR method based on a Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a promising result for developing a practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method evaluates factors such as sensitivity, ranking efficiency, and feature selection time. PMID:25945362

  3. A programmable optimization environment using the GAMESS-US and MERLIN/MCL packages. Applications on intermolecular interaction energies

    NASA Astrophysics Data System (ADS)

    Kalatzis, Fanis G.; Papageorgiou, Dimitrios G.; Demetropoulos, Ioannis N.

    2006-09-01

    The Merlin/MCL optimization environment and the GAMESS-US package were combined so as to offer an extended and efficient quantum chemistry optimization system, capable of implementing complex optimization strategies for generic molecular modeling problems. A communication and data exchange interface was established between the two packages, exploiting all Merlin features such as multiple optimizers, box constraints, user extensions and a high-level programming language. An important feature of the interface is its ability to perform dimer computations by eliminating the basis set superposition error using the counterpoise (CP) method of Boys and Bernardi. Furthermore, it offers CP-corrected geometry optimizations using analytic derivatives. The unified optimization environment was applied to construct portions of the intermolecular potential energy surface of the weakly bound H-bonded complex C6H6-H2O by utilizing the high-level Merlin Control Language. The H-bonded dimer HF-H2O was also studied by CP-corrected geometry optimization. The ab initio electronic structure energies were calculated using the 6-31G** basis set at the Restricted Hartree-Fock and second-order Møller-Plesset levels, while all geometry optimizations were carried out using a quasi-Newton algorithm provided by Merlin.
    Program summary
    Title of program: MERGAM
    Catalogue identifier: ADYB_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYB_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Computer for which the program is designed and others on which it has been tested: The program is designed for machines running the UNIX operating system. It has been tested on the following architectures: IA32 (Linux with gcc/g77 v.3.2.3), AMD64 (Linux with the Portland group compilers v.6.0), SUN64 (SunOS 5.8 with the Sun Workshop compilers v.5.2) and SGI64 (IRIX 6.5 with the MIPSpro compilers v.7.4)
    Installations: University of Ioannina, Greece
    Operating systems or monitors under which the program has been tested: UNIX
    Programming language used: ANSI C, ANSI Fortran-77
    No. of lines in distributed program, including test data, etc.: 11 282
    No. of bytes in distributed program, including test data, etc.: 49 458
    Distribution format: tar.gz
    Memory required to execute with typical data: Memory requirements mainly depend on the selection of a GAMESS-US basis set and the number of atoms
    No. of bits in a word: 32
    No. of processors used: 1
    Has the code been vectorized or parallelized?: no
    Nature of physical problem: Multidimensional geometry optimization is of great importance in any ab initio calculation since it usually is one of the most CPU-intensive tasks, especially on large molecular systems. For example, the geometric and energetic description of van der Waals and weakly bound H-bonded complexes requires the construction of the related important portions of the multidimensional intermolecular potential energy surface (IPES). In this way, the various held views about the nature of these bonds can be quantitatively tested.
    Method of solution: The Merlin/MCL optimization environment was interconnected with the GAMESS-US package to facilitate geometry optimization in quantum chemistry problems. The important portions of the IPES require the capability to program optimization strategies; the Merlin/MCL environment was used for the implementation of such strategies. In this work, a CP-corrected geometry optimization was performed on the HF-H2O complex and an MCL program was developed to study portions of the potential energy surface of the C6H6-H2O complex.
    Restrictions on the complexity of the problem: The Merlin optimization environment and the GAMESS-US package must be installed. The MERGAM interface requires GAMESS-US input files constructed in Cartesian coordinates. This restriction follows from a design-time requirement not to allow reorientation of atomic coordinates; this rule always holds when the COORD = UNIQUE keyword is applied in a GAMESS-US input file.
    Typical running time: Depends on the size of the molecular system, the size of the basis set and the method of electron correlation. Execution of the test run took approximately 5 min on a 2.8 GHz Intel Pentium CPU.

  4. Driver face recognition as a security and safety feature

    NASA Astrophysics Data System (ADS)

    Vetter, Volker; Giefing, Gerd-Juergen; Mai, Rudolf; Weisser, Hubert

    1995-09-01

    We present a driver face recognition system for comfortable access control and individual settings of automobiles. The primary goals are the prevention of car thefts and heavy accidents caused by unauthorized use (joy-riders), as well as the increase of safety through optimal settings, e.g. of the mirrors and the seat position. The person sitting on the driver's seat is observed automatically by a small video camera in the dashboard. All he has to do is to behave cooperatively, i.e. to look into the camera. A classification system validates his access. Only after a positive identification, the car can be used and the driver-specific environment (e.g. seat position, mirrors, etc.) may be set up to ensure the driver's comfort and safety. The driver identification system has been integrated in a Volkswagen research car. Recognition results are presented.

  5. Automatic Feature Selection and Weighting for the Formation of Homogeneous Groups for Regional Intensity-Duration-Frequency (IDF) Curve Estimation

    NASA Astrophysics Data System (ADS)

    Yang, Z.; Burn, D. H.

    2017-12-01

    Extreme rainfall events can have devastating impacts on society. To quantify the associated risk, IDF curves have been used to provide the essential rainfall-related information for urban planning. However, recent changes in rainfall climatology caused by climate change and urbanization have made the estimates provided by the traditional regional IDF approach increasingly inaccurate. This inaccuracy is mainly caused by two problems: 1) the ineffective choice of similarity indicators for the formation of a homogeneous group in different regions; and 2) an inadequate number of stations in the pooling group, which does not reflect the optimal balance between group size and group homogeneity or achieve the lowest uncertainty in the rainfall quantile estimates. For the first issue, to consider the temporal differences among meteorological and topographic indicators, a three-layer design is proposed based on three stages in extreme rainfall formation: cloud formation, rainfall generation, and change of rainfall intensity above the urban surface. During this process, the impacts of climate change and urbanization are considered through the inclusion of potentially relevant features at each layer. Then, to consider the spatial difference of similarity indicators for homogeneous group formation in various regions, an automatic feature selection and weighting algorithm, specifically a hybrid searching algorithm combining Tabu search, Lagrange multipliers and fuzzy c-means clustering, is used to select the optimal combination of features for forming potential optimal homogeneous groups in a specific region. For the second issue, to compare the uncertainty of rainfall quantile estimates among potential groups, a two-sample Kolmogorov-Smirnov test-based sample ranking process is used. During this process, linear programming is used to rank these groups based on the confidence intervals of the quantile estimates. The proposed methodology fills the gap of including urbanization impacts during pooling group formation, and challenges the traditional assumption that the same set of similarity indicators can be equally effective in generating the optimal homogeneous group for regions with different geographic and meteorological characteristics.

  6. Exploring barriers and facilitators to the clinical use of virtual reality for post-stroke unilateral spatial neglect assessment.

    PubMed

    Ogourtsova, Tatiana; Archambault, Philippe S; Lamontagne, Anouk

    2017-11-07

    Hemineglect, defined as a failure to attend to the contralesional side of space, is a prevalent and disabling post-stroke deficit. Conventional hemineglect assessments lack sensitivity, as they contain mainly non-functional tasks performed in near-extrapersonal space using static, two-dimensional methods. This is of concern given that hemineglect is a strong predictor of functional deterioration, limited post-stroke recovery, and difficulty in community reintegration. With the emerging field of virtual reality, several virtual tools have been proposed and have shown better sensitivity in detecting neglect-related deficits than conventional methods. However, these and future virtual reality-based tools have yet to be implemented in clinical practice. The present study aimed to explore the barriers and facilitators perceived by clinicians in the use of virtual reality for hemineglect assessment, and to identify the features of an optimal virtual assessment. A qualitative descriptive process, in the form of focus groups, a self-administered questionnaire and individual interviews, was used. Two focus groups (n = 11 clinicians) were conducted and experts in the field (n = 3) were individually interviewed. Several barriers and facilitators, including personal, institutional, client-suitability, and equipment factors, were identified. Clinicians and experts in the field reported numerous features for optimization of the virtual tool. The factors identified through this study lay the foundation for the development of a knowledge translation initiative towards the implementation of a virtual assessment for hemineglect. Addressing the identified barriers and facilitators during implementation and incorporating the optimal features in the design of the virtual assessment could assist and promote its eventual adoption in clinical settings. Implications for rehabilitation: A multimodal and active knowledge translation intervention built on the presently identified modifiable factors is suggested to support the clinical integration of a virtual reality-based assessment for post-stroke hemineglect. To amplify the application and usefulness of a virtual reality-based tool in the assessment of post-stroke hemineglect, the optimal features identified in the present study should be incorporated in the design of such technology.

  7. CUDA-based acceleration and BPN-assisted automation of bilateral filtering for brain MR image restoration.

    PubMed

    Chang, Herng-Hua; Chang, Yu-Ning

    2017-04-01

    Bilateral filters have been substantially exploited in numerous magnetic resonance (MR) image restoration applications for decades. Due to the lack of a theoretical basis for setting the filter parameters, empirical manipulation with fixed values and noise-variance-related adjustments has generally been employed. The outcome of these strategies is usually sensitive to variation in the brain structures, and not all three parameter values are optimal. This article investigates the optimal setting of the bilateral filter, from which an accelerated and automated restoration framework is developed. To reduce the computational burden of the bilateral filter, parallel computing with the graphics processing unit (GPU) architecture is first introduced. The NVIDIA Tesla K40c GPU with compute unified device architecture (CUDA) functionality is specifically utilized to emphasize thread usage and memory resources. To correlate the filter parameters with image characteristics for automation, optimal image texture features are subsequently acquired based on the sequential forward floating selection (SFFS) scheme. The selected features are then introduced into a back-propagation network (BPN) model for filter parameter estimation. Finally, the k-fold cross-validation method is adopted to evaluate the accuracy of the proposed filter parameter prediction framework. A wide variety of T1-weighted brain MR images with various noise levels and anatomic structures were utilized to train and validate this new parameter decision system with CUDA-based bilateral filtering. For a common brain MR image volume of 256 × 256 × 256 pixels, the speed-up gain reached 284. Six optimal texture features were acquired and associated with the BPN to establish a high-accuracy parameter prediction system, which achieved a mean absolute percentage error (MAPE) of 5.6%. Automatic restoration results on 2460 brain MR images showed an average relative error in terms of peak signal-to-noise ratio (PSNR) of less than 0.1%. In comparison with many state-of-the-art filters, the proposed automation framework with CUDA-based bilateral filtering provided more favorable results both quantitatively and qualitatively. Possessing unique characteristics and demonstrating exceptional performance, the proposed CUDA-based bilateral filter adequately removed random noise in multifarious brain MR images for further study in the neurosciences and radiological sciences. It requires no prior knowledge of the noise variance and automatically restores MR images while preserving fine details. The strategy of exploiting CUDA to accelerate the computation and incorporating texture features into the BPN to fully automate the bilateral filtering process is achievable and validated, from which the best performance is reached. © 2017 American Association of Physicists in Medicine.
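
    A minimal sketch of the filter itself and its three parameters (OpenCV assumed; the image is a synthetic placeholder, and the GPU acceleration and SFFS/BPN parameter prediction are not reproduced): the window diameter and the two sigma values below are exactly the quantities the framework above seeks to set automatically.

    ```python
    import numpy as np
    import cv2

    rng = np.random.default_rng(0)
    # Toy stand-in for one noisy T1-weighted slice (values in [0, 255]).
    clean = np.tile(np.linspace(50, 200, 256, dtype=np.float32), (256, 1))
    noisy = np.clip(clean + rng.normal(0, 15, clean.shape), 0, 255).astype(np.float32)

    # The three bilateral-filter parameters: window diameter, intensity sigma, spatial sigma.
    restored = cv2.bilateralFilter(noisy, d=9, sigmaColor=40.0, sigmaSpace=5.0)

    mse = float(np.mean((restored - clean) ** 2))
    print(f"PSNR after filtering: {10 * np.log10(255.0 ** 2 / mse):.2f} dB")
    ```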

  8. Optimization of mass spectrometry acquisition parameters for determination of polycarbonate additives, degradation products, and colorants migrating from food contact materials to chocolate.

    PubMed

    Bignardi, Chiara; Cavazza, Antonella; Laganà, Carmen; Salvadeo, Paola; Corradini, Claudio

    2018-01-01

    Interest in "substances of emerging concern" in objects intended to come into contact with food has recently been growing. Such substances can be found in traces in simulants and in food products put in contact with plastic materials. In this context, it is important to set up analytical systems characterized by high sensitivity and to improve detection parameters to enhance signals. This work was aimed at optimizing a method based on UHPLC coupled to high-resolution mass spectrometry to quantify the most common plastic additives and to detect the presence of polymer degradation products and coloring agents migrating from re-usable plastic containers. The optimization of the mass spectrometric parameter settings for quantitative analysis of additives was achieved by a chemometric approach, using full factorial and d-optimal experimental designs, allowing possible interactions between the investigated parameters to be evaluated. Results showed that the optimized method offered improved sensitivity with respect to existing methods and was successfully applied to the analysis of a complex model food system, namely chocolate put in contact with 14 polycarbonate tableware samples. A new procedure for sample pre-treatment was developed and validated, showing high reliability. Results reported, for the first time, the presence of several molecules migrating into chocolate, in particular plastic additives such as Cyasorb UV5411, Tinuvin 234, Uvitex OB, and oligomers, whose amount was found to be correlated with the age and degree of damage of the containers. Copyright © 2017 John Wiley & Sons, Ltd.

  9. Framework GRASP: routine library for optimize processing of aerosol remote sensing observation

    NASA Astrophysics Data System (ADS)

    Fuertes, David; Torres, Benjamin; Dubovik, Oleg; Litvinov, Pavel; Lapyonok, Tatyana; Ducos, Fabrice; Aspetsberger, Michael; Federspiel, Christian

    We present the development of a framework for the Generalized Retrieval of Aerosol and Surface Properties (GRASP) developed by Dubovik et al. (2011). The framework is a source-code project that attempts to strengthen the value of the GRASP inversion algorithm by transforming it into a library that is then used by a group of customized application modules. The functions of the independent modules include managing the configuration of the code execution, as well as preparation of the input and output. The framework provides a number of advantages for utilization of the code. First, it loads data into the core of the scientific code directly from memory, without passing through intermediary files on disk. Second, the framework allows consecutive use of the inversion code without re-initiation of the core routine when new input is received. These features are essential for optimizing the performance of data production when processing large observation sets, such as satellite images, with GRASP. Furthermore, the framework is a very convenient tool for further development, because this open-source platform is easily extended with new features. For example, it could accommodate loading of raw data onto the inversion code directly from a specific instrument not included in the default settings of the software. Finally, it will be demonstrated that, from the user's point of view, the framework provides a flexible, powerful and informative configuration system.

  10. An optimal transportation approach for nuclear structure-based pathology.

    PubMed

    Wang, Wei; Ozolek, John A; Slepčev, Dejan; Lee, Ann B; Chen, Cheng; Rohde, Gustavo K

    2011-03-01

    Nuclear morphology and structure as visualized from histopathology microscopy images can yield important diagnostic clues in some benign and malignant tissue lesions. Precise quantitative information about nuclear structure and morphology, however, is currently not available for many diagnostic challenges. This is due, in part, to the lack of methods to quantify these differences from image data. We describe a method to characterize and contrast the distribution of nuclear structure in different tissue classes (normal, benign, cancer, etc.). The approach is based on quantifying chromatin morphology in different groups of cells using the optimal transportation (Kantorovich-Wasserstein) metric in combination with the Fisher discriminant analysis and multidimensional scaling techniques. We show that the optimal transportation metric is able to measure relevant biological information as it enables automatic determination of the class (e.g., normal versus cancer) of a set of nuclei. We show that the classification accuracies obtained using this metric are, on average, as good or better than those obtained utilizing a set of previously described numerical features. We apply our methods to two diagnostic challenges for surgical pathology: one in the liver and one in the thyroid. Results automatically computed using this technique show potentially biologically relevant differences in nuclear structure in liver and thyroid cancers.
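
    A minimal sketch of the core metric (SciPy assumed; one-dimensional intensity samples are used here for simplicity and are hypothetical, whereas the paper works with full chromatin distributions and combines the metric with Fisher discriminant analysis and multidimensional scaling): the 1-D optimal transportation (Wasserstein) distance between two groups of nuclear intensity values.

    ```python
    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    # Hypothetical chromatin intensity samples from two groups of nuclei.
    normal_nuclei = rng.normal(loc=0.45, scale=0.05, size=500)
    cancer_nuclei = rng.normal(loc=0.60, scale=0.10, size=500)

    d_between = wasserstein_distance(normal_nuclei, cancer_nuclei)
    d_within = wasserstein_distance(normal_nuclei, rng.normal(0.45, 0.05, 500))
    print(f"between-class distance: {d_between:.3f}, within-class distance: {d_within:.3f}")
    ```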

  11. An optimal transportation approach for nuclear structure-based pathology

    PubMed Central

    Wang, Wei; Ozolek, John A.; Slepčev, Dejan; Lee, Ann B.; Chen, Cheng; Rohde, Gustavo K.

    2012-01-01

    Nuclear morphology and structure as visualized from histopathology microscopy images can yield important diagnostic clues in some benign and malignant tissue lesions. Precise quantitative information about nuclear structure and morphology, however, is currently not available for many diagnostic challenges. This is due, in part, to the lack of methods to quantify these differences from image data. We describe a method to characterize and contrast the distribution of nuclear structure in different tissue classes (normal, benign, cancer, etc.). The approach is based on quantifying chromatin morphology in different groups of cells using the optimal transportation (Kantorovich-Wasserstein) metric in combination with the Fisher discriminant analysis and multidimensional scaling techniques. We show that the optimal transportation metric is able to measure relevant biological information as it enables automatic determination of the class (e.g. normal vs. cancer) of a set of nuclei. We show that the classification accuracies obtained using this metric are, on average, as good or better than those obtained utilizing a set of previously described numerical features. We apply our methods to two diagnostic challenges for surgical pathology: one in the liver and one in the thyroid. Results automatically computed using this technique show potentially biologically relevant differences in nuclear structure in liver and thyroid cancers. PMID:20977984

  12. Multiple-Objective Optimal Designs for Studying the Dose Response Function and Interesting Dose Levels

    PubMed Central

    Hyun, Seung Won; Wong, Weng Kee

    2016-01-01

    We construct an optimal design to simultaneously estimate three common interesting features in a dose-finding trial with possibly different emphasis on each feature. These features are (1) the shape of the dose-response curve, (2) the median effective dose and (3) the minimum effective dose level. A main difficulty of this task is that an optimal design for a single objective may not perform well for other objectives. There are optimal designs for dual objectives in the literature but we were unable to find optimal designs for 3 or more objectives to date with a concrete application. A reason for this is that the approach for finding a dual-objective optimal design does not work well for a 3 or more multiple-objective design problem. We propose a method for finding multiple-objective optimal designs that estimate the three features with user-specified higher efficiencies for the more important objectives. We use the flexible 4-parameter logistic model to illustrate the methodology but our approach is applicable to find multiple-objective optimal designs for other types of objectives and models. We also investigate robustness properties of multiple-objective optimal designs to mis-specification in the nominal parameter values and to a variation in the optimality criterion. We also provide computer code for generating tailor made multiple-objective optimal designs. PMID:26565557
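
    As a hedged illustration of the three features targeted above (shape of the dose-response curve, median effective dose, minimum effective dose), the sketch below fits a 4-parameter sigmoid Emax model to simulated dose-response data; the dose levels, parameter values and clinically relevant effect size are assumptions, and the optimal-design construction itself is not reproduced.

    import numpy as np
    from scipy.optimize import curve_fit

    def emax4(d, e0, emax, ed50, h):
        # 4-parameter sigmoid Emax (logistic-in-log-dose) dose-response model
        return e0 + emax * d**h / (ed50**h + d**h)

    doses = np.array([0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0])
    rng = np.random.default_rng(7)
    true = (1.0, 8.0, 1.5, 1.2)                              # assumed "true" parameters
    y = emax4(doses, *true) + rng.normal(0, 0.3, doses.size)

    params, _ = curve_fit(emax4, doses, y, p0=(1.0, 8.0, 1.0, 1.0), maxfev=10000)
    e0, emax, ed50, h = params
    delta = 2.0                                              # assumed clinically relevant effect over placebo
    med = ed50 * (delta / (emax - delta)) ** (1.0 / h)       # smallest dose giving that effect
    print("fitted parameters:", np.round(params, 3))
    print("ED50 (median effective dose):", round(ed50, 3))
    print("MED for a", delta, "unit effect:", round(med, 3))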

  13. Multiple-Objective Optimal Designs for Studying the Dose Response Function and Interesting Dose Levels.

    PubMed

    Hyun, Seung Won; Wong, Weng Kee

    2015-11-01

    We construct an optimal design to simultaneously estimate three common interesting features in a dose-finding trial with possibly different emphasis on each feature. These features are (1) the shape of the dose-response curve, (2) the median effective dose and (3) the minimum effective dose level. A main difficulty of this task is that an optimal design for a single objective may not perform well for other objectives. There are optimal designs for dual objectives in the literature but we were unable to find optimal designs for 3 or more objectives to date with a concrete application. A reason for this is that the approach for finding a dual-objective optimal design does not work well for a 3 or more multiple-objective design problem. We propose a method for finding multiple-objective optimal designs that estimate the three features with user-specified higher efficiencies for the more important objectives. We use the flexible 4-parameter logistic model to illustrate the methodology but our approach is applicable to find multiple-objective optimal designs for other types of objectives and models. We also investigate robustness properties of multiple-objective optimal designs to mis-specification in the nominal parameter values and to a variation in the optimality criterion. We also provide computer code for generating tailor made multiple-objective optimal designs.

  14. A Comparative Theoretical and Computational Study on Robust Counterpart Optimization: I. Robust Linear Optimization and Robust Mixed Integer Linear Optimization

    PubMed Central

    Li, Zukui; Ding, Ran; Floudas, Christodoulos A.

    2011-01-01

    Robust counterpart optimization techniques for linear optimization and mixed integer linear optimization problems are studied in this paper. Different uncertainty sets, including those studied in literature (i.e., interval set; combined interval and ellipsoidal set; combined interval and polyhedral set) and new ones (i.e., adjustable box; pure ellipsoidal; pure polyhedral; combined interval, ellipsoidal, and polyhedral set) are studied in this work and their geometric relationship is discussed. For uncertainty in the left hand side, right hand side, and objective function of the optimization problems, robust counterpart optimization formulations induced by those different uncertainty sets are derived. Numerical studies are performed to compare the solutions of the robust counterpart optimization models and applications in refinery production planning and batch process scheduling problem are presented. PMID:21935263
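
    As a hedged illustration of the interval (box) uncertainty set discussed above, the sketch below forms the robust counterpart of a small linear program: with nonnegative variables, each uncertain constraint coefficient is protected by replacing it with its worst-case (upper) value. The numbers are illustrative only.

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([-3.0, -2.0])            # maximize 3*x1 + 2*x2 (linprog minimizes, hence the sign flip)
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])            # nominal constraint coefficients
    d = 0.1 * np.abs(A)                   # assumed +/-10% interval uncertainty on each coefficient
    b = np.array([10.0, 15.0])

    nominal = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
    robust = linprog(c, A_ub=A + d, b_ub=b, bounds=[(0, None)] * 2)   # interval-set robust counterpart
    print("nominal solution:", nominal.x, "objective:", -nominal.fun)
    print("robust  solution:", robust.x,  "objective:", -robust.fun)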

  15. FSMRank: feature selection algorithm for learning to rank.

    PubMed

    Lai, Han-Jiang; Pan, Yan; Tang, Yong; Yu, Rong

    2013-06-01

    In recent years, there has been growing interest in learning to rank. The introduction of feature selection into different learning problems has been proven effective. These facts motivate us to investigate the problem of feature selection for learning to rank. We propose a joint convex optimization formulation which minimizes ranking errors while simultaneously conducting feature selection. This optimization formulation provides a flexible framework in which we can easily incorporate various importance measures and similarity measures of the features. To solve this optimization problem, we use Nesterov's approach to derive an accelerated gradient algorithm with a fast convergence rate of O(1/T²). We further develop a generalization bound for the proposed optimization problem using the Rademacher complexities. Extensive experimental evaluations are conducted on the public LETOR benchmark datasets. The results demonstrate that the proposed method shows: 1) significant ranking performance gain compared to several feature selection baselines for ranking, and 2) very competitive performance compared to several state-of-the-art learning-to-rank algorithms.
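
    The sketch below illustrates a Nesterov-type accelerated gradient scheme with the O(1/T²) rate referenced above, applied to a plain smooth least-squares objective; the actual FSMRank objective couples ranking losses with feature-selection regularizers and is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 20))
    b = rng.normal(size=200)

    def grad(w):
        # gradient of the smooth objective f(w) = (1/2n) * ||A w - b||^2
        return A.T @ (A @ w - b) / len(b)

    L = np.linalg.norm(A, 2) ** 2 / len(b)       # Lipschitz constant of the gradient
    w = np.zeros(20)
    y, t = w.copy(), 1.0
    for _ in range(300):
        w_next = y - grad(y) / L                 # gradient step from the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = w_next + ((t - 1.0) / t_next) * (w_next - w)   # Nesterov momentum extrapolation
        w, t = w_next, t_next
    print("final objective:", 0.5 * np.mean((A @ w - b) ** 2))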

  16. Pareto-optimal multi-objective dimensionality reduction deep auto-encoder for mammography classification.

    PubMed

    Taghanaki, Saeid Asgari; Kawahara, Jeremy; Miles, Brandon; Hamarneh, Ghassan

    2017-07-01

    Feature reduction is an essential stage in computer aided breast cancer diagnosis systems. Multilayer neural networks can be trained to extract relevant features by encoding high-dimensional data into low-dimensional codes. Optimizing traditional auto-encoders works well only if the initial weights are close to a proper solution. They are also trained to only reduce the mean squared reconstruction error (MRE) between the encoder inputs and the decoder outputs, but do not address the classification error. The goal of the current work is to test the hypothesis that extending traditional auto-encoders (which only minimize reconstruction error) to multi-objective optimization for finding Pareto-optimal solutions provides more discriminative features that will improve classification performance when compared to single-objective and other multi-objective approaches (i.e. scalarized and sequential). In this paper, we introduce a novel multi-objective optimization of deep auto-encoder networks, in which the auto-encoder optimizes two objectives: MRE and mean classification error (MCE) for Pareto-optimal solutions, rather than just MRE. These two objectives are optimized simultaneously by a non-dominated sorting genetic algorithm. We tested our method on 949 X-ray mammograms categorized into 12 classes. The results show that the features identified by the proposed algorithm allow a classification accuracy of up to 98.45%, demonstrating favourable accuracy over the results of state-of-the-art methods reported in the literature. We conclude that adding the classification objective to the traditional auto-encoder objective and optimizing for finding Pareto-optimal solutions, using evolutionary multi-objective optimization, results in producing more discriminative features. Copyright © 2017 Elsevier B.V. All rights reserved.
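
    As a hedged illustration of the Pareto-dominance idea above, the sketch below selects the non-dominated set from candidate networks scored by the two objectives (MRE and MCE); the evolutionary search and the auto-encoder training themselves are not reproduced, and the scores are synthetic.

    import numpy as np

    def pareto_front(points):
        """Indices of points not dominated by any other point (both objectives minimized)."""
        idx = []
        for i, p in enumerate(points):
            dominated = any(
                np.all(q <= p) and np.any(q < p) for j, q in enumerate(points) if j != i
            )
            if not dominated:
                idx.append(i)
        return idx

    rng = np.random.default_rng(2)
    scores = rng.random((50, 2))        # columns: (MRE, MCE) for 50 hypothetical candidate networks
    front = pareto_front(scores)
    print("non-dominated candidates:", front)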

  17. Ant-cuckoo colony optimization for feature selection in digital mammogram.

    PubMed

    Jona, J B; Nagaveni, N

    2014-01-15

    Digital mammography is the only effective screening method to detect breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. Not all of these features are essential for classifying the mammogram; therefore, identifying the relevant features is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that the ants walk along paths where the pheromone density is high, which slows the whole process; hence CS is employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used along with the ACO to distinguish normal mammograms from abnormal ones. Experiments are conducted on the miniMIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.
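
    A hedged sketch of the fitness evaluation that a wrapper metaheuristic such as the Ant-Cuckoo search above would repeatedly call: the cross-validated accuracy of an RBF-kernel SVM trained on a candidate feature subset. Synthetic data stand in for the GLCM texture features, and the candidate subset is an assumption.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20, n_informative=6, random_state=0)

    def subset_fitness(mask):
        """Mean CV accuracy of an RBF SVM trained on the features selected by the boolean mask."""
        if not mask.any():
            return 0.0
        return cross_val_score(SVC(kernel="rbf", gamma="scale"), X[:, mask], y, cv=5).mean()

    candidate = np.zeros(20, dtype=bool)
    candidate[[0, 3, 7, 11]] = True      # one hypothetical ant/cuckoo solution
    print("fitness:", subset_fitness(candidate))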

  18. Two-speed phacoemulsification for soft cataracts using optimized parameters and procedure step toolbar with the CENTURION Vision System and Balanced Tip.

    PubMed

    Davison, James A

    2015-01-01

    To present a cause of posterior capsule aspiration and a technique using optimized parameters to prevent it from happening when operating soft cataracts. A prospective list of posterior capsule aspiration cases was kept over 4,062 consecutive cases operated with the Alcon CENTURION machine and Balanced Tip. Video analysis of one case of posterior capsule aspiration was accomplished. A surgical technique was developed using empirically derived machine parameters and customized setting-selection procedure step toolbar to reduce the pace of aspiration of soft nuclear quadrants in order to prevent capsule aspiration. Two cases out of 3,238 experienced posterior capsule aspiration before use of the soft quadrant technique. Video analysis showed an attractive vortex effect with capsule aspiration occurring in 1/5 of a second. A soft quadrant removal setting was empirically derived which had a slower pace and seemed more controlled with no capsule aspiration occurring in the subsequent 824 cases. The setting featured simultaneous linear control from zero to preset maximums for: aspiration flow, 20 mL/min; and vacuum, 400 mmHg, with the addition of torsional tip amplitude up to 20% after the fluidic maximums were achieved. A new setting selection procedure step toolbar was created to increase intraoperative flexibility by providing instantaneous shifting between the soft and normal settings. A technique incorporating a reduced pace for soft quadrant acquisition and aspiration can be accomplished through the use of a dedicated setting of integrated machine parameters. Toolbar placement of the procedure button next to the normal setting procedure button provides the opportunity to instantaneously alternate between the two settings. Simultaneous surgeon control over vacuum, aspiration flow, and torsional tip motion may make removal of soft nuclear quadrants more efficient and safer.

  19. Automated pixel-wise brain tissue segmentation of diffusion-weighted images via machine learning.

    PubMed

    Ciritsis, Alexander; Boss, Andreas; Rossi, Cristina

    2018-04-26

    The diffusion-weighted (DW) MR signal sampled over a wide range of b-values potentially allows for tissue differentiation in terms of cellularity, microstructure, perfusion, and T2 relaxivity. This study aimed to implement a machine learning algorithm for automatic brain tissue segmentation from DW-MRI datasets, and to determine the optimal subset of features for accurate segmentation. DWI was performed at 3 T in eight healthy volunteers using 15 b-values and 20 diffusion-encoding directions. The pixel-wise signal attenuation, as well as the trace and fractional anisotropy (FA) of the diffusion tensor, were used as features to train a support vector machine classifier for gray matter, white matter, and cerebrospinal fluid classes. The datasets of two volunteers were used for validation. For each subject, tissue classification was also performed on 3D T1-weighted data sets with a probabilistic framework. Confusion matrices were generated for quantitative assessment of image classification accuracy in comparison with the reference method. DWI-based tissue segmentation resulted in an accuracy of 82.1% on the validation dataset and of 82.2% on the training dataset, excluding relevant model over-fitting. A mean Dice coefficient (DSC) of 0.79 ± 0.08 was found. About 50% of the classification performance was attributable to five features (i.e. the signal measured at b-values of 5/10/500/1200 s/mm² and the FA). This reduced set of features led to almost identical performances for the validation (82.2%) and the training (81.4%) datasets (DSC = 0.79 ± 0.08). Machine learning techniques applied to DWI data allow for accurate brain tissue segmentation based on both morphological and functional information. Copyright © 2018 John Wiley & Sons, Ltd.
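
    A hedged sketch of the Dice similarity coefficient (DSC) used above to compare the DWI-based segmentation against the reference labels, computed per tissue class on toy label maps; the images and class codes below are assumptions.

    import numpy as np

    def dice(seg_a, seg_b, label):
        a, b = (seg_a == label), (seg_b == label)
        denom = a.sum() + b.sum()
        return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

    rng = np.random.default_rng(3)
    reference = rng.integers(0, 3, size=(64, 64))                 # toy labels: 0=CSF, 1=GM, 2=WM
    predicted = reference.copy()
    predicted[rng.random((64, 64)) < 0.1] = rng.integers(0, 3)    # corrupt roughly 10% of voxels
    print([round(dice(reference, predicted, k), 3) for k in range(3)])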

  20. QSAR models for thiophene and imidazopyridine derivatives inhibitors of the Polo-Like Kinase 1.

    PubMed

    Comelli, Nieves C; Duchowicz, Pablo R; Castro, Eduardo A

    2014-10-01

    The inhibitory activity of 103 thiophene and 33 imidazopyridine derivatives against Polo-Like Kinase 1 (PLK1), expressed as pIC50 (-logIC50), was predicted by QSAR modeling. Multivariate linear regression (MLR) was employed to model the relationship between 0D and 3D molecular descriptors and the biological activities of the molecules, using the replacement method (MR) as a variable selection tool. The 136 compounds were separated into several training and test sets. Two splitting approaches, distribution of biological data and structural diversity, and the statistical experimental design procedure D-optimal distance were applied to the dataset. The significance of the training set models was confirmed by statistically higher values of the internal leave-one-out cross-validated coefficient of determination (Q2) and the external predictive coefficient of determination for the test set (Rtest2). The model developed from a training set obtained with the D-optimal distance protocol, using the 3D descriptor space along with activity values, separated chemical features that allowed high and low pIC50 values to be distinguished reasonably well. We then verified that such a model was sufficient to reliably and accurately predict the activity of external diverse structures. The model robustness was properly characterized by means of standard procedures and the applicability domain (AD) was analyzed by the leverage method. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. Heliocentric interplanetary low thrust trajectory optimization program, supplement 1, part 2

    NASA Technical Reports Server (NTRS)

    Mann, F. I.; Horsewood, J. L.

    1978-01-01

    The improvements made to the HILTOP electric propulsion trajectory computer program are described. A more realistic propulsion system model was implemented in which various thrust subsystem efficiencies and specific impulse are modeled as variable functions of power available to the propulsion system. The number of operating thrusters is staged, and the beam voltage is selected from a set of five (or fewer) constant voltages, based upon the application of variational calculus. The constant beam voltages may be optimized individually or collectively. The propulsion system logic is activated by a single program input key in such a manner as to preserve the HILTOP logic. An analysis describing these features, a complete description of program input quantities, and sample cases of computer output illustrating the program capabilities are presented.

  2. The Affective Core of Emotion: Linking Pleasure, Subjective Well-Being, and Optimal Metastability in the Brain

    PubMed Central

    Kringelbach, Morten L.; Berridge, Kent C.

    2017-01-01

    Arguably, emotion is always valenced—either pleasant or unpleasant—and dependent on the pleasure system. This system serves adaptive evolutionary functions; relying on separable wanting, liking, and learning neural mechanisms mediated by mesocorticolimbic networks driving pleasure cycles with appetitive, consummatory, and satiation phases. Liking is generated in a small set of discrete hedonic hotspots and coldspots, while wanting is linked to dopamine and to larger distributed brain networks. Breakdown of the pleasure system can lead to anhedonia and other features of affective disorders. Eudaimonia and well-being are difficult to study empirically, yet whole-brain computational models could offer novel insights (e.g., routes to eudaimonia such as caregiving of infants or music) potentially linking eudaimonia to optimal metastability in the pleasure system. PMID:28943891

  3. An ICA-based method for the identification of optimal FMRI features and components using combined group-discriminative techniques

    PubMed Central

    Sui, Jing; Adali, Tülay; Pearlson, Godfrey D.; Calhoun, Vince D.

    2013-01-01

    Extraction of relevant features from multitask functional MRI (fMRI) data in order to identify potential biomarkers for disease, is an attractive goal. In this paper, we introduce a novel feature-based framework, which is sensitive and accurate in detecting group differences (e.g. controls vs. patients) by proposing three key ideas. First, we integrate two goal-directed techniques: coefficient-constrained independent component analysis (CC-ICA) and principal component analysis with reference (PCA-R), both of which improve sensitivity to group differences. Secondly, an automated artifact-removal method is developed for selecting components of interest derived from CC-ICA, with an average accuracy of 91%. Finally, we propose a strategy for optimal feature/component selection, aiming to identify optimal group-discriminative brain networks as well as the tasks within which these circuits are engaged. The group-discriminating performance is evaluated on 15 fMRI feature combinations (5 single features and 10 joint features) collected from 28 healthy control subjects and 25 schizophrenia patients. Results show that a feature from a sensorimotor task and a joint feature from a Sternberg working memory (probe) task and an auditory oddball (target) task are the top two feature combinations distinguishing groups. We identified three optimal features that best separate patients from controls, including brain networks consisting of temporal lobe, default mode and occipital lobe circuits, which when grouped together provide improved capability in classifying group membership. The proposed framework provides a general approach for selecting optimal brain networks which may serve as potential biomarkers of several brain diseases and thus has wide applicability in the neuroimaging research community. PMID:19457398

  4. SU-C-207B-05: Tissue Segmentation of Computed Tomography Images Using a Random Forest Algorithm: A Feasibility Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polan, D; Brady, S; Kaufman, R

    2016-06-15

    Purpose: Develop an automated Random Forest algorithm for tissue segmentation of CT examinations. Methods: Seven materials were classified for segmentation: background, lung/internal gas, fat, muscle, solid organ parenchyma, blood/contrast, and bone, using Matlab and the Trainable Weka Segmentation (TWS) plugin of FIJI. The following classifier feature filters of TWS were investigated: minimum, maximum, mean, and variance, each evaluated over a pixel radius of 2^n (n = 0–4). Noise-reduction and edge-preserving filters (Gaussian, bilateral, Kuwahara, and anisotropic diffusion) were also evaluated. The algorithm used 200 trees with 2 features per node. A training data set was established using an anonymized patient's (male, 20 yr, 72 kg) chest-abdomen-pelvis CT examination. To establish segmentation ground truth, the training data were manually segmented using Eclipse planning software, and an intra-observer reproducibility test was conducted. Six additional patient data sets were segmented based on classifier data generated from the training data. Accuracy of segmentation was determined by calculating the Dice similarity coefficient (DSC) between manual and auto-segmented images. Results: The optimized auto-segmentation algorithm resulted in 16 features calculated using maximum, mean, variance, and Gaussian blur filters with kernel radii of 1, 2, and 4 pixels, in addition to the original CT number and a Kuwahara filter (linear kernel of 19 pixels). Ground truth had a DSC of 0.94 (range: 0.90–0.99) for adult and 0.92 (range: 0.85–0.99) for pediatric data sets across all seven segmentation classes. The automated algorithm produced segmentation with an average DSC of 0.85 ± 0.04 (range: 0.81–1.00) for the adult patients, and 0.86 ± 0.03 (range: 0.80–0.99) for the pediatric patients. Conclusion: The TWS Random Forest auto-segmentation algorithm was optimized for the CT environment, and able to segment seven material classes over a range of body habitus and CT protocol parameters with an average DSC of 0.86 ± 0.04 (range: 0.80–0.99).
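
    A hedged sketch of the pixel-wise classification idea above: a small filter bank (original value, local mean, local variance, Gaussian blur) feeds a random forest. The study itself uses the Trainable Weka Segmentation plugin on CT data, so the image, filter choices and labels below are toy stand-ins.

    import numpy as np
    from scipy import ndimage
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(4)
    image = rng.normal(size=(64, 64))
    image[16:48, 16:48] += 3.0                     # toy "organ" on a noisy background
    truth = np.zeros_like(image, dtype=int)
    truth[16:48, 16:48] = 1

    def features(img):
        mean = ndimage.uniform_filter(img, size=5)                 # local mean
        var = ndimage.uniform_filter(img ** 2, size=5) - mean ** 2 # local variance
        blur = ndimage.gaussian_filter(img, sigma=2)               # Gaussian blur
        return np.stack([img, mean, var, blur], axis=-1).reshape(-1, 4)

    X, y = features(image), truth.ravel()
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    pred = clf.predict(features(image)).reshape(image.shape)
    print("training accuracy:", (pred == truth).mean())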

  5. Implications of crater distributions on Venus

    NASA Technical Reports Server (NTRS)

    Kaula, W. M.

    1993-01-01

    The horizontal locations of craters on Venus are consistent with randomness. However, (1) randomness does not make crater counts useless for age indications; (2) consistency does not imply necessity or optimality; and (3) horizontal location is not the only reference frame against which to test models. Re (1), the apparent smallness of resurfacing areas means that a region on the order of one percent of the planet with a typical number of craters, 5-15, will have a range of feature ages of several 100 My. Re (2), models of resurfacing somewhat similar to Earth's can be found that are also consistent and more optimal than random: i.e., resurfacing occurring in clusters, that arise and die away in time intervals on the order of 50 My. These agree with the observation that there are more areas of high crater density, and fewer of moderate density, than optimal for random. Re (3), 799 crater elevations were tested; there are more at low elevations and fewer at high elevations than optimal for random: i.e., 54.6 percent below the median. Only one of 40 random sets of 799 was as extreme.

  6. Optimized spatial priorities for biodiversity conservation in China: a systematic conservation planning perspective.

    PubMed

    Wu, Ruidong; Long, Yongcheng; Malanson, George P; Garber, Paul A; Zhang, Shuang; Li, Diqiang; Zhao, Peng; Wang, Longzhu; Duo, Hairui

    2014-01-01

    By addressing several key features overlooked in previous studies, i.e. human disturbance, integration of ecosystem- and species-level conservation features, and principles of complementarity and representativeness, we present the first national-scale systematic conservation planning for China to determine the optimized spatial priorities for biodiversity conservation. We compiled a spatial database on the distributions of ecosystem- and species-level conservation features, and modeled a human disturbance index (HDI) by aggregating information using several socioeconomic proxies. We ran Marxan with two scenarios (HDI-ignored and HDI-considered) to investigate the effects of human disturbance, and explored the geographic patterns of the optimized spatial conservation priorities. Compared to when HDI was ignored, the HDI-considered scenario resulted in (1) a marked reduction (∼9%) in the total HDI score and a slight increase (∼7%) in the total area of the portfolio of priority units, (2) a significant increase (∼43%) in the total irreplaceable area and (3) more irreplaceable units being identified in almost all environmental zones and highly-disturbed provinces. Thus the inclusion of human disturbance is essential for cost-effective priority-setting. Attention should be targeted to the areas that are characterized as moderately-disturbed, <2,000 m in altitude, and/or intermediately- to extremely-rugged in terrain to identify potentially important regions for implementing cost-effective conservation. We delineated 23 primary large-scale priority areas that are significant for conserving China's biodiversity, but those isolated priority units in disturbed regions are in more urgent need of conservation actions so as to prevent immediate and severe biodiversity loss. This study presents a spatially optimized national-scale portfolio of conservation priorities--effectively representing the overall biodiversity of China while minimizing conflicts with economic development. Our results offer critical insights for current conservation and strategic land-use planning in China. The approach is transferable and easy to implement by end-users, and applicable for national- and local-scale systematic conservation prioritization practices.

  7. Optimized Spatial Priorities for Biodiversity Conservation in China: A Systematic Conservation Planning Perspective

    PubMed Central

    Wu, Ruidong; Long, Yongcheng; Malanson, George P.; Garber, Paul A.; Zhang, Shuang; Li, Diqiang; Zhao, Peng; Wang, Longzhu; Duo, Hairui

    2014-01-01

    By addressing several key features overlooked in previous studies, i.e. human disturbance, integration of ecosystem- and species-level conservation features, and principles of complementarity and representativeness, we present the first national-scale systematic conservation planning for China to determine the optimized spatial priorities for biodiversity conservation. We compiled a spatial database on the distributions of ecosystem- and species-level conservation features, and modeled a human disturbance index (HDI) by aggregating information using several socioeconomic proxies. We ran Marxan with two scenarios (HDI-ignored and HDI-considered) to investigate the effects of human disturbance, and explored the geographic patterns of the optimized spatial conservation priorities. Compared to when HDI was ignored, the HDI-considered scenario resulted in (1) a marked reduction (∼9%) in the total HDI score and a slight increase (∼7%) in the total area of the portfolio of priority units, (2) a significant increase (∼43%) in the total irreplaceable area and (3) more irreplaceable units being identified in almost all environmental zones and highly-disturbed provinces. Thus the inclusion of human disturbance is essential for cost-effective priority-setting. Attention should be targeted to the areas that are characterized as moderately-disturbed, <2,000 m in altitude, and/or intermediately- to extremely-rugged in terrain to identify potentially important regions for implementing cost-effective conservation. We delineated 23 primary large-scale priority areas that are significant for conserving China's biodiversity, but those isolated priority units in disturbed regions are in more urgent need of conservation actions so as to prevent immediate and severe biodiversity loss. This study presents a spatially optimized national-scale portfolio of conservation priorities – effectively representing the overall biodiversity of China while minimizing conflicts with economic development. Our results offer critical insights for current conservation and strategic land-use planning in China. The approach is transferable and easy to implement by end-users, and applicable for national- and local-scale systematic conservation prioritization practices. PMID:25072933

  8. Utilizing spatial and spectral features of photoacoustic imaging for ovarian cancer detection and diagnosis

    NASA Astrophysics Data System (ADS)

    Li, Hai; Kumavor, Patrick; Salman Alqasemi, Umar; Zhu, Quing

    2015-01-01

    A composite set of ovarian tissue features extracted from photoacoustic spectral data, beam envelope, and co-registered ultrasound and photoacoustic images are used to characterize malignant and normal ovaries using logistic and support vector machine (SVM) classifiers. Normalized power spectra were calculated from the Fourier transform of the photoacoustic beamformed data, from which the spectral slopes and 0-MHz intercepts were extracted. Five features were extracted from the beam envelope and another 10 features were extracted from the photoacoustic images. These 17 features were ranked by their p-values from t-tests, and a filter-type feature selection method was used to determine the optimal number of features for the final classification. A total of 169 samples from 19 ex vivo ovaries were randomly distributed into training and testing groups. Both classifiers achieved a minimum value of the mean misclassification error when the seven features with the lowest p-values were selected. Using these seven features, the logistic and SVM classifiers obtained sensitivities of 96.39±3.35% and 97.82±2.26%, and specificities of 98.92±1.39% and 100%, respectively, for the training group. For the testing group, the logistic and SVM classifiers achieved sensitivities of 92.71±3.55% and 92.64±3.27%, and specificities of 87.52±8.78% and 98.49±2.05%, respectively.
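
    A hedged sketch of the filter-style selection described above: features are ranked by two-sample t-test p-values and the lowest-p subset is passed to logistic and SVM classifiers. The data here are synthetic stand-ins for the 17 photoacoustic/ultrasound features.

    import numpy as np
    from scipy.stats import ttest_ind
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=169, n_features=17, n_informative=7, random_state=0)
    pvals = np.array([ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue for j in range(X.shape[1])])
    top7 = np.argsort(pvals)[:7]                    # the seven lowest-p features

    for name, clf in [("logistic", LogisticRegression(max_iter=1000)), ("SVM", SVC())]:
        acc = cross_val_score(clf, X[:, top7], y, cv=5).mean()
        print(name, "cross-validated accuracy:", round(acc, 3))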

  9. A multistage approach to improve performance of computer-aided detection of pulmonary embolisms depicted on CT images: preliminary investigation.

    PubMed

    Park, Sang Cheol; Chapman, Brian E; Zheng, Bin

    2011-06-01

    This study developed a computer-aided detection (CAD) scheme for pulmonary embolism (PE) detection and investigated several approaches to improve CAD performance. In the study, 20 computed tomography examinations with various lung diseases were selected, which included 44 verified PE lesions. The proposed CAD scheme consists of five basic steps: 1) lung segmentation; 2) PE candidate extraction using an intensity mask and tobogganing region growing; 3) PE candidate feature extraction; 4) false-positive (FP) reduction using an artificial neural network (ANN); and 5) a multifeature-based k-nearest neighbor for positive/negative classification. In this study, we also investigated the following additional methods to improve CAD performance: 1) grouping 2-D detected features into a single 3-D object; 2) selecting features with a genetic algorithm (GA); and 3) limiting the number of allowed suspicious lesions to be cued in one examination. The results showed that 1) the CAD scheme using tobogganing, an ANN, and the grouping method achieved a maximum detection sensitivity of 79.2%; 2) the maximum scoring method achieved superior performance over the other scoring fusion methods; 3) the GA was able to delete "redundant" features and further improve CAD performance; and 4) limiting the maximum number of cued lesions per examination reduced the FP rate by 5.3 times. Combining these approaches, the CAD scheme achieved 63.2% detection sensitivity with 18.4 FP lesions per examination. The study suggested that the performance of CAD schemes for PE detection depends on many factors, which include 1) optimizing the 2-D region grouping and scoring methods; 2) selecting the optimal feature set; and 3) limiting the number of allowed cued lesions per examination.

  10. What is the Effect of Interannual Hydroclimatic Variability on Water Supply Reservoir Operations?

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Turner, S. W. D.

    2015-12-01

    Rather than deriving from a single distribution and uniform persistence structure, hydroclimatic data exhibit significant trends and shifts in their mean, variance, and lagged correlation through time. Consequently, observed and reconstructed streamflow records are often characterized by features of interannual variability, including long-term persistence and prolonged droughts. This study examines the effect of these features on the operating performance of water supply reservoirs. We develop a Stochastic Dynamic Programming (SDP) model that can incorporate a regime-shifting climate variable. We then compare the performance of operating policies—designed with and without the climate variable—to quantify the contribution of interannual variability to standard policy sub-optimality. The approach uses a discrete-time Markov chain to partition the reservoir inflow time series into a small number of 'hidden' climate states. Each state defines a distinct set of inflow transition probability matrices, which are used by the SDP model to condition the release decisions on the reservoir storage, current-period inflow and hidden climate state. The experimental analysis is carried out on 99 hypothetical water supply reservoirs fed from pristine catchments in Australia—all impacted by the Millennium drought. Results show that interannual hydroclimatic variability is a major cause of sub-optimal hedging decisions. The practical import is that conventional optimization methods may misguide operators, particularly in regions susceptible to multi-year droughts.
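
    A hedged sketch of one building block of the approach above: estimating the row-stochastic transition matrix of a discrete-time Markov chain from an (already assigned) sequence of climate states. The state sequence is a toy assumption, and the full SDP that conditions release decisions on storage, inflow and the hidden climate state is not reproduced here.

    import numpy as np

    states = np.array([0, 0, 1, 1, 1, 0, 2, 2, 1, 0, 0, 1])   # toy sequence of three climate states
    n = states.max() + 1
    counts = np.zeros((n, n))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1                                 # count observed transitions
    P = counts / counts.sum(axis=1, keepdims=True)             # row-stochastic transition matrix
    print(P)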

  11. An Integrated Ransac and Graph Based Mismatch Elimination Approach for Wide-Baseline Image Matching

    NASA Astrophysics Data System (ADS)

    Hasheminasab, M.; Ebadi, H.; Sedaghat, A.

    2015-12-01

    In this paper we propose an integrated approach to increase the precision of feature point matching. Many different algorithms have been developed to optimize short-baseline image matching, while wide-baseline image matching remains difficult to handle because of illumination differences and viewpoint changes. Fortunately, the recent developments in the automatic extraction of local invariant features make wide-baseline image matching possible. Matching algorithms based on the local feature similarity principle use feature descriptors to establish correspondences between feature point sets. To date, the most remarkable descriptor is the scale-invariant feature transform (SIFT) descriptor, which is invariant to image rotation and scale, and remains robust across a substantial range of affine distortion, presence of noise, and changes in illumination. The epipolar constraint estimated with the RANSAC (random sample consensus) method is a conventional model for mismatch elimination, particularly in computer vision. Because only the distance from the epipolar line is considered, a few false matches remain in the results selected based on epipolar geometry and RANSAC. Aguilar et al. proposed the Graph Transformation Matching (GTM) algorithm to remove outliers, which has difficulties when the mismatched points are surrounded by the same local neighborhood structure. In this study, to overcome the limitations mentioned above, a new three-step matching scheme is presented in which the SIFT algorithm is used to obtain initial corresponding point sets. In the second step, in order to reduce the outliers, the RANSAC algorithm is applied. Finally, to remove the remaining mismatches, GTM is applied based on the adjacent K-NN graph. Four different close-range image datasets with changes in viewpoint are utilized to evaluate the performance of the proposed method, and the experimental results indicate its robustness and capability.
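
    A hedged sketch of the first two stages above (SIFT correspondences followed by RANSAC-based epipolar filtering) using OpenCV; the graph-based GTM refinement stage is omitted, and the image paths are hypothetical placeholders.

    import cv2
    import numpy as np

    img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical image pair
    img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Ratio-test matching of SIFT descriptors.
    matcher = cv2.BFMatcher()
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])

    # RANSAC on the fundamental matrix removes matches violating the epipolar constraint.
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
    print("matches:", len(good), "inliers after RANSAC:", int(mask.sum()))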

  12. Digital robust active control law synthesis for large order systems using constrained optimization

    NASA Technical Reports Server (NTRS)

    Mukhopadhyay, Vivek

    1987-01-01

    This paper presents a direct digital control law synthesis procedure for a large order, sampled data, linear feedback system using constrained optimization techniques to meet multiple design requirements. A linear quadratic Gaussian type cost function is minimized while satisfying a set of constraints on the design loads and responses. General expressions for gradients of the cost function and constraints, with respect to the digital control law design variables, are derived analytically and computed by solving a set of discrete Liapunov equations. The designer can choose the structure of the control law and the design variables; hence a stable classical control law as well as an estimator-based full or reduced order control law can be used as an initial starting point. Selected design responses can be treated as constraints instead of lumping them into the cost function. This feature can be used to modify a control law to meet individual root mean square response limitations as well as minimum singular value restrictions. Low order, robust digital control laws were synthesized for gust load alleviation of a flexible remotely piloted drone aircraft.
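
    A hedged sketch of the covariance computation that underlies such cost and gradient evaluations: for stable closed-loop dynamics x[k+1] = A x[k] + w[k], the steady-state state covariance X solves the discrete Lyapunov equation X = A X A^T + W, from which mean-square responses can be read off. The matrices below are illustrative assumptions, not the paper's drone model.

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    A = np.array([[0.9, 0.1],
                  [0.0, 0.8]])          # assumed stable closed-loop dynamics
    W = np.eye(2)                        # process-noise covariance
    X = solve_discrete_lyapunov(A, W)    # solves X = A X A^T + W
    c = np.array([1.0, 0.0])             # output y = c x
    print("steady-state covariance:\n", X)
    print("mean-square response of y:", float(c @ X @ c))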

  13. Multistage classification of multispectral Earth observational data: The design approach

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Muasher, M. J.; Landgrebe, D. A.

    1981-01-01

    An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes and taking into account the number of training samples used in estimating each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Results comparing probabilities of error predicted by the proposed algorithm as a function of dimensionality as compared to experimental observations are shown for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.

  14. Comparison of SVM, RF and ELM on an Electronic Nose for the Intelligent Evaluation of Paraffin Samples

    PubMed Central

    Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan

    2018-01-01

    Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33–100%, and ELM, with an accuracy rate of 98.01–100%. For level assessment, the R2 related to the training set was above 0.97 and the R2 related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016–0.3494, lower than the error of 0.5–1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level. PMID:29346328

  15. Scalable Nearest Neighbor Algorithms for High Dimensional Data.

    PubMed

    Muja, Marius; Lowe, David G

    2014-11-01

    For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
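
    A hedged sketch of approximate nearest-neighbor search with the FLANN library discussed above, through OpenCV's FLANN bindings configured with a randomized k-d forest; the feature vectors are synthetic.

    import cv2
    import numpy as np

    rng = np.random.default_rng(5)
    database = rng.random((10000, 64)).astype(np.float32)   # 64-D feature vectors
    queries = rng.random((5, 64)).astype(np.float32)

    index_params = dict(algorithm=1, trees=5)    # 1 = FLANN_INDEX_KDTREE (randomized k-d forest)
    search_params = dict(checks=50)              # leaves to visit; trades speed against accuracy
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(queries, database, k=2)
    for i, (m, n) in enumerate(matches):
        print(f"query {i}: nearest index {m.trainIdx}, distance {m.distance:.3f}")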

  16. Optimizing data collection for public health decisions: a data mining approach

    PubMed Central

    2014-01-01

    Background: Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. Methods: The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Results: Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. Conclusions: While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost. PMID:24919484
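
    A hedged sketch of the reduced-score construction and comparison described above: a linear model weights the retained survey items and a Wilcoxon rank-sum test compares reduced-item scores with full-survey scores. The items retained by feature selection and the data below are synthetic assumptions.

    import numpy as np
    from scipy.stats import ranksums
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(6)
    items = rng.integers(0, 4, size=(300, 30)).astype(float)   # 30 survey items, 300 outlets
    full_score = items.sum(axis=1)                             # full-survey score

    selected = [0, 2, 5, 9, 12, 17, 20, 25, 28]                # items kept by feature selection (assumed)
    model = LinearRegression().fit(items[:, selected], full_score)
    reduced_score = model.predict(items[:, selected])          # weighted reduced-item score

    print("R^2 of reduced model:", round(model.score(items[:, selected], full_score), 3))
    print("Wilcoxon rank-sum p-value:", round(ranksums(full_score, reduced_score).pvalue, 3))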

  17. Optimizing data collection for public health decisions: a data mining approach.

    PubMed

    Partington, Susan N; Papakroni, Vasil; Menzies, Tim

    2014-06-12

    Collecting data can be cumbersome and expensive. Lack of relevant, accurate and timely data for research to inform policy may negatively impact public health. The aim of this study was to test if the careful removal of items from two community nutrition surveys guided by a data mining technique called feature selection, can (a) identify a reduced dataset, while (b) not damaging the signal inside that data. The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November of 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight items in the reduced datasets. Weighted item values were summed with the error term to compute reduced item survey scores. Scores produced by the full survey were compared to the reduced item scores using a Wilcoxon rank-sum test. Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced from the full survey. The linear regression models built from the reduced feature sets had R2 values of 92% and 94% for restaurant and grocery store data, respectively. While there are many potentially important variables in any domain, the most useful set may only be a small subset. The use of feature selection in the initial phase of data collection to identify the most influential variables may be a useful tool to greatly reduce the amount of data needed thereby reducing cost.

  18. Layout optimization with assist features placement by model based rule tables for 2x node random contact

    NASA Astrophysics Data System (ADS)

    Jun, Jinhyuck; Park, Minwoo; Park, Chanha; Yang, Hyunjo; Yim, Donggyu; Do, Munhoe; Lee, Dongchan; Kim, Taehoon; Choi, Junghoe; Luk-Pat, Gerard; Miloslavsky, Alex

    2015-03-01

    As the industry pushes to ever more complex illumination schemes to increase resolution for next-generation memory and logic circuits, sub-resolution assist feature (SRAF) placement requirements become increasingly severe. Therefore device manufacturers are evaluating improvements in SRAF placement algorithms which do not sacrifice main feature (MF) patterning capability. There are several well-known methods to generate SRAFs, such as Rule-Based Assist Features (RBAF), Model-Based Assist Features (MBAF), and hybrid assist features combining both RBAF and MBAF. Rule-Based Assist Features (RBAF) continue to be deployed, even with the availability of Model-Based Assist Features (MBAF) and Inverse Lithography Technology (ILT). Certainly for the 3x nm node, and even at the 2x nm nodes and lower, RBAF is used because it demands less run time and provides better consistency. Since RBAF is needed now and in the future, what is also needed is a faster method to create the AF rule tables. The current method typically involves making masks and printing wafers that contain several experiments, varying the main feature configurations, AF configurations, dose conditions, and defocus conditions - this is a time-consuming and expensive process. In addition, as the technology node shrinks, wafer process changes and source shape redesigns occur more frequently, escalating the cost of rule table creation. Furthermore, as the demand on process margin escalates, there is a greater need for multiple rule tables, each tailored to a specific set of main-feature configurations. Model Assisted Rule Tables (MART) creates a set of test patterns and evaluates the simulated CD at nominal conditions, defocused conditions and off-dose conditions. It also uses lithographic simulation to evaluate the likelihood of AF printing. It then analyzes the simulation data to automatically create AF rule tables. The analysis results display the cost of different AF configurations as the space between a pair of main features grows. In summary, the model-based rule table method makes it much easier to create rule tables, leading to faster rule-table creation and a lower barrier to the creation of more rule tables.

  19. Structural MRI-based detection of Alzheimer's disease using feature ranking and classification error.

    PubMed

    Beheshti, Iman; Demirel, Hasan; Farokhian, Farnaz; Yang, Chunlan; Matsuda, Hiroshi

    2016-12-01

    This paper presents an automatic computer-aided diagnosis (CAD) system based on feature ranking for the detection of Alzheimer's disease (AD) using structural magnetic resonance imaging (sMRI) data. The proposed CAD system is composed of four systematic stages. First, global and local differences in the gray matter (GM) of AD patients compared to the GM of healthy controls (HCs) are analyzed using a voxel-based morphometry technique. The aim is to identify significant local differences in the volume of GM as volumes of interest (VOIs). Second, the voxel intensity values of the VOIs are extracted as raw features. Third, the raw features are ranked using seven feature-ranking methods, namely, statistical dependency (SD), mutual information (MI), information gain (IG), Pearson's correlation coefficient (PCC), t-test score (TS), Fisher's criterion (FC), and the Gini index (GI). Features with higher scores are more discriminative. To determine the number of top features, the estimated classification error based on a training set made up of the AD and HC groups is calculated, and the feature-vector size that minimizes this error is selected as the number of top discriminative features. Fourth, the classification is performed using a support vector machine (SVM). In addition, a data fusion approach among the feature-ranking methods is introduced to improve the classification performance. The proposed method is evaluated using a data-set from ADNI (130 AD and 130 HC) with 10-fold cross-validation. The classification accuracy of the proposed automatic system for the diagnosis of AD is up to 92.48% using the sMRI data. An automatic CAD system for the classification of AD based on feature-ranking methods and classification error is proposed. In this regard, seven feature-ranking methods (i.e., SD, MI, IG, PCC, TS, FC, and GI) are evaluated. The optimal number of top discriminative features is determined by the classification error estimation in the training phase. The experimental results indicate that the performance of the proposed system is comparable to that of state-of-the-art classification models. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. ECG Based Heart Arrhythmia Detection Using Wavelet Coherence and Bat Algorithm

    NASA Astrophysics Data System (ADS)

    Kora, Padmavathi; Sri Rama Krishna, K.

    2016-12-01

    Atrial fibrillation (AF) is a type of heart abnormality; during AF, the electrical discharges in the atrium are rapid, resulting in an abnormal heart beat. The morphology of the ECG changes due to the abnormalities in the heart. This paper consists of three major steps for the detection of heart diseases: signal pre-processing, feature extraction and classification. Feature extraction is the key process in detecting heart abnormality. Most ECG detection systems depend on time-domain features for cardiac signal classification. In this paper we propose a wavelet coherence (WTC) technique for ECG signal analysis. The WTC calculates the similarity between two waveforms in the frequency domain. Parameters extracted from the WTC function are used as the features of the ECG signal. These features are optimized using the Bat algorithm. The Levenberg–Marquardt neural network classifier is used to classify the optimized features. The performance of the classifier can be improved with the optimized features.

  1. ConvAn: a convergence analyzing tool for optimization of biochemical networks.

    PubMed

    Kostromins, Andrejs; Mozga, Ivars; Stalidzans, Egils

    2012-01-01

    Dynamic models of biochemical networks are usually described as systems of nonlinear differential equations. When models are optimized for parameter estimation or for the design of new properties, mainly numerical methods are used. That causes problems of optimization predictability, as most numerical optimization methods have stochastic properties and the convergence of the objective function to the global optimum is hardly predictable. Determining a suitable optimization method and the necessary duration of optimization becomes critical when evaluating a high number of combinations of adjustable parameters or when working with large dynamic models. This task is complex due to the variety of optimization methods, software tools and the nonlinearity features of models in different parameter spaces. The software tool ConvAn is developed to analyze the statistical properties of convergence dynamics for optimization runs with a particular optimization method, model, software tool, set of optimization method parameters and number of adjustable parameters of the model. The convergence curves can be normalized automatically to enable comparison of different methods and models on the same scale. With the help of the biochemistry-adapted graphical user interface of ConvAn, it is possible to compare different optimization methods in terms of their ability to find the global optimum or values close to it, as well as the computational time necessary to reach them. It is possible to estimate the optimization performance for different numbers of adjustable parameters. The functionality of ConvAn enables a statistical assessment of the necessary optimization time depending on the required optimization accuracy. Optimization methods that are not suitable for a particular optimization task can be rejected if they have poor repeatability or convergence properties. The software ConvAn is freely available at www.biosystems.lv/convan. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  2. Conditioning 3D object-based models to dense well data

    NASA Astrophysics Data System (ADS)

    Wang, Yimin C.; Pyrcz, Michael J.; Catuneanu, Octavian; Boisvert, Jeff B.

    2018-06-01

    Object-based stochastic simulation models are used to generate categorical variable models with a realistic representation of complicated reservoir heterogeneity. A limitation of object-based modeling is the difficulty of conditioning to dense data. One method to achieve data conditioning is to apply optimization techniques. Optimization algorithms can utilize an objective function measuring the conditioning level of each object while also considering the geological realism of the object. Here, an objective function is optimized with implicit filtering which considers constraints on object parameters. Thousands of objects conditioned to data are generated and stored in a database. A set of objects are selected with linear integer programming to generate the final realization and honor all well data, proportions and other desirable geological features. Although any parameterizable object can be considered, objects from fluvial reservoirs are used to illustrate the ability to simultaneously condition multiple types of geologic features. Channels, levees, crevasse splays and oxbow lakes are parameterized based on location, path, orientation and profile shapes. Functions mimicking natural river sinuosity are used for the centerline model. Channel stacking pattern constraints are also included to enhance the geological realism of object interactions. Spatial layout correlations between different types of objects are modeled. Three case studies demonstrate the flexibility of the proposed optimization-simulation method. These examples include multiple channels with high sinuosity, as well as fragmented channels affected by limited preservation. In all cases the proposed method reproduces input parameters for the object geometries and matches the dense well constraints. The proposed methodology expands the applicability of object-based simulation to complex and heterogeneous geological environments with dense sampling.

  3. Effects of band selection on endmember extraction for forestry applications

    NASA Astrophysics Data System (ADS)

    Karathanassi, Vassilia; Andreou, Charoula; Andronis, Vassilis; Kolokoussis, Polychronis

    2014-10-01

    In spectral unmixing theory, data reduction techniques play an important role, as hyperspectral imagery contains an immense amount of data, posing many challenging problems such as data storage, computational efficiency, and the so-called "curse of dimensionality". Feature extraction and feature selection are the two main approaches for dimensionality reduction. Feature extraction techniques reduce the dimensionality of the hyperspectral data by applying transforms to the data. Feature selection techniques retain the physical meaning of the data by selecting a set of bands from the input hyperspectral dataset which mainly contain the information needed for spectral unmixing. Although feature selection techniques are well known for their dimensionality reduction potential, they are rarely used in the unmixing process. The majority of the existing state-of-the-art dimensionality reduction methods set criteria on the spectral information derived from the whole wavelength range in order to define the optimum spectral subspace. These criteria are not associated with any particular application but with data statistics, such as correlation and entropy values. However, each application is associated with specific land cover materials, whose spectral characteristics present variations in specific wavelengths. In forestry, for example, many applications focus on tree leaves, in which specific pigments such as chlorophyll, xanthophyll, etc. determine the wavelengths where tree species, diseases, etc., can be detected. For such applications, when the unmixing process is applied, the tree species, diseases, etc., are considered as the endmembers of interest. This paper focuses on investigating the effects of band selection on endmember extraction by exploiting the information of the vegetation absorbance spectral zones. More precisely, it is explored whether endmember extraction can be optimized when specific sets of initial bands related to leaf spectral characteristics are selected. The experiments comprise the application of well-known signal subspace estimation and endmember extraction methods to hyperspectral imagery of a forest area. Evaluation of the extracted endmembers showed that more forest species can be extracted as endmembers using selected bands.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, K; Able, A

    Purpose: To evaluate an Enhanced Dynamic Wedge (EDW) as part of the machine commissioning process with a feature study. Methods: The EDW system in this study was from a TrueBeam linear accelerator manufactured by Varian Medical Systems. The EDW feature vector includes selected elements: dosimetric output spot checks, field size, wedge angles, dose rate, collimator orientation, and different energy settings. Point dose measurements were made with a PTW Farmer chamber, and profiles were measured with Gafchromic EBT2 films positioned at different depths in the Solid Water phantom according to the study elements. The output spot measurements were done with the PTW Farmer chamber in the Solid Water setup for all orientations and wedge angles in the EDW system. Profile comparisons were done with the IMRT measurement function in RIT software version 6.3, and the films were scanned on a Vidar scanner. Dosimetry calculations were performed in the Eclipse Treatment Planning System (TPS) using the same Solid Water phantom scanned on a GE LightSpeed CT. Measurements were then compared to the simulation results in the TPS. Results: The average percentage difference in output between chamber measurements and the TPS was 0.16% with a standard deviation (SD) of 0.93%. For the selected features, the average percentage difference between film measurements and computation was 0.93% with an SD of 1.55% for horizontal profiles, and 1.18% with an SD of 0.98% for vertical profiles. The average gamma difference between film measurements and TPS computation was 0.924 with an SD of 0.314. Conclusion: A feature vector was developed to describe the commissioning of EDW; developing a complete set of features sufficient for commissioning a LINAC function could provide an optimal commissioning instance with an acceptable confidence level for clinical application of the machine. Given an institution-specific vector pattern and big-data processing, this could provide a wide range of clinical outcome comparison information for EDW applications.

  5. Optical design and testing: introduction.

    PubMed

    Liang, Chao-Wen; Koshel, John; Sasian, Jose; Breault, Robert; Wang, Yongtian; Fang, Yi Chin

    2014-10-10

    Optical design and testing has numerous applications in industrial, military, consumer, and medical settings. Assembling a complete imaging or nonimage optical system may require the integration of optics, mechatronics, lighting technology, optimization, ray tracing, aberration analysis, image processing, tolerance compensation, and display rendering. This issue features original research ranging from the optical design of image and nonimage optical stimuli for human perception, optics applications, bio-optics applications, 3D display, solar energy system, opto-mechatronics to novel imaging or nonimage modalities in visible and infrared spectral imaging, modulation transfer function measurement, and innovative interferometry.

  6. Understanding the Cray X1 System

    NASA Technical Reports Server (NTRS)

    Cheung, Samson

    2004-01-01

    This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.

  7. The feature-weighted receptive field: an interpretable encoding model for complex feature spaces.

    PubMed

    St-Yves, Ghislain; Naselaris, Thomas

    2017-06-20

    We introduce the feature-weighted receptive field (fwRF), an encoding model designed to balance expressiveness, interpretability and scalability. The fwRF is organized around the notion of a feature map-a transformation of visual stimuli into visual features that preserves the topology of visual space (but not necessarily the native resolution of the stimulus). The key assumption of the fwRF model is that activity in each voxel encodes variation in a spatially localized region across multiple feature maps. This region is fixed for all feature maps; however, the contribution of each feature map to voxel activity is weighted. Thus, the model has two separable sets of parameters: "where" parameters that characterize the location and extent of pooling over visual features, and "what" parameters that characterize tuning to visual features. The "where" parameters are analogous to classical receptive fields, while "what" parameters are analogous to classical tuning functions. By treating these as separable parameters, the fwRF model complexity is independent of the resolution of the underlying feature maps. This makes it possible to estimate models with thousands of high-resolution feature maps from relatively small amounts of data. Once a fwRF model has been estimated from data, spatial pooling and feature tuning can be read-off directly with no (or very little) additional post-processing or in-silico experimentation. We describe an optimization algorithm for estimating fwRF models from data acquired during standard visual neuroimaging experiments. We then demonstrate the model's application to two distinct sets of features: Gabor wavelets and features supplied by a deep convolutional neural network. We show that when Gabor feature maps are used, the fwRF model recovers receptive fields and spatial frequency tuning functions consistent with known organizational principles of the visual cortex. We also show that a fwRF model can be used to regress entire deep convolutional networks against brain activity. The ability to use whole networks in a single encoding model yields state-of-the-art prediction accuracy. Our results suggest a wide variety of uses for the feature-weighted receptive field model, from retinotopic mapping with natural scenes, to regressing the activities of whole deep neural networks onto measured brain activity. Copyright © 2017. Published by Elsevier Inc.
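
    A minimal numpy sketch of the fwRF forward model as summarized above: a single Gaussian pooling field (the "where" parameters) shared across feature maps, and one weight per feature map (the "what" parameters). All names and values are illustrative assumptions.

```python
# Sketch of the fwRF forward pass: predicted voxel activity is a weighted sum of
# feature-map values pooled through one shared spatial Gaussian.
import numpy as np

def gaussian_pool(size, cx, cy, sigma):
    y, x = np.mgrid[0:size, 0:size]
    g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def fwrf_predict(feature_maps, cx, cy, sigma, weights):
    """feature_maps: (K, S, S) array of feature maps for one stimulus."""
    g = gaussian_pool(feature_maps.shape[-1], cx, cy, sigma)   # "where" parameters
    pooled = (feature_maps * g).sum(axis=(1, 2))               # K pooled feature values
    return float(pooled @ weights)                             # "what" parameters

feature_maps = np.random.rand(8, 32, 32)       # e.g. Gabor or CNN feature maps (synthetic)
print(fwrf_predict(feature_maps, cx=16, cy=16, sigma=4.0, weights=np.random.randn(8)))
```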

  8. Hard x-ray micro-tomography of a human head post-mortem as a gold standard to compare x-ray modalities

    NASA Astrophysics Data System (ADS)

    Dalstra, M.; Schulz, G.; Dagassan-Berndt, D.; Verna, C.; Müller-Gerbl, M.; Müller, B.

    2016-10-01

    An entire human head obtained at autopsy was micro-CT scanned in a nano/micro-CT scanner in a 6-hour long session. Despite the size of the head, it could still be scanned with a pixel size of 70 μm. The aim of this study was to obtain an optimal quality 3D data-set to be used as baseline control in a larger study comparing the image quality of various cone beam CT systems currently used in dentistry. The image quality of the micro-CT scans was indeed better than the ones of the clinical imaging modalities, both with regard to noise and streak artifacts due to metal dental implants. Bony features in the jaws, like the trabecular architecture and the thin wall of the alveolar bone were clearly visible. Therefore, the 3D micro-CT data-set can be used as the gold standard for linear, angular, and volumetric measurements of anatomical features in and around the oral cavity when comparing clinical imaging modalities.

  9. The construction of support vector machine classifier using the firefly algorithm.

    PubMed

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool does not perform feature selection, because SVM combined with feature selection is not well suited to multiclass classification, especially the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten benchmark data sets from the University of California, Irvine (UCI) machine learning repository are used; additionally, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method combined with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification tasks requiring maximum accuracy.

  10. The Construction of Support Vector Machine Classifier Using the Firefly Algorithm

    PubMed Central

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool does not perform feature selection, because SVM combined with feature selection is not well suited to multiclass classification, especially the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten benchmark data sets from the University of California, Irvine (UCI) machine learning repository are used; additionally, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method combined with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification tasks requiring maximum accuracy. PMID:25802511

  11. Performance Analysis of Continuous Black-Box Optimization Algorithms via Footprints in Instance Space.

    PubMed

    Muñoz, Mario A; Smith-Miles, Kate A

    2017-01-01

    This article presents a method for the objective assessment of an algorithm's strengths and weaknesses. Instead of examining the performance of only one or more algorithms on a benchmark set, or generating custom problems that maximize the performance difference between two algorithms, our method quantifies both the nature of the test instances and the algorithm performance. Our aim is to gather information about possible phase transitions in performance, that is, the points in which a small change in problem structure produces algorithm failure. The method is based on the accurate estimation and characterization of the algorithm footprints, that is, the regions of instance space in which good or exceptional performance is expected from an algorithm. A footprint can be estimated for each algorithm and for the overall portfolio. Therefore, we select a set of features to generate a common instance space, which we validate by constructing a sufficiently accurate prediction model. We characterize the footprints by their area and density. Our method identifies complementary performance between algorithms, quantifies the common features of hard problems, and locates regions where a phase transition may lie.

  12. A high-performance seizure detection algorithm based on Discrete Wavelet Transform (DWT) and EEG

    PubMed Central

    Chen, Duo; Wan, Suiren; Xiang, Jing; Bao, Forrest Sheng

    2017-01-01

    In the past decade, Discrete Wavelet Transform (DWT), a powerful time-frequency tool, has been widely used in computer-aided signal analysis of epileptic electroencephalography (EEG), such as the detection of seizures. One important hurdle in applying DWT is choosing its settings, which previous works selected empirically or arbitrarily. This study aimed to develop a framework for automatically searching the optimal DWT settings to improve accuracy and to reduce the computational cost of seizure detection. To address this, we developed a method to decompose EEG data into 7 commonly used wavelet families, to the maximum theoretical level of each mother wavelet. Within each wavelet family, the wavelets and decomposition levels providing the highest accuracy were then combined with an exhaustive search over frequency bands, which yielded optimal accuracy at low computational cost. The selection of frequency bands and features removed approximately 40% of redundancies. The developed algorithm achieved promising performance on two well-tested EEG datasets (accuracy >90% for both datasets). The experimental results of the developed method have demonstrated that the settings of DWT affect its performance on seizure detection substantially. Compared with existing seizure detection methods based on wavelets, the new approach is more accurate and transferable among datasets. PMID:28278203
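
    The search over wavelet settings can be illustrated with a short sketch: for each candidate mother wavelet and decomposition level, build simple sub-band features and keep the setting with the best cross-validated accuracy. This assumes the pywt and scikit-learn libraries and uses synthetic data; it is not the authors' feature set or datasets.

```python
# Sketch of a DWT-settings search: score each (wavelet, level) pair by the
# cross-validated accuracy of an SVM on simple sub-band log-energy features.
import numpy as np
import pywt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def subband_energies(signal, wavelet, level):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return [np.log1p(np.sum(c ** 2)) for c in coeffs]

def search_dwt_settings(X, y, families=("db", "sym", "coif")):
    best = (-1.0, None, None)                      # (accuracy, wavelet, level)
    for fam in families:
        for name in pywt.wavelist(fam):
            max_level = pywt.dwt_max_level(X.shape[1], pywt.Wavelet(name).dec_len)
            for level in range(1, max_level + 1):
                feats = np.array([subband_energies(x, name, level) for x in X])
                acc = cross_val_score(SVC(), feats, y, cv=5).mean()
                best = max(best, (acc, name, level))
    return best

X = np.random.randn(60, 256)                       # synthetic EEG epochs
y = np.random.randint(0, 2, 60)
print(search_dwt_settings(X, y, families=("db",)))
```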

  13. Volumetric depth peeling for medical image display

    NASA Astrophysics Data System (ADS)

    Borland, David; Clarke, John P.; Fielding, Julia R.; Taylor, Russell M., II

    2006-01-01

    Volumetric depth peeling (VDP) is an extension to volume rendering that enables display of otherwise occluded features in volume data sets. VDP decouples occlusion calculation from the volume rendering transfer function, enabling independent optimization of settings for rendering and occlusion. The algorithm is flexible enough to handle multiple regions occluding the object of interest, as well as object self-occlusion, and requires no pre-segmentation of the data set. VDP was developed as an improvement for virtual arthroscopy for the diagnosis of shoulder-joint trauma, and has been generalized for use in other simple and complex joints, and to enable non-invasive urology studies. In virtual arthroscopy, the surfaces in the joints often occlude each other, allowing limited viewpoints from which to evaluate these surfaces. In urology studies, the physician would like to position the virtual camera outside the kidney collecting system and see inside it. By rendering invisible all voxels between the observer's point of view and objects of interest, VDP enables viewing from unconstrained positions. In essence, VDP can be viewed as a technique for automatically defining an optimal data- and task-dependent clipping surface. Radiologists using VDP display have been able to perform evaluations of pathologies more easily and more rapidly than with clinical arthroscopy, standard volume rendering, or standard MRI/CT slice viewing.

  14. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification

    PubMed Central

    Wen, Tingxi; Zhang, Zhongnan

    2017-01-01

    In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789

  15. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan

    2017-05-01

    In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.

  16. Optimization in Radiation Therapy: Applications in Brachytherapy and Intensity Modulated Radiation Therapy

    NASA Astrophysics Data System (ADS)

    McGeachy, Philip David

    Over 50% of cancer patients require radiation therapy (RT). RT is an optimization problem requiring maximization of the radiation damage to the tumor while minimizing the harm to the healthy tissues. This dissertation focuses on two main RT optimization problems: 1) brachytherapy and 2) intensity modulated radiation therapy (IMRT). The brachytherapy research involved solving a non-convex optimization problem by creating an open-source genetic algorithm optimizer to determine the optimal radioactive seed distribution for a given set of patient volumes and constraints, both dosimetric- and implant-based. The optimizer was tested for a set of 45 prostate brachytherapy patients. While all solutions met the clinical standards, they also benchmarked favorably with those generated by a standard commercial solver. Compared to its compatriot, the salient features of the generated solutions were: slightly reduced prostate coverage, lower dose to the urethra and rectum, and a smaller number of needles required for an implant. Historically, IMRT requires modulation of fluence while keeping the photon beam energy fixed. The IMRT-related investigation in this thesis aimed at broadening the solution space by varying photon energy. The problem therefore involved simultaneous optimization of photon beamlet energy and fluence, denoted by XMRT. Formulating the problem as convex, linear programming was applied to obtain solutions for optimal energy-dependent fluences, while achieving all clinical objectives and constraints imposed. Dosimetric advantages of XMRT over single-energy IMRT in the improved sparing of organs at risk (OARs) was demonstrated in simplified phantom studies. The XMRT algorithm was improved to include clinical dose-volume constraints and clinical studies for prostate and head and neck cancer patients were investigated. Compared to IMRT, XMRT provided improved dosimetric benefit in the prostate case, particularly within intermediate- to low-dose regions (≤ 40 Gy) for OARs. For head and neck cases, XMRT solutions showed no significant disadvantage or advantage over IMRT. The deliverability concerns for the fluence maps generated from XMRT were addressed by incorporating smoothing constraints during the optimization and through successful generation of treatment machine files. Further research is needed to explore the full potential of the XMRT approach to RT.
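
    The energy-and-fluence optimization is formulated as a linear program in the thesis; the sketch below is a heavily simplified stand-in showing the general shape of such an LP with scipy's linprog. The dose-influence matrices, prescription level and beamlet counts are synthetic placeholders.

```python
# Highly simplified sketch (not the thesis formulation): dose = D @ x, where columns
# of D are beamlets at different photon energies; minimize total OAR dose subject to
# a minimum target dose per voxel and non-negative fluence.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_target, n_oar, n_beamlets = 20, 15, 12          # beamlets span both energies (illustrative)
D_target = rng.uniform(0.5, 1.0, (n_target, n_beamlets))
D_oar = rng.uniform(0.0, 0.4, (n_oar, n_beamlets))

c = D_oar.sum(axis=0)                             # objective: total OAR dose
A_ub = -D_target                                  # -D_target @ x <= -prescription
b_ub = -np.full(n_target, 60.0)                   # each target voxel receives >= 60 Gy
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
print("optimal fluence per beamlet:", np.round(res.x, 2))
```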

  17. Research on Optimal Observation Scale for Damaged Buildings after Earthquake Based on Optimal Feature Space

    NASA Astrophysics Data System (ADS)

    Chen, J.; Chen, W.; Dou, A.; Li, W.; Sun, Y.

    2018-04-01

    A new information extraction method for damaged buildings, rooted in an optimal feature space, is put forward on the basis of the traditional object-oriented method. In this new method, the ESP (estimate of scale parameter) tool is used to optimize the image segmentation. Then the distance matrix and minimum separation distance of all kinds of surface features are calculated through sample selection to find the optimal feature space, which is finally applied to extract damaged buildings from post-earthquake imagery. The overall extraction accuracy reaches 83.1% with a kappa coefficient of 0.813. The new information extraction method greatly improves the extraction accuracy and efficiency compared with the traditional object-oriented method, and has good potential for wider use in damaged-building information extraction. In addition, the new method can be used for the information extraction of different-resolution images of damaged buildings after an earthquake, and then to seek the optimal observation scale of damaged buildings through accuracy evaluation. The results suggest that the optimal observation scale for damaged buildings is between 1 m and 1.2 m, which provides a reference for future information extraction of damaged buildings.

  18. PTBS segmentation scheme for synthetic aperture radar

    NASA Astrophysics Data System (ADS)

    Friedland, Noah S.; Rothwell, Brian J.

    1995-07-01

    The Image Understanding Group at Martin Marietta Technologies in Denver, Colorado has developed a model-based synthetic aperture radar (SAR) automatic target recognition (ATR) system using an integrated resource architecture (IRA). IRA, an adaptive Markov random field (MRF) environment, utilizes information from image, model, and neighborhood resources to create a discrete, 2D feature-based world description (FBWD). The IRA FBWD features are peak, target, background and shadow (PTBS). These features have been shown to be very useful for target discrimination. The FBWD is used to accrue evidence over a model hypothesis set. This paper presents the PTBS segmentation process utilizing two IRA resources. The image resource (IR) provides generic (the physics of image formation) and specific (the given image input) information. The neighborhood resource (NR) provides domain knowledge of localized FBWD site behaviors. A simulated annealing optimization algorithm is used to construct a `most likely' PTBS state. Results on simulated imagery illustrate the power of this technique to correctly segment PTBS features, even when vehicle signatures are immersed in heavy background clutter. These segmentations also suppress sidelobe effects and delineate shadows.

  19. Detection of epileptiform activity in EEG signals based on time-frequency and non-linear analysis

    PubMed Central

    Gajic, Dragoljub; Djurovic, Zeljko; Gligorijevic, Jovan; Di Gennaro, Stefano; Savic-Gajic, Ivana

    2015-01-01

    We present a new technique for detection of epileptiform activity in EEG signals. After preprocessing of EEG signals we extract representative features in time, frequency and time-frequency domain as well as using non-linear analysis. The features are extracted in a few frequency sub-bands of clinical interest since these sub-bands showed much better discriminatory characteristics compared with the whole frequency band. Then we optimally reduce the dimension of feature space to two using scatter matrices. A decision about the presence of epileptiform activity in EEG signals is made by quadratic classifiers designed in the reduced two-dimensional feature space. The accuracy of the technique was tested on three sets of electroencephalographic (EEG) signals recorded at the University Hospital Bonn: surface EEG signals from healthy volunteers, intracranial EEG signals from the epilepsy patients during the seizure free interval from within the seizure focus and intracranial EEG signals of epileptic seizures also from within the seizure focus. An overall detection accuracy of 98.7% was achieved. PMID:25852534
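
    A minimal scikit-learn sketch of the pipeline shape described above, using LDA as a stand-in for the scatter-matrix-based reduction to two dimensions and a quadratic discriminant as the classifier. Features and labels are synthetic placeholders, not the Bonn EEG data.

```python
# Sketch: reduce the feature space to two dimensions with a scatter-matrix-based
# criterion (LDA plays that role here) and classify with a quadratic classifier.
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X = np.random.randn(300, 24)                 # 24 time/frequency/non-linear features (synthetic)
y = np.random.randint(0, 3, 300)             # 3 EEG classes, as in the Bonn sets

model = make_pipeline(LinearDiscriminantAnalysis(n_components=2),
                      QuadraticDiscriminantAnalysis())
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```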

  20. X-ray optics simulation and beamline design for the APS upgrade

    NASA Astrophysics Data System (ADS)

    Shi, Xianbo; Reininger, Ruben; Harder, Ross; Haeffner, Dean

    2017-08-01

    The upgrade of the Advanced Photon Source (APS) to a Multi-Bend Achromat (MBA) will increase the brightness of the APS by between two and three orders of magnitude. The APS upgrade (APS-U) project includes a list of feature beamlines that will take full advantage of the new machine. Many of the existing beamlines will be also upgraded to profit from this significant machine enhancement. Optics simulations are essential in the design and optimization of these new and existing beamlines. In this contribution, the simulation tools used and developed at APS, ranging from analytical to numerical methods, are summarized. Three general optical layouts are compared in terms of their coherence control and focusing capabilities. The concept of zoom optics, where two sets of focusing elements (e.g., CRLs and KB mirrors) are used to provide variable beam sizes at a fixed focal plane, is optimized analytically. The effects of figure errors on the vertical spot size and on the local coherence along the vertical direction of the optimized design are investigated.

  1. Optimal methodologies for terahertz time-domain spectroscopic analysis of traditional pigments in powder form

    NASA Astrophysics Data System (ADS)

    Ha, Taewoo; Lee, Howon; Sim, Kyung Ik; Kim, Jonghyeon; Jo, Young Chan; Kim, Jae Hoon; Baek, Na Yeon; Kang, Dai-ill; Lee, Han Hyoung

    2017-05-01

    We have established optimal methods for terahertz time-domain spectroscopic analysis of highly absorbing pigments in powder form based on our investigation of representative traditional Chinese pigments, such as azurite [blue-based color pigment], Chinese vermilion [red-based color pigment], and arsenic yellow [yellow-based color pigment]. To accurately extract the optical constants in the terahertz region of 0.1 - 3 THz, we carried out transmission measurements in such a way that intense absorption peaks did not completely suppress the transmission level. This required preparation of pellet samples with optimized thicknesses and material densities. In some cases, mixing the pigments with polyethylene powder was required to minimize absorption due to certain peak features. The resulting distortion-free terahertz spectra of the investigated set of pigment species exhibited well-defined unique spectral fingerprints. Our study will be useful to future efforts to establish non-destructive analysis methods of traditional pigments, to construct their spectral databases, and to apply these tools to restoration of cultural heritage materials.

  2. Narcissistic personality and risk perception among Chinese aviators: The mediating role of promotion focus.

    PubMed

    Ju, Chengting; Ji, Ming; Lan, Jijun; You, Xuqun

    2017-12-01

    Optimism bias is a crucial feature of risk perception that leads to increased risk-taking behaviour, which is a particularly salient issue among pilots in aviation settings due to the high-stakes nature of flight. The current study sought to address the roles of narcissism and promotion focus on optimism bias in risk perception in an aviation context. Participants were 239 male flight cadets from the Civil Aviation Flight University of China who completed the Narcissistic Personality Inventory-13, the Work Regulatory Focus Scale, and an indirect measure of unrealistic optimism in risk perception, which measured risk perception for the individual and the risk assumed by other individuals performing the same task. Higher narcissism increased the likelihood of underestimating personal risks, an effect that was mediated by promotion focus motivation, such that high narcissism led to high promotion focus motivation. The findings have important implications for improving the accuracy of aviators' perception of aviation risks. © 2016 International Union of Psychological Science.

  3. Towards an optimal flow: Density-of-states-informed replica-exchange simulations

    DOE PAGES

    Vogel, Thomas; Perez, Danny

    2015-11-05

    Replica exchange (RE) is one of the most popular enhanced-sampling simulation techniques in use today. Despite widespread successes, RE simulations can sometimes fail to converge in practical amounts of time, e.g., when sampling around phase transitions, or when a few hard-to-find configurations dominate the statistical averages. We introduce a generalized RE scheme, density-of-states-informed RE, that addresses some of these challenges. The key feature of our approach is to inform the simulation with readily available, but commonly unused, information on the density of states of the system as the RE simulation proceeds. This enables two improvements, namely, the introduction of resampling moves that actively move the system towards equilibrium and the continual adaptation of the optimal temperature set. As a consequence of these two innovations, we show that the configuration flow in temperature space is optimized and that the overall convergence of RE simulations can be dramatically accelerated.

  4. Energy-optimal path planning by stochastic dynamically orthogonal level-set optimization

    NASA Astrophysics Data System (ADS)

    Subramani, Deepak N.; Lermusiaux, Pierre F. J.

    2016-04-01

    A stochastic optimization methodology is formulated for computing energy-optimal paths from among time-optimal paths of autonomous vehicles navigating in a dynamic flow field. Based on partial differential equations, the methodology rigorously leverages the level-set equation that governs time-optimal reachability fronts for a given relative vehicle-speed function. To set up the energy optimization, the relative vehicle-speed and headings are considered to be stochastic and new stochastic Dynamically Orthogonal (DO) level-set equations are derived. Their solution provides the distribution of time-optimal reachability fronts and corresponding distribution of time-optimal paths. An optimization is then performed on the vehicle's energy-time joint distribution to select the energy-optimal paths for each arrival time, among all stochastic time-optimal paths for that arrival time. Numerical schemes to solve the reduced stochastic DO level-set equations are obtained, and accuracy and efficiency considerations are discussed. These reduced equations are first shown to be efficient at solving the governing stochastic level-sets, in part by comparisons with direct Monte Carlo simulations. To validate the methodology and illustrate its accuracy, comparisons with semi-analytical energy-optimal path solutions are then completed. In particular, we consider the energy-optimal crossing of a canonical steady front and set up its semi-analytical solution using a energy-time nested nonlinear double-optimization scheme. We then showcase the inner workings and nuances of the energy-optimal path planning, considering different mission scenarios. Finally, we study and discuss results of energy-optimal missions in a wind-driven barotropic quasi-geostrophic double-gyre ocean circulation.

  5. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to evaluate the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  6. Bias in error estimation when using cross-validation for model selection.

    PubMed

    Varma, Sudhir; Simon, Richard

    2006-02-23

    Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With "null" and "non null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training data-sets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" data-sets. Performance of the optimized classifiers on the independent test set was no better than chance. The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions. We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
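
    The nested CV procedure the article recommends can be written compactly with scikit-learn: an inner GridSearchCV tunes the classifier parameters while an outer cross_val_score estimates error on data never used for tuning. The data below are synthetic "null" data, so the nested estimate should hover near chance while the non-nested estimate is optimistic.

```python
# Sketch of nested cross-validation: the inner loop tunes SVM parameters,
# the outer loop estimates error on held-out data not used for tuning.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

X = np.random.randn(100, 500)                 # "null" data: no real class difference
y = np.random.randint(0, 2, 100)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}
inner = GridSearchCV(SVC(), param_grid, cv=5)
outer_scores = cross_val_score(inner, X, y, cv=10)          # nearly unbiased estimate
biased = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)    # tuning and scoring on same CV
print("nested CV accuracy:", outer_scores.mean())           # ~0.5 expected for null data
print("non-nested (optimistic) CV accuracy:", biased.best_score_)
```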

  7. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Entropy, in the Shannon sense, is a versatile tool in bioinformatic analysis. Relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features to distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers distinguishing promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can integrate new feature extraction or classification models freely. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
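
    A minimal sketch of the divergence-based n-mer ranking idea only (not the SD-MSAEs pipeline): compute n-mer frequency distributions for promoter and non-promoter training sequences and rank n-mers by a symmetric Kullback-Leibler term. Sequences and parameters are illustrative.

```python
# Sketch: rank n-mers by per-n-mer symmetric KL divergence terms between
# promoter and non-promoter frequency profiles, keeping the top-ranked n-mers.
from itertools import product
import numpy as np

def nmer_profile(seqs, n=3):
    kmers = ["".join(p) for p in product("ACGT", repeat=n)]
    counts = {k: 1.0 for k in kmers}                      # add-one smoothing
    for s in seqs:
        for i in range(len(s) - n + 1):
            k = s[i:i + n]
            if k in counts:
                counts[k] += 1.0
    total = sum(counts.values())
    return kmers, np.array([counts[k] / total for k in kmers])

def rank_nmers(promoters, others, n=3, top=10):
    kmers, p = nmer_profile(promoters, n)
    _, q = nmer_profile(others, n)
    sym_kl = p * np.log(p / q) + q * np.log(q / p)        # per-n-mer divergence terms
    order = np.argsort(sym_kl)[::-1]
    return [kmers[i] for i in order[:top]]

print(rank_nmers(["TATAAAGGC", "CGTATAAAT"], ["GGCGCGCCA", "TTTGGGCCC"], n=2, top=5))
```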

  8. Ultra-deep Large Binocular Camera U-band Imaging of the GOODS-North Field: Depth Versus Resolution

    NASA Astrophysics Data System (ADS)

    Ashcraft, Teresa A.; Windhorst, Rogier A.; Jansen, Rolf A.; Cohen, Seth H.; Grazian, Andrea; Paris, Diego; Fontana, Adriano; Giallongo, Emanuele; Speziali, Roberto; Testa, Vincenzo; Boutsia, Konstantina; O’Connell, Robert W.; Rutkowski, Michael J.; Ryan, Russell E.; Scarlata, Claudia; Weiner, Benjamin

    2018-06-01

    We present a study of the trade-off between depth and resolution using a large number of U-band imaging observations in the GOODS-North field from the Large Binocular Camera (LBC) on the Large Binocular Telescope (LBT). Having acquired over 30 hr of data (315 images with 5-6 minute exposures), we generated multiple image mosaics, starting with the best atmospheric seeing images (FWHM ≲ 0.″8), which constitute ∼10% of the total data set. For subsequent mosaics, we added in data with larger seeing values until the final, deepest mosaic included all images with FWHM ≲ 1.″8 (∼94% of the total data set). From the mosaics, we made object catalogs to compare the optimal-resolution, yet shallower image to the lower-resolution but deeper image. We show that the number counts for both images are ∼90% complete to U_AB ≲ 26 mag. Fainter than U_AB ∼ 27 mag, the object counts from the optimal-resolution image start to drop off dramatically (by 90% between U_AB = 27 and 28 mag), while the deepest image, with better surface-brightness sensitivity (μ_U,AB ≲ 32 mag arcsec^-2), shows a more gradual drop (by 10% between U_AB ≃ 27 and 28 mag). For the brightest galaxies within the GOODS-N field, structure and clumpy features within the galaxies are more prominent in the optimal-resolution image compared to the deeper mosaics. We conclude that for studies of brighter galaxies and features within them, the optimal-resolution image should be used. However, to fully explore and understand the faintest objects, the deeper imaging with lower resolution is also required. Finally, for 220 brighter galaxies with U_AB ≲ 23 mag, we find only marginal differences in total flux between the optimal-resolution and lower-resolution light profiles down to μ_U,AB ≲ 32 mag arcsec^-2. In only 10% of the cases are the total-flux differences larger than 0.5 mag. This helps constrain how much flux can be missed from galaxy outskirts, which is important for studies of the Extragalactic Background Light. Based on data acquired using the Large Binocular Telescope (LBT).

  9. Real-time hypoglycemia detection from continuous glucose monitoring data of subjects with type 1 diabetes.

    PubMed

    Jensen, Morten Hasselstrøm; Christensen, Toke Folke; Tarnow, Lise; Seto, Edmund; Dencker Johansen, Mette; Hejlesen, Ole Kristian

    2013-07-01

    Hypoglycemia is a potentially fatal condition. Continuous glucose monitoring (CGM) has the potential to detect hypoglycemia in real time and thereby reduce time in hypoglycemia and avoid any further decline in blood glucose level. However, CGM is inaccurate and shows a substantial number of cases in which the hypoglycemic event is not detected by the CGM. The aim of this study was to develop a pattern classification model to optimize real-time hypoglycemia detection. Features such as time since last insulin injection and linear regression, kurtosis, and skewness of the CGM signal in different time intervals were extracted from data of 10 male subjects experiencing 17 insulin-induced hypoglycemic events in an experimental setting. Nondiscriminative features were eliminated with SEPCOR and forward selection. The feature combinations were used in a Support Vector Machine model and the performance assessed by sample-based sensitivity and specificity and event-based sensitivity and number of false-positives. The best model was composed by using seven features and was able to detect 17 of 17 hypoglycemic events with one false-positive compared with 12 of 17 hypoglycemic events with zero false-positives for the CGM alone. Lead-time was 14 min and 0 min for the model and the CGM alone, respectively. This optimized real-time hypoglycemia detection provides a unique approach for the diabetes patient to reduce time in hypoglycemia and learn about patterns in glucose excursions. Although these results are promising, the model needs to be validated on CGM data from patients with spontaneous hypoglycemic events.
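
    A small sketch of the kind of window features the study describes (linear-regression slope, kurtosis and skewness of the recent CGM trace) feeding an SVM. The window lengths, sampling assumptions and data are hypothetical placeholders, and the SEPCOR/forward-selection step is omitted.

```python
# Sketch: build slope/kurtosis/skewness features over several trailing windows of a
# CGM trace and train an SVM; all data and window sizes are illustrative.
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.svm import SVC

def cgm_window_features(glucose, windows=(6, 12, 24)):       # samples per window
    feats = []
    for w in windows:
        seg = glucose[-w:]
        slope = np.polyfit(np.arange(w), seg, 1)[0]           # linear-regression slope
        feats += [slope, kurtosis(seg), skew(seg)]
    return np.array(feats)

rng = np.random.default_rng(1)
traces = rng.normal(6.0, 1.0, (40, 48))                       # 40 CGM segments (mmol/L)
X = np.array([cgm_window_features(t) for t in traces])
y = rng.integers(0, 2, 40)                                    # 1 = hypoglycemic event
print("training accuracy:", SVC().fit(X, y).score(X, y))
```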

  10. Clinical characteristics and psychiatric comorbidity of pyromania.

    PubMed

    Grant, Jon E; Won Kim, Suck

    2007-11-01

    There have been few systematic studies of individuals with pyromania, and this paucity of research has hindered our understanding and treatment of this disorder. This study details the demographic and phenomenological features of individuals with DSM-IV lifetime pyromania. Twenty-one adult and adolescent subjects (recruited from inpatient and outpatient studies of impulse-control disorders) with lifetime DSM-IV pyromania were administered a semi-structured interview to elicit demographic data and information on the phenomenology, age at onset, and associated features of the disorder. Data were collected from October 2003 to September 2006. Twenty-one subjects (10 female [47.6%]) with lifetime pyromania (mean +/- SD age = 26.1 +/- 11.8 years; range, 15-49 years) were studied. The mean +/- SD age at onset for pyromania was 18.1 +/- 5.8 years. Eighteen subjects (85.7%) reported urges to set fires. Subjects reported a mean +/- SD frequency of setting 1 fire every 5.9 +/- 3.8 weeks. Much of the fire setting did not meet the legal definition of arson. Thirteen (61.9%) had a current comorbid Axis I mood disorder, and 10 (47.6%) met criteria for a current impulse-control disorder. Pyromania appears to be associated with high rates of psychiatric comorbidity. Research is needed to optimize patient care for individuals with this disorder.

  11. Autonomous Optimization of Targeted Stimulation of Neuronal Networks

    PubMed Central

    Kumar, Sreedhar S.; Wülfing, Jan; Okujeni, Samora; Boedecker, Joschka; Riedmiller, Martin

    2016-01-01

    Driven by clinical needs and progress in neurotechnology, targeted interaction with neuronal networks is of increasing importance. Yet, the dynamics of interaction between intrinsic ongoing activity in neuronal networks and their response to stimulation is unknown. Nonetheless, electrical stimulation of the brain is increasingly explored as a therapeutic strategy and as a means to artificially inject information into neural circuits. Strategies using regular or event-triggered fixed stimuli discount the influence of ongoing neuronal activity on the stimulation outcome and are therefore not optimal to induce specific responses reliably. Yet, without suitable mechanistic models, it is hardly possible to optimize such interactions, in particular when desired response features are network-dependent and are initially unknown. In this proof-of-principle study, we present an experimental paradigm using reinforcement-learning (RL) to optimize stimulus settings autonomously and evaluate the learned control strategy using phenomenological models. We asked how to (1) capture the interaction of ongoing network activity, electrical stimulation and evoked responses in a quantifiable ‘state’ to formulate a well-posed control problem, (2) find the optimal state for stimulation, and (3) evaluate the quality of the solution found. Electrical stimulation of generic neuronal networks grown from rat cortical tissue in vitro evoked bursts of action potentials (responses). We show that the dynamic interplay of their magnitudes and the probability to be intercepted by spontaneous events defines a trade-off scenario with a network-specific unique optimal latency maximizing stimulus efficacy. An RL controller was set to find this optimum autonomously. Across networks, stimulation efficacy increased in 90% of the sessions after learning and learned latencies strongly agreed with those predicted from open-loop experiments. Our results show that autonomous techniques can exploit quantitative relationships underlying activity-response interaction in biological neuronal networks to choose optimal actions. Simple phenomenological models can be useful to validate the quality of the resulting controllers. PMID:27509295

  12. Autonomous Optimization of Targeted Stimulation of Neuronal Networks.

    PubMed

    Kumar, Sreedhar S; Wülfing, Jan; Okujeni, Samora; Boedecker, Joschka; Riedmiller, Martin; Egert, Ulrich

    2016-08-01

    Driven by clinical needs and progress in neurotechnology, targeted interaction with neuronal networks is of increasing importance. Yet, the dynamics of interaction between intrinsic ongoing activity in neuronal networks and their response to stimulation is unknown. Nonetheless, electrical stimulation of the brain is increasingly explored as a therapeutic strategy and as a means to artificially inject information into neural circuits. Strategies using regular or event-triggered fixed stimuli discount the influence of ongoing neuronal activity on the stimulation outcome and are therefore not optimal to induce specific responses reliably. Yet, without suitable mechanistic models, it is hardly possible to optimize such interactions, in particular when desired response features are network-dependent and are initially unknown. In this proof-of-principle study, we present an experimental paradigm using reinforcement-learning (RL) to optimize stimulus settings autonomously and evaluate the learned control strategy using phenomenological models. We asked how to (1) capture the interaction of ongoing network activity, electrical stimulation and evoked responses in a quantifiable 'state' to formulate a well-posed control problem, (2) find the optimal state for stimulation, and (3) evaluate the quality of the solution found. Electrical stimulation of generic neuronal networks grown from rat cortical tissue in vitro evoked bursts of action potentials (responses). We show that the dynamic interplay of their magnitudes and the probability to be intercepted by spontaneous events defines a trade-off scenario with a network-specific unique optimal latency maximizing stimulus efficacy. An RL controller was set to find this optimum autonomously. Across networks, stimulation efficacy increased in 90% of the sessions after learning and learned latencies strongly agreed with those predicted from open-loop experiments. Our results show that autonomous techniques can exploit quantitative relationships underlying activity-response interaction in biological neuronal networks to choose optimal actions. Simple phenomenological models can be useful to validate the quality of the resulting controllers.

  13. Numerical optimization of three-dimensional coils for NSTX-U

    NASA Astrophysics Data System (ADS)

    Lazerson, S. A.; Park, J.-K.; Logan, N.; Boozer, A.

    2015-10-01

    A tool for the calculation of optimal three-dimensional (3D) perturbative magnetic fields in tokamaks has been developed. The IPECOPT code builds upon the stellarator optimization code STELLOPT to allow for optimization of linear ideal magnetohydrodynamic perturbed equilibrium (IPEC). This tool has been applied to NSTX-U equilibria, addressing which fields are the most effective at driving NTV torques. The NTV torque calculation is performed by the PENT code. Optimization of the normal field spectrum shows that fields with n = 1 character can drive a large core torque. It is also shown that fields with n = 3 features are capable of driving edge torque and some core torque. Coil current optimization (using the planned in-vessel and existing RWM coils) on NSTX-U suggests that the planned coil set is adequate for core and edge torque control. Comparison between error field correction experiments on DIII-D and the optimizer shows good agreement.

  14. Optimal Eye-Gaze Fixation Position for Face-Related Neural Responses

    PubMed Central

    Zerouali, Younes; Lina, Jean-Marc; Jemel, Boutheina

    2013-01-01

    It is generally agreed that some features of a face, namely the eyes, are more salient than others as indexed by behavioral diagnosticity, gaze-fixation patterns and evoked-neural responses. However, because previous studies used unnatural stimuli, there is no evidence so far that the early encoding of a whole face in the human brain is based on the eyes or other facial features. To address this issue, scalp electroencephalogram (EEG) and eye gaze-fixations were recorded simultaneously in a gaze-contingent paradigm while observers viewed faces. We found that the N170 indexing the earliest face-sensitive response in the human brain was the largest when the fixation position is located around the nasion. Interestingly, for inverted faces, this optimal fixation position was more variable, but mainly clustered in the upper part of the visual field (around the mouth). These observations extend the findings of recent behavioral studies, suggesting that the early encoding of a face, as indexed by the N170, is not driven by the eyes per se, but rather arises from a general perceptual setting (upper-visual field advantage) coupled with the alignment of a face stimulus to a stored face template. PMID:23762224

  15. Optimal eye-gaze fixation position for face-related neural responses.

    PubMed

    Zerouali, Younes; Lina, Jean-Marc; Jemel, Boutheina

    2013-01-01

    It is generally agreed that some features of a face, namely the eyes, are more salient than others as indexed by behavioral diagnosticity, gaze-fixation patterns and evoked-neural responses. However, because previous studies used unnatural stimuli, there is no evidence so far that the early encoding of a whole face in the human brain is based on the eyes or other facial features. To address this issue, scalp electroencephalogram (EEG) and eye gaze-fixations were recorded simultaneously in a gaze-contingent paradigm while observers viewed faces. We found that the N170 indexing the earliest face-sensitive response in the human brain was the largest when the fixation position is located around the nasion. Interestingly, for inverted faces, this optimal fixation position was more variable, but mainly clustered in the upper part of the visual field (around the mouth). These observations extend the findings of recent behavioral studies, suggesting that the early encoding of a face, as indexed by the N170, is not driven by the eyes per se, but rather arises from a general perceptual setting (upper-visual field advantage) coupled with the alignment of a face stimulus to a stored face template.

  16. Gas chimney detection based on improving the performance of combined multilayer perceptron and support vector classifier

    NASA Astrophysics Data System (ADS)

    Hashemi, H.; Tax, D. M. J.; Duin, R. P. W.; Javaherian, A.; de Groot, P.

    2008-11-01

    Seismic object detection is a relatively new field in which 3-D bodies are visualized and spatial relationships between objects of different origins are studied in order to extract geologic information. In this paper, we propose a method for finding an optimal classifier with the help of a statistical feature ranking technique and combining different classifiers. The method, which has general applicability, is demonstrated here on a gas chimney detection problem. First, we evaluate a set of input seismic attributes extracted at locations labeled by a human expert using regularized discriminant analysis (RDA). In order to find the RDA score for each seismic attribute, forward and backward search strategies are used. Subsequently, two non-linear classifiers: multilayer perceptron (MLP) and support vector classifier (SVC) are run on the ranked seismic attributes. Finally, to capitalize on the intrinsic differences between both classifiers, the MLP and SVC results are combined using logical rules of maximum, minimum and mean. The proposed method optimizes the ranked feature space size and yields the lowest classification error in the final combined result. We will show that the logical minimum reveals gas chimneys that exhibit both the softness of MLP and the resolution of SVC classifiers.
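
    The combination step can be sketched directly: train an MLP and an SVC on the ranked attributes and fuse their class probabilities with the logical maximum, minimum and mean rules. The attributes and labels below are synthetic placeholders rather than seismic data.

```python
# Sketch of the classifier-combination step: fuse MLP and SVC chimney probabilities
# with max, min and mean rules; the "min" rule is the one favored in the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X = np.random.randn(200, 10)                  # ranked seismic attributes (placeholder)
y = np.random.randint(0, 2, 200)              # 1 = chimney, 0 = non-chimney

p_mlp = MLPClassifier(max_iter=500).fit(X, y).predict_proba(X)[:, 1]
p_svc = SVC(probability=True).fit(X, y).predict_proba(X)[:, 1]

combined = {"max": np.maximum(p_mlp, p_svc),
            "min": np.minimum(p_mlp, p_svc),
            "mean": (p_mlp + p_svc) / 2.0}
labels = {rule: (p > 0.5).astype(int) for rule, p in combined.items()}
print({rule: lab[:5] for rule, lab in labels.items()})
```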

  17. Semantic Role Labeling of Clinical Text: Comparing Syntactic Parsers and Features

    PubMed Central

    Zhang, Yaoyun; Jiang, Min; Wang, Jingqi; Xu, Hua

    2016-01-01

    Semantic role labeling (SRL), which extracts shallow semantic relation representation from different surface textual forms of free text sentences, is important for understanding clinical narratives. Since semantic roles are formed by syntactic constituents in the sentence, an effective parser, as well as an effective syntactic feature set are essential to build a practical SRL system. Our study initiates a formal evaluation and comparison of SRL performance on a clinical text corpus MiPACQ, using three state-of-the-art parsers, the Stanford parser, the Berkeley parser, and the Charniak parser. First, the original parsers trained on the open domain syntactic corpus Penn Treebank were employed. Next, those parsers were retrained on the clinical Treebank of MiPACQ for further comparison. Additionally, state-of-the-art syntactic features from open domain SRL were also examined for clinical text. Experimental results showed that retraining the parsers on clinical Treebank improved the performance significantly, with an optimal F1 measure of 71.41% achieved by the Berkeley parser. PMID:28269926

  18. Automated extraction and analysis of rock discontinuity characteristics from 3D point clouds

    NASA Astrophysics Data System (ADS)

    Bianchetti, Matteo; Villa, Alberto; Agliardi, Federico; Crosta, Giovanni B.

    2016-04-01

    A reliable characterization of fractured rock masses requires an exhaustive geometrical description of discontinuities, including orientation, spacing, and size. These are required to describe discontinuum rock mass structure, perform Discrete Fracture Network and DEM modelling, or provide input for rock mass classification or equivalent continuum estimate of rock mass properties. Although several advanced methodologies have been developed in the last decades, a complete characterization of discontinuity geometry in practice is still challenging, due to scale-dependent variability of fracture patterns and difficult accessibility to large outcrops. Recent advances in remote survey techniques, such as terrestrial laser scanning and digital photogrammetry, allow a fast and accurate acquisition of dense 3D point clouds, which promoted the development of several semi-automatic approaches to extract discontinuity features. Nevertheless, these often need user supervision on algorithm parameters which can be difficult to assess. To overcome this problem, we developed an original Matlab tool, allowing fast, fully automatic extraction and analysis of discontinuity features with no requirements on point cloud accuracy, density and homogeneity. The tool consists of a set of algorithms which: (i) process raw 3D point clouds, (ii) automatically characterize discontinuity sets, (iii) identify individual discontinuity surfaces, and (iv) analyse their spacing and persistence. The tool operates in either a supervised or unsupervised mode, starting from an automatic preliminary exploration data analysis. The identification and geometrical characterization of discontinuity features is divided in steps. First, coplanar surfaces are identified in the whole point cloud using K-Nearest Neighbor and Principal Component Analysis algorithms optimized on point cloud accuracy and specified typical facet size. Then, discontinuity set orientation is calculated using Kernel Density Estimation and principal vector similarity criteria. Poles to points are assigned to individual discontinuity objects using easy custom vector clustering and Jaccard distance approaches, and each object is segmented into planar clusters using an improved version of the DBSCAN algorithm. Modal set orientations are then recomputed by cluster-based orientation statistics to avoid the effects of biases related to cluster size and density heterogeneity of the point cloud. Finally, spacing values are measured between individual discontinuity clusters along scanlines parallel to modal pole vectors, whereas individual feature size (persistence) is measured using 3D convex hull bounding boxes. Spacing and size are provided both as raw population data and as summary statistics. The tool is optimized for parallel computing on 64bit systems, and a Graphic User Interface (GUI) has been developed to manage data processing, provide several outputs, including reclassified point clouds, tables, plots, derived fracture intensity parameters, and export to modelling software tools. We present test applications performed both on synthetic 3D data (simple 3D solids) and real case studies, validating the results with existing geomechanical datasets.
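
    Two steps of the described workflow translate into a short sketch: estimate a normal per point by PCA of the local k-nearest-neighbour covariance, then cluster similar normals into candidate discontinuity sets with DBSCAN. This is a simplified stand-in for the Matlab tool, with illustrative parameters.

```python
# Sketch: per-point normals from local PCA, then orientation clustering with DBSCAN
# to group points into candidate discontinuity sets. Parameters are illustrative.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

def estimate_normals(points, k=12):
    nbrs = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nbrs.kneighbors(points)
    normals = np.empty_like(points)
    for i, neigh in enumerate(idx):
        cov = np.cov(points[neigh].T)
        w, v = np.linalg.eigh(cov)
        normals[i] = v[:, 0]                      # eigenvector of the smallest eigenvalue
    return normals

points = np.random.rand(500, 3)                   # synthetic point cloud
normals = estimate_normals(points)
sets = DBSCAN(eps=0.1, min_samples=10).fit_predict(normals)   # orientation clusters
print("candidate discontinuity sets:", set(sets))
```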

  19. Particle Swarm Optimization approach to defect detection in armour ceramics.

    PubMed

    Kesharaju, Manasa; Nagarajah, Romesh

    2017-03-01

    In this research, various extracted features were used in the development of an automated ultrasonic sensor-based inspection system that enables defect classification in each ceramic component prior to despatch to the field. Classification is an important task, and the large number of irrelevant, redundant features commonly introduced into a dataset reduces classifier performance. Feature selection aims to reduce the dimensionality of the dataset while improving the performance of a classification system. In the context of a multi-criteria optimization problem (i.e., minimizing the classification error rate while reducing the number of features) such as the one discussed in this research, the literature suggests that evolutionary algorithms offer good results. Moreover, Particle Swarm Optimization (PSO) has been little explored for the classification of high-frequency ultrasonic signals. Hence, a binary-coded Particle Swarm Optimization (BPSO) technique is investigated for feature subset selection and for optimizing the classification error rate. In the proposed method, the population data are used as input to an Artificial Neural Network (ANN) based classification system to obtain the error rate, with the ANN serving as the evaluator of the PSO fitness function. Copyright © 2016. Published by Elsevier B.V.
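
    A minimal sketch of this kind of wrapper follows, assuming a binary-coded PSO with a sigmoid transfer function and a small scikit-learn neural network as the fitness evaluator; the swarm constants and the stand-in dataset are not from the paper.

```python
# Illustrative binary PSO (BPSO) feature-subset selection with a small neural
# network as the fitness evaluator; all parameters are assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X, y = load_breast_cancer(return_X_y=True)
n_feat, n_particles, n_iter = X.shape[1], 10, 10

def fitness(mask):
    if mask.sum() == 0:
        return 1.0                              # empty subset: worst error rate
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    acc = cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()
    return 1.0 - acc                            # classification error rate

pos = rng.integers(0, 2, (n_particles, n_feat)).astype(float)
vel = rng.normal(0.0, 0.1, (n_particles, n_feat))
pbest, pbest_err = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_err.argmin()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, n_feat))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    prob = 1.0 / (1.0 + np.exp(-vel))           # sigmoid transfer function
    pos = (rng.random((n_particles, n_feat)) < prob).astype(float)
    err = np.array([fitness(p) for p in pos])
    improved = err < pbest_err
    pbest[improved], pbest_err[improved] = pos[improved], err[improved]
    gbest = pbest[pbest_err.argmin()].copy()

print("best error rate:", round(pbest_err.min(), 4), "| features kept:", int(gbest.sum()))
```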

  20. Linear Power-Flow Models in Multiphase Distribution Networks: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bernstein, Andrey; Dall'Anese, Emiliano

    This paper considers multiphase unbalanced distribution systems and develops approximate power-flow models where bus voltages, line currents, and powers at the point of common coupling are linearly related to the nodal net power injections. The linearization approach is grounded on a fixed-point interpretation of the AC power-flow equations, and it is applicable to distribution systems featuring (i) wye connections; (ii) ungrounded delta connections; (iii) a combination of wye-connected and delta-connected sources/loads; and (iv) a combination of line-to-line and line-to-grounded-neutral devices at the secondary of distribution transformers. The proposed linear models can facilitate the development of computationally-affordable optimization and control applications -- from advanced distribution management systems settings to online and distributed optimization routines. Performance of the proposed models is evaluated on different test feeders.
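
    As a hedged illustration of the fixed-point idea, the sketch below works through the single-phase analogue on a made-up feeder with a slack bus and two load buses (the paper itself treats multiphase wye/delta connections): the exact fixed-point map is iterated to convergence, and a linear model is obtained by evaluating that map once at the no-load voltage profile.

```python
# Single-phase analogue of a fixed-point power-flow linearization: the exact
# map v = w + Zll * conj(s / v) is linearized at the no-load profile, giving
# voltages linear in the injections s. Network data are made-up values.
import numpy as np

# Line admittances of a 3-bus feeder: slack(0) - bus1 - bus2
y01, y12 = 1.0 / (0.01 + 0.03j), 1.0 / (0.02 + 0.05j)
Yll = np.array([[y01 + y12, -y12],
                [-y12,       y12]])        # admittance among non-slack buses
Yl0 = np.array([-y01, 0.0])                # coupling to the slack bus
v0 = 1.0 + 0.0j                            # slack voltage (p.u.)

Zll = np.linalg.inv(Yll)
w = -Zll @ (Yl0 * v0)                      # zero-load voltage profile

s = np.array([-0.3 - 0.1j, -0.2 - 0.05j])  # net injections (loads negative)

# Exact solution by fixed-point iteration
v = w.copy()
for _ in range(20):
    v = w + Zll @ np.conj(s / v)

# Linear model: evaluate the map once at the no-load profile w
v_lin = w + Zll @ (np.conj(s) / np.conj(w))

print("fixed point |v|:", np.abs(v))
print("linear model |v|:", np.abs(v_lin))
```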

  1. Classifying EEG for Brain-Computer Interface: Learning Optimal Filters for Dynamical System Features

    PubMed Central

    Song, Le; Epps, Julien

    2007-01-01

    Classification of multichannel EEG recordings during motor imagination has been exploited successfully for brain-computer interfaces (BCI). In this paper, we consider EEG signals as the outputs of a networked dynamical system (the cortex), and exploit synchronization features from the dynamical system for classification. Herein, we also propose a new framework for learning optimal filters automatically from the data, by employing a Fisher ratio criterion. Experimental evaluations comparing the proposed dynamical system features with the CSP and the AR features reveal their competitive performance during classification. Results also show the benefits of employing the spatial and the temporal filters optimized using the proposed learning approach. PMID:18364986
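
    For reference, a minimal Fisher ratio computation on synthetic two-class data is sketched below; the paper applies this criterion to optimize spatial and temporal filter coefficients, whereas the sketch only scores fixed features.

```python
# Minimal illustration of the Fisher ratio criterion used to score how well a
# feature separates two classes; the synthetic data are an assumption.
import numpy as np

def fisher_ratio(x_class1, x_class2):
    """(difference of class means)^2 / (sum of class variances), per feature."""
    m1, m2 = x_class1.mean(axis=0), x_class2.mean(axis=0)
    v1, v2 = x_class1.var(axis=0), x_class2.var(axis=0)
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)

rng = np.random.default_rng(1)
# Two synthetic classes; only the first feature is discriminative
X1 = rng.normal([2.0, 0.0, 0.0], 1.0, size=(100, 3))
X2 = rng.normal([0.0, 0.0, 0.0], 1.0, size=(100, 3))
scores = fisher_ratio(X1, X2)
print("Fisher ratios:", scores, "-> best feature:", scores.argmax())
```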

  2. Operator for object recognition and scene analysis by estimation of set occupancy with noisy and incomplete data sets

    NASA Astrophysics Data System (ADS)

    Rees, S. J.; Jones, Bryan F.

    1992-11-01

    Once feature extraction has occurred in a processed image, the recognition problem becomes one of defining a set of features which maps sufficiently well onto one of the defined shape/object models to permit a claimed recognition. This process is usually handled by aggregating features until a large enough weighting is obtained to claim membership, or an adequate number of located features are matched to the reference set. A requirement has existed for an operator or measure capable of a more direct assessment of membership/occupancy between feature sets, particularly where the feature sets may be defective representations. Such feature set errors may be caused by noise, by overlapping of objects, and by partial obscuration of features. These problems occur at the point of acquisition: repairing the data would then assume a priori knowledge of the solution. The technique described in this paper offers a set theoretical measure for partial occupancy defined in terms of the set of minimum additions to permit full occupancy and the set of locations of occupancy if such additions are made. As is shown, this technique permits recognition of partial feature sets with quantifiable degrees of uncertainty. A solution to the problems of obscuration and overlapping is therefore available.
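
    A toy sketch of a set-occupancy style measure follows, reduced to plain cardinalities: it reports the occupancy fraction of a (possibly noisy, partial) observed feature set against a reference model and the minimal additions needed for full occupancy. The operator in the paper also tracks the locations of occupancy, which is omitted here.

```python
# Toy sketch of a set-occupancy measure: occupancy fraction of an observed
# (possibly defective) feature set against a reference model, plus the minimal
# additions that would be needed for full occupancy. Feature names are made up.
def occupancy(observed, model):
    observed, model = set(observed), set(model)
    matched = observed & model
    missing = model - observed            # minimal additions for full occupancy
    return len(matched) / len(model), missing

model_features = {"corner_A", "corner_B", "edge_AB", "edge_BC", "hole_1"}
observed_features = {"corner_A", "edge_AB", "spurious_blob"}   # noisy / partial

frac, additions = occupancy(observed_features, model_features)
print(f"occupancy = {frac:.2f}, minimal additions = {additions}")
```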

  3. Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech

    PubMed Central

    Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor

    2015-01-01

    In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757

  4. A global optimization approach to multi-polarity sentiment analysis.

    PubMed

    Li, Xinmiao; Li, Jing; Wu, Yukeng

    2015-01-01

    Following the rapid development of social media, sentiment analysis has become an important social media mining technique. The performance of automatic sentiment analysis primarily depends on feature selection and sentiment classification. While information gain (IG) and support vector machines (SVM) are two important techniques, few studies have optimized both approaches in sentiment analysis. The effectiveness of applying a global optimization approach to sentiment analysis remains unclear. We propose a global optimization-based sentiment analysis (PSOGO-Senti) approach to improve sentiment analysis with IG for feature selection and SVM as the learning engine. The PSOGO-Senti approach utilizes a particle swarm optimization algorithm to obtain a global optimal combination of feature dimensions and parameters in the SVM. We evaluate the PSOGO-Senti model on two datasets from different fields. The experimental results showed that the PSOGO-Senti model can improve binary and multi-polarity Chinese sentiment analysis. We compared the optimal feature subset selected by PSOGO-Senti with the features in the sentiment dictionary. The results of this comparison indicated that PSOGO-Senti can effectively remove redundant and noisy features and can select a domain-specific feature subset with a higher-explanatory power for a particular sentiment analysis task. The experimental results showed that the PSOGO-Senti approach is effective and robust for sentiment analysis tasks in different domains. By comparing the improvements of two-polarity, three-polarity and five-polarity sentiment analysis results, we found that the five-polarity sentiment analysis delivered the largest improvement. The improvement of the two-polarity sentiment analysis was the smallest. We conclude that the PSOGO-Senti achieves higher improvement for a more complicated sentiment analysis task. We also compared the results of PSOGO-Senti with those of the genetic algorithm (GA) and grid search method. From the results of this comparison, we found that PSOGO-Senti is more suitable for improving a difficult multi-polarity sentiment analysis problem.

  5. SU-E-J-254: Evaluating the Role of Mid-Treatment and Post-Treatment FDG-PET/CT in Predicting Progression-Free Survival and Distant Metastasis of Anal Cancer Patients Treated with Chemoradiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, H; Wang, J; Chuong, M

    2015-06-15

    Purpose: To evaluate the role of mid-treatment and post-treatment FDG-PET/CT in predicting progression-free survival (PFS) and distant metastasis (DM) of anal cancer patients treated with chemoradiotherapy (CRT). Methods: 17 anal cancer patients treated with CRT were retrospectively studied. The median prescription dose was 56 Gy (range, 50–62.5 Gy). All patients underwent FDG-PET/CT scans before and after CRT. 16 of the 17 patients had an additional FDG-PET/CT image at 3–5 weeks into the treatment (denoted as mid-treatment FDG-PET/CT). 750 features were extracted from these three sets of scans, which included both traditional PET/CT measures (SUVmax, SUVpeak, tumor diameters, etc.) and spatial-temporal PET/CT features (comprehensively quantifying a tumor's FDG uptake intensity and distribution, spatial variation (texture), geometric properties and their temporal changes relative to baseline). 26 clinical parameters (age, gender, TNM stage, histology, GTV dose, etc.) were also analyzed. Advanced analytics, including methods to select an optimal set of predictors and a model selection engine that identifies the most accurate machine learning algorithm for predictive analysis, were developed. Results: Comparing the baseline + mid-treatment PET/CT set to the baseline + post-treatment PET/CT set, 14 predictors were selected from each feature group. The same three clinical parameters (tumor size, T stage and whether 5-FU was held during any cycle of chemotherapy) and two traditional measures (pre-CRT SUVmin and SUVmedian) were selected by both predictor groups. A different mix of spatial-temporal PET/CT features was selected. Using the 14 predictors and Naive Bayes, the mid-treatment PET/CT set achieved 87.5% accuracy (2 PFS patients misclassified, all local recurrence and DM patients correctly classified). The post-treatment PET/CT set achieved 94.0% accuracy (all PFS and DM patients correctly predicted, 1 local recurrence patient misclassified) with logistic regression, neural network or support vector machine models. Conclusion: Applying a radiomics approach to either mid-treatment or post-treatment PET/CT could achieve high accuracy in predicting anal cancer treatment outcomes. This work was supported in part by the National Cancer Institute Grant R01CA172638.

  6. Constraint programming based biomarker optimization.

    PubMed

    Zhou, Manli; Luo, Youxi; Sun, Guoquan; Mai, Guoqin; Zhou, Fengfeng

    2015-01-01

    Efficient and intuitive characterization of biological big data is becoming a major challenge for modern bio-OMIC based scientists. Interactive visualization and exploration of big data has proven to be one of the successful solutions. Most existing feature selection algorithms do not allow interactive input from users during the feature selection optimization process. This study addresses this question by fixing a few user-input features in the final selected feature subset and formulating these user-input features as constraints in a programming model. The proposed algorithm, fsCoP (feature selection based on constrained programming), performs similarly to, or much better than, the existing feature selection algorithms, even with constraints from both the literature and the existing algorithms. An fsCoP biomarker may be a promising candidate for further wet-lab validation, since it satisfies both the classification optimization function and the biomedical knowledge. fsCoP may also be used for the interactive exploration of bio-OMIC big data by interactively adding user-defined constraints for modeling.
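
    The sketch below is a loose stand-in for the constraint idea, not the fsCoP programming model itself: a plain greedy forward search that always retains a user-fixed set of features and grows the subset by cross-validated accuracy. The dataset, fixed features and budget are illustrative assumptions.

```python
# Hedged sketch of constraint-style feature selection: greedy forward search
# that always keeps a user-fixed set of features. Not the fsCoP algorithm.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
must_include = [0, 7]                      # user-fixed features (illustrative)
budget = 6                                 # total number of features to select

def score(feature_idx):
    clf = LogisticRegression(max_iter=5000)
    return cross_val_score(clf, X[:, feature_idx], y, cv=5).mean()

selected = list(must_include)
while len(selected) < budget:
    candidates = [j for j in range(X.shape[1]) if j not in selected]
    best_j = max(candidates, key=lambda j: score(selected + [j]))
    selected.append(best_j)

print("selected features:", selected, "| CV accuracy:", round(score(selected), 3))
```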

  7. Electromyographic Patterns during Golf Swing: Activation Sequence Profiling and Prediction of Shot Effectiveness.

    PubMed

    Verikas, Antanas; Vaiciukynas, Evaldas; Gelzinis, Adas; Parker, James; Olsson, M Charlotte

    2016-04-23

    This study analyzes muscle activity, recorded in an eight-channel electromyographic (EMG) signal stream, during the golf swing using a 7-iron club and exploits information extracted from EMG dynamics to predict the success of the resulting shot. Muscles of the arm and shoulder on both the left and right sides, namely flexor carpi radialis, extensor digitorum communis, rhomboideus and trapezius, are considered for 15 golf players (∼5 shots each). The method using Gaussian filtering is outlined for EMG onset time estimation in each channel and activation sequence profiling. Shots of each player revealed a persistent pattern of muscle activation. Profiles were plotted and insights with respect to player effectiveness were provided. Inspection of EMG dynamics revealed a pair of highest peaks in each channel as the hallmark of golf swing, and a custom application of peak detection for automatic extraction of swing segment was introduced. Various EMG features, encompassing 22 feature sets, were constructed. Feature sets were used individually and also in decision-level fusion for the prediction of shot effectiveness. The prediction of the target attribute, such as club head speed or ball carry distance, was investigated using random forest as the learner in detection and regression tasks. Detection evaluates the personal effectiveness of a shot with respect to the player-specific average, whereas regression estimates the value of target attribute, using EMG features as predictors. Fusion after decision optimization provided the best results: the equal error rate in detection was 24.3% for the speed and 31.7% for the distance; the mean absolute percentage error in regression was 3.2% for the speed and 6.4% for the distance. Proposed EMG feature sets were found to be useful, especially when used in combination. Rankings of feature sets indicated statistics for muscle activity in both the left and right body sides, correlation-based analysis of EMG dynamics and features derived from the properties of two highest peaks as important predictors of personal shot effectiveness. Activation sequence profiles helped in analyzing muscle orchestration during golf shot, exposing a specific avalanche pattern, but data from more players are needed for stronger conclusions. Results demonstrate that information arising from an EMG signal stream is useful for predicting golf shot success, in terms of club head speed and ball carry distance, with acceptable accuracy. Surface EMG data, collected with a goal to automatically evaluate golf player's performance, enables wearable computing in the field of ambient intelligence and has potential to enhance exercising of a long carry distance drive.
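
    One ingredient described above, Gaussian-smoothed onset estimation on a rectified EMG channel, can be sketched as follows; the threshold rule (baseline mean plus a multiple of the baseline standard deviation) and the synthetic burst signal are assumptions, not the authors' exact procedure.

```python
# Sketch of EMG onset detection: Gaussian smoothing of a rectified channel,
# then a simple baseline-derived threshold. Parameters are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def emg_onset(emg, fs, sigma_ms=50, k=3.0, baseline_ms=500):
    """Return the first sample where the smoothed envelope exceeds
    mean + k*std of the baseline window, or None if never exceeded."""
    envelope = gaussian_filter1d(np.abs(emg), sigma=sigma_ms * fs / 1000.0)
    n_base = int(baseline_ms * fs / 1000.0)
    thr = envelope[:n_base].mean() + k * envelope[:n_base].std()
    above = np.nonzero(envelope > thr)[0]
    return above[0] if above.size else None

fs = 1000                                    # sampling rate in Hz
t = np.arange(0, 3, 1 / fs)
rng = np.random.default_rng(0)
burst = (t > 1.2) & (t < 1.8)                # true activation at 1.2 s
emg = 0.05 * rng.standard_normal(t.size) + burst * 0.5 * rng.standard_normal(t.size)
onset = emg_onset(emg, fs)
print("estimated onset at", onset / fs, "s")
```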

  8. Electromyographic Patterns during Golf Swing: Activation Sequence Profiling and Prediction of Shot Effectiveness

    PubMed Central

    Verikas, Antanas; Vaiciukynas, Evaldas; Gelzinis, Adas; Parker, James; Olsson, M. Charlotte

    2016-01-01

    This study analyzes muscle activity, recorded in an eight-channel electromyographic (EMG) signal stream, during the golf swing using a 7-iron club and exploits information extracted from EMG dynamics to predict the success of the resulting shot. Muscles of the arm and shoulder on both the left and right sides, namely flexor carpi radialis, extensor digitorum communis, rhomboideus and trapezius, are considered for 15 golf players (∼5 shots each). The method using Gaussian filtering is outlined for EMG onset time estimation in each channel and activation sequence profiling. Shots of each player revealed a persistent pattern of muscle activation. Profiles were plotted and insights with respect to player effectiveness were provided. Inspection of EMG dynamics revealed a pair of highest peaks in each channel as the hallmark of golf swing, and a custom application of peak detection for automatic extraction of swing segment was introduced. Various EMG features, encompassing 22 feature sets, were constructed. Feature sets were used individually and also in decision-level fusion for the prediction of shot effectiveness. The prediction of the target attribute, such as club head speed or ball carry distance, was investigated using random forest as the learner in detection and regression tasks. Detection evaluates the personal effectiveness of a shot with respect to the player-specific average, whereas regression estimates the value of target attribute, using EMG features as predictors. Fusion after decision optimization provided the best results: the equal error rate in detection was 24.3% for the speed and 31.7% for the distance; the mean absolute percentage error in regression was 3.2% for the speed and 6.4% for the distance. Proposed EMG feature sets were found to be useful, especially when used in combination. Rankings of feature sets indicated statistics for muscle activity in both the left and right body sides, correlation-based analysis of EMG dynamics and features derived from the properties of two highest peaks as important predictors of personal shot effectiveness. Activation sequence profiles helped in analyzing muscle orchestration during golf shot, exposing a specific avalanche pattern, but data from more players are needed for stronger conclusions. Results demonstrate that information arising from an EMG signal stream is useful for predicting golf shot success, in terms of club head speed and ball carry distance, with acceptable accuracy. Surface EMG data, collected with a goal to automatically evaluate golf player’s performance, enables wearable computing in the field of ambient intelligence and has potential to enhance exercising of a long carry distance drive. PMID:27120604

  9. Tempest: Accelerated MS/MS database search software for heterogeneous computing platforms

    PubMed Central

    Adamo, Mark E.; Gerber, Scott A.

    2017-01-01

    MS/MS database search algorithms derive a set of candidate peptide sequences from in-silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU generates peptide candidates that are asynchronously sent to a discrete GPU to be scored against experimental spectra in parallel (Milloy et al., 2012). The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. PMID:27603022

  10. An efficient global energy optimization approach for robust 3D plane segmentation of point clouds

    NASA Astrophysics Data System (ADS)

    Dong, Zhen; Yang, Bisheng; Hu, Pingbo; Scherer, Sebastian

    2018-03-01

    Automatic 3D plane segmentation is necessary for many applications including point cloud registration, building information model (BIM) reconstruction, simultaneous localization and mapping (SLAM), and point cloud compression. However, most of the existing 3D plane segmentation methods still suffer from low precision and recall, and inaccurate and incomplete boundaries, especially for low-quality point clouds collected by RGB-D sensors. To overcome these challenges, this paper formulates the plane segmentation problem as a global energy optimization because it is robust to high levels of noise and clutter. First, the proposed method divides the raw point cloud into multiscale supervoxels, and considers planar supervoxels and individual points corresponding to nonplanar supervoxels as basic units. Then, an efficient hybrid region growing algorithm is utilized to generate initial plane set by incrementally merging adjacent basic units with similar features. Next, the initial plane set is further enriched and refined in a mutually reinforcing manner under the framework of global energy optimization. Finally, the performances of the proposed method are evaluated with respect to six metrics (i.e., plane precision, plane recall, under-segmentation rate, over-segmentation rate, boundary precision, and boundary recall) on two benchmark datasets. Comprehensive experiments demonstrate that the proposed method obtained good performances both in high-quality TLS point clouds (i.e., http://SEMANTIC3D.NET)

  11. Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.

    PubMed

    Subasi, Abdulhamit

    2013-06-01

    Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model has been proposed that hybridized the particle swarm optimization (PSO) and SVM to improve the EMG signal classification accuracy. This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the classification accuracy. The experiments were conducted on the basis of EMG signal to classify into normal, neurogenic or myopathic. In the proposed method the EMG signals were decomposed into the frequency sub-bands using discrete wavelet transform (DWT) and a set of statistical features were extracted from these sub-bands to represent the distribution of wavelet coefficients. The obtained results obviously validate the superiority of the SVM method compared to conventional machine learning methods, and suggest that further significant enhancements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of PSO-SVM for diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.
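
    A compact sketch of the PSO-SVM idea is given below: a small particle swarm searches over (log10 C, log10 gamma) of an RBF SVM with cross-validated accuracy as the fitness. The DWT sub-band feature extraction from EMG is not reproduced; a stock dataset stands in, and the swarm constants are assumptions.

```python
# Minimal PSO tuning of SVM hyperparameters (C, gamma) in log10 space, with
# cross-validated accuracy as the fitness. Dataset and constants are stand-ins.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
n_particles, n_iter = 10, 15
lo, hi = np.array([-2.0, -5.0]), np.array([3.0, 0.0])   # bounds in log10 space

def fitness(p):
    C, gamma = 10.0 ** p
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

pos = rng.uniform(lo, hi, (n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, 2))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best CV accuracy:", round(pbest_fit.max(), 4), "at C, gamma =", 10.0 ** gbest)
```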

  12. Compartmental model of 18F-choline

    NASA Astrophysics Data System (ADS)

    Janzen, T.; Tavola, F.; Giussani, A.; Cantone, M. C.; Uusijärvi, H.; Mattsson, S.; Zankl, M.; Petoussi-Henß, N.; Hoeschen, C.

    2010-03-01

    The MADEIRA Project (Minimizing Activity and Dose with Enhanced Image quality by Radiopharmaceutical Administrations) aims to improve the efficacy and safety of 3D functional imaging by optimizing, among other things, the knowledge of the temporal variation of the radiopharmaceuticals' uptake in and clearance from tumor and healthy tissues. With the help of compartmental modeling, it is intended to optimize the time schedule for data collection and improve the evaluation of the organ doses to the patients. Administration of 18F-choline to screen for recurrence or the occurrence of metastases in prostate cancer patients is one of the diagnostic applications under consideration in the frame of the project. PET and CT images have been acquired up to four hours after injection of 18F-choline. Additionally, blood and urine samples have been collected and measured in a gamma counter. The radioactivity concentration in different organs and data on plasma clearance and elimination into urine were used to set up a compartmental model of the biokinetics of the radiopharmaceutical. It features a central compartment (blood) exchanging with the organs. The structure describes explicitly liver, kidneys, spleen, plasma and bladder as separate units with a forcing function approach. The model is presented together with an evaluation of the individual and population kinetic parameters, and a revised time schedule for data collection is proposed. This optimized time schedule will be validated in a further set of patient studies.
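
    A hedged toy version of such a compartmental structure, with a central blood compartment exchanging with one lumped tissue compartment and clearing to urine, can be integrated as below; the rate constants are arbitrary illustration values, not the fitted MADEIRA parameters.

```python
# Toy two-compartment plus urine model integrated with SciPy; rate constants
# are made-up illustration values, not the paper's fitted parameters.
import numpy as np
from scipy.integrate import solve_ivp

k_bt, k_tb, k_bu = 0.15, 0.05, 0.02        # 1/min: blood->tissue, tissue->blood, blood->urine

def rhs(t, y):
    blood, tissue, urine = y
    d_blood = -(k_bt + k_bu) * blood + k_tb * tissue
    d_tissue = k_bt * blood - k_tb * tissue
    d_urine = k_bu * blood
    return [d_blood, d_tissue, d_urine]

# Start with all activity in blood; integrate over 4 hours (240 min)
sol = solve_ivp(rhs, (0.0, 240.0), [1.0, 0.0, 0.0], dense_output=True)
t = np.linspace(0, 240, 5)
print(np.round(sol.sol(t).T, 3))           # activity fractions at sample times
```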

  13. Optimized Kernel Entropy Components.

    PubMed

    Izquierdo-Verdiguier, Emma; Laparra, Valero; Jenssen, Robert; Gomez-Chova, Luis; Camps-Valls, Gustau

    2017-06-01

    This brief addresses two main issues of the standard kernel entropy component analysis (KECA) algorithm: the optimization of the kernel decomposition and the optimization of the Gaussian kernel parameter. KECA roughly reduces to a sorting of the importance of kernel eigenvectors by entropy instead of variance, as in the kernel principal components analysis. In this brief, we propose an extension of the KECA method, named optimized KECA (OKECA), that directly extracts the optimal features retaining most of the data entropy by means of compacting the information in very few features (often in just one or two). The proposed method produces features which have higher expressive power. In particular, it is based on the independent component analysis framework, and introduces an extra rotation to the eigen decomposition, which is optimized via gradient-ascent search. This maximum entropy preservation suggests that OKECA features are more efficient than KECA features for density estimation. In addition, a critical issue in both the methods is the selection of the kernel parameter, since it critically affects the resulting performance. Here, we analyze the most common kernel length-scale selection criteria. The results of both the methods are illustrated in different synthetic and real problems. Results show that OKECA returns projections with more expressive power than KECA, the most successful rule for estimating the kernel parameter is based on maximum likelihood, and OKECA is more robust to the selection of the length-scale parameter in kernel density estimation.
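
    The KECA-style ranking that OKECA extends can be sketched as follows: eigenvectors of an RBF kernel matrix are ordered by their contribution to the Renyi entropy estimate, lambda_i (e_i^T 1)^2, rather than by variance. The extra ICA-style rotation of OKECA is not reproduced, and the kernel length-scale below is a guess.

```python
# Sketch of KECA-style component selection: rank kernel eigenvectors by their
# contribution to the Renyi entropy estimate instead of by variance.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
sigma = 0.5                                  # assumed kernel length-scale
K = rbf_kernel(X, gamma=1.0 / (2 * sigma ** 2))

evals, evecs = np.linalg.eigh(K)             # ascending eigenvalues
ones = np.ones(K.shape[0])
entropy_contrib = evals * (evecs.T @ ones) ** 2

order = np.argsort(entropy_contrib)[::-1]    # most entropy-preserving first
selected = order[:2]
# Kernel feature-space projections of the training data onto selected axes
Z = evecs[:, selected] * np.sqrt(np.clip(evals[selected], 0, None))
share = entropy_contrib[selected].sum() / entropy_contrib.sum()
print(f"entropy captured by top-2 components: {share:.3f}")
print("embedding shape:", Z.shape)
```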

  14. Vascular Leiomyoma and Geniculate Ganglion

    PubMed Central

    Magliulo, Giuseppe; Iannella, Giannicola; Valente, Michele; Greco, Antonio; Appiani, Mario Ciniglio

    2013-01-01

    Objectives Discussion of a rare case of angioleiomyoma involving the geniculate ganglion and the intratemporal facial nerve segment and its surgical treatment. Design Case report. Setting Presence of an expansive lesion englobing the geniculate ganglion without any lesion to the cerebellopontine angle. Participants A 45-year-old man with a grade III facial paralysis according to the House-Brackmann scale of evaluation. Main Outcomes Measure Surgical pathology, radiologic appearance, histological features, and postoperative facial function. Results Removal of the entire lesion was achieved, preserving the anatomic integrity of the nerve; no nerve graft was necessary. Postoperative histology and immunohistochemical studies revealed features indicative of solid vascular leiomyoma. Conclusion Angioleiomyoma should be considered in the differential diagnosis of geniculate ganglion lesions. Optimal postoperative facial function is possible only by preserving the anatomical and functional integrity of the facial nerve. PMID:23943721

  15. Kernel-Based Relevance Analysis with Enhanced Interpretability for Detection of Brain Activity Patterns

    PubMed Central

    Alvarez-Meza, Andres M.; Orozco-Gutierrez, Alvaro; Castellanos-Dominguez, German

    2017-01-01

    We introduce Enhanced Kernel-based Relevance Analysis (EKRA) that aims to support the automatic identification of brain activity patterns using electroencephalographic recordings. EKRA is a data-driven strategy that incorporates two kernel functions to take advantage of the available joint information, associating neural responses to a given stimulus condition. Regarding this, a Centered Kernel Alignment functional is adjusted to learning the linear projection that best discriminates the input feature set, optimizing the required free parameters automatically. Our approach is carried out in two scenarios: (i) feature selection by computing a relevance vector from extracted neural features to facilitating the physiological interpretation of a given brain activity task, and (ii) enhanced feature selection to perform an additional transformation of relevant features aiming to improve the overall identification accuracy. Accordingly, we provide an alternative feature relevance analysis strategy that allows improving the system performance while favoring the data interpretability. For the validation purpose, EKRA is tested in two well-known tasks of brain activity: motor imagery discrimination and epileptic seizure detection. The obtained results show that the EKRA approach estimates a relevant representation space extracted from the provided supervised information, emphasizing the salient input features. As a result, our proposal outperforms the state-of-the-art methods regarding brain activity discrimination accuracy with the benefit of enhanced physiological interpretation about the task at hand. PMID:29056897
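
    For orientation, the Centered Kernel Alignment functional at the core of this kind of approach can be computed as below, measuring the alignment between a feature kernel and an ideal label kernel after centering; learning the discriminative linear projection itself is not shown, and the kernel parameter is an assumption.

```python
# Minimal Centered Kernel Alignment (CKA) between a feature kernel and an
# ideal label kernel; dataset and kernel width are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import rbf_kernel

def centered_alignment(K, L):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

X, y = load_iris(return_X_y=True)
K = rbf_kernel(X, gamma=0.5)                 # kernel on the input features
L = (y[:, None] == y[None, :]).astype(float) # ideal label kernel
print("CKA between feature kernel and labels:", round(centered_alignment(K, L), 3))
```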

  16. Efficient hyperspectral image segmentation using geometric active contour formulation

    NASA Astrophysics Data System (ADS)

    Albalooshi, Fatema A.; Sidike, Paheding; Asari, Vijayan K.

    2014-10-01

    In this paper, we present a new formulation of geometric active contours that embeds the local hyperspectral image information for accurate object region and boundary extraction. We exploit a self-organizing map (SOM) unsupervised neural network to train our model. The segmentation process is achieved by the construction of a level set cost functional in which the dynamic variable is the best matching unit (BMU) coming from the SOM map. In addition, we use Gaussian filtering to discipline the deviation of the level set functional from a signed distance function, and this helps to get rid of the re-initialization step that is computationally expensive. By using the properties of the collective computational ability and energy convergence capability of the active contour model (ACM) energy functional, our method optimizes the geometric ACM energy functional with lower computational time and a smoother level set function. The proposed algorithm starts with feature extraction from raw hyperspectral images. In this step, the principal component analysis (PCA) transformation is employed, which helps in reducing dimensionality and selecting the best sets of significant spectral bands. Then the modified geometric level set functional based ACM is applied on the optimal number of spectral bands determined by the PCA. By introducing local significant spectral band information, our proposed method is capable of forcing the level set functional to be close to a signed distance function, and therefore considerably removes the need for the expensive re-initialization procedure. To verify the effectiveness of the proposed technique, we use real-life hyperspectral images and test our algorithm on varying textural regions. This framework can be easily adapted to different applications for object segmentation in aerial hyperspectral imagery.

  17. A stereo remote sensing feature selection method based on artificial bee colony algorithm

    NASA Astrophysics Data System (ADS)

    Yan, Yiming; Liu, Pigang; Zhang, Ye; Su, Nan; Tian, Shu; Gao, Fengjiao; Shen, Yi

    2014-05-01

    To improve the efficiency of using stereo information for remote sensing classification, a stereo remote sensing feature selection method based on the artificial bee colony algorithm is proposed in this paper. Remote sensing stereo information can be described by a digital surface model (DSM) and an optical image, which contain information on the three-dimensional structure and the optical characteristics, respectively. Firstly, the three-dimensional structure characteristics can be analyzed by 3D-Zernike descriptors (3DZD). However, different parameters of the 3DZD describe different complexities of the three-dimensional structure, and they need to be optimally selected for the various objects on the ground. Secondly, features for representing the optical characteristics also need to be optimized. If not properly handled, a stereo feature vector composed of 3DZD and image features would contain a large amount of redundant information, and this redundant information may not improve the classification accuracy and may even cause adverse effects. To reduce information redundancy while maintaining or improving the classification accuracy, an optimization frame for this stereo feature selection problem is created, and the artificial bee colony algorithm is introduced to solve this optimization problem. Experimental results show that the proposed method can effectively improve computational efficiency and classification accuracy.

  18. Maximizing the probability of satisfying the clinical goals in radiation therapy treatment planning under setup uncertainty

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fredriksson, Albin, E-mail: albin.fredriksson@raysearchlabs.com; Hårdemark, Björn; Forsgren, Anders

    2015-07-15

    Purpose: This paper introduces a method that maximizes the probability of satisfying the clinical goals in intensity-modulated radiation therapy treatments subject to setup uncertainty. Methods: The authors perform robust optimization in which the clinical goals are constrained to be satisfied whenever the setup error falls within an uncertainty set. The shape of the uncertainty set is included as a variable in the optimization. The goal of the optimization is to modify the shape of the uncertainty set in order to maximize the probability that the setup error will fall within the modified set. Because the constraints enforce the clinical goals to be satisfied under all setup errors within the uncertainty set, this is equivalent to maximizing the probability of satisfying the clinical goals. This type of robust optimization is studied with respect to photon and proton therapy applied to a prostate case and compared to robust optimization using an a priori defined uncertainty set. Results: Slight reductions of the uncertainty sets resulted in plans that satisfied a larger number of clinical goals than optimization with respect to a priori defined uncertainty sets, both within the reduced uncertainty sets and within the a priori, nonreduced, uncertainty sets. For the prostate case, the plans taking reduced uncertainty sets into account satisfied 1.4 (photons) and 1.5 (protons) times as many clinical goals over the scenarios as the method taking a priori uncertainty sets into account. Conclusions: Reducing the uncertainty sets enabled the optimization to find better solutions with respect to the errors within the reduced as well as the nonreduced uncertainty sets and thereby achieve a higher probability of satisfying the clinical goals. This shows that asking for a little less in the optimization sometimes leads to better overall plan quality.

  19. Did Groundwater Processes Shape the Saharan Landscape during the Previous Wet Periods? a Remote Sensing and Geostatistical Approach

    NASA Astrophysics Data System (ADS)

    Farag, A. Z. A.; Sultan, M.; Elkadiri, R.; Abdelhalim, A.

    2014-12-01

    An integrated approach using remote sensing, landscape analysis and statistical methods was conducted to assess the role of groundwater sapping in shaping the Saharan landscape. A GIS-based logistic regression model was constructed to automatically delineate the spatial distribution of the sapping features over areas occupied by the Nubian Sandstone Aquifer System (NSAS): (1) an inventory was compiled of known locations of sapping features identified either in the field or from satellite datasets (e.g. Orbview-3 and Google Earth Digital Globe imagery); (2) spatial analyses were conducted in a GIS environment and seven geomorphological and geological predisposing factors (i.e. slope, stream density, cross-sectional and profile curvature, minimum and maximum curvature, and lithology) were identified; (3) a binary logistic regression model was constructed, optimized and validated to describe the relationship between the sapping locations and the set of controlling factors and (4) the generated model (prediction accuracy: 90.1%) was used to produce a regional sapping map over the NSAS. Model outputs indicate: (1) groundwater discharge and structural control played an important role in excavating the Saharan natural depressions as evidenced by the wide distribution of sapping features (areal extent: 1180 km2) along the fault-controlled escarpments of the Libyan Plateau; (2) proximity of mapped sapping features to reported paleolake and tufa deposits suggesting a causal effect. Our preliminary observations (from satellite imagery) and statistical analyses together with previous studies in the North Western Sahara Aquifer System (North Africa), Sinai Peninsula, Negev Desert, and The Plateau of Najd (Saudi Arabia) indicate extensive occurrence of sapping features along the escarpments bordering the northern margins of the Saharan-Arabian Desert; these areas share similar hydrologic settings with the NSAS domains and they too witnessed wet climatic periods in the Mid-Late Quaternary.
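
    A hedged sketch of the mapping step follows: a binary logistic regression relating sapping/no-sapping labels to a stack of predisposing-factor rasters, then predicting a susceptibility probability for every cell. Synthetic rasters stand in for the real slope, stream density, curvature and lithology layers.

```python
# Illustrative logistic-regression susceptibility mapping over a raster stack
# of predisposing factors; all data below are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
h, w, n_factors = 100, 100, 7
factors = rng.standard_normal((h, w, n_factors))         # predisposing factors
true_w = np.array([1.5, -0.8, 0.0, 0.6, 0.0, 0.0, -1.2]) # hidden generating weights
p_true = 1 / (1 + np.exp(-(factors @ true_w)))
labels = rng.random((h, w)) < p_true                      # mapped sapping cells

# Fit on a sample of cells, then predict a probability for every cell
X = factors.reshape(-1, n_factors)
y = labels.ravel().astype(int)
idx = rng.choice(X.shape[0], 2000, replace=False)
model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])

susceptibility = model.predict_proba(X)[:, 1].reshape(h, w)
print("training accuracy:", round(model.score(X[idx], y[idx]), 3))
print("susceptibility map shape:", susceptibility.shape)
```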

  20. Adaptive fusion of infrared and visible images in dynamic scene

    NASA Astrophysics Data System (ADS)

    Yang, Guang; Yin, Yafeng; Man, Hong; Desai, Sachi

    2011-11-01

    Multiple-modality sensor fusion has been widely employed in various surveillance and military applications. A variety of image fusion techniques, including PCA, wavelet, curvelet and HSV, have been proposed in recent years to improve human visual perception for object detection. One of the main challenges for visible and infrared image fusion is to automatically determine an optimal fusion strategy for different input scenes at an acceptable computational cost. In this paper, we propose a fast and adaptive feature selection based image fusion method to obtain a high-contrast image from visible and infrared sensors for target detection. At first, fuzzy c-means clustering is applied to the infrared image to highlight possible hotspot regions, which are considered potential target locations. After that, the region surrounding the target area is segmented as the background region. Then image fusion is locally applied to the selected target and background regions by computing different linear combinations of color components from the registered visible and infrared images. After obtaining different fused images, histogram distributions are computed on these local fusion images as the fusion feature set. The variance ratio, which is based on a Linear Discriminant Analysis (LDA) measure, is employed to sort the feature set, and the most discriminative feature is selected for the whole-image fusion. As the feature selection is performed over time, the process dynamically determines the most suitable feature for image fusion in different scenes. Experiments are conducted on the OSU Color-Thermal database and the TNO Human Factor dataset. The fusion results indicate that our proposed method achieved competitive performance compared with other fusion algorithms at a relatively low computational cost.
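
    The hotspot step described above can be sketched with a compact fuzzy c-means implementation in NumPy, applied to pixel intensities of a synthetic infrared frame; the cluster count, fuzzifier and membership threshold are illustrative assumptions.

```python
# Compact fuzzy c-means in NumPy for highlighting hotspot regions in an
# infrared frame; the synthetic image and all parameters are assumptions.
import numpy as np

def fuzzy_cmeans(x, c=2, m=2.0, n_iter=50, seed=0):
    """x: (N, d) samples. Returns (centers, membership matrix U of shape (N, c))."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=x.shape[0])           # random memberships
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ x) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))
        U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return centers, U

rng = np.random.default_rng(1)
ir = rng.normal(0.2, 0.05, (64, 64))            # synthetic infrared frame
ir[20:30, 30:40] += 0.6                         # warm target region
centers, U = fuzzy_cmeans(ir.reshape(-1, 1))
hot_cluster = centers.ravel().argmax()
hotspot_mask = (U[:, hot_cluster] > 0.5).reshape(ir.shape)
print("hotspot pixels found:", int(hotspot_mask.sum()))
```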
