EEG feature selection method based on decision tree.
Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun
2015-01-01
This paper aims to solve automated feature selection problem in brain computer interface (BCI). In order to automate feature selection process, we proposed a novel EEG feature selection method based on decision tree (DT). During the electroencephalogram (EEG) signal processing, a feature extraction method based on principle component analysis (PCA) was used, and the selection process based on decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier named support vector machine (SVM) was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision tree to BCI Competition II datasets Ia, and the experiment showed encouraging results.
Integrated feature extraction and selection for neuroimage classification
NASA Astrophysics Data System (ADS)
Fan, Yong; Shen, Dinggang
2009-02-01
Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.
NASA Astrophysics Data System (ADS)
Li, Yifan; Liang, Xihui; Lin, Jianhui; Chen, Yuejian; Liu, Jianxin
2018-02-01
This paper presents a novel signal processing scheme, feature selection based multi-scale morphological filter (MMF), for train axle bearing fault detection. In this scheme, more than 30 feature indicators of vibration signals are calculated for axle bearings with different conditions and the features which can reflect fault characteristics more effectively and representatively are selected using the max-relevance and min-redundancy principle. Then, a filtering scale selection approach for MMF based on feature selection and grey relational analysis is proposed. The feature selection based MMF method is tested on diagnosis of artificially created damages of rolling bearings of railway trains. Experimental results show that the proposed method has a superior performance in extracting fault features of defective train axle bearings. In addition, comparisons are performed with the kurtosis criterion based MMF and the spectral kurtosis criterion based MMF. The proposed feature selection based MMF method outperforms these two methods in detection of train axle bearing faults.
Quantum-enhanced feature selection with forward selection and backward elimination
NASA Astrophysics Data System (ADS)
He, Zhimin; Li, Lvzhou; Huang, Zhiming; Situ, Haozhen
2018-07-01
Feature selection is a well-known preprocessing technique in machine learning, which can remove irrelevant features to improve the generalization capability of a classifier and reduce training and inference time. However, feature selection is time-consuming, particularly for the applications those have thousands of features, such as image retrieval, text mining and microarray data analysis. It is crucial to accelerate the feature selection process. We propose a quantum version of wrapper-based feature selection, which converts a classical feature selection to its quantum counterpart. It is valuable for machine learning on quantum computer. In this paper, we focus on two popular kinds of feature selection methods, i.e., wrapper-based forward selection and backward elimination. The proposed feature selection algorithm can quadratically accelerate the classical one.
Chen, Qiang; Chen, Yunhao; Jiang, Weiguo
2016-07-30
In the field of multiple features Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple features OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experiment cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence, and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO is superior at overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314) than other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm would affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of GPSO-based feature selection algorithm.
NASA Astrophysics Data System (ADS)
Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang
2017-01-01
Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for the tasks, in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affects the quality of life dramatically. In this paper, we use the data acquired from multi-modal neuroimaging data to diagnose PD by investigating the brain regions, known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions, specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods.
Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang
2017-01-01
Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for the tasks, in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affects the quality of life dramatically. In this paper, we use the data acquired from multi-modal neuroimaging data to diagnose PD by investigating the brain regions, known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions, specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods. PMID:28120883
Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo
2016-02-01
Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
Chen, Qiang; Chen, Yunhao; Jiang, Weiguo
2016-01-01
In the field of multiple features Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple features OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experiment cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence, and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO is superior at overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314) than other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm would affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of GPSO-based feature selection algorithm. PMID:27483285
The fate of task-irrelevant visual motion: perceptual load versus feature-based attention.
Taya, Shuichiro; Adams, Wendy J; Graf, Erich W; Lavie, Nilli
2009-11-18
We tested contrasting predictions derived from perceptual load theory and from recent feature-based selection accounts. Observers viewed moving, colored stimuli and performed low or high load tasks associated with one stimulus feature, either color or motion. The resultant motion aftereffect (MAE) was used to evaluate attentional allocation. We found that task-irrelevant visual features received less attention than co-localized task-relevant features of the same objects. Moreover, when color and motion features were co-localized yet perceived to belong to two distinct surfaces, feature-based selection was further increased at the expense of object-based co-selection. Load theory predicts that the MAE for task-irrelevant motion would be reduced with a higher load color task. However, this was not seen for co-localized features; perceptual load only modulated the MAE for task-irrelevant motion when this was spatially separated from the attended color location. Our results suggest that perceptual load effects are mediated by spatial selection and do not generalize to the feature domain. Feature-based selection operates to suppress processing of task-irrelevant, co-localized features, irrespective of perceptual load.
NASA Astrophysics Data System (ADS)
Khehra, Baljit Singh; Pharwaha, Amar Partap Singh
2017-04-01
Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of robust features vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain after the feature extraction and before any classification scheme. Feature selection reduces the feature space that improves the performance of classifier and decreases the computational burden imposed by using many features on classifier. Selection of an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem. For n features, the total numbers of possible subsets of features are 2n. Thus, selection of an optimal subset of features problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCCs features from all possible subsets of features using genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCCs samples have been selected from mammogram images of DDSM database. A total of 50 features extracted from benign and malignant MCCs samples are used in this study. In these algorithms, fitness function is correct classification rate of classifier. Support vector machine is used as a classifier. From experimental results, it is also observed that the performance of PSO-based and BBO-based algorithms to select an optimal subset of features for classifying MCCs as benign or malignant is better as compared to GA-based algorithm.
Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation
Siraj, Maheyzah Md; Zainal, Anazida; Elshoush, Huwaida Tagelsir; Elhaj, Fatin
2016-01-01
Grouping and clustering alerts for intrusion detection based on the similarity of features is referred to as structurally base alert correlation and can discover a list of attack steps. Previous researchers selected different features and data sources manually based on their knowledge and experience, which lead to the less accurate identification of attack steps and inconsistent performance of clustering accuracy. Furthermore, the existing alert correlation systems deal with a huge amount of data that contains null values, incomplete information, and irrelevant features causing the analysis of the alerts to be tedious, time-consuming and error-prone. Therefore, this paper focuses on selecting accurate and significant features of alerts that are appropriate to represent the attack steps, thus, enhancing the structural-based alert correlation model. A two-tier feature selection method is proposed to obtain the significant features. The first tier aims at ranking the subset of features based on high information gain entropy in decreasing order. The second tier extends additional features with a better discriminative ability than the initially ranked features. Performance analysis results show the significance of the selected features in terms of the clustering accuracy using 2000 DARPA intrusion detection scenario-specific dataset. PMID:27893821
NASA Astrophysics Data System (ADS)
Diamant, Idit; Shalhon, Moran; Goldberger, Jacob; Greenspan, Hayit
2016-03-01
Classification of clustered breast microcalcifications into benign and malignant categories is an extremely challenging task for computerized algorithms and expert radiologists alike. In this paper we present a novel method for feature selection based on mutual information (MI) criterion for automatic classification of microcalcifications. We explored the MI based feature selection for various texture features. The proposed method was evaluated on a standardized digital database for screening mammography (DDSM). Experimental results demonstrate the effectiveness and the advantage of using the MI-based feature selection to obtain the most relevant features for the task and thus to provide for improved performance as compared to using all features.
McTwo: a two-step feature selection algorithm based on maximal information coefficient.
Ge, Ruiquan; Zhou, Manli; Luo, Youxi; Meng, Qinghan; Mai, Guoqin; Ma, Dongli; Wang, Guoqing; Zhou, Fengfeng
2016-03-23
High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature. McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.
Max-AUC Feature Selection in Computer-Aided Detection of Polyps in CT Colonography
Xu, Jian-Wu; Suzuki, Kenji
2014-01-01
We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level. PMID:24608058
Max-AUC feature selection in computer-aided detection of polyps in CT colonography.
Xu, Jian-Wu; Suzuki, Kenji
2014-03-01
We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level.
Rough sets and Laplacian score based cost-sensitive feature selection
Yu, Shenglong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of “good” features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms. PMID:29912884
Rough sets and Laplacian score based cost-sensitive feature selection.
Yu, Shenglong; Zhao, Hong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.
Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang Xiaojia; Mao Qirong; Zhan Yongzhao
There are many emotion features. If all these features are employed to recognize emotions, redundant features may be existed. Furthermore, recognition result is unsatisfying and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on contribution analysis algorithm of NN is presented. The emotion features are selected by using contribution analysis algorithm of NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness for the features selected, and the time of feature extraction is evaluated. Finally, 24 emotion features selected are used to recognize six speech emotions.more » The experiments show that this method can improve the recognition rate and the time of feature extraction.« less
Chen, Yifei; Sun, Yuxing; Han, Bing-Qing
2015-01-01
Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.
Mala, S.; Latha, K.
2014-01-01
Activity recognition is needed in different requisition, for example, reconnaissance system, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. In selecting subset of features, an efficient evolutionary algorithm Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates more on the feature selection algorithm based on DE in order to improve the classification for faultless activity recognition. PMID:25574185
Mala, S; Latha, K
2014-01-01
Activity recognition is needed in different requisition, for example, reconnaissance system, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. In selecting subset of features, an efficient evolutionary algorithm Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates more on the feature selection algorithm based on DE in order to improve the classification for faultless activity recognition.
NASA Astrophysics Data System (ADS)
Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na
2016-10-01
Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.
Oliveira, Roberta B; Pereira, Aledir S; Tavares, João Manuel R S
2017-10-01
The number of deaths worldwide due to melanoma has risen in recent times, in part because melanoma is the most aggressive type of skin cancer. Computational systems have been developed to assist dermatologists in early diagnosis of skin cancer, or even to monitor skin lesions. However, there still remains a challenge to improve classifiers for the diagnosis of such skin lesions. The main objective of this article is to evaluate different ensemble classification models based on input feature manipulation to diagnose skin lesions. Input feature manipulation processes are based on feature subset selections from shape properties, colour variation and texture analysis to generate diversity for the ensemble models. Three subset selection models are presented here: (1) a subset selection model based on specific feature groups, (2) a correlation-based subset selection model, and (3) a subset selection model based on feature selection algorithms. Each ensemble classification model is generated using an optimum-path forest classifier and integrated with a majority voting strategy. The proposed models were applied on a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by the first ensemble classification model that generates a feature subset ensemble based on specific feature groups. The skin lesion diagnosis computational system achieved 94.3% accuracy, 91.8% sensitivity and 96.7% specificity. The input feature manipulation process based on specific feature subsets generated the greatest diversity for the ensemble classification model with very promising results. Copyright © 2017 Elsevier B.V. All rights reserved.
Andersen, Søren K; Müller, Matthias M; Hillyard, Steven A
2015-07-08
Experiments that study feature-based attention have often examined situations in which selection is based on a single feature (e.g., the color red). However, in more complex situations relevant stimuli may not be set apart from other stimuli by a single defining property but by a specific combination of features. Here, we examined sustained attentional selection of stimuli defined by conjunctions of color and orientation. Human observers attended to one out of four concurrently presented superimposed fields of randomly moving horizontal or vertical bars of red or blue color to detect brief intervals of coherent motion. Selective stimulus processing in early visual cortex was assessed by recordings of steady-state visual evoked potentials (SSVEPs) elicited by each of the flickering fields of stimuli. We directly contrasted attentional selection of single features and feature conjunctions and found that SSVEP amplitudes on conditions in which selection was based on a single feature only (color or orientation) exactly predicted the magnitude of attentional enhancement of SSVEPs when attending to a conjunction of both features. Furthermore, enhanced SSVEP amplitudes elicited by attended stimuli were accompanied by equivalent reductions of SSVEP amplitudes elicited by unattended stimuli in all cases. We conclude that attentional selection of a feature-conjunction stimulus is accomplished by the parallel and independent facilitation of its constituent feature dimensions in early visual cortex. The ability to perceive the world is limited by the brain's processing capacity. Attention affords adaptive behavior by selectively prioritizing processing of relevant stimuli based on their features (location, color, orientation, etc.). We found that attentional mechanisms for selection of different features belonging to the same object operate independently and in parallel: concurrent attentional selection of two stimulus features is simply the sum of attending to each of those features separately. This result is key to understanding attentional selection in complex (natural) scenes, where relevant stimuli are likely to be defined by a combination of stimulus features. Copyright © 2015 the authors 0270-6474/15/359912-08$15.00/0.
NASA Astrophysics Data System (ADS)
Zhang, Zhifen; Chen, Huabin; Xu, Yanling; Zhong, Jiyong; Lv, Na; Chen, Shanben
2015-08-01
Multisensory data fusion-based online welding quality monitoring has gained increasing attention in intelligent welding process. This paper mainly focuses on the automatic detection of typical welding defect for Al alloy in gas tungsten arc welding (GTAW) by means of analzing arc spectrum, sound and voltage signal. Based on the developed algorithms in time and frequency domain, 41 feature parameters were successively extracted from these signals to characterize the welding process and seam quality. Then, the proposed feature selection approach, i.e., hybrid fisher-based filter and wrapper was successfully utilized to evaluate the sensitivity of each feature and reduce the feature dimensions. Finally, the optimal feature subset with 19 features was selected to obtain the highest accuracy, i.e., 94.72% using established classification model. This study provides a guideline for feature extraction, selection and dynamic modeling based on heterogeneous multisensory data to achieve a reliable online defect detection system in arc welding.
Deng, Changjian; Lv, Kun; Shi, Debo; Yang, Bo; Yu, Song; He, Zhiyi; Yan, Jia
2018-06-12
In this paper, a novel feature selection and fusion framework is proposed to enhance the discrimination ability of gas sensor arrays for odor identification. Firstly, we put forward an efficient feature selection method based on the separability and the dissimilarity to determine the feature selection order for each type of feature when increasing the dimension of selected feature subsets. Secondly, the K-nearest neighbor (KNN) classifier is applied to determine the dimensions of the optimal feature subsets for different types of features. Finally, in the process of establishing features fusion, we come up with a classification dominance feature fusion strategy which conducts an effective basic feature. Experimental results on two datasets show that the recognition rates of Database I and Database II achieve 97.5% and 80.11%, respectively, when k = 1 for KNN classifier and the distance metric is correlation distance (COR), which demonstrates the superiority of the proposed feature selection and fusion framework in representing signal features. The novel feature selection method proposed in this paper can effectively select feature subsets that are conducive to the classification, while the feature fusion framework can fuse various features which describe the different characteristics of sensor signals, for enhancing the discrimination ability of gas sensors and, to a certain extent, suppressing drift effect.
Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah
2018-02-01
Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is Fisher score which aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of Fisher criterion were extended for QSAR models to define the new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets such as serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.
NASA Astrophysics Data System (ADS)
Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah
2018-02-01
Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is Fisher score which aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of Fisher criterion were extended for QSAR models to define the new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets such as serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.
Matsukura, Michi; Vecera, Shaun P
2011-02-01
Attention selects objects as well as locations. When attention selects an object's features, observers identify two features from a single object more accurately than two features from two different objects (object-based effect of attention; e.g., Duncan, Journal of Experimental Psychology: General, 113, 501-517, 1984). Several studies have demonstrated that object-based attention can operate at a late visual processing stage that is independent of objects' spatial information (Awh, Dhaliwal, Christensen, & Matsukura, Psychological Science, 12, 329-334, 2001; Matsukura & Vecera, Psychonomic Bulletin & Review, 16, 529-536, 2009; Vecera, Journal of Experimental Psychology: General, 126, 14-18, 1997; Vecera & Farah, Journal of Experimental Psychology: General, 123, 146-160, 1994). In the present study, we asked two questions regarding this late object-based selection mechanism. In Part I, we investigated how observers' foreknowledge of to-be-reported features allows attention to select objects, as opposed to individual features. Using a feature-report task, a significant object-based effect was observed when to-be-reported features were known in advance but not when this advance knowledge was absent. In Part II, we examined what drives attention to select objects rather than individual features in the absence of observers' foreknowledge of to-be-reported features. Results suggested that, when there was no opportunity for observers to direct their attention to objects that possess to-be-reported features at the time of stimulus presentation, these stimuli must retain strong perceptual cues to establish themselves as separate objects.
Relevance popularity: A term event model based feature selection scheme for text classification.
Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao
2017-01-01
Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e. the document frequency) is often used. However, the frequency of a given term appearing in each document has not been fully investigated, even though it is a promising feature to produce accurate classifications. In this paper, we propose a new feature selection scheme based on a term event Multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing inner parameters by their estimators. On a benchmark English text datasets (20 Newsgroups) and a Chinese text dataset (MPH-20), our numerical experiment results obtained from using two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperformed the representative feature selection methods.
Wang, Jie; Feng, Zuren; Lu, Na; Luo, Jing
2018-06-01
Feature selection plays an important role in the field of EEG signals based motor imagery pattern classification. It is a process that aims to select an optimal feature subset from the original set. Two significant advantages involved are: lowering the computational burden so as to speed up the learning procedure and removing redundant and irrelevant features so as to improve the classification performance. Therefore, feature selection is widely employed in the classification of EEG signals in practical brain-computer interface systems. In this paper, we present a novel statistical model to select the optimal feature subset based on the Kullback-Leibler divergence measure, and automatically select the optimal subject-specific time segment. The proposed method comprises four successive stages: a broad frequency band filtering and common spatial pattern enhancement as preprocessing, features extraction by autoregressive model and log-variance, the Kullback-Leibler divergence based optimal feature and time segment selection and linear discriminate analysis classification. More importantly, this paper provides a potential framework for combining other feature extraction models and classification algorithms with the proposed method for EEG signals classification. Experiments on single-trial EEG signals from two public competition datasets not only demonstrate that the proposed method is effective in selecting discriminative features and time segment, but also show that the proposed method yields relatively better classification results in comparison with other competitive methods. Copyright © 2018 Elsevier Ltd. All rights reserved.
Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.
Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki
2016-07-01
We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.
Gomez-Ramirez, Manuel; Trzcinski, Natalie K.; Mihalas, Stefan; Niebur, Ernst
2014-01-01
Studies in vision show that attention enhances the firing rates of cells when it is directed towards their preferred stimulus feature. However, it is unknown whether other sensory systems employ this mechanism to mediate feature selection within their modalities. Moreover, whether feature-based attention modulates the correlated activity of a population is unclear. Indeed, temporal correlation codes such as spike-synchrony and spike-count correlations (rsc) are believed to play a role in stimulus selection by increasing the signal and reducing the noise in a population, respectively. Here, we investigate (1) whether feature-based attention biases the correlated activity between neurons when attention is directed towards their common preferred feature, (2) the interplay between spike-synchrony and rsc during feature selection, and (3) whether feature attention effects are common across the visual and tactile systems. Single-unit recordings were made in secondary somatosensory cortex of three non-human primates while animals engaged in tactile feature (orientation and frequency) and visual discrimination tasks. We found that both firing rate and spike-synchrony between neurons with similar feature selectivity were enhanced when attention was directed towards their preferred feature. However, attention effects on spike-synchrony were twice as large as those on firing rate, and had a tighter relationship with behavioral performance. Further, we observed increased rsc when attention was directed towards the visual modality (i.e., away from touch). These data suggest that similar feature selection mechanisms are employed in vision and touch, and that temporal correlation codes such as spike-synchrony play a role in mediating feature selection. We posit that feature-based selection operates by implementing multiple mechanisms that reduce the overall noise levels in the neural population and synchronize activity across subpopulations that encode the relevant features of sensory stimuli. PMID:25423284
USDA-ARS?s Scientific Manuscript database
Due to the availability of numerous spectral, spatial, and contextual features, the determination of optimal features and class separabilities can be a time consuming process in object-based image analysis (OBIA). While several feature selection methods have been developed to assist OBIA, a robust c...
Constraint programming based biomarker optimization.
Zhou, Manli; Luo, Youxi; Sun, Guoquan; Mai, Guoqin; Zhou, Fengfeng
2015-01-01
Efficient and intuitive characterization of biological big data is becoming a major challenge for modern bio-OMIC based scientists. Interactive visualization and exploration of big data is proven to be one of the successful solutions. Most of the existing feature selection algorithms do not allow the interactive inputs from users in the optimizing process of feature selection. This study investigates this question as fixing a few user-input features in the finally selected feature subset and formulates these user-input features as constraints for a programming model. The proposed algorithm, fsCoP (feature selection based on constrained programming), performs well similar to or much better than the existing feature selection algorithms, even with the constraints from both literature and the existing algorithms. An fsCoP biomarker may be intriguing for further wet lab validation, since it satisfies both the classification optimization function and the biomedical knowledge. fsCoP may also be used for the interactive exploration of bio-OMIC big data by interactively adding user-defined constraints for modeling.
Natural image classification driven by human brain activity
NASA Astrophysics Data System (ADS)
Zhang, Dai; Peng, Hanyang; Wang, Jinqiao; Tang, Ming; Xue, Rong; Zuo, Zhentao
2016-03-01
Natural image classification has been a hot topic in computer vision and pattern recognition research field. Since the performance of an image classification system can be improved by feature selection, many image feature selection methods have been developed. However, the existing supervised feature selection methods are typically driven by the class label information that are identical for different samples from the same class, ignoring with-in class image variability and therefore degrading the feature selection performance. In this study, we propose a novel feature selection method, driven by human brain activity signals collected using fMRI technique when human subjects were viewing natural images of different categories. The fMRI signals associated with subjects viewing different images encode the human perception of natural images, and therefore may capture image variability within- and cross- categories. We then select image features with the guidance of fMRI signals from brain regions with active response to image viewing. Particularly, bag of words features based on GIST descriptor are extracted from natural images for classification, and a sparse regression base feature selection method is adapted to select image features that can best predict fMRI signals. Finally, a classification model is built on the select image features to classify images without fMRI signals. The validation experiments for classifying images from 4 categories of two subjects have demonstrated that our method could achieve much better classification performance than the classifiers built on image feature selected by traditional feature selection methods.
Li, Jiangeng; Su, Lei; Pang, Zenan
2015-12-01
Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.
USDA-ARS?s Scientific Manuscript database
The availability of numerous spectral, spatial, and contextual features with object-based image analysis (OBIA) renders the selection of optimal features a time consuming and subjective process. While several feature election methods have been used in conjunction with OBIA, a robust comparison of th...
Multi-level gene/MiRNA feature selection using deep belief nets and active learning.
Ibrahim, Rania; Yousri, Noha A; Ismail, Mohamed A; El-Makky, Nagwa M
2014-01-01
Selecting the most discriminative genes/miRNAs has been raised as an important task in bioinformatics to enhance disease classifiers and to mitigate the dimensionality curse problem. Original feature selection methods choose genes/miRNAs based on their individual features regardless of how they perform together. Considering group features instead of individual ones provides a better view for selecting the most informative genes/miRNAs. Recently, deep learning has proven its ability in representing the data in multiple levels of abstraction, allowing for better discrimination between different classes. However, the idea of using deep learning for feature selection is not widely used in the bioinformatics field yet. In this paper, a novel multi-level feature selection approach named MLFS is proposed for selecting genes/miRNAs based on expression profiles. The approach is based on both deep and active learning. Moreover, an extension to use the technique for miRNAs is presented by considering the biological relation between miRNAs and genes. Experimental results show that the approach was able to outperform classical feature selection methods in hepatocellular carcinoma (HCC) by 9%, lung cancer by 6% and breast cancer by around 10% in F1-measure. Results also show the enhancement in F1-measure of our approach over recently related work in [1] and [2].
Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming
2015-01-01
Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary feature. Features used in current machine learning-based methods are usually singleton features which may be due to explosive features and a large number of noisy features when singleton features are combined into conjunction features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
Ortega, Julio; Asensio-Cubero, Javier; Gan, John Q; Ortiz, Andrés
2016-07-15
Brain-computer interfacing (BCI) applications based on the classification of electroencephalographic (EEG) signals require solving high-dimensional pattern classification problems with such a relatively small number of training patterns that curse of dimensionality problems usually arise. Multiresolution analysis (MRA) has useful properties for signal analysis in both temporal and spectral analysis, and has been broadly used in the BCI field. However, MRA usually increases the dimensionality of the input data. Therefore, some approaches to feature selection or feature dimensionality reduction should be considered for improving the performance of the MRA based BCI. This paper investigates feature selection in the MRA-based frameworks for BCI. Several wrapper approaches to evolutionary multiobjective feature selection are proposed with different structures of classifiers. They are evaluated by comparing with baseline methods using sparse representation of features or without feature selection. The statistical analysis, by applying the Kolmogorov-Smirnoff and Kruskal-Wallis tests to the means of the Kappa values evaluated by using the test patterns in each approach, has demonstrated some advantages of the proposed approaches. In comparison with the baseline MRA approach used in previous studies, the proposed evolutionary multiobjective feature selection approaches provide similar or even better classification performances, with significant reduction in the number of features that need to be computed.
Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio
2014-01-01
Study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish Languages using different methods for feature selection. RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to seek for the most relevant feature subset. The three phases approach was selected to check the validity of the proposed approach. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean of 80,05% emotion recognition rate in Basque and a 74,82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on achieved results, a set of most relevant non-speaker dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686
Adaptive runtime for a multiprocessing API
Antao, Samuel F.; Bertolli, Carlo; Eichenberger, Alexandre E.; O'Brien, John K.
2016-11-15
A computer-implemented method includes selecting a runtime for executing a program. The runtime includes a first combination of feature implementations, where each feature implementation implements a feature of an application programming interface (API). Execution of the program is monitored, and the execution uses the runtime. Monitor data is generated based on the monitoring. A second combination of feature implementations are selected, by a computer processor, where the selection is based at least in part on the monitor data. The runtime is modified by activating the second combination of feature implementations to replace the first combination of feature implementations.
Adaptive runtime for a multiprocessing API
Antao, Samuel F.; Bertolli, Carlo; Eichenberger, Alexandre E.; O'Brien, John K.
2016-10-11
A computer-implemented method includes selecting a runtime for executing a program. The runtime includes a first combination of feature implementations, where each feature implementation implements a feature of an application programming interface (API). Execution of the program is monitored, and the execution uses the runtime. Monitor data is generated based on the monitoring. A second combination of feature implementations are selected, by a computer processor, where the selection is based at least in part on the monitor data. The runtime is modified by activating the second combination of feature implementations to replace the first combination of feature implementations.
Infrared face recognition based on LBP histogram and KW feature selection
NASA Astrophysics Data System (ADS)
Xie, Zhihua
2014-07-01
The conventional LBP-based feature as represented by the local binary pattern (LBP) histogram still has room for performance improvements. This paper focuses on the dimension reduction of LBP micro-patterns and proposes an improved infrared face recognition method based on LBP histogram representation. To extract the local robust features in infrared face images, LBP is chosen to get the composition of micro-patterns of sub-blocks. Based on statistical test theory, Kruskal-Wallis (KW) feature selection method is proposed to get the LBP patterns which are suitable for infrared face recognition. The experimental results show combination of LBP and KW features selection improves the performance of infrared face recognition, the proposed method outperforms the traditional methods based on LBP histogram, discrete cosine transform(DCT) or principal component analysis(PCA).
Li, Jing; Hong, Wenxue
2014-12-01
The feature extraction and feature selection are the important issues in pattern recognition. Based on the geometric algebra representation of vector, a new feature extraction method using blade coefficient of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to solve the elevated high dimension issue. The simple linear discriminant analysis was used as the classifier. The result of the 10-fold cross-validation (10 CV) classification of public breast cancer biomedical dataset was more than 96% and proved superior to that of the original features and traditional feature extraction method.
AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.
Sun, Lei; Wang, Jun; Wei, Jinmao
2017-03-14
The Receiver Operator Characteristic (ROC) curve is well-known in evaluating classification performance in biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find out disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find real target feature subset due to their lack of effective means to reduce the redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by a trick of measuring the distances between the misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with highest complementarities. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can get the minimal balanced error rate with a small amount of significant features. Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.
NASA Astrophysics Data System (ADS)
Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin
2017-01-01
We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.
A feature selection approach towards progressive vector transmission over the Internet
NASA Astrophysics Data System (ADS)
Miao, Ru; Song, Jia; Feng, Min
2017-09-01
WebGIS has been applied for visualizing and sharing geospatial information popularly over the Internet. In order to improve the efficiency of the client applications, the web-based progressive vector transmission approach is proposed. Important features should be selected and transferred firstly, and the methods for measuring the importance of features should be further considered in the progressive transmission. However, studies on progressive transmission for large-volume vector data have mostly focused on map generalization in the field of cartography, but rarely discussed on the selection of geographic features quantitatively. This paper applies information theory for measuring the feature importance of vector maps. A measurement model for the amount of information of vector features is defined based upon the amount of information for dealing with feature selection issues. The measurement model involves geometry factor, spatial distribution factor and thematic attribute factor. Moreover, a real-time transport protocol (RTP)-based progressive transmission method is then presented to improve the transmission of vector data. To clearly demonstrate the essential methodology and key techniques, a prototype for web-based progressive vector transmission is presented, and an experiment of progressive selection and transmission for vector features is conducted. The experimental results indicate that our approach clearly improves the performance and end-user experience of delivering and manipulating large vector data over the Internet.
HIV-1 protease cleavage site prediction based on two-stage feature selection method.
Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong
2013-03-01
Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. Searching for an accurate, robust, and rapid method to correctly predict the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with Genetic Algorithms method. Thirty important biochemical features were found based on a jackknife test from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, with increased accuracy over the original dataset by 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.
A novel feature extraction approach for microarray data based on multi-algorithm fusion
Jiang, Zhu; Xu, Rong
2015-01-01
Feature extraction is one of the most important and effective method to reduce dimension in data mining, with emerging of high dimensional data such as microarray gene expression data. Feature extraction for gene selection, mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms including ranking-based feature extraction and set-based feature extraction are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, without considering inter-relationship between features in general, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Just as learning methods, feature extraction has a problem in its generalization ability, which is robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select the feature from the samples set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression dataset including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions. PMID:25780277
A novel feature extraction approach for microarray data based on multi-algorithm fusion.
Jiang, Zhu; Xu, Rong
2015-01-01
Feature extraction is one of the most important and effective method to reduce dimension in data mining, with emerging of high dimensional data such as microarray gene expression data. Feature extraction for gene selection, mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms including ranking-based feature extraction and set-based feature extraction are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, without considering inter-relationship between features in general, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Just as learning methods, feature extraction has a problem in its generalization ability, which is robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select the feature from the samples set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression dataset including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions.
System Complexity Reduction via Feature Selection
ERIC Educational Resources Information Center
Deng, Houtao
2011-01-01
This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree…
IMMAN: free software for information theory-based chemometric analysis.
Urias, Ricardo W Pino; Barigye, Stephen J; Marrero-Ponce, Yovani; García-Jacas, César R; Valdes-Martiní, José R; Perez-Gimenez, Facundo
2015-05-01
The features and theoretical background of a new and free computational program for chemometric analysis denominated IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a remarkably user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. On the other hand, a generalization scheme for the previously defined differential Shannon's entropy is discussed, as well as the introduction of Jeffreys information measure for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty are incorporated to the IMMAN software ( http://mobiosd-hub.com/imman-soft/ ), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing values processing, dataset partitioning, and browsing. Moreover, single parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, as well as comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms. Graphic representation for Shannon's distribution of MD calculating software.
Feature selection for elderly faller classification based on wearable sensors.
Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D
2017-05-30
Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature sets and improved faller classification compared to faller classification without feature selection. CFS and FCBF provided the best feature subset (one posterior pelvis accelerometer feature) for faller classification. However, better sensitivity was achieved by the second best model based on a Relief-F feature subset with three pressure-sensing insole features and seven head accelerometer features. Feature selection should be considered as an important step in faller classification using wearable sensors.
A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease
NASA Astrophysics Data System (ADS)
Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas
2017-08-01
The diagnosis of erythemato-squamous disease is a complex problem and difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps expert to diagnose precisely, accurately, and inexpensively. In this research, we use data mining technique to developed a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages from filter and wrapper methods to select the optimal feature subset from original feature. Chi square used as filter method to remove redundant features and GA as wrapper method to select the ideal feature subset with SVM used as classifier. Experiment performed with 10 fold cross validation on erythemato-squamous diseases dataset taken from University of California Irvine (UCI) machine learning database. The experimental result shows that the proposed model based multiclass SVM with Chi Square and GA can give an optimum feature subset. There are 18 optimum features with 99.18% accuracy.
Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie
2015-01-01
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
2015-01-01
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations. PMID:25885272
Zhao, Yu-Xiang; Chou, Chien-Hsing
2016-01-01
In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identifying crucial character regions for recognizing Chinese characters. PMID:27314346
Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.
Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor
2010-08-01
Biomarker discovery is a typical application from functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, swarm intelligence settings for feature selection fail to select small features subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm in 11 microarray datasets for brain, leukemia, lung, prostate, and others. We show that the proposed swarm intelligence algorithm successfully increase the classification accuracy and decrease the number of selected features compared to other swarm intelligence methods. Copyright © 2010 Elsevier Ltd. All rights reserved.
Gao, JianZhao; Tao, Xue-Wen; Zhao, Jia; Feng, Yuan-Ming; Cai, Yu-Dong; Zhang, Ning
2017-01-01
Lysine acetylation, as one type of post-translational modifications (PTM), plays key roles in cellular regulations and can be involved in a variety of human diseases. However, it is often high-cost and time-consuming to use traditional experimental approaches to identify the lysine acetylation sites. Therefore, effective computational methods should be developed to predict the acetylation sites. In this study, we developed a position-specific method for epsilon lysine acetylation site prediction. Sequences of acetylated proteins were retrieved from the UniProt database. Various kinds of features such as position specific scoring matrix (PSSM), amino acid factors (AAF), and disorders were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed. Finally, 319 optimal features were selected from total 541 features. Using the 319 optimal features to encode peptides, a predictor was constructed based on dagging. As a result, an accuracy of 69.56% with MCC of 0.2792 was achieved. We analyzed the optimal features, which suggested some important factors determining the lysine acetylation sites. We developed a position-specific method for epsilon lysine acetylation site prediction. A set of optimal features was selected. Analysis of the optimal features provided insights into the mechanism of lysine acetylation sites, providing guidance of experimental validation. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
NASA Astrophysics Data System (ADS)
Belciug, Smaranda; Serbanescu, Mircea-Sebastian
2015-09-01
Feature selection is considered a key factor in classifications/decision problems. It is currently used in designing intelligent decision systems to choose the best features which allow the best performance. This paper proposes a regression-based approach to select the most important predictors to significantly increase the classification performance. Application to breast cancer detection and recurrence using publically available datasets proved the efficiency of this technique.
Minimizing the semantic gap in biomedical content-based image retrieval
NASA Astrophysics Data System (ADS)
Guan, Haiying; Antani, Sameer; Long, L. Rodney; Thoma, George R.
2010-03-01
A major challenge in biomedical Content-Based Image Retrieval (CBIR) is to achieve meaningful mappings that minimize the semantic gap between the high-level biomedical semantic concepts and the low-level visual features in images. This paper presents a comprehensive learning-based scheme toward meeting this challenge and improving retrieval quality. The article presents two algorithms: a learning-based feature selection and fusion algorithm and the Ranking Support Vector Machine (Ranking SVM) algorithm. The feature selection algorithm aims to select 'good' features and fuse them using different similarity measurements to provide a better representation of the high-level concepts with the low-level image features. Ranking SVM is applied to learn the retrieval rank function and associate the selected low-level features with query concepts, given the ground-truth ranking of the training samples. The proposed scheme addresses four major issues in CBIR to improve the retrieval accuracy: image feature extraction, selection and fusion, similarity measurements, the association of the low-level features with high-level concepts, and the generation of the rank function to support high-level semantic image retrieval. It models the relationship between semantic concepts and image features, and enables retrieval at the semantic level. We apply it to the problem of vertebra shape retrieval from a digitized spine x-ray image set collected by the second National Health and Nutrition Examination Survey (NHANES II). The experimental results show an improvement of up to 41.92% in the mean average precision (MAP) over conventional image similarity computation methods.
Lin, Kuan-Cheng; Hsieh, Yi-Hsiu
2015-10-01
The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a problem of feature subset selection, such as combination optimization problems. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to problems of optimization in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.
Kesharaju, Manasa; Nagarajah, Romesh
2015-09-01
The motivation for this research stems from a need for providing a non-destructive testing method capable of detecting and locating any defects and microstructural variations within armour ceramic components before issuing them to the soldiers who rely on them for their survival. The development of an automated ultrasonic inspection based classification system would make possible the checking of each ceramic component and immediately alert the operator about the presence of defects. Generally, in many classification problems a choice of features or dimensionality reduction is significant and simultaneously very difficult, as a substantial computational effort is required to evaluate possible feature subsets. In this research, a combination of artificial neural networks and genetic algorithms are used to optimize the feature subset used in classification of various defects in reaction-sintered silicon carbide ceramic components. Initially wavelet based feature extraction is implemented from the region of interest. An Artificial Neural Network classifier is employed to evaluate the performance of these features. Genetic Algorithm based feature selection is performed. Principal Component Analysis is a popular technique used for feature selection and is compared with the genetic algorithm based technique in terms of classification accuracy and selection of optimal number of features. The experimental results confirm that features identified by Principal Component Analysis lead to improved performance in terms of classification percentage with 96% than Genetic algorithm with 94%. Copyright © 2015 Elsevier B.V. All rights reserved.
Feature-based and spatial attentional selection in visual working memory.
Heuer, Anna; Schubö, Anna
2016-05-01
The contents of visual working memory (VWM) can be modulated by spatial cues presented during the maintenance interval ("retrocues"). Here, we examined whether attentional selection of representations in VWM can also be based on features. In addition, we investigated whether the mechanisms of feature-based and spatial attention in VWM differ with respect to parallel access to noncontiguous locations. In two experiments, we tested the efficacy of valid retrocues relying on different kinds of information. Specifically, participants were presented with a typical spatial retrocue pointing to two locations, a symbolic spatial retrocue (numbers mapping onto two locations), and two feature-based retrocues: a color retrocue (a blob of the same color as two of the items) and a shape retrocue (an outline of the shape of two of the items). The two cued items were presented at either contiguous or noncontiguous locations. Overall retrocueing benefits, as compared to a neutral condition, were observed for all retrocue types. Whereas feature-based retrocues yielded benefits for cued items presented at both contiguous and noncontiguous locations, spatial retrocues were only effective when the cued items had been presented at contiguous locations. These findings demonstrate that attentional selection and updating in VWM can operate on different kinds of information, allowing for a flexible and efficient use of this limited system. The observation that the representations of items presented at noncontiguous locations could only be reliably selected with feature-based retrocues suggests that feature-based and spatial attentional selection in VWM rely on different mechanisms, as has been shown for attentional orienting in the external world.
Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos
2017-04-13
Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform in diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating diagnosis of migraines using both DTIs and questionnaire answers related to emotion and cognition - factors that influence of pain perceptions. We select 52 adult subjects for the study divided into three groups: control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent magnetic resonance with diffusion tensor to see white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gather data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce features of the first dataset and classification algorithms (SVM (Support Vector Machine), Boosting (Adaboost) and Naive Bayes) to perform a classification of migraine group. Moreover we implement a committee method to improve the classification accuracy based on feature selection algorithms. When classifying the migraine group, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% when using the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, 93 to 94% in boosting. The features that were determined to be most useful for classification included are related with the pain, analgesics and left uncinate brain (connected with the pain and emotions). The proposed feature selection committee method improved the performance of migraine diagnosis classifiers compared to individual feature selection methods, producing a robust system that achieved over 90% accuracy in all classifiers. The results suggest that the proposed methods can be used to support specialists in the classification of migraines in patients undergoing magnetic resonance imaging.
Effect of feature-selective attention on neuronal responses in macaque area MT
Chen, X.; Hoffmann, K.-P.; Albright, T. D.
2012-01-01
Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color). PMID:22170961
Effect of feature-selective attention on neuronal responses in macaque area MT.
Chen, X; Hoffmann, K-P; Albright, T D; Thiele, A
2012-03-01
Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color).
An improved wrapper-based feature selection method for machinery fault diagnosis
2017-01-01
A major issue of machinery fault diagnosis using vibration signals is that it is over-reliant on personnel knowledge and experience in interpreting the signal. Thus, machine learning has been adapted for machinery fault diagnosis. The quantity and quality of the input features, however, influence the fault classification performance. Feature selection plays a vital role in selecting the most representative feature subset for the machine learning algorithm. In contrast, the trade-off relationship between capability when selecting the best feature subset and computational effort is inevitable in the wrapper-based feature selection (WFS) method. This paper proposes an improved WFS technique before integration with a support vector machine (SVM) model classifier as a complete fault diagnosis system for a rolling element bearing case study. The bearing vibration dataset made available by the Case Western Reserve University Bearing Data Centre was executed using the proposed WFS and its performance has been analysed and discussed. The results reveal that the proposed WFS secures the best feature subset with a lower computational effort by eliminating the redundancy of re-evaluation. The proposed WFS has therefore been found to be capable and efficient to carry out feature selection tasks. PMID:29261689
Zhang, Chuncheng; Song, Sutao; Wen, Xiaotong; Yao, Li; Long, Zhiying
2015-04-30
Feature selection plays an important role in improving the classification accuracy of multivariate classification techniques in the context of fMRI-based decoding due to the "few samples and large features" nature of functional magnetic resonance imaging (fMRI) data. Recently, several sparse representation methods have been applied to the voxel selection of fMRI data. Despite the low computational efficiency of the sparse representation methods, they still displayed promise for applications that select features from fMRI data. In this study, we proposed the Laplacian smoothed L0 norm (LSL0) approach for feature selection of fMRI data. Based on the fast sparse decomposition using smoothed L0 norm (SL0) (Mohimani, 2007), the LSL0 method used the Laplacian function to approximate the L0 norm of sources. Results of the simulated and real fMRI data demonstrated the feasibility and robustness of LSL0 for the sparse source estimation and feature selection. Simulated results indicated that LSL0 produced more accurate source estimation than SL0 at high noise levels. The classification accuracy using voxels that were selected by LSL0 was higher than that by SL0 in both simulated and real fMRI experiment. Moreover, both LSL0 and SL0 showed higher classification accuracy and required less time than ICA and t-test for the fMRI decoding. LSL0 outperformed SL0 in sparse source estimation at high noise level and in feature selection. Moreover, LSL0 and SL0 showed better performance than ICA and t-test for feature selection. Copyright © 2015 Elsevier B.V. All rights reserved.
On the use of feature selection to improve the detection of sea oil spills in SAR images
NASA Astrophysics Data System (ADS)
Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo
2017-03-01
Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alikes phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of the classifiers, while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve the oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied on a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06% respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. The presented finding allows to speed up the feature extraction phase without reducing the classifier accuracy. Experiments also confirmed the significance of the geometrical features since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to this category.
Islam, Md Rabiul; Tanaka, Toshihisa; Molla, Md Khademul Islam
2018-05-08
When designing multiclass motor imagery-based brain-computer interface (MI-BCI), a so-called tangent space mapping (TSM) method utilizing the geometric structure of covariance matrices is an effective technique. This paper aims to introduce a method using TSM for finding accurate operational frequency bands related brain activities associated with MI tasks. A multichannel electroencephalogram (EEG) signal is decomposed into multiple subbands, and tangent features are then estimated on each subband. A mutual information analysis-based effective algorithm is implemented to select subbands containing features capable of improving motor imagery classification accuracy. Thus obtained features of selected subbands are combined to get feature space. A principal component analysis-based approach is employed to reduce the features dimension and then the classification is accomplished by a support vector machine (SVM). Offline analysis demonstrates the proposed multiband tangent space mapping with subband selection (MTSMS) approach outperforms state-of-the-art methods. It acheives the highest average classification accuracy for all datasets (BCI competition dataset 2a, IIIa, IIIb, and dataset JK-HH1). The increased classification accuracy of MI tasks with the proposed MTSMS approach can yield effective implementation of BCI. The mutual information-based subband selection method is implemented to tune operation frequency bands to represent actual motor imagery tasks.
Vessel Classification in Cosmo-Skymed SAR Data Using Hierarchical Feature Selection
NASA Astrophysics Data System (ADS)
Makedonas, A.; Theoharatos, C.; Tsagaris, V.; Anastasopoulos, V.; Costicoglou, S.
2015-04-01
SAR based ship detection and classification are important elements of maritime monitoring applications. Recently, high-resolution SAR data have opened new possibilities to researchers for achieving improved classification results. In this work, a hierarchical vessel classification procedure is presented based on a robust feature extraction and selection scheme that utilizes scale, shape and texture features in a hierarchical way. Initially, different types of feature extraction algorithms are implemented in order to form the utilized feature pool, able to represent the structure, material, orientation and other vessel type characteristics. A two-stage hierarchical feature selection algorithm is utilized next in order to be able to discriminate effectively civilian vessels into three distinct types, in COSMO-SkyMed SAR images: cargos, small ships and tankers. In our analysis, scale and shape features are utilized in order to discriminate smaller types of vessels present in the available SAR data, or shape specific vessels. Then, the most informative texture and intensity features are incorporated in order to be able to better distinguish the civilian types with high accuracy. A feature selection procedure that utilizes heuristic measures based on features' statistical characteristics, followed by an exhaustive research with feature sets formed by the most qualified features is carried out, in order to discriminate the most appropriate combination of features for the final classification. In our analysis, five COSMO-SkyMed SAR data with 2.2m x 2.2m resolution were used to analyse the detailed characteristics of these types of ships. A total of 111 ships with available AIS data were used in the classification process. The experimental results show that this method has good performance in ship classification, with an overall accuracy reaching 83%. Further investigation of additional features and proper feature selection is currently in progress.
[Electroencephalogram Feature Selection Based on Correlation Coefficient Analysis].
Zhou, Jinzhi; Tang, Xiaofang
2015-08-01
In order to improve the accuracy of classification with small amount of motor imagery training data on the development of brain-computer interface (BCD systems, we proposed an analyzing method to automatically select the characteristic parameters based on correlation coefficient analysis. Throughout the five sample data of dataset IV a from 2005 BCI Competition, we utilized short-time Fourier transform (STFT) and correlation coefficient calculation to reduce the number of primitive electroencephalogram dimension, then introduced feature extraction based on common spatial pattern (CSP) and classified by linear discriminant analysis (LDA). Simulation results showed that the average rate of classification accuracy could be improved by using correlation coefficient feature selection method than those without using this algorithm. Comparing with support vector machine (SVM) optimization features algorithm, the correlation coefficient analysis can lead better selection parameters to improve the accuracy of classification.
Neural mechanisms of selective attention in the somatosensory system.
Gomez-Ramirez, Manuel; Hysaj, Kristjana; Niebur, Ernst
2016-09-01
Selective attention allows organisms to extract behaviorally relevant information while ignoring distracting stimuli that compete for the limited resources of their central nervous systems. Attention is highly flexible, and it can be harnessed to select information based on sensory modality, within-modality feature(s), spatial location, object identity, and/or temporal properties. In this review, we discuss the body of work devoted to understanding mechanisms of selective attention in the somatosensory system. In particular, we describe the effects of attention on tactile behavior and corresponding neural activity in somatosensory cortex. Our focus is on neural mechanisms that select tactile stimuli based on their location on the body (somatotopic-based attention) or their sensory feature (feature-based attention). We highlight parallels between selection mechanisms in touch and other sensory systems and discuss several putative neural coding schemes employed by cortical populations to signal the behavioral relevance of sensory inputs. Specifically, we contrast the advantages and disadvantages of using a gain vs. spike-spike correlation code for representing attended sensory stimuli. We favor a neural network model of tactile attention that is composed of frontal, parietal, and subcortical areas that controls somatosensory cells encoding the relevant stimulus features to enable preferential processing throughout the somatosensory hierarchy. Our review is based on data from noninvasive electrophysiological and imaging data in humans as well as single-unit recordings in nonhuman primates. Copyright © 2016 the American Physiological Society.
Neural mechanisms of selective attention in the somatosensory system
Hysaj, Kristjana; Niebur, Ernst
2016-01-01
Selective attention allows organisms to extract behaviorally relevant information while ignoring distracting stimuli that compete for the limited resources of their central nervous systems. Attention is highly flexible, and it can be harnessed to select information based on sensory modality, within-modality feature(s), spatial location, object identity, and/or temporal properties. In this review, we discuss the body of work devoted to understanding mechanisms of selective attention in the somatosensory system. In particular, we describe the effects of attention on tactile behavior and corresponding neural activity in somatosensory cortex. Our focus is on neural mechanisms that select tactile stimuli based on their location on the body (somatotopic-based attention) or their sensory feature (feature-based attention). We highlight parallels between selection mechanisms in touch and other sensory systems and discuss several putative neural coding schemes employed by cortical populations to signal the behavioral relevance of sensory inputs. Specifically, we contrast the advantages and disadvantages of using a gain vs. spike-spike correlation code for representing attended sensory stimuli. We favor a neural network model of tactile attention that is composed of frontal, parietal, and subcortical areas that controls somatosensory cells encoding the relevant stimulus features to enable preferential processing throughout the somatosensory hierarchy. Our review is based on data from noninvasive electrophysiological and imaging data in humans as well as single-unit recordings in nonhuman primates. PMID:27334956
Sparse Zero-Sum Games as Stable Functional Feature Selection
Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel
2015-01-01
In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection, therefore, can lead not only to a problem with a reduced dimensionality, but also reveal some knowledge on functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game which performs a stable functional feature selection. In particular, the approach is based on feature subsets ranking by a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268
An ant colony optimization based feature selection for web page classification.
Saraç, Esra; Özel, Selma Ayşe
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
Diagnosing and ranking retinopathy disease level using diabetic fundus image recuperation approach.
Somasundaram, K; Rajendran, P Alli
2015-01-01
Retinal fundus images are widely used in diagnosing different types of eye diseases. The existing methods such as Feature Based Macular Edema Detection (FMED) and Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudates detection, respectively. These mechanically detected exudates did not include more detailed feature selection technique to the system for detection of diabetic retinopathy. To categorize the exudates, Diabetic Fundus Image Recuperation (DFIR) method based on sliding window approach is developed in this work to select the features of optic cup in digital retinal fundus images. The DFIR feature selection uses collection of sliding windows with varying range to obtain the features based on the histogram value using Group Sparsity Nonoverlapping Function. Using support vector model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a much promising result for developing practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, ranking efficiency, and feature selection time.
Diagnosing and Ranking Retinopathy Disease Level Using Diabetic Fundus Image Recuperation Approach
Somasundaram, K.; Alli Rajendran, P.
2015-01-01
Retinal fundus images are widely used in diagnosing different types of eye diseases. The existing methods such as Feature Based Macular Edema Detection (FMED) and Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudates detection, respectively. These mechanically detected exudates did not include more detailed feature selection technique to the system for detection of diabetic retinopathy. To categorize the exudates, Diabetic Fundus Image Recuperation (DFIR) method based on sliding window approach is developed in this work to select the features of optic cup in digital retinal fundus images. The DFIR feature selection uses collection of sliding windows with varying range to obtain the features based on the histogram value using Group Sparsity Nonoverlapping Function. Using support vector model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a much promising result for developing practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, ranking efficiency, and feature selection time. PMID:25945362
A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.
Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho
2014-10-01
Many studies based on microRNA (miRNA) expression profiles showed a new aspect of cancer classification. Because one characteristic of miRNA expression data is the high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. The feature selection methods have one shortcoming thus far: they just consider the problem of where feature to class is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, human miRNA is considered to be ranked low in traditional feature selection methods and are removed most of the time. In view of the limitation of the miRNA number, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all problems (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to construct cancer classification. Then, we chose Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with just using high-ranking miRNAs in traditional feature selection methods. Our results demonstrate that the m:n feature subset made a positive impression of low-ranking miRNAs in cancer classification.
2015-01-01
Retinal fundus images are widely used in diagnosing and providing treatment for several eye diseases. Prior works using retinal fundus images detected the presence of exudation with the aid of publicly available dataset using extensive segmentation process. Though it was proved to be computationally efficient, it failed to create a diabetic retinopathy feature selection system for transparently diagnosing the disease state. Also the diagnosis of diseases did not employ machine learning methods to categorize candidate fundus images into true positive and true negative ratio. Several candidate fundus images did not include more detailed feature selection technique for diabetic retinopathy. To apply machine learning methods and classify the candidate fundus images on the basis of sliding window a method called, Diabetic Fundus Image Recuperation (DFIR) is designed in this paper. The initial phase of DFIR method select the feature of optic cup in digital retinal fundus images based on Sliding Window Approach. With this, the disease state for diabetic retinopathy is assessed. The feature selection in DFIR method uses collection of sliding windows to obtain the features based on the histogram value. The histogram based feature selection with the aid of Group Sparsity Non-overlapping function provides more detailed information of features. Using Support Vector Model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy diseases. The ranking of disease level for each candidate set provides a much promising result for developing practically automated diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, specificity rate, ranking efficiency and feature selection time. PMID:25974230
A universal deep learning approach for modeling the flow of patients under different severities.
Jiang, Shancheng; Chin, Kwai-Sang; Tsui, Kwok L
2018-02-01
The Accident and Emergency Department (A&ED) is the frontline for providing emergency care in hospitals. Unfortunately, relative A&ED resources have failed to keep up with continuously increasing demand in recent years, which leads to overcrowding in A&ED. Knowing the fluctuation of patient arrival volume in advance is a significant premise to relieve this pressure. Based on this motivation, the objective of this study is to explore an integrated framework with high accuracy for predicting A&ED patient flow under different triage levels, by combining a novel feature selection process with deep neural networks. Administrative data is collected from an actual A&ED and categorized into five groups based on different triage levels. A genetic algorithm (GA)-based feature selection algorithm is improved and implemented as a pre-processing step for this time-series prediction problem, in order to explore key features affecting patient flow. In our improved GA, a fitness-based crossover is proposed to maintain the joint information of multiple features during iterative process, instead of traditional point-based crossover. Deep neural networks (DNN) is employed as the prediction model to utilize their universal adaptability and high flexibility. In the model-training process, the learning algorithm is well-configured based on a parallel stochastic gradient descent algorithm. Two effective regularization strategies are integrated in one DNN framework to avoid overfitting. All introduced hyper-parameters are optimized efficiently by grid-search in one pass. As for feature selection, our improved GA-based feature selection algorithm has outperformed a typical GA and four state-of-the-art feature selection algorithms (mRMR, SAFS, VIFR, and CFR). As for the prediction accuracy of proposed integrated framework, compared with other frequently used statistical models (GLM, seasonal-ARIMA, ARIMAX, and ANN) and modern machine models (SVM-RBF, SVM-linear, RF, and R-LASSO), the proposed integrated "DNN-I-GA" framework achieves higher prediction accuracy on both MAPE and RMSE metrics in pairwise comparisons. The contribution of our study is two-fold. Theoretically, the traditional GA-based feature selection process is improved to have less hyper-parameters and higher efficiency, and the joint information of multiple features is maintained by fitness-based crossover operator. The universal property of DNN is further enhanced by merging different regularization strategies. Practically, features selected by our improved GA can be used to acquire an underlying relationship between patient flows and input features. Predictive values are significant indicators of patients' demand and can be used by A&ED managers to make resource planning and allocation. High accuracy achieved by the present framework in different cases enhances the reliability of downstream decision makings. Copyright © 2017 Elsevier B.V. All rights reserved.
Li, Baopu; Meng, Max Q-H
2012-05-01
Tumor in digestive tract is a common disease and wireless capsule endoscopy (WCE) is a relatively new technology to examine diseases for digestive tract especially for small intestine. This paper addresses the problem of automatic recognition of tumor for WCE images. Candidate color texture feature that integrates uniform local binary pattern and wavelet is proposed to characterize WCE images. The proposed features are invariant to illumination change and describe multiresolution characteristics of WCE images. Two feature selection approaches based on support vector machine, sequential forward floating selection and recursive feature elimination, are further employed to refine the proposed features for improving the detection accuracy. Extensive experiments validate that the proposed computer-aided diagnosis system achieves a promising tumor recognition accuracy of 92.4% in WCE images on our collected data.
Tan, Maxine; Pu, Jiantao; Zheng, Bin
2014-01-01
Purpose: Improving radiologists’ performance in classification between malignant and benign breast lesions is important to increase cancer detection sensitivity and reduce false-positive recalls. For this purpose, developing computer-aided diagnosis (CAD) schemes has been attracting research interest in recent years. In this study, we investigated a new feature selection method for the task of breast mass classification. Methods: We initially computed 181 image features based on mass shape, spiculation, contrast, presence of fat or calcifications, texture, isodensity, and other morphological features. From this large image feature pool, we used a sequential forward floating selection (SFFS)-based feature selection method to select relevant features, and analyzed their performance using a support vector machine (SVM) model trained for the classification task. On a database of 600 benign and 600 malignant mass regions of interest (ROIs), we performed the study using a ten-fold cross-validation method. Feature selection and optimization of the SVM parameters were conducted on the training subsets only. Results: The area under the receiver operating characteristic curve (AUC) = 0.805±0.012 was obtained for the classification task. The results also showed that the most frequently-selected features by the SFFS-based algorithm in 10-fold iterations were those related to mass shape, isodensity and presence of fat, which are consistent with the image features frequently used by radiologists in the clinical environment for mass classification. The study also indicated that accurately computing mass spiculation features from the projection mammograms was difficult, and failed to perform well for the mass classification task due to tissue overlap within the benign mass regions. Conclusions: In conclusion, this comprehensive feature analysis study provided new and valuable information for optimizing computerized mass classification schemes that may have potential to be useful as a “second reader” in future clinical practice. PMID:24664267
The optional selection of micro-motion feature based on Support Vector Machine
NASA Astrophysics Data System (ADS)
Li, Bo; Ren, Hongmei; Xiao, Zhi-he; Sheng, Jing
2017-11-01
Micro-motion form of target is multiple, different micro-motion forms are apt to be modulated, which makes it difficult for feature extraction and recognition. Aiming at feature extraction of cone-shaped objects with different micro-motion forms, this paper proposes the best selection method of micro-motion feature based on support vector machine. After the time-frequency distribution of radar echoes, comparing the time-frequency spectrum of objects with different micro-motion forms, features are extracted based on the differences between the instantaneous frequency variations of different micro-motions. According to the methods based on SVM (Support Vector Machine) features are extracted, then the best features are acquired. Finally, the result shows the method proposed in this paper is feasible under the test condition of certain signal-to-noise ratio(SNR).
A ROC-based feature selection method for computer-aided detection and diagnosis
NASA Astrophysics Data System (ADS)
Wang, Songyuan; Zhang, Guopeng; Liao, Qimei; Zhang, Junying; Jiao, Chun; Lu, Hongbing
2014-03-01
Image-based computer-aided detection and diagnosis (CAD) has been a very active research topic aiming to assist physicians to detect lesions and distinguish them from benign to malignant. However, the datasets fed into a classifier usually suffer from small number of samples, as well as significantly less samples available in one class (have a disease) than the other, resulting in the classifier's suboptimal performance. How to identifying the most characterizing features of the observed data for lesion detection is critical to improve the sensitivity and minimize false positives of a CAD system. In this study, we propose a novel feature selection method mR-FAST that combines the minimal-redundancymaximal relevance (mRMR) framework with a selection metric FAST (feature assessment by sliding thresholds) based on the area under a ROC curve (AUC) generated on optimal simple linear discriminants. With three feature datasets extracted from CAD systems for colon polyps and bladder cancer, we show that the space of candidate features selected by mR-FAST is more characterizing for lesion detection with higher AUC, enabling to find a compact subset of superior features at low cost.
Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.
Liu, Liang; Cai, Yudong; Lu, Wencong; Feng, Kaiyan; Peng, Chunrong; Niu, Bing
2009-03-06
Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection frame, this paper presents a computational system to predict the PPIs (protein-protein interactions) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poor-performed and/or redundant features, resulting in 103 remaining features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall accurate prediction rate of 76.18%, evaluated by 10-fold cross-validation test, which is 1.46% higher than using the initial 114 features and is 6.51% higher than the 20 features, coded by amino acid compositions. The PPIs predictor, developed for this research, is available for public use at http://chemdata.shu.edu.cn/ppi.
Automatic parameter selection for feature-based multi-sensor image registration
NASA Astrophysics Data System (ADS)
DelMarco, Stephen; Tom, Victor; Webb, Helen; Chao, Alan
2006-05-01
Accurate image registration is critical for applications such as precision targeting, geo-location, change-detection, surveillance, and remote sensing. However, the increasing volume of image data is exceeding the current capacity of human analysts to perform manual registration. This image data glut necessitates the development of automated approaches to image registration, including algorithm parameter value selection. Proper parameter value selection is crucial to the success of registration techniques. The appropriate algorithm parameters can be highly scene and sensor dependent. Therefore, robust algorithm parameter value selection approaches are a critical component of an end-to-end image registration algorithm. In previous work, we developed a general framework for multisensor image registration which includes feature-based registration approaches. In this work we examine the problem of automated parameter selection. We apply the automated parameter selection approach of Yitzhaky and Peli to select parameters for feature-based registration of multisensor image data. The approach consists of generating multiple feature-detected images by sweeping over parameter combinations and using these images to generate estimated ground truth. The feature-detected images are compared to the estimated ground truth images to generate ROC points associated with each parameter combination. We develop a strategy for selecting the optimal parameter set by choosing the parameter combination corresponding to the optimal ROC point. We present numerical results showing the effectiveness of the approach using registration of collected SAR data to reference EO data.
NASA Astrophysics Data System (ADS)
Tang, Jian; Qiao, Junfei; Wu, ZhiWei; Chai, Tianyou; Zhang, Jian; Yu, Wen
2018-01-01
Frequency spectral data of mechanical vibration and acoustic signals relate to difficult-to-measure production quality and quantity parameters of complex industrial processes. A selective ensemble (SEN) algorithm can be used to build a soft sensor model of these process parameters by fusing valued information selectively from different perspectives. However, a combination of several optimized ensemble sub-models with SEN cannot guarantee the best prediction model. In this study, we use several techniques to construct mechanical vibration and acoustic frequency spectra of a data-driven industrial process parameter model based on selective fusion multi-condition samples and multi-source features. Multi-layer SEN (MLSEN) strategy is used to simulate the domain expert cognitive process. Genetic algorithm and kernel partial least squares are used to construct the inside-layer SEN sub-model based on each mechanical vibration and acoustic frequency spectral feature subset. Branch-and-bound and adaptive weighted fusion algorithms are integrated to select and combine outputs of the inside-layer SEN sub-models. Then, the outside-layer SEN is constructed. Thus, "sub-sampling training examples"-based and "manipulating input features"-based ensemble construction methods are integrated, thereby realizing the selective information fusion process based on multi-condition history samples and multi-source input features. This novel approach is applied to a laboratory-scale ball mill grinding process. A comparison with other methods indicates that the proposed MLSEN approach effectively models mechanical vibration and acoustic signals.
NetProt: Complex-based Feature Selection.
Goh, Wilson Wen Bin; Wong, Limsoon
2017-08-04
Protein complex-based feature selection (PCBFS) provides unparalleled reproducibility with high phenotypic relevance on proteomics data. Currently, there are five PCBFS paradigms, but not all representative methods have been implemented or made readily available. To allow general users to take advantage of these methods, we developed the R-package NetProt, which provides implementations of representative feature-selection methods. NetProt also provides methods for generating simulated differential data and generating pseudocomplexes for complex-based performance benchmarking. The NetProt open source R package is available for download from https://github.com/gohwils/NetProt/releases/ , and online documentation is available at http://rpubs.com/gohwils/204259 .
Feature-based attentional modulations in the absence of direct visual stimulation.
Serences, John T; Boynton, Geoffrey M
2007-07-19
When faced with a crowded visual scene, observers must selectively attend to behaviorally relevant objects to avoid sensory overload. Often this selection process is guided by prior knowledge of a target-defining feature (e.g., the color red when looking for an apple), which enhances the firing rate of visual neurons that are selective for the attended feature. Here, we used functional magnetic resonance imaging and a pattern classification algorithm to predict the attentional state of human observers as they monitored a visual feature (one of two directions of motion). We find that feature-specific attention effects spread across the visual field-even to regions of the scene that do not contain a stimulus. This spread of feature-based attention to empty regions of space may facilitate the perception of behaviorally relevant stimuli by increasing sensitivity to attended features at all locations in the visual field.
Nankali, Saber; Miandoab, Payam Samadi; Baghizadeh, Amin
2016-01-01
In external‐beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation‐based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two “Genetic” and “Ranker” searching procedures. The performance of these algorithms has been evaluated using four‐dimensional extended cardiac‐torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro‐fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F‐test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation‐based feature selection algorithm, in combination with a genetic search algorithm, proved to yield best performance accuracy for selecting optimum markers. PACS numbers: 87.55.km, 87.56.Fc PMID:26894358
Nankali, Saber; Torshabi, Ahmad Esmaili; Miandoab, Payam Samadi; Baghizadeh, Amin
2016-01-08
In external-beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two "Genetic" and "Ranker" searching procedures. The performance of these algorithms has been evaluated using four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F-test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation-based feature selection algorithm, in combination with a genetic search algorithm, proved to yield best performance accuracy for selecting optimum markers.
EEG-based mild depressive detection using feature selection methods and classifiers.
Li, Xiaowei; Hu, Bin; Sun, Shuting; Cai, Hanshu
2016-11-01
Depression has become a major health burden worldwide, and effectively detection of such disorder is a great challenge which requires latest technological tool, such as Electroencephalography (EEG). This EEG-based research seeks to find prominent frequency band and brain regions that are most related to mild depression, as well as an optimal combination of classification algorithms and feature selection methods which can be used in future mild depression detection. An experiment based on facial expression viewing task (Emo_block and Neu_block) was conducted, and EEG data of 37 university students were collected using a 128 channel HydroCel Geodesic Sensor Net (HCGSN). For discriminating mild depressive patients and normal controls, BayesNet (BN), Support Vector Machine (SVM), Logistic Regression (LR), k-nearest neighbor (KNN) and RandomForest (RF) classifiers were used. And BestFirst (BF), GreedyStepwise (GSW), GeneticSearch (GS), LinearForwordSelection (LFS) and RankSearch (RS) based on Correlation Features Selection (CFS) were applied for linear and non-linear EEG features selection. Independent Samples T-test with Bonferroni correction was used to find the significantly discriminant electrodes and features. Data mining results indicate that optimal performance is achieved using a combination of feature selection method GSW based on CFS and classifier KNN for beta frequency band. Accuracies achieved 92.00% and 98.00%, and AUC achieved 0.957 and 0.997, for Emo_block and Neu_block beta band data respectively. T-test results validate the effectiveness of selected features by search method GSW. Simplified EEG system with only FP1, FP2, F3, O2, T3 electrodes was also explored with linear features, which yielded accuracies of 91.70% and 96.00%, AUC of 0.952 and 0.972, for Emo_block and Neu_block respectively. Classification results obtained by GSW + KNN are encouraging and better than previously published results. In the spatial distribution of features, we find that left parietotemporal lobe in beta EEG frequency band has greater effect on mild depression detection. And fewer EEG channels (FP1, FP2, F3, O2 and T3) combined with linear features may be good candidates for usage in portable systems for mild depression detection. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
An Ant Colony Optimization Based Feature Selection for Web Page Classification
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods. PMID:25136678
Hypergraph Based Feature Selection Technique for Medical Diagnosis.
Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar
2016-11-01
The impact of internet and information systems across various domains have resulted in substantial generation of multidimensional datasets. The use of data mining and knowledge discovery techniques to extract the original information contained in the multidimensional datasets play a significant role in the exploitation of complete benefit provided by them. The presence of large number of features in the high dimensional datasets incurs high computational cost in terms of computing power and time. Hence, feature selection technique has been commonly used to build robust machine learning models to select a subset of relevant features which projects the maximal information content of the original dataset. In this paper, a novel Rough Set based K - Helly feature selection technique (RSKHT) which hybridize Rough Set Theory (RST) and K - Helly property of hypergraph representation had been designed to identify the optimal feature subset or reduct for medical diagnostic applications. Experiments carried out using the medical datasets from the UCI repository proves the dominance of the RSKHT over other feature selection techniques with respect to the reduct size, classification accuracy and time complexity. The performance of the RSKHT had been validated using WEKA tool, which shows that RSKHT had been computationally attractive and flexible over massive datasets.
Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei
2018-02-01
Diagnosis of Parkinson's disease (PD) based on speech data has been proved to be an effective way in recent years. However, current researches just care about the feature extraction and classifier design, and do not consider the instance selection. Former research by authors showed that the instance selection can lead to improvement on classification accuracy. However, no attention is paid on the relationship between speech sample and feature until now. Therefore, a new diagnosis algorithm of PD is proposed in this paper by simultaneously selecting speech sample and feature based on relevant feature weighting algorithm and multiple kernel method, so as to find their synergy effects, thereby improving classification accuracy. Experimental results showed that this proposed algorithm obtained apparent improvement on classification accuracy. It can obtain mean classification accuracy of 82.5%, which was 30.5% higher than the relevant algorithm. Besides, the proposed algorithm detected the synergy effects of speech sample and feature, which is valuable for speech marker extraction.
Image search engine with selective filtering and feature-element-based classification
NASA Astrophysics Data System (ADS)
Li, Qing; Zhang, Yujin; Dai, Shengyang
2001-12-01
With the growth of Internet and storage capability in recent years, image has become a widespread information format in World Wide Web. However, it has become increasingly harder to search for images of interest, and effective image search engine for the WWW needs to be developed. We propose in this paper a selective filtering process and a novel approach for image classification based on feature element in the image search engine we developed for the WWW. First a selective filtering process is embedded in a general web crawler to filter out the meaningless images with GIF format. Two parameters that can be obtained easily are used in the filtering process. Our classification approach first extract feature elements from images instead of feature vectors. Compared with feature vectors, feature elements can better capture visual meanings of the image according to subjective perception of human beings. Different from traditional image classification method, our classification approach based on feature element doesn't calculate the distance between two vectors in the feature space, while trying to find associations between feature element and class attribute of the image. Experiments are presented to show the efficiency of the proposed approach.
Influence of time and length size feature selections for human activity sequences recognition.
Fang, Hongqing; Chen, Long; Srinivasan, Raghavendiran
2014-01-01
In this paper, Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensors events. Alternative features selections of time feature values of sensors events and activity length size feature values are tested, respectively, and then the results of activity sequences recognition performances of Viterbi algorithm are evaluated. The results show that the selection of larger time feature values of sensor events and/or smaller activity length size feature values will generate relatively better results on the activity sequences recognition performances. © 2013 ISA Published by ISA All rights reserved.
The Speed of Feature-Based Attention: Attentional Advantage Is Slow, but Selection Is Fast
ERIC Educational Resources Information Center
Huang, Liqiang
2010-01-01
When paying attention to a feature (e.g., red), no attentional advantage is gained in perceiving items with this feature in very brief displays. Therefore, feature-based attention seems to be slow. In previous feature-based attention studies, attention has often been measured as the difference in performance in a secondary task. In our recent work…
Hadoux, Xavier; Kumar, Dinesh Kant; Sarossy, Marc G; Roger, Jean-Michel; Gorretta, Nathalie
2016-05-19
Visible and near-infrared (Vis-NIR) spectra are generated by the combination of numerous low resolution features. Spectral variables are thus highly correlated, which can cause problems for selecting the most appropriate ones for a given application. Some decomposition bases such as Fourier or wavelet generally help highlighting spectral features that are important, but are by nature constraint to have both positive and negative components. Thus, in addition to complicating the selected features interpretability, it impedes their use for application-dedicated sensors. In this paper we have proposed a new method for feature selection: Application-Dedicated Selection of Filters (ADSF). This method relaxes the shape constraint by enabling the selection of any type of user defined custom features. By considering only relevant features, based on the underlying nature of the data, high regularization of the final model can be obtained, even in the small sample size context often encountered in spectroscopic applications. For larger scale deployment of application-dedicated sensors, these predefined feature constraints can lead to application specific optical filters, e.g., lowpass, highpass, bandpass or bandstop filters with positive only coefficients. In a similar fashion to Partial Least Squares, ADSF successively selects features using covariance maximization and deflates their influences using orthogonal projection in order to optimally tune the selection to the data with limited redundancy. ADSF is well suited for spectroscopic data as it can deal with large numbers of highly correlated variables in supervised learning, even with many correlated responses. Copyright © 2016 Elsevier B.V. All rights reserved.
Efficient Iris Recognition Based on Optimal Subfeature Selection and Weighted Subregion Fusion
Deng, Ning
2014-01-01
In this paper, we propose three discriminative feature selection strategies and weighted subregion matching method to improve the performance of iris recognition system. Firstly, we introduce the process of feature extraction and representation based on scale invariant feature transformation (SIFT) in detail. Secondly, three strategies are described, which are orientation probability distribution function (OPDF) based strategy to delete some redundant feature keypoints, magnitude probability distribution function (MPDF) based strategy to reduce dimensionality of feature element, and compounded strategy combined OPDF and MPDF to further select optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted sub-region matching fusion. Particle swarm optimization is utilized to accelerate achieve different sub-region's weights and then weighted different subregions' matching scores to generate the final decision. The experimental results, on three public and renowned iris databases (CASIA-V3 Interval, Lamp, andMMU-V1), demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computation complexity. PMID:24683317
Efficient iris recognition based on optimal subfeature selection and weighted subregion fusion.
Chen, Ying; Liu, Yuanning; Zhu, Xiaodong; He, Fei; Wang, Hongye; Deng, Ning
2014-01-01
In this paper, we propose three discriminative feature selection strategies and weighted subregion matching method to improve the performance of iris recognition system. Firstly, we introduce the process of feature extraction and representation based on scale invariant feature transformation (SIFT) in detail. Secondly, three strategies are described, which are orientation probability distribution function (OPDF) based strategy to delete some redundant feature keypoints, magnitude probability distribution function (MPDF) based strategy to reduce dimensionality of feature element, and compounded strategy combined OPDF and MPDF to further select optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted sub-region matching fusion. Particle swarm optimization is utilized to accelerate achieve different sub-region's weights and then weighted different subregions' matching scores to generate the final decision. The experimental results, on three public and renowned iris databases (CASIA-V3 Interval, Lamp, and MMU-V1), demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computation complexity.
Unbiased feature selection in learning random forests for high-dimensional data.
Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi
2015-01-01
Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter Je; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong
2017-11-01
Feature selection is essential in medical area; however, its process becomes complicated with the presence of censoring which is the unique character of survival analysis. Most survival feature selection methods are based on Cox's proportional hazard model, though machine learning classifiers are preferred. They are less employed in survival analysis due to censoring which prevents them from directly being used to survival data. Among the few work that employed machine learning classifiers, partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and perform feature selection for survival data. However, it depends on data replication to handle censoring which leads to unbalanced and biased prediction results especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on survival metric to construct a multiple classifier system. The new hybrid feature selection process uses multiple classifier system as a wrapper method and merges it with iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients collected from two centers were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model such as Akaike and Bayesian information criterions and least absolute shrinkage and selector operator in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention enabling doctor in selecting patients' future follow-up plan.
Adaptive feature selection using v-shaped binary particle swarm optimization.
Teng, Xuyang; Dong, Hongbin; Zhou, Xiurong
2017-01-01
Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers.
Adaptive feature selection using v-shaped binary particle swarm optimization
Dong, Hongbin; Zhou, Xiurong
2017-01-01
Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850
Investigating a memory-based account of negative priming: support for selection-feature mismatch.
MacDonald, P A; Joordens, S
2000-08-01
Using typical and modified negative priming tasks, the selection-feature mismatch account of negative priming was tested. In the modified task, participants performed selections on the basis of a semantic feature (e.g., referent size). This procedure has been shown to enhance negative priming (P. A. MacDonald, S. Joordens, & K. N. Seergobin, 1999). Across 3 experiments, negative priming occurred only when the repeated item mismatched in terms of the feature used as the basis for selections. When the repeated item was congruent on the selection feature across the prime and probe displays, positive priming arose. This pattern of results appeared in both the ignored- and the attended-repetition conditions. Negative priming does not result from previously ignoring an item. These findings strongly support the selection-feature mismatch account of negative priming and refute both the distractor inhibition and the episodic-retrieval explanations.
Patterson, Brent R.; Anderson, Morgan L.; Rodgers, Arthur R.; Vander Vennen, Lucas M.; Fryxell, John M.
2017-01-01
Woodland caribou (Rangifer tarandus caribou) in Ontario are a threatened species that have experienced a substantial retraction of their historic range. Part of their decline has been attributed to increasing densities of anthropogenic linear features such as trails, roads, railways, and hydro lines. These features have been shown to increase the search efficiency and kill rate of wolves. However, it is unclear whether selection for anthropogenic linear features is additive or compensatory to selection for natural (water) linear features which may also be used for travel. We studied the selection of water and anthropogenic linear features by 52 resident wolves (Canis lupus x lycaon) over four years across three study areas in northern Ontario that varied in degrees of forestry activity and human disturbance. We used Euclidean distance-based resource selection functions (mixed-effects logistic regression) at the seasonal range scale with random coefficients for distance to water linear features, primary/secondary roads/railways, and hydro lines, and tertiary roads to estimate the strength of selection for each linear feature and for several habitat types, while accounting for availability of each feature. Next, we investigated the trade-off between selection for anthropogenic and water linear features. Wolves selected both anthropogenic and water linear features; selection for anthropogenic features was stronger than for water during the rendezvous season. Selection for anthropogenic linear features increased with increasing density of these features on the landscape, while selection for natural linear features declined, indicating compensatory selection of anthropogenic linear features. These results have implications for woodland caribou conservation. Prey encounter rates between wolves and caribou seem to be strongly influenced by increasing linear feature densities. This behavioral mechanism–a compensatory functional response to anthropogenic linear feature density resulting in decreased use of natural travel corridors–has negative consequences for the viability of woodland caribou. PMID:29117234
Newton, Erica J; Patterson, Brent R; Anderson, Morgan L; Rodgers, Arthur R; Vander Vennen, Lucas M; Fryxell, John M
2017-01-01
Woodland caribou (Rangifer tarandus caribou) in Ontario are a threatened species that have experienced a substantial retraction of their historic range. Part of their decline has been attributed to increasing densities of anthropogenic linear features such as trails, roads, railways, and hydro lines. These features have been shown to increase the search efficiency and kill rate of wolves. However, it is unclear whether selection for anthropogenic linear features is additive or compensatory to selection for natural (water) linear features which may also be used for travel. We studied the selection of water and anthropogenic linear features by 52 resident wolves (Canis lupus x lycaon) over four years across three study areas in northern Ontario that varied in degrees of forestry activity and human disturbance. We used Euclidean distance-based resource selection functions (mixed-effects logistic regression) at the seasonal range scale with random coefficients for distance to water linear features, primary/secondary roads/railways, and hydro lines, and tertiary roads to estimate the strength of selection for each linear feature and for several habitat types, while accounting for availability of each feature. Next, we investigated the trade-off between selection for anthropogenic and water linear features. Wolves selected both anthropogenic and water linear features; selection for anthropogenic features was stronger than for water during the rendezvous season. Selection for anthropogenic linear features increased with increasing density of these features on the landscape, while selection for natural linear features declined, indicating compensatory selection of anthropogenic linear features. These results have implications for woodland caribou conservation. Prey encounter rates between wolves and caribou seem to be strongly influenced by increasing linear feature densities. This behavioral mechanism-a compensatory functional response to anthropogenic linear feature density resulting in decreased use of natural travel corridors-has negative consequences for the viability of woodland caribou.
2012-01-01
Background Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. Results We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension – UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS. Conclusions Greedy RLS is the first known implementation of a machine learning based method with the capability to conduct a wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/. PMID:22551170
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali
2015-01-01
In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141
Effective traffic features selection algorithm for cyber-attacks samples
NASA Astrophysics Data System (ADS)
Li, Yihong; Liu, Fangzheng; Du, Zhenyu
2018-05-01
By studying the defense scheme of Network attacks, this paper propose an effective traffic features selection algorithm based on k-means++ clustering to deal with the problem of high dimensionality of traffic features which extracted from cyber-attacks samples. Firstly, this algorithm divide the original feature set into attack traffic feature set and background traffic feature set by the clustering. Then, we calculates the variation of clustering performance after removing a certain feature. Finally, evaluating the degree of distinctiveness of the feature vector according to the result. Among them, the effective feature vector is whose degree of distinctiveness exceeds the set threshold. The purpose of this paper is to select out the effective features from the extracted original feature set. In this way, it can reduce the dimensionality of the features so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and it has some advantages over other selection algorithms.
2001-10-25
neural network (ANN) has been adopted for the human chromosome classification. It is important to select optimum features for training neural network...Many studies for computer-based chromosome analysis have shown that it is possible to classify chromosomes into 24 subgroups. In addition, artificial
Feature Selection Methods for Zero-Shot Learning of Neural Activity.
Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.
Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai
2017-12-26
Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporally screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
Ghayab, Hadi Ratham Al; Li, Yan; Abdulla, Shahab; Diykh, Mohammed; Wan, Xiangkui
2016-06-01
Electroencephalogram (EEG) signals are used broadly in the medical fields. The main applications of EEG signals are the diagnosis and treatment of diseases such as epilepsy, Alzheimer, sleep problems and so on. This paper presents a new method which extracts and selects features from multi-channel EEG signals. This research focuses on three main points. Firstly, simple random sampling (SRS) technique is used to extract features from the time domain of EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least square support vector machine (LS_SVM) classifier to classify the EEG signals. The LS_SVM classifier classified the features which are extracted and selected from the SRS and the SFS. The experimental results show that the method achieves 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively.
The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.
Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo
2018-05-17
The gene expression profile has the characteristics of a high dimension, low sample, and continuous type, and it is a great challenge to use gene expression profile data for the classification of tumor samples. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. Firstly, multiple filters are used to select the microarray data in order to obtain a plurality of the pre-selected feature subsets with a different classification ability. The top N genes with the highest rank of each subset are integrated so as to form a new data set. Secondly, the cross-entropy algorithm is used to remove the redundant data in the data set. Finally, the wrapper method, which is based on forward feature selection, is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and that it can achieve a higher classification accuracy under fewer characteristic genes.
Selecting relevant 3D image features of margin sharpness and texture for lung nodule retrieval.
Ferreira, José Raniery; de Azevedo-Marques, Paulo Mazzoncini; Oliveira, Marcelo Costa
2017-03-01
Lung cancer is the leading cause of cancer-related deaths in the world. Its diagnosis is a challenge task to specialists due to several aspects on the classification of lung nodules. Therefore, it is important to integrate content-based image retrieval methods on the lung nodule classification process, since they are capable of retrieving similar cases from databases that were previously diagnosed. However, this mechanism depends on extracting relevant image features in order to obtain high efficiency. The goal of this paper is to perform the selection of 3D image features of margin sharpness and texture that can be relevant on the retrieval of similar cancerous and benign lung nodules. A total of 48 3D image attributes were extracted from the nodule volume. Border sharpness features were extracted from perpendicular lines drawn over the lesion boundary. Second-order texture features were extracted from a cooccurrence matrix. Relevant features were selected by a correlation-based method and a statistical significance analysis. Retrieval performance was assessed according to the nodule's potential malignancy on the 10 most similar cases and by the parameters of precision and recall. Statistical significant features reduced retrieval performance. Correlation-based method selected 2 margin sharpness attributes and 6 texture attributes and obtained higher precision compared to all 48 extracted features on similar nodule retrieval. Feature space dimensionality reduction of 83 % obtained higher retrieval performance and presented to be a computationaly low cost method of retrieving similar nodules for the diagnosis of lung cancer.
Wen, Tingxi; Zhang, Zhongnan
2017-01-01
Abstract In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789
Wen, Tingxi; Zhang, Zhongnan
2017-05-01
In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.
Feature-Selective Attention Adaptively Shifts Noise Correlations in Primary Auditory Cortex.
Downer, Joshua D; Rapone, Brittany; Verhein, Jessica; O'Connor, Kevin N; Sutter, Mitchell L
2017-05-24
Sensory environments often contain an overwhelming amount of information, with both relevant and irrelevant information competing for neural resources. Feature attention mediates this competition by selecting the sensory features needed to form a coherent percept. How attention affects the activity of populations of neurons to support this process is poorly understood because population coding is typically studied through simulations in which one sensory feature is encoded without competition. Therefore, to study the effects of feature attention on population-based neural coding, investigations must be extended to include stimuli with both relevant and irrelevant features. We measured noise correlations ( r noise ) within small neural populations in primary auditory cortex while rhesus macaques performed a novel feature-selective attention task. We found that the effect of feature-selective attention on r noise depended not only on the population tuning to the attended feature, but also on the tuning to the distractor feature. To attempt to explain how these observed effects might support enhanced perceptual performance, we propose an extension of a simple and influential model in which shifts in r noise can simultaneously enhance the representation of the attended feature while suppressing the distractor. These findings present a novel mechanism by which attention modulates neural populations to support sensory processing in cluttered environments. SIGNIFICANCE STATEMENT Although feature-selective attention constitutes one of the building blocks of listening in natural environments, its neural bases remain obscure. To address this, we developed a novel auditory feature-selective attention task and measured noise correlations ( r noise ) in rhesus macaque A1 during task performance. Unlike previous studies showing that the effect of attention on r noise depends on population tuning to the attended feature, we show that the effect of attention depends on the tuning to the distractor feature as well. We suggest that these effects represent an efficient process by which sensory cortex simultaneously enhances relevant information and suppresses irrelevant information. Copyright © 2017 the authors 0270-6474/17/375378-15$15.00/0.
Feature-Selective Attention Adaptively Shifts Noise Correlations in Primary Auditory Cortex
2017-01-01
Sensory environments often contain an overwhelming amount of information, with both relevant and irrelevant information competing for neural resources. Feature attention mediates this competition by selecting the sensory features needed to form a coherent percept. How attention affects the activity of populations of neurons to support this process is poorly understood because population coding is typically studied through simulations in which one sensory feature is encoded without competition. Therefore, to study the effects of feature attention on population-based neural coding, investigations must be extended to include stimuli with both relevant and irrelevant features. We measured noise correlations (rnoise) within small neural populations in primary auditory cortex while rhesus macaques performed a novel feature-selective attention task. We found that the effect of feature-selective attention on rnoise depended not only on the population tuning to the attended feature, but also on the tuning to the distractor feature. To attempt to explain how these observed effects might support enhanced perceptual performance, we propose an extension of a simple and influential model in which shifts in rnoise can simultaneously enhance the representation of the attended feature while suppressing the distractor. These findings present a novel mechanism by which attention modulates neural populations to support sensory processing in cluttered environments. SIGNIFICANCE STATEMENT Although feature-selective attention constitutes one of the building blocks of listening in natural environments, its neural bases remain obscure. To address this, we developed a novel auditory feature-selective attention task and measured noise correlations (rnoise) in rhesus macaque A1 during task performance. Unlike previous studies showing that the effect of attention on rnoise depends on population tuning to the attended feature, we show that the effect of attention depends on the tuning to the distractor feature as well. We suggest that these effects represent an efficient process by which sensory cortex simultaneously enhances relevant information and suppresses irrelevant information. PMID:28432139
Zhang, Ao; Tian, Suyan
2018-05-01
Pathway-based feature selection algorithms, which utilize biological information contained in pathways to guide which features/genes should be selected, have evolved quickly and become widespread in the field of bioinformatics. Based on how the pathway information is incorporated, we classify pathway-based feature selection algorithms into three major categories-penalty, stepwise forward, and weighting. Compared to the first two categories, the weighting methods have been underutilized even though they are usually the simplest ones. In this article, we constructed three different genes' connectivity information-based weights for each gene and then conducted feature selection upon the resulting weighted gene expression profiles. Using both simulations and a real-world application, we have demonstrated that when the data-driven connectivity information constructed from the data of specific disease under study is considered, the resulting weighted gene expression profiles slightly outperform the original expression profiles. In summary, a big challenge faced by the weighting method is how to estimate pathway knowledge-based weights more accurately and precisely. Only until the issue is conquered successfully will wide utilization of the weighting methods be impossible. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution
2013-01-01
Background Glycoproteins are involved in a diverse range of biochemical and biological processes. Changes in protein glycosylation are believed to occur in many diseases, particularly during cancer initiation and progression. The identification of biomarkers for human disease states is becoming increasingly important, as early detection is key to improving survival and recovery rates. To this end, the serum glycome has been proposed as a potential source of biomarkers for different types of cancers. High-throughput hydrophilic interaction liquid chromatography (HILIC) technology for glycan analysis allows for the detailed quantification of the glycan content in human serum. However, the experimental data from this analysis is compositional by nature. Compositional data are subject to a constant-sum constraint, which restricts the sample space to a simplex. Statistical analysis of glycan chromatography datasets should account for their unusual mathematical properties. As the volume of glycan HILIC data being produced increases, there is a considerable need for a framework to support appropriate statistical analysis. Proposed here is a methodology for feature selection in compositional data. The principal objective is to provide a template for the analysis of glycan chromatography data that may be used to identify potential glycan biomarkers. Results A greedy search algorithm, based on the generalized Dirichlet distribution, is carried out over the feature space to search for the set of “grouping variables” that best discriminate between known group structures in the data, modelling the compositional variables using beta distributions. The algorithm is applied to two glycan chromatography datasets. Statistical classification methods are used to test the ability of the selected features to differentiate between known groups in the data. Two well-known methods are used for comparison: correlation-based feature selection (CFS) and recursive partitioning (rpart). CFS is a feature selection method, while recursive partitioning is a learning tree algorithm that has been used for feature selection in the past. Conclusions The proposed feature selection method performs well for both glycan chromatography datasets. It is computationally slower, but results in a lower misclassification rate and a higher sensitivity rate than both correlation-based feature selection and the classification tree method. PMID:23651459
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Shiju; Qian, Wei; Guan, Yubao
2016-06-15
Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLS patients by integrating oversampling, feature selection, and score fusion techniques and develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initiallymore » computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061, when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled RBFN based classifier to yield improved prediction accuracy.« less
Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation
NASA Astrophysics Data System (ADS)
Tangaro, Sabina; Amoroso, Nicola; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Paolo, Inglese; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto
2015-01-01
Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic resonance imaging (MRI) scans can show these variations and therefore can be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer disease and other neurological and psychiatric diseases. However, it requires accurate, robust, and reproducible delineation of hippocampal structures. Fully automatic methods are usually the voxel based approach; for each voxel a number of local features were calculated. In this paper, we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) filter method based on the Kolmogorov-Smirnov test; two wrapper methods, respectively, (ii) sequential forward selection and (iii) sequential backward elimination; and (iv) embedded method based on the Random Forest Classifier on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects. The resulting segmentations were compared with manual reference labelling. By using only 23 feature for each voxel (sequential backward elimination) we obtained comparable state-of-the-art performances with respect to the standard tool FreeSurfer.
Hybrid feature selection for supporting lightweight intrusion detection systems
NASA Astrophysics Data System (ADS)
Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin
2017-08-01
Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. These features slow down the process of training and testing in network traffic classification. Therefore, a hybrid feature selection approach in combination with wrapper and filter selection is designed in this paper to build a lightweight intrusion detection system. Two main phases are involved in this method. The first phase conducts a preliminary search for an optimal subset of features, in which the chi-square feature selection is utilized. The selected set of features from the previous phase is further refined in the second phase in a wrapper manner, in which the Random Forest(RF) is used to guide the selection process and retain an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. The experimental results on NSL-KDD datasets show that our approach results are in higher detection accuracy as well as faster training and testing processes.
Feature Selection for Classification of Polar Regions Using a Fuzzy Expert System
NASA Technical Reports Server (NTRS)
Penaloza, Mauel A.; Welch, Ronald M.
1996-01-01
Labeling, feature selection, and the choice of classifier are critical elements for classification of scenes and for image understanding. This study examines several methods for feature selection in polar regions, including the list, of a fuzzy logic-based expert system for further refinement of a set of selected features. Six Advanced Very High Resolution Radiometer (AVHRR) Local Area Coverage (LAC) arctic scenes are classified into nine classes: water, snow / ice, ice cloud, land, thin stratus, stratus over water, cumulus over water, textured snow over water, and snow-covered mountains. Sixty-seven spectral and textural features are computed and analyzed by the feature selection algorithms. The divergence, histogram analysis, and discriminant analysis approaches are intercompared for their effectiveness in feature selection. The fuzzy expert system method is used not only to determine the effectiveness of each approach in classifying polar scenes, but also to further reduce the features into a more optimal set. For each selection method,features are ranked from best to worst, and the best half of the features are selected. Then, rules using these selected features are defined. The results of running the fuzzy expert system with these rules show that the divergence method produces the best set features, not only does it produce the highest classification accuracy, but also it has the lowest computation requirements. A reduction of the set of features produced by the divergence method using the fuzzy expert system results in an overall classification accuracy of over 95 %. However, this increase of accuracy has a high computation cost.
NASA Astrophysics Data System (ADS)
Lim, Meng-Hui; Teoh, Andrew Beng Jin
2011-12-01
Biometric discretization derives a binary string for each user based on an ordered set of biometric features. This representative string ought to be discriminative, informative, and privacy protective when it is employed as a cryptographic key in various security applications upon error correction. However, it is commonly believed that satisfying the first and the second criteria simultaneously is not feasible, and a tradeoff between them is always definite. In this article, we propose an effective fixed bit allocation-based discretization approach which involves discriminative feature extraction, discriminative feature selection, unsupervised quantization (quantization that does not utilize class information), and linearly separable subcode (LSSC)-based encoding to fulfill all the ideal properties of a binary representation extracted for cryptographic applications. In addition, we examine a number of discriminative feature-selection measures for discretization and identify the proper way of setting an important feature-selection parameter. Encouraging experimental results vindicate the feasibility of our approach.
Shi, Lei; Wan, Youchuan; Gao, Xianjun
2018-01-01
In object-based image analysis of high-resolution images, the number of features can reach hundreds, so it is necessary to perform feature reduction prior to classification. In this paper, a feature selection method based on the combination of a genetic algorithm (GA) and tabu search (TS) is presented. The proposed GATS method aims to reduce the premature convergence of the GA by the use of TS. A prematurity index is first defined to judge the convergence situation during the search. When premature convergence does take place, an improved mutation operator is executed, in which TS is performed on individuals with higher fitness values. As for the other individuals with lower fitness values, mutation with a higher probability is carried out. Experiments using the proposed GATS feature selection method and three other methods, a standard GA, the multistart TS method, and ReliefF, were conducted on WorldView-2 and QuickBird images. The experimental results showed that the proposed method outperforms the other methods in terms of the final classification accuracy. PMID:29581721
A Feature Selection Method Based on Fisher's Discriminant Ratio for Text Sentiment Classification
NASA Astrophysics Data System (ADS)
Wang, Suge; Li, Deyu; Wei, Yingjie; Li, Hongxia
With the rapid growth of e-commerce, product reviews on the Web have become an important information source for customers' decision making when they intend to buy some product. As the reviews are often too many for customers to go through, how to automatically classify them into different sentiment orientation categories (i.e. positive/negative) has become a research problem. In this paper, based on Fisher's discriminant ratio, an effective feature selection method is proposed for product review text sentiment classification. In order to validate the validity of the proposed method, we compared it with other methods respectively based on information gain and mutual information while support vector machine is adopted as the classifier. In this paper, 6 subexperiments are conducted by combining different feature selection methods with 2 kinds of candidate feature sets. Under 1006 review documents of cars, the experimental results indicate that the Fisher's discriminant ratio based on word frequency estimation has the best performance with F value 83.3% while the candidate features are the words which appear in both positive and negative texts.
Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection
Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M.
2016-01-01
Nondestructive Testing (NDT) assessment of materials’ health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on metallics and dielectrics such as microwave testing. Reducing complexity and expenses associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on feature selection and classification techniques of machine learning. Current microwave NDT methods in general based on measuring variation in the S-matrix over the entire operating frequency ranges of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on the reflection or/and transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is reducing sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the number of top important features can be identified, then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief, chi-squared. The effectiveness of the selected features were validated through performance evaluations of various classification models; namely, Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing feature selection algorithms. PMID:27104533
An audiovisual emotion recognition system
NASA Astrophysics Data System (ADS)
Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun
2007-12-01
Human emotions could be expressed by many bio-symbols. Speech and facial expression are two of them. They are both regarded as emotional information which is playing an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and represented in this paper. The system is designed for real-time practice, and is guaranteed by some integrated modules. These modules include speech enhancement for eliminating noises, rapid face detection for locating face from background image, example based shape learning for facial feature alignment, and optical flow based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of classifier. Rough set-based feature selection is a good method for dimension reduction. So 13 speech features out of 37 ones and 10 facial features out of 33 ones are selected to represent emotional information, and 52 audiovisual features are selected due to the synchronization when speech and video fused together. The experiment results have demonstrated that this system performs well in real-time practice and has high recognition rate. Our results also show that the work in multimodules fused recognition will become the trend of emotion recognition in the future.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Ni, Qianwu; Chen, Lei
2017-01-01
Correct prediction of protein structural class is beneficial to investigation on protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select proper classification algorithm and extract essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirtyeight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and true class of each protein were collected to construct a dataset, which were analyzed by mRMR method, yielding an algorithm list. From the algorithm list, the algorithm was taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using single algorithm and other models that only adopt feature selection procedure or algorithm selection procedure. The feature selection procedure or algorithm selection procedure are really helpful for building an ensemble prediction model that can yield a better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
2013-01-01
Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for the most skilful clinician to come out with an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second stage, the model with the features selected from each feature selection methods are tested on the proposed classifiers. Four types of classifiers are chosen; these are namely, ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate the potential of becoming as significant prognostic signature in the oral cancer studies. PMID:23725313
A feature-based approach to modeling protein-protein interaction hot spots.
Cho, Kyu-il; Kim, Dongsup; Lee, Doheon
2009-05-01
Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to pi-related interactions, especially pi . . . pi interactions.
Facial recognition using multisensor images based on localized kernel eigen spaces.
Gundimada, Satyanadh; Asari, Vijayan K
2009-06-01
A feature selection technique along with an information fusion procedure for improving the recognition accuracy of a visual and thermal image-based facial recognition system is presented in this paper. A novel modular kernel eigenspaces approach is developed and implemented on the phase congruency feature maps extracted from the visual and thermal images individually. Smaller sub-regions from a predefined neighborhood within the phase congruency images of the training samples are merged to obtain a large set of features. These features are then projected into higher dimensional spaces using kernel methods. The proposed localized nonlinear feature selection procedure helps to overcome the bottlenecks of illumination variations, partial occlusions, expression variations and variations due to temperature changes that affect the visual and thermal face recognition techniques. AR and Equinox databases are used for experimentation and evaluation of the proposed technique. The proposed feature selection procedure has greatly improved the recognition accuracy for both the visual and thermal images when compared to conventional techniques. Also, a decision level fusion methodology is presented which along with the feature selection procedure has outperformed various other face recognition techniques in terms of recognition accuracy.
Feature Selection Methods for Robust Decoding of Finger Movements in a Non-human Primate
Padmanaban, Subash; Baker, Justin; Greger, Bradley
2018-01-01
Objective: The performance of machine learning algorithms used for neural decoding of dexterous tasks may be impeded due to problems arising when dealing with high-dimensional data. The objective of feature selection algorithms is to choose a near-optimal subset of features from the original feature space to improve the performance of the decoding algorithm. The aim of our study was to compare the effects of four feature selection techniques, Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis (PCA), and Mutual Information Maximization on SVM classification performance for a dexterous decoding task. Approach: A nonhuman primate (NHP) was trained to perform small coordinated movements—similar to typing. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials (AP) during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based upon AP firing rates. We used the SVM classification to examine the functional parameters of (i) robustness to simulated failure and (ii) longevity of classification. We also compared the effect of using isolated-neuron and multi-unit firing rates as the feature vector supplied to the SVM. Main results: The average decoding accuracy for multi-unit features and single-unit features using Mutual Information Maximization (MIM) across 47 sessions was 96.74 ± 3.5% and 97.65 ± 3.36% respectively. The reduction in decoding accuracy between using 100% of the features and 10% of features based on MIM was 45.56% (from 93.7 to 51.09%) and 4.75% (from 95.32 to 90.79%) for multi-unit and single-unit features respectively. MIM had best performance compared to other feature selection methods. Significance: These results suggest improved decoding performance can be achieved by using optimally selected features. The results based on clinically relevant performance metrics also suggest that the decoding algorithm can be made robust by using optimal features and feature selection algorithms. We believe that even a few percent increase in performance is important and improves the decoding accuracy of the machine learning algorithm potentially increasing the ease of use of a brain machine interface. PMID:29467602
Radiomics-based Prognosis Analysis for Non-Small Cell Lung Cancer
NASA Astrophysics Data System (ADS)
Zhang, Yucheng; Oikonomou, Anastasia; Wong, Alexander; Haider, Masoom A.; Khalvati, Farzad
2017-04-01
Radiomics characterizes tumor phenotypes by extracting large numbers of quantitative features from radiological images. Radiomic features have been shown to provide prognostic value in predicting clinical outcomes in several studies. However, several challenges including feature redundancy, unbalanced data, and small sample sizes have led to relatively low predictive accuracy. In this study, we explore different strategies for overcoming these challenges and improving predictive performance of radiomics-based prognosis for non-small cell lung cancer (NSCLC). CT images of 112 patients (mean age 75 years) with NSCLC who underwent stereotactic body radiotherapy were used to predict recurrence, death, and recurrence-free survival using a comprehensive radiomics analysis. Different feature selection and predictive modeling techniques were used to determine the optimal configuration of prognosis analysis. To address feature redundancy, comprehensive analysis indicated that Random Forest models and Principal Component Analysis were optimum predictive modeling and feature selection methods, respectively, for achieving high prognosis performance. To address unbalanced data, Synthetic Minority Over-sampling technique was found to significantly increase predictive accuracy. A full analysis of variance showed that data endpoints, feature selection techniques, and classifiers were significant factors in affecting predictive accuracy, suggesting that these factors must be investigated when building radiomics-based predictive models for cancer prognosis.
NASA Astrophysics Data System (ADS)
Lestari, A. W.; Rustam, Z.
2017-07-01
In the last decade, breast cancer has become the focus of world attention as this disease is one of the primary leading cause of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, Fuzzy Kennel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify the high dimensional data of breast cancer using Fuzzy Possibilistic C-means (FPCM) and a new method based on clustering analysis using Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. In order to improve the accuracy of the two methods, the features candidates are evaluated using feature selection, where Laplacian Score is used. The results show the comparison accuracy and running time of FPCM and NKFPCM with and without feature selection.
Identity Recognition Algorithm Using Improved Gabor Feature Selection of Gait Energy Image
NASA Astrophysics Data System (ADS)
Chao, LIANG; Ling-yao, JIA; Dong-cheng, SHI
2017-01-01
This paper describes an effective gait recognition approach based on Gabor features of gait energy image. In this paper, the kernel Fisher analysis combined with kernel matrix is proposed to select dominant features. The nearest neighbor classifier based on whitened cosine distance is used to discriminate different gait patterns. The approach proposed is tested on the CASIA and USF gait databases. The results show that our approach outperforms other state of gait recognition approaches in terms of recognition accuracy and robustness.
Padma, A; Sukanesh, R
2013-01-01
A computer software system is designed for the segmentation and classification of benign from malignant tumour slices in brain computed tomography (CT) images. This paper presents a method to find and select both the dominant run length and co-occurrence texture features of region of interest (ROI) of the tumour region of each slice to be segmented by Fuzzy c means clustering (FCM) and evaluate the performance of support vector machine (SVM)-based classifiers in classifying benign and malignant tumour slices. Two hundred and six tumour confirmed CT slices are considered in this study. A total of 17 texture features are extracted by a feature extraction procedure, and six features are selected using Principal Component Analysis (PCA). This study constructed the SVM-based classifier with the selected features and by comparing the segmentation results with the experienced radiologist labelled ground truth (target). Quantitative analysis between ground truth and segmented tumour is presented in terms of segmentation accuracy, segmentation error and overlap similarity measures such as the Jaccard index. The classification performance of the SVM-based classifier with the same selected features is also evaluated using a 10-fold cross-validation method. The proposed system provides some newly found texture features have an important contribution in classifying benign and malignant tumour slices efficiently and accurately with less computational time. The experimental results showed that the proposed system is able to achieve the highest segmentation and classification accuracy effectiveness as measured by jaccard index and sensitivity and specificity.
Perceptual quality estimation of H.264/AVC videos using reduced-reference and no-reference models
NASA Astrophysics Data System (ADS)
Shahid, Muhammad; Pandremmenou, Katerina; Kondi, Lisimachos P.; Rossholm, Andreas; Lövström, Benny
2016-09-01
Reduced-reference (RR) and no-reference (NR) models for video quality estimation, using features that account for the impact of coding artifacts, spatio-temporal complexity, and packet losses, are proposed. The purpose of this study is to analyze a number of potentially quality-relevant features in order to select the most suitable set of features for building the desired models. The proposed sets of features have not been used in the literature and some of the features are used for the first time in this study. The features are employed by the least absolute shrinkage and selection operator (LASSO), which selects only the most influential of them toward perceptual quality. For comparison, we apply feature selection in the complete feature sets and ridge regression on the reduced sets. The models are validated using a database of H.264/AVC encoded videos that were subjectively assessed for quality in an ITU-T compliant laboratory. We infer that just two features selected by RR LASSO and two bitstream-based features selected by NR LASSO are able to estimate perceptual quality with high accuracy, higher than that of ridge, which uses more features. The comparisons with competing works and two full-reference metrics also verify the superiority of our models.
Feature Selection Methods for Zero-Shot Learning of Neural Activity
Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513
News video story segmentation method using fusion of audio-visual features
NASA Astrophysics Data System (ADS)
Wen, Jun; Wu, Ling-da; Zeng, Pu; Luan, Xi-dao; Xie, Yu-xiang
2007-11-01
News story segmentation is an important aspect for news video analysis. This paper presents a method for news video story segmentation. Different form prior works, which base on visual features transform, the proposed technique uses audio features as baseline and fuses visual features with it to refine the results. At first, it selects silence clips as audio features candidate points, and selects shot boundaries and anchor shots as two kinds of visual features candidate points. Then this paper selects audio feature candidates as cues and develops different fusion method, which effectively using diverse type visual candidates to refine audio candidates, to get story boundaries. Experiment results show that this method has high efficiency and adaptability to different kinds of news video.
An opinion formation based binary optimization approach for feature selection
NASA Astrophysics Data System (ADS)
Hamedmoghadam, Homayoun; Jalili, Mahdi; Yu, Xinghuo
2018-02-01
This paper proposed a novel optimization method based on opinion formation in complex network systems. The proposed optimization technique mimics human-human interaction mechanism based on a mathematical model derived from social sciences. Our method encodes a subset of selected features to the opinion of an artificial agent and simulates the opinion formation process among a population of agents to solve the feature selection problem. The agents interact using an underlying interaction network structure and get into consensus in their opinions, while finding better solutions to the problem. A number of mechanisms are employed to avoid getting trapped in local minima. We compare the performance of the proposed method with a number of classical population-based optimization methods and a state-of-the-art opinion formation based method. Our experiments on a number of high dimensional datasets reveal outperformance of the proposed algorithm over others.
Text-Based Conferencing: Features vs. Functionality
ERIC Educational Resources Information Center
Anderson, Lynn; McCarthy, Cathy
2005-01-01
This report examines three text-based conferencing products: "WowBB", "Invision Power Board", and "vBulletin". Their selection was prompted by a feature-by-feature comparison of the same products on the "WowBB" website. The comparison chart painted a misleading impression of "WowBB's" features in relation to the other two products; so the…
Unsupervised Feature Selection Based on the Morisita Index for Hyperspectral Images
NASA Astrophysics Data System (ADS)
Golay, Jean; Kanevski, Mikhail
2017-04-01
Hyperspectral sensors are capable of acquiring images with hundreds of narrow and contiguous spectral bands. Compared with traditional multispectral imagery, the use of hyperspectral images allows better performance in discriminating between land-cover classes, but it also results in large redundancy and high computational data processing. To alleviate such issues, unsupervised feature selection techniques for redundancy minimization can be implemented. Their goal is to select the smallest subset of features (or bands) in such a way that all the information content of a data set is preserved as much as possible. The present research deals with the application to hyperspectral images of a recently introduced technique of unsupervised feature selection: the Morisita-Based filter for Redundancy Minimization (MBRM). MBRM is based on the (multipoint) Morisita index of clustering and on the Morisita estimator of Intrinsic Dimension (ID). The fundamental idea of the technique is to retain only the bands which contribute to increasing the ID of an image. In this way, redundant bands are disregarded, since they have no impact on the ID. Besides, MBRM has several advantages over benchmark techniques: in addition to its ability to deal with large data sets, it can capture highly-nonlinear dependences and its implementation is straightforward in any programming environment. Experimental results on freely available hyperspectral images show the good effectiveness of MBRM in remote sensing data processing. Comparisons with benchmark techniques are carried out and random forests are used to assess the performance of MBRM in reducing the data dimensionality without loss of relevant information. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158-171, 2000. [2] J. Golay, M. Kanevski, A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48(12), pp. 4070-4081, 2015. [3] J. Golay, M. Kanevski, Unsupervised feature selection based on the Morisita estimator of intrinsic dimension, arXiv:1608.05581, 2016.
Wu, K; Daruwalla, Z J; Wong, K L; Murphy, D; Ren, H
2015-08-01
The commercial humeral implants based on the Western population are currently not entirely compatible with Asian patients, due to differences in bone size, shape and structure. Surgeons may have to compromise or use different implants that are less conforming, which may cause complications of as well as inconvenience to the implant position. The construction of Asian humerus atlases of different clusters has therefore been proposed to eradicate this problem and to facilitate planning minimally invasive surgical procedures [6,31]. According to the features of the atlases, new implants could be designed specifically for different patients. Furthermore, an automatic implant selection algorithm has been proposed as well in order to reduce the complications caused by implant and bone mismatch. Prior to the design of the implant, data clustering and extraction of the relevant features were carried out on the datasets of each gender. The fuzzy C-means clustering method is explored in this paper. Besides, two new schemes of implant selection procedures, namely the Procrustes analysis-based scheme and the group average distance-based scheme, were proposed to better search for the matching implants for new coming patients from the database. Both these two algorithms have not been used in this area, while they turn out to have excellent performance in implant selection. Additionally, algorithms to calculate the matching scores between various implants and the patient data are proposed in this paper to assist the implant selection procedure. The results obtained have indicated the feasibility of the proposed development and selection scheme. The 16 sets of male data were divided into two clusters with 8 and 8 subjects, respectively, and the 11 female datasets were also divided into two clusters with 5 and 6 subjects, respectively. Based on the features of each cluster, the implants designed by the proposed algorithm fit very well on their reference humeri and the proposed implant selection procedure allows for a scenario of treating a patient with merely a preoperative anatomical model in order to correctly select the implant that has the best fit. Based on the leave-one-out validation, it can be concluded that both the PA-based method and GAD-based method are able to achieve excellent performance when dealing with the problem of implant selection. The accuracy and average execution time for the PA-based method were 100 % and 0.132 s, respectively, while those of the GAD- based method were 100 % and 0.058 s. Therefore, the GAD-based method outperformed the PA-based method in terms of execution speed. The primary contributions of this paper include the proposal of methods for development of Asian-, gender- and cluster-specific implants based on shape features and selection of the best fit implants for future patients according to their features. To the best of our knowledge, this is the first work that proposes implant design and selection for Asian patients automatically based on features extracted from cluster-specific statistical atlases.
Miao, Minmin; Zeng, Hong; Wang, Aimin; Zhao, Changsen; Liu, Feixiang
2017-02-15
Common spatial pattern (CSP) is most widely used in motor imagery based brain-computer interface (BCI) systems. In conventional CSP algorithm, pairs of the eigenvectors corresponding to both extreme eigenvalues are selected to construct the optimal spatial filter. In addition, an appropriate selection of subject-specific time segments and frequency bands plays an important role in its successful application. This study proposes to optimize spatial-frequency-temporal patterns for discriminative feature extraction. Spatial optimization is implemented by channel selection and finding discriminative spatial filters adaptively on each time-frequency segment. A novel Discernibility of Feature Sets (DFS) criteria is designed for spatial filter optimization. Besides, discriminative features located in multiple time-frequency segments are selected automatically by the proposed sparse time-frequency segment common spatial pattern (STFSCSP) method which exploits sparse regression for significant features selection. Finally, a weight determined by the sparse coefficient is assigned for each selected CSP feature and we propose a Weighted Naïve Bayesian Classifier (WNBC) for classification. Experimental results on two public EEG datasets demonstrate that optimizing spatial-frequency-temporal patterns in a data-driven manner for discriminative feature extraction greatly improves the classification performance. The proposed method gives significantly better classification accuracies in comparison with several competing methods in the literature. The proposed approach is a promising candidate for future BCI systems. Copyright © 2016 Elsevier B.V. All rights reserved.
Fuzzy feature selection based on interval type-2 fuzzy sets
NASA Astrophysics Data System (ADS)
Cherif, Sahar; Baklouti, Nesrine; Alimi, Adel; Snasel, Vaclav
2017-03-01
When dealing with real world data; noise, complexity, dimensionality, uncertainty and irrelevance can lead to low performance and insignificant judgment. Fuzzy logic is a powerful tool for controlling conflicting attributes which can have similar effects and close meanings. In this paper, an interval type-2 fuzzy feature selection is presented as a new approach for removing irrelevant features and reducing complexity. We demonstrate how can Feature Selection be joined with Interval Type-2 Fuzzy Logic for keeping significant features and hence reducing time complexity. The proposed method is compared with some other approaches. The results show that the number of attributes is proportionally small.
PrAS: Prediction of amidation sites using multiple feature extraction.
Wang, Tong; Zheng, Wei; Wuyun, Qiqige; Wu, Zhenfeng; Ruan, Jishou; Hu, Gang; Gao, Jianzhao
2017-02-01
Amidation plays an important role in a variety of pathological processes and serious diseases like neural dysfunction and hypertension. However, identification of protein amidation sites through traditional experimental methods is time consuming and expensive. In this paper, we proposed a novel predictor for Prediction of Amidation Sites (PrAS), which is the first software package for academic users. The method incorporated four representative feature types, which are position-based features, physicochemical and biochemical properties features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection was proposed to optimize features. PrAS achieved AUC of 0.96, accuracy of 92.1%, sensitivity of 81.2%, specificity of 94.9% and MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg. Copyright © 2016 Elsevier Ltd. All rights reserved.
The role of lightness, hue and saturation in feature-based visual attention.
Stuart, Geoffrey W; Barsdell, Wendy N; Day, Ross H
2014-03-01
Visual attention is used to select part of the visual array for higher-level processing. Visual selection can be based on spatial location, but it has also been demonstrated that multiple locations can be selected simultaneously on the basis of a visual feature such as color. One task that has been used to demonstrate feature-based attention is the judgement of the symmetry of simple four-color displays. In a typical task, when symmetry is violated, four squares on either side of the display do not match. When four colors are involved, symmetry judgements are made more quickly than when only two of the four colors are involved. This indicates that symmetry judgements are made one color at a time. Previous studies have confounded lightness, hue, and saturation when defining the colors used in such displays. In three experiments, symmetry was defined by lightness alone, lightness plus hue, or by hue or saturation alone, with lightness levels randomised. The difference between judgements of two- and four-color asymmetry was maintained, showing that hue and saturation can provide the sole basis for feature-based attentional selection. Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Yaacoub, Charles; Mhanna, Georges; Rihana, Sandy
2017-01-01
Electroencephalography is a non-invasive measure of the brain electrical activity generated by millions of neurons. Feature extraction in electroencephalography analysis is a core issue that may lead to accurate brain mental state classification. This paper presents a new feature selection method that improves left/right hand movement identification of a motor imagery brain-computer interface, based on genetic algorithms and artificial neural networks used as classifiers. Raw electroencephalography signals are first preprocessed using appropriate filtering. Feature extraction is carried out afterwards, based on spectral and temporal signal components, and thus a feature vector is constructed. As various features might be inaccurate and mislead the classifier, thus degrading the overall system performance, the proposed approach identifies a subset of features from a large feature space, such that the classifier error rate is reduced. Experimental results show that the proposed method is able to reduce the number of features to as low as 0.5% (i.e., the number of ignored features can reach 99.5%) while improving the accuracy, sensitivity, specificity, and precision of the classifier. PMID:28124985
Yaacoub, Charles; Mhanna, Georges; Rihana, Sandy
2017-01-23
Electroencephalography is a non-invasive measure of the brain electrical activity generated by millions of neurons. Feature extraction in electroencephalography analysis is a core issue that may lead to accurate brain mental state classification. This paper presents a new feature selection method that improves left/right hand movement identification of a motor imagery brain-computer interface, based on genetic algorithms and artificial neural networks used as classifiers. Raw electroencephalography signals are first preprocessed using appropriate filtering. Feature extraction is carried out afterwards, based on spectral and temporal signal components, and thus a feature vector is constructed. As various features might be inaccurate and mislead the classifier, thus degrading the overall system performance, the proposed approach identifies a subset of features from a large feature space, such that the classifier error rate is reduced. Experimental results show that the proposed method is able to reduce the number of features to as low as 0.5% (i.e., the number of ignored features can reach 99.5%) while improving the accuracy, sensitivity, specificity, and precision of the classifier.
Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R; Barman, Ishan; Kumar Gundawar, Manoj
2015-08-19
Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has been often impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the 'curse of dimensionality' have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometrics classifiers -based on feature selection through prerequisite knowledge of the sample composition and genetic algorithm, respectively. While the full spectral input results in classification rate of ca.92%, selection of only carbon to hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector thereby enabling a cheaper and compact system more amenable to field operations.
Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R.; Barman, Ishan; Kumar Gundawar, Manoj
2015-01-01
Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has been often impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the ‘curse of dimensionality’ have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometrics classifiers –based on feature selection through prerequisite knowledge of the sample composition and genetic algorithm, respectively. While the full spectral input results in classification rate of ca.92%, selection of only carbon to hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector thereby enabling a cheaper and compact system more amenable to field operations. PMID:26286630
A Novel Feature Selection Technique for Text Classification Using Naïve Bayes.
Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed
2014-01-01
With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On one hand, implementation of naïve Bayes is simple and, on the other hand, this also requires fewer amounts of training data. From the literature review, it is found that naïve Bayes performs poorly compared to other classifiers in text classification. As a result, this makes the naïve Bayes classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method based on firstly a univariate feature selection and then feature clustering, where we use the univariate feature selection method to reduce the search space and then apply clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods like greedy search based wrapper or CFS.
Raman spectral feature selection using ant colony optimization for breast cancer diagnosis.
Fallahzadeh, Omid; Dehghani-Bidgoli, Zohreh; Assarian, Mohammad
2018-06-04
Pathology as a common diagnostic test of cancer is an invasive, time-consuming, and partially subjective method. Therefore, optical techniques, especially Raman spectroscopy, have attracted the attention of cancer diagnosis researchers. However, as Raman spectra contain numerous peaks involved in molecular bounds of the sample, finding the best features related to cancerous changes can improve the accuracy of diagnosis in this method. The present research attempted to improve the power of Raman-based cancer diagnosis by finding the best Raman features using the ACO algorithm. In the present research, 49 spectra were measured from normal, benign, and cancerous breast tissue samples using a 785-nm micro-Raman system. After preprocessing for removal of noise and background fluorescence, the intensity of 12 important Raman bands of the biological samples was extracted as features of each spectrum. Then, the ACO algorithm was applied to find the optimum features for diagnosis. As the results demonstrated, by selecting five features, the classification accuracy of the normal, benign, and cancerous groups increased by 14% and reached 87.7%. ACO feature selection can improve the diagnostic accuracy of Raman-based diagnostic models. In the present study, features corresponding to ν(C-C) αhelix proline, valine (910-940), νs(C-C) skeletal lipids (1110-1130), and δ(CH2)/δ(CH3) proteins (1445-1460) were selected as the best features in cancer diagnosis.
Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Al-Dhief, Fahad Taha; Sammour, Mahmoud A M
2018-01-01
Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%.
Tiun, Sabrina; AL-Dhief, Fahad Taha; Sammour, Mahmoud A. M.
2018-01-01
Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%. PMID:29672546
Feature selection gait-based gender classification under different circumstances
NASA Astrophysics Data System (ADS)
Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah
2014-05-01
This paper proposes a gender classification based on human gait features and investigates the problem of two variations: clothing (wearing coats) and carrying bag condition as addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying wavelet transform. Three different sets of feature are proposed in this method. First, Spatio-temporal distance that is dealing with the distance of different parts of the human body (like feet, knees, hand, Human Height and shoulder) during one gait cycle. The second and third feature sets are constructed from approximation and non-approximation coefficient of human body respectively. To extract these two sets of feature we divided the human body into two parts, upper and lower body part, based on the golden ratio proportion. In this paper, we have adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize their discriminating significance. Finally k-Nearest Neighbor is applied as a classification method. Experimental results demonstrate that our approach is providing more realistic scenario and relatively better performance compared with the existing approaches.
A scale space feature based registration technique for fusion of satellite imagery
NASA Technical Reports Server (NTRS)
Raghavan, Srini; Cromp, Robert F.; Campbell, William C.
1997-01-01
Feature based registration is one of the most reliable methods to register multi-sensor images (both active and passive imagery) since features are often more reliable than intensity or radiometric values. The only situation where a feature based approach will fail is when the scene is completely homogenous or densely textural in which case a combination of feature and intensity based methods may yield better results. In this paper, we present some preliminary results of testing our scale space feature based registration technique, a modified version of feature based method developed earlier for classification of multi-sensor imagery. The proposed approach removes the sensitivity in parameter selection experienced in the earlier version as explained later.
A feature-based approach to modeling protein–protein interaction hot spots
Cho, Kyu-il; Kim, Dongsup; Lee, Doheon
2009-01-01
Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π–related interactions, especially π · · · π interactions. PMID:19273533
Choice: 36 band feature selection software with applications to multispectral pattern recognition
NASA Technical Reports Server (NTRS)
Jones, W. C.
1973-01-01
Feature selection software was developed at the Earth Resources Laboratory that is capable of inputting up to 36 channels and selecting channel subsets according to several criteria based on divergence. One of the criterion used is compatible with the table look-up classifier requirements. The software indicates which channel subset best separates (based on average divergence) each class from all other classes. The software employs an exhaustive search technique, and computer time is not prohibitive. A typical task to select the best 4 of 22 channels for 12 classes takes 9 minutes on a Univac 1108 computer.
A stereo remote sensing feature selection method based on artificial bee colony algorithm
NASA Astrophysics Data System (ADS)
Yan, Yiming; Liu, Pigang; Zhang, Ye; Su, Nan; Tian, Shu; Gao, Fengjiao; Shen, Yi
2014-05-01
To improve the efficiency of stereo information for remote sensing classification, a stereo remote sensing feature selection method is proposed in this paper presents, which is based on artificial bee colony algorithm. Remote sensing stereo information could be described by digital surface model (DSM) and optical image, which contain information of the three-dimensional structure and optical characteristics, respectively. Firstly, three-dimensional structure characteristic could be analyzed by 3D-Zernike descriptors (3DZD). However, different parameters of 3DZD could descript different complexity of three-dimensional structure, and it needs to be better optimized selected for various objects on the ground. Secondly, features for representing optical characteristic also need to be optimized. If not properly handled, when a stereo feature vector composed of 3DZD and image features, that would be a lot of redundant information, and the redundant information may not improve the classification accuracy, even cause adverse effects. To reduce information redundancy while maintaining or improving the classification accuracy, an optimized frame for this stereo feature selection problem is created, and artificial bee colony algorithm is introduced for solving this optimization problem. Experimental results show that the proposed method can effectively improve the computational efficiency, improve the classification accuracy.
EEG-based recognition of video-induced emotions: selecting subject-independent feature set.
Kortelainen, Jukka; Seppänen, Tapio
2013-01-01
Emotions are fundamental for everyday life affecting our communication, learning, perception, and decision making. Including emotions into the human-computer interaction (HCI) could be seen as a significant step forward offering a great potential for developing advanced future technologies. While the electrical activity of the brain is affected by emotions, offers electroencephalogram (EEG) an interesting channel to improve the HCI. In this paper, the selection of subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves the classification rate reported in the literature. In future, further analysis of the video-induced EEG changes including the topographical differences in the spectral features is needed.
A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks.
Alauthaman, Mohammad; Aslam, Nauman; Zhang, Li; Alasem, Rafe; Hossain, M A
2018-01-01
In recent years, Botnets have been adopted as a popular method to carry and spread many malicious codes on the Internet. These malicious codes pave the way to execute many fraudulent activities including spam mail, distributed denial-of-service attacks and click fraud. While many Botnets are set up using centralized communication architecture, the peer-to-peer (P2P) Botnets can adopt a decentralized architecture using an overlay network for exchanging command and control data making their detection even more difficult. This work presents a method of P2P Bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network training model is created using a resilient back-propagation learning algorithm. A comparison of feature set selection based on the decision tree, principal component analysis and the ReliefF algorithm indicated that the neural network model with features selection based on decision tree has a better identification accuracy along with lower rates of false positives. The usefulness of the proposed approach is demonstrated by conducting experiments on real network traffic datasets. In these experiments, an average detection rate of 99.08 % with false positive rate of 0.75 % was observed.
SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru
2014-01-01
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases.
SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru
2014-01-01
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases. PMID:25295306
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Feature selection using probabilistic prediction of support vector regression.
Yang, Jian-Bo; Ong, Chong-Jin
2011-06-01
This paper presents a new wrapper-based feature selection method for support vector regression (SVR) using its probabilistic predictions. The method computes the importance of a feature by aggregating the difference, over the feature space, of the conditional density functions of the SVR prediction with and without the feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure using these approximations, in comparison to several other existing feature selection methods for SVR, is evaluated on both artificial and real-world problems. The result of the experiments show that the proposed method generally performs better than, or at least as well as, the existing methods, with notable advantage when the dataset is sparse.
Collective feature selection to identify crucial epistatic variants.
Verma, Shefali S; Lucas, Anastasia; Zhang, Xinyuan; Veturi, Yogasudha; Dudek, Scott; Li, Binglan; Li, Ruowang; Urbanowicz, Ryan; Moore, Jason H; Kim, Dokyoon; Ritchie, Marylyn D
2018-01-01
Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach. Through our simulation study we propose a collective feature selection approach to select features that are in the "union" of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We choose our top performing methods to select the union of the resulting variables based on a user-defined percentage of variants selected from each method to take to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criteria for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and Gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects also in low effect size datasets and different genetic architectures. Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~ 44,000 samples obtained from Geisinger's MyCode Community Health Initiative (on behalf of DiscovEHR collaboration). In this study, we were able to show that selecting variables using a collective feature selection approach could help in selecting true positive epistatic variables more frequently than applying any single method for feature selection via simulation studies. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity.
Selecting multiple features delays perception, but only when targets are horizontally arranged.
Lo, Shih-Yu
2017-01-01
Based on the finding that perception is lagged by attention split on multiple features (Lo et al., 2012), this study investigated how the feature-based lag effect interacts with the target spatial arrangement. Participants were presented with gratings the spatial frequencies of which constantly changed. The task was to monitor two gratings of the same or different colors and report their spatial frequencies right before the stimulus offset. The results showed a perceptual lag wherein the reported value was closer to the physical value some time prior to the stimulus offset. This lag effect was larger when the two gratings were of different colors than when they were the same color. Furthermore, the feature-based lag effect was statistically significant when the two gratings were horizontally arranged but not when they were vertically or diagonally arranged. A model is proposed to explain the effect of target arrangement: When targets are horizontally arranged, selecting an additional feature delays perception. When targets are vertically or diagonally arranged, target selection for the lower field is prioritized. This prioritization on the lower target might prompt observers to only select the lower target and ignore the upper one, and this causes more perceptual errors without delaying perception. © 2017 Elsevier B.V. All rights reserved.
Multiclass feature selection for improved pediatric brain tumor segmentation
NASA Astrophysics Data System (ADS)
Ahmed, Shaheen; Iftekharuddin, Khan M.
2012-03-01
In our previous work, we showed that fractal-based texture features are effective in detection, segmentation and classification of posterior-fossa (PF) pediatric brain tumor in multimodality MRI. We exploited an information theoretic approach such as Kullback-Leibler Divergence (KLD) for feature selection and ranking different texture features. We further incorporated the feature selection technique with segmentation method such as Expectation Maximization (EM) for segmentation of tumor T and non tumor (NT) tissues. In this work, we extend the two class KLD technique to multiclass for effectively selecting the best features for brain tumor (T), cyst (C) and non tumor (NT). We further obtain segmentation robustness for each tissue types by computing Bay's posterior probabilities and corresponding number of pixels for each tissue segments in MRI patient images. We evaluate improved tumor segmentation robustness using different similarity metric for 5 patients in T1, T2 and FLAIR modalities.
Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor
Alamedine, D.; Khalil, M.; Marque, C.
2013-01-01
Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The two last methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification. PMID:24454536
Torija, Antonio J; Ruiz, Diego P
2015-02-01
The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
Peng, Bo; Lu, Jieru; Saxena, Aditya; Zhou, Zhiyong; Zhang, Tao; Wang, Suhong; Dai, Yakang
2017-01-01
Purpose: This study is to exam self-esteem related brain morphometry on brain magnetic resonance (MR) images using multilevel-features-based classification method. Method: The multilevel region of interest (ROI) features consist of two types of features: (i) ROI features, which include gray matter volume, white matter volume, cerebrospinal fluid volume, cortical thickness, and cortical surface area, and (ii) similarity features, which are based on similarity calculation of cortical thickness between ROIs. For each feature type, a hybrid feature selection method, comprising of filter-based and wrapper-based algorithms, is used to select the most discriminating features. ROI features and similarity features are integrated by using multi-kernel support vector machines (SVMs) with appropriate weighting factor. Results: The classification performance is improved by using multilevel ROI features with an accuracy of 96.66%, a specificity of 96.62%, and a sensitivity of 95.67%. The most discriminating ROI features that are related to self-esteem spread over occipital lobe, frontal lobe, parietal lobe, limbic lobe, temporal lobe, and central region, mainly involving white matter and cortical thickness. The most discriminating similarity features are distributed in both the right and left hemisphere, including frontal lobe, occipital lobe, limbic lobe, parietal lobe, and central region, which conveys information of structural connections between different brain regions. Conclusion: By using ROI features and similarity features to exam self-esteem related brain morphometry, this paper provides a pilot evidence that self-esteem is linked to specific ROIs and structural connections between different brain regions. PMID:28588470
Peng, Bo; Lu, Jieru; Saxena, Aditya; Zhou, Zhiyong; Zhang, Tao; Wang, Suhong; Dai, Yakang
2017-01-01
Purpose: This study is to exam self-esteem related brain morphometry on brain magnetic resonance (MR) images using multilevel-features-based classification method. Method: The multilevel region of interest (ROI) features consist of two types of features: (i) ROI features, which include gray matter volume, white matter volume, cerebrospinal fluid volume, cortical thickness, and cortical surface area, and (ii) similarity features, which are based on similarity calculation of cortical thickness between ROIs. For each feature type, a hybrid feature selection method, comprising of filter-based and wrapper-based algorithms, is used to select the most discriminating features. ROI features and similarity features are integrated by using multi-kernel support vector machines (SVMs) with appropriate weighting factor. Results: The classification performance is improved by using multilevel ROI features with an accuracy of 96.66%, a specificity of 96.62%, and a sensitivity of 95.67%. The most discriminating ROI features that are related to self-esteem spread over occipital lobe, frontal lobe, parietal lobe, limbic lobe, temporal lobe, and central region, mainly involving white matter and cortical thickness. The most discriminating similarity features are distributed in both the right and left hemisphere, including frontal lobe, occipital lobe, limbic lobe, parietal lobe, and central region, which conveys information of structural connections between different brain regions. Conclusion: By using ROI features and similarity features to exam self-esteem related brain morphometry, this paper provides a pilot evidence that self-esteem is linked to specific ROIs and structural connections between different brain regions.
Kumar, Shiu; Sharma, Alok; Tsunoda, Tatsuhiko
2017-12-28
Common spatial pattern (CSP) has been an effective technique for feature extraction in electroencephalography (EEG) based brain computer interfaces (BCIs). However, motor imagery EEG signal feature extraction using CSP generally depends on the selection of the frequency bands to a great extent. In this study, we propose a mutual information based frequency band selection approach. The idea of the proposed method is to utilize the information from all the available channels for effectively selecting the most discriminative filter banks. CSP features are extracted from multiple overlapping sub-bands. An additional sub-band has been introduced that cover the wide frequency band (7-30 Hz) and two different types of features are extracted using CSP and common spatio-spectral pattern techniques, respectively. Mutual information is then computed from the extracted features of each of these bands and the top filter banks are selected for further processing. Linear discriminant analysis is applied to the features extracted from each of the filter banks. The scores are fused together, and classification is done using support vector machine. The proposed method is evaluated using BCI Competition III dataset IVa, BCI Competition IV dataset I and BCI Competition IV dataset IIb, and it outperformed all other competing methods achieving the lowest misclassification rate and the highest kappa coefficient on all three datasets. Introducing a wide sub-band and using mutual information for selecting the most discriminative sub-bands, the proposed method shows improvement in motor imagery EEG signal classification.
NASA Astrophysics Data System (ADS)
Weinmann, Martin; Jutzi, Boris; Hinz, Stefan; Mallet, Clément
2015-07-01
3D scene analysis in terms of automatically assigning 3D points a respective semantic label has become a topic of great importance in photogrammetry, remote sensing, computer vision and robotics. In this paper, we address the issue of how to increase the distinctiveness of geometric features and select the most relevant ones among these for 3D scene analysis. We present a new, fully automated and versatile framework composed of four components: (i) neighborhood selection, (ii) feature extraction, (iii) feature selection and (iv) classification. For each component, we consider a variety of approaches which allow applicability in terms of simplicity, efficiency and reproducibility, so that end-users can easily apply the different components and do not require expert knowledge in the respective domains. In a detailed evaluation involving 7 neighborhood definitions, 21 geometric features, 7 approaches for feature selection, 10 classifiers and 2 benchmark datasets, we demonstrate that the selection of optimal neighborhoods for individual 3D points significantly improves the results of 3D scene analysis. Additionally, we show that the selection of adequate feature subsets may even further increase the quality of the derived results while significantly reducing both processing time and memory consumption.
Hatamikia, Sepideh; Maghooli, Keivan; Nasrabadi, Ali Motie
2014-01-01
Electroencephalogram (EEG) is one of the useful biological signals to distinguish different brain diseases and mental states. In recent years, detecting different emotional states from biological signals has been merged more attention by researchers and several feature extraction methods and classifiers are suggested to recognize emotions from EEG signals. In this research, we introduce an emotion recognition system using autoregressive (AR) model, sequential forward feature selection (SFS) and K-nearest neighbor (KNN) classifier using EEG signals during emotional audio-visual inductions. The main purpose of this paper is to investigate the performance of AR features in the classification of emotional states. To achieve this goal, a distinguished AR method (Burg's method) based on Levinson-Durbin's recursive algorithm is used and AR coefficients are extracted as feature vectors. In the next step, two different feature selection methods based on SFS algorithm and Davies–Bouldin index are used in order to decrease the complexity of computing and redundancy of features; then, three different classifiers include KNN, quadratic discriminant analysis and linear discriminant analysis are used to discriminate two and three different classes of valence and arousal levels. The proposed method is evaluated with EEG signals of available database for emotion analysis using physiological signals, which are recorded from 32 participants during 40 1 min audio visual inductions. According to the results, AR features are efficient to recognize emotional states from EEG signals, and KNN performs better than two other classifiers in discriminating of both two and three valence/arousal classes. The results also show that SFS method improves accuracies by almost 10-15% as compared to Davies–Bouldin based feature selection. The best accuracies are %72.33 and %74.20 for two classes of valence and arousal and %61.10 and %65.16 for three classes, respectively. PMID:25298928
Thomas, Minta; De Brabanter, Kris; De Moor, Bart
2014-05-10
DNA microarrays are potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main input to this class predictors are high dimensional data with many variables and few observations. Dimensionality reduction of these features set significantly speeds up the prediction task. Feature selection and feature transformation methods are well known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques. Studies show that a well tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA was computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model with a well tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model based on 9 case studies. Then, we compare its performances (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy with an existing KPCA parameter tuning algorithm by means of two additional case studies. We propose, evaluate, and compare several mathematical/statistical techniques, which apply feature transformation/selection for subsequent classification, and consider its application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier, which predicts classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage with its relatively lesser time complexity.
NASA Astrophysics Data System (ADS)
Shahiri, Amirah Mohamed; Husain, Wahidah; Rashid, Nur'Aini Abd
2017-10-01
Huge amounts of data in educational datasets may cause the problem in producing quality data. Recently, data mining approach are increasingly used by educational data mining researchers for analyzing the data patterns. However, many research studies have concentrated on selecting suitable learning algorithms instead of performing feature selection process. As a result, these data has problem with computational complexity and spend longer computational time for classification. The main objective of this research is to provide an overview of feature selection techniques that have been used to analyze the most significant features. Then, this research will propose a framework to improve the quality of students' dataset. The proposed framework uses filter and wrapper based technique to support prediction process in future study.
Lawo, Vera; Fels, Janina; Oberem, Josefa; Koch, Iring
2014-10-01
Using an auditory variant of task switching, we examined the ability to intentionally switch attention in a dichotic-listening task. In our study, participants responded selectively to one of two simultaneously presented auditory number words (spoken by a female and a male, one for each ear) by categorizing its numerical magnitude. The mapping of gender (female vs. male) and ear (left vs. right) was unpredictable. The to-be-attended feature for gender or ear, respectively, was indicated by a visual selection cue prior to auditory stimulus onset. In Experiment 1, explicitly cued switches of the relevant feature dimension (e.g., from gender to ear) and switches of the relevant feature within a dimension (e.g., from male to female) occurred in an unpredictable manner. We found large performance costs when the relevant feature switched, but switches of the relevant feature dimension incurred only small additional costs. The feature-switch costs were larger in ear-relevant than in gender-relevant trials. In Experiment 2, we replicated these findings using a simplified design (i.e., only within-dimension switches with blocked dimensions). In Experiment 3, we examined preparation effects by manipulating the cueing interval and found a preparation benefit only when ear was cued. Together, our data suggest that the large part of attentional switch costs arises from reconfiguration at the level of relevant auditory features (e.g., left vs. right) rather than feature dimensions (ear vs. gender). Additionally, our findings suggest that ear-based target selection benefits more from preparation time (i.e., time to direct attention to one ear) than gender-based target selection.
Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A
2017-09-15
Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code available at http://insilico.utulsa.edu/software/privateEC . brett-mckinney@utulsa.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jing, Yaqi; Meng, Qinghao, E-mail: qh-meng@tju.edu.cn; Qi, Peifeng
An electronic nose (e-nose) was designed to classify Chinese liquors of the same aroma style. A new method of feature reduction which combined feature selection with feature extraction was proposed. Feature selection method used 8 feature-selection algorithms based on information theory and reduced the dimension of the feature space to 41. Kernel entropy component analysis was introduced into the e-nose system as a feature extraction method and the dimension of feature space was reduced to 12. Classification of Chinese liquors was performed by using back propagation artificial neural network (BP-ANN), linear discrimination analysis (LDA), and a multi-linear classifier. The classificationmore » rate of the multi-linear classifier was 97.22%, which was higher than LDA and BP-ANN. Finally the classification of Chinese liquors according to their raw materials and geographical origins was performed using the proposed multi-linear classifier and classification rate was 98.75% and 100%, respectively.« less
Gunavathi, Chellamuthu; Premalatha, Kandasamy
2014-01-01
Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.
Comparability of Computer-Based and Paper-Based Science Assessments
ERIC Educational Resources Information Center
Herrmann-Abell, Cari F.; Hardcastle, Joseph; DeBoer, George E.
2018-01-01
We compared students' performance on a paper-based test (PBT) and three computer-based tests (CBTs). The three computer-based tests used different test navigation and answer selection features, allowing us to examine how these features affect student performance. The study sample consisted of 9,698 fourth through twelfth grade students from across…
Feature-based attention to unconscious shapes and colors.
Schmidt, Filipp; Schmidt, Thomas
2010-08-01
Two experiments employed feature-based attention to modulate the impact of completely masked primes on subsequent pointing responses. Participants processed a color cue to select a pair of possible pointing targets out of multiple targets on the basis of their color, and then pointed to the one of those two targets with a prespecified shape. All target pairs were preceded by prime pairs triggering either the correct or the opposite response. The time interval between cue and primes was varied to modulate the time course of feature-based attentional selection. In a second experiment, the roles of color and shape were switched. Pointing trajectories showed large priming effects that were amplified by feature-based attention, indicating that attention modulated the earliest phases of motor output. Priming effects as well as their attentional modulation occurred even though participants remained unable to identify the primes, indicating distinct processes underlying visual awareness, attention, and response control.
A method for fast selecting feature wavelengths from the spectral information of crop nitrogen
USDA-ARS?s Scientific Manuscript database
Research on a method for fast selecting feature wavelengths from the nitrogen spectral information is necessary, which can determine the nitrogen content of crops. Based on the uniformity of uniform design, this paper proposed an improved particle swarm optimization (PSO) method. The method can ch...
Fesharaki, Nooshin Jafari; Pourghassem, Hossein
2013-07-01
Due to the daily mass production and the widespread variation of medical X-ray images, it is necessary to classify these for searching and retrieving proposes, especially for content-based medical image retrieval systems. In this paper, a medical X-ray image hierarchical classification structure based on a novel merging and splitting scheme and using shape and texture features is proposed. In the first level of the proposed structure, to improve the classification performance, similar classes with regard to shape contents are grouped based on merging measures and shape features into the general overlapped classes. In the next levels of this structure, the overlapped classes split in smaller classes based on the classification performance of combination of shape and texture features or texture features only. Ultimately, in the last levels, this procedure is also continued forming all the classes, separately. Moreover, to optimize the feature vector in the proposed structure, we use orthogonal forward selection algorithm according to Mahalanobis class separability measure as a feature selection and reduction algorithm. In other words, according to the complexity and inter-class distance of each class, a sub-space of the feature space is selected in each level and then a supervised merging and splitting scheme is applied to form the hierarchical classification. The proposed structure is evaluated on a database consisting of 2158 medical X-ray images of 18 classes (IMAGECLEF 2005 database) and accuracy rate of 93.6% in the last level of the hierarchical structure for an 18-class classification problem is obtained.
Feature and Score Fusion Based Multiple Classifier Selection for Iris Recognition
Islam, Md. Rabiul
2014-01-01
The aim of this work is to propose a new feature and score fusion based iris recognition approach where voting method on Multiple Classifier Selection technique has been applied. Four Discrete Hidden Markov Model classifiers output, that is, left iris based unimodal system, right iris based unimodal system, left-right iris feature fusion based multimodal system, and left-right iris likelihood ratio score fusion based multimodal system, is combined using voting method to achieve the final recognition result. CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, recognition accuracy of the proposed system has been compared with existing N hamming distance score fusion approach proposed by Ma et al., log-likelihood ratio score fusion approach proposed by Schmid et al., and single level feature fusion approach proposed by Hollingsworth et al. PMID:25114676
Feature and score fusion based multiple classifier selection for iris recognition.
Islam, Md Rabiul
2014-01-01
The aim of this work is to propose a new feature and score fusion based iris recognition approach where voting method on Multiple Classifier Selection technique has been applied. Four Discrete Hidden Markov Model classifiers output, that is, left iris based unimodal system, right iris based unimodal system, left-right iris feature fusion based multimodal system, and left-right iris likelihood ratio score fusion based multimodal system, is combined using voting method to achieve the final recognition result. CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, recognition accuracy of the proposed system has been compared with existing N hamming distance score fusion approach proposed by Ma et al., log-likelihood ratio score fusion approach proposed by Schmid et al., and single level feature fusion approach proposed by Hollingsworth et al.
Application of machine learning on brain cancer multiclass classification
NASA Astrophysics Data System (ADS)
Panca, V.; Rustam, Z.
2017-07-01
Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a few number of samples. The application of machine learning on microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on support vector machine recursive feature elimination (SVM-RFE) principle which is improved to solve multiclass classification, called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the result of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the method of the classifier to reduce computational complexity. While ordinary SVM finds single optimum hyperplane, the main objective Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71,4% of the overall test data correctly, using 100 and 1000 genes selected from multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of normal and MD class with 100% accuracy.
Optical-Correlator Neural Network Based On Neocognitron
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin; Stoner, William W.
1994-01-01
Multichannel optical correlator implements shift-invariant, high-discrimination pattern-recognizing neural network based on paradigm of neocognitron. Selected as basic building block of this neural network because invariance under shifts is inherent advantage of Fourier optics included in optical correlators in general. Neocognitron is conceptual electronic neural-network model for recognition of visual patterns. Multilayer processing achieved by iteratively feeding back output of feature correlator to input spatial light modulator and updating Fourier filters. Neural network trained by use of characteristic features extracted from target images. Multichannel implementation enables parallel processing of large number of selected features.
Feature selection and classification of multiparametric medical images using bagging and SVM
NASA Astrophysics Data System (ADS)
Fan, Yong; Resnick, Susan M.; Davatzikos, Christos
2008-03-01
This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
NASA Astrophysics Data System (ADS)
Adi Putra, Januar
2018-04-01
In this paper, we propose a new mammogram classification scheme to classify the breast tissues as normal or abnormal. Feature matrix is generated using Local Binary Pattern to all the detailed coefficients from 2D-DWT of the region of interest (ROI) of a mammogram. Feature selection is done by selecting the relevant features that affect the classification. Feature selection is used to reduce the dimensionality of data and features that are not relevant, in this paper the F-test and Ttest will be performed to the results of the feature extraction dataset to reduce and select the relevant feature. The best features are used in a Neural Network classifier for classification. In this research we use MIAS and DDSM database. In addition to the suggested scheme, the competent schemes are also simulated for comparative analysis. It is observed that the proposed scheme has a better say with respect to accuracy, specificity and sensitivity. Based on experiments, the performance of the proposed scheme can produce high accuracy that is 92.71%, while the lowest accuracy obtained is 77.08%.
Feature selection for the classification of traced neurons.
López-Cabrera, José D; Lorenzo-Ginori, Juan V
2018-06-01
The great availability of computational tools to calculate the properties of traced neurons leads to the existence of many descriptors which allow the automated classification of neurons from these reconstructions. This situation determines the necessity to eliminate irrelevant features as well as making a selection of the most appropriate among them, in order to improve the quality of the classification obtained. The dataset used contains a total of 318 traced neurons, classified by human experts in 192 GABAergic interneurons and 126 pyramidal cells. The features were extracted by means of the L-measure software, which is one of the most used computational tools in neuroinformatics to quantify traced neurons. We review some current feature selection techniques as filter, wrapper, embedded and ensemble methods. The stability of the feature selection methods was measured. For the ensemble methods, several aggregation methods based on different metrics were applied to combine the subsets obtained during the feature selection process. The subsets obtained applying feature selection methods were evaluated using supervised classifiers, among which Random Forest, C4.5, SVM, Naïve Bayes, Knn, Decision Table and the Logistic classifier were used as classification algorithms. Feature selection methods of types filter, embedded, wrappers and ensembles were compared and the subsets returned were tested in classification tasks for different classification algorithms. L-measure features EucDistanceSD, PathDistanceSD, Branch_pathlengthAve, Branch_pathlengthSD and EucDistanceAve were present in more than 60% of the selected subsets which provides evidence about their importance in the classification of this neurons. Copyright © 2018 Elsevier B.V. All rights reserved.
Object-based attentional selection modulates anticipatory alpha oscillations
Knakker, Balázs; Weiss, Béla; Vidnyánszky, Zoltán
2015-01-01
Visual cortical alpha oscillations are involved in attentional gating of incoming visual information. It has been shown that spatial and feature-based attentional selection result in increased alpha oscillations over the cortical regions representing sensory input originating from the unattended visual field and task-irrelevant visual features, respectively. However, whether attentional gating in the case of object based selection is also associated with alpha oscillations has not been investigated before. Here we measured anticipatory electroencephalography (EEG) alpha oscillations while participants were cued to attend to foveal face or word stimuli, the processing of which is known to have right and left hemispheric lateralization, respectively. The results revealed that in the case of simultaneously displayed, overlapping face and word stimuli, attending to the words led to increased power of parieto-occipital alpha oscillations over the right hemisphere as compared to when faces were attended. This object category-specific modulation of the hemispheric lateralization of anticipatory alpha oscillations was maintained during sustained attentional selection of sequentially presented face and word stimuli. These results imply that in the case of object-based attentional selection—similarly to spatial and feature-based attention—gating of visual information processing might involve visual cortical alpha oscillations. PMID:25628554
Bartsch, Mandy V; Loewe, Kristian; Merkel, Christian; Heinze, Hans-Jochen; Schoenfeld, Mircea A; Tsotsos, John K; Hopf, Jens-Max
2017-10-25
Attention can facilitate the selection of elementary object features such as color, orientation, or motion. This is referred to as feature-based attention and it is commonly attributed to a modulation of the gain and tuning of feature-selective units in visual cortex. Although gain mechanisms are well characterized, little is known about the cortical processes underlying the sharpening of feature selectivity. Here, we show with high-resolution magnetoencephalography in human observers (men and women) that sharpened selectivity for a particular color arises from feedback processing in the human visual cortex hierarchy. To assess color selectivity, we analyze the response to a color probe that varies in color distance from an attended color target. We find that attention causes an initial gain enhancement in anterior ventral extrastriate cortex that is coarsely selective for the target color and transitions within ∼100 ms into a sharper tuned profile in more posterior ventral occipital cortex. We conclude that attention sharpens selectivity over time by attenuating the response at lower levels of the cortical hierarchy to color values neighboring the target in color space. These observations support computational models proposing that attention tunes feature selectivity in visual cortex through backward-propagating attenuation of units less tuned to the target. SIGNIFICANCE STATEMENT Whether searching for your car, a particular item of clothing, or just obeying traffic lights, in everyday life, we must select items based on color. But how does attention allow us to select a specific color? Here, we use high spatiotemporal resolution neuromagnetic recordings to examine how color selectivity emerges in the human brain. We find that color selectivity evolves as a coarse to fine process from higher to lower levels within the visual cortex hierarchy. Our observations support computational models proposing that feature selectivity increases over time by attenuating the responses of less-selective cells in lower-level brain areas. These data emphasize that color perception involves multiple areas across a hierarchy of regions, interacting with each other in a complex, recursive manner. Copyright © 2017 the authors 0270-6474/17/3710346-12$15.00/0.
Automated texture-based identification of ovarian cancer in confocal microendoscope images
NASA Astrophysics Data System (ADS)
Srivastava, Saurabh; Rodriguez, Jeffrey J.; Rouse, Andrew R.; Brewer, Molly A.; Gmitro, Arthur F.
2005-03-01
The fluorescence confocal microendoscope provides high-resolution, in-vivo imaging of cellular pathology during optical biopsy. There are indications that the examination of human ovaries with this instrument has diagnostic implications for the early detection of ovarian cancer. The purpose of this study was to develop a computer-aided system to facilitate the identification of ovarian cancer from digital images captured with the confocal microendoscope system. To achieve this goal, we modeled the cellular-level structure present in these images as texture and extracted features based on first-order statistics, spatial gray-level dependence matrices, and spatial-frequency content. Selection of the best features for classification was performed using traditional feature selection techniques including stepwise discriminant analysis, forward sequential search, a non-parametric method, principal component analysis, and a heuristic technique that combines the results of these methods. The best set of features selected was used for classification, and performance of various machine classifiers was compared by analyzing the areas under their receiver operating characteristic curves. The results show that it is possible to automatically identify patients with ovarian cancer based on texture features extracted from confocal microendoscope images and that the machine performance is superior to that of the human observer.
Evolutionary Algorithm Based Feature Optimization for Multi-Channel EEG Classification.
Wang, Yubo; Veluvolu, Kalyana C
2017-01-01
The most BCI systems that rely on EEG signals employ Fourier based methods for time-frequency decomposition for feature extraction. The band-limited multiple Fourier linear combiner is well-suited for such band-limited signals due to its real-time applicability. Despite the improved performance of these techniques in two channel settings, its application in multiple-channel EEG is not straightforward and challenging. As more channels are available, a spatial filter will be required to eliminate the noise and preserve the required useful information. Moreover, multiple-channel EEG also adds the high dimensionality to the frequency feature space. Feature selection will be required to stabilize the performance of the classifier. In this paper, we develop a new method based on Evolutionary Algorithm (EA) to solve these two problems simultaneously. The real-valued EA encodes both the spatial filter estimates and the feature selection into its solution and optimizes it with respect to the classification error. Three Fourier based designs are tested in this paper. Our results show that the combination of Fourier based method with covariance matrix adaptation evolution strategy (CMA-ES) has the best overall performance.
Urschler, Martin; Grassegger, Sabine; Štern, Darko
2015-01-01
Age estimation of individuals is important in human biology and has various medical and forensic applications. Recent interest in MR-based methods aims to investigate alternatives for established methods involving ionising radiation. Automatic, software-based methods additionally promise improved estimation objectivity. To investigate how informative automatically selected image features are regarding their ability to discriminate age, by exploring a recently proposed software-based age estimation method for MR images of the left hand and wrist. One hundred and two MR datasets of left hand images are used to evaluate age estimation performance, consisting of bone and epiphyseal gap volume localisation, computation of one age regression model per bone mapping image features to age and fusion of individual bone age predictions to a final age estimate. Quantitative results of the software-based method show an age estimation performance with a mean absolute difference of 0.85 years (SD = 0.58 years) to chronological age, as determined by a cross-validation experiment. Qualitatively, it is demonstrated how feature selection works and which image features of skeletal maturation are automatically chosen to model the non-linear regression function. Feasibility of automatic age estimation based on MRI data is shown and selected image features are found to be informative for describing anatomical changes during physical maturation in male adolescents.
Random forest feature selection approach for image segmentation
NASA Astrophysics Data System (ADS)
Lefkovits, László; Lefkovits, Szidónia; Emerich, Simina; Vaida, Mircea Florin
2017-03-01
In the field of image segmentation, discriminative models have shown promising performance. Generally, every such model begins with the extraction of numerous features from annotated images. Most authors create their discriminative model by using many features without using any selection criteria. A more reliable model can be built by using a framework that selects the important variables, from the point of view of the classification, and eliminates the unimportant once. In this article we present a framework for feature selection and data dimensionality reduction. The methodology is built around the random forest (RF) algorithm and its variable importance evaluation. In order to deal with datasets so large as to be practically unmanageable, we propose an algorithm based on RF that reduces the dimension of the database by eliminating irrelevant features. Furthermore, this framework is applied to optimize our discriminative model for brain tumor segmentation.
Wong, Gerard; Leckie, Christopher; Kowalczyk, Adam
2012-01-15
Feature selection is a key concept in machine learning for microarray datasets, where features represented by probesets are typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as in datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions in regards to patient treatment options. We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior predictive classification performance over that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
Object-based target templates guide attention during visual search.
Berggren, Nick; Eimer, Martin
2018-05-03
During visual search, attention is believed to be controlled in a strictly feature-based fashion, without any guidance by object-based target representations. To challenge this received view, we measured electrophysiological markers of attentional selection (N2pc component) and working memory (sustained posterior contralateral negativity; SPCN) in search tasks where two possible targets were defined by feature conjunctions (e.g., blue circles and green squares). Critically, some search displays also contained nontargets with two target features (incorrect conjunction objects, e.g., blue squares). Because feature-based guidance cannot distinguish these objects from targets, any selective bias for targets will reflect object-based attentional control. In Experiment 1, where search displays always contained only one object with target-matching features, targets and incorrect conjunction objects elicited identical N2pc and SPCN components, demonstrating that attentional guidance was entirely feature-based. In Experiment 2, where targets and incorrect conjunction objects could appear in the same display, clear evidence for object-based attentional control was found. The target N2pc became larger than the N2pc to incorrect conjunction objects from 250 ms poststimulus, and only targets elicited SPCN components. This demonstrates that after an initial feature-based guidance phase, object-based templates are activated when they are required to distinguish target and nontarget objects. These templates modulate visual processing and control access to working memory, and their activation may coincide with the start of feature integration processes. Results also suggest that while multiple feature templates can be activated concurrently, only a single object-based target template can guide attention at any given time. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
NASA Astrophysics Data System (ADS)
Nemoto, Mitsutaka; Hayashi, Naoto; Hanaoka, Shouhei; Nomura, Yukihiro; Miki, Soichiro; Yoshikawa, Takeharu; Ohtomo, Kuni
2016-03-01
The purpose of this study is to evaluate the feasibility of a novel feature generation, which is based on multiple deep neural networks (DNNs) with boosting, for computer-assisted detection (CADe). It is hard and time-consuming to optimize the hyperparameters for DNNs such as stacked denoising autoencoder (SdA). The proposed method allows using SdA based features without the burden of the hyperparameter setting. The proposed method was evaluated by an application for detecting cerebral aneurysms on magnetic resonance angiogram (MRA). A baseline CADe process included four components; scaling, candidate area limitation, candidate detection, and candidate classification. Proposed feature generation method was applied to extract the optimal features for candidate classification. Proposed method only required setting range of the hyperparameters for SdA. The optimal feature set was selected from a large quantity of SdA based features by multiple SdAs, each of which was trained using different hyperparameter set. The feature selection was operated through ada-boost ensemble learning method. Training of the baseline CADe process and proposed feature generation were operated with 200 MRA cases, and the evaluation was performed with 100 MRA cases. Proposed method successfully provided SdA based features just setting the range of some hyperparameters for SdA. The CADe process by using both previous voxel features and SdA based features had the best performance with 0.838 of an area under ROC curve and 0.312 of ANODE score. The results showed that proposed method was effective in the application for detecting cerebral aneurysms on MRA.
Mougiakakou, Stavroula G; Valavanis, Ioannis K; Nikita, Alexandra; Nikita, Konstantina S
2007-09-01
The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed. Number of regions of interests (ROIs) corresponding to C1-C4 have been defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first order statistics, spatial gray level dependence matrix, gray level difference method, Laws' texture energy measures, and fractal dimension measurements. Two different ECs were constructed and compared. The first one consists of five multilayer perceptron neural networks (NNs), each using as input one of the computed texture feature sets or its reduced version after genetic algorithm-based feature selection. The second EC comprised five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN, and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was extracted by using appropriate voting schemes, while bootstrap re-sampling was utilized in order to estimate the generalization ability of the CAD architectures based on the available relatively small-sized data set. The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set, and the weighted voting scheme. The fused feature set was obtained after appropriate feature selection applied to specific subsets of the original feature set. The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, fed with identical feature sets obtained after appropriate feature selection and fusion, may result in an accurate system able to assist differential diagnosis of focal liver lesions from non-enhanced CT images.
Verleger, Rolf; Groen, Margriet; Heide, Wolfgang; Sobieralska, Kinga; Jaśkowski, Piotr
2008-05-01
We studied how physical and instructed embedding of features in gestalts affects perceptual selection. Four ovals on the horizontal midline were either unconnected or pairwise connected by circles, forming ears of left and right heads (gestalts). Relevant to responding was the position of one colored oval, either within its pair or relative to fixation ("object-based" or "fixation-based" instruction). Responses were faster under fixation- than object-based instruction, less so with gestalts. Previously reported increases of N1 when evoked by features within objects were replicated for fixation-based instruction only. There was no effect of instruction on N2pc. However P1 increased under the adequate instruction, object-based for gestalts, fixation-based for unconnected items, which presumably indicated how foci of attention were set by expecting specific stimuli under instructions that specified how to bind these stimuli to objects.
Geospatial Analytics in Retail Site Selection and Sales Prediction.
Ting, Choo-Yee; Ho, Chiung Ching; Yee, Hui Jia; Matsah, Wan Razali
2018-03-01
Studies have shown that certain features from geography, demography, trade area, and environment can play a vital role in retail site selection, largely due to the impact they asserted on retail performance. Although the relevant features could be elicited by domain experts, determining the optimal feature set can be intractable and labor-intensive exercise. The challenges center around (1) how to determine features that are important to a particular retail business and (2) how to estimate retail sales performance given a new location? The challenges become apparent when the features vary across time. In this light, this study proposed a nonintervening approach by employing feature selection algorithms and subsequently sales prediction through similarity-based methods. The results of prediction were validated by domain experts. In this study, data sets from different sources were transformed and aggregated before an analytics data set that is ready for analysis purpose could be obtained. The data sets included data about feature location, population count, property type, education status, and monthly sales from 96 branches of a telecommunication company in Malaysia. The finding suggested that (1) optimal retail performance can only be achieved through fulfillment of specific location features together with the surrounding trade area characteristics and (2) similarity-based method can provide solution to retail sales prediction.
Tan, Maxine; Aghaei, Faranak; Wang, Yunzhi; Zheng, Bin
2017-01-01
The purpose of this study is to evaluate a new method to improve performance of computer-aided detection (CAD) schemes of screening mammograms with two approaches. In the first approach, we developed a new case based CAD scheme using a set of optimally selected global mammographic density, texture, spiculation, and structural similarity features computed from all four full-field digital mammography (FFDM) images of the craniocaudal (CC) and mediolateral oblique (MLO) views by using a modified fast and accurate sequential floating forward selection feature selection algorithm. Selected features were then applied to a “scoring fusion” artificial neural network (ANN) classification scheme to produce a final case based risk score. In the second approach, we combined the case based risk score with the conventional lesion based scores of a conventional lesion based CAD scheme using a new adaptive cueing method that is integrated with the case based risk scores. We evaluated our methods using a ten-fold cross-validation scheme on 924 cases (476 cancer and 448 recalled or negative), whereby each case had all four images from the CC and MLO views. The area under the receiver operating characteristic curve was AUC = 0.793±0.015 and the odds ratio monotonically increased from 1 to 37.21 as CAD-generated case based detection scores increased. Using the new adaptive cueing method, the region based and case based sensitivities of the conventional CAD scheme at a false positive rate of 0.71 per image increased by 2.4% and 0.8%, respectively. The study demonstrated that supplementary information can be derived by computing global mammographic density image features to improve CAD-cueing performance on the suspicious mammographic lesions. PMID:27997380
A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.
Baur, Brittany; Bozdag, Serdar
2016-01-01
DNA methylation is an important epigenetic event that effects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
Feature weight estimation for gene selection: a local hyperlinear learning approach
2014-01-01
Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments, nevertheless,it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noises. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability using to various classification algorithms. PMID:24625071
Spatial and Feature-Based Attention in a Layered Cortical Microcircuit Model
Wagatsuma, Nobuhiko; Potjans, Tobias C.; Diesmann, Markus; Sakai, Ko; Fukai, Tomoki
2013-01-01
Directing attention to the spatial location or the distinguishing feature of a visual object modulates neuronal responses in the visual cortex and the stimulus discriminability of subjects. However, the spatial and feature-based modes of attention differently influence visual processing by changing the tuning properties of neurons. Intriguingly, neurons' tuning curves are modulated similarly across different visual areas under both these modes of attention. Here, we explored the mechanism underlying the effects of these two modes of visual attention on the orientation selectivity of visual cortical neurons. To do this, we developed a layered microcircuit model. This model describes multiple orientation-specific microcircuits sharing their receptive fields and consisting of layers 2/3, 4, 5, and 6. These microcircuits represent a functional grouping of cortical neurons and mutually interact via lateral inhibition and excitatory connections between groups with similar selectivity. The individual microcircuits receive bottom-up visual stimuli and top-down attention in different layers. A crucial assumption of the model is that feature-based attention activates orientation-specific microcircuits for the relevant feature selectively, whereas spatial attention activates all microcircuits homogeneously, irrespective of their orientation selectivity. Consequently, our model simultaneously accounts for the multiplicative scaling of neuronal responses in spatial attention and the additive modulations of orientation tuning curves in feature-based attention, which have been observed widely in various visual cortical areas. Simulations of the model predict contrasting differences between excitatory and inhibitory neurons in the two modes of attentional modulations. Furthermore, the model replicates the modulation of the psychophysical discriminability of visual stimuli in the presence of external noise. Our layered model with a biologically suggested laminar structure describes the basic circuit mechanism underlying the attention-mode specific modulations of neuronal responses and visual perception. PMID:24324628
Hollingworth, Andrew; Hwang, Seongmin
2013-10-19
We examined the conditions under which a feature value in visual working memory (VWM) recruits visual attention to matching stimuli. Previous work has suggested that VWM supports two qualitatively different states of representation: an active state that interacts with perceptual selection and a passive (or accessory) state that does not. An alternative hypothesis is that VWM supports a single form of representation, with the precision of feature memory controlling whether or not the representation interacts with perceptual selection. The results of three experiments supported the dual-state hypothesis. We established conditions under which participants retained a relatively precise representation of a parcticular colour. If the colour was immediately task relevant, it reliably recruited attention to matching stimuli. However, if the colour was not immediately task relevant, it failed to interact with perceptual selection. Feature maintenance in VWM is not necessarily equivalent with feature-based attentional selection.
Feature Extraction and Selection Strategies for Automated Target Recognition
NASA Technical Reports Server (NTRS)
Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin
2010-01-01
Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPLs Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aide in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine(SVM) and a neural net (NN) classifier.
Feature extraction and selection strategies for automated target recognition
NASA Astrophysics Data System (ADS)
Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin
2010-04-01
Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPLs Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory regionof- interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aide in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine(SVM) and a neural net (NN) classifier.
Oculomotor selection underlies feature retention in visual working memory.
Hanning, Nina M; Jonikaitis, Donatas; Deubel, Heiner; Szinte, Martin
2016-02-01
Oculomotor selection, spatial task relevance, and visual working memory (WM) are described as three processes highly intertwined and sustained by similar cortical structures. However, because task-relevant locations always constitute potential saccade targets, no study so far has been able to distinguish between oculomotor selection and spatial task relevance. We designed an experiment that allowed us to dissociate in humans the contribution of task relevance, oculomotor selection, and oculomotor execution to the retention of feature representations in WM. We report that task relevance and oculomotor selection lead to dissociable effects on feature WM maintenance. In a first task, in which an object's location was encoded as a saccade target, its feature representations were successfully maintained in WM, whereas they declined at nonsaccade target locations. Likewise, we observed a similar WM benefit at the target of saccades that were prepared but never executed. In a second task, when an object's location was marked as task relevant but constituted a nonsaccade target (a location to avoid), feature representations maintained at that location did not benefit. Combined, our results demonstrate that oculomotor selection is consistently associated with WM, whereas task relevance is not. This provides evidence for an overlapping circuitry serving saccade target selection and feature-based WM that can be dissociated from processes encoding task-relevant locations. Copyright © 2016 the American Physiological Society.
JCDSA: a joint covariate detection tool for survival analysis on tumor expression profiles.
Wu, Yiming; Liu, Yanan; Wang, Yueming; Shi, Yan; Zhao, Xudong
2018-05-29
Survival analysis on tumor expression profiles has always been a key issue for subsequent biological experimental validation. It is crucial how to select features which closely correspond to survival time. Furthermore, it is important how to select features which best discriminate between low-risk and high-risk group of patients. Common features derived from the two aspects may provide variable candidates for prognosis of cancer. Based on the provided two-step feature selection strategy, we develop a joint covariate detection tool for survival analysis on tumor expression profiles. Significant features, which are not only consistent with survival time but also associated with the categories of patients with different survival risks, are chosen. Using the miRNA expression data (Level 3) of 548 patients with glioblastoma multiforme (GBM) as an example, miRNA candidates for prognosis of cancer are selected. The reliability of selected miRNAs using this tool is demonstrated by 100 simulations. Furthermore, It is discovered that significant covariates are not directly composed of individually significant variables. Joint covariate detection provides a viewpoint for selecting variables which are not individually but jointly significant. Besides, it helps to select features which are not only consistent with survival time but also associated with prognosis risk. The software is available at http://bio-nefu.com/resource/jcdsa .
Research on intrusion detection based on Kohonen network and support vector machine
NASA Astrophysics Data System (ADS)
Shuai, Chunyan; Yang, Hengcheng; Gong, Zeweiyi
2018-05-01
In view of the problem of low detection accuracy and the long detection time of support vector machine, which directly applied to the network intrusion detection system. Optimization of SVM parameters can greatly improve the detection accuracy, but it can not be applied to high-speed network because of the long detection time. a method based on Kohonen neural network feature selection is proposed to reduce the optimization time of support vector machine parameters. Firstly, this paper is to calculate the weights of the KDD99 network intrusion data by Kohonen network and select feature by weight. Then, after the feature selection is completed, genetic algorithm (GA) and grid search method are used for parameter optimization to find the appropriate parameters and classify them by support vector machines. By comparing experiments, it is concluded that feature selection can reduce the time of parameter optimization, which has little influence on the accuracy of classification. The experiments suggest that the support vector machine can be used in the network intrusion detection system and reduce the missing rate.
Welikala, R A; Fraz, M M; Dehmeshki, J; Hoppe, A; Tah, V; Mann, S; Williamson, T H; Barman, S A
2015-07-01
Proliferative diabetic retinopathy (PDR) is a condition that carries a high risk of severe visual impairment. The hallmark of PDR is the growth of abnormal new vessels. In this paper, an automated method for the detection of new vessels from retinal images is presented. This method is based on a dual classification approach. Two vessel segmentation approaches are applied to create two separate binary vessel map which each hold vital information. Local morphology features are measured from each binary vessel map to produce two separate 4-D feature vectors. Independent classification is performed for each feature vector using a support vector machine (SVM) classifier. The system then combines these individual outcomes to produce a final decision. This is followed by the creation of additional features to generate 21-D feature vectors, which feed into a genetic algorithm based feature selection approach with the objective of finding feature subsets that improve the performance of the classification. Sensitivity and specificity results using a dataset of 60 images are 0.9138 and 0.9600, respectively, on a per patch basis and 1.000 and 0.975, respectively, on a per image basis. Copyright © 2015 Elsevier Ltd. All rights reserved.
Multiresolution texture models for brain tumor segmentation in MRI.
Iftekharuddin, Khan M; Ahmed, Shaheen; Hossen, Jakir
2011-01-01
In this study we discuss different types of texture features such as Fractal Dimension (FD) and Multifractional Brownian Motion (mBm) for estimating random structures and varying appearance of brain tissues and tumors in magnetic resonance images (MRI). We use different selection techniques including KullBack - Leibler Divergence (KLD) for ranking different texture and intensity features. We then exploit graph cut, self organizing maps (SOM) and expectation maximization (EM) techniques to fuse selected features for brain tumors segmentation in multimodality T1, T2, and FLAIR MRI. We use different similarity metrics to evaluate quality and robustness of these selected features for tumor segmentation in MRI for real pediatric patients. We also demonstrate a non-patient-specific automated tumor prediction scheme by using improved AdaBoost classification based on these image features.
Texture-based approach to palmprint retrieval for personal identification
NASA Astrophysics Data System (ADS)
Li, Wenxin; Zhang, David; Xu, Z.; You, J.
2000-12-01
This paper presents a new approach to palmprint retrieval for personal identification. Three key issues in image retrieval are considered - feature selection, similarity measures and dynamic search for the best matching of the sample in the image database. We propose a texture-based method for palmprint feature representation. The concept of texture energy is introduced to define a palm print's global and local features, which are characterized with high convergence of inner-palm similarities and good dispersion of inter-palm discrimination. The search is carried out in a layered fashion: first global features are used to guide the fast selection of a small set of similar candidates from the database from the database and then local features are used to decide the final output within the candidate set. The experimental results demonstrate the effectiveness and accuracy of the proposed method.
Texture-based approach to palmprint retrieval for personal identification
NASA Astrophysics Data System (ADS)
Li, Wenxin; Zhang, David; Xu, Z.; You, J.
2001-01-01
This paper presents a new approach to palmprint retrieval for personal identification. Three key issues in image retrieval are considered - feature selection, similarity measures and dynamic search for the best matching of the sample in the image database. We propose a texture-based method for palmprint feature representation. The concept of texture energy is introduced to define a palm print's global and local features, which are characterized with high convergence of inner-palm similarities and good dispersion of inter-palm discrimination. The search is carried out in a layered fashion: first global features are used to guide the fast selection of a small set of similar candidates from the database from the database and then local features are used to decide the final output within the candidate set. The experimental results demonstrate the effectiveness and accuracy of the proposed method.
A practical guide to assessing clinical decision-making skills using the key features approach.
Farmer, Elizabeth A; Page, Gordon
2005-12-01
This paper in the series on professional assessment provides a practical guide to writing key features problems (KFPs). Key features problems test clinical decision-making skills in written or computer-based formats. They are based on the concept of critical steps or 'key features' in decision making and represent an advance on the older, less reliable patient management problem (PMP) formats. The practical steps in writing these problems are discussed and illustrated by examples. Steps include assembling problem-writing groups, selecting a suitable clinical scenario or problem and defining its key features, writing the questions, selecting question response formats, preparing scoring keys, reviewing item quality and item banking. The KFP format provides educators with a flexible approach to testing clinical decision-making skills with demonstrated validity and reliability when constructed according to the guidelines provided.
A new approach to develop computer-aided detection schemes of digital mammograms
NASA Astrophysics Data System (ADS)
Tan, Maxine; Qian, Wei; Pu, Jiantao; Liu, Hong; Zheng, Bin
2015-06-01
The purpose of this study is to develop a new global mammographic image feature analysis based computer-aided detection (CAD) scheme and evaluate its performance in detecting positive screening mammography examinations. A dataset that includes images acquired from 1896 full-field digital mammography (FFDM) screening examinations was used in this study. Among them, 812 cases were positive for cancer and 1084 were negative or benign. After segmenting the breast area, a computerized scheme was applied to compute 92 global mammographic tissue density based features on each of four mammograms of the craniocaudal (CC) and mediolateral oblique (MLO) views. After adding three existing popular risk factors (woman’s age, subjectively rated mammographic density, and family breast cancer history) into the initial feature pool, we applied a sequential forward floating selection feature selection algorithm to select relevant features from the bilateral CC and MLO view images separately. The selected CC and MLO view image features were used to train two artificial neural networks (ANNs). The results were then fused by a third ANN to build a two-stage classifier to predict the likelihood of the FFDM screening examination being positive. CAD performance was tested using a ten-fold cross-validation method. The computed area under the receiver operating characteristic curve was AUC = 0.779 ± 0.025 and the odds ratio monotonically increased from 1 to 31.55 as CAD-generated detection scores increased. The study demonstrated that this new global image feature based CAD scheme had a relatively higher discriminatory power to cue the FFDM examinations with high risk of being positive, which may provide a new CAD-cueing method to assist radiologists in reading and interpreting screening mammograms.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali
2017-01-01
Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to internal classification of disease version 10 (ICD-10) classification system] through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali
2017-01-01
Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to internal classification of disease version 10 (ICD-10) classification system] through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports. PMID:28166263
Effective Feature Selection for Classification of Promoter Sequences.
K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish
2016-01-01
Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
Support-vector-based emergent self-organising approach for emotional understanding
NASA Astrophysics Data System (ADS)
Nguwi, Yok-Yen; Cho, Siu-Yeung
2010-12-01
This study discusses the computational analysis of general emotion understanding from questionnaires methodology. The questionnaires method approaches the subject by investigating the real experience that accompanied the emotions, whereas the other laboratory approaches are generally associated with exaggerated elements. We adopted a connectionist model called support-vector-based emergent self-organising map (SVESOM) to analyse the emotion profiling from the questionnaires method. The SVESOM first identifies the important variables by giving discriminative features with high ranking. The classifier then performs the classification based on the selected features. Experimental results show that the top rank features are in line with the work of Scherer and Wallbott [(1994), 'Evidence for Universality and Cultural Variation of Differential Emotion Response Patterning', Journal of Personality and Social Psychology, 66, 310-328], which approached the emotions physiologically. While the performance measures show that using the full features for classifications can degrade the performance, the selected features provide superior results in terms of accuracy and generalisation.
Feature-based attentional modulation increases with stimulus separation in divided-attention tasks.
Sally, Sharon L; Vidnyánsky, Zoltán; Papathomas, Thomas V
2009-01-01
Attention modifies our visual experience by selecting certain aspects of a scene for further processing. It is therefore important to understand factors that govern the deployment of selective attention over the visual field. Both location and feature-specific mechanisms of attention have been identified and their modulatory effects can interact at a neural level (Treue and Martinez-Trujillo, 1999). The effects of spatial parameters on feature-based attentional modulation were examined for the feature dimensions of orientation, motion and color using three divided-attention tasks. Subjects performed concurrent discriminations of two briefly presented targets (Gabor patches) to the left and right of a central fixation point at eccentricities of +/-2.5 degrees , 5 degrees , 10 degrees and 15 degrees in the horizontal plane. Gabors were size-scaled to maintain consistent single-task performance across eccentricities. For all feature dimensions, the data show a linear increase in the attentional effects with target separation. In a control experiment, Gabors were presented on an isoeccentric viewing arc at 10 degrees and 15 degrees at the closest spatial separation (+/-2.5 degrees ) of the main experiment. Under these conditions, the effects of feature-based attentional effects were largely eliminated. Our results are consistent with the hypothesis that feature-based attention prioritizes the processing of attended features. Feature-based attentional mechanisms may have helped direct the attentional focus to the appropriate target locations at greater separations, whereas similar assistance may not have been necessary at closer target spacings. The results of the present study specify conditions under which dual-task performance benefits from sharing similar target features and may therefore help elucidate the processes by which feature-based attention operates.
Higher criticism thresholding: Optimal feature selection when useful features are rare and weak.
Donoho, David; Jin, Jiashun
2008-09-30
In important application fields today-genomics and proteomics are examples-selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, ..., p, let pi(i) denote the two-sided P-value associated with the ith feature Z-score and pi((i)) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p - pi((i)))/sqrt{i/p(1-i/p)}. We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.
Higher criticism thresholding: Optimal feature selection when useful features are rare and weak
Donoho, David; Jin, Jiashun
2008-01-01
In important application fields today—genomics and proteomics are examples—selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, …, p, let πi denote the two-sided P-value associated with the ith feature Z-score and π(i) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p − π(i))/i/p(1−i/p). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT. PMID:18815365
Adam, Asrul; Shapiai, Mohd Ibrahim; Tumari, Mohd Zaidi Mohd; Mohamad, Mohd Saberi; Mubin, Marizan
2014-01-01
Electroencephalogram (EEG) signal peak detection is widely used in clinical applications. The peak point can be detected using several approaches, including time, frequency, time-frequency, and nonlinear domains depending on various peak features from several models. However, there is no study that provides the importance of every peak feature in contributing to a good and generalized model. In this study, feature selection and classifier parameters estimation based on particle swarm optimization (PSO) are proposed as a framework for peak detection on EEG signals in time domain analysis. Two versions of PSO are used in the study: (1) standard PSO and (2) random asynchronous particle swarm optimization (RA-PSO). The proposed framework tries to find the best combination of all the available features that offers good peak detection and a high classification rate from the results in the conducted experiments. The evaluation results indicate that the accuracy of the peak detection can be improved up to 99.90% and 98.59% for training and testing, respectively, as compared to the framework without feature selection adaptation. Additionally, the proposed framework based on RA-PSO offers a better and reliable classification rate as compared to standard PSO as it produces low variance model.
Neuromuscular disease classification system
NASA Astrophysics Data System (ADS)
Sáez, Aurora; Acha, Begoña; Montero-Sánchez, Adoración; Rivas, Eloy; Escudero, Luis M.; Serrano, Carmen
2013-06-01
Diagnosis of neuromuscular diseases is based on subjective visual assessment of biopsies from patients by the pathologist specialist. A system for objective analysis and classification of muscular dystrophies and neurogenic atrophies through muscle biopsy images of fluorescence microscopy is presented. The procedure starts with an accurate segmentation of the muscle fibers using mathematical morphology and a watershed transform. A feature extraction step is carried out in two parts: 24 features that pathologists take into account to diagnose the diseases and 58 structural features that the human eye cannot see, based on the assumption that the biopsy is considered as a graph, where the nodes are represented by each fiber, and two nodes are connected if two fibers are adjacent. A feature selection using sequential forward selection and sequential backward selection methods, a classification using a Fuzzy ARTMAP neural network, and a study of grading the severity are performed on these two sets of features. A database consisting of 91 images was used: 71 images for the training step and 20 as the test. A classification error of 0% was obtained. It is concluded that the addition of features undetectable by the human visual inspection improves the categorization of atrophic patterns.
A model of proto-object based saliency
Russell, Alexander F.; Mihalaş, Stefan; von der Heydt, Rudiger; Niebur, Ernst; Etienne-Cummings, Ralph
2013-01-01
Organisms use the process of selective attention to optimally allocate their computational resources to the instantaneously most relevant subsets of a visual scene, ensuring that they can parse the scene in real time. Many models of bottom-up attentional selection assume that elementary image features, like intensity, color and orientation, attract attention. Gestalt psychologists, how-ever, argue that humans perceive whole objects before they analyze individual features. This is supported by recent psychophysical studies that show that objects predict eye-fixations better than features. In this report we present a neurally inspired algorithm of object based, bottom-up attention. The model rivals the performance of state of the art non-biologically plausible feature based algorithms (and outperforms biologically plausible feature based algorithms) in its ability to predict perceptual saliency (eye fixations and subjective interest points) in natural scenes. The model achieves this by computing saliency as a function of proto-objects that establish the perceptual organization of the scene. All computational mechanisms of the algorithm have direct neural correlates, and our results provide evidence for the interface theory of attention. PMID:24184601
Image feature extraction based on the camouflage effectiveness evaluation
NASA Astrophysics Data System (ADS)
Yuan, Xin; Lv, Xuliang; Li, Ling; Wang, Xinzhu; Zhang, Zhi
2018-04-01
The key step of camouflage effectiveness evaluation is how to combine the human visual physiological features, psychological features to select effectively evaluation indexes. Based on the predecessors' camo comprehensive evaluation method, this paper chooses the suitable indexes combining with the image quality awareness, and optimizes those indexes combining with human subjective perception. Thus, it perfects the theory of index extraction.
NASA Astrophysics Data System (ADS)
Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa
Fatigued monetary bills adversely affect the daily operation of automated teller machines (ATMs). In order to make the classification of fatigued bills more efficient, the development of an automatic fatigued monetary bill classification method is desirable. We propose a new method by which to estimate the fatigue level of monetary bills from the feature-selected frequency band acoustic energy pattern of banking machines. By using a supervised self-organizing map (SOM), we effectively estimate the fatigue level using only the feature-selected frequency band acoustic energy pattern. Furthermore, the feature-selected frequency band acoustic energy pattern improves the estimation accuracy of the fatigue level of monetary bills by adding frequency domain information to the acoustic energy pattern. The experimental results with real monetary bill samples reveal the effectiveness of the proposed method.
NASA Astrophysics Data System (ADS)
Wang, Xiao; Burghardt, Dirk
2018-05-01
This paper presents a new strategy for the generalization of discrete area features by using stroke grouping method and polarization transportation selection. The mentioned stroke is constructed on derive of the refined proximity graph of area features, and the refinement is under the control of four constraints to meet different grouping requirements. The area features which belong to the same stroke are detected into the same group. The stroke-based strategy decomposes the generalization process into two sub-processes by judging whether the area features related to strokes or not. For the area features which belong to the same one stroke, they normally present a linear like pat-tern, and in order to preserve this kind of pattern, typification is chosen as the operator to implement the generalization work. For the remaining area features which are not related by strokes, they are still distributed randomly and discretely, and the selection is chosen to conduct the generalization operation. For the purpose of retaining their original distribution characteristic, a Polarization Transportation (PT) method is introduced to implement the selection operation. Buildings and lakes are selected as the representatives of artificial area feature and natural area feature respectively to take the experiments. The generalized results indicate that by adopting this proposed strategy, the original distribution characteristics of building and lake data can be preserved, and the visual perception is pre-served as before.
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)
2001-01-01
Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Probenl/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
Determinants of Global Color-Based Selection in Human Visual Cortex.
Bartsch, Mandy V; Boehler, Carsten N; Stoppel, Christian M; Merkel, Christian; Heinze, Hans-Jochen; Schoenfeld, Mircea A; Hopf, Jens-Max
2015-09-01
Feature attention operates in a spatially global way, with attended feature values being prioritized for selection outside the focus of attention. Accounts of global feature attention have emphasized feature competition as a determining factor. Here, we use magnetoencephalographic recordings in humans to test whether competition is critical for global feature selection to arise. Subjects performed a color/shape discrimination task in one visual field (VF), while irrelevant color probes were presented in the other unattended VF. Global effects of color attention were assessed by analyzing the response to the probe as a function of whether or not the probe's color was a target-defining color. We find that global color selection involves a sequence of modulations in extrastriate cortex, with an initial phase in higher tier areas (lateral occipital complex) followed by a later phase in lower tier retinotopic areas (V3/V4). Importantly, these modulations appeared with and without color competition in the focus of attention. Moreover, early parts of the modulation emerged for a task-relevant color not even present in the focus of attention. All modulations, however, were eliminated during simple onset-detection of the colored target. These results indicate that global color-based attention depends on target discrimination independent of feature competition in the focus of attention. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei
2006-06-23
Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.
Detrended fluctuation analysis for major depressive disorder.
Mumtaz, Wajid; Malik, Aamir Saeed; Ali, Syed Saad Azhar; Yasin, Mohd Azhar Mohd; Amin, Hafeezullah
2015-01-01
Clinical utility of Electroencephalography (EEG) based diagnostic studies is less clear for major depressive disorder (MDD). In this paper, a novel machine learning (ML) scheme was presented to discriminate the MDD patients and healthy controls. The proposed method inherently involved feature extraction, selection, classification and validation. The EEG data acquisition involved eyes closed (EC) and eyes open (EO) conditions. At feature extraction stage, the de-trended fluctuation analysis (DFA) was performed, based on the EEG data, to achieve scaling exponents. The DFA was performed to analyzes the presence or absence of long-range temporal correlations (LRTC) in the recorded EEG data. The scaling exponents were used as input features to our proposed system. At feature selection stage, 3 different techniques were used for comparison purposes. Logistic regression (LR) classifier was employed. The method was validated by a 10-fold cross-validation. As results, we have observed that the effect of 3 different reference montages on the computed features. The proposed method employed 3 different types of feature selection techniques for comparison purposes as well. The results show that the DFA analysis performed better in LE data compared with the IR and AR data. In addition, during Wilcoxon ranking, the AR performed better than LE and IR. Based on the results, it was concluded that the DFA provided useful information to discriminate the MDD patients and with further validation can be employed in clinics for diagnosis of MDD.
MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.
Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra
2011-01-01
Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity due to lack of the gold standard of negative examples, miRNA-targeting site context specific relevant features and efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to distribute the true positive predictions preferentially at the top of the ranked list; hence the algorithms become unreliable to the biologists. In addition, these algorithms fail to obtain considerable combination of precision and recall for the target transcripts that are translationally repressed at protein level. In the proposed article, we introduce an efficient miRNA-target prediction system MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high quality negative examples and selection of biologically relevant miRNA-targeting site context specific features. The features are selected by using a novel feature selection technique AMOSA-SVM, that integrates the multi objective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) and SVM. MultiMiTar is found to achieve much higher Matthew's correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 compared to the others target prediction methods for a completely independent test data set. The obtained MCC and ACA values of these algorithms range from -0.269 to 0.155 and 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set as compared to all the other existing methods. An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list that makes MultiMiTar reliable for the biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
NASA Astrophysics Data System (ADS)
Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien
2012-02-01
It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD) and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and predict conversion of mild cognitive impairment patients to AD. However, rates of atrophy are not uniform across the hippocampus making shape analysis a potentially more accurate biomarker. This study studies the hippocampi from 226 healthy controls, 148 AD patients and 330 MCI patients obtained from T1 weighted structural MRI images from the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as a weighted sum of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper based feature selection method was used as this considers the utility of features in discriminating classes in combination, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier that is used. The leave-one-out cross validated accuracy obtained on training data is 88.6% for classifying AD vs controls and 74% for classifying MCI-converters vs MCI-stable with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data indicating that the feature selection is sensitive to the data used, however feature ensemble methods may overcome this.
Hadoop neural network for parallel and distributed feature selection.
Hodge, Victoria J; O'Keefe, Simon; Austin, Jim
2016-06-01
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five features selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Combining Feature Selection and Integration—A Neural Model for MT Motion Selectivity
Beck, Cornelia; Neumann, Heiko
2011-01-01
Background The computation of pattern motion in visual area MT based on motion input from area V1 has been investigated in many experiments and models attempting to replicate the main mechanisms. Two different core conceptual approaches were developed to explain the findings. In integrationist models the key mechanism to achieve pattern selectivity is the nonlinear integration of V1 motion activity. In contrast, selectionist models focus on the motion computation at positions with 2D features. Methodology/Principal Findings Recent experiments revealed that neither of the two concepts alone is sufficient to explain all experimental data and that most of the existing models cannot account for the complex behaviour found. MT pattern selectivity changes over time for stimuli like type II plaids from vector average to the direction computed with an intersection of constraint rule or by feature tracking. Also, the spatial arrangement of the stimulus within the receptive field of a MT cell plays a crucial role. We propose a recurrent neural model showing how feature integration and selection can be combined into one common architecture to explain these findings. The key features of the model are the computation of 1D and 2D motion in model area V1 subpopulations that are integrated in model MT cells using feedforward and feedback processing. Our results are also in line with findings concerning the solution of the aperture problem. Conclusions/Significance We propose a new neural model for MT pattern computation and motion disambiguation that is based on a combination of feature selection and integration. The model can explain a range of recent neurophysiological findings including temporally dynamic behaviour. PMID:21814543
Soguero-Ruiz, Cristina; Hindberg, Kristian; Rojo-Alvarez, Jose Luis; Skrovseth, Stein Olav; Godtliebsen, Fred; Mortensen, Kim; Revhaug, Arthur; Lindsetmo, Rolv-Ole; Augestad, Knut Magne; Jenssen, Robert
2016-09-01
The free text in electronic health records (EHRs) conveys a huge amount of clinical information about health state and patient history. Despite a rapidly growing literature on the use of machine learning techniques for extracting this information, little effort has been invested toward feature selection and the features' corresponding medical interpretation. In this study, we focus on the task of early detection of anastomosis leakage (AL), a severe complication after elective surgery for colorectal cancer (CRC) surgery, using free text extracted from EHRs. We use a bag-of-words model to investigate the potential for feature selection strategies. The purpose is earlier detection of AL and prediction of AL with data generated in the EHR before the actual complication occur. Due to the high dimensionality of the data, we derive feature selection strategies using the robust support vector machine linear maximum margin classifier, by investigating: 1) a simple statistical criterion (leave-one-out-based test); 2) an intensive-computation statistical criterion (Bootstrap resampling); and 3) an advanced statistical criterion (kernel entropy). Results reveal a discriminatory power for early detection of complications after CRC (sensitivity 100%; specificity 72%). These results can be used to develop prediction models, based on EHR data, that can support surgeons and patients in the preoperative decision making phase.
Farhan, Saima; Fahiem, Muhammad Abuzar; Tauseef, Huma
2014-01-01
Structural brain imaging is playing a vital role in identification of changes that occur in brain associated with Alzheimer's disease. This paper proposes an automated image processing based approach for the identification of AD from MRI of the brain. The proposed approach is novel in a sense that it has higher specificity/accuracy values despite the use of smaller feature set as compared to existing approaches. Moreover, the proposed approach is capable of identifying AD patients in early stages. The dataset selected consists of 85 age and gender matched individuals from OASIS database. The features selected are volume of GM, WM, and CSF and size of hippocampus. Three different classification models (SVM, MLP, and J48) are used for identification of patients and controls. In addition, an ensemble of classifiers, based on majority voting, is adopted to overcome the error caused by an independent base classifier. Ten-fold cross validation strategy is applied for the evaluation of our scheme. Moreover, to evaluate the performance of proposed approach, individual features and combination of features are fed to individual classifiers and ensemble based classifier. Using size of left hippocampus as feature, the accuracy achieved with ensemble of classifiers is 93.75%, with 100% specificity and 87.5% sensitivity.
Boland, Mary Regina; Miotto, Riccardo; Gao, Junfeng; Weng, Chunhua
2013-01-01
Summary Background When standard therapies fail, clinical trials provide experimental treatment opportunities for patients with drug-resistant illnesses or terminal diseases. Clinical Trials can also provide free treatment and education for individuals who otherwise may not have access to such care. To find relevant clinical trials, patients often search online; however, they often encounter a significant barrier due to the large number of trials and in-effective indexing methods for reducing the trial search space. Objectives This study explores the feasibility of feature-based indexing, clustering, and search of clinical trials and informs designs to automate these processes. Methods We decomposed 80 randomly selected stage III breast cancer clinical trials into a vector of eligibility features, which were organized into a hierarchy. We clustered trials based on their eligibility feature similarities. In a simulated search process, manually selected features were used to generate specific eligibility questions to filter trials iteratively. Results We extracted 1,437 distinct eligibility features and achieved an inter-rater agreement of 0.73 for feature extraction for 37 frequent features occurring in more than 20 trials. Using all the 1,437 features we stratified the 80 trials into six clusters containing trials recruiting similar patients by patient-characteristic features, five clusters by disease-characteristic features, and two clusters by mixed features. Most of the features were mapped to one or more Unified Medical Language System (UMLS) concepts, demonstrating the utility of named entity recognition prior to mapping with the UMLS for automatic feature extraction. Conclusions It is feasible to develop feature-based indexing and clustering methods for clinical trials to identify trials with similar target populations and to improve trial search efficiency. PMID:23666475
Boland, M R; Miotto, R; Gao, J; Weng, C
2013-01-01
When standard therapies fail, clinical trials provide experimental treatment opportunities for patients with drug-resistant illnesses or terminal diseases. Clinical Trials can also provide free treatment and education for individuals who otherwise may not have access to such care. To find relevant clinical trials, patients often search online; however, they often encounter a significant barrier due to the large number of trials and in-effective indexing methods for reducing the trial search space. This study explores the feasibility of feature-based indexing, clustering, and search of clinical trials and informs designs to automate these processes. We decomposed 80 randomly selected stage III breast cancer clinical trials into a vector of eligibility features, which were organized into a hierarchy. We clustered trials based on their eligibility feature similarities. In a simulated search process, manually selected features were used to generate specific eligibility questions to filter trials iteratively. We extracted 1,437 distinct eligibility features and achieved an inter-rater agreement of 0.73 for feature extraction for 37 frequent features occurring in more than 20 trials. Using all the 1,437 features we stratified the 80 trials into six clusters containing trials recruiting similar patients by patient-characteristic features, five clusters by disease-characteristic features, and two clusters by mixed features. Most of the features were mapped to one or more Unified Medical Language System (UMLS) concepts, demonstrating the utility of named entity recognition prior to mapping with the UMLS for automatic feature extraction. It is feasible to develop feature-based indexing and clustering methods for clinical trials to identify trials with similar target populations and to improve trial search efficiency.
Classifying vulnerability to sleep deprivation using baseline measures of psychomotor vigilance.
Patanaik, Amiya; Kwoh, Chee Keong; Chua, Eric C P; Gooley, Joshua J; Chee, Michael W L
2015-05-01
To identify measures derived from baseline psychomotor vigilance task (PVT) performance that can reliably predict vulnerability to sleep deprivation. Subjects underwent total sleep deprivation and completed a 10-min PVT every 1-2 h in a controlled laboratory setting. Participants were categorized as vulnerable or resistant to sleep deprivation, based on a median split of lapses that occurred following sleep deprivation. Standard reaction time, drift diffusion model (DDM), and wavelet metrics were derived from PVT response times collected at baseline. A support vector machine model that incorporated maximum relevance and minimum redundancy feature selection and wrapper-based heuristics was used to classify subjects as vulnerable or resistant using rested data. Two academic sleep laboratories. Independent samples of 135 (69 women, age 18 to 25 y), and 45 (3 women, age 22 to 32 y) healthy adults. In both datasets, DDM measures, number of consecutive reaction times that differ by more than 250 ms, and two wavelet features were selected by the model as features predictive of vulnerability to sleep deprivation. Using the best set of features selected in each dataset, classification accuracy was 77% and 82% using fivefold stratified cross-validation, respectively. In both datasets, DDM measures, number of consecutive reaction times that differ by more than 250 ms, and two wavelet features were selected by the model as features predictive of vulnerability to sleep deprivation. Using the best set of features selected in each dataset, classification accuracy was 77% and 82% using fivefold stratified cross-validation, respectively. Despite differences in experimental conditions across studies, drift diffusion model parameters associated reliably with individual differences in performance during total sleep deprivation. These results demonstrate the utility of drift diffusion modeling of baseline performance in estimating vulnerability to psychomotor vigilance decline following sleep deprivation. © 2015 Associated Professional Sleep Societies, LLC.
MRM-Lasso: A Sparse Multiview Feature Selection Method via Low-Rank Analysis.
Yang, Wanqi; Gao, Yang; Shi, Yinghuan; Cao, Longbing
2015-11-01
Learning about multiview data involves many applications, such as video understanding, image classification, and social media. However, when the data dimension increases dramatically, it is important but very challenging to remove redundant features in multiview feature selection. In this paper, we propose a novel feature selection algorithm, multiview rank minimization-based Lasso (MRM-Lasso), which jointly utilizes Lasso for sparse feature selection and rank minimization for learning relevant patterns across views. Instead of simply integrating multiple Lasso from view level, we focus on the performance of sample-level (sample significance) and introduce pattern-specific weights into MRM-Lasso. The weights are utilized to measure the contribution of each sample to the labels in the current view. In addition, the latent correlation across different views is successfully captured by learning a low-rank matrix consisting of pattern-specific weights. The alternating direction method of multipliers is applied to optimize the proposed MRM-Lasso. Experiments on four real-life data sets show that features selected by MRM-Lasso have better multiview classification performance than the baselines. Moreover, pattern-specific weights are demonstrated to be significant for learning about multiview data, compared with view-specific weights.
A bootstrap based Neyman-Pearson test for identifying variable importance.
Ditzler, Gregory; Polikar, Robi; Rosen, Gail
2015-04-01
Selection of most informative features that leads to a small loss on future data are arguably one of the most important steps in classification, data analysis and model selection. Several feature selection (FS) algorithms are available; however, due to noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining if a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.
A proposed configurable approach for recommendation systems via data mining techniques
NASA Astrophysics Data System (ADS)
Khedr, Ayman E.; Idrees, Amira M.; Hegazy, Abd El-Fatah; El-Shewy, Samir
2018-02-01
This study presents a configurable approach for recommendations which determines the suitable recommendation method for each field based on the characteristics of its data, the method includes determining the suitable technique for selecting a representative sample of the provided data. Then selecting the suitable feature weighting measure to provide a correct weight for each feature based on its effect on the recommendations. Finally, selecting the suitable algorithm to provide the required recommendations. The proposed configurable approach could be applied on different domains. The experiments have revealed that the approach is able to provide recommendations with only 0.89 error rate percentage.
The limb movement analysis of rehabilitation exercises using wearable inertial sensors.
Bingquan Huang; Giggins, Oonagh; Kechadi, Tahar; Caulfield, Brian
2016-08-01
Due to no supervision of a therapist in home based exercise programs, inertial sensor based feedback systems which can accurately assess movement repetitions are urgently required. The synchronicity and the degrees of freedom both show that one movement might resemble another movement signal which is mixed in with another not precisely defined movement. Therefore, the data and feature selections are important for movement analysis. This paper explores the data and feature selection for the limb movement analysis of rehabilitation exercises. The results highlight that the classification accuracy is very sensitive to the mount location of the sensors. The results show that the use of 2 or 3 sensor units, the combination of acceleration and gyroscope data, and the feature sets combined by the statistical feature set with another type of feature, can significantly improve the classification accuracy rates. The results illustrate that acceleration data is more effective than gyroscope data for most of the movement analysis.
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Dutra, L. V.; Mascarenhas, N. D. A.; Mitsuo, Fernando Augusta, II
1984-01-01
A study area near Ribeirao Preto in Sao Paulo state was selected, with predominance in sugar cane. Eight features were extracted from the 4 original bands of LANDSAT image, using low-pass and high-pass filtering to obtain spatial features. There were 5 training sites in order to acquire the necessary parameters. Two groups of four channels were selected from 12 channels using JM-distance and entropy criterions. The number of selected channels was defined by physical restrictions of the image analyzer and computacional costs. The evaluation was performed by extracting the confusion matrix for training and tests areas, with a maximum likelihood classifier, and by defining performance indexes based on those matrixes for each group of channels. Results show that in spatial features and supervised classification, the entropy criterion is better in the sense that allows a more accurate and generalized definition of class signature. On the other hand, JM-distance criterion strongly reduces the misclassification within training areas.
Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.
Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin
2017-04-01
As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only cure to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depend on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose the Chronic Kidney Disease, two essential types of feature selection methods namely, wrapper and filter approaches were chosen to reduce the dimension of Chronic Kidney Disease dataset. In wrapper approach, classifier subset evaluator with greedy stepwise search engine and wrapper subset evaluator with the Best First search engine were used. In filter approach, correlation feature selection subset evaluator with greedy stepwise search engine and filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier by using filtered subset evaluator with the Best First search engine feature selection method has higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to other selected methods.
Das, Dev Kumar; Ghosh, Madhumala; Pal, Mallika; Maiti, Asok K; Chakraborty, Chandan
2013-02-01
The aim of this paper is to address the development of computer assisted malaria parasite characterization and classification using machine learning approach based on light microscopic images of peripheral blood smears. In doing this, microscopic image acquisition from stained slides, illumination correction and noise reduction, erythrocyte segmentation, feature extraction, feature selection and finally classification of different stages of malaria (Plasmodium vivax and Plasmodium falciparum) have been investigated. The erythrocytes are segmented using marker controlled watershed transformation and subsequently total ninety six features describing shape-size and texture of erythrocytes are extracted in respect to the parasitemia infected versus non-infected cells. Ninety four features are found to be statistically significant in discriminating six classes. Here a feature selection-cum-classification scheme has been devised by combining F-statistic, statistical learning techniques i.e., Bayesian learning and support vector machine (SVM) in order to provide the higher classification accuracy using best set of discriminating features. Results show that Bayesian approach provides the highest accuracy i.e., 84% for malaria classification by selecting 19 most significant features while SVM provides highest accuracy i.e., 83.5% with 9 most significant features. Finally, the performance of these two classifiers under feature selection framework has been compared toward malaria parasite classification. Copyright © 2012 Elsevier Ltd. All rights reserved.
Neural Determinants of Task Performance during Feature-Based Attention in Human Cortex
Gong, Mengyuan
2018-01-01
Abstract Studies of feature-based attention have associated activity in a dorsal frontoparietal network with putative attentional priority signals. Yet, how this neural activity mediates attentional selection and whether it guides behavior are fundamental questions that require investigation. We reasoned that endogenous fluctuations in the quality of attentional priority should influence task performance. Human subjects detected a speed increment while viewing clockwise (CW) or counterclockwise (CCW) motion (baseline task) or while attending to either direction amid distracters (attention task). In an fMRI experiment, direction-specific neural pattern similarity between the baseline task and the attention task revealed a higher level of similarity for correct than incorrect trials in frontoparietal regions. Using transcranial magnetic stimulation (TMS), we disrupted posterior parietal cortex (PPC) and found a selective deficit in the attention task, but not in the baseline task, demonstrating the necessity of this cortical area during feature-based attention. These results reveal that frontoparietal areas maintain attentional priority that facilitates successful behavioral selection. PMID:29497703
Computer-aided diagnosis of melanoma using border and wavelet-based texture analysis.
Garnavi, Rahil; Aldeen, Mohammad; Bailey, James
2012-11-01
This paper presents a novel computer-aided diagnosis system for melanoma. The novelty lies in the optimised selection and integration of features derived from textural, borderbased and geometrical properties of the melanoma lesion. The texture features are derived from using wavelet-decomposition, the border features are derived from constructing a boundaryseries model of the lesion border and analysing it in spatial and frequency domains, and the geometry features are derived from shape indexes. The optimised selection of features is achieved by using the Gain-Ratio method, which is shown to be computationally efficient for melanoma diagnosis application. Classification is done through the use of four classifiers; namely, Support Vector Machine, Random Forest, Logistic Model Tree and Hidden Naive Bayes. The proposed diagnostic system is applied on a set of 289 dermoscopy images (114 malignant, 175 benign) partitioned into train, validation and test image sets. The system achieves and accuracy of 91.26% and AUC value of 0.937, when 23 features are used. Other important findings include (i) the clear advantage gained in complementing texture with border and geometry features, compared to using texture information only, and (ii) higher contribution of texture features than border-based features in the optimised feature set.
Stabilizing l1-norm prediction models by supervised feature grouping.
Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha
2016-02-01
Emerging Electronic Medical Records (EMRs) have reformed the modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. But, in presence of correlated features, these methods select features that change considerably with small changes in data. This prevents clinicians to obtain a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection, however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. Copyright © 2015 Elsevier Inc. All rights reserved.
Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.
Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria
2018-04-18
Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SV-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which uses the product rule as its merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to other reported FS methods. Copyright © 2018 Elsevier B.V. All rights reserved.
Object-based attention: strength of object representation and attentional guidance.
Shomstein, Sarah; Behrmann, Marlene
2008-01-01
Two or more features belonging to a single object are identified more quickly and more accurately than are features belonging to different objects--a finding attributed to sensory enhancement of all features belonging to an attended or selected object. However, several recent studies have suggested that this "single-object advantage" may be a product of probabilistic and configural strategic prioritizations rather than of object-based perceptual enhancement per se, challenging the underlying mechanism that is thought to give rise to object-based attention. In the present article, we further explore constraints on the mechanisms of object-based selection by examining the contribution of the strength of object representations to the single-object advantage. We manipulated factors such as exposure duration (i.e., preview time) and salience of configuration (i.e., objects). Varying preview time changes the magnitude of the object-based effect, so that if there is ample time to establish an object representation (i.e., preview time of 1,000 msec), then both probability and configuration (i.e., objects) guide attentional selection. If, however, insufficient time is provided to establish a robust object-based representation, then only probabilities guide attentional selection. Interestingly, at a short preview time of 200 msec, when the two objects were sufficiently different from each other (i.e., different colors), both configuration and probability guided attention selection. These results suggest that object-based effects can be explained both in terms of strength of object representations (established at longer exposure durations and by pictorial cues) and probabilistic contingencies in the visual environment.
Infants' Perception of Chasing
ERIC Educational Resources Information Center
Frankenhuis, Willem E.; House, Bailey; Barrett, H. Clark; Johnson, Scott P.
2013-01-01
Two significant questions in cognitive and developmental science are first, whether objects and events are selected for attention based on their features (featural processing) or the configuration of their features (configural processing), and second, how these modes of processing develop. These questions have been addressed in part with…
NASA Astrophysics Data System (ADS)
Pandremmenou, K.; Shahid, M.; Kondi, L. P.; Lövström, B.
2015-03-01
In this work, we propose a No-Reference (NR) bitstream-based model for predicting the quality of H.264/AVC video sequences, affected by both compression artifacts and transmission impairments. The proposed model is based on a feature extraction procedure, where a large number of features are calculated from the packet-loss impaired bitstream. Many of the features are firstly proposed in this work, and the specific set of the features as a whole is applied for the first time for making NR video quality predictions. All feature observations are taken as input to the Least Absolute Shrinkage and Selection Operator (LASSO) regression method. LASSO indicates the most important features, and using only them, it is possible to estimate the Mean Opinion Score (MOS) with high accuracy. Indicatively, we point out that only 13 features are able to produce a Pearson Correlation Coefficient of 0.92 with the MOS. Interestingly, the performance statistics we computed in order to assess our method for predicting the Structural Similarity Index and the Video Quality Metric are equally good. Thus, the obtained experimental results verified the suitability of the features selected by LASSO as well as the ability of LASSO in making accurate predictions through sparse modeling.
Hollingworth, Andrew; Hwang, Seongmin
2013-01-01
We examined the conditions under which a feature value in visual working memory (VWM) recruits visual attention to matching stimuli. Previous work has suggested that VWM supports two qualitatively different states of representation: an active state that interacts with perceptual selection and a passive (or accessory) state that does not. An alternative hypothesis is that VWM supports a single form of representation, with the precision of feature memory controlling whether or not the representation interacts with perceptual selection. The results of three experiments supported the dual-state hypothesis. We established conditions under which participants retained a relatively precise representation of a parcticular colour. If the colour was immediately task relevant, it reliably recruited attention to matching stimuli. However, if the colour was not immediately task relevant, it failed to interact with perceptual selection. Feature maintenance in VWM is not necessarily equivalent with feature-based attentional selection. PMID:24018723
A Transform-Based Feature Extraction Approach for Motor Imagery Tasks Classification
Khorshidtalab, Aida; Mesbah, Mostefa; Salami, Momoh J. E.
2015-01-01
In this paper, we present a new motor imagery classification method in the context of electroencephalography (EEG)-based brain–computer interface (BCI). This method uses a signal-dependent orthogonal transform, referred to as linear prediction singular value decomposition (LP-SVD), for feature extraction. The transform defines the mapping as the left singular vectors of the LP coefficient filter impulse response matrix. Using a logistic tree-based model classifier; the extracted features are classified into one of four motor imagery movements. The proposed approach was first benchmarked against two related state-of-the-art feature extraction approaches, namely, discrete cosine transform (DCT) and adaptive autoregressive (AAR)-based methods. By achieving an accuracy of 67.35%, the LP-SVD approach outperformed the other approaches by large margins (25% compared with DCT and 6 % compared with AAR-based methods). To further improve the discriminatory capability of the extracted features and reduce the computational complexity, we enlarged the extracted feature subset by incorporating two extra features, namely, Q- and the Hotelling’s \\documentclass[12pt]{minimal} \\usepackage{amsmath} \\usepackage{wasysym} \\usepackage{amsfonts} \\usepackage{amssymb} \\usepackage{amsbsy} \\usepackage{upgreek} \\usepackage{mathrsfs} \\setlength{\\oddsidemargin}{-69pt} \\begin{document} }{}$T^{2}$ \\end{document} statistics of the transformed EEG and introduced a new EEG channel selection method. The performance of the EEG classification based on the expanded feature set and channel selection method was compared with that of a number of the state-of-the-art classification methods previously reported with the BCI IIIa competition data set. Our method came second with an average accuracy of 81.38%. PMID:27170898
An Automatic Prediction of Epileptic Seizures Using Cloud Computing and Wireless Sensor Networks.
Sareen, Sanjay; Sood, Sandeep K; Gupta, Sunil Kumar
2016-11-01
Epilepsy is one of the most common neurological disorders which is characterized by the spontaneous and unforeseeable occurrence of seizures. An automatic prediction of seizure can protect the patients from accidents and save their life. In this article, we proposed a mobile-based framework that automatically predict seizures using the information contained in electroencephalography (EEG) signals. The wireless sensor technology is used to capture the EEG signals of patients. The cloud-based services are used to collect and analyze the EEG data from the patient's mobile phone. The features from the EEG signal are extracted using the fast Walsh-Hadamard transform (FWHT). The Higher Order Spectral Analysis (HOSA) is applied to FWHT coefficients in order to select the features set relevant to normal, preictal and ictal states of seizure. We subsequently exploit the selected features as input to a k-means classifier to detect epileptic seizure states in a reasonable time. The performance of the proposed model is tested on Amazon EC2 cloud and compared in terms of execution time and accuracy. The findings show that with selected HOS based features, we were able to achieve a classification accuracy of 94.6 %.
Mutual information-based feature selection for radiomics
NASA Astrophysics Data System (ADS)
Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine
2016-03-01
Background The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and response to treatment quantification. In this work, we present a mutual information - based method for quantifying reproducibility of features, a necessary step for qualification before their inclusion in big data systems. Materials and Methods Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points in average) with Computed Tomography (CT). Five observers segmented lesions by using a semi-automatic method and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface to volume ratio (SVR) and volume (V) presented statistically significant higher values of MI than the rest of features. Within the same VBF group, SVR showed also the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to make a difference between features. Conclusions MI allowed to discriminate three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.
Feature-based data assimilation in geophysics
NASA Astrophysics Data System (ADS)
Morzfeld, Matthias; Adams, Jesse; Lunderman, Spencer; Orozco, Rafael
2018-05-01
Many applications in science require that computational models and data be combined. In a Bayesian framework, this is usually done by defining likelihoods based on the mismatch of model outputs and data. However, matching model outputs and data in this way can be unnecessary or impossible. For example, using large amounts of steady state data is unnecessary because these data are redundant. It is numerically difficult to assimilate data in chaotic systems. It is often impossible to assimilate data of a complex system into a low-dimensional model. As a specific example, consider a low-dimensional stochastic model for the dipole of the Earth's magnetic field, while other field components are ignored in the model. The above issues can be addressed by selecting features of the data, and defining likelihoods based on the features, rather than by the usual mismatch of model output and data. Our goal is to contribute to a fundamental understanding of such a feature-based approach that allows us to assimilate selected aspects of data into models. We also explain how the feature-based approach can be interpreted as a method for reducing an effective dimension and derive new noise models, based on perturbed observations, that lead to computationally efficient solutions. Numerical implementations of our ideas are illustrated in four examples.
Tsai, Richard Tzong-Han; Sung, Cheng-Lung; Dai, Hong-Jie; Hung, Hsieh-Chuan; Sung, Ting-Yi; Hsu, Wen-Lian
2006-12-18
Biomedical named entity recognition (Bio-NER) is a challenging problem because, in general, biomedical named entities of the same category (e.g., proteins and genes) do not follow one standard nomenclature. They have many irregularities and sometimes appear in ambiguous contexts. In recent years, machine-learning (ML) approaches have become increasingly common and now represent the cutting edge of Bio-NER technology. This paper addresses three problems faced by ML-based Bio-NER systems. First, most ML approaches usually employ singleton features that comprise one linguistic property (e.g., the current word is capitalized) and at least one class tag (e.g., B-protein, the beginning of a protein name). However, such features may be insufficient in cases where multiple properties must be considered. Adding conjunction features that contain multiple properties can be beneficial, but it would be infeasible to include all conjunction features in an NER model since memory resources are limited and some features are ineffective. To resolve the problem, we use a sequential forward search algorithm to select an effective set of features. Second, variations in the numerical parts of biomedical terms (e.g., "2" in the biomedical term IL2) cause data sparseness and generate many redundant features. In this case, we apply numerical normalization, which solves the problem by replacing all numerals in a term with one representative numeral to help classify named entities. Third, the assignment of NE tags does not depend solely on the target word's closest neighbors, but may depend on words outside the context window (e.g., a context window of five consists of the current word plus two preceding and two subsequent words). We use global patterns generated by the Smith-Waterman local alignment algorithm to identify such structures and modify the results of our ML-based tagger. This is called pattern-based post-processing. To develop our ML-based Bio-NER system, we employ conditional random fields, which have performed effectively in several well-known tasks, as our underlying ML model. Adding selected conjunction features, applying numerical normalization, and employing pattern-based post-processing improve the F-scores by 1.67%, 1.04%, and 0.57%, respectively. The combined increase of 3.28% yields a total score of 72.98%, which is better than the baseline system that only uses singleton features. We demonstrate the benefits of using the sequential forward search algorithm to select effective conjunction feature groups. In addition, we show that numerical normalization can effectively reduce the number of redundant and unseen features. Furthermore, the Smith-Waterman local alignment algorithm can help ML-based Bio-NER deal with difficult cases that need longer context windows.
Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.
Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao
2015-04-01
Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnoses and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting high discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select the good subset of informative gene relevant to the classification. In the proposed algorithm, firstly, the fisher-markov selector is used to choose fixed number of gene data. Secondly, to make biogeography based optimization suitable for the feature selection problem; discrete migration model and discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, as we called DBBO, is proposed by integrating discrete migration model and discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are used as the classifier with the 10 fold cross-validation method. In order to show the effective and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. Comparison with genetic algorithm, particle swarm optimization, differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better or at least comparable with previous method from literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Case-based fracture image retrieval.
Zhou, Xin; Stern, Richard; Müller, Henning
2012-05-01
Case-based fracture image retrieval can assist surgeons in decisions regarding new cases by supplying visually similar past cases. This tool may guide fracture fixation and management through comparison of long-term outcomes in similar cases. A fracture image database collected over 10 years at the orthopedic service of the University Hospitals of Geneva was used. This database contains 2,690 fracture cases associated with 43 classes (based on the AO/OTA classification). A case-based retrieval engine was developed and evaluated using retrieval precision as a performance metric. Only cases in the same class as the query case are considered as relevant. The scale-invariant feature transform (SIFT) is used for image analysis. Performance evaluation was computed in terms of mean average precision (MAP) and early precision (P10, P30). Retrieval results produced with the GNU image finding tool (GIFT) were used as a baseline. Two sampling strategies were evaluated. One used a dense 40 × 40 pixel grid sampling, and the second one used the standard SIFT features. Based on dense pixel grid sampling, three unsupervised feature selection strategies were introduced to further improve retrieval performance. With dense pixel grid sampling, the image is divided into 1,600 (40 × 40) square blocks. The goal is to emphasize the salient regions (blocks) and ignore irrelevant regions. Regions are considered as important when a high variance of the visual features is found. The first strategy is to calculate the variance of all descriptors on the global database. The second strategy is to calculate the variance of all descriptors for each case. A third strategy is to perform a thumbnail image clustering in a first step and then to calculate the variance for each cluster. Finally, a fusion between a SIFT-based system and GIFT is performed. A first comparison on the selection of sampling strategies using SIFT features shows that dense sampling using a pixel grid (MAP = 0.18) outperformed the SIFT detector-based sampling approach (MAP = 0.10). In a second step, three unsupervised feature selection strategies were evaluated. A grid parameter search is applied to optimize parameters for feature selection and clustering. Results show that using half of the regions (700 or 800) obtains the best performance for all three strategies. Increasing the number of clusters in clustering can also improve the retrieval performance. The SIFT descriptor variance in each case gave the best indication of saliency for the regions (MAP = 0.23), better than the other two strategies (MAP = 0.20 and 0.21). Combining GIFT (MAP = 0.23) and the best SIFT strategy (MAP = 0.23) produced significantly better results (MAP = 0.27) than each system alone. A case-based fracture retrieval engine was developed and is available for online demonstration. SIFT is used to extract local features, and three feature selection strategies were introduced and evaluated. A baseline using the GIFT system was used to evaluate the salient point-based approaches. Without supervised learning, SIFT-based systems with optimized parameters slightly outperformed the GIFT system. A fusion of the two approaches shows that the information contained in the two approaches is complementary. Supervised learning on the feature space is foreseen as the next step of this study.
NASA Astrophysics Data System (ADS)
Ding, Hao; Cao, Ming; DuPont, Andrew W.; Scott, Larry D.; Guha, Sushovan; Singhal, Shashideep; Younes, Mamoun; Pence, Isaac; Herline, Alan; Schwartz, David; Xu, Hua; Mahadevan-Jansen, Anita; Bi, Xiaohong
2016-03-01
Inflammatory bowel disease (IBD) is an idiopathic disease that is typically characterized by chronic inflammation of the gastrointestinal tract. Recently much effort has been devoted to the development of novel diagnostic tools that can assist physicians for fast, accurate, and automated diagnosis of the disease. Previous research based on Raman spectroscopy has shown promising results in differentiating IBD patients from normal screening cases. In the current study, we examined IBD patients in vivo through a colonoscope-coupled Raman system. Optical diagnosis for IBD discrimination was conducted based on full-range spectra using multivariate statistical methods. Further, we incorporated several feature selection methods in machine learning into the classification model. The diagnostic performance for disease differentiation was significantly improved after feature selection. Our results showed that improved IBD diagnosis can be achieved using Raman spectroscopy in combination with multivariate analysis and feature selection.
Dimensionality Reduction Through Classifier Ensembles
NASA Technical Reports Server (NTRS)
Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)
1999-01-01
In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.
Adam, Asrul; Mohd Tumari, Mohd Zaidi; Mohamad, Mohd Saberi
2014-01-01
Electroencephalogram (EEG) signal peak detection is widely used in clinical applications. The peak point can be detected using several approaches, including time, frequency, time-frequency, and nonlinear domains depending on various peak features from several models. However, there is no study that provides the importance of every peak feature in contributing to a good and generalized model. In this study, feature selection and classifier parameters estimation based on particle swarm optimization (PSO) are proposed as a framework for peak detection on EEG signals in time domain analysis. Two versions of PSO are used in the study: (1) standard PSO and (2) random asynchronous particle swarm optimization (RA-PSO). The proposed framework tries to find the best combination of all the available features that offers good peak detection and a high classification rate from the results in the conducted experiments. The evaluation results indicate that the accuracy of the peak detection can be improved up to 99.90% and 98.59% for training and testing, respectively, as compared to the framework without feature selection adaptation. Additionally, the proposed framework based on RA-PSO offers a better and reliable classification rate as compared to standard PSO as it produces low variance model. PMID:25243236
A face and palmprint recognition approach based on discriminant DCT feature extraction.
Jing, Xiao-Yuan; Zhang, David
2004-12-01
In the field of image processing and recognition, discrete cosine transform (DCT) and linear discrimination are two widely used techniques. Based on them, we present a new face and palmprint recognition approach in this paper. It first uses a two-dimensional separability judgment to select the DCT frequency bands with favorable linear separability. Then from the selected bands, it extracts the linear discriminative features by an improved Fisherface method and performs the classification by the nearest neighbor classifier. We detailedly analyze theoretical advantages of our approach in feature extraction. The experiments on face databases and palmprint database demonstrate that compared to the state-of-the-art linear discrimination methods, our approach obtains better classification performance. It can significantly improve the recognition rates for face and palmprint data and effectively reduce the dimension of feature space.
Kherfi, Mohammed Lamine; Ziou, Djemel
2006-04-01
In content-based image retrieval, understanding the user's needs is a challenging task that requires integrating him in the process of retrieval. Relevance feedback (RF) has proven to be an effective tool for taking the user's judgement into account. In this paper, we present a new RF framework based on a feature selection algorithm that nicely combines the advantages of a probabilistic formulation with those of using both the positive example (PE) and the negative example (NE). Through interaction with the user, our algorithm learns the importance he assigns to image features, and then applies the results obtained to define similarity measures that correspond better to his judgement. The use of the NE allows images undesired by the user to be discarded, thereby improving retrieval accuracy. As for the probabilistic formulation of the problem, it presents a multitude of advantages and opens the door to more modeling possibilities that achieve a good feature selection. It makes it possible to cluster the query data into classes, choose the probability law that best models each class, model missing data, and support queries with multiple PE and/or NE classes. The basic principle of our algorithm is to assign more importance to features with a high likelihood and those which distinguish well between PE classes and NE classes. The proposed algorithm was validated separately and in image retrieval context, and the experiments show that it performs a good feature selection and contributes to improving retrieval effectiveness.
Manifold Regularized Multitask Feature Learning for Multimodality Disease Classification
Jie, Biao; Zhang, Daoqiang; Cheng, Bo; Shen, Dinggang
2015-01-01
Multimodality based methods have shown great advantages in classification of Alzheimer’s disease (AD) and its prodromal stage, that is, mild cognitive impairment (MCI). Recently, multitask feature selection methods are typically used for joint selection of common features across multiple modalities. However, one disadvantage of existing multimodality based methods is that they ignore the useful data distribution information in each modality, which is essential for subsequent classification. Accordingly, in this paper we propose a manifold regularized multitask feature learning method to preserve both the intrinsic relatedness among multiple modalities of data and the data distribution information in each modality. Specifically, we denote the feature learning on each modality as a single task, and use group-sparsity regularizer to capture the intrinsic relatedness among multiple tasks (i.e., modalities) and jointly select the common features from multiple tasks. Furthermore, we introduce a new manifold-based Laplacian regularizer to preserve the data distribution information from each task. Finally, we use the multikernel support vector machine method to fuse multimodality data for eventual classification. Conversely, we also extend our method to the semisupervised setting, where only partial data are labeled. We evaluate our method using the baseline magnetic resonance imaging (MRI), fluorodeoxyglucose positron emission tomography (FDG-PET), and cerebrospinal fluid (CSF) data of subjects from AD neuroimaging initiative database. The experimental results demonstrate that our proposed method can not only achieve improved classification performance, but also help to discover the disease-related brain regions useful for disease diagnosis. PMID:25277605
Zhang, Xiong; Zhao, Yacong; Zhang, Yu; Zhong, Xuefei; Fan, Zhaowen
2018-01-01
The novel human-computer interface (HCI) using bioelectrical signals as input is a valuable tool to improve the lives of people with disabilities. In this paper, surface electromyography (sEMG) signals induced by four classes of wrist movements were acquired from four sites on the lower arm with our designed system. Forty-two features were extracted from the time, frequency and time-frequency domains. Optimal channels were determined from single-channel classification performance rank. The optimal-feature selection was according to a modified entropy criteria (EC) and Fisher discrimination (FD) criteria. The feature selection results were evaluated by four different classifiers, and compared with other conventional feature subsets. In online tests, the wearable system acquired real-time sEMG signals. The selected features and trained classifier model were used to control a telecar through four different paradigms in a designed environment with simple obstacles. Performance was evaluated based on travel time (TT) and recognition rate (RR). The results of hardware evaluation verified the feasibility of our acquisition systems, and ensured signal quality. Single-channel analysis results indicated that the channel located on the extensor carpi ulnaris (ECU) performed best with mean classification accuracy of 97.45% for all movement’s pairs. Channels placed on ECU and the extensor carpi radialis (ECR) were selected according to the accuracy rank. Experimental results showed that the proposed FD method was better than other feature selection methods and single-type features. The combination of FD and random forest (RF) performed best in offline analysis, with 96.77% multi-class RR. Online results illustrated that the state-machine paradigm with a 125 ms window had the highest maneuverability and was closest to real-life control. Subjects could accomplish online sessions by three sEMG-based paradigms, with average times of 46.02, 49.06 and 48.08 s, respectively. These experiments validate the feasibility of proposed real-time wearable HCI system and algorithms, providing a potential assistive device interface for persons with disabilities. PMID:29543737
Firefly as a novel swarm intelligence variable selection method in spectroscopy.
Goodarzi, Mohammad; dos Santos Coelho, Leandro
2014-12-10
A critical step in multivariate calibration is wavelength selection, which is used to build models with better prediction performance when applied to spectral data. Up to now, many feature selection techniques have been developed. Among all different types of feature selection techniques, those based on swarm intelligence optimization methodologies are more interesting since they are usually simulated based on animal and insect life behavior to, e.g., find the shortest path between a food source and their nests. This decision is made by a crowd, leading to a more robust model with less falling in local minima during the optimization cycle. This paper represents a novel feature selection approach to the selection of spectroscopic data, leading to more robust calibration models. The performance of the firefly algorithm, a swarm intelligence paradigm, was evaluated and compared with genetic algorithm and particle swarm optimization. All three techniques were coupled with partial least squares (PLS) and applied to three spectroscopic data sets. They demonstrate improved prediction results in comparison to when only a PLS model was built using all wavelengths. Results show that firefly algorithm as a novel swarm paradigm leads to a lower number of selected wavelengths while the prediction performance of built PLS stays the same. Copyright © 2014. Published by Elsevier B.V.
Automatic brain MR image denoising based on texture feature-based artificial neural networks.
Chang, Yu-Ning; Chang, Herng-Hua
2015-01-01
Noise is one of the main sources of quality deterioration not only for visual inspection but also in computerized processing in brain magnetic resonance (MR) image analysis such as tissue classification, segmentation and registration. Accordingly, noise removal in brain MR images is important for a wide variety of subsequent processing applications. However, most existing denoising algorithms require laborious tuning of parameters that are often sensitive to specific image features and textures. Automation of these parameters through artificial intelligence techniques will be highly beneficial. In the present study, an artificial neural network associated with image texture feature analysis is proposed to establish a predictable parameter model and automate the denoising procedure. In the proposed approach, a total of 83 image attributes were extracted based on four categories: 1) Basic image statistics. 2) Gray-level co-occurrence matrix (GLCM). 3) Gray-level run-length matrix (GLRLM) and 4) Tamura texture features. To obtain the ranking of discrimination in these texture features, a paired-samples t-test was applied to each individual image feature computed in every image. Subsequently, the sequential forward selection (SFS) method was used to select the best texture features according to the ranking of discrimination. The selected optimal features were further incorporated into a back propagation neural network to establish a predictable parameter model. A wide variety of MR images with various scenarios were adopted to evaluate the performance of the proposed framework. Experimental results indicated that this new automation system accurately predicted the bilateral filtering parameters and effectively removed the noise in a number of MR images. Comparing to the manually tuned filtering process, our approach not only produced better denoised results but also saved significant processing time.
Attention improves encoding of task-relevant features in the human visual cortex
Jehee, Janneke F.M.; Brady, Devin K.; Tong, Frank
2011-01-01
When spatial attention is directed towards a particular stimulus, increased activity is commonly observed in corresponding locations of the visual cortex. Does this attentional increase in activity indicate improved processing of all features contained within the attended stimulus, or might spatial attention selectively enhance the features relevant to the observer’s task? We used fMRI decoding methods to measure the strength of orientation-selective activity patterns in the human visual cortex while subjects performed either an orientation or contrast discrimination task, involving one of two laterally presented gratings. Greater overall BOLD activation with spatial attention was observed in areas V1-V4 for both tasks. However, multivariate pattern analysis revealed that orientation-selective responses were enhanced by attention only when orientation was the task-relevant feature, and not when the grating’s contrast had to be attended. In a second experiment, observers discriminated the orientation or color of a specific lateral grating. Here, orientation-selective responses were enhanced in both tasks but color-selective responses were enhanced only when color was task-relevant. In both experiments, task-specific enhancement of feature-selective activity was not confined to the attended stimulus location, but instead spread to other locations in the visual field, suggesting the concurrent involvement of a global feature-based attentional mechanism. These results suggest that attention can be remarkably selective in its ability to enhance particular task-relevant features, and further reveal that increases in overall BOLD amplitude are not necessarily accompanied by improved processing of stimulus information. PMID:21632942
Attention improves encoding of task-relevant features in the human visual cortex.
Jehee, Janneke F M; Brady, Devin K; Tong, Frank
2011-06-01
When spatial attention is directed toward a particular stimulus, increased activity is commonly observed in corresponding locations of the visual cortex. Does this attentional increase in activity indicate improved processing of all features contained within the attended stimulus, or might spatial attention selectively enhance the features relevant to the observer's task? We used fMRI decoding methods to measure the strength of orientation-selective activity patterns in the human visual cortex while subjects performed either an orientation or contrast discrimination task, involving one of two laterally presented gratings. Greater overall BOLD activation with spatial attention was observed in visual cortical areas V1-V4 for both tasks. However, multivariate pattern analysis revealed that orientation-selective responses were enhanced by attention only when orientation was the task-relevant feature and not when the contrast of the grating had to be attended. In a second experiment, observers discriminated the orientation or color of a specific lateral grating. Here, orientation-selective responses were enhanced in both tasks, but color-selective responses were enhanced only when color was task relevant. In both experiments, task-specific enhancement of feature-selective activity was not confined to the attended stimulus location but instead spread to other locations in the visual field, suggesting the concurrent involvement of a global feature-based attentional mechanism. These results suggest that attention can be remarkably selective in its ability to enhance particular task-relevant features and further reveal that increases in overall BOLD amplitude are not necessarily accompanied by improved processing of stimulus information.
Trakoolwilaiwan, Thanawin; Behboodi, Bahareh; Lee, Jaeseok; Kim, Kyungsoo; Choi, Ji-Woong
2018-01-01
The aim of this work is to develop an effective brain-computer interface (BCI) method based on functional near-infrared spectroscopy (fNIRS). In order to improve the performance of the BCI system in terms of accuracy, the ability to discriminate features from input signals and proper classification are desired. Previous studies have mainly extracted features from the signal manually, but proper features need to be selected carefully. To avoid performance degradation caused by manual feature selection, we applied convolutional neural networks (CNNs) as the automatic feature extractor and classifier for fNIRS-based BCI. In this study, the hemodynamic responses evoked by performing rest, right-, and left-hand motor execution tasks were measured on eight healthy subjects to compare performances. Our CNN-based method provided improvements in classification accuracy over conventional methods employing the most commonly used features of mean, peak, slope, variance, kurtosis, and skewness, classified by support vector machine (SVM) and artificial neural network (ANN). Specifically, up to 6.49% and 3.33% improvement in classification accuracy was achieved by CNN compared with SVM and ANN, respectively.
Enhancement of ELDA Tracker Based on CNN Features and Adaptive Model Update.
Gao, Changxin; Shi, Huizhang; Yu, Jin-Gang; Sang, Nong
2016-04-15
Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. Additionally, the exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Based on that, we improve the ELDA tracking algorithm by deep convolutional neural network (CNN) features and adaptive model update. Deep CNN features have been successfully used in various computer vision tasks. Extracting CNN features on all of the candidate windows is time consuming. To address this problem, a two-step CNN feature extraction method is proposed by separately computing convolutional layers and fully-connected layers. Due to the strong discriminative ability of CNN features and the exemplar-based model, we update both object and background models to improve their adaptivity and to deal with the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the "good" models (detectors), which are quite discriminative and uncorrelated to other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes, which is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm.
Enhancement of ELDA Tracker Based on CNN Features and Adaptive Model Update
Gao, Changxin; Shi, Huizhang; Yu, Jin-Gang; Sang, Nong
2016-01-01
Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. Additionally, the exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Based on that, we improve the ELDA tracking algorithm by deep convolutional neural network (CNN) features and adaptive model update. Deep CNN features have been successfully used in various computer vision tasks. Extracting CNN features on all of the candidate windows is time consuming. To address this problem, a two-step CNN feature extraction method is proposed by separately computing convolutional layers and fully-connected layers. Due to the strong discriminative ability of CNN features and the exemplar-based model, we update both object and background models to improve their adaptivity and to deal with the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the “good” models (detectors), which are quite discriminative and uncorrelated to other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes, which is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm. PMID:27092505
NASA Astrophysics Data System (ADS)
Wan, Yi
2011-06-01
Chinese wines can be classification or graded by the micrographs. Micrographs of Chinese wines show floccules, stick and granule of variant shape and size. Different wines have variant microstructure and micrographs, we study the classification of Chinese wines based on the micrographs. Shape and structure of wines' particles in microstructure is the most important feature for recognition and classification of wines. So we introduce a feature extraction method which can describe the structure and region shape of micrograph efficiently. First, the micrographs are enhanced using total variation denoising, and segmented using a modified Otsu's method based on the Rayleigh Distribution. Then features are extracted using proposed method in the paper based on area, perimeter and traditional shape feature. Eight kinds total 26 features are selected. Finally, Chinese wine classification system based on micrograph using combination of shape and structure features and BP neural network have been presented. We compare the recognition results for different choices of features (traditional shape features or proposed features). The experimental results show that the better classification rate have been achieved using the combinational features proposed in this paper.
Fast object detection algorithm based on HOG and CNN
NASA Astrophysics Data System (ADS)
Lu, Tongwei; Wang, Dandan; Zhang, Yanduo
2018-04-01
In the field of computer vision, object classification and object detection are widely used in many fields. The traditional object detection have two main problems:one is that sliding window of the regional selection strategy is high time complexity and have window redundancy. And the other one is that Robustness of the feature is not well. In order to solve those problems, Regional Proposal Network (RPN) is used to select candidate regions instead of selective search algorithm. Compared with traditional algorithms and selective search algorithms, RPN has higher efficiency and accuracy. We combine HOG feature and convolution neural network (CNN) to extract features. And we use SVM to classify. For TorontoNet, our algorithm's mAP is 1.6 percentage points higher. For OxfordNet, our algorithm's mAP is 1.3 percentage higher.
NASA Astrophysics Data System (ADS)
Yavari, Somayeh; Valadan Zoej, Mohammad Javad; Salehi, Bahram
2018-05-01
The procedure of selecting an optimum number and best distribution of ground control information is important in order to reach accurate and robust registration results. This paper proposes a new general procedure based on Genetic Algorithm (GA) which is applicable for all kinds of features (point, line, and areal features). However, linear features due to their unique characteristics are of interest in this investigation. This method is called Optimum number of Well-Distributed ground control Information Selection (OWDIS) procedure. Using this method, a population of binary chromosomes is randomly initialized. The ones indicate the presence of a pair of conjugate lines as a GCL and zeros specify the absence. The chromosome length is considered equal to the number of all conjugate lines. For each chromosome, the unknown parameters of a proper mathematical model can be calculated using the selected GCLs (ones in each chromosome). Then, a limited number of Check Points (CPs) are used to evaluate the Root Mean Square Error (RMSE) of each chromosome as its fitness value. The procedure continues until reaching a stopping criterion. The number and position of ones in the best chromosome indicate the selected GCLs among all conjugate lines. To evaluate the proposed method, a GeoEye and an Ikonos Images are used over different areas of Iran. Comparing the obtained results by the proposed method in a traditional RFM with conventional methods that use all conjugate lines as GCLs shows five times the accuracy improvement (pixel level accuracy) as well as the strength of the proposed method. To prevent an over-parametrization error in a traditional RFM due to the selection of a high number of improper correlated terms, an optimized line-based RFM is also proposed. The results show the superiority of the combination of the proposed OWDIS method with an optimized line-based RFM in terms of increasing the accuracy to better than 0.7 pixel, reliability, and reducing systematic errors. These results also demonstrate the high potential of linear features as reliable control features to reach sub-pixel accuracy in registration applications.
Voxel classification based airway tree segmentation
NASA Astrophysics Data System (ADS)
Lo, Pechin; de Bruijne, Marleen
2008-03-01
This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.
Ma, Xin; Guo, Jing; Sun, Xiao
2016-01-01
DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
Visual attention: The past 25 years
Carrasco, Marisa
2012-01-01
This review focuses on covert attention and how it alters early vision. I explain why attention is considered a selective process, the constructs of covert attention, spatial endogenous and exogenous attention, and feature-based attention. I explain how in the last 25 years research on attention has characterized the effects of covert attention on spatial filters and how attention influences the selection of stimuli of interest. This review includes the effects of spatial attention on discriminability and appearance in tasks mediated by contrast sensitivity and spatial resolution; the effects of feature-based attention on basic visual processes, and a comparison of the effects of spatial and feature-based attention. The emphasis of this review is on psychophysical studies, but relevant electrophysiological and neuroimaging studies and models regarding how and where neuronal responses are modulated are also discussed. PMID:21549742
Visual attention: the past 25 years.
Carrasco, Marisa
2011-07-01
This review focuses on covert attention and how it alters early vision. I explain why attention is considered a selective process, the constructs of covert attention, spatial endogenous and exogenous attention, and feature-based attention. I explain how in the last 25 years research on attention has characterized the effects of covert attention on spatial filters and how attention influences the selection of stimuli of interest. This review includes the effects of spatial attention on discriminability and appearance in tasks mediated by contrast sensitivity and spatial resolution; the effects of feature-based attention on basic visual processes, and a comparison of the effects of spatial and feature-based attention. The emphasis of this review is on psychophysical studies, but relevant electrophysiological and neuroimaging studies and models regarding how and where neuronal responses are modulated are also discussed. Copyright © 2011 Elsevier Ltd. All rights reserved.
Alvarez, George A; Gill, Jonathan; Cavanagh, Patrick
2012-01-01
Previous studies have shown independent attentional selection of targets in the left and right visual hemifields during attentional tracking (Alvarez & Cavanagh, 2005) but not during a visual search (Luck, Hillyard, Mangun, & Gazzaniga, 1989). Here we tested whether multifocal spatial attention is the critical process that operates independently in the two hemifields. It is explicitly required in tracking (attend to a subset of object locations, suppress the others) but not in the standard visual search task (where all items are potential targets). We used a modified visual search task in which observers searched for a target within a subset of display items, where the subset was selected based on location (Experiments 1 and 3A) or based on a salient feature difference (Experiments 2 and 3B). The results show hemifield independence in this subset visual search task with location-based selection but not with feature-based selection; this effect cannot be explained by general difficulty (Experiment 4). Combined, these findings suggest that hemifield independence is a signature of multifocal spatial attention and highlight the need for cognitive and neural theories of attention to account for anatomical constraints on selection mechanisms. PMID:22637710
2015-01-01
Background Investigations into novel biomarkers using omics techniques generate large amounts of data. Due to their size and numbers of attributes, these data are suitable for analysis with machine learning methods. A key component of typical machine learning pipelines for omics data is feature selection, which is used to reduce the raw high-dimensional data into a tractable number of features. Feature selection needs to balance the objective of using as few features as possible, while maintaining high predictive power. This balance is crucial when the goal of data analysis is the identification of highly accurate but small panels of biomarkers with potential clinical utility. In this paper we propose a heuristic for the selection of very small feature subsets, via an iterative feature elimination process that is guided by rule-based machine learning, called RGIFE (Rule-guided Iterative Feature Elimination). We use this heuristic to identify putative biomarkers of osteoarthritis (OA), articular cartilage degradation and synovial inflammation, using both proteomic and transcriptomic datasets. Results and discussion Our RGIFE heuristic increased the classification accuracies achieved for all datasets when no feature selection is used, and performed well in a comparison with other feature selection methods. Using this method the datasets were reduced to a smaller number of genes or proteins, including those known to be relevant to OA, cartilage degradation and joint inflammation. The results have shown the RGIFE feature reduction method to be suitable for analysing both proteomic and transcriptomics data. Methods that generate large ‘omics’ datasets are increasingly being used in the area of rheumatology. Conclusions Feature reduction methods are advantageous for the analysis of omics data in the field of rheumatology, as the applications of such techniques are likely to result in improvements in diagnosis, treatment and drug discovery. PMID:25923811
NASA Astrophysics Data System (ADS)
Attallah, Bilal; Serir, Amina; Chahir, Youssef; Boudjelal, Abdelwahhab
2017-11-01
Palmprint recognition systems are dependent on feature extraction. A method of feature extraction using higher discrimination information was developed to characterize palmprint images. In this method, two individual feature extraction techniques are applied to a discrete wavelet transform of a palmprint image, and their outputs are fused. The two techniques used in the fusion are the histogram of gradient and the binarized statistical image features. They are then evaluated using an extreme learning machine classifier before selecting a feature based on principal component analysis. Three palmprint databases, the Hong Kong Polytechnic University (PolyU) Multispectral Palmprint Database, Hong Kong PolyU Palmprint Database II, and the Delhi Touchless (IIDT) Palmprint Database, are used in this study. The study shows that our method effectively identifies and verifies palmprints and outperforms other methods based on feature extraction.
Analysis of A Drug Target-based Classification System using Molecular Descriptors.
Lu, Jing; Zhang, Pin; Bi, Yi; Luo, Xiaomin
2016-01-01
Drug-target interaction is an important topic in drug discovery and drug repositioning. KEGG database offers a drug annotation and classification using a target-based classification system. In this study, we gave an investigation on five target-based classes: (I) G protein-coupled receptors; (II) Nuclear receptors; (III) Ion channels; (IV) Enzymes; (V) Pathogens, using molecular descriptors to represent each drug compound. Two popular feature selection methods, maximum relevance minimum redundancy and incremental feature selection, were adopted to extract the important descriptors. Meanwhile, an optimal prediction model based on nearest neighbor algorithm was constructed, which got the best result in identifying drug target-based classes. Finally, some key descriptors were discussed to uncover their important roles in the identification of drug-target classes.
Sui, Jing; Adali, Tülay; Pearlson, Godfrey D.; Calhoun, Vince D.
2013-01-01
Extraction of relevant features from multitask functional MRI (fMRI) data in order to identify potential biomarkers for disease, is an attractive goal. In this paper, we introduce a novel feature-based framework, which is sensitive and accurate in detecting group differences (e.g. controls vs. patients) by proposing three key ideas. First, we integrate two goal-directed techniques: coefficient-constrained independent component analysis (CC-ICA) and principal component analysis with reference (PCA-R), both of which improve sensitivity to group differences. Secondly, an automated artifact-removal method is developed for selecting components of interest derived from CC-ICA, with an average accuracy of 91%. Finally, we propose a strategy for optimal feature/component selection, aiming to identify optimal group-discriminative brain networks as well as the tasks within which these circuits are engaged. The group-discriminating performance is evaluated on 15 fMRI feature combinations (5 single features and 10 joint features) collected from 28 healthy control subjects and 25 schizophrenia patients. Results show that a feature from a sensorimotor task and a joint feature from a Sternberg working memory (probe) task and an auditory oddball (target) task are the top two feature combinations distinguishing groups. We identified three optimal features that best separate patients from controls, including brain networks consisting of temporal lobe, default mode and occipital lobe circuits, which when grouped together provide improved capability in classifying group membership. The proposed framework provides a general approach for selecting optimal brain networks which may serve as potential biomarkers of several brain diseases and thus has wide applicability in the neuroimaging research community. PMID:19457398
Wavelet-based energy features for glaucomatous image classification.
Dua, Sumeet; Acharya, U Rajendra; Chowriappa, Pradeep; Sree, S Vinitha
2012-01-01
Texture features within images are actively pursued for accurate and efficient glaucoma classification. Energy distribution over wavelet subbands is applied to find these important texture features. In this paper, we investigate the discriminatory potential of wavelet features obtained from the daubechies (db3), symlets (sym3), and biorthogonal (bio3.3, bio3.5, and bio3.7) wavelet filters. We propose a novel technique to extract energy signatures obtained using 2-D discrete wavelet transform, and subject these signatures to different feature ranking and feature selection strategies. We have gauged the effectiveness of the resultant ranked and selected subsets of features using a support vector machine, sequential minimal optimization, random forest, and naïve Bayes classification strategies. We observed an accuracy of around 93% using tenfold cross validations to demonstrate the effectiveness of these methods.
Facial expression reconstruction on the basis of selected vertices of triangle mesh
NASA Astrophysics Data System (ADS)
Peszor, Damian; Wojciechowska, Marzena
2016-06-01
Facial expression reconstruction is an important issue in the field of computer graphics. While it is relatively easy to create an animation based on meshes constructed through video recordings, this kind of high-quality data is often not transferred to another model because of lack of intermediary, anthropometry-based way to do so. However, if a high-quality mesh is sampled with sufficient density, it is possible to use obtained feature points to encode the shape of surrounding vertices in a way that can be easily transferred to another mesh with corresponding feature points. In this paper we present a method used for obtaining information for the purpose of reconstructing changes in facial surface on the basis of selected feature points.
Kruskal-Wallis-based computationally efficient feature selection for face recognition.
Ali Khan, Sajid; Hussain, Ayyaz; Basit, Abdul; Akram, Sheeraz
2014-01-01
Face recognition in today's technological world, and face recognition applications attain much more importance. Most of the existing work used frontal face images to classify face image. However these techniques fail when applied on real world face images. The proposed technique effectively extracts the prominent facial features. Most of the features are redundant and do not contribute to representing face. In order to eliminate those redundant features, computationally efficient algorithm is used to select the more discriminative face features. Extracted features are then passed to classification step. In the classification step, different classifiers are ensemble to enhance the recognition accuracy rate as single classifier is unable to achieve the high accuracy. Experiments are performed on standard face database images and results are compared with existing techniques.
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
Fukunaga-Koontz transform based dimensionality reduction for hyperspectral imagery
NASA Astrophysics Data System (ADS)
Ochilov, S.; Alam, M. S.; Bal, A.
2006-05-01
Fukunaga-Koontz Transform based technique offers some attractive properties for desired class oriented dimensionality reduction in hyperspectral imagery. In FKT, feature selection is performed by transforming into a new space where feature classes have complimentary eigenvectors. Dimensionality reduction technique based on these complimentary eigenvector analysis can be described under two classes, desired class and background clutter, such that each basis function best represent one class while carrying the least amount of information from the second class. By selecting a few eigenvectors which are most relevant to desired class, one can reduce the dimension of hyperspectral cube. Since the FKT based technique reduces data size, it provides significant advantages for near real time detection applications in hyperspectral imagery. Furthermore, the eigenvector selection approach significantly reduces computation burden via the dimensionality reduction processes. The performance of the proposed dimensionality reduction algorithm has been tested using real-world hyperspectral dataset.
Magnetic field feature extraction and selection for indoor location estimation.
Galván-Tejada, Carlos E; García-Vázquez, Juan Pablo; Brena, Ramon F
2014-06-20
User indoor positioning has been under constant improvement especially with the availability of new sensors integrated into the modern mobile devices, which allows us to exploit not only infrastructures made for everyday use, such as WiFi, but also natural infrastructure, as is the case of natural magnetic field. In this paper we present an extension and improvement of our current indoor localization model based on the feature extraction of 46 magnetic field signal features. The extension adds a feature selection phase to our methodology, which is performed through Genetic Algorithm (GA) with the aim of optimizing the fitness of our current model. In addition, we present an evaluation of the final model in two different scenarios: home and office building. The results indicate that performing a feature selection process allows us to reduce the number of signal features of the model from 46 to 5 regardless the scenario and room location distribution. Further, we verified that reducing the number of features increases the probability of our estimator correctly detecting the user's location (sensitivity) and its capacity to detect false positives (specificity) in both scenarios.
Cortical processing of dynamic sound envelope transitions.
Zhou, Yi; Wang, Xiaoqin
2010-12-08
Slow envelope fluctuations in the range of 2-20 Hz provide important segmental cues for processing communication sounds. For a successful segmentation, a neural processor must capture envelope features associated with the rise and fall of signal energy, a process that is often challenged by the interference of background noise. This study investigated the neural representations of slowly varying envelopes in quiet and in background noise in the primary auditory cortex (A1) of awake marmoset monkeys. We characterized envelope features based on the local average and rate of change of sound level in envelope waveforms and identified envelope features to which neurons were selective by reverse correlation. Our results showed that envelope feature selectivity of A1 neurons was correlated with the degree of nonmonotonicity in their static rate-level functions. Nonmonotonic neurons exhibited greater feature selectivity than monotonic neurons in quiet and in background noise. The diverse envelope feature selectivity decreased spike-timing correlation among A1 neurons in response to the same envelope waveforms. As a result, the variability, but not the average, of the ensemble responses of A1 neurons represented more faithfully the dynamic transitions in low-frequency sound envelopes both in quiet and in background noise.
Zheng, Lu-Lu; Niu, Shen; Hao, Pei; Feng, KaiYan; Cai, Yu-Dong; Li, Yixue
2011-01-01
Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum of MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations. PMID:22174779
Adversarial Feature Selection Against Evasion Attacks.
Zhang, Fei; Chan, Patrick P K; Biggio, Battista; Yeung, Daniel S; Roli, Fabio
2016-03-01
Pattern recognition and machine learning techniques have been increasingly adopted in adversarial settings such as spam, intrusion, and malware detection, although their security against well-crafted attacks that aim to evade detection by manipulating data at test time has not yet been thoroughly assessed. While previous work has been mainly focused on devising adversary-aware classification algorithms to counter evasion attempts, only few authors have considered the impact of using reduced feature sets on classifier security against the same attacks. An interesting, preliminary result is that classifier security to evasion may be even worsened by the application of feature selection. In this paper, we provide a more detailed investigation of this aspect, shedding some light on the security properties of feature selection against evasion attacks. Inspired by previous work on adversary-aware classifiers, we propose a novel adversary-aware feature selection model that can improve classifier security against evasion attacks, by incorporating specific assumptions on the adversary's data manipulation strategy. We focus on an efficient, wrapper-based implementation of our approach, and experimentally validate its soundness on different application examples, including spam and malware detection.
Ocular tracking responses to background motion gated by feature-based attention.
Souto, David; Kerzel, Dirk
2014-09-01
Involuntary ocular tracking responses to background motion offer a window on the dynamics of motion computations. In contrast to spatial attention, we know little about the role of feature-based attention in determining this ocular response. To probe feature-based effects of background motion on involuntary eye movements, we presented human observers with a balanced background perturbation. Two clouds of dots moved in opposite vertical directions while observers tracked a target moving in horizontal direction. Additionally, they had to discriminate a change in the direction of motion (±10° from vertical) of one of the clouds. A vertical ocular following response occurred in response to the motion of the attended cloud. When motion selection was based on motion direction and color of the dots, the peak velocity of the tracking response was 30% of the tracking response elicited in a single task with only one direction of background motion. In two other experiments, we tested the effect of the perturbation when motion selection was based on color, by having motion direction vary unpredictably, or on motion direction alone. Although the gain of pursuit in the horizontal direction was significantly reduced in all experiments, indicating a trade-off between perceptual and oculomotor tasks, ocular responses to perturbations were only observed when selection was based on both motion direction and color. It appears that selection by motion direction can only be effective for driving ocular tracking when the relevant elements can be segregated before motion onset. Copyright © 2014 the American Physiological Society.
Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina
2016-01-01
The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein subGolgi localizations may assist in drug development and understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for identifying cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search the optimal features from the CSP based features and g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Through the jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthew’s Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method to identify Golgi-resident protein types. Furthermore, the CSP based feature extraction method may provide guidelines for protein function predictions. PMID:26861308
Development of a fuzzy logic expert system for pile selection. Master's thesis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ulshafer, M.L.
1989-01-01
This thesis documents the development of prototype expert system for pile selection for use on microcomputers. It concerns the initial selection of a pile foundation taking into account the parameters such as soil condition, pile length, loading scenario, material availability, contractor experience, and noise or vibration constraints. The prototype expert system called Pile Selection, version 1 (PS1) was developed using an expert system shell FLOPS. FLOPS is a shell based on the AI language OPS5 with many unique features. The system PS1 utilizes all of these unique features. Among the features used are approximate reasoning with fuzzy set theory, themore » blackboard architecture, and the emulated parallel processing of fuzzy production rules. A comprehensive review of the parameters used in selecting a pile was made, and the effects of the uncertainties associated with the vagueness of these parameters was examined in detail. Fuzzy set theory was utilized to deal with such uncertainties and provides the basis for developing a method for determining the best possible choice of piles for a given situation. Details of the development of PS1, including documenting and collating pile information for use in the expert knowledge data bases, are discussed.« less
Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei
2014-09-01
In this paper, Back-propagation(BP) algorithm has been used to train the feed forward neural network for human activity recognition in smart home environments, and inter-class distance method for feature selection of observed motion sensor events is discussed and tested. And then, the human activity recognition performances of neural network using BP algorithm have been evaluated and compared with other probabilistic algorithms: Naïve Bayes(NB) classifier and Hidden Markov Model(HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, neural network using BP algorithm has relatively better human activity recognition performances than NB classifier and HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Combined rule extraction and feature elimination in supervised classification.
Liu, Sheng; Patel, Ronak Y; Daga, Pankaj R; Liu, Haining; Fu, Gang; Doerksen, Robert J; Chen, Yixin; Wilkins, Dawn E
2012-09-01
There are a vast number of biology related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.
On equivalent parameter learning in simplified feature space based on Bayesian asymptotic analysis.
Yamazaki, Keisuke
2012-07-01
Parametric models for sequential data, such as hidden Markov models, stochastic context-free grammars, and linear dynamical systems, are widely used in time-series analysis and structural data analysis. Computation of the likelihood function is one of primary considerations in many learning methods. Iterative calculation of the likelihood such as the model selection is still time-consuming though there are effective algorithms based on dynamic programming. The present paper studies parameter learning in a simplified feature space to reduce the computational cost. Simplifying data is a common technique seen in feature selection and dimension reduction though an oversimplified space causes adverse learning results. Therefore, we mathematically investigate a condition of the feature map to have an asymptotically equivalent convergence point of estimated parameters, referred to as the vicarious map. As a demonstration to find vicarious maps, we consider the feature space, which limits the length of data, and derive a necessary length for parameter learning in hidden Markov models. Copyright © 2012 Elsevier Ltd. All rights reserved.
Iliyasu, Abdullah M; Fatichah, Chastine
2017-12-19
A quantum hybrid (QH) intelligent approach that blends the adaptive search capability of the quantum-behaved particle swarm optimisation (QPSO) method with the intuitionistic rationality of traditional fuzzy k -nearest neighbours (Fuzzy k -NN) algorithm (known simply as the Q-Fuzzy approach) is proposed for efficient feature selection and classification of cells in cervical smeared (CS) images. From an initial multitude of 17 features describing the geometry, colour, and texture of the CS images, the QPSO stage of our proposed technique is used to select the best subset features (i.e., global best particles) that represent a pruned down collection of seven features. Using a dataset of almost 1000 images, performance evaluation of our proposed Q-Fuzzy approach assesses the impact of our feature selection on classification accuracy by way of three experimental scenarios that are compared alongside two other approaches: the All-features (i.e., classification without prior feature selection) and another hybrid technique combining the standard PSO algorithm with the Fuzzy k -NN technique (P-Fuzzy approach). In the first and second scenarios, we further divided the assessment criteria in terms of classification accuracy based on the choice of best features and those in terms of the different categories of the cervical cells. In the third scenario, we introduced new QH hybrid techniques, i.e., QPSO combined with other supervised learning methods, and compared the classification accuracy alongside our proposed Q-Fuzzy approach. Furthermore, we employed statistical approaches to establish qualitative agreement with regards to the feature selection in the experimental scenarios 1 and 3. The synergy between the QPSO and Fuzzy k -NN in the proposed Q-Fuzzy approach improves classification accuracy as manifest in the reduction in number cell features, which is crucial for effective cervical cancer detection and diagnosis.
Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection
Kim, Sungho; Song, Woo-Jin; Kim, So-Hyun
2016-01-01
Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR) images or infrared (IR) images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT) and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter) and an asymmetric morphological closing filter (AMCF, post-filter) into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC)-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic database generated by OKTAL-SE. PMID:27447635
Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection.
Kim, Sungho; Song, Woo-Jin; Kim, So-Hyun
2016-07-19
Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR) images or infrared (IR) images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT) and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter) and an asymmetric morphological closing filter (AMCF, post-filter) into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC)-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic database generated by OKTAL-SE.
Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants.
Yousef, Malik; Saçar Demirci, Müşerref Duygu; Khalifa, Waleed; Allmer, Jens
2016-01-01
MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems prerogative to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6% thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
Database Selection: One Size Does Not Fit All.
ERIC Educational Resources Information Center
Allison, DeeAnn; McNeil, Beth; Swanson, Signe
2000-01-01
Describes a strategy for selecting a delivery method for electronic resources based on experiences at the University of Nebraska-Lincoln. Considers local conditions, pricing, feature options, hardware costs, and network availability and presents a model for evaluating the decision based on dollar requirements and local issues. (Author/LRW)
Sex determination based on a thoracic vertebra and ribs evaluation using clinical chest radiography.
Tsubaki, Shun; Morishita, Junji; Usumoto, Yosuke; Sakaguchi, Kyoko; Matsunobu, Yusuke; Kawazoe, Yusuke; Okumura, Miki; Ikeda, Noriaki
2017-07-01
Our aim was to investigate whether sex can be determined from a combination of geometric features obtained from the 10th thoracic vertebra, 6th rib, and 7th rib. Six hundred chest radiographs (300 males and 300 females) were randomly selected to include patients of six age groups (20s, 30s, 40s, 50s, 60s, and 70s). Each group included 100 images (50 males and 50 females). A total of 14 features, including 7 lengths, 5 indices for the vertebra, and 2 types of widths for ribs, were utilized and analyzed for sex determination. Dominant features contributing to sex determination were selected by stepwise discriminant analysis after checking the variance inflation factors for multicollinearity. The accuracy of sex determination using a combination of the vertebra and ribs was evaluated from the selected features by the stepwise discriminant analysis. The accuracies in each age group were also evaluated in this study. The accuracy of sex determination based on a combination of features of the vertebra and ribs was 88.8% (533/600). This performance was superior to that of the vertebra or ribs only. Moreover, sex determination of subjects in their 20s demonstrated the highest accuracy (96.0%, 96/100). The features selected in the stepwise discriminant analysis included some features in both the vertebra and ribs. These results indicate the usefulness of combined information obtained from the vertebra and ribs for sex determination. We conclude that a combination of geometric characteristics obtained from the vertebra and ribs could be useful for determining sex. Copyright © 2017 Elsevier B.V. All rights reserved.
Marafino, Ben J; Boscardin, W John; Dudley, R Adams
2015-04-01
Sparsity is often a desirable property of statistical models, and various feature selection methods exist so as to yield sparser and interpretable models. However, their application to biomedical text classification, particularly to mortality risk stratification among intensive care unit (ICU) patients, has not been thoroughly studied. To develop and characterize sparse classifiers based on the free text of nursing notes in order to predict ICU mortality risk and to discover text features most strongly associated with mortality. We selected nursing notes from the first 24h of ICU admission for 25,826 adult ICU patients from the MIMIC-II database. We then developed a pair of stochastic gradient descent-based classifiers with elastic-net regularization. We also studied the performance-sparsity tradeoffs of both classifiers as their regularization parameters were varied. The best-performing classifier achieved a 10-fold cross-validated AUC of 0.897 under the log loss function and full L2 regularization, while full L1 regularization used just 0.00025% of candidate input features and resulted in an AUC of 0.889. Using the log loss (range of AUCs 0.889-0.897) yielded better performance compared to the hinge loss (0.850-0.876), but the latter yielded even sparser models. Most features selected by both classifiers appear clinically relevant and correspond to predictors already present in existing ICU mortality models. The sparser classifiers were also able to discover a number of informative - albeit nonclinical - features. The elastic-net-regularized classifiers perform reasonably well and are capable of reducing the number of features required by over a thousandfold, with only a modest impact on performance. Copyright © 2015 Elsevier Inc. All rights reserved.
Naive Bayes Bearing Fault Diagnosis Based on Enhanced Independence of Data
Zhang, Nannan; Wu, Lifeng; Yang, Jing; Guan, Yong
2018-01-01
The bearing is the key component of rotating machinery, and its performance directly determines the reliability and safety of the system. Data-based bearing fault diagnosis has become a research hotspot. Naive Bayes (NB), which is based on independent presumption, is widely used in fault diagnosis. However, the bearing data are not completely independent, which reduces the performance of NB algorithms. In order to solve this problem, we propose a NB bearing fault diagnosis method based on enhanced independence of data. The method deals with data vector from two aspects: the attribute feature and the sample dimension. After processing, the classification limitation of NB is reduced by the independence hypothesis. First, we extract the statistical characteristics of the original signal of the bearings effectively. Then, the Decision Tree algorithm is used to select the important features of the time domain signal, and the low correlation features is selected. Next, the Selective Support Vector Machine (SSVM) is used to prune the dimension data and remove redundant vectors. Finally, we use NB to diagnose the fault with the low correlation data. The experimental results show that the independent enhancement of data is effective for bearing fault diagnosis. PMID:29401730
Jing, Luyang; Wang, Taiyong; Zhao, Ming; Wang, Peng
2017-01-01
A fault diagnosis approach based on multi-sensor data fusion is a promising tool to deal with complicated damage detection problems of mechanical systems. Nevertheless, this approach suffers from two challenges, which are (1) the feature extraction from various types of sensory data and (2) the selection of a suitable fusion level. It is usually difficult to choose an optimal feature or fusion level for a specific fault diagnosis task, and extensive domain expertise and human labor are also highly required during these selections. To address these two challenges, we propose an adaptive multi-sensor data fusion method based on deep convolutional neural networks (DCNN) for fault diagnosis. The proposed method can learn features from raw data and optimize a combination of different fusion levels adaptively to satisfy the requirements of any fault diagnosis task. The proposed method is tested through a planetary gearbox test rig. Handcraft features, manual-selected fusion levels, single sensory data, and two traditional intelligent models, back-propagation neural networks (BPNN) and a support vector machine (SVM), are used as comparisons in the experiment. The results demonstrate that the proposed method is able to detect the conditions of the planetary gearbox effectively with the best diagnosis accuracy among all comparative methods in the experiment. PMID:28230767
Rough-Fuzzy Clustering and Unsupervised Feature Selection for Wavelet Based MR Image Segmentation
Maji, Pradipta; Roy, Shaswati
2015-01-01
Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of brain magnetic resonance (MR) images. For many human experts, manual segmentation is a difficult and time consuming task, which makes an automated brain MR image segmentation method desirable. In this regard, this paper presents a new segmentation method for brain MR images, integrating judiciously the merits of rough-fuzzy computing and multiresolution image analysis technique. The proposed method assumes that the major brain tissues, namely, gray matter, white matter, and cerebrospinal fluid from the MR images are considered to have different textural properties. The dyadic wavelet analysis is used to extract the scale-space feature vector for each pixel, while the rough-fuzzy clustering is used to address the uncertainty problem of brain MR image segmentation. An unsupervised feature selection method is introduced, based on maximum relevance-maximum significance criterion, to select relevant and significant textural features for segmentation problem, while the mathematical morphology based skull stripping preprocessing step is proposed to remove the non-cerebral tissues like skull. The performance of the proposed method, along with a comparison with related approaches, is demonstrated on a set of synthetic and real brain MR images using standard validity indices. PMID:25848961
NASA Astrophysics Data System (ADS)
Sebatubun, M. M.; Haryawan, C.; Windarta, B.
2018-03-01
Lung cancer causes a high mortality rate in the world than any other cancers. That can be minimised if the symptoms and cancer cells have been detected early. One of the techniques used to detect lung cancer is by computed tomography (CT) scan. CT scan images have been used in this study to identify one of the lesion characteristics named ground glass opacity (GGO). It has been used to determine the level of malignancy of the lesion. There were three phases in identifying GGO: image cropping, feature extraction using grey level co-occurrence matrices (GLCM) and classification using Naïve Bayes Classifier. In order to improve the classification results, the most significant feature was sought by feature selection using gain ratio evaluation. Based on the results obtained, the most significant features could be identified by using feature selection method used in this research. The accuracy rate increased from 83.33% to 91.67%, the sensitivity from 82.35% to 94.11% and the specificity from 84.21% to 89.47%.
Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An
2015-01-01
There is an increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples, also known as the curse of dimensionality. It is desirable to employ feature selection algorithms for decreasing computation burden and increasing predicting accuracy, which are especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on the searching strategy namely complete search, heuristic search and random search. This review mainly introduced the fundamental of each algorithm, illustrated its applications in hyperspectral data analysis in the food field, and discussed the advantages and disadvantages of these algorithms. It is hoped that this review should provide a guideline for feature selections and data processing in the future development of hyperspectral imaging technique in foods.
An object-based visual attention model for robotic applications.
Yu, Yuanlong; Mann, George K I; Gosine, Raymond G
2010-10-01
By extending integrated competition hypothesis, this paper presents an object-based visual attention model, which selects one object of interest using low-dimensional features, resulting that visual perception starts from a fast attentional selection procedure. The proposed attention model involves seven modules: learning of object representations stored in a long-term memory (LTM), preattentive processing, top-down biasing, bottom-up competition, mediation between top-down and bottom-up ways, generation of saliency maps, and perceptual completion processing. It works in two phases: learning phase and attending phase. In the learning phase, the corresponding object representation is trained statistically when one object is attended. A dual-coding object representation consisting of local and global codings is proposed. Intensity, color, and orientation features are used to build the local coding, and a contour feature is employed to constitute the global coding. In the attending phase, the model preattentively segments the visual field into discrete proto-objects using Gestalt rules at first. If a task-specific object is given, the model recalls the corresponding representation from LTM and deduces the task-relevant feature(s) to evaluate top-down biases. The mediation between automatic bottom-up competition and conscious top-down biasing is then performed to yield a location-based saliency map. By combination of location-based saliency within each proto-object, the proto-object-based saliency is evaluated. The most salient proto-object is selected for attention, and it is finally put into the perceptual completion processing module to yield a complete object region. This model has been applied into distinct tasks of robots: detection of task-specific stationary and moving objects. Experimental results under different conditions are shown to validate this model.
Zyout, Imad; Czajkowska, Joanna; Grzegorzek, Marcin
2015-12-01
The high number of false positives and the resulting number of avoidable breast biopsies are the major problems faced by current mammography Computer Aided Detection (CAD) systems. False positive reduction is not only a requirement for mass but also for calcification CAD systems which are currently deployed for clinical use. This paper tackles two problems related to reducing the number of false positives in the detection of all lesions and masses, respectively. Firstly, textural patterns of breast tissue have been analyzed using several multi-scale textural descriptors based on wavelet and gray level co-occurrence matrix. The second problem addressed in this paper is the parameter selection and performance optimization. For this, we adopt a model selection procedure based on Particle Swarm Optimization (PSO) for selecting the most discriminative textural features and for strengthening the generalization capacity of the supervised learning stage based on a Support Vector Machine (SVM) classifier. For evaluating the proposed methods, two sets of suspicious mammogram regions have been used. The first one, obtained from Digital Database for Screening Mammography (DDSM), contains 1494 regions (1000 normal and 494 abnormal samples). The second set of suspicious regions was obtained from database of Mammographic Image Analysis Society (mini-MIAS) and contains 315 (207 normal and 108 abnormal) samples. Results from both datasets demonstrate the efficiency of using PSO based model selection for optimizing both classifier hyper-parameters and parameters, respectively. Furthermore, the obtained results indicate the promising performance of the proposed textural features and more specifically, those based on co-occurrence matrix of wavelet image representation technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Targeted Feature Detection for Data-Dependent Shotgun Proteomics
2017-01-01
Label-free quantification of shotgun LC–MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification (“FFId”), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between “internal” and “external” (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the “uncertain” feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS (www.openms.org). PMID:28673088
Targeted Feature Detection for Data-Dependent Shotgun Proteomics.
Weisser, Hendrik; Choudhary, Jyoti S
2017-08-04
Label-free quantification of shotgun LC-MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification ("FFId"), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between "internal" and "external" (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the "uncertain" feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS ( www.openms.org ).
Evolutionary optimization of radial basis function classifiers for data mining applications.
Buchtala, Oliver; Klimek, Manuel; Sick, Bernhard
2005-10-01
In many data mining applications that address classification problems, feature and model selection are considered as key tasks. That is, appropriate input features of the classifier must be selected from a given (and often large) set of possible features and structure parameters of the classifier must be adapted with respect to these features and a given data set. This paper describes an evolutionary algorithm (EA) that performs feature and model selection simultaneously for radial basis function (RBF) classifiers. In order to reduce the optimization effort, various techniques are integrated that accelerate and improve the EA significantly: hybrid training of RBF networks, lazy evaluation, consideration of soft constraints by means of penalty terms, and temperature-based adaptive control of the EA. The feasibility and the benefits of the approach are demonstrated by means of four data mining problems: intrusion detection in computer networks, biometric signature verification, customer acquisition with direct marketing methods, and optimization of chemical production processes. It is shown that, compared to earlier EA-based RBF optimization techniques, the runtime is reduced by up to 99% while error rates are lowered by up to 86%, depending on the application. The algorithm is independent of specific applications so that many ideas and solutions can be transferred to other classifier paradigms.
Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar
2016-04-01
Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generates an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large amount of expression, but only a fraction of it comprises genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer are imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
Wang, Yun; Huang, Fangzhou
2018-01-01
The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible. PMID:29666661
Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou
2018-01-01
The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC 2 ), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.
Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming
2014-12-01
Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods including genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF) were utilized for selecting an important subset of features from a total of 16 clinical features. These feature selection methods were combined with support vector machine (SVM) for developing predictive models with better performance. Five-fold cross-validation was used to train and test SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation had averages of the sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve as 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM derived predictive model can provide suggestive high-risk recurrent patients, who should be closely followed up after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Combining Feature Extraction Methods to Assist the Diagnosis of Alzheimer's Disease.
Segovia, F; Górriz, J M; Ramírez, J; Phillips, C
2016-01-01
Neuroimaging data as (18)F-FDG PET is widely used to assist the diagnosis of Alzheimer's disease (AD). Looking for regions with hypoperfusion/ hypometabolism, clinicians may predict or corroborate the diagnosis of the patients. Modern computer aided diagnosis (CAD) systems based on the statistical analysis of whole neuroimages are more accurate than classical systems based on quantifying the uptake of some predefined regions of interests (ROIs). In addition, these new systems allow determining new ROIs and take advantage of the huge amount of information comprised in neuroimaging data. A major branch of modern CAD systems for AD is based on multivariate techniques, which analyse a neuroimage as a whole, considering not only the voxel intensities but also the relations among them. In order to deal with the vast dimensionality of the data, a number of feature extraction methods have been successfully applied. In this work, we propose a CAD system based on the combination of several feature extraction techniques. First, some commonly used feature extraction methods based on the analysis of the variance (as principal component analysis), on the factorization of the data (as non-negative matrix factorization) and on classical magnitudes (as Haralick features) were simultaneously applied to the original data. These feature sets were then combined by means of two different combination approaches: i) using a single classifier and a multiple kernel learning approach and ii) using an ensemble of classifier and selecting the final decision by majority voting. The proposed approach was evaluated using a labelled neuroimaging database along with a cross validation scheme. As conclusion, the proposed CAD system performed better than approaches using only one feature extraction technique. We also provide a fair comparison (using the same database) of the selected feature extraction methods.
ERIC Educational Resources Information Center
Klemm, E. Barbara; Iding, Marie K.; Crosby, Martha E.
This study addresses the need to develop research-based criteria for science teacher educators to use in preparing teachers to critically evaluate and select web-based resources for their students' use. The study focuses on the cognitive load imposed on the learner for tasks required in using text, illustrations, and other features of multi-…
Computer aided diagnosis based on medical image processing and artificial intelligence methods
NASA Astrophysics Data System (ADS)
Stoitsis, John; Valavanis, Ioannis; Mougiakakou, Stavroula G.; Golemati, Spyretta; Nikita, Alexandra; Nikita, Konstantina S.
2006-12-01
Advances in imaging technology and computer science have greatly enhanced interpretation of medical images, and contributed to early diagnosis. The typical architecture of a Computer Aided Diagnosis (CAD) system includes image pre-processing, definition of region(s) of interest, features extraction and selection, and classification. In this paper, the principles of CAD systems design and development are demonstrated by means of two examples. The first one focuses on the differentiation between symptomatic and asymptomatic carotid atheromatous plaques. For each plaque, a vector of texture and motion features was estimated, which was then reduced to the most robust ones by means of ANalysis of VAriance (ANOVA). Using fuzzy c-means, the features were then clustered into two classes. Clustering performances of 74%, 79%, and 84% were achieved for texture only, motion only, and combinations of texture and motion features, respectively. The second CAD system presented in this paper supports the diagnosis of focal liver lesions and is able to characterize liver tissue from Computed Tomography (CT) images as normal, hepatic cyst, hemangioma, and hepatocellular carcinoma. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of neural network classifiers. The achieved classification performance was 100%, 93.75% and 90.63% in the training, validation and testing set, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.
Featural and temporal attention selectively enhance task-appropriate representations in human V1
Warren, Scott; Yacoub, Essa; Ghose, Geoffrey
2015-01-01
Our perceptions are often shaped by focusing our attention toward specific features or periods of time irrespective of location. We explore the physiological bases of these non-spatial forms of attention by imaging brain activity while subjects perform a challenging change detection task. The task employs a continuously varying visual stimulus that, for any moment in time, selectively activates functionally distinct subpopulations of primary visual cortex (V1) neurons. When subjects are cued to the timing and nature of the change, the mapping of orientation preference across V1 was systematically shifts toward the cued stimulus just prior to its appearance. A simple linear model can explain this shift: attentional changes are selectively targeted toward neural subpopulations representing the attended feature at the times the feature was anticipated. Our results suggest that featural attention is mediated by a linear change in the responses of task-appropriate neurons across cortex during appropriate periods of time. PMID:25501983
SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.
Xu, Wenxuan; Zhang, Li; Lu, Yaping
2016-06-01
The prediction and recognition of promoter in human genome play an important role in DNA sequence analysis. Entropy, in Shannon sense, of information theory is a multiple utility in bioinformatic details analysis. The relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features to distinguish different regions of DNA sequences. In this paper, we choose context feature and use a set of methods of SD to select the most effective n-mers distinguishing promoter regions from other DNA regions in human genome. Extracted from the total possible combinations of n-mers, we can get four sparse distributions based on promoter and non-promoters training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specially, we combine the advantage of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep feature for promoter recognition. And then we apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. Framework is flexible that it can integrate new feature extraction or new classification models freely. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
Ozcift, Akin
2012-08-01
Parkinson disease (PD) is an age-related deterioration of certain nerve systems, which affects movement, balance, and muscle control of clients. PD is one of the common diseases which affect 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD and each record in the dataset is defined with 22 features. The diagnosis model first makes use of a linear SVM to select ten most relevant features from 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of classifiers are improved by the utilization of RF ensemble classification strategy. The results of the experiments are evaluated using three metrics; classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in literature. After all, application of RF ensemble classification scheme improved PD diagnosis in 5 of 6 classifiers significantly. We, numerically, obtained about 97% accuracy in RF ensemble of IBk (a K-Nearest Neighbor variant) algorithm, which is a quite high performance for Parkinson disease diagnosis.
NASA Astrophysics Data System (ADS)
Giana, Fabián Eduardo; Bonetto, Fabián José; Bellotti, Mariela Inés
2018-03-01
In this work we present an assay to discriminate between normal and cancerous cells. The method is based on the measurement of electrical impedance spectra of in vitro cell cultures. We developed a protocol consisting on four consecutive measurement phases, each of them designed to obtain different information about the cell cultures. Through the analysis of the measured data, 26 characteristic features were obtained for both cell types. From the complete set of features, we selected the most relevant in terms of their discriminant capacity by means of conventional statistical tests. A linear discriminant analysis was then carried out on the selected features, allowing the classification of the samples in normal or cancerous with 4.5% of false positives and no false negatives.
Giana, Fabián Eduardo; Bonetto, Fabián José; Bellotti, Mariela Inés
2018-03-01
In this work we present an assay to discriminate between normal and cancerous cells. The method is based on the measurement of electrical impedance spectra of in vitro cell cultures. We developed a protocol consisting on four consecutive measurement phases, each of them designed to obtain different information about the cell cultures. Through the analysis of the measured data, 26 characteristic features were obtained for both cell types. From the complete set of features, we selected the most relevant in terms of their discriminant capacity by means of conventional statistical tests. A linear discriminant analysis was then carried out on the selected features, allowing the classification of the samples in normal or cancerous with 4.5% of false positives and no false negatives.
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, R; Aguilera, T; Shultz, D
2014-06-15
Purpose: This study aims to develop predictive models of patient outcome by extracting advanced imaging features (i.e., Radiomics) from FDG-PET images. Methods: We acquired pre-treatment PET scans for 51 stage I NSCLC patients treated with SABR. We calculated 139 quantitative features from each patient PET image, including 5 morphological features, 8 statistical features, 27 texture features, and 100 features from the intensity-volume histogram. Based on the imaging features, we aim to distinguish between 2 risk groups of patients: those with regional failure or distant metastasis versus those without. We investigated 3 pattern classification algorithms: linear discriminant analysis (LDA), naive Bayesmore » (NB), and logistic regression (LR). To avoid the curse of dimensionality, we performed feature selection by first removing redundant features and then applying sequential forward selection using the wrapper approach. To evaluate the predictive performance, we performed 10-fold cross validation with 1000 random splits of the data and calculated the area under the ROC curve (AUC). Results: Feature selection identified 2 texture features (homogeneity and/or wavelet decompositions) for NB and LR, while for LDA SUVmax and one texture feature (correlation) were identified. All 3 classifiers achieved statistically significant improvements over conventional PET imaging metrics such as tumor volume (AUC = 0.668) and SUVmax (AUC = 0.737). Overall, NB achieved the best predictive performance (AUC = 0.806). This also compares favorably with MTV using the best threshold at an SUV of 11.6 (AUC = 0.746). At a sensitivity of 80%, NB achieved 69% specificity, while SUVmax and tumor volume only had 36% and 47% specificity. Conclusion: Through a systematic analysis of advanced PET imaging features, we are able to build models with improved predictive value over conventional imaging metrics. If validated in a large independent cohort, the proposed techniques could potentially aid in identifying patients who might benefit from adjuvant therapy.« less
Park, Sang-Hoon; Lee, David; Lee, Sang-Goog
2018-02-01
For the last few years, many feature extraction methods have been proposed based on biological signals. Among these, the brain signals have the advantage that they can be obtained, even by people with peripheral nervous system damage. Motor imagery electroencephalograms (EEG) are inexpensive to measure, offer a high temporal resolution, and are intuitive. Therefore, these have received a significant amount of attention in various fields, including signal processing, cognitive science, and medicine. The common spatial pattern (CSP) algorithm is a useful method for feature extraction from motor imagery EEG. However, performance degradation occurs in a small-sample setting (SSS), because the CSP depends on sample-based covariance. Since the active frequency range is different for each subject, it is also inconvenient to set the frequency range to be different every time. In this paper, we propose the feature extraction method based on a filter bank to solve these problems. The proposed method consists of five steps. First, motor imagery EEG is divided by a using filter bank. Second, the regularized CSP (R-CSP) is applied to the divided EEG. Third, we select the features according to mutual information based on the individual feature algorithm. Fourth, parameter sets are selected for the ensemble. Finally, we classify using ensemble based on features. The brain-computer interface competition III data set IVa is used to evaluate the performance of the proposed method. The proposed method improves the mean classification accuracy by 12.34%, 11.57%, 9%, 4.95%, and 4.47% compared with CSP, SR-CSP, R-CSP, filter bank CSP (FBCSP), and SR-FBCSP. Compared with the filter bank R-CSP ( , ), which is a parameter selection version of the proposed method, the classification accuracy is improved by 3.49%. In particular, the proposed method shows a large improvement in performance in the SSS.
An ensemble method for extracting adverse drug events from social media.
Liu, Jing; Zhao, Songzheng; Zhang, Xiaodi
2016-06-01
Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media. We develop a feature-based approach that utilizes various lexical, syntactic, and semantic features. Information-gain-based feature selection is performed to address high-dimensional features. Then, we evaluate the effectiveness of four well-known kernel-based approaches (i.e., subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and several ensembles that are generated by adopting different combination methods (i.e., majority voting, weighted averaging, and stacked generalization). All of the approaches are tested using three data sets: two health-related discussion forums and one general social networking site (i.e., Twitter). When investigating the contribution of each feature subset, the feature-based approach attains the best area under the receiver operating characteristics curve (AUC) values, which are 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When using classifier ensembles, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines. Our experimental results indicate that ADE extraction from social media can benefit from feature selection. With respect to the effectiveness of different feature subsets, lexical features and semantic features can enhance the ADE extraction capability. Kernel-based approaches, which can stay away from the feature sparsity issue, are qualified to address the ADE extraction problem. Combining different individual classifiers using suitable combination methods can further enhance the ADE extraction effectiveness. Copyright © 2016 Elsevier B.V. All rights reserved.
Prediction of lysine ubiquitylation with ensemble classifier and feature selection.
Zhao, Xiaowei; Li, Xiangtao; Ma, Zhiqiang; Yin, Minghao
2011-01-01
Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.
Pan, Xiaoyong; Hu, Xiaohua; Zhang, Yu Hang; Feng, Kaiyan; Wang, Shao Peng; Chen, Lei; Huang, Tao; Cai, Yu Dong
2018-04-12
Atrioventricular septal defect (AVSD) is a clinically significant subtype of congenital heart disease (CHD) that severely influences the health of babies during birth and is associated with Down syndrome (DS). Thus, exploring the differences in functional genes in DS samples with and without AVSD is a critical way to investigate the complex association between AVSD and DS. In this study, we present a computational method to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN). First, each patient was encoded by using the copy number of probes on chromosome 21. The encoded features were ranked by the reliable Monte Carlo feature selection (MCFS) method to obtain a ranked feature list. Based on this feature list, we used a two-stage incremental feature selection to construct two series of feature subsets and applied SNNs to build classifiers to identify optimal features. Results show that 2737 optimal features were obtained, and the corresponding optimal SNN classifier constructed on optimal features yielded a Matthew's correlation coefficient (MCC) value of 0.748. For comparison, random forest was also used to build classifiers and uncover optimal features. This method received an optimal MCC value of 0.582 when top 132 features were utilized. Finally, we analyzed some key features derived from the optimal features in SNNs found in literature support to further reveal their essential roles.
NASA Astrophysics Data System (ADS)
Tan, Maxine; Leader, Joseph K.; Liu, Hong; Zheng, Bin
2015-03-01
We recently investigated a new mammographic image feature based risk factor to predict near-term breast cancer risk after a woman has a negative mammographic screening. We hypothesized that unlike the conventional epidemiology-based long-term (or lifetime) risk factors, the mammographic image feature based risk factor value will increase as the time lag between the negative and positive mammography screening decreases. The purpose of this study is to test this hypothesis. From a large and diverse full-field digital mammography (FFDM) image database with 1278 cases, we collected all available sequential FFDM examinations for each case including the "current" and 1 to 3 most recently "prior" examinations. All "prior" examinations were interpreted negative, and "current" ones were either malignant or recalled negative/benign. We computed 92 global mammographic texture and density based features, and included three clinical risk factors (woman's age, family history and subjective breast density BIRADS ratings). On this initial feature set, we applied a fast and accurate Sequential Forward Floating Selection (SFFS) feature selection algorithm to reduce feature dimensionality. The features computed on both mammographic views were individually/ separately trained using two artificial neural network (ANN) classifiers. The classification scores of the two ANNs were then merged with a sequential ANN. The results show that the maximum adjusted odds ratios were 5.59, 7.98, and 15.77 for using the 3rd, 2nd, and 1st "prior" FFDM examinations, respectively, which demonstrates a higher association of mammographic image feature change and an increasing risk trend of developing breast cancer in the near-term after a negative screening.
2010-01-01
Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480
Recursive feature selection with significant variables of support vectors.
Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh
2012-01-01
The development of DNA microarray makes researchers screen thousands of genes simultaneously and it also helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most of the gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to choose genes. However, the parameter setting may not be compatible to the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in support vector machine. We compared the performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable to attain good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
Exploration of complex visual feature spaces for object perception
Leeds, Daniel D.; Pyles, John A.; Tarr, Michael J.
2014-01-01
The mid- and high-level visual properties supporting object perception in the ventral visual pathway are poorly understood. In the absence of well-specified theory, many groups have adopted a data-driven approach in which they progressively interrogate neural units to establish each unit's selectivity. Such methods are challenging in that they require search through a wide space of feature models and stimuli using a limited number of samples. To more rapidly identify higher-level features underlying human cortical object perception, we implemented a novel functional magnetic resonance imaging method in which visual stimuli are selected in real-time based on BOLD responses to recently shown stimuli. This work was inspired by earlier primate physiology work, in which neural selectivity for mid-level features in IT was characterized using a simple parametric approach (Hung et al., 2012). To extend such work to human neuroimaging, we used natural and synthetic object stimuli embedded in feature spaces constructed on the basis of the complex visual properties of the objects themselves. During fMRI scanning, we employed a real-time search method to control continuous stimulus selection within each image space. This search was designed to maximize neural responses across a pre-determined 1 cm3 brain region within ventral cortex. To assess the value of this method for understanding object encoding, we examined both the behavior of the method itself and the complex visual properties the method identified as reliably activating selected brain regions. We observed: (1) Regions selective for both holistic and component object features and for a variety of surface properties; (2) Object stimulus pairs near one another in feature space that produce responses at the opposite extremes of the measured activity range. Together, these results suggest that real-time fMRI methods may yield more widely informative measures of selectivity within the broad classes of visual features associated with cortical object representation. PMID:25309408
Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou
2016-01-01
The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming more important for the research of drug design and cancer development. Based on our previous methods (APIS and KFC2), here we proposed a novel hot spot prediction method. For each hot spot residue, we firstly constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional ones and new one (pseudo hydrophobicity) exploited in this study. We then selected 3 top-ranking features that contribute the most in the classification by a two-step feature selection process consisting of minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When testing our model on an independent test set, our method showed the highest F1-score of 0.70 and MCC of 0.46 comparing with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
PCA based feature reduction to improve the accuracy of decision tree c4.5 classification
NASA Astrophysics Data System (ADS)
Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.
2018-03-01
Splitting attribute is a major process in Decision Tree C4.5 classification. However, this process does not give a significant impact on the establishment of the decision tree in terms of removing irrelevant features. It is a major problem in decision tree classification process called over-fitting resulting from noisy data and irrelevant features. In turns, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and overfitting on classifications Decision Tree C4.5. Feature reduction is one of important issues in classification model which is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high dimensional data to low dimensional data with non-correlated attributes. In this research, we proposed a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and Decision Tree C4.5 algorithm for the classification. From the experiments conducted using available data sets from UCI Cervical cancer data set repository with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework is robust to enhance classification accuracy with 90.70% accuracy rates.
A novel image registration approach via combining local features and geometric invariants
Lu, Yan; Gao, Kun; Zhang, Tinghua; Xu, Tingfa
2018-01-01
Image registration is widely used in many fields, but the adaptability of the existing methods is limited. This work proposes a novel image registration method with high precision for various complex applications. In this framework, the registration problem is divided into two stages. First, we detect and describe scale-invariant feature points using modified computer vision-oriented fast and rotated brief (ORB) algorithm, and a simple method to increase the performance of feature points matching is proposed. Second, we develop a new local constraint of rough selection according to the feature distances. Evidence shows that the existing matching techniques based on image features are insufficient for the images with sparse image details. Then, we propose a novel matching algorithm via geometric constraints, and establish local feature descriptions based on geometric invariances for the selected feature points. Subsequently, a new price function is constructed to evaluate the similarities between points and obtain exact matching pairs. Finally, we employ the progressive sample consensus method to remove wrong matches and calculate the space transform parameters. Experimental results on various complex image datasets verify that the proposed method is more robust and significantly reduces the rate of false matches while retaining more high-quality feature points. PMID:29293595
Feature Selection for Chemical Sensor Arrays Using Mutual Information
Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.
2014-01-01
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
Object-based attention underlies the rehearsal of feature binding in visual working memory.
Shen, Mowei; Huang, Xiang; Gao, Zaifeng
2015-04-01
Feature binding is a core concept in many research fields, including the study of working memory (WM). Over the past decade, it has been debated whether keeping the feature binding in visual WM consumes more visual attention than the constituent single features. Previous studies have only explored the contribution of domain-general attention or space-based attention in the binding process; no study so far has explored the role of object-based attention in retaining binding in visual WM. We hypothesized that object-based attention underlay the mechanism of rehearsing feature binding in visual WM. Therefore, during the maintenance phase of a visual WM task, we inserted a secondary mental rotation (Experiments 1-3), transparent motion (Experiment 4), or an object-based feature report task (Experiment 5) to consume the object-based attention available for binding. In line with the prediction of the object-based attention hypothesis, Experiments 1-5 revealed a more significant impairment for binding than for constituent single features. However, this selective binding impairment was not observed when inserting a space-based visual search task (Experiment 6). We conclude that object-based attention underlies the rehearsal of binding representation in visual WM. (c) 2015 APA, all rights reserved.
SU-F-R-14: PET Based Radiomics to Predict Outcomes in Patients with Hodgkin Lymphoma
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, J; Aristophanous, M; Akhtari, M
Purpose: To identify PET-based radiomics features associated with high refractory/relapsed disease risk for Hodgkin lymphoma patients. Methods: A total of 251 Hodgkin lymphoma patients including 19 primary refractory and 9 relapsed patients were investigated. All patients underwent an initial pre-treatment diagnostic FDG PET/CT scan. All cancerous lymph node regions (ROIs) were delineated by an experienced physician based on thresholding each volume of disease in the anatomical regions to SUV>2.5. We extracted 122 image features and evaluated the effect of ROI selection (the largest ROI, the ROI with highest mean SUV, merged ROI, and a single anatomic region [e.g. mediastinum]) onmore » classification accuracy. Random forest was used as a classifier and ROC analysis was used to assess the relationship between selected features and patient’s outcome status. Results: Each patient had between 1 and 9 separate ROIs, with much intra-patient variability in PET features. The best model, which used features from a single anatomic region (the mediastinal ROI, only volumes>5cc: 169 patients with 12 primary refractory) had a classification accuracy of 80.5% for primary refractory disease. The top five features, based on Gini index, consist of shape features (max 3D-diameter and volume) and texture features (correlation and information measure of correlation1&2). In the ROC analysis, sensitivity and specificity of the best model were 0.92 and 0.80, respectively. The area under the ROC (AUC) and the accuracy were 0.86 and 0.86, respectively. The classification accuracy was less than 60% for other ROI models or when ROIs less than 5cc were included. Conclusion: This study showed that PET-based radiomics features from the mediastinal lymph region are associated with primary refractory disease and therefore may play an important role in predicting outcomes in Hodgkin lymphoma patients. These features could be additive beyond baseline tumor and clinical characteristics, and may warrant more aggressive treatment.« less
Zhang, Yu; Zhou, Guoxu; Jin, Jing; Wang, Xingyu; Cichocki, Andrzej
2015-11-30
Common spatial pattern (CSP) has been most popularly applied to motor-imagery (MI) feature extraction for classification in brain-computer interface (BCI) application. Successful application of CSP depends on the filter band selection to a large degree. However, the most proper band is typically subject-specific and can hardly be determined manually. This study proposes a sparse filter band common spatial pattern (SFBCSP) for optimizing the spatial patterns. SFBCSP estimates CSP features on multiple signals that are filtered from raw EEG data at a set of overlapping bands. The filter bands that result in significant CSP features are then selected in a supervised way by exploiting sparse regression. A support vector machine (SVM) is implemented on the selected features for MI classification. Two public EEG datasets (BCI Competition III dataset IVa and BCI Competition IV IIb) are used to validate the proposed SFBCSP method. Experimental results demonstrate that SFBCSP help improve the classification performance of MI. The optimized spatial patterns by SFBCSP give overall better MI classification accuracy in comparison with several competing methods. The proposed SFBCSP is a potential method for improving the performance of MI-based BCI. Copyright © 2015 Elsevier B.V. All rights reserved.
Nicolay, Amélie; Tilley, T Don
2018-05-31
Metal-metal cooperation is integral to the function of many enzymes and materials, and model complexes hold enormous potential for providing insights into the capabilities of analogous multimetallic cores. However, the selective synthesis of heterobimetallic complexes still presents a significant challenge, especially for systems that hold the metals in close proximity and feature open or reactive coordination sites for both metals. To address this issue, a rigid, naphthyridine-based dinucleating ligand featuring distinct binding environments was synthesized. This ligand enables the selective synthesis of a series of MIICuI bimetallic complexes (M = Mn, Fe, Co, Ni, Cu, Zn), in which each metal center exclusively occupies its preferred binding pocket, from simple chloride salts. The precision of this selectivity is evident from cyclic voltammetry, ESI-MS and anomalous X-ray diffraction measurements. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Randhawa, Vinay; Kumar Singh, Anil; Acharya, Vishal
2015-12-01
Systems-biology inspired identification of drug targets and machine learning-based screening of small molecules which modulate their activity have the potential to revolutionize modern drug discovery by complementing conventional methods. To utilize the effectiveness of such pipelines, we first analyzed the dysregulated gene pairs between control and tumor samples and then implemented an ensemble-based feature selection approach to prioritize targets in oral squamous cell carcinoma (OSCC) for therapeutic exploration. Based on the structural information of known inhibitors of CXCR4-one of the best targets identified in this study-a feature selection was implemented for the identification of optimal structural features (molecular descriptor) based on which a classification model was generated. Furthermore, the CXCR4-centered descriptor-based classification model was finally utilized to screen a repository of plant derived small-molecules to obtain potential inhibitors. The application of our methodology may assist effective selection of the best targets which may have previously been overlooked, that in turn will lead to the development of new oral cancer medications. The small molecules identified in this study can be ideal candidates for trials as potential novel anti-oral cancer agents. Importantly, distinct steps of this whole study may provide reference for the analysis of other complex human diseases.
Graña, M; Termenon, M; Savio, A; Gonzalez-Pinto, A; Echeveste, J; Pérez, J M; Besga, A
2011-09-20
The aim of this paper is to obtain discriminant features from two scalar measures of Diffusion Tensor Imaging (DTI) data, Fractional Anisotropy (FA) and Mean Diffusivity (MD), and to train and test classifiers able to discriminate Alzheimer's Disease (AD) patients from controls on the basis of features extracted from the FA or MD volumes. In this study, support vector machine (SVM) classifier was trained and tested on FA and MD data. Feature selection is done computing the Pearson's correlation between FA or MD values at voxel site across subjects and the indicative variable specifying the subject class. Voxel sites with high absolute correlation are selected for feature extraction. Results are obtained over an on-going study in Hospital de Santiago Apostol collecting anatomical T1-weighted MRI volumes and DTI data from healthy control subjects and AD patients. FA features and a linear SVM classifier achieve perfect accuracy, sensitivity and specificity in several cross-validation studies, supporting the usefulness of DTI-derived features as an image-marker for AD and to the feasibility of building Computer Aided Diagnosis systems for AD based on them. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
YamiPred: A Novel Evolutionary Method for Predicting Pre-miRNAs and Selecting Relevant Features.
Kleftogiannis, Dimitrios; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina
2015-01-01
MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameters optimization. YamiPred was tested in a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset has achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs to other organisms including the category of viruses.
Guo, Xinyu; Dominick, Kelli C; Minai, Ali A; Li, Hailong; Erickson, Craig A; Lu, Long J
2017-01-01
The whole-brain functional connectivity (FC) pattern obtained from resting-state functional magnetic resonance imaging data are commonly applied to study neuropsychiatric conditions such as autism spectrum disorder (ASD) by using different machine learning models. Recent studies indicate that both hyper- and hypo- aberrant ASD-associated FCs were widely distributed throughout the entire brain rather than only in some specific brain regions. Deep neural networks (DNN) with multiple hidden layers have shown the ability to systematically extract lower-to-higher level information from high dimensional data across a series of neural hidden layers, significantly improving classification accuracy for such data. In this study, a DNN with a novel feature selection method (DNN-FS) is developed for the high dimensional whole-brain resting-state FC pattern classification of ASD patients vs. typical development (TD) controls. The feature selection method is able to help the DNN generate low dimensional high-quality representations of the whole-brain FC patterns by selecting features with high discriminating power from multiple trained sparse auto-encoders. For the comparison, a DNN without the feature selection method (DNN-woFS) is developed, and both of them are tested with different architectures (i.e., with different numbers of hidden layers/nodes). Results show that the best classification accuracy of 86.36% is generated by the DNN-FS approach with 3 hidden layers and 150 hidden nodes (3/150). Remarkably, DNN-FS outperforms DNN-woFS for all architectures studied. The most significant accuracy improvement was 9.09% with the 3/150 architecture. The method also outperforms other feature selection methods, e.g., two sample t -test and elastic net. In addition to improving the classification accuracy, a Fisher's score-based biomarker identification method based on the DNN is also developed, and used to identify 32 FCs related to ASD. These FCs come from or cross different pre-defined brain networks including the default-mode, cingulo-opercular, frontal-parietal, and cerebellum. Thirteen of them are statically significant between ASD and TD groups (two sample t -test p < 0.05) while 19 of them are not. The relationship between the statically significant FCs and the corresponding ASD behavior symptoms is discussed based on the literature and clinician's expert knowledge. Meanwhile, the potential reason of obtaining 19 FCs which are not statistically significant is also provided.
Feature-selective attention enhances color signals in early visual areas of the human brain.
Müller, M M; Andersen, S; Trujillo, N J; Valdés-Sosa, P; Malinowski, P; Hillyard, S A
2006-09-19
We used an electrophysiological measure of selective stimulus processing (the steady-state visual evoked potential, SSVEP) to investigate feature-specific attention to color cues. Subjects viewed a display consisting of spatially intermingled red and blue dots that continually shifted their positions at random. The red and blue dots flickered at different frequencies and thereby elicited distinguishable SSVEP signals in the visual cortex. Paying attention selectively to either the red or blue dot population produced an enhanced amplitude of its frequency-tagged SSVEP, which was localized by source modeling to early levels of the visual cortex. A control experiment showed that this selection was based on color rather than flicker frequency cues. This signal amplification of attended color items provides an empirical basis for the rapid identification of feature conjunctions during visual search, as proposed by "guided search" models.
Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method
NASA Astrophysics Data System (ADS)
Shamsoddini, A.; Aboodi, M. R.; Karami, J.
2017-09-01
Air pollution as one of the most serious forms of environmental pollutions poses huge threat to human life. Air pollution leads to environmental instability, and has harmful and undesirable effects on the environment. Modern prediction methods of the pollutant concentration are able to improve decision making and provide appropriate solutions. This study examines the performance of the Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Networks methods, in order to achieve an efficient model to estimate carbon monoxide and nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by Random Forest feature selection method performed more accurate than other models for the modeling of all pollutants. The estimation accuracy of sulfur dioxide emissions was lower than the other air contaminants whereas the nitrogen dioxide was predicted more accurate than the other pollutants.
Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning
2018-03-08
Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.
Li, Jinyan; Fong, Simon; Wong, Raymond K; Millham, Richard; Wong, Kelvin K L
2017-06-28
Due to the high-dimensional characteristics of dataset, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach uses the natural strategy established by Charles Darwin; that is, 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method to avoid repeat searches for the worst position in order to enhance the effectiveness of the search, while the binary strategy simplifies the feature selection problem into a similar problem of function optimisation. Furthermore, the wrapper strategy gathers these strengthened wolves with the classifier of extreme learning machine to find a sub-dataset with a reasonable number of features that offers the maximum correctness of global classification models. The experimental results from the six public high-dimensional bioinformatics datasets tested demonstrate that the proposed method can best some of the conventional feature selection methods up to 29% in classification accuracy, and outperform previous WSAs by up to 99.81% in computational time.
High-order graph matching based feature selection for Alzheimer's disease identification.
Liu, Feng; Suk, Heung-Il; Wee, Chong-Yaw; Chen, Huafu; Shen, Dinggang
2013-01-01
One of the main limitations of l1-norm feature selection is that it focuses on estimating the target vector for each sample individually without considering relations with other samples. However, it's believed that the geometrical relation among target vectors in the training set may provide useful information, and it would be natural to expect that the predicted vectors have similar geometric relations as the target vectors. To overcome these limitations, we formulate this as a graph-matching feature selection problem between a predicted graph and a target graph. In the predicted graph a node is represented by predicted vector that may describe regional gray matter volume or cortical thickness features, and in the target graph a node is represented by target vector that include class label and clinical scores. In particular, we devise new regularization terms in sparse representation to impose high-order graph matching between the target vectors and the predicted ones. Finally, the selected regional gray matter volume and cortical thickness features are fused in kernel space for classification. Using the ADNI dataset, we evaluate the effectiveness of the proposed method and obtain the accuracies of 92.17% and 81.57% in AD and MCI classification, respectively.
Mutual information-based facial expression recognition
NASA Astrophysics Data System (ADS)
Hazar, Mliki; Hammami, Mohamed; Hanêne, Ben-Abdallah
2013-12-01
This paper introduces a novel low-computation discriminative regions representation for expression analysis task. The proposed approach relies on interesting studies in psychology which show that most of the descriptive and responsible regions for facial expression are located around some face parts. The contributions of this work lie in the proposition of new approach which supports automatic facial expression recognition based on automatic regions selection. The regions selection step aims to select the descriptive regions responsible or facial expression and was performed using Mutual Information (MI) technique. For facial feature extraction, we have applied Local Binary Patterns Pattern (LBP) on Gradient image to encode salient micro-patterns of facial expressions. Experimental studies have shown that using discriminative regions provide better results than using the whole face regions whilst reducing features vector dimension.
A wavelet-based technique to predict treatment outcome for Major Depressive Disorder.
Mumtaz, Wajid; Xia, Likun; Mohd Yasin, Mohd Azhar; Azhar Ali, Syed Saad; Malik, Aamir Saeed
2017-01-01
Treatment management for Major Depressive Disorder (MDD) has been challenging. However, electroencephalogram (EEG)-based predictions of antidepressant's treatment outcome may help during antidepressant's selection and ultimately improve the quality of life for MDD patients. In this study, a machine learning (ML) method involving pretreatment EEG data was proposed to perform such predictions for Selective Serotonin Reuptake Inhibitor (SSRIs). For this purpose, the acquisition of experimental data involved 34 MDD patients and 30 healthy controls. Consequently, a feature matrix was constructed involving time-frequency decomposition of EEG data based on wavelet transform (WT) analysis, termed as EEG data matrix. However, the resultant EEG data matrix had high dimensionality. Therefore, dimension reduction was performed based on a rank-based feature selection method according to a criterion, i.e., receiver operating characteristic (ROC). As a result, the most significant features were identified and further be utilized during the training and testing of a classification model, i.e., the logistic regression (LR) classifier. Finally, the LR model was validated with 100 iterations of 10-fold cross-validation (10-CV). The classification results were compared with short-time Fourier transform (STFT) analysis, and empirical mode decompositions (EMD). The wavelet features extracted from frontal and temporal EEG data were found statistically significant. In comparison with other time-frequency approaches such as the STFT and EMD, the WT analysis has shown highest classification accuracy, i.e., accuracy = 87.5%, sensitivity = 95%, and specificity = 80%. In conclusion, significant wavelet coefficients extracted from frontal and temporal pre-treatment EEG data involving delta and theta frequency bands may predict antidepressant's treatment outcome for the MDD patients.
Dense mesh sampling for video-based facial animation
NASA Astrophysics Data System (ADS)
Peszor, Damian; Wojciechowska, Marzena
2016-06-01
The paper describes an approach for selection of feature points on three-dimensional, triangle mesh obtained using various techniques from several video footages. This approach has a dual purpose. First, it allows to minimize the data stored for the purpose of facial animation, so that instead of storing position of each vertex in each frame, one could store only a small subset of vertices for each frame and calculate positions of others based on the subset. Second purpose is to select feature points that could be used for anthropometry-based retargeting of recorded mimicry to another model, with sampling density beyond that which can be achieved using marker-based performance capture techniques. Developed approach was successfully tested on artificial models, models constructed using structured light scanner, and models constructed from video footages using stereophotogrammetry.
Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing
2016-01-01
Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have potential therapies for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthew’s Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651
Iris recognition based on key image feature extraction.
Ren, X; Tian, Q; Zhang, J; Wu, S; Zeng, Y
2008-01-01
In iris recognition, feature extraction can be influenced by factors such as illumination and contrast, and thus the features extracted may be unreliable, which can cause a high rate of false results in iris pattern recognition. In order to obtain stable features, an algorithm was proposed in this paper to extract key features of a pattern from multiple images. The proposed algorithm built an iris feature template by extracting key features and performed iris identity enrolment. Simulation results showed that the selected key features have high recognition accuracy on the CASIA Iris Set, where both contrast and illumination variance exist.
Simulation of millimeter-wave body images and its application to biometric recognition
NASA Astrophysics Data System (ADS)
Moreno-Moreno, Miriam; Fierrez, Julian; Vera-Rodriguez, Ruben; Parron, Josep
2012-06-01
One of the emerging applications of the millimeter-wave imaging technology is its use in biometric recognition. This is mainly due to some properties of the millimeter-waves such as their ability to penetrate through clothing and other occlusions, their low obtrusiveness when collecting the image and the fact that they are harmless to health. In this work we first describe the generation of a database comprising 1200 synthetic images at 94 GHz obtained from the body of 50 people. Then we extract a small set of distance-based features from each image and select the best feature subsets for person recognition using the SFFS feature selection algorithm. Finally these features are used in body geometry authentication obtaining promising results.
Zeier, Joshua D; Newman, Joseph P
2013-08-01
As predicted by the response modulation model, psychopathic offenders are insensitive to potentially important inhibitory information when it is peripheral to their primary focus of attention. To date, the clearest tests of this hypothesis have manipulated spatial attention to cue the location of goal-relevant versus inhibitory information. However, the theory predicts a more general abnormality in selective attention. In the current study, male prisoners performed a conflict-monitoring task, which included a feature-based manipulation (i.e., color) that biased selective attention toward goal-relevant stimuli and away from inhibitory distracters on some trials but not others. Paralleling results for spatial cuing, feature-based cuing resulted in less distracter interference, particularly for participants with primary psychopathy (i.e., low anxiety). This study also investigated the moderating effect of externalizing on psychopathy. Participants high in psychopathy but low in externalizing performed similarly to primary psychopathic individuals. These results demonstrate that the abnormal selective attention associated with primary psychopathy is not limited to spatial attention but, instead, applies to diverse methods for establishing attentional focus. Furthermore, they demonstrate a novel method of investigating psychopathic subtypes using continuous analyses. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Shim, Miseon; Hwang, Han-Jeong; Kim, Do-Won; Lee, Seung-Hwan; Im, Chang-Hwan
2016-10-01
Recently, an increasing number of researchers have endeavored to develop practical tools for diagnosing patients with schizophrenia using machine learning techniques applied to EEG biomarkers. Although a number of studies showed that source-level EEG features can potentially be applied to the differential diagnosis of schizophrenia, most studies have used only sensor-level EEG features such as ERP peak amplitude and power spectrum for machine learning-based diagnosis of schizophrenia. In this study, we used both sensor-level and source-level features extracted from EEG signals recorded during an auditory oddball task for the classification of patients with schizophrenia and healthy controls. EEG signals were recorded from 34 patients with schizophrenia and 34 healthy controls while each subject was asked to attend to oddball tones. Our results demonstrated higher classification accuracy when source-level features were used together with sensor-level features, compared to when only sensor-level features were used. In addition, the selected sensor-level features were mostly found in the frontal area, and the selected source-level features were mostly extracted from the temporal area, which coincide well with the well-known pathological region of cognitive processing in patients with schizophrenia. Our results suggest that our approach would be a promising tool for the computer-aided diagnosis of schizophrenia. Copyright © 2016 Elsevier B.V. All rights reserved.
Identification of Alfalfa Leaf Diseases Using Image Recognition Technology
Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang
2016-01-01
Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767
Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.
Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang
2016-01-01
Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.
NASA Astrophysics Data System (ADS)
de Oliveira Silveira, Eduarda Martiniano; de Menezes, Michele Duarte; Acerbi Júnior, Fausto Weimar; Castro Nunes Santos Terra, Marcela; de Mello, José Márcio
2017-07-01
Accurate mapping and monitoring of savanna and semiarid woodland biomes are needed to support the selection of areas of conservation, to provide sustainable land use, and to improve the understanding of vegetation. The potential of geostatistical features, derived from medium spatial resolution satellite imagery, to characterize contrasted landscape vegetation cover and improve object-based image classification is studied. The study site in Brazil includes cerrado sensu stricto, deciduous forest, and palm swamp vegetation cover. Sentinel 2 and Landsat 8 images were acquired and divided into objects, for each of which a semivariogram was calculated using near-infrared (NIR) and normalized difference vegetation index (NDVI) to extract the set of geostatistical features. The features selected by principal component analysis were used as input data to train a random forest algorithm. Tests were conducted, combining spectral and geostatistical features. Change detection evaluation was performed using a confusion matrix and its accuracies. The semivariogram curves were efficient to characterize spatial heterogeneity, with similar results using NIR and NDVI from Sentinel 2 and Landsat 8. Accuracy was significantly greater when combining geostatistical features with spectral data, suggesting that this method can improve image classification results.
NASA Technical Reports Server (NTRS)
Howard, Ayanna; Bayard, David
2006-01-01
Fuzzy Feature Observation Planner for Small Body Proximity Observations (FuzzObserver) is a developmental computer program, to be used along with other software, for autonomous planning of maneuvers of a spacecraft near an asteroid, comet, or other small astronomical body. Selection of terrain features and estimation of the position of the spacecraft relative to these features is an essential part of such planning. FuzzObserver contributes to the selection and estimation by generating recommendations for spacecraft trajectory adjustments to maintain the spacecraft's ability to observe sufficient terrain features for estimating position. The input to FuzzObserver consists of data from terrain images, including sets of data on features acquired during descent toward, or traversal of, a body of interest. The name of this program reflects its use of fuzzy logic to reason about the terrain features represented by the data and extract corresponding trajectory-adjustment rules. Linguistic fuzzy sets and conditional statements enable fuzzy systems to make decisions based on heuristic rule-based knowledge derived by engineering experts. A major advantage of using fuzzy logic is that it involves simple arithmetic calculations that can be performed rapidly enough to be useful for planning within the short times typically available for spacecraft maneuvers.
Wu, Guorong; Kim, Minjeong; Wang, Qian; Munsell, Brent C.
2015-01-01
Feature selection is a critical step in deformable image registration. In particular, selecting the most discriminative features that accurately and concisely describe complex morphological patterns in image patches improves correspondence detection, which in turn improves image registration accuracy. Furthermore, since more and more imaging modalities are being invented to better identify morphological changes in medical imaging data,, the development of deformable image registration method that scales well to new image modalities or new image applications with little to no human intervention would have a significant impact on the medical image analysis community. To address these concerns, a learning-based image registration framework is proposed that uses deep learning to discover compact and highly discriminative features upon observed imaging data. Specifically, the proposed feature selection method uses a convolutional stacked auto-encoder to identify intrinsic deep feature representations in image patches. Since deep learning is an unsupervised learning method, no ground truth label knowledge is required. This makes the proposed feature selection method more flexible to new imaging modalities since feature representations can be directly learned from the observed imaging data in a very short amount of time. Using the LONI and ADNI imaging datasets, image registration performance was compared to two existing state-of-the-art deformable image registration methods that use handcrafted features. To demonstrate the scalability of the proposed image registration framework image registration experiments were conducted on 7.0-tesla brain MR images. In all experiments, the results showed the new image registration framework consistently demonstrated more accurate registration results when compared to state-of-the-art. PMID:26552069
Wu, Guorong; Kim, Minjeong; Wang, Qian; Munsell, Brent C; Shen, Dinggang
2016-07-01
Feature selection is a critical step in deformable image registration. In particular, selecting the most discriminative features that accurately and concisely describe complex morphological patterns in image patches improves correspondence detection, which in turn improves image registration accuracy. Furthermore, since more and more imaging modalities are being invented to better identify morphological changes in medical imaging data, the development of deformable image registration method that scales well to new image modalities or new image applications with little to no human intervention would have a significant impact on the medical image analysis community. To address these concerns, a learning-based image registration framework is proposed that uses deep learning to discover compact and highly discriminative features upon observed imaging data. Specifically, the proposed feature selection method uses a convolutional stacked autoencoder to identify intrinsic deep feature representations in image patches. Since deep learning is an unsupervised learning method, no ground truth label knowledge is required. This makes the proposed feature selection method more flexible to new imaging modalities since feature representations can be directly learned from the observed imaging data in a very short amount of time. Using the LONI and ADNI imaging datasets, image registration performance was compared to two existing state-of-the-art deformable image registration methods that use handcrafted features. To demonstrate the scalability of the proposed image registration framework, image registration experiments were conducted on 7.0-T brain MR images. In all experiments, the results showed that the new image registration framework consistently demonstrated more accurate registration results when compared to state of the art.
A novel feature ranking method for prediction of cancer stages using proteomics data
Saghapour, Ehsan; Sehhati, Mohammadreza
2017-01-01
Proteomic analysis of cancers' stages has provided new opportunities for the development of novel, highly sensitive diagnostic tools which helps early detection of cancer. This paper introduces a new feature ranking approach called FRMT. FRMT is based on the Technique for Order of Preference by Similarity to Ideal Solution method (TOPSIS) which select the most discriminative proteins from proteomics data for cancer staging. In this approach, outcomes of 10 feature selection techniques were combined by TOPSIS method, to select the final discriminative proteins from seven different proteomic databases of protein expression profiles. In the proposed workflow, feature selection methods and protein expressions have been considered as criteria and alternatives in TOPSIS, respectively. The proposed method is tested on seven various classifier models in a 10-fold cross validation procedure that repeated 30 times on the seven cancer datasets. The obtained results proved the higher stability and superior classification performance of method in comparison with other methods, and it is less sensitive to the applied classifier. Moreover, the final introduced proteins are informative and have the potential for application in the real medical practice. PMID:28934234
Feature Selection for Nonstationary Data: Application to Human Recognition Using Medical Biometrics.
Komeili, Majid; Louis, Wael; Armanfard, Narges; Hatzinakos, Dimitrios
2018-05-01
Electrocardiogram (ECG) and transient evoked otoacoustic emission (TEOAE) are among the physiological signals that have attracted significant interest in biometric community due to their inherent robustness to replay and falsification attacks. However, they are time-dependent signals and this makes them hard to deal with in across-session human recognition scenario where only one session is available for enrollment. This paper presents a novel feature selection method to address this issue. It is based on an auxiliary dataset with multiple sessions where it selects a subset of features that are more persistent across different sessions. It uses local information in terms of sample margins while enforcing an across-session measure. This makes it a perfect fit for aforementioned biometric recognition problem. Comprehensive experiments on ECG and TEOAE variability due to time lapse and body posture are done. Performance of the proposed method is compared against seven state-of-the-art feature selection algorithms as well as another six approaches in the area of ECG and TEOAE biometric recognition. Experimental results demonstrate that the proposed method performs noticeably better than other algorithms.
[Lithology feature extraction of CASI hyperspectral data based on fractal signal algorithm].
Tang, Chao; Chen, Jian-Ping; Cui, Jing; Wen, Bo-Tao
2014-05-01
Hyperspectral data is characterized by combination of image and spectrum and large data volume dimension reduction is the main research direction. Band selection and feature extraction is the primary method used for this objective. In the present article, the authors tested methods applied for the lithology feature extraction from hyperspectral data. Based on the self-similarity of hyperspectral data, the authors explored the application of fractal algorithm to lithology feature extraction from CASI hyperspectral data. The "carpet method" was corrected and then applied to calculate the fractal value of every pixel in the hyperspectral data. The results show that fractal information highlights the exposed bedrock lithology better than the original hyperspectral data The fractal signal and characterized scale are influenced by the spectral curve shape, the initial scale selection and iteration step. At present, research on the fractal signal of spectral curve is rare, implying the necessity of further quantitative analysis and investigation of its physical implications.
A malware detection scheme based on mining format information.
Bai, Jinrong; Wang, Junfeng; Zou, Guozhong
2014-01-01
Malware has become one of the most serious threats to computer information system and the current malware detection technology still has very significant limitations. In this paper, we proposed a malware detection approach by mining format information of PE (portable executable) files. Based on in-depth analysis of the static format information of the PE files, we extracted 197 features from format information of PE files and applied feature selection methods to reduce the dimensionality of the features and achieve acceptable high performance. When the selected features were trained using classification algorithms, the results of our experiments indicate that the accuracy of the top classification algorithm is 99.1% and the value of the AUC is 0.998. We designed three experiments to evaluate the performance of our detection scheme and the ability of detecting unknown and new malware. Although the experimental results of identifying new malware are not perfect, our method is still able to identify 97.6% of new malware with 1.3% false positive rates.
A Malware Detection Scheme Based on Mining Format Information
Bai, Jinrong; Wang, Junfeng; Zou, Guozhong
2014-01-01
Malware has become one of the most serious threats to computer information system and the current malware detection technology still has very significant limitations. In this paper, we proposed a malware detection approach by mining format information of PE (portable executable) files. Based on in-depth analysis of the static format information of the PE files, we extracted 197 features from format information of PE files and applied feature selection methods to reduce the dimensionality of the features and achieve acceptable high performance. When the selected features were trained using classification algorithms, the results of our experiments indicate that the accuracy of the top classification algorithm is 99.1% and the value of the AUC is 0.998. We designed three experiments to evaluate the performance of our detection scheme and the ability of detecting unknown and new malware. Although the experimental results of identifying new malware are not perfect, our method is still able to identify 97.6% of new malware with 1.3% false positive rates. PMID:24991639
NASA Astrophysics Data System (ADS)
Sheikhan, Mansour; Abbasnezhad Arabi, Mahdi; Gharavian, Davood
2015-10-01
Artificial neural networks are efficient models in pattern recognition applications, but their performance is dependent on employing suitable structure and connection weights. This study used a hybrid method for obtaining the optimal weight set and architecture of a recurrent neural emotion classifier based on gravitational search algorithm (GSA) and its binary version (BGSA), respectively. By considering the features of speech signal that were related to prosody, voice quality, and spectrum, a rich feature set was constructed. To select more efficient features, a fast feature selection method was employed. The performance of the proposed hybrid GSA-BGSA method was compared with similar hybrid methods based on particle swarm optimisation (PSO) algorithm and its binary version, PSO and discrete firefly algorithm, and hybrid of error back-propagation and genetic algorithm that were used for optimisation. Experimental tests on Berlin emotional database demonstrated the superior performance of the proposed method using a lighter network structure.
Addeh, Abdoljalil; Khormali, Aminollah; Golilarz, Noorbakhsh Amiri
2018-05-04
The control chart patterns are the most commonly used statistical process control (SPC) tools to monitor process changes. When a control chart produces an out-of-control signal, this means that the process has been changed. In this study, a new method based on optimized radial basis function neural network (RBFNN) is proposed for control chart patterns (CCPs) recognition. The proposed method consists of four main modules: feature extraction, feature selection, classification and learning algorithm. In the feature extraction module, shape and statistical features are used. Recently, various shape and statistical features have been presented for the CCPs recognition. In the feature selection module, the association rules (AR) method has been employed to select the best set of the shape and statistical features. In the classifier section, RBFNN is used and finally, in RBFNN, learning algorithm has a high impact on the network performance. Therefore, a new learning algorithm based on the bees algorithm has been used in the learning module. Most studies have considered only six patterns: Normal, Cyclic, Increasing Trend, Decreasing Trend, Upward Shift and Downward Shift. Since three patterns namely Normal, Stratification, and Systematic are very similar to each other and distinguishing them is very difficult, in most studies Stratification and Systematic have not been considered. Regarding to the continuous monitoring and control over the production process and the exact type detection of the problem encountered during the production process, eight patterns have been investigated in this study. The proposed method is tested on a dataset containing 1600 samples (200 samples from each pattern) and the results showed that the proposed method has a very good performance. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
Liu, Tongtong; Ge, Xifeng; Yu, Jinhua; Guo, Yi; Wang, Yuanyuan; Wang, Wenping; Cui, Ligang
2018-06-21
B-mode ultrasound (B-US) and strain elastography ultrasound (SE-US) images have a potential to distinguish thyroid tumor with different lymph node (LN) status. The purpose of our study is to investigate whether the application of multi-modality images including B-US and SE-US can improve the discriminability of thyroid tumor with LN metastasis based on a radiomics approach. Ultrasound (US) images including B-US and SE-US images of 75 papillary thyroid carcinoma (PTC) cases were retrospectively collected. A radiomics approach was developed in this study to estimate LNs status of PTC patients. The approach included image segmentation, quantitative feature extraction, feature selection and classification. Three feature sets were extracted from B-US, SE-US, and multi-modality containing B-US and SE-US. They were used to evaluate the contribution of different modalities. A total of 684 radiomics features have been extracted in our study. We used sparse representation coefficient-based feature selection method with 10-bootstrap to reduce the dimension of feature sets. Support vector machine with leave-one-out cross-validation was used to build the model for estimating LN status. Using features extracted from both B-US and SE-US, the radiomics-based model produced an area under the receiver operating characteristic curve (AUC) [Formula: see text] 0.90, accuracy (ACC) [Formula: see text] 0.85, sensitivity (SENS) [Formula: see text] 0.77 and specificity (SPEC) [Formula: see text] 0.88, which was better than using features extracted from B-US or SE-US separately. Multi-modality images provided more information in radiomics study. Combining use of B-US and SE-US could improve the LN metastasis estimation accuracy for PTC patients.
Feature-selective attention in healthy old age: a selective decline in selective attention?
Quigley, Cliodhna; Müller, Matthias M
2014-02-12
Deficient selection against irrelevant information has been proposed to underlie age-related cognitive decline. We recently reported evidence for maintained early sensory selection when older and younger adults used spatial selective attention to perform a challenging task. Here we explored age-related differences when spatial selection is not possible and feature-selective attention must be deployed. We additionally compared the integrity of feedforward processing by exploiting the well established phenomenon of suppression of visual cortical responses attributable to interstimulus competition. Electroencephalogram was measured while older and younger human adults responded to brief occurrences of coherent motion in an attended stimulus composed of randomly moving, orientation-defined, flickering bars. Attention was directed to horizontal or vertical bars by a pretrial cue, after which two orthogonally oriented, overlapping stimuli or a single stimulus were presented. Horizontal and vertical bars flickered at different frequencies and thereby elicited separable steady-state visual-evoked potentials, which were used to examine the effect of feature-based selection and the competitive influence of a second stimulus on ongoing visual processing. Age differences were found in feature-selective attentional modulation of visual responses: older adults did not show consistent modulation of magnitude or phase. In contrast, the suppressive effect of a second stimulus was robust and comparable in magnitude across age groups, suggesting that bottom-up processing of the current stimuli is essentially unchanged in healthy old age. Thus, it seems that visual processing per se is unchanged, but top-down attentional control is compromised in older adults when space cannot be used to guide selection.
Rios, Anthony; Kavuluru, Ramakanth
2013-09-01
Extracting diagnosis codes from medical records is a complex task carried out by trained coders by reading all the documents associated with a patient's visit. With the popularity of electronic medical records (EMRs), computational approaches to code extraction have been proposed in the recent years. Machine learning approaches to multi-label text classification provide an important methodology in this task given each EMR can be associated with multiple codes. In this paper, we study the the role of feature selection, training data selection, and probabilistic threshold optimization in improving different multi-label classification approaches. We conduct experiments based on two different datasets: a recent gold standard dataset used for this task and a second larger and more complex EMR dataset we curated from the University of Kentucky Medical Center. While conventional approaches achieve results comparable to the state-of-the-art on the gold standard dataset, on our complex in-house dataset, we show that feature selection, training data selection, and probabilistic thresholding provide significant gains in performance.
Nasir, Muhammad; Attique Khan, Muhammad; Sharif, Muhammad; Lali, Ikram Ullah; Saba, Tanzila; Iqbal, Tassawar
2018-02-21
Melanoma is the deadliest type of skin cancer with highest mortality rate. However, the annihilation in early stage implies a high survival rate therefore, it demands early diagnosis. The accustomed diagnosis methods are costly and cumbersome due to the involvement of experienced experts as well as the requirements for highly equipped environment. The recent advancements in computerized solutions for these diagnoses are highly promising with improved accuracy and efficiency. In this article, we proposed a method for the classification of melanoma and benign skin lesions. Our approach integrates preprocessing, lesion segmentation, features extraction, features selection, and classification. Preprocessing is executed in the context of hair removal by DullRazor, whereas lesion texture and color information are utilized to enhance the lesion contrast. In lesion segmentation, a hybrid technique has been implemented and results are fused using additive law of probability. Serial based method is applied subsequently that extracts and fuses the traits such as color, texture, and HOG (shape). The fused features are selected afterwards by implementing a novel Boltzman Entropy method. Finally, the selected features are classified by Support Vector Machine. The proposed method is evaluated on publically available data set PH2. Our approach has provided promising results of sensitivity 97.7%, specificity 96.7%, accuracy 97.5%, and F-score 97.5%, which are significantly better than the results of existing methods available on the same data set. The proposed method detects and classifies melanoma significantly good as compared to existing methods. © 2018 Wiley Periodicals, Inc.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-09
... features and swimming pools, hotel and restaurant interiors, town planning/ master planning, green design... applicants will be evaluated on their ability to meet certain conditions and to best satisfy the selection...-one percent U.S. content. Selection Criteria for Participation Selection will be based on the...
Words, Words Everywhere, but Which Ones Do We Teach?
ERIC Educational Resources Information Center
Graves, Michael F.; Baumann, James F.; Blachowicz, Camille L. Z.; Manyak, Patrick; Bates, Ann; Cieply, Char; Davis, Jeni R.; Von Gunten, Heather
2014-01-01
This article highlights the challenging task teachers face in selecting vocabulary to teach. First, we briefly discuss three features of the English lexicon that are crucial to keep in mind when selecting vocabulary for instruction and three approaches that have been suggested. Then, we present a theoretically based approach called Selecting Words…
Hippocampus shape analysis for temporal lobe epilepsy detection in magnetic resonance imaging
NASA Astrophysics Data System (ADS)
Kohan, Zohreh; Azmi, Reza
2016-03-01
There are evidences in the literature that Temporal Lobe Epilepsy (TLE) causes some lateralized atrophy and deformation on hippocampus and other substructures of the brain. Magnetic Resonance Imaging (MRI), due to high-contrast soft tissue imaging, is one of the most popular imaging modalities being used in TLE diagnosis and treatment procedures. Using an algorithm to help clinicians for better and more effective shape deformations analysis could improve the diagnosis and treatment of the disease. In this project our purpose is to design, implement and test a classification algorithm for MRIs based on hippocampal asymmetry detection using shape and size-based features. Our method consisted of two main parts; (1) shape feature extraction, and (2) image classification. We tested 11 different shape and size features and selected four of them that detect the asymmetry in hippocampus significantly in a randomly selected subset of the dataset. Then, we employed a support vector machine (SVM) classifier to classify the remaining images of the dataset to normal and epileptic images using our selected features. The dataset contains 25 patient images in which 12 cases were used as a training set and the rest 13 cases for testing the performance of classifier. We measured accuracy, specificity and sensitivity of, respectively, 76%, 100%, and 70% for our algorithm. The preliminary results show that using shape and size features for detecting hippocampal asymmetry could be helpful in TLE diagnosis in MRI.
Optimized Kernel Entropy Components.
Izquierdo-Verdiguier, Emma; Laparra, Valero; Jenssen, Robert; Gomez-Chova, Luis; Camps-Valls, Gustau
2017-06-01
This brief addresses two main issues of the standard kernel entropy component analysis (KECA) algorithm: the optimization of the kernel decomposition and the optimization of the Gaussian kernel parameter. KECA roughly reduces to a sorting of the importance of kernel eigenvectors by entropy instead of variance, as in the kernel principal components analysis. In this brief, we propose an extension of the KECA method, named optimized KECA (OKECA), that directly extracts the optimal features retaining most of the data entropy by means of compacting the information in very few features (often in just one or two). The proposed method produces features which have higher expressive power. In particular, it is based on the independent component analysis framework, and introduces an extra rotation to the eigen decomposition, which is optimized via gradient-ascent search. This maximum entropy preservation suggests that OKECA features are more efficient than KECA features for density estimation. In addition, a critical issue in both the methods is the selection of the kernel parameter, since it critically affects the resulting performance. Here, we analyze the most common kernel length-scale selection criteria. The results of both the methods are illustrated in different synthetic and real problems. Results show that OKECA returns projections with more expressive power than KECA, the most successful rule for estimating the kernel parameter is based on maximum likelihood, and OKECA is more robust to the selection of the length-scale parameter in kernel density estimation.
Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin
2013-03-01
Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in overall accuracy of ∼80 % with 10 fold cross validation using 40 top ranked molecular descriptors selected out of total 314 descriptors. Moreover, SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptors space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding over fitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Aksu, Yaman; Miller, David J; Kesidis, George; Yang, Qing X
2010-05-01
Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains, by optimizing only two degrees of freedom--the hyperplane's intercept and its squared 2-norm--with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
Noise Reduction in Brainwaves by Using Both EEG Signals and Frontal Viewing Camera Images
Bang, Jae Won; Choi, Jong-Suk; Park, Kang Ryoung
2013-01-01
Electroencephalogram (EEG)-based brain-computer interfaces (BCIs) have been used in various applications, including human–computer interfaces, diagnosis of brain diseases, and measurement of cognitive status. However, EEG signals can be contaminated with noise caused by user's head movements. Therefore, we propose a new method that combines an EEG acquisition device and a frontal viewing camera to isolate and exclude the sections of EEG data containing these noises. This method is novel in the following three ways. First, we compare the accuracies of detecting head movements based on the features of EEG signals in the frequency and time domains and on the motion features of images captured by the frontal viewing camera. Second, the features of EEG signals in the frequency domain and the motion features captured by the frontal viewing camera are selected as optimal ones. The dimension reduction of the features and feature selection are performed using linear discriminant analysis. Third, the combined features are used as inputs to support vector machine (SVM), which improves the accuracy in detecting head movements. The experimental results show that the proposed method can detect head movements with an average error rate of approximately 3.22%, which is smaller than that of other methods. PMID:23669713
Hierarchical content-based image retrieval by dynamic indexing and guided search
NASA Astrophysics Data System (ADS)
You, Jane; Cheung, King H.; Liu, James; Guo, Linong
2003-12-01
This paper presents a new approach to content-based image retrieval by using dynamic indexing and guided search in a hierarchical structure, and extending data mining and data warehousing techniques. The proposed algorithms include: a wavelet-based scheme for multiple image feature extraction, the extension of a conventional data warehouse and an image database to an image data warehouse for dynamic image indexing, an image data schema for hierarchical image representation and dynamic image indexing, a statistically based feature selection scheme to achieve flexible similarity measures, and a feature component code to facilitate query processing and guide the search for the best matching. A series of case studies are reported, which include a wavelet-based image color hierarchy, classification of satellite images, tropical cyclone pattern recognition, and personal identification using multi-level palmprint and face features.
Feature selection from hyperspectral imaging for guava fruit defects detection
NASA Astrophysics Data System (ADS)
Mat Jafri, Mohd. Zubir; Tan, Sou Ching
2017-06-01
Development of technology makes hyperspectral imaging commonly used for defect detection. In this research, a hyperspectral imaging system was setup in lab to target for guava fruits defect detection. Guava fruit was selected as the object as to our knowledge, there is fewer attempts were made for guava defect detection based on hyperspectral imaging. The common fluorescent light source was used to represent the uncontrolled lighting condition in lab and analysis was carried out in a specific wavelength range due to inefficiency of this particular light source. Based on the data, the reflectance intensity of this specific setup could be categorized in two groups. Sequential feature selection with linear discriminant (LD) and quadratic discriminant (QD) function were used to select features that could potentially be used in defects detection. Besides the ordinary training method, training dataset in discriminant was separated in two to cater for the uncontrolled lighting condition. These two parts were separated based on the brighter and dimmer area. Four evaluation matrixes were evaluated which are LD with common training method, QD with common training method, LD with two part training method and QD with two part training method. These evaluation matrixes were evaluated using F1-score with total 48 defected areas. Experiment shown that F1-score of linear discriminant with the compensated method hitting 0.8 score, which is the highest score among all.
Wang, Guohua; Liu, Qiong
2015-01-01
Far-infrared pedestrian detection approaches for advanced driver-assistance systems based on high-dimensional features fail to simultaneously achieve robust and real-time detection. We propose a robust and real-time pedestrian detection system characterized by novel candidate filters, novel pedestrian features and multi-frame approval matching in a coarse-to-fine fashion. Firstly, we design two filters based on the pedestrians’ head and the road to select the candidates after applying a pedestrian segmentation algorithm to reduce false alarms. Secondly, we propose a novel feature encapsulating both the relationship of oriented gradient distribution and the code of oriented gradient to deal with the enormous variance in pedestrians’ size and appearance. Thirdly, we introduce a multi-frame approval matching approach utilizing the spatiotemporal continuity of pedestrians to increase the detection rate. Large-scale experiments indicate that the system works in real time and the accuracy has improved about 9% compared with approaches based on high-dimensional features only. PMID:26703611
Wang, Guohua; Liu, Qiong
2015-12-21
Far-infrared pedestrian detection approaches for advanced driver-assistance systems based on high-dimensional features fail to simultaneously achieve robust and real-time detection. We propose a robust and real-time pedestrian detection system characterized by novel candidate filters, novel pedestrian features and multi-frame approval matching in a coarse-to-fine fashion. Firstly, we design two filters based on the pedestrians' head and the road to select the candidates after applying a pedestrian segmentation algorithm to reduce false alarms. Secondly, we propose a novel feature encapsulating both the relationship of oriented gradient distribution and the code of oriented gradient to deal with the enormous variance in pedestrians' size and appearance. Thirdly, we introduce a multi-frame approval matching approach utilizing the spatiotemporal continuity of pedestrians to increase the detection rate. Large-scale experiments indicate that the system works in real time and the accuracy has improved about 9% compared with approaches based on high-dimensional features only.
Painter, David R; Dux, Paul E; Mattingley, Jason B
2015-09-01
When visual attention is set for a particular target feature, such as color or shape, neural responses to that feature are enhanced across the visual field. This global feature-based enhancement is hypothesized to underlie the contingent attentional capture effect, in which task-irrelevant items with the target feature capture spatial attention. In humans, however, different cortical regions have been implicated in global feature-based enhancement and contingent capture. Here, we applied intermittent theta-burst stimulation (iTBS) to assess the causal roles of two regions of extrastriate cortex - right area MT and the right temporoparietal junction (TPJ) - in both global feature-based enhancement and contingent capture. We recorded cortical activity using EEG while participants monitored centrally for targets defined by color and ignored peripheral checkerboards that matched the distractor or target color. In central vision, targets were preceded by colored cues designed to capture attention. Stimuli flickered at unique frequencies, evoking distinct cortical oscillations. Analyses of these oscillations and behavioral performance revealed contingent capture in central vision and global feature-based enhancement in the periphery. Stimulation of right area MT selectively increased global feature-based enhancement, but did not influence contingent attentional capture. By contrast, stimulation of the right TPJ left both processes unaffected. Our results reveal a causal role for the right area MT in feature-based attention, and suggest that global feature-based enhancement does not underlie the contingent capture effect. Copyright © 2015 Elsevier Inc. All rights reserved.
A recurrent neural model for proto-object based contour integration and figure-ground segregation.
Hu, Brian; Niebur, Ernst
2017-12-01
Visual processing of objects makes use of both feedforward and feedback streams of information. However, the nature of feedback signals is largely unknown, as is the identity of the neuronal populations in lower visual areas that receive them. Here, we develop a recurrent neural model to address these questions in the context of contour integration and figure-ground segregation. A key feature of our model is the use of grouping neurons whose activity represents tentative objects ("proto-objects") based on the integration of local feature information. Grouping neurons receive input from an organized set of local feature neurons, and project modulatory feedback to those same neurons. Additionally, inhibition at both the local feature level and the object representation level biases the interpretation of the visual scene in agreement with principles from Gestalt psychology. Our model explains several sets of neurophysiological results (Zhou et al. Journal of Neuroscience, 20(17), 6594-6611 2000; Qiu et al. Nature Neuroscience, 10(11), 1492-1499 2007; Chen et al. Neuron, 82(3), 682-694 2014), and makes testable predictions about the influence of neuronal feedback and attentional selection on neural responses across different visual areas. Our model also provides a framework for understanding how object-based attention is able to select both objects and the features associated with them.
Yugandhar, K; Gromiha, M Michael
2014-09-01
Protein-protein interactions are intrinsic to virtually every cellular process. Predicting the binding affinity of protein-protein complexes is one of the challenging problems in computational and molecular biology. In this work, we related sequence features of protein-protein complexes with their binding affinities using machine learning approaches. We set up a database of 185 protein-protein complexes for which the interacting pairs are heterodimers and their experimental binding affinities are available. On the other hand, we have developed a set of 610 features from the sequences of protein complexes and utilized Ranker search method, which is the combination of Attribute evaluator and Ranker method for selecting specific features. We have analyzed several machine learning algorithms to discriminate protein-protein complexes into high and low affinity groups based on their Kd values. Our results showed a 10-fold cross-validation accuracy of 76.1% with the combination of nine features using support vector machines. Further, we observed accuracy of 83.3% on an independent test set of 30 complexes. We suggest that our method would serve as an effective tool for identifying the interacting partners in protein-protein interaction networks and human-pathogen interactions based on the strength of interactions. © 2014 Wiley Periodicals, Inc.
Using environmental heterogeneity to plan for sea-level rise.
Hunter, Elizabeth A; Nibbelink, Nathan P
2017-12-01
Environmental heterogeneity is increasingly being used to select conservation areas that will provide for future biodiversity under a variety of climate scenarios. This approach, termed conserving nature's stage (CNS), assumes environmental features respond to climate change more slowly than biological communities, but will CNS be effective if the stage were to change as rapidly as the climate? We tested the effectiveness of using CNS to select sites in salt marshes for conservation in coastal Georgia (U.S.A.), where environmental features will change rapidly as sea level rises. We calculated species diversity based on distributions of 7 bird species with a variety of niches in Georgia salt marshes. Environmental heterogeneity was assessed across six landscape gradients (e.g., elevation, salinity, and patch area). We used 2 approaches to select sites with high environmental heterogeneity: site complementarity (environmental diversity [ED]) and local environmental heterogeneity (environmental richness [ER]). Sites selected based on ER predicted present-day species diversity better than randomly selected sites (up to an 8.1% improvement), were resilient to areal loss from SLR (1.0% average areal loss by 2050 compared with 0.9% loss of randomly selected sites), and provided habitat to a threatened species (0.63 average occupancy compared with 0.6 average occupancy of randomly selected sites). Sites selected based on ED predicted species diversity no better or worse than random and were not resilient to SLR (2.9% average areal loss by 2050). Despite the discrepancy between the 2 approaches, CNS is a viable strategy for conservation site selection in salt marshes because the ER approach was successful. It has potential for application in other coastal areas where SLR will affect environmental features, but its performance may depend on the magnitude of geological changes caused by SLR. Our results indicate that conservation planners that had heretofore excluded low-lying coasts from CNS planning could include coastal ecosystems in regional conservation strategies. © 2017 Society for Conservation Biology.
Larson, Eric; Lee, Adrian K C
2014-01-01
Switching attention between different stimuli of interest based on particular task demands is important in many everyday settings. In audition in particular, switching attention between different speakers of interest that are talking concurrently is often necessary for effective communication. Recently, it has been shown by multiple studies that auditory selective attention suppresses the representation of unwanted streams in auditory cortical areas in favor of the target stream of interest. However, the neural processing that guides this selective attention process is not well understood. Here we investigated the cortical mechanisms involved in switching attention based on two different types of auditory features. By combining magneto- and electro-encephalography (M-EEG) with an anatomical MRI constraint, we examined the cortical dynamics involved in switching auditory attention based on either spatial or pitch features. We designed a paradigm where listeners were cued in the beginning of each trial to switch or maintain attention halfway through the presentation of concurrent target and masker streams. By allowing listeners time to switch during a gap in the continuous target and masker stimuli, we were able to isolate the mechanisms involved in endogenous, top-down attention switching. Our results show a double dissociation between the involvement of right temporoparietal junction (RTPJ) and the left inferior parietal supramarginal part (LIPSP) in tasks requiring listeners to switch attention based on space and pitch features, respectively, suggesting that switching attention based on these features involves at least partially separate processes or behavioral strategies. © 2013 Elsevier Inc. All rights reserved.
Design and Synthesis of Mannich bases as Benzimidazole Derivatives as Analgesic Agents.
Datar, Prasanna A; Limaye, Saleel A
2015-01-01
Mannich bases were selected for 2D QSAR study to derive meaningful relationship between the structural features and analgesic activity. Using the knowledge of important features a novel series was designed to obtain improved analgesic activity. A series of novel Mannich bases 1-(N-substituted amino)methyl]-2-substituted benzimidazole derivatives were synthesized and were screened for analgesic activity. Some of these compounds showed promising analgesic activity when compared with the standard drug diclofenac sodium.
EHR based Genetic Testing Knowledge Base (iGTKB) Development
2015-01-01
Background The gap between a large growing number of genetic tests and a suboptimal clinical workflow of incorporating these tests into regular clinical practice poses barriers to effective reliance on advanced genetic technologies to improve quality of healthcare. A promising solution to fill this gap is to develop an intelligent genetic test recommendation system that not only can provide a comprehensive view of genetic tests as education resources, but also can recommend the most appropriate genetic tests to patients based on clinical evidence. In this study, we developed an EHR based Genetic Testing Knowledge Base for Individualized Medicine (iGTKB). Methods We extracted genetic testing information and patient medical records from EHR systems at Mayo Clinic. Clinical features have been semi-automatically annotated from the clinical notes by applying a Natural Language Processing (NLP) tool, MedTagger suite. To prioritize clinical features for each genetic test, we compared odds ratio across four population groups. Genetic tests, genetic disorders and clinical features with their odds ratios have been applied to establish iGTKB, which is to be integrated into the Genetic Testing Ontology (GTO). Results Overall, there are five genetic tests operated with sample size greater than 100 in 2013 at Mayo Clinic. A total of 1,450 patients who was tested by one of the five genetic tests have been selected. We assembled 243 clinical features from the Human Phenotype Ontology (HPO) for these five genetic tests. There are 60 clinical features with at least one mention in clinical notes of patients taking the test. Twenty-eight clinical features with high odds ratio (greater than 1) have been selected as dominant features and deposited into iGTKB with their associated information about genetic tests and genetic disorders. Conclusions In this study, we developed an EHR based genetic testing knowledge base, iGTKB. iGTKB will be integrated into the GTO by providing relevant clinical evidence, and ultimately to support development of genetic testing recommendation system, iGenetics. PMID:26606281
EHR based Genetic Testing Knowledge Base (iGTKB) Development.
Zhu, Qian; Liu, Hongfang; Chute, Christopher G; Ferber, Matthew
2015-01-01
The gap between a large growing number of genetic tests and a suboptimal clinical workflow of incorporating these tests into regular clinical practice poses barriers to effective reliance on advanced genetic technologies to improve quality of healthcare. A promising solution to fill this gap is to develop an intelligent genetic test recommendation system that not only can provide a comprehensive view of genetic tests as education resources, but also can recommend the most appropriate genetic tests to patients based on clinical evidence. In this study, we developed an EHR based Genetic Testing Knowledge Base for Individualized Medicine (iGTKB). We extracted genetic testing information and patient medical records from EHR systems at Mayo Clinic. Clinical features have been semi-automatically annotated from the clinical notes by applying a Natural Language Processing (NLP) tool, MedTagger suite. To prioritize clinical features for each genetic test, we compared odds ratio across four population groups. Genetic tests, genetic disorders and clinical features with their odds ratios have been applied to establish iGTKB, which is to be integrated into the Genetic Testing Ontology (GTO). Overall, there are five genetic tests operated with sample size greater than 100 in 2013 at Mayo Clinic. A total of 1,450 patients who was tested by one of the five genetic tests have been selected. We assembled 243 clinical features from the Human Phenotype Ontology (HPO) for these five genetic tests. There are 60 clinical features with at least one mention in clinical notes of patients taking the test. Twenty-eight clinical features with high odds ratio (greater than 1) have been selected as dominant features and deposited into iGTKB with their associated information about genetic tests and genetic disorders. In this study, we developed an EHR based genetic testing knowledge base, iGTKB. iGTKB will be integrated into the GTO by providing relevant clinical evidence, and ultimately to support development of genetic testing recommendation system, iGenetics.
Image steganalysis using Artificial Bee Colony algorithm
NASA Astrophysics Data System (ADS)
Sajedi, Hedieh
2017-09-01
Steganography is the science of secure communication where the presence of the communication cannot be detected while steganalysis is the art of discovering the existence of the secret communication. Processing a huge amount of information takes extensive execution time and computational sources most of the time. As a result, it is needed to employ a phase of preprocessing, which can moderate the execution time and computational sources. In this paper, we propose a new feature-based blind steganalysis method for detecting stego images from the cover (clean) images with JPEG format. In this regard, we present a feature selection technique based on an improved Artificial Bee Colony (ABC). ABC algorithm is inspired by honeybees' social behaviour in their search for perfect food sources. In the proposed method, classifier performance and the dimension of the selected feature vector depend on using wrapper-based methods. The experiments are performed using two large data-sets of JPEG images. Experimental results demonstrate the effectiveness of the proposed steganalysis technique compared to the other existing techniques.
Silva Filho, Telmo M; Souza, Renata M C R; Prudêncio, Ricardo B C
2016-08-01
Some complex data types are capable of modeling data variability and imprecision. These data types are studied in the symbolic data analysis field. One such data type is interval data, which represents ranges of values and is more versatile than classic point data for many domains. This paper proposes a new prototype-based classifier for interval data, trained by a swarm optimization method. Our work has two main contributions: a swarm method which is capable of performing both automatic selection of features and pruning of unused prototypes and a generalized weighted squared Euclidean distance for interval data. By discarding unnecessary features and prototypes, the proposed algorithm deals with typical limitations of prototype-based methods, such as the problem of prototype initialization. The proposed distance is useful for learning classes in interval datasets with different shapes, sizes and structures. When compared to other prototype-based methods, the proposed method achieves lower error rates in both synthetic and real interval datasets. Copyright © 2016 Elsevier Ltd. All rights reserved.
Online feature selection with streaming features.
Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan
2013-05-01
We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
NASA Astrophysics Data System (ADS)
Nikitaev, V. G.; Pronichev, A. N.; Polyakov, E. V.; Dmitrieva, V. V.; Tupitsyn, N. N.; Frenkel, M. A.; Mozhenkova, A. V.
2017-01-01
The work investigated the effect of the choice of color space component on blood cell detection based on the calculation of texture attributes of blood cells nuclei in bone marrow. The study identified the most informative color space and texture characteristics of blood cells, designed for components of these spaces. Significance ratio was introduced to assess the quality of features. We offered features that have enabled to divide lymphocytes from lymphoblasts. The selection of the features was based on the results of the data analysis.
Research on Signature Verification Method Based on Discrete Fréchet Distance
NASA Astrophysics Data System (ADS)
Fang, J. L.; Wu, W.
2018-05-01
This paper proposes a multi-feature signature template based on discrete Fréchet distance, which breaks through the limitation of traditional signature authentication using a single signature feature. It solves the online handwritten signature authentication signature global feature template extraction calculation workload, signature feature selection unreasonable problem. In this experiment, the false recognition rate (FAR) and false rejection rate (FRR) of the statistical signature are calculated and the average equal error rate (AEER) is calculated. The feasibility of the combined template scheme is verified by comparing the average equal error rate of the combination template and the original template.
Structure-Based Design of Highly Selective Inhibitors of the CREB Binding Protein Bromodomain.
Denny, R Aldrin; Flick, Andrew C; Coe, Jotham; Langille, Jonathan; Basak, Arindrajit; Liu, Shenping; Stock, Ingrid; Sahasrabudhe, Parag; Bonin, Paul; Hay, Duncan A; Brennan, Paul E; Pletcher, Mathew; Jones, Lyn H; Chekler, Eugene L Piatnitski
2017-07-13
Chemical probes are required for preclinical target validation to interrogate novel biological targets and pathways. Selective inhibitors of the CREB binding protein (CREBBP)/EP300 bromodomains are required to facilitate the elucidation of biology associated with these important epigenetic targets. Medicinal chemistry optimization that paid particular attention to physiochemical properties delivered chemical probes with desirable potency, selectivity, and permeability attributes. An important feature of the optimization process was the successful application of rational structure-based drug design to address bromodomain selectivity issues (particularly against the structurally related BRD4 protein).
A Fast, Open EEG Classification Framework Based on Feature Compression and Channel Ranking
Han, Jiuqi; Zhao, Yuwei; Sun, Hongji; Chen, Jiayun; Ke, Ang; Xu, Gesen; Zhang, Hualiang; Zhou, Jin; Wang, Changyong
2018-01-01
Superior feature extraction, channel selection and classification methods are essential for designing electroencephalography (EEG) classification frameworks. However, the performance of most frameworks is limited by their improper channel selection methods and too specifical design, leading to high computational complexity, non-convergent procedure and narrow expansibility. In this paper, to remedy these drawbacks, we propose a fast, open EEG classification framework centralized by EEG feature compression, low-dimensional representation, and convergent iterative channel ranking. First, to reduce the complexity, we use data clustering to compress the EEG features channel-wise, packing the high-dimensional EEG signal, and endowing them with numerical signatures. Second, to provide easy access to alternative superior methods, we structurally represent each EEG trial in a feature vector with its corresponding numerical signature. Thus, the recorded signals of many trials shrink to a low-dimensional structural matrix compatible with most pattern recognition methods. Third, a series of effective iterative feature selection approaches with theoretical convergence is introduced to rank the EEG channels and remove redundant ones, further accelerating the EEG classification process and ensuring its stability. Finally, a classical linear discriminant analysis (LDA) model is employed to classify a single EEG trial with selected channels. Experimental results on two real world brain-computer interface (BCI) competition datasets demonstrate the promising performance of the proposed framework over state-of-the-art methods. PMID:29713262
Receptive fields selection for binary feature description.
Fan, Bin; Kong, Qingqun; Trzcinski, Tomasz; Wang, Zhiheng; Pan, Chunhong; Fua, Pascal
2014-06-01
Feature description for local image patch is widely used in computer vision. While the conventional way to design local descriptor is based on expert experience and knowledge, learning-based methods for designing local descriptor become more and more popular because of their good performance and data-driven property. This paper proposes a novel data-driven method for designing binary feature descriptor, which we call receptive fields descriptor (RFD). Technically, RFD is constructed by thresholding responses of a set of receptive fields, which are selected from a large number of candidates according to their distinctiveness and correlations in a greedy way. Using two different kinds of receptive fields (namely rectangular pooling area and Gaussian pooling area) for selection, we obtain two binary descriptors RFDR and RFDG .accordingly. Image matching experiments on the well-known patch data set and Oxford data set demonstrate that RFD significantly outperforms the state-of-the-art binary descriptors, and is comparable with the best float-valued descriptors at a fraction of processing time. Finally, experiments on object recognition tasks confirm that both RFDR and RFDG successfully bridge the performance gap between binary descriptors and their floating-point competitors.
Multi-level basis selection of wavelet packet decomposition tree for heart sound classification.
Safara, Fatemeh; Doraisamy, Shyamala; Azman, Azreen; Jantan, Azrul; Abdullah Ramaiah, Asri Ranga
2013-10-01
Wavelet packet transform decomposes a signal into a set of orthonormal bases (nodes) and provides opportunities to select an appropriate set of these bases for feature extraction. In this paper, multi-level basis selection (MLBS) is proposed to preserve the most informative bases of a wavelet packet decomposition tree through removing less informative bases by applying three exclusion criteria: frequency range, noise frequency, and energy threshold. MLBS achieved an accuracy of 97.56% for classifying normal heart sound, aortic stenosis, mitral regurgitation, and aortic regurgitation. MLBS is a promising basis selection to be suggested for signals with a small range of frequencies. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.
Computer-aided diagnosis of prostate cancer in the peripheral zone using multiparametric MRI
NASA Astrophysics Data System (ADS)
Niaf, Emilie; Rouvière, Olivier; Mège-Lechevallier, Florence; Bratan, Flavie; Lartizien, Carole
2012-06-01
This study evaluated a computer-assisted diagnosis (CADx) system for determining a likelihood measure of prostate cancer presence in the peripheral zone (PZ) based on multiparametric magnetic resonance (MR) imaging, including T2-weighted, diffusion-weighted and dynamic contrast-enhanced MRI at 1.5 T. Based on a feature set derived from grey-level images, including first-order statistics, Haralick features, gradient features, semi-quantitative and quantitative (pharmacokinetic modelling) dynamic parameters, four kinds of classifiers were trained and compared : nonlinear support vector machine (SVM), linear discriminant analysis, k-nearest neighbours and naïve Bayes classifiers. A set of feature selection methods based on t-test, mutual information and minimum-redundancy-maximum-relevancy criteria were also compared. The aim was to discriminate between the relevant features as well as to create an efficient classifier using these features. The diagnostic performances of these different CADx schemes were evaluated based on a receiver operating characteristic (ROC) curve analysis. The evaluation database consisted of 30 sets of multiparametric MR images acquired from radical prostatectomy patients. Using histologic sections as the gold standard, both cancer and nonmalignant (but suspicious) tissues were annotated in consensus on all MR images by two radiologists, a histopathologist and a researcher. Benign tissue regions of interest (ROIs) were also delineated in the remaining prostate PZ. This resulted in a series of 42 cancer ROIs, 49 benign but suspicious ROIs and 124 nonsuspicious benign ROIs. From the outputs of all evaluated feature selection methods on the test bench, a restrictive set of about 15 highly informative features coming from all MR sequences was discriminated, thus confirming the validity of the multiparametric approach. Quantitative evaluation of the diagnostic performance yielded a maximal area under the ROC curve (AUC) of 0.89 (0.81-0.94) for the discrimination of the malignant versus nonmalignant tissues and 0.82 (0.73-0.90) for the discrimination of the malignant versus suspicious tissues when combining the t-test feature selection approach with a SVM classifier. A preliminary comparison showed that the optimal CADx scheme mimicked, in terms of AUC, the human experts in differentiating malignant from suspicious tissues, thus demonstrating its potential for assisting cancer identification in the PZ.
NASA Astrophysics Data System (ADS)
Sakkiah, Sugunadevi; Thangapandian, Sundarapandian; John, Shalini; Lee, Keun Woo
2011-01-01
This study was performed to find the selective chemical features for Aurora kinase-B inhibitors using the potent methods like Hip-Hop, virtual screening, homology modeling, molecular dynamics and docking. The best hypothesis, Hypo1 was validated toward a wide range of test set containing the selective inhibitors of Aurora kinase-B. Homology modeling and molecular dynamics studies were carried out to perform the molecular docking studies. The best hypothesis Hypo1 was used as a 3D query to screen the chemical databases. The screened molecules from the databases were sorted based on ADME and drug like properties. The selective hit compounds were docked and the hydrogen bond interactions with the critical amino acids present in Aurora kinase-B were compared with the chemical features present in the Hypo1. Finally, we suggest that the chemical features present in the Hypo1 are vital for a molecule to inhibit the Aurora kinase-B activity.
Maity, Maitreya; Dhane, Dhiraj; Mungle, Tushar; Maiti, A K; Chakraborty, Chandan
2017-10-26
Web-enabled e-healthcare system or computer assisted disease diagnosis has a potential to improve the quality and service of conventional healthcare delivery approach. The article describes the design and development of a web-based distributed healthcare management system for medical information and quantitative evaluation of microscopic images using machine learning approach for malaria. In the proposed study, all the health-care centres are connected in a distributed computer network. Each peripheral centre manages its' own health-care service independently and communicates with the central server for remote assistance. The proposed methodology for automated evaluation of parasites includes pre-processing of blood smear microscopic images followed by erythrocytes segmentation. To differentiate between different parasites; a total of 138 quantitative features characterising colour, morphology, and texture are extracted from segmented erythrocytes. An integrated pattern classification framework is designed where four feature selection methods viz. Correlation-based Feature Selection (CFS), Chi-square, Information Gain, and RELIEF are employed with three different classifiers i.e. Naive Bayes', C4.5, and Instance-Based Learning (IB1) individually. Optimal features subset with the best classifier is selected for achieving maximum diagnostic precision. It is seen that the proposed method achieved with 99.2% sensitivity and 99.6% specificity by combining CFS and C4.5 in comparison with other methods. Moreover, the web-based tool is entirely designed using open standards like Java for a web application, ImageJ for image processing, and WEKA for data mining considering its feasibility in rural places with minimal health care facilities.
Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong
2011-09-01
One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score are synthesized by a sampling method. At last, another integrated RBF model ranks the structural models according to the features of sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function on both best model selection, and overall correlation between the predicted ranking and the actual ranking of structural quality.
Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.
2013-09-01
This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the setmore » of predictors to around 8 factors that can be validated using reputable medical and public health resources.« less
BCC skin cancer diagnosis based on texture analysis techniques
NASA Astrophysics Data System (ADS)
Chuang, Shao-Hui; Sun, Xiaoyan; Chang, Wen-Yu; Chen, Gwo-Shing; Huang, Adam; Li, Jiang; McKenzie, Frederic D.
2011-03-01
In this paper, we present a texture analysis based method for diagnosing the Basal Cell Carcinoma (BCC) skin cancer using optical images taken from the suspicious skin regions. We first extracted the Run Length Matrix and Haralick texture features from the images and used a feature selection algorithm to identify the most effective feature set for the diagnosis. We then utilized a Multi-Layer Perceptron (MLP) classifier to classify the images to BCC or normal cases. Experiments showed that detecting BCC cancer based on optical images is feasible. The best sensitivity and specificity we achieved on our data set were 94% and 95%, respectively.
A wavelet-based technique to predict treatment outcome for Major Depressive Disorder
Xia, Likun; Mohd Yasin, Mohd Azhar; Azhar Ali, Syed Saad
2017-01-01
Treatment management for Major Depressive Disorder (MDD) has been challenging. However, electroencephalogram (EEG)-based predictions of antidepressant’s treatment outcome may help during antidepressant’s selection and ultimately improve the quality of life for MDD patients. In this study, a machine learning (ML) method involving pretreatment EEG data was proposed to perform such predictions for Selective Serotonin Reuptake Inhibitor (SSRIs). For this purpose, the acquisition of experimental data involved 34 MDD patients and 30 healthy controls. Consequently, a feature matrix was constructed involving time-frequency decomposition of EEG data based on wavelet transform (WT) analysis, termed as EEG data matrix. However, the resultant EEG data matrix had high dimensionality. Therefore, dimension reduction was performed based on a rank-based feature selection method according to a criterion, i.e., receiver operating characteristic (ROC). As a result, the most significant features were identified and further be utilized during the training and testing of a classification model, i.e., the logistic regression (LR) classifier. Finally, the LR model was validated with 100 iterations of 10-fold cross-validation (10-CV). The classification results were compared with short-time Fourier transform (STFT) analysis, and empirical mode decompositions (EMD). The wavelet features extracted from frontal and temporal EEG data were found statistically significant. In comparison with other time-frequency approaches such as the STFT and EMD, the WT analysis has shown highest classification accuracy, i.e., accuracy = 87.5%, sensitivity = 95%, and specificity = 80%. In conclusion, significant wavelet coefficients extracted from frontal and temporal pre-treatment EEG data involving delta and theta frequency bands may predict antidepressant’s treatment outcome for the MDD patients. PMID:28152063
Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.
Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei
2016-01-13
An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.
1989-10-01
weight based on how powerful the corresponding feature is for object recognition and discrimination. For example, consider an arbitrary weight, denoted...quality of the segmentation, how powerful the features and spatial constraints in the knowledge base are (as far as object recognition is concern...that are powerful for object recognition and discrimination. At this point, this selection is performed heuristically through trial-and-error. As a
Gene-assisted selection: applications of association genetics for forest tree breeding
Philip L. Wilcox; Craig E. Echt; Rowland D. Burdon
2007-01-01
This chapter describes application of association genetics in forest tree species for the purposes of selection. We use the term gene-assisted selection (GAS) to denote application of marker-trait associations determined via association genetics, which we anticipate will be based on poly morph isms associated with expressed genes. The salient features of forest trees...
Cho, Ming-Yuan; Hoang, Thi Thom
2017-01-01
Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
Discriminative region extraction and feature selection based on the combination of SURF and saliency
NASA Astrophysics Data System (ADS)
Deng, Li; Wang, Chunhong; Rao, Changhui
2011-08-01
The objective of this paper is to provide a possible optimization on salient region algorithm, which is extensively used in recognizing and learning object categories. Salient region algorithm owns the superiority of intra-class tolerance, global score of features and automatically prominent scale selection under certain range. However, the major limitation behaves on performance, and that is what we attempt to improve. By reducing the number of pixels involved in saliency calculation, it can be accelerated. We use interest points detected by fast-Hessian, the detector of SURF, as the candidate feature for saliency operation, rather than the whole set in image. This implementation is thereby called Saliency based Optimization over SURF (SOSU for short). Experiment shows that bringing in of such a fast detector significantly speeds up the algorithm. Meanwhile, Robustness of intra-class diversity ensures object recognition accuracy.
Hierarchy of Gambling Choices: A Framework for Examining EGM Gambling Environment Preferences.
Thorne, Hannah Briony; Rockloff, Matthew Justus; Langham, Erika; Li, En
2016-12-01
This paper presents the Hierarchy of Gambling Choices (HGC), which is a consumer-oriented framework for understanding the key environmental and contextual features that influence peoples' selections of online and venue-based electronic gaming machines (EGMs). The HGC framework proposes that EGM gamblers make choices in selection of EGM gambling experiences utilising Tversky's (Psychol Rev 79(4):281-299, 1972). Elimination-by-Aspects model, and organise their choice in a hierarchical manner by virtue of EGMs being an "experience good" (Nelson in J Polit Econ 78(2):311-329, 1970). EGM features are divided into three levels: the platform-including, online, mobile or land-based; the provider or specific venue in which the gambling occurs; and the game or machine characteristics, such as graphical themes and bonus features. This framework will contribute to the gambling field by providing a manner in which to systematically explore the environment surrounding EGM gambling and how it affects behaviour.
Autonomous Exploration for Gathering Increased Science
NASA Technical Reports Server (NTRS)
Bornstein, Benjamin J.; Castano, Rebecca; Estlin, Tara A.; Gaines, Daniel M.; Anderson, Robert C.; Thompson, David R.; DeGranville, Charles K.; Chien, Steve A.; Tang, Benyang; Burl, Michael C.;
2010-01-01
The Autonomous Exploration for Gathering Increased Science System (AEGIS) provides automated targeting for remote sensing instruments on the Mars Exploration Rover (MER) mission, which at the time of this reporting has had two rovers exploring the surface of Mars (see figure). Currently, targets for rover remote-sensing instruments must be selected manually based on imagery already on the ground with the operations team. AEGIS enables the rover flight software to analyze imagery onboard in order to autonomously select and sequence targeted remote-sensing observations in an opportunistic fashion. In particular, this technology will be used to automatically acquire sub-framed, high-resolution, targeted images taken with the MER panoramic cameras. This software provides: 1) Automatic detection of terrain features in rover camera images, 2) Feature extraction for detected terrain targets, 3) Prioritization of terrain targets based on a scientist target feature set, and 4) Automated re-targeting of rover remote-sensing instruments at the highest priority target.
NASA Astrophysics Data System (ADS)
Zhang, Lei; Yang, Fengbao; Ji, Linna; Lv, Sheng
2018-01-01
Diverse image fusion methods perform differently. Each method has advantages and disadvantages compared with others. One notion is that the advantages of different image methods can be effectively combined. A multiple-algorithm parallel fusion method based on algorithmic complementarity and synergy is proposed. First, in view of the characteristics of the different algorithms and difference-features among images, an index vector-based feature-similarity is proposed to define the degree of complementarity and synergy. This proposed index vector is a reliable evidence indicator for algorithm selection. Second, the algorithms with a high degree of complementarity and synergy are selected. Then, the different degrees of various features and infrared intensity images are used as the initial weights for the nonnegative matrix factorization (NMF). This avoids randomness of the NMF initialization parameter. Finally, the fused images of different algorithms are integrated using the NMF because of its excellent data fusing performance on independent features. Experimental results demonstrate that the visual effect and objective evaluation index of the fused images obtained using the proposed method are better than those obtained using traditional methods. The proposed method retains all the advantages that individual fusion algorithms have.
Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan
2014-01-01
Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Zu, Chen; Jie, Biao; Liu, Mingxia; Chen, Songcan
2015-01-01
Multimodal classification methods using different modalities of imaging and non-imaging data have recently shown great advantages over traditional single-modality-based ones for diagnosis and prognosis of Alzheimer’s disease (AD), as well as its prodromal stage, i.e., mild cognitive impairment (MCI). However, to the best of our knowledge, most existing methods focus on mining the relationship across multiple modalities of the same subjects, while ignoring the potentially useful relationship across different subjects. Accordingly, in this paper, we propose a novel learning method for multimodal classification of AD/MCI, by fully exploring the relationships across both modalities and subjects. Specifically, our proposed method includes two subsequent components, i.e., label-aligned multi-task feature selection and multimodal classification. In the first step, the feature selection learning from multiple modalities are treated as different learning tasks and a group sparsity regularizer is imposed to jointly select a subset of relevant features. Furthermore, to utilize the discriminative information among labeled subjects, a new label-aligned regularization term is added into the objective function of standard multi-task feature selection, where label-alignment means that all multi-modality subjects with the same class labels should be closer in the new feature-reduced space. In the second step, a multi-kernel support vector machine (SVM) is adopted to fuse the selected features from multi-modality data for final classification. To validate our method, we perform experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using baseline MRI and FDG-PET imaging data. The experimental results demonstrate that our proposed method achieves better classification performance compared with several state-of-the-art methods for multimodal classification of AD/MCI. PMID:26572145
A support vector machine approach for classification of welding defects from ultrasonic signals
NASA Astrophysics Data System (ADS)
Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming
2014-07-01
Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Low-Complexity Discriminative Feature Selection From EEG Before and After Short-Term Memory Task.
Behzadfar, Neda; Firoozabadi, S Mohammad P; Badie, Kambiz
2016-10-01
A reliable and unobtrusive quantification of changes in cortical activity during short-term memory task can be used to evaluate the efficacy of interfaces and to provide real-time user-state information. In this article, we investigate changes in electroencephalogram signals in short-term memory with respect to the baseline activity. The electroencephalogram signals have been analyzed using 9 linear and nonlinear/dynamic measures. We applied statistical Wilcoxon examination and Davis-Bouldian criterion to select optimal discriminative features. The results show that among the features, the permutation entropy significantly increased in frontal lobe and the occipital second lower alpha band activity decreased during memory task. These 2 features reflect the same mental task; however, their correlation with memory task varies in different intervals. In conclusion, it is suggested that the combination of the 2 features would improve the performance of memory based neurofeedback systems. © EEG and Clinical Neuroscience Society (ECNS) 2016.
NASA Astrophysics Data System (ADS)
Wang, Ximing; Kim, Bokkyu; Park, Ji Hoon; Wang, Erik; Forsyth, Sydney; Lim, Cody; Ravi, Ragini; Karibyan, Sarkis; Sanchez, Alexander; Liu, Brent
2017-03-01
Quantitative imaging biomarkers are used widely in clinical trials for tracking and evaluation of medical interventions. Previously, we have presented a web based informatics system utilizing quantitative imaging features for predicting outcomes in stroke rehabilitation clinical trials. The system integrates imaging features extraction tools and a web-based statistical analysis tool. The tools include a generalized linear mixed model(GLMM) that can investigate potential significance and correlation based on features extracted from clinical data and quantitative biomarkers. The imaging features extraction tools allow the user to collect imaging features and the GLMM module allows the user to select clinical data and imaging features such as stroke lesion characteristics from the database as regressors and regressands. This paper discusses the application scenario and evaluation results of the system in a stroke rehabilitation clinical trial. The system was utilized to manage clinical data and extract imaging biomarkers including stroke lesion volume, location and ventricle/brain ratio. The GLMM module was validated and the efficiency of data analysis was also evaluated.
Gao, Yu-Fei; Li, Bi-Qing; Cai, Yu-Dong; Feng, Kai-Yan; Li, Zhan-Dong; Jiang, Yang
2013-01-27
Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, the prediction accuracy remains relatively low with high false positives. In this work, we developed a novel predictor based on the Random Forest algorithm (RF) aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict active sites of enzymes and achieved an overall accuracy of 0.885687 and MCC of 0.689226 on an independent test dataset. Feature analysis showed that every category of the features except disorder contributed to the identification of active sites. It was also shown via the site-specific feature analysis that the features derived from the active site itself contributed most to the active site determination. Our prediction method may become a useful tool for identifying the active sites and the key features identified by the paper may provide valuable insights into the mechanism of catalysis.
Chatterjee, Sankhadeep; Dey, Nilanjan; Shi, Fuqian; Ashour, Amira S; Fong, Simon James; Sen, Soumya
2018-04-01
Dengue fever detection and classification have a vital role due to the recent outbreaks of different kinds of dengue fever. Recently, the advancement in the microarray technology can be employed for such classification process. Several studies have established that the gene selection phase takes a significant role in the classifier performance. Subsequently, the current study focused on detecting two different variations, namely, dengue fever (DF) and dengue hemorrhagic fever (DHF). A modified bag-of-features method has been proposed to select the most promising genes in the classification process. Afterward, a modified cuckoo search optimization algorithm has been engaged to support the artificial neural (ANN-MCS) to classify the unknown subjects into three different classes namely, DF, DHF, and another class containing convalescent and normal cases. The proposed method has been compared with other three well-known classifiers, namely, multilayer perceptron feed-forward network (MLP-FFN), artificial neural network (ANN) trained with cuckoo search (ANN-CS), and ANN trained with PSO (ANN-PSO). Experiments have been carried out with different number of clusters for the initial bag-of-features-based feature selection phase. After obtaining the reduced dataset, the hybrid ANN-MCS model has been employed for the classification process. The results have been compared in terms of the confusion matrix-based performance measuring metrics. The experimental results indicated a highly statistically significant improvement with the proposed classifier over the traditional ANN-CS model.
Peikert, Tobias; Duan, Fenghai; Rajagopalan, Srinivasan; Karwoski, Ronald A; Clay, Ryan; Robb, Richard A; Qin, Ziling; Sicks, JoRean; Bartholmai, Brian J; Maldonado, Fabien
2018-01-01
Optimization of the clinical management of screen-detected lung nodules is needed to avoid unnecessary diagnostic interventions. Herein we demonstrate the potential value of a novel radiomics-based approach for the classification of screen-detected indeterminate nodules. Independent quantitative variables assessing various radiologic nodule features such as sphericity, flatness, elongation, spiculation, lobulation and curvature were developed from the NLST dataset using 726 indeterminate nodules (all ≥ 7 mm, benign, n = 318 and malignant, n = 408). Multivariate analysis was performed using least absolute shrinkage and selection operator (LASSO) method for variable selection and regularization in order to enhance the prediction accuracy and interpretability of the multivariate model. The bootstrapping method was then applied for the internal validation and the optimism-corrected AUC was reported for the final model. Eight of the originally considered 57 quantitative radiologic features were selected by LASSO multivariate modeling. These 8 features include variables capturing Location: vertical location (Offset carina centroid z), Size: volume estimate (Minimum enclosing brick), Shape: flatness, Density: texture analysis (Score Indicative of Lesion/Lung Aggression/Abnormality (SILA) texture), and surface characteristics: surface complexity (Maximum shape index and Average shape index), and estimates of surface curvature (Average positive mean curvature and Minimum mean curvature), all with P<0.01. The optimism-corrected AUC for these 8 features is 0.939. Our novel radiomic LDCT-based approach for indeterminate screen-detected nodule characterization appears extremely promising however independent external validation is needed.
Early Visual Cortex Dynamics during Top-Down Modulated Shifts of Feature-Selective Attention.
Müller, Matthias M; Trautmann, Mireille; Keitel, Christian
2016-04-01
Shifting attention from one color to another color or from color to another feature dimension such as shape or orientation is imperative when searching for a certain object in a cluttered scene. Most attention models that emphasize feature-based selection implicitly assume that all shifts in feature-selective attention underlie identical temporal dynamics. Here, we recorded time courses of behavioral data and steady-state visual evoked potentials (SSVEPs), an objective electrophysiological measure of neural dynamics in early visual cortex to investigate temporal dynamics when participants shifted attention from color or orientation toward color or orientation, respectively. SSVEPs were elicited by four random dot kinematograms that flickered at different frequencies. Each random dot kinematogram was composed of dashes that uniquely combined two features from the dimensions color (red or blue) and orientation (slash or backslash). Participants were cued to attend to one feature (such as color or orientation) and respond to coherent motion targets of the to-be-attended feature. We found that shifts toward color occurred earlier after the shifting cue compared with shifts toward orientation, regardless of the original feature (i.e., color or orientation). This was paralleled in SSVEP amplitude modulations as well as in the time course of behavioral data. Overall, our results suggest different neural dynamics during shifts of attention from color and orientation and the respective shifting destinations, namely, either toward color or toward orientation.
Kim, Bumhwi; Ban, Sang-Woo; Lee, Minho
2013-10-01
Humans can efficiently perceive arbitrary visual objects based on an incremental learning mechanism with selective attention. This paper proposes a new task specific top-down attention model to locate a target object based on its form and color representation along with a bottom-up saliency based on relativity of primitive visual features and some memory modules. In the proposed model top-down bias signals corresponding to the target form and color features are generated, which draw the preferential attention to the desired object by the proposed selective attention model in concomitance with the bottom-up saliency process. The object form and color representation and memory modules have an incremental learning mechanism together with a proper object feature representation scheme. The proposed model includes a Growing Fuzzy Topology Adaptive Resonance Theory (GFTART) network which plays two important roles in object color and form biased attention; one is to incrementally learn and memorize color and form features of various objects, and the other is to generate a top-down bias signal to localize a target object by focusing on the candidate local areas. Moreover, the GFTART network can be utilized for knowledge inference which enables the perception of new unknown objects on the basis of the object form and color features stored in the memory during training. Experimental results show that the proposed model is successful in focusing on the specified target objects, in addition to the incremental representation and memorization of various objects in natural scenes. In addition, the proposed model properly infers new unknown objects based on the form and color features of previously trained objects. Copyright © 2013 Elsevier Ltd. All rights reserved.
Design of a mobile brain computer interface-based smart multimedia controller.
Tseng, Kevin C; Lin, Bor-Shing; Wong, Alice May-Kuen; Lin, Bor-Shyh
2015-03-06
Music is a way of expressing our feelings and emotions. Suitable music can positively affect people. However, current multimedia control methods, such as manual selection or automatic random mechanisms, which are now applied broadly in MP3 and CD players, cannot adaptively select suitable music according to the user's physiological state. In this study, a brain computer interface-based smart multimedia controller was proposed to select music in different situations according to the user's physiological state. Here, a commercial mobile tablet was used as the multimedia platform, and a wireless multi-channel electroencephalograph (EEG) acquisition module was designed for real-time EEG monitoring. A smart multimedia control program built in the multimedia platform was developed to analyze the user's EEG feature and select music according his/her state. The relationship between the user's state and music sorted by listener's preference was also examined in this study. The experimental results show that real-time music biofeedback according a user's EEG feature may positively improve the user's attention state.
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed of Normalized Difference Vegetation Indices (NDVI) derived from low resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence stretching from March to June was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
Guo, Xinyu; Dominick, Kelli C.; Minai, Ali A.; Li, Hailong; Erickson, Craig A.; Lu, Long J.
2017-01-01
The whole-brain functional connectivity (FC) pattern obtained from resting-state functional magnetic resonance imaging data are commonly applied to study neuropsychiatric conditions such as autism spectrum disorder (ASD) by using different machine learning models. Recent studies indicate that both hyper- and hypo- aberrant ASD-associated FCs were widely distributed throughout the entire brain rather than only in some specific brain regions. Deep neural networks (DNN) with multiple hidden layers have shown the ability to systematically extract lower-to-higher level information from high dimensional data across a series of neural hidden layers, significantly improving classification accuracy for such data. In this study, a DNN with a novel feature selection method (DNN-FS) is developed for the high dimensional whole-brain resting-state FC pattern classification of ASD patients vs. typical development (TD) controls. The feature selection method is able to help the DNN generate low dimensional high-quality representations of the whole-brain FC patterns by selecting features with high discriminating power from multiple trained sparse auto-encoders. For the comparison, a DNN without the feature selection method (DNN-woFS) is developed, and both of them are tested with different architectures (i.e., with different numbers of hidden layers/nodes). Results show that the best classification accuracy of 86.36% is generated by the DNN-FS approach with 3 hidden layers and 150 hidden nodes (3/150). Remarkably, DNN-FS outperforms DNN-woFS for all architectures studied. The most significant accuracy improvement was 9.09% with the 3/150 architecture. The method also outperforms other feature selection methods, e.g., two sample t-test and elastic net. In addition to improving the classification accuracy, a Fisher's score-based biomarker identification method based on the DNN is also developed, and used to identify 32 FCs related to ASD. These FCs come from or cross different pre-defined brain networks including the default-mode, cingulo-opercular, frontal-parietal, and cerebellum. Thirteen of them are statically significant between ASD and TD groups (two sample t-test p < 0.05) while 19 of them are not. The relationship between the statically significant FCs and the corresponding ASD behavior symptoms is discussed based on the literature and clinician's expert knowledge. Meanwhile, the potential reason of obtaining 19 FCs which are not statistically significant is also provided. PMID:28871217
Ataer-Cansizoglu, E; Kalpathy-Cramer, J; You, S; Keck, K; Erdogmus, D; Chiang, M F
2015-01-01
Inter-expert variability in image-based clinical diagnosis has been demonstrated in many diseases including retinopathy of prematurity (ROP), which is a disease affecting low birth weight infants and is a major cause of childhood blindness. In order to better understand the underlying causes of variability among experts, we propose a method to quantify the variability of expert decisions and analyze the relationship between expert diagnoses and features computed from the images. Identification of these features is relevant for development of computer-based decision support systems and educational systems in ROP, and these methods may be applicable to other diseases where inter-expert variability is observed. The experiments were carried out on a dataset of 34 retinal images, each with diagnoses provided independently by 22 experts. Analysis was performed using concepts of Mutual Information (MI) and Kernel Density Estimation. A large set of structural features (a total of 66) were extracted from retinal images. Feature selection was utilized to identify the most important features that correlated to actual clinical decisions by the 22 study experts. The best three features for each observer were selected by an exhaustive search on all possible feature subsets and considering joint MI as a relevance criterion. We also compared our results with the results of Cohen's Kappa [36] as an inter-rater reliability measure. The results demonstrate that a group of observers (17 among 22) decide consistently with each other. Mean and second central moment of arteriolar tortuosity is among the reasons of disagreement between this group and the rest of the observers, meaning that the group of experts consider amount of tortuosity as well as the variation of tortuosity in the image. Given a set of image-based features, the proposed analysis method can identify critical image-based features that lead to expert agreement and disagreement in diagnosis of ROP. Although tree-based features and various statistics such as central moment are not popular in the literature, our results suggest that they are important for diagnosis.
The effect of feature selection methods on computer-aided detection of masses in mammograms
NASA Astrophysics Data System (ADS)
Hupse, Rianne; Karssemeijer, Nico
2010-05-01
In computer-aided diagnosis (CAD) research, feature selection methods are often used to improve generalization performance of classifiers and shorten computation times. In an application that detects malignant masses in mammograms, we investigated the effect of using a selection criterion that is similar to the final performance measure we are optimizing, namely the mean sensitivity of the system in a predefined range of the free-response receiver operating characteristics (FROC). To obtain the generalization performance of the selected feature subsets, a cross validation procedure was performed on a dataset containing 351 abnormal and 7879 normal regions, each region providing a set of 71 mass features. The same number of noise features, not containing any information, were added to investigate the ability of the feature selection algorithms to distinguish between useful and non-useful features. It was found that significantly higher performances were obtained using feature sets selected by the general test statistic Wilks' lambda than using feature sets selected by the more specific FROC measure. Feature selection leads to better performance when compared to a system in which all features were used.
NASA Astrophysics Data System (ADS)
Pietrzyk, Mariusz W.; Donovan, Tim; Brennan, Patrick C.; Dix, Alan; Manning, David J.
2011-03-01
Aim: To optimize automated classification of radiological errors during lung nodule detection from chest radiographs (CxR) using a support vector machine (SVM) run on the spatial frequency features extracted from the local background of selected regions. Background: The majority of the unreported pulmonary nodules are visually detected but not recognized; shown by the prolonged dwell time values at false-negative regions. Similarly, overestimated nodule locations are capturing substantial amounts of foveal attention. Spatial frequency properties of selected local backgrounds are correlated with human observer responses either in terms of accuracy in indicating abnormality position or in the precision of visual sampling the medical images. Methods: Seven radiologists participated in the eye tracking experiments conducted under conditions of pulmonary nodule detection from a set of 20 postero-anterior CxR. The most dwelled locations have been identified and subjected to spatial frequency (SF) analysis. The image-based features of selected ROI were extracted with un-decimated Wavelet Packet Transform. An analysis of variance was run to select SF features and a SVM schema was implemented to classify False-Negative and False-Positive from all ROI. Results: A relative high overall accuracy was obtained for each individually developed Wavelet-SVM algorithm, with over 90% average correct ratio for errors recognition from all prolonged dwell locations. Conclusion: The preliminary results show that combined eye-tracking and image-based features can be used for automated detection of radiological error with SVM. The work is still in progress and not all analytical procedures have been completed, which might have an effect on the specificity of the algorithm.
Adaptive fusion of infrared and visible images in dynamic scene
NASA Astrophysics Data System (ADS)
Yang, Guang; Yin, Yafeng; Man, Hong; Desai, Sachi
2011-11-01
Multiple modalities sensor fusion has been widely employed in various surveillance and military applications. A variety of image fusion techniques including PCA, wavelet, curvelet and HSV has been proposed in recent years to improve human visual perception for object detection. One of the main challenges for visible and infrared image fusion is to automatically determine an optimal fusion strategy for different input scenes along with an acceptable computational cost. This paper, we propose a fast and adaptive feature selection based image fusion method to obtain high a contrast image from visible and infrared sensors for targets detection. At first, fuzzy c-means clustering is applied on the infrared image to highlight possible hotspot regions, which will be considered as potential targets' locations. After that, the region surrounding the target area is segmented as the background regions. Then image fusion is locally applied on the selected target and background regions by computing different linear combination of color components from registered visible and infrared images. After obtaining different fused images, histogram distributions are computed on these local fusion images as the fusion feature set. The variance ratio which is based on Linear Discriminative Analysis (LDA) measure is employed to sort the feature set and the most discriminative one is selected for the whole image fusion. As the feature selection is performed over time, the process will dynamically determine the most suitable feature for the image fusion in different scenes. Experiment is conducted on the OSU Color-Thermal database, and TNO Human Factor dataset. The fusion results indicate that our proposed method achieved a competitive performance compared with other fusion algorithms at a relatively low computational cost.
Khodayari-Rostamabad, Ahmad; Reilly, James P; Hasey, Gary M; de Bruin, Hubert; Maccrimmon, Duncan J
2013-10-01
The problem of identifying, in advance, the most effective treatment agent for various psychiatric conditions remains an elusive goal. To address this challenge, we investigate the performance of the proposed machine learning (ML) methodology (based on the pre-treatment electroencephalogram (EEG)) for prediction of response to treatment with a selective serotonin reuptake inhibitor (SSRI) medication in subjects suffering from major depressive disorder (MDD). A relatively small number of most discriminating features are selected from a large group of candidate features extracted from the subject's pre-treatment EEG, using a machine learning procedure for feature selection. The selected features are fed into a classifier, which was realized as a mixture of factor analysis (MFA) model, whose output is the predicted response in the form of a likelihood value. This likelihood indicates the extent to which the subject belongs to the responder vs. non-responder classes. The overall method was evaluated using a "leave-n-out" randomized permutation cross-validation procedure. A list of discriminating EEG biomarkers (features) was found. The specificity of the proposed method is 80.9% while sensitivity is 94.9%, for an overall prediction accuracy of 87.9%. There is a 98.76% confidence that the estimated prediction rate is within the interval [75%, 100%]. These results indicate that the proposed ML method holds considerable promise in predicting the efficacy of SSRI antidepressant therapy for MDD, based on a simple and cost-effective pre-treatment EEG. The proposed approach offers the potential to improve the treatment of major depression and to reduce health care costs. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Adeniyi, D. A.; Wei, Z.; Yang, Y.
2017-10-01
Recommendation problem has been extensively studied by researchers in the field of data mining, database and information retrieval. This study presents the design and realisation of an automated, personalised news recommendations system based on Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model. The proposed χ2SB-KNN model has the potential to overcome computational complexity and information overloading problems, reduces runtime and speeds up execution process through the use of critical value of χ2 distribution. The proposed recommendation engine can alleviate scalability challenges through combined online pattern discovery and pattern matching for real-time recommendations. This work also showcases the development of a novel method of feature selection referred to as Data Discretisation-Based feature selection method. This is used for selecting the best features for the proposed χ2SB-KNN algorithm at the preprocessing stage of the classification procedures. The implementation of the proposed χ2SB-KNN model is achieved through the use of a developed in-house Java program on an experimental website called OUC newsreaders' website. Finally, we compared the performance of our system with two baseline methods which are traditional Euclidean distance K-nearest neighbour and Naive Bayesian techniques. The result shows a significant improvement of our method over the baseline methods studied.
An EEG-based functional connectivity measure for automatic detection of alcohol use disorder.
Mumtaz, Wajid; Saad, Mohamad Naufal B Mohamad; Kamel, Nidal; Ali, Syed Saad Azhar; Malik, Aamir Saeed
2018-01-01
The abnormal alcohol consumption could cause toxicity and could alter the human brain's structure and function, termed as alcohol used disorder (AUD). Unfortunately, the conventional screening methods for AUD patients are subjective and manual. Hence, to perform automatic screening of AUD patients, objective methods are needed. The electroencephalographic (EEG) data have been utilized to study the differences of brain signals between alcoholics and healthy controls that could further developed as an automatic screening tool for alcoholics. In this work, resting-state EEG-derived features were utilized as input data to the proposed feature selection and classification method. The aim was to perform automatic classification of AUD patients and healthy controls. The validation of the proposed method involved real-EEG data acquired from 30 AUD patients and 30 age-matched healthy controls. The resting-state EEG-derived features such as synchronization likelihood (SL) were computed involving 19 scalp locations resulted into 513 features. Furthermore, the features were rank-ordered to select the most discriminant features involving a rank-based feature selection method according to a criterion, i.e., receiver operating characteristics (ROC). Consequently, a reduced set of most discriminant features was identified and utilized further during classification of AUD patients and healthy controls. In this study, three different classification models such as Support Vector Machine (SVM), Naïve Bayesian (NB), and Logistic Regression (LR) were used. The study resulted into SVM classification accuracy=98%, sensitivity=99.9%, specificity=95%, and f-measure=0.97; LR classification accuracy=91.7%, sensitivity=86.66%, specificity=96.6%, and f-measure=0.90; NB classification accuracy=93.6%, sensitivity=100%, specificity=87.9%, and f-measure=0.95. The SL features could be utilized as objective markers to screen the AUD patients and healthy controls. Copyright © 2017 Elsevier B.V. All rights reserved.
Latent feature decompositions for integrative analysis of multi-platform genomic data
Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran
2015-01-01
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to, a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features. PMID:26146492
EEG-based Affect and Workload Recognition in a Virtual Driving Environment for ASD Intervention
Wade, Joshua W.; Key, Alexandra P.; Warren, Zachary E.; Sarkar, Nilanjan
2017-01-01
objective To build group-level classification models capable of recognizing affective states and mental workload of individuals with autism spectrum disorder (ASD) during driving skill training. Methods Twenty adolescents with ASD participated in a six-session virtual reality driving simulator based experiment, during which their electroencephalogram (EEG) data were recorded alongside driving events and a therapist’s rating of their affective states and mental workload. Five feature generation approaches including statistical features, fractal dimension features, higher order crossings (HOC)-based features, power features from frequency bands, and power features from bins (Δf = 2 Hz) were applied to extract relevant features. Individual differences were removed with a two-step feature calibration method. Finally, binary classification results based on the k-nearest neighbors algorithm and univariate feature selection method were evaluated by leave-one-subject-out nested cross-validation to compare feature types and identify discriminative features. Results The best classification results were achieved using power features from bins for engagement (0.95) and boredom (0.78), and HOC-based features for enjoyment (0.90), frustration (0.88), and workload (0.86). Conclusion Offline EEG-based group-level classification models are feasible for recognizing binary low and high intensity of affect and workload of individuals with ASD in the context of driving. However, while promising the applicability of the models in an online adaptive driving task requires further development. Significance The developed models provide a basis for an EEG-based passive brain computer interface system that has the potential to benefit individuals with ASD with an affect- and workload-based individualized driving skill training intervention. PMID:28422647
Thomaz, Ricardo de Lima; Carneiro, Pedro Cunha; Bonin, João Eliton; Macedo, Túlio Augusto Alves; Patrocinio, Ana Claudia; Soares, Alcimar Barbosa
2018-05-01
Detection of early hepatocellular carcinoma (HCC) is responsible for increasing survival rates in up to 40%. One-class classifiers can be used for modeling early HCC in multidetector computed tomography (MDCT), but demand the specific knowledge pertaining to the set of features that best describes the target class. Although the literature outlines several features for characterizing liver lesions, it is unclear which is most relevant for describing early HCC. In this paper, we introduce an unconstrained GA feature selection algorithm based on a multi-objective Mahalanobis fitness function to improve the classification performance for early HCC. We compared our approach to a constrained Mahalanobis function and two other unconstrained functions using Welch's t-test and Gaussian Data Descriptors. The performance of each fitness function was evaluated by cross-validating a one-class SVM. The results show that the proposed multi-objective Mahalanobis fitness function is capable of significantly reducing data dimensionality (96.4%) and improving one-class classification of early HCC (0.84 AUC). Furthermore, the results provide strong evidence that intensity features extracted at the arterial to portal and arterial to equilibrium phases are important for classifying early HCC.
Particle Swarm Optimization approach to defect detection in armour ceramics.
Kesharaju, Manasa; Nagarajah, Romesh
2017-03-01
In this research, various extracted features were used in the development of an automated ultrasonic sensor based inspection system that enables defect classification in each ceramic component prior to despatch to the field. Classification is an important task and large number of irrelevant, redundant features commonly introduced to a dataset reduces the classifiers performance. Feature selection aims to reduce the dimensionality of the dataset while improving the performance of a classification system. In the context of a multi-criteria optimization problem (i.e. to minimize classification error rate and reduce number of features) such as one discussed in this research, the literature suggests that evolutionary algorithms offer good results. Besides, it is noted that Particle Swarm Optimization (PSO) has not been explored especially in the field of classification of high frequency ultrasonic signals. Hence, a binary coded Particle Swarm Optimization (BPSO) technique is investigated in the implementation of feature subset selection and to optimize the classification error rate. In the proposed method, the population data is used as input to an Artificial Neural Network (ANN) based classification system to obtain the error rate, as ANN serves as an evaluator of PSO fitness function. Copyright © 2016. Published by Elsevier B.V.
Nayak, Deepak Ranjan; Dash, Ratnakar; Majhi, Banshidhar
2017-01-01
This paper presents an automatic classification system for segregating pathological brain from normal brains in magnetic resonance imaging scanning. The proposed system employs contrast limited adaptive histogram equalization scheme to enhance the diseased region in brain MR images. Two-dimensional stationary wavelet transform is harnessed to extract features from the preprocessed images. The feature vector is constructed using the energy and entropy values, computed from the level- 2 SWT coefficients. Then, the relevant and uncorrelated features are selected using symmetric uncertainty ranking filter. Subsequently, the selected features are given input to the proposed AdaBoost with support vector machine classifier, where SVM is used as the base classifier of AdaBoost algorithm. To validate the proposed system, three standard MR image datasets, Dataset-66, Dataset-160, and Dataset- 255 have been utilized. The 5 runs of k-fold stratified cross validation results indicate the suggested scheme offers better performance than other existing schemes in terms of accuracy and number of features. The proposed system earns ideal classification over Dataset-66 and Dataset-160; whereas, for Dataset- 255, an accuracy of 99.45% is achieved. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Detection of goal events in soccer videos
NASA Astrophysics Data System (ADS)
Kim, Hyoung-Gook; Roeber, Steffen; Samour, Amjad; Sikora, Thomas
2005-01-01
In this paper, we present an automatic extraction of goal events in soccer videos by using audio track features alone without relying on expensive-to-compute video track features. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The detection of soccer video highlights using audio contents comprises three steps: 1) extraction of audio features from a video sequence, 2) event candidate detection of highlight events based on the information provided by the feature extraction Methods and the Hidden Markov Model (HMM), 3) goal event selection to finally determine the video intervals to be included in the summary. For this purpose we compared the performance of the well known Mel-scale Frequency Cepstral Coefficients (MFCC) feature extraction method vs. MPEG-7 Audio Spectrum Projection feature (ASP) extraction method based on three different decomposition methods namely Principal Component Analysis( PCA), Independent Component Analysis (ICA) and Non-Negative Matrix Factorization (NMF). To evaluate our system we collected five soccer game videos from various sources. In total we have seven hours of soccer games consisting of eight gigabytes of data. One of five soccer games is used as the training data (e.g., announcers' excited speech, audience ambient speech noise, audience clapping, environmental sounds). Our goal event detection results are encouraging.
The effect of category learning on attentional modulation of visual cortex.
Folstein, Jonathan R; Fuller, Kelly; Howard, Dorothy; DePatie, Thomas
2017-09-01
Learning about visual object categories causes changes in the way we perceive those objects. One likely mechanism by which this occurs is the application of attention to potentially relevant objects. Here we test the hypothesis that category membership influences the allocation of attention, allowing attention to be applied not only to object features, but to entire categories. Participants briefly learned to categorize a set of novel cartoon animals after which EEG was recorded while participants distinguished between a target and non-target category. A second identical EEG session was conducted after two sessions of categorization practice. The category structure and task design allowed parametric manipulation of number of target features while holding feature frequency and category membership constant. We found no evidence that category membership influenced attentional selection: a postero-lateral negative component, labeled the selection negativity/N250, increased over time and was sensitive to number of target features, not target categories. In contrast, the right hemisphere N170 was not sensitive to target features. The P300 appeared sensitive to category in the first session, but showed a graded sensitivity to number of target features in the second session, possibly suggesting a transition from rule-based to similarity based categorization. Copyright © 2017. Published by Elsevier Ltd.
Bonet, Isis; Franco-Montero, Pedro; Rivero, Virginia; Teijeira, Marta; Borges, Fernanda; Uriarte, Eugenio; Morales Helguera, Aliuska
2013-12-23
A(2B) adenosine receptor antagonists may be beneficial in treating diseases like asthma, diabetes, diabetic retinopathy, and certain cancers. This has stimulated research for the development of potent ligands for this subtype, based on quantitative structure-affinity relationships. In this work, a new ensemble machine learning algorithm is proposed for classification and prediction of the ligand-binding affinity of A(2B) adenosine receptor antagonists. This algorithm is based on the training of different classifier models with multiple training sets (composed of the same compounds but represented by diverse features). The k-nearest neighbor, decision trees, neural networks, and support vector machines were used as single classifiers. To select the base classifiers for combining into the ensemble, several diversity measures were employed. The final multiclassifier prediction results were computed from the output obtained by using a combination of selected base classifiers output, by utilizing different mathematical functions including the following: majority vote, maximum and average probability. In this work, 10-fold cross- and external validation were used. The strategy led to the following results: i) the single classifiers, together with previous features selections, resulted in good overall accuracy, ii) a comparison between single classifiers, and their combinations in the multiclassifier model, showed that using our ensemble gave a better performance than the single classifier model, and iii) our multiclassifier model performed better than the most widely used multiclassifier models in the literature. The results and statistical analysis demonstrated the supremacy of our multiclassifier approach for predicting the affinity of A(2B) adenosine receptor antagonists, and it can be used to develop other QSAR models.
Ma, Xin; Guo, Jing; Sun, Xiao
2015-01-01
The prediction of RNA-binding proteins is one of the most challenging problems in computation biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated features of conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.
Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention
Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei
2016-01-01
An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features. PMID:26759193
Deep convolutional neural network for mammographic density segmentation
NASA Astrophysics Data System (ADS)
Wei, Jun; Li, Songfeng; Chan, Heang-Ping; Helvie, Mark A.; Roubidoux, Marilyn A.; Lu, Yao; Zhou, Chuan; Hadjiiski, Lubomir; Samala, Ravi K.
2018-02-01
Breast density is one of the most significant factors for cancer risk. In this study, we proposed a supervised deep learning approach for automated estimation of percentage density (PD) on digital mammography (DM). The deep convolutional neural network (DCNN) was trained to estimate a probability map of breast density (PMD). PD was calculated as the ratio of the dense area to the breast area based on the probability of each pixel belonging to dense region or fatty region at a decision threshold of 0.5. The DCNN estimate was compared to a feature-based statistical learning approach, in which gray level, texture and morphological features were extracted from each ROI and the least absolute shrinkage and selection operator (LASSO) was used to select and combine the useful features to generate the PMD. The reference PD of each image was provided by two experienced MQSA radiologists. With IRB approval, we retrospectively collected 347 DMs from patient files at our institution. The 10-fold cross-validation results showed a strong correlation r=0.96 between the DCNN estimation and interactive segmentation by radiologists while that of the feature-based statistical learning approach vs radiologists' segmentation had a correlation r=0.78. The difference between the segmentation by DCNN and by radiologists was significantly smaller than that between the feature-based learning approach and radiologists (p < 0.0001) by two-tailed paired t-test. This study demonstrated that the DCNN approach has the potential to replace radiologists' interactive thresholding in PD estimation on DMs.
EFS: an ensemble feature selection tool implemented as R-package and web-application.
Neumann, Ursula; Genze, Nikita; Heider, Dominik
2017-01-01
Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies demonstrated that single feature selection methods can have specific biases, whereas an ensemble feature selection has the advantage to alleviate and compensate for these biases. The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs to a quantitative ensemble importance. Currently, eight different feature selection methods have been integrated in EFS, which can be used separately or combined in an ensemble. EFS identifies relevant features while compensating specific biases of single methods due to an ensemble approach. Thereby, EFS can improve the prediction accuracy and interpretability in subsequent binary classification models. EFS can be downloaded as an R-package from CRAN or used via a web application at http://EFS.heiderlab.de.
Method for indexing and retrieving manufacturing-specific digital imagery based on image content
Ferrell, Regina K.; Karnowski, Thomas P.; Tobin, Jr., Kenneth W.
2004-06-15
A method for indexing and retrieving manufacturing-specific digital images based on image content comprises three steps. First, at least one feature vector can be extracted from a manufacturing-specific digital image stored in an image database. In particular, each extracted feature vector corresponds to a particular characteristic of the manufacturing-specific digital image, for instance, a digital image modality and overall characteristic, a substrate/background characteristic, and an anomaly/defect characteristic. Notably, the extracting step includes generating a defect mask using a detection process. Second, using an unsupervised clustering method, each extracted feature vector can be indexed in a hierarchical search tree. Third, a manufacturing-specific digital image associated with a feature vector stored in the hierarchicial search tree can be retrieved, wherein the manufacturing-specific digital image has image content comparably related to the image content of the query image. More particularly, can include two data reductions, the first performed based upon a query vector extracted from a query image. Subsequently, a user can select relevant images resulting from the first data reduction. From the selection, a prototype vector can be calculated, from which a second-level data reduction can be performed. The second-level data reduction can result in a subset of feature vectors comparable to the prototype vector, and further comparable to the query vector. An additional fourth step can include managing the hierarchical search tree by substituting a vector average for several redundant feature vectors encapsulated by nodes in the hierarchical search tree.
NASA Astrophysics Data System (ADS)
Shah, Syed Muhammad Saqlain; Batool, Safeera; Khan, Imran; Ashraf, Muhammad Usman; Abbas, Syed Hussnain; Hussain, Syed Adnan
2017-09-01
Automatic diagnosis of human diseases are mostly achieved through decision support systems. The performance of these systems is mainly dependent on the selection of the most relevant features. This becomes harder when the dataset contains missing values for the different features. Probabilistic Principal Component Analysis (PPCA) has reputation to deal with the problem of missing values of attributes. This research presents a methodology which uses the results of medical tests as input, extracts a reduced dimensional feature subset and provides diagnosis of heart disease. The proposed methodology extracts high impact features in new projection by using Probabilistic Principal Component Analysis (PPCA). PPCA extracts projection vectors which contribute in highest covariance and these projection vectors are used to reduce feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The feature subset with the reduced dimension is provided to radial basis function (RBF) kernel based Support Vector Machines (SVM). The RBF based SVM serves the purpose of classification into two categories i.e., Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over the three datasets of UCI i.e., Cleveland, Switzerland and Hungarian. The statistical results achieved through the proposed technique are presented in comparison to the existing research showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for Cleveland, Hungarian and Switzerland dataset respectively.
Alvarez-Meza, Andres M.; Orozco-Gutierrez, Alvaro; Castellanos-Dominguez, German
2017-01-01
We introduce Enhanced Kernel-based Relevance Analysis (EKRA) that aims to support the automatic identification of brain activity patterns using electroencephalographic recordings. EKRA is a data-driven strategy that incorporates two kernel functions to take advantage of the available joint information, associating neural responses to a given stimulus condition. Regarding this, a Centered Kernel Alignment functional is adjusted to learning the linear projection that best discriminates the input feature set, optimizing the required free parameters automatically. Our approach is carried out in two scenarios: (i) feature selection by computing a relevance vector from extracted neural features to facilitating the physiological interpretation of a given brain activity task, and (ii) enhanced feature selection to perform an additional transformation of relevant features aiming to improve the overall identification accuracy. Accordingly, we provide an alternative feature relevance analysis strategy that allows improving the system performance while favoring the data interpretability. For the validation purpose, EKRA is tested in two well-known tasks of brain activity: motor imagery discrimination and epileptic seizure detection. The obtained results show that the EKRA approach estimates a relevant representation space extracted from the provided supervised information, emphasizing the salient input features. As a result, our proposal outperforms the state-of-the-art methods regarding brain activity discrimination accuracy with the benefit of enhanced physiological interpretation about the task at hand. PMID:29056897
Sandhill crane roost selection, human disturbance, and forage resources
Pearse, Aaron T.; Krapu, Gary; Brandt, David
2017-01-01
Sites used for roosting represent a key habitat requirement for many species of birds because availability and quality of roost sites can influence individual fitness. Birds select roost sites based on numerous factors, requirements, and motivations, and selection of roosts can be dynamic in time and space because of various ecological and environmental influences. For sandhill cranes (Antigone canadensis) at their main spring-staging area along the Platte River in south-central Nebraska, USA, past investigations of roosting cranes focused on physical channel characteristics related to perceived security as motivating roost distribution. We used 6,310 roost sites selected by 313 sandhill cranes over 5 spring migration seasons (2003–2007) to quantify resource selection functions of roost sites on the central Platte River using a discrete choice analysis. Sandhill cranes generally showed stronger selection for wider channels with shorter bank vegetation situated farther from potential human disturbance features such as roads, bridges, and dwellings. Furthermore, selection for roost sites with preferable physical characteristics (wide channels with short bank vegetation) was more resilient to nearby disturbance features than more narrow channels with taller bank vegetation. The amount of cornfields surrounding sandhill crane roost sites positively influenced relative probability of use but only for more narrow channels < 100 m and those with shorter bank vegetation. We confirmed key resource features that sandhill cranes selected at river channels along the Platte River, and after incorporating spatial variation due to human disturbance, our understanding of roost site selection was more robust, providing insights on how disturbance may interact with physical habitat features. Managers can use information on roost-site selection when developing plans to increase probability of crane use at existing roost sites and to identify new areas for potential use if existing sites become limited.
NASA Astrophysics Data System (ADS)
Książek, Judyta
2015-10-01
At present, there has been a great interest in the development of texture based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for detection of asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolution: 5 cm and 25 cm) were used. On both orthoimages representative samples for two classes: asbestos-cement roofing sheets and other roofing materials were selected. Estimation of texture analysis usefulness was conducted using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in MaZda software. During the calculation of decision trees different numbers of texture parameters groups were considered. In order to obtain the best settings for decision trees models cross-validation was performed. Decision trees models with the lowest mean classification error were selected. The accuracy of the classification was held based on validation data sets, which were not used for the classification learning. For 5 cm ground resolution samples, the lowest mean classification error was 15.6%. The lowest mean classification error in the case of 25 cm ground resolution was 20.0%. The obtained results confirm potential usefulness of the texture parameter image processing for detection of asbestos-cement roofing sheets. In order to improve the accuracy another extended study should be considered in which additional textural features as well as spectral characteristics should be analyzed.
Intrinsic feature-based pose measurement for imaging motion compensation
Baba, Justin S.; Goddard, Jr., James Samuel
2014-08-19
Systems and methods for generating motion corrected tomographic images are provided. A method includes obtaining first images of a region of interest (ROI) to be imaged and associated with a first time, where the first images are associated with different positions and orientations with respect to the ROI. The method also includes defining an active region in the each of the first images and selecting intrinsic features in each of the first images based on the active region. Second, identifying a portion of the intrinsic features temporally and spatially matching intrinsic features in corresponding ones of second images of the ROI associated with a second time prior to the first time and computing three-dimensional (3D) coordinates for the portion of the intrinsic features. Finally, the method includes computing a relative pose for the first images based on the 3D coordinates.
Wakui, Takashi; Matsumoto, Tsuyoshi; Matsubara, Kenta; Kawasaki, Tomoyuki; Yamaguchi, Hiroshi; Akutsu, Hidenori
2017-10-01
We propose an image analysis method for quality evaluation of human pluripotent stem cells based on biologically interpretable features. It is important to maintain the undifferentiated state of induced pluripotent stem cells (iPSCs) while culturing the cells during propagation. Cell culture experts visually select good quality cells exhibiting the morphological features characteristic of undifferentiated cells. Experts have empirically determined that these features comprise prominent and abundant nucleoli, less intercellular spacing, and fewer differentiating cellular nuclei. We quantified these features based on experts' visual inspection of phase contrast images of iPSCs and found that these features are effective for evaluating iPSC quality. We then developed an iPSC quality evaluation method using an image analysis technique. The method allowed accurate classification, equivalent to visual inspection by experts, of three iPSC cell lines.
NASA Astrophysics Data System (ADS)
Wang, Min; Cui, Qi; Wang, Jie; Ming, Dongping; Lv, Guonian
2017-01-01
In this paper, we first propose several novel concepts for object-based image analysis, which include line-based shape regularity, line density, and scale-based best feature value (SBV), based on the region-line primitive association framework (RLPAF). We then propose a raft cultivation area (RCA) extraction method for high spatial resolution (HSR) remote sensing imagery based on multi-scale feature fusion and spatial rule induction. The proposed method includes the following steps: (1) Multi-scale region primitives (segments) are obtained by image segmentation method HBC-SEG, and line primitives (straight lines) are obtained by phase-based line detection method. (2) Association relationships between regions and lines are built based on RLPAF, and then multi-scale RLPAF features are extracted and SBVs are selected. (3) Several spatial rules are designed to extract RCAs within sea waters after land and water separation. Experiments show that the proposed method can successfully extract different-shaped RCAs from HR images with good performance.
Hyperspectral image classification based on local binary patterns and PCANet
NASA Astrophysics Data System (ADS)
Yang, Huizhen; Gao, Feng; Dong, Junyu; Yang, Yang
2018-04-01
Hyperspectral image classification has been well acknowledged as one of the challenging tasks of hyperspectral data processing. In this paper, we propose a novel hyperspectral image classification framework based on local binary pattern (LBP) features and PCANet. In the proposed method, linear prediction error (LPE) is first employed to select a subset of informative bands, and LBP is utilized to extract texture features. Then, spectral and texture features are stacked into a high dimensional vectors. Next, the extracted features of a specified position are transformed to a 2-D image. The obtained images of all pixels are fed into PCANet for classification. Experimental results on real hyperspectral dataset demonstrate the effectiveness of the proposed method.
Saba, Luca; Jain, Pankaj K; Suri, Harman S; Ikeda, Nobutaka; Araki, Tadashi; Singh, Bikesh K; Nicolaides, Andrew; Shafique, Shoaib; Gupta, Ajay; Laird, John R; Suri, Jasjit S
2017-06-01
Severe atherosclerosis disease in carotid arteries causes stenosis which in turn leads to stroke. Machine learning systems have been previously developed for plaque wall risk assessment using morphology-based characterization. The fundamental assumption in such systems is the extraction of the grayscale features of the plaque region. Even though these systems have the ability to perform risk stratification, they lack the ability to achieve higher performance due their inability to select and retain dominant features. This paper introduces a polling-based principal component analysis (PCA) strategy embedded in the machine learning framework to select and retain dominant features, resulting in superior performance. This leads to more stability and reliability. The automated system uses offline image data along with the ground truth labels to generate the parameters, which are then used to transform the online grayscale features to predict the risk of stroke. A set of sixteen grayscale plaque features is computed. Utilizing the cross-validation protocol (K = 10), and the PCA cutoff of 0.995, the machine learning system is able to achieve an accuracy of 98.55 and 98.83%corresponding to the carotidfar wall and near wall plaques, respectively. The corresponding reliability of the system was 94.56 and 95.63%, respectively. The automated system was validated against the manual risk assessment system and the precision of merit for same cross-validation settings and PCA cutoffs are 98.28 and 93.92%for the far and the near wall, respectively.PCA-embedded morphology-based plaque characterization shows a powerful strategy for risk assessment and can be adapted in clinical settings.
Predictive models reduce talent development costs in female gymnastics.
Pion, Johan; Hohmann, Andreas; Liu, Tianbiao; Lenoir, Matthieu; Segers, Veerle
2017-04-01
This retrospective study focuses on the comparison of different predictive models based on the results of a talent identification test battery for female gymnasts. We studied to what extent these models have the potential to optimise selection procedures, and at the same time reduce talent development costs in female artistic gymnastics. The dropout rate of 243 female elite gymnasts was investigated, 5 years past talent selection, using linear (discriminant analysis) and non-linear predictive models (Kohonen feature maps and multilayer perceptron). The coaches classified 51.9% of the participants correct. Discriminant analysis improved the correct classification to 71.6% while the non-linear technique of Kohonen feature maps reached 73.7% correctness. Application of the multilayer perceptron even classified 79.8% of the gymnasts correctly. The combination of different predictive models for talent selection can avoid deselection of high-potential female gymnasts. The selection procedure based upon the different statistical analyses results in decrease of 33.3% of cost because the pool of selected athletes can be reduced to 92 instead of 138 gymnasts (as selected by the coaches). Reduction of the costs allows the limited resources to be fully invested in the high-potential athletes.
Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest
Ma, Suliang; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan
2018-01-01
Mechanical faults of high-voltage circuit breakers (HVCBs) always happen over long-term operation, so extracting the fault features and identifying the fault type have become a key issue for ensuring the security and reliability of power supply. Based on wavelet packet decomposition technology and random forest algorithm, an effective identification system was developed in this paper. First, compared with the incomplete description of Shannon entropy, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variable and optimize the feature space. Finally, the approach was verified based on actual HVCB vibration signals by considering six typical fault classes. The comparative experiment results show that the classification accuracy of the proposed method with the origin feature space reached 93.33% and reached up to 95.56% with optimized input feature vector of classifier. This indicates that feature optimization procedure is successful, and the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods. PMID:29659548
NASA Astrophysics Data System (ADS)
Drwal, Malgorzata N.; Agama, Keli; Pommier, Yves; Griffith, Renate
2013-12-01
Purely structure-based pharmacophores (SBPs) are an alternative method to ligand-based approaches and have the advantage of describing the entire interaction capability of a binding pocket. Here, we present the development of SBPs for topoisomerase I, an anticancer target with an unusual ligand binding pocket consisting of protein and DNA atoms. Different approaches to cluster and select pharmacophore features are investigated, including hierarchical clustering and energy calculations. In addition, the performance of SBPs is evaluated retrospectively and compared to the performance of ligand- and complex-based pharmacophores. SBPs emerge as a valid method in virtual screening and a complementary approach to ligand-focussed methods. The study further reveals that the choice of pharmacophore feature clustering and selection methods has a large impact on the virtual screening hit lists. A prospective application of the SBPs in virtual screening reveals that they can be used successfully to identify novel topoisomerase inhibitors.
Fiot, Jean-Baptiste; Cohen, Laurent D; Raniga, Parnesh; Fripp, Jurgen
2013-09-01
Support vector machines (SVM) are machine learning techniques that have been used for segmentation and classification of medical images, including segmentation of white matter hyper-intensities (WMH). Current approaches using SVM for WMH segmentation extract features from the brain and classify these followed by complex post-processing steps to remove false positives. The method presented in this paper combines advanced pre-processing, tissue-based feature selection and SVM classification to obtain efficient and accurate WMH segmentation. Features from 125 patients, generated from up to four MR modalities [T1-w, T2-w, proton-density and fluid attenuated inversion recovery(FLAIR)], differing neighbourhood sizes and the use of multi-scale features were compared. We found that although using all four modalities gave the best overall classification (average Dice scores of 0.54 ± 0.12, 0.72 ± 0.06 and 0.82 ± 0.06 respectively for small, moderate and severe lesion loads); this was not significantly different (p = 0.50) from using just T1-w and FLAIR sequences (Dice scores of 0.52 ± 0.13, 0.71 ± 0.08 and 0.81 ± 0.07). Furthermore, there was a negligible difference between using 5 × 5 × 5 and 3 × 3 × 3 features (p = 0.93). Finally, we show that careful consideration of features and pre-processing techniques not only saves storage space and computation time but also leads to more efficient classification, which outperforms the one based on all features with post-processing. Copyright © 2013 John Wiley & Sons, Ltd.
Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei
2018-04-01
To evaluate the diagnostic performance of machine-learning based quantitative texture analysis of CT images to differentiate small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institutional retrospective study included 58 patients with pathologically proven small renal mass (17 in AMLwvf and 41 in RCC groups). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation in preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied to select features preliminarily. Then support vector machine with recursive feature elimination (SVM-RFE) and synthetic minority oversampling technique (SMOTE) were adopted to establish discriminative classifiers, and the performance of classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P < 0.05) and had good interobserver agreement. An optimal feature subset including 11 features was further selected by the SVM-RFE method. The SVM-RFE+SMOTE classifier achieved the best performance in discriminating between small AMLwvf and RCC, with the highest accuracy, sensitivity, specificity and AUC of 93.9 %, 87.8 %, 100 % and 0.955, respectively. Machine learning analysis of CT texture features can facilitate the accurate differentiation of small AMLwvf from RCC. • Although conventional CT is useful for diagnosis of SRMs, it has limitations. • Machine-learning based CT texture analysis facilitate differentiation of small AMLwvf from RCC. • The highest accuracy of SVM-RFE+SMOTE classifier reached 93.9 %. • Texture analysis combined with machine-learning methods might spare unnecessary surgery for AMLwvf.
Feature selection methods for big data bioinformatics: A survey from the search perspective.
Wang, Lipo; Wang, Yaoli; Chang, Qing
2016-12-01
This paper surveys main principles of feature selection and their recent applications in big data bioinformatics. Instead of the commonly used categorization into filter, wrapper, and embedded approaches to feature selection, we formulate feature selection as a combinatorial optimization or search problem and categorize feature selection methods into exhaustive search, heuristic search, and hybrid methods, where heuristic search methods may further be categorized into those with or without data-distilled feature ranking measures. Copyright © 2016 Elsevier Inc. All rights reserved.
Chao, Pei-Kuang; Wang, Chun-Li; Chan, Hsiao-Lung
2012-03-01
Predicting response after cardiac resynchronization therapy (CRT) has been a challenge of cardiologists. About 30% of selected patients based on the standard selection criteria for CRT do not show response after receiving the treatment. This study is aimed to build an intelligent classifier to assist in identifying potential CRT responders by speckle-tracking radial strain based on echocardiograms. The echocardiograms analyzed were acquired before CRT from 26 patients who have received CRT. Sequential forward selection was performed on the parameters obtained by peak-strain timing and phase space reconstruction on speckle-tracking radial strain to find an optimal set of features for creating intelligent classifiers. Support vector machine (SVM) with a linear, quadratic, and polynominal kernel were tested to build classifiers to identify potential responders and non-responders for CRT by selected features. Based on random sub-sampling validation, the best classification performance is correct rate about 95% with 96-97% sensitivity and 93-94% specificity achieved by applying SVM with a quadratic kernel on a set of 3 parameters. The selected 3 parameters contain both indexes extracted by peak-strain timing and phase space reconstruction. An intelligent classifier with an averaged correct rate, sensitivity and specificity above 90% for assisting in identifying CRT responders is built by speckle-tracking radial strain. The classifier can be applied to provide objective suggestion for patient selection of CRT. Copyright © 2011 Elsevier B.V. All rights reserved.
Park, Eunjeong; Chang, Hyuk-Jae; Nam, Hyo Suk
2017-04-18
The pronator drift test (PDT), a neurological examination, is widely used in clinics to measure motor weakness of stroke patients. The aim of this study was to develop a PDT tool with machine learning classifiers to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. We extracted features of drift and pronation from accelerometer signals of wearable devices on the inner wrists of 16 stroke patients and 10 healthy controls. Signal processing and feature selection approach were applied to discriminate PDT features used to classify stroke patients. A series of machine learning techniques, namely support vector machine (SVM), radial basis function network (RBFN), and random forest (RF), were implemented to discriminate stroke patients from controls with leave-one-out cross-validation. Signal processing by the PDT tool extracted a total of 12 PDT features from sensors. Feature selection abstracted the major attributes from the 12 PDT features to elucidate the dominant characteristics of proximal weakness of stroke patients using machine learning classification. Our proposed PDT classifiers had an area under the receiver operating characteristic curve (AUC) of .806 (SVM), .769 (RBFN), and .900 (RF) without feature selection, and feature selection improves the AUCs to .913 (SVM), .956 (RBFN), and .975 (RF), representing an average performance enhancement of 15.3%. Sensors and machine learning methods can reliably detect stroke signs and quantify proximal arm weakness. Our proposed solution will facilitate pervasive monitoring of stroke patients. ©Eunjeong Park, Hyuk-Jae Chang, Hyo Suk Nam. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.04.2017.
An Android malware detection system based on machine learning
NASA Astrophysics Data System (ADS)
Wen, Long; Yu, Haiyang
2017-08-01
The Android smartphone, with its open source character and excellent performance, has attracted many users. However, the convenience of the Android platform also has motivated the development of malware. The traditional method which detects the malware based on the signature is unable to detect unknown applications. The article proposes a machine learning-based lightweight system that is capable of identifying malware on Android devices. In this system we extract features based on the static analysis and the dynamitic analysis, then a new feature selection approach based on principle component analysis (PCA) and relief are presented in the article to decrease the dimensions of the features. After that, a model will be constructed with support vector machine (SVM) for classification. Experimental results show that our system provides an effective method in Android malware detection.
Efficacy Evaluation of Different Wavelet Feature Extraction Methods on Brain MRI Tumor Detection
NASA Astrophysics Data System (ADS)
Nabizadeh, Nooshin; John, Nigel; Kubat, Miroslav
2014-03-01
Automated Magnetic Resonance Imaging brain tumor detection and segmentation is a challenging task. Among different available methods, feature-based methods are very dominant. While many feature extraction techniques have been employed, it is still not quite clear which of feature extraction methods should be preferred. To help improve the situation, we present the results of a study in which we evaluate the efficiency of using different wavelet transform features extraction methods in brain MRI abnormality detection. Applying T1-weighted brain image, Discrete Wavelet Transform (DWT), Discrete Wavelet Packet Transform (DWPT), Dual Tree Complex Wavelet Transform (DTCWT), and Complex Morlet Wavelet Transform (CMWT) methods are applied to construct the feature pool. Three various classifiers as Support Vector Machine, K Nearest Neighborhood, and Sparse Representation-Based Classifier are applied and compared for classifying the selected features. The results show that DTCWT and CMWT features classified with SVM, result in the highest classification accuracy, proving of capability of wavelet transform features to be informative in this application.