Sample records for binary classification problem

  1. About decomposition approach for solving the classification problem

    NASA Astrophysics Data System (ADS)

    Andrianova, A. A.

    2016-11-01

    This article describes the features of applying an algorithm that uses decomposition methods to solve the binary classification problem of constructing a linear classifier based on the Support Vector Machine method. Decomposition reduces the volume of calculations, in particular because it opens the possibility of building parallel versions of the algorithm, which is a very important advantage when solving problems with big data. We analyze the results of computational experiments conducted using the decomposition approach on a well-known data set for the binary classification problem.
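
    The abstract above does not specify the decomposition scheme, so the following is only a toy sketch of the general idea: split the training set into chunks, train one sub-classifier per chunk (these steps are independent and could run in parallel), and combine the sub-models. A simple perceptron stands in for the paper's SVM, and averaging stands in for its combination step; all names and data are illustrative.

```python
# Toy sketch of decomposition for training a linear binary classifier:
# train independent sub-models on data chunks, then average them.
# Illustration only; not the SVM decomposition algorithm of the article.

def train_perceptron(points, labels, epochs=20):
    """Train a simple perceptron; labels are +1/-1, points are 2-D."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified
                w[0] += y * x1
                w[1] += y * x2
                b += y
    return w, b

def train_decomposed(points, labels, n_chunks=2):
    """Split the data, train one sub-model per chunk, average the models."""
    chunk = len(points) // n_chunks
    models = [
        train_perceptron(points[i * chunk:(i + 1) * chunk],
                         labels[i * chunk:(i + 1) * chunk])
        for i in range(n_chunks)
    ]
    w = [sum(m[0][j] for m in models) / n_chunks for j in range(2)]
    b = sum(m[1] for m in models) / n_chunks
    return w, b

def predict(model, point):
    w, b = model
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1

# Linearly separable toy data: class +1 upper-right, class -1 lower-left.
pts = [(2, 2), (3, 1), (2.5, 3), (4, 2), (-2, -2), (-3, -1), (-2.5, -3), (-4, -2)]
ys = [1, 1, 1, 1, -1, -1, -1, -1]
model = train_decomposed(pts, ys)
accuracy = sum(predict(model, p) == y for p, y in zip(pts, ys)) / len(pts)
```

    Each call to `train_perceptron` sees only its own chunk, which is what makes the scheme parallelizable.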

  2. a Gsa-Svm Hybrid System for Classification of Binary Problems

    NASA Astrophysics Data System (ADS)

    Sarafrazi, Soroor; Nezamabadi-pour, Hossein; Barahman, Mojgan

    2011-06-01

    This paper hybridizes the gravitational search algorithm (GSA) with the support vector machine (SVM) to form a novel GSA-SVM hybrid system that improves classification accuracy on binary problems. GSA is an optimization heuristic used here to tune the value of the SVM kernel parameter (in this paper, the radial basis function (RBF) is chosen as the kernel function). The experimental results show that this new approach can achieve high classification accuracy and is comparable to or better than particle swarm optimization (PSO)-SVM and genetic algorithm (GA)-SVM, two other hybrid systems for classification.

  3. A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and Kappa

    Treesearch

    Elizabeth A. Freeman; Gretchen G. Moisen

    2008-01-01

    Modelling techniques used in binary classification problems often result in a predicted probability surface, which is then translated into a presence-absence classification map. However, this translation requires a (possibly subjective) choice of threshold above which the variable of interest is predicted to be present. The selection of this threshold value can have...
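
    One of the threshold criteria this kind of study compares is maximizing Cohen's Kappa. A minimal sketch, with hand-made data and a hand-picked candidate grid, of sweeping thresholds over predicted probabilities and keeping the one with the best Kappa:

```python
# Sketch of threshold selection for a probability-surface classifier:
# binarize at each candidate threshold and keep the threshold that
# maximizes Cohen's kappa.  Data and candidates are illustrative.

def cohens_kappa(truth, pred):
    n = len(truth)
    po = sum(t == p for t, p in zip(truth, pred)) / n   # observed agreement
    p_yes = (sum(truth) / n) * (sum(pred) / n)          # chance "present"
    p_no = (1 - sum(truth) / n) * (1 - sum(pred) / n)   # chance "absent"
    pe = p_yes + p_no                                   # chance agreement
    return (po - pe) / (1 - pe) if pe < 1 else 0.0

def best_threshold(probs, truth, candidates):
    scored = [(cohens_kappa(truth, [int(p >= t) for p in probs]), t)
              for t in candidates]
    return max(scored)[1]

probs = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
truth = [0, 0, 0, 1, 1, 1, 1, 1]
t_star = best_threshold(probs, truth, [0.3, 0.5, 0.7])
```

    Criteria based on predicted prevalence would instead pick the threshold whose positive-prediction rate matches the observed prevalence; only the scoring function changes.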

  4. Multiclass classification of microarray data samples with a reduced number of genes

    PubMed Central

    2011-01-01

    Background Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522

  5. Optical Neural Classification Of Binary Patterns

    NASA Astrophysics Data System (ADS)

    Gustafson, Steven C.; Little, Gordon R.

    1988-05-01

    Binary pattern classification that may be implemented using optical hardware and neural network algorithms is considered. Pattern classification problems that have no concise description (as in classifying handwritten characters) or no concise computation (as in NP-complete problems) are expected to be particularly amenable to this approach. For example, optical processors that efficiently classify binary patterns in accordance with their Boolean function complexity might be designed. As a candidate for such a design, an optical neural network model is discussed that is designed for binary pattern classification and that consists of an optical resonator with a dynamic multiplex-recorded reflection hologram and a phase conjugate mirror with thresholding and gain. In this model, learning or training examples of binary patterns may be recorded on the hologram such that one bit in each pattern marks the pattern class. Any input pattern, including one with an unknown class or marker bit, will be modified by a large number of parallel interactions with the reflection hologram and nonlinear mirror. After perhaps several seconds and 100 billion interactions, a steady-state pattern may develop with a marker bit that represents a minimum-Boolean-complexity classification of the input pattern. Computer simulations are presented that illustrate progress in understanding the behavior of this model and in developing a processor design that could have commanding and enduring performance advantages compared to current pattern classification techniques.

  6. Constrained binary classification using ensemble learning: an application to cost-efficient targeted PrEP strategies.

    PubMed

    Zheng, Wenjing; Balzer, Laura; van der Laan, Mark; Petersen, Maya

    2018-01-30

    Binary classification problems are ubiquitous in health and social sciences. In many cases, one wishes to balance two competing optimality considerations for a binary classifier. For instance, in resource-limited settings, a human immunodeficiency virus prevention program based on offering pre-exposure prophylaxis (PrEP) to select high-risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) with the total number of PrEP regimens that is financially and logistically feasible for the program. In this article, we consider a general class of constrained binary classification problems wherein the objective function and the constraint are both monotonic with respect to a threshold. These include the minimization of the rate of positive predictions subject to a minimum sensitivity, the maximization of sensitivity subject to a maximum rate of positive predictions, and the Neyman-Pearson paradigm, which minimizes the type II error subject to an upper bound on the type I error. We propose an ensemble approach to these binary classification problems based on the Super Learner methodology. This approach linearly combines a user-supplied library of scoring algorithms, with combination weights and a discriminating threshold chosen to minimize the constrained optimality criterion. We then illustrate the application of the proposed classifier to develop an individualized PrEP targeting strategy in a resource-limited setting, with the goal of minimizing the number of PrEP offerings while achieving a minimum required sensitivity. This proof-of-concept data analysis uses baseline data from the ongoing Sustainable East Africa Research in Community Health study. Copyright © 2017 John Wiley & Sons, Ltd.
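
    The monotonicity the abstract describes has a simple consequence worth making concrete: since both sensitivity and the rate of positive predictions fall as the threshold rises, the smallest threshold whose positive-prediction rate satisfies the cap also maximizes sensitivity under that cap. A minimal sketch on toy scores (the paper additionally learns the score itself as a weighted ensemble via Super Learner):

```python
# Sketch of the monotone constrained-threshold idea: maximize sensitivity
# subject to a cap on the rate of positive predictions, by taking the
# smallest feasible threshold.  Scores and labels are illustrative.

def constrained_threshold(scores, labels, max_positive_rate):
    """Smallest threshold (among observed scores) whose rate of positive
    predictions does not exceed max_positive_rate; returns (t, sensitivity)."""
    n = len(scores)
    for t in sorted(set(scores)):
        positives = [s >= t for s in scores]
        if sum(positives) / n <= max_positive_rate:
            sens = sum(p and y for p, y in zip(positives, labels)) / sum(labels)
            return t, sens
    return None

scores = [0.05, 0.2, 0.3, 0.55, 0.6, 0.8, 0.9, 0.95]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
t, sensitivity = constrained_threshold(scores, labels, max_positive_rate=0.5)
```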

  7. Optimal aggregation of binary classifiers for multiclass cancer diagnosis using gene expression profiles.

    PubMed

    Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin

    2009-01-01

    Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
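
    The aggregation step the abstract refers to can be sketched for the one-versus-one coding: each pairwise binary classifier casts a weighted vote for one of its two classes. Here the classifiers are trivial midpoint rules and the weights are supplied by hand, whereas the paper tunes the weights optimally from the observed data; everything below is illustrative.

```python
# Sketch of weighted one-versus-one aggregation: each pairwise binary
# classifier votes for one of its two classes with a weight, and the
# class with the largest weighted vote wins.

def weighted_ovo_predict(x, pairwise_classifiers, n_classes):
    """pairwise_classifiers maps (i, j) -> (decide, weight), where
    decide(x) returns either i or j."""
    votes = [0.0] * n_classes
    for (i, j), (decide, weight) in pairwise_classifiers.items():
        votes[decide(x)] += weight
    return votes.index(max(votes))

# Toy 3-class problem on the real line: class 0 near 0, class 1 near 5,
# class 2 near 10, with midpoint rules as the binary classifiers.
classifiers = {
    (0, 1): (lambda x: 0 if x < 2.5 else 1, 1.0),
    (0, 2): (lambda x: 0 if x < 5.0 else 2, 1.0),
    (1, 2): (lambda x: 1 if x < 7.5 else 2, 1.0),
}
pred = weighted_ovo_predict(4.8, classifiers, n_classes=3)
```

    With all weights equal this reduces to plain majority voting; the paper's contribution is choosing the weights so that the coding itself adapts to the data.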

  8. Combining multiple decisions: applications to bioinformatics

    NASA Astrophysics Data System (ADS)

    Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.

    2008-01-01

    Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.

  9. Binary image classification

    NASA Technical Reports Server (NTRS)

    Morris, Carl N.

    1987-01-01

    Motivated by the LANDSAT problem of estimating the probability of crop or geological types based on multi-channel satellite imagery data, Morris and Kostal (1983), Hill, Hinkley, Kostal, and Morris (1984), and Morris, Hinkley, and Johnston (1985) developed an empirical Bayes approach to this problem. Here, researchers return to those developments, making certain improvements and extensions, but restricting attention to the binary case of only two attributes.

  10. A multiple maximum scatter difference discriminant criterion for facial feature extraction.

    PubMed

    Song, Fengxi; Zhang, David; Mei, Dayong; Guo, Zhongwei

    2007-12-01

    The maximum scatter difference (MSD) discriminant criterion is a recently presented binary discriminant criterion for pattern classification that utilizes the generalized scatter difference rather than the generalized Rayleigh quotient as a class separability measure, thereby avoiding the singularity problem when addressing small-sample-size problems. MSD classifiers based on this criterion have been quite effective on face-recognition tasks, but as they are binary classifiers, they are not as efficient on large-scale classification tasks. To address this problem, this paper generalizes the classification-oriented binary criterion to its multiple counterpart, the multiple MSD (MMSD) discriminant criterion, for facial feature extraction. The MMSD feature-extraction method, which is based on this novel discriminant criterion, is a new subspace-based feature-extraction method. Unlike most other subspace-based feature-extraction methods, the MMSD computes its discriminant vectors from both the range of the between-class scatter matrix and the null space of the within-class scatter matrix. The MMSD is theoretically elegant and easy to calculate. Extensive experimental studies conducted on the benchmark FERET database show that the MMSD outperforms state-of-the-art facial feature-extraction methods such as the null space method, direct linear discriminant analysis (LDA), eigenface, Fisherface, and complete LDA.
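
    The scatter-difference measure itself is easy to state: for a projection direction w, score J(w) = wᵀSb·w − c·wᵀSw·w (between-class minus weighted within-class scatter), which stays well-defined even when Sw is singular. The sketch below just evaluates J over a tiny hand-picked candidate set on 2-D toy data; the actual MMSD method computes its discriminant vectors via an eigen-decomposition instead.

```python
# Sketch of the scatter-difference criterion J(w) = w'Sb w - c * w'Sw w,
# evaluated over a few candidate directions on toy 2-D data.

import math

def scatter_matrices(class0, class1):
    def mean(pts):
        return [sum(p[k] for p in pts) / len(pts) for k in (0, 1)]
    m0, m1 = mean(class0), mean(class1)
    d = [m1[0] - m0[0], m1[1] - m0[1]]
    sb = [[d[0] * d[0], d[0] * d[1]], [d[1] * d[0], d[1] * d[1]]]
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for pts, m in ((class0, m0), (class1, m1)):
        for p in pts:
            e = [p[0] - m[0], p[1] - m[1]]
            for i in (0, 1):
                for j in (0, 1):
                    sw[i][j] += e[i] * e[j]
    return sb, sw

def msd_score(w, sb, sw, c=1.0):
    quad = lambda a: sum(w[i] * a[i][j] * w[j] for i in (0, 1) for j in (0, 1))
    return quad(sb) - c * quad(sw)

class0 = [(0, 0), (1, 0), (0, 1)]
class1 = [(6, 0), (7, 0), (6, 1)]
sb, sw = scatter_matrices(class0, class1)
# Candidate unit directions: along x, along y, and the diagonal.
candidates = [(1.0, 0.0), (0.0, 1.0), (math.sqrt(0.5), math.sqrt(0.5))]
best = max(candidates, key=lambda w: msd_score(w, sb, sw))
```

    The classes here are separated along the x axis, so the x direction maximizes the criterion; note no matrix inverse appears anywhere, which is the point of the scatter difference over the Rayleigh quotient.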

  11. Motor Oil Classification using Color Histograms and Pattern Recognition Techniques.

    PubMed

    Ahmadi, Shiva; Mani-Varnosfaderani, Ahmad; Habibi, Biuck

    2018-04-20

    Motor oil classification is important for quality control and the identification of oil adulteration. In this work, we propose a simple, rapid, inexpensive and nondestructive approach based on image analysis and pattern recognition techniques for the classification of nine different types of motor oils according to their corresponding color histograms. For this, we computed color histograms in different color spaces, such as red green blue (RGB), grayscale, and hue saturation intensity (HSI), in order to extract features that can help with the classification procedure. These color histograms and their combinations were used as input for model development and then were statistically evaluated using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM) techniques. Here, two common solutions for solving a multiclass classification problem were applied: (1) transformation into binary classification problems using a one-against-all (OAA) approach and (2) extension from binary classifiers to a single globally optimized multilabel classification model. In the OAA strategy, LDA, QDA, and SVM reached up to 97% in terms of accuracy, sensitivity, and specificity for both the training and test sets. In the extension from the binary case, despite the good performance of the SVM classification model, QDA and LDA provided better results, up to 92% for RGB-grayscale-HSI color histograms and up to 93% for the HSI color map, respectively. In order to reduce the number of independent variables for modeling, a principal component analysis algorithm was used. Our results suggest that the proposed method is promising for the identification and classification of different types of motor oils.
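
    The feature extraction step here is just histogram counting. A minimal sketch of the RGB case, with an illustrative 4-pixel "image" and bin count (the paper also uses grayscale and HSI histograms and their concatenations):

```python
# Sketch of color-histogram features: count how many pixels fall into each
# intensity bin, per channel, and concatenate the per-channel histograms.

def rgb_histogram(pixels, n_bins=4):
    """pixels: list of (r, g, b) with values in 0..255.
    Returns the concatenated per-channel histograms (length 3 * n_bins)."""
    width = 256 // n_bins
    hist = [0] * (3 * n_bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * n_bins + min(value // width, n_bins - 1)] += 1
    return hist

image = [(0, 128, 255), (10, 130, 250), (200, 60, 5), (255, 64, 64)]
features = rgb_histogram(image)
```

    The resulting fixed-length vector is what gets fed to LDA/QDA/SVM, regardless of the image's size.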

  12. A Ternary Hybrid EEG-NIRS Brain-Computer Interface for the Classification of Brain Activation Patterns during Mental Arithmetic, Motor Imagery, and Idle State.

    PubMed

    Shin, Jaeyoung; Kwon, Jinuk; Im, Chang-Hwan

    2018-01-01

    The performance of a brain-computer interface (BCI) can be enhanced by simultaneously using two or more modalities to record brain activity, which is generally referred to as a hybrid BCI. To date, many BCI researchers have tried to implement a hybrid BCI system by combining electroencephalography (EEG) and functional near-infrared spectroscopy (NIRS) to improve the overall accuracy of binary classification. However, since hybrid EEG-NIRS BCI, which will be denoted by hBCI in this paper, has not been applied to ternary classification problems, paradigms and classification strategies appropriate for ternary classification using hBCI are not well investigated. Here we propose the use of an hBCI for the classification of three brain activation patterns elicited by mental arithmetic, motor imagery, and idle state, with the aim to elevate the information transfer rate (ITR) of hBCI by increasing the number of classes while minimizing the loss of accuracy. EEG electrodes were placed over the prefrontal cortex and the central cortex, and NIRS optodes were placed only on the forehead. The ternary classification problem was decomposed into three binary classification problems using the "one-versus-one" (OVO) classification strategy to apply the filter-bank common spatial patterns filter to EEG data. A 10 × 10-fold cross validation was performed using shrinkage linear discriminant analysis (sLDA) to evaluate the average classification accuracies for EEG-BCI, NIRS-BCI, and hBCI when the meta-classification method was adopted to enhance classification accuracy. The ternary classification accuracies for EEG-BCI, NIRS-BCI, and hBCI were 76.1 ± 12.8, 64.1 ± 9.7, and 82.2 ± 10.2%, respectively. The classification accuracy of the proposed hBCI was thus significantly higher than those of the other BCIs (p < 0.005). The average ITR for the proposed hBCI was calculated to be 4.70 ± 1.92 bits/minute, which was 34.3% higher than that reported for a previous binary hBCI study.

  13. Stock Market Index Data and indicators for Day Trading as a Binary Classification problem.

    PubMed

    Bruni, Renato

    2017-02-01

    Classification is the attribution of labels to records according to a criterion automatically learned from a training set of labeled records. This task is needed in a huge number of practical applications, and consequently it has been studied intensively and several classification algorithms are available today. In finance, a stock market index is a measurement of value of a section of the stock market. It is often used to describe the aggregate trend of a market. One basic financial issue would be forecasting this trend. Clearly, such a stochastic value is very difficult to predict. However, technical analysis is a security analysis methodology developed to forecast the direction of prices through the study of past market data. Day trading consists of buying and selling financial instruments within the same trading day. In this case, one interesting problem is automatically identifying favorable days for trading. We model this problem as a binary classification problem, and we provide datasets containing daily index values, the corresponding values of a selection of technical indicators, and the class label, which is 1 if the subsequent time period is favorable for day trading and 0 otherwise. These datasets can be used to test the behavior of different approaches in solving the day trading problem.
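
    A label of this kind can be derived mechanically from the index series. The sketch below uses one plausible rule (label a day 1 when the next close exceeds the current close by more than a small threshold); the rule, threshold, and series are illustrative, not the paper's exact definition of "favorable", and the published datasets also attach technical indicators as features.

```python
# Sketch of building binary day-trading labels from a close-price series:
# day t gets label 1 when close[t+1] / close[t] - 1 exceeds min_gain.

def day_trading_labels(closes, min_gain=0.01):
    """One label per day except the last (which has no next close)."""
    return [int(nxt / cur - 1 > min_gain)
            for cur, nxt in zip(closes, closes[1:])]

closes = [100.0, 102.0, 101.5, 101.6, 105.0]
labels = day_trading_labels(closes)
```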

  14. Fast Solution in Sparse LDA for Binary Classification

    NASA Technical Reports Server (NTRS)

    Moghaddam, Baback

    2010-01-01

    An algorithm that performs sparse linear discriminant analysis (Sparse-LDA) finds near-optimal solutions in far less time than the prior art when specialized to binary classification (of 2 classes). Sparse-LDA is a type of feature- or variable-selection problem with numerous applications in statistics, machine learning, computer vision, computational finance, operations research, and bioinformatics. Because of their combinatorial nature, feature- or variable-selection problems are NP-hard or computationally intractable in cases involving more than 30 variables or features. Therefore, one typically seeks approximate solutions by means of greedy search algorithms. The prior Sparse-LDA algorithm was a greedy algorithm that considered the best variable or feature to add to, or delete from, its subsets in order to maximally discriminate between multiple classes of data. The present algorithm is designed for the special but prevalent case of 2-class or binary classification (e.g., 1 vs. 0, functioning vs. malfunctioning, or change vs. no change). It provides near-optimal solutions on large real-world datasets having hundreds or even thousands of variables or features (e.g., selecting the fewest wavelength bands in a hyperspectral sensor to do terrain classification) and does so in typical computation times of minutes, as compared to the days or weeks taken by the prior art. Sparse-LDA requires solving generalized eigenvalue problems for a large number of variable subsets (represented by the submatrices of the input within-class and between-class covariance matrices). In the general (full-rank) case, the amount of computation scales at least cubically with the number of variables, and thus the size of the problems that can be solved is limited accordingly. However, in binary classification, the principal eigenvalues can be found using a special analytic formula, without resorting to costly iterative techniques. The present algorithm exploits this analytic form along with the inherent sequential nature of greedy search itself. Together, these enable the use of highly efficient partitioned-matrix-inverse techniques that yield large speedups in both the forward-selection and backward-elimination stages of greedy algorithms in general.
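
    A greatly simplified sketch of the greedy forward-selection loop: here each feature is scored individually by a per-feature Fisher ratio (squared mean separation over pooled variance), a diagonal approximation that ignores feature covariances. The described algorithm instead scores whole subsets with the analytic two-class eigenvalue formula and partitioned matrix inverses; this toy version only illustrates the greedy structure.

```python
# Simplified greedy forward feature selection for binary discrimination,
# using a per-feature Fisher ratio as a stand-in for the full analytic
# two-class Sparse-LDA subset score.

def fisher_score(v0, v1):
    mean = lambda v: sum(v) / len(v)
    var = lambda v: sum((x - mean(v)) ** 2 for x in v) / len(v)
    return (mean(v1) - mean(v0)) ** 2 / (var(v0) + var(v1) + 1e-12)

def greedy_select(X0, X1, k):
    """X0, X1: per-class lists of samples (each a list of feature values).
    Greedily picks k feature indices by individual Fisher score."""
    remaining = list(range(len(X0[0])))
    chosen = []
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: fisher_score([x[j] for x in X0],
                                              [x[j] for x in X1]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Feature 1 separates the classes cleanly; features 0 and 2 are noise-like.
X0 = [[1.0, 0.0, 5.0], [1.2, 0.1, 4.8], [0.9, -0.1, 5.2]]
X1 = [[1.1, 3.0, 5.1], [0.8, 3.2, 4.9], [1.0, 2.9, 5.0]]
selected = greedy_select(X0, X1, k=1)
```

    The payoff of the analytic formula in the real algorithm is precisely that each candidate evaluation inside this loop avoids an iterative eigensolver.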

  15. Non-parametric analysis of LANDSAT maps using neural nets and parallel computers

    NASA Technical Reports Server (NTRS)

    Salu, Yehuda; Tilton, James

    1991-01-01

    Nearest neighbor approaches and a new neural network, the Binary Diamond, are used for the classification of images of ground pixels obtained by LANDSAT satellite. The performances are evaluated by comparing classifications of a scene in the vicinity of Washington DC. The problem of optimal selection of categories is addressed as a step in the classification process.

  16. Multiple kernel learning using single stage function approximation for binary classification problems

    NASA Astrophysics Data System (ADS)

    Shiju, S.; Sumitra, S.

    2017-12-01

    In this paper, multiple kernel learning (MKL) is formulated as a supervised classification problem. We deal with binary classification data, and hence the data modelling problem involves the computation of two decision boundaries, one related to kernel learning and the other to the input data. In our approach, both are found with the aid of a single cost function by constructing a global reproducing kernel Hilbert space (RKHS) as the direct sum of the RKHSs corresponding to the decision boundaries of kernel learning and input data, and by searching in the global RKHS for the function that can be represented as the direct sum of the decision boundaries under consideration. In our experimental analysis, the proposed model showed superior performance compared with the existing two-stage function approximation formulation of MKL, where the decision functions of kernel learning and input data are found separately using two different cost functions. This is because the single-stage representation allows knowledge transfer between the computation procedures that find the two decision boundaries, which in turn boosts the generalisation capacity of the model.

  17. A binary genetic programing model for teleconnection identification between global sea surface temperature and local maximum monthly rainfall events

    NASA Astrophysics Data System (ADS)

    Danandeh Mehr, Ali; Nourani, Vahid; Hrnjica, Bahrudin; Molajou, Amir

    2017-12-01

    The effectiveness of genetic programming (GP) for solving regression problems in hydrology has been recognized in recent studies. However, its capability to solve classification problems has not been sufficiently explored so far. This study develops and applies a novel classification-forecasting model, namely Binary GP (BGP), for teleconnection studies between sea surface temperature (SST) variations and maximum monthly rainfall (MMR) events. The BGP integrates certain types of data pre-processing and post-processing methods with conventional GP engine to enhance its ability to solve both regression and classification problems simultaneously. The model was trained and tested using SST series of Black Sea, Mediterranean Sea, and Red Sea as potential predictors as well as classified MMR events at two locations in Iran as predictand. Skill of the model was measured in regard to different rainfall thresholds and SST lags and compared to that of the hybrid decision tree-association rule (DTAR) model available in the literature. The results indicated that the proposed model can identify potential teleconnection signals of surrounding seas beneficial to long-term forecasting of the occurrence of the classified MMR events.

  18. Medical image classification using spatial adjacent histogram based on adaptive local binary patterns.

    PubMed

    Liu, Dong; Wang, Shengsheng; Huang, Dezhi; Deng, Gang; Zeng, Fantao; Chen, Huiling

    2016-05-01

    Medical image recognition is an important task in both computer vision and computational biology. In the field of medical image classification, representing an image with the local binary patterns (LBP) descriptor has become popular. However, most existing LBP-based methods encode the binary patterns in a fixed neighborhood radius and ignore the spatial relationships among local patterns. Ignoring these spatial relationships leads to poor performance when capturing discriminative features for complex samples, such as medical images obtained by microscope. To address this problem, in this paper we propose a novel method that improves local binary patterns by assigning an adaptive neighborhood radius to each pixel. Based on these adaptive local binary patterns, we further propose a spatial adjacent histogram strategy to encode the micro-structures for image representation. An extensive set of evaluations performed on four medical datasets shows that the proposed method significantly improves standard LBP and compares favorably with several other prevailing approaches. Copyright © 2016 Elsevier Ltd. All rights reserved.
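
    For reference, the standard fixed-radius LBP code that the paper builds on can be sketched in a few lines: compare each of a pixel's 8 neighbours with the centre and read the comparison bits as a byte. The paper's contribution, adapting the radius per pixel and adding a spatial-adjacency histogram, is not shown here; the image and bit ordering below are illustrative.

```python
# Minimal sketch of the standard 8-neighbour LBP code for one pixel:
# set a bit for each neighbour that is >= the centre value.

def lbp_code(image, r, c):
    """LBP code of pixel (r, c); neighbours read clockwise from top-left."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
code = lbp_code(image, 1, 1)
```

    A histogram of these codes over an image region is the classic LBP texture descriptor; the fixed offsets list is exactly the "fixed neighborhood radius" the paper replaces with an adaptive one.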

  19. Dynamic and scalable audio classification by collective network of binary classifiers framework: an evolutionary approach.

    PubMed

    Kiranyaz, Serkan; Mäkinen, Toni; Gabbouj, Moncef

    2012-10-01

    In this paper, we propose a novel framework based on a collective network of evolutionary binary classifiers (CNBC) to address the problems of feature and class scalability. The main goal of the proposed framework is to achieve a high classification performance over dynamic audio and video repositories. The proposed framework adopts a "Divide and Conquer" approach in which an individual network of binary classifiers (NBC) is allocated to discriminate each audio class. An evolutionary search is applied to find the best binary classifier in each NBC with respect to a given criterion. Through the incremental evolution sessions, the CNBC framework can dynamically adapt to each new incoming class or feature set without resorting to a full-scale re-training or re-configuration. Therefore, the CNBC framework is particularly designed for dynamically varying databases where no conventional static classifiers can adapt to such changes. In short, it is entirely a novel topology, an unprecedented approach for dynamic, content/data adaptive and scalable audio classification. A large set of audio features can be effectively used in the framework, where the CNBCs make appropriate selections and combinations so as to achieve the highest discrimination among individual audio classes. Experiments demonstrate a high classification accuracy (above 90%) and efficiency of the proposed framework over large and dynamic audio databases. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification

    NASA Astrophysics Data System (ADS)

    Anwer, Rao Muhammad; Khan, Fahad Shahbaz; van de Weijer, Joost; Molinier, Matthieu; Laaksonen, Jorma

    2018-04-01

    Designing powerful, discriminative texture features that are robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distributions of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The de facto practice when learning these CNN models is to use RGB patches as input, with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Local Binary Patterns (LBP) encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit LBP-based texture information, provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories and the recently introduced large-scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to the standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems. Furthermore, our final combination leads to consistent improvement over the state-of-the-art for remote sensing scene classification.

  21. Spiking Neural Classifier with Lumped Dendritic Nonlinearity and Binary Synapses: A Current Mode VLSI Implementation and Analysis.

    PubMed

    Bhaduri, Aritra; Banerjee, Amitava; Roy, Subhrajit; Kar, Sougata; Basu, Arindam

    2018-03-01

    We present a neuromorphic current mode implementation of a spiking neural classifier with lumped square law dendritic nonlinearity. It has been shown previously in software simulations that such a system with binary synapses can be trained with structural plasticity algorithms to achieve comparable classification accuracy with fewer synaptic resources than conventional algorithms. We show that even in real analog systems with manufacturing imperfections (CV of 23.5% and 14.4% for dendritic branch gains and leaks respectively), this network is able to produce comparable results with fewer synaptic resources. The chip fabricated in [Formula: see text]m complementary metal oxide semiconductor has eight dendrites per cell and uses two opposing cells per class to cancel common-mode inputs. The chip can operate down to a [Formula: see text] V and dissipates 19 nW of static power per neuronal cell and [Formula: see text] 125 pJ/spike. For two-class classification problems of high-dimensional rate encoded binary patterns, the hardware achieves comparable performance as a software implementation of the same, with only about a 0.5% reduction in accuracy. On two UCI data sets, the integrated circuit has classification accuracy comparable to standard machine learners like support vector machines and extreme learning machines while using two to five times fewer binary synapses. We also show that the system can operate on mean rate encoded spike patterns, as well as short bursts of spikes. To the best of our knowledge, this is the first attempt in hardware to perform classification exploiting dendritic properties and binary synapses.

  22. An ensemble learning system for a 4-way classification of Alzheimer's disease and mild cognitive impairment.

    PubMed

    Yao, Dongren; Calhoun, Vince D; Fu, Zening; Du, Yuhui; Sui, Jing

    2018-05-15

    Discriminating Alzheimer's disease (AD) from its prodromal form, mild cognitive impairment (MCI), is a significant clinical problem that may facilitate early diagnosis and intervention, in which a more challenging issue is to classify MCI subtypes, i.e., those who eventually convert to AD (cMCI) versus those who do not (MCI). To solve this difficult 4-way classification problem (AD, MCI, cMCI and healthy controls), a competition was hosted by Kaggle to invite the scientific community to apply their machine learning approaches on pre-processed sets of T1-weighted magnetic resonance images (MRI) data and the demographic information from the international Alzheimer's disease neuroimaging initiative (ADNI) database. This paper summarizes our competition results. We first proposed a hierarchical process by turning the 4-way classification into five binary classification problems. A new feature selection technology based on relative importance was also proposed, aiming to identify a more informative and concise subset from 426 sMRI morphometric and 3 demographic features, to ensure each binary classifier to achieve its highest accuracy. As a result, about 2% of the original features were selected to build a new feature space, which can achieve the final four-way classification with a 54.38% accuracy on testing data through hierarchical grouping, higher than several alternative methods in comparison. More importantly, the selected discriminative features such as hippocampal volume, parahippocampal surface area, and medial orbitofrontal thickness, etc. as well as the MMSE score, are reasonable and consistent with those reported in AD/MCI deficits. In summary, the proposed method provides a new framework for multi-way classification using hierarchical grouping and precise feature selection. Copyright © 2018 Elsevier B.V. All rights reserved.

  3. An information-based network approach for protein classification

    PubMed Central

    Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.

    2017-01-01

    Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and phylogenetic trees to classify proteins; these methods use binary trees to represent protein classification. In this paper, we propose a new protein classification method in which information theory and network theory are used to capture the multivariate relationships among proteins. The protein universe is modeled as an undirected network in which proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free, and it can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of the new method. PMID:28350835

  4. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem: most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus merely on the binary-class case. In this paper, we deal with the multiclass imbalanced classification problem, as encountered in cancer DNA microarray data, by using ensemble learning. We utilized a one-against-all coding strategy to transform the multiclass problem into multiple binary problems, applying to each the feature subspace method, an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Support vector machines were used as base classifiers, and a novel voting rule called counter voting was presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance.
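    The one-against-all decomposition with random undersampling described above can be sketched as follows; the nearest-centroid margin is a stand-in for the paper's SVM base classifier, and the plain argmax replaces its counter-voting rule:

```python
import numpy as np

def undersample(X, y, rng):
    """Randomly undersample the majority class of a binary problem to balance it."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    if len(pos) < len(neg):
        neg = rng.choice(neg, size=len(pos), replace=False)
    else:
        pos = rng.choice(pos, size=len(neg), replace=False)
    idx = np.concatenate([pos, neg])
    return X[idx], y[idx]

def ova_undersampled(X_tr, y_tr, X_te, n_classes, rng):
    """One-against-all: one balanced binary sub-problem per class; the final
    label is the class whose binary scorer is most confident (here a toy
    nearest-centroid margin)."""
    scores = np.zeros((len(X_te), n_classes))
    for c in range(n_classes):
        Xb, yb = undersample(X_tr, (y_tr == c).astype(int), rng)
        c1 = Xb[yb == 1].mean(axis=0)    # centroid of class c
        c0 = Xb[yb == 0].mean(axis=0)    # centroid of the rest
        # margin: closer to the "class c" centroid -> larger score
        scores[:, c] = np.linalg.norm(X_te - c0, axis=1) - np.linalg.norm(X_te - c1, axis=1)
    return scores.argmax(axis=1)
```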

  5. Reduction from cost-sensitive ordinal ranking to weighted binary classification.

    PubMed

    Lin, Hsuan-Tien; Li, Ling

    2012-05-01

    We present a reduction framework from ordinal ranking to binary classification. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranker from the binary classifier. Based on the framework, we show that a weighted 0/1 loss of the binary classifier upper-bounds the mislabeling cost of the ranker, both error-wise and regret-wise. Our framework allows not only the design of good ordinal ranking algorithms based on well-tuned binary classification approaches, but also the derivation of new generalization bounds for ordinal ranking from known bounds for binary classification. In addition, our framework unifies many existing ordinal ranking algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms. In addition, the newly designed algorithms lead to better cost-sensitive ordinal ranking performance, as well as improved listwise ranking performance.
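    The three-step reduction can be made concrete. In this sketch a single perceptron-trained linear classifier over extended examples (x, k) stands in for "any binary classification algorithm"; the construction of extended examples and the rank-by-counting ranker follow the framework as described:

```python
import numpy as np

def fit_ordinal_via_binary(X, y, K, max_epochs=1000):
    """Reduction: each (x, y) with rank y in {1..K} yields K-1 extended binary
    examples ((x, k), sign[y > k]); one linear binary classifier w.x + b_k is
    trained on all of them with perceptron updates."""
    n, d = X.shape
    w = np.zeros(d)
    b = np.zeros(K - 1)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            for k in range(K - 1):
                z = 1.0 if y[i] > k + 1 else -1.0     # binary label of extended example
                if z * (X[i] @ w + b[k]) <= 0:        # perceptron mistake -> update
                    w += z * X[i]
                    b[k] += z
                    mistakes += 1
        if mistakes == 0:                             # separable data: stop when clean
            break
    return w, b

def predict_rank(w, b, X):
    """Ranker from the binary classifier: rank = 1 + number of thresholds passed."""
    return 1 + ((X @ w)[:, None] + b > 0).sum(axis=1)
```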

  6. Bayesian Kernel Methods for Non-Gaussian Distributions: Binary and Multi-class Classification Problems

    DTIC Science & Technology

    2013-05-28

    those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms. When one class occurs...incremental support vector machine algorithm for online learning when fewer than 50 data points are available. (a) Papers published in peer-reviewed journals...learning environments, where data processing occurs one observation at a time and the classification algorithm improves over time with new

  7. Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval.

    PubMed

    Xu, Xing; Shen, Fumin; Yang, Yang; Shen, Heng Tao; Li, Xuelong

    2017-05-01

    Hashing based methods have attracted considerable attention for efficient cross-modal retrieval on large-scale multimedia data. The core problem of cross-modal hashing is how to learn compact binary codes that capture the underlying correlations between heterogeneous features from different modalities. A majority of recent approaches aim at learning hash functions to preserve the pairwise similarities defined by given class labels. However, these methods fail to explicitly explore the discriminative property of class labels during hash function learning. In addition, they usually discard the discrete constraints imposed on the to-be-learned binary codes, and compromise by solving a relaxed problem with quantization to obtain an approximate binary solution. Therefore, the binary codes generated by these methods are suboptimal and less discriminative to different classes. To overcome these drawbacks, we propose a novel cross-modal hashing method, termed discrete cross-modal hashing (DCH), which directly learns discriminative binary codes while retaining the discrete constraints. Specifically, DCH learns modality-specific hash functions for generating unified binary codes, and these binary codes are viewed as representative features for discriminative classification with class labels. An effective discrete optimization algorithm is developed for DCH to jointly learn the modality-specific hash function and the unified binary codes. Extensive experiments on three benchmark data sets highlight the superiority of DCH under various cross-modal scenarios and show its state-of-the-art performance.

  8. Multi-task feature selection in microarray data by binary integer programming.

    PubMed

    Lan, Liang; Vucetic, Slobodan

    2013-12-20

    A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.
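    The relevance-minus-redundancy objective behind such filters can be illustrated with a greedy stand-in (not the paper's relaxed binary integer program with its low-rank approximation; the correlation-based relevance and redundancy measures here are assumptions):

```python
import numpy as np

def select_features(X, y, k):
    """Greedy maximization of relevance(|corr(feature, y)|) minus
    redundancy (pairwise |corr| with already-selected features)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    f = np.abs(Xs.T @ ys) / len(y)               # relevance vector (|correlation|)
    Q = np.abs(np.corrcoef(Xs, rowvar=False))    # redundancy matrix
    selected = []
    for _ in range(k):
        # marginal gain: relevance minus accumulated redundancy penalty
        penalty = Q[:, selected].sum(axis=1) if selected else 0.0
        gain = f - penalty
        gain[selected] = -np.inf                 # never re-pick a feature
        selected.append(int(gain.argmax()))
    return selected
```

    With a relevant feature, a near-duplicate of it, and an independent weaker feature, the duplicate is skipped in favour of the complementary one.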

  9. Classification of close binary systems by Svechnikov

    NASA Astrophysics Data System (ADS)

    Dryomova, G. N.

    The paper presents a historical overview of classification schemes for eclipsing variable stars, highlighting the advantages of Svechnikov's scheme, widely appreciated for close binary systems due to the simplicity and brevity of its classification criteria.

  10. Binary Classification using Decision Tree based Genetic Programming and Its Application to Analysis of Bio-mass Data

    NASA Astrophysics Data System (ADS)

    To, Cuong; Pham, Tuan D.

    2010-01-01

    In machine learning, pattern recognition may be the most popular task. Identifying "similar" patterns is also very important in biology because, first, it is useful for predicting patterns associated with disease, for example in cancer tissue (normal or tumor); second, similarity or dissimilarity of kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process; and third, similar genes (proteins) share similar functions. In this paper, we present an algorithm that uses genetic programming to create decision trees for binary classification problems. The algorithm was applied to five real biological databases. Based on comparisons with well-known methods, the algorithm is outstanding in most cases.

  11. Tabu search and binary particle swarm optimization for feature selection using microarray data.

    PubMed

    Chuang, Li-Yeh; Yang, Cheng-Huei; Yang, Cheng-Hong

    2009-12-01

    Gene expression profiles have great potential as a medical diagnosis tool because they represent the state of a cell at the molecular level. In cancer-type classification research, available training datasets generally have a fairly small sample size compared to the number of genes involved. This fact poses an unprecedented challenge to some classification methodologies due to training data limitations. Therefore, a good selection method for genes relevant for sample classification is needed to improve the predictive accuracy, and to avoid incomprehensibility due to the large number of genes investigated. In this article, we propose to combine tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and a support vector machine with one-versus-rest serve as evaluators of the TS and BPSO. The proposed method is applied to 11 classification problems taken from the literature and compared with other feature selection methods. Experimental results show that our method simplifies features effectively and either obtains higher classification accuracy or uses fewer features than the alternatives.

  12. On Correlations, Distances and Error Rates.

    ERIC Educational Resources Information Center

    Dorans, Neil J.

    The nature of the criterion (dependent) variable may play a useful role in structuring a list of classification/prediction problems. Such criteria are continuous, binary (dichotomous), or multichotomous. In this paper, discussion is limited to continuous, normally distributed criterion scenarios. For both cases, it is assumed that the…

  13. Social interaction as a heuristic for combinatorial optimization problems

    NASA Astrophysics Data System (ADS)

    Fontanari, José F.

    2010-11-01

    We investigate the performance of a variant of Axelrod’s model for dissemination of culture—the Adaptive Culture Heuristic (ACH)—on solving an NP-Complete optimization problem, namely, the classification of binary input patterns of size F by a Boolean Binary Perceptron. In this heuristic, N agents, characterized by binary strings of length F which represent possible solutions to the optimization problem, are fixed at the sites of a square lattice and interact with their nearest neighbors only. The interactions are such that the agents’ strings (or cultures) become more similar to the low-cost strings of their neighbors resulting in the dissemination of these strings across the lattice. Eventually the dynamics freezes into a homogeneous absorbing configuration in which all agents exhibit identical solutions to the optimization problem. We find through extensive simulations that the probability of finding the optimal solution is a function of the reduced variable F/N^(1/4) so that the number of agents must increase with the fourth power of the problem size, N ∝ F^4, to guarantee a fixed probability of success. In this case, we find that the relaxation time to reach an absorbing configuration scales with F^6, which can be interpreted as the overall computational cost of the ACH to find an optimal set of weights for a Boolean binary perceptron, given a fixed probability of success.

  14. [Analysis of binary classification repeated measurement data with GEE and GLMMs using SPSS software].

    PubMed

    An, Shengli; Zhang, Yanhong; Chen, Zheng

    2012-12-01

    To analyze binary classification repeated measurement data with generalized estimating equations (GEE) and generalized linear mixed models (GLMMs) using SPSS 19.0. GEE and GLMM models were tested on a sample of binary repeated measurement data using SPSS 19.0. Compared with SAS, SPSS 19.0 allowed convenient analysis of categorical repeated measurement data using GEE and GLMMs.

  15. Elitist Binary Wolf Search Algorithm for Heuristic Feature Selection in High-Dimensional Bioinformatics Datasets.

    PubMed

    Li, Jinyan; Fong, Simon; Wong, Raymond K; Millham, Richard; Wong, Kelvin K L

    2017-06-28

    Due to the high-dimensional characteristics of such datasets, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach uses the natural strategy established by Charles Darwin; that is, 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method to avoid repeat searches for the worst position in order to enhance the effectiveness of the search, while the binary strategy simplifies the feature selection problem into a similar problem of function optimisation. Furthermore, the wrapper strategy couples these strengthened wolves with an extreme learning machine classifier to find a sub-dataset with a reasonable number of features that offers the maximum correctness of global classification models. The experimental results on six public high-dimensional bioinformatics datasets demonstrate that the proposed method can outperform some conventional feature selection methods by up to 29% in classification accuracy, and outperform previous WSAs by up to 99.81% in computational time.

  16. Applying local binary patterns in image clustering problems

    NASA Astrophysics Data System (ADS)

    Skorokhod, Nikolai N.; Elizarov, Alexey I.

    2017-11-01

    Because cloudiness plays a critical role in the Earth's radiative balance, the study of the distribution of different types of clouds and their movements is relevant. The main sources of such information are artificial satellites, which provide data in the form of images. The most common approach to processing and classifying cloud images is based on the description of texture features. We propose using a set of local binary patterns to describe image texture.
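    A basic 8-neighbour local binary pattern and its histogram feature, as typically used for texture description, can be computed in a few lines (this is a generic LBP sketch, not the authors' specific variant):

```python
import numpy as np

def lbp_8(img):
    """Basic 8-neighbour local binary pattern: each interior pixel is encoded
    by thresholding its 8 neighbours against the centre, giving a code in 0..255."""
    c = img[1:-1, 1:-1]
    # neighbours in fixed clockwise order, each contributing one bit
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (nb >= c).astype(int) << bit
    return code

def lbp_histogram(img):
    """Normalized 256-bin LBP histogram, usable as a texture feature vector
    for clustering, e.g. with k-means."""
    h = np.bincount(lbp_8(img).ravel(), minlength=256).astype(float)
    return h / h.sum()
```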

  17. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    PubMed

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms prove effective when used for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy performance of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used, which include: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR when combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This indicates that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Recognition Using Hybrid Classifiers.

    PubMed

    Osadchy, Margarita; Keren, Daniel; Raviv, Dolev

    2016-04-01

    A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve an identical or better classification rate than SVM, while requiring far smaller memory and lower computational complexity to train and apply.

  19. Weakly supervised classification in high energy physics

    DOE PAGES

    Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco; ...

    2017-05-01

    As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. This paper presents a new approach called weakly supervised classification, in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics, quark versus gluon tagging, we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervised classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.

  1. Ensemble of random forests One vs. Rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares.

    PubMed

    Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G

    2018-05-15

    Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, the ability of clinicians and machine learning systems to predict AD based on MRI biomarkers at an early stage is still a challenging problem that can have a great impact on improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on ADNI datasets that consist of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset, which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.
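    The ANOVA selection stage of such a pipeline can be sketched directly from the one-way F-statistic definition (a generic illustration, not the SiPBA-UGR implementation):

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F statistic per feature: between-class variance over
    within-class variance; large values flag class-discriminative features."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand = X.mean(axis=0)
    ss_between = sum(
        (y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2 for c in classes
    )
    ss_within = sum(
        ((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0) for c in classes
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_top_k(X, y, k):
    """Indices of the k features with the largest F scores."""
    return np.argsort(anova_f_scores(X, y))[::-1][:k]
```

    The retained columns would then feed the dimension-reduction and ensemble stages.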

  2. Multi-Scale Distributed Representation for Deep Learning and its Application to b-Jet Tagging

    NASA Astrophysics Data System (ADS)

    Lee, Jason Sang Hun; Park, Inkyu; Park, Sangnam

    2018-06-01

    Recently machine learning algorithms based on deep layered artificial neural networks (DNNs) have been applied to a wide variety of high energy physics problems such as jet tagging or event classification. We explore a simple but effective preprocessing step which transforms each real-valued observational quantity or input feature into a binary number with a fixed number of digits. Each binary digit represents the quantity or magnitude at a different scale. We have shown that this approach improves the performance of DNNs significantly for some specific tasks without any further complication in feature engineering. We apply this multi-scale distributed binary representation to deep learning on b-jet tagging using daughter particles' momenta and vertex information.
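    The described preprocessing, a fixed-width binary expansion of each scaled feature, might be sketched as follows (the min-max scaling convention and most-significant-bit-first order are assumptions):

```python
import numpy as np

def binarize_features(X, n_bits=8, lo=None, hi=None):
    """Map each real-valued feature to a fixed-width binary code: the value is
    min-max scaled to an integer in [0, 2**n_bits - 1], then split into bits,
    so each bit reflects the magnitude at a different scale."""
    lo = X.min(axis=0) if lo is None else lo
    hi = X.max(axis=0) if hi is None else hi
    q = np.round((X - lo) / (hi - lo + 1e-12) * (2 ** n_bits - 1)).astype(int)
    bits = (q[..., None] >> np.arange(n_bits - 1, -1, -1)) & 1  # MSB first
    return bits.reshape(len(X), -1)          # (n_samples, n_features * n_bits)
```

    The expanded binary columns replace the raw features as DNN inputs.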

  3. Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.

    PubMed

    Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel

    2017-06-01

    Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.

  4. Identifying predictive features in drug response using machine learning: opportunities and challenges.

    PubMed

    Vidyasagar, Mathukumalli

    2015-01-01

    This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (pattern analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.
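    Two of the regression methods mentioned, ridge and LASSO, are easy to sketch from their definitions; the coordinate-descent LASSO below illustrates how the l1 penalty drives irrelevant coefficients exactly to zero, which is what makes it a feature selector (a minimal sketch, not any specific library's implementation):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge: minimize ||y - Xw||^2 + lam*||w||^2, closed form
    w = (X^T X + lam*I)^(-1) X^T y. Shrinks but rarely zeroes coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def lasso_fit(X, y, lam=0.1, n_iter=100):
    """LASSO: minimize (1/2)||y - Xw||^2 + lam*||w||_1 by cyclic coordinate
    descent with soft-thresholding; irrelevant coefficients become exactly 0."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]     # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
    return w
```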

  5. On the decoding process in ternary error-correcting output codes.

    PubMed

    Escalera, Sergio; Pujol, Oriol; Radeva, Petia

    2010-01-01

    A common way to model multiclass classification problems is to design a set of binary classifiers and to combine them. Error-Correcting Output Codes (ECOC) represent a successful framework for this type of problem. Recent works in the ECOC framework showed significant performance improvements by means of new problem-dependent designs based on the ternary ECOC framework. The ternary framework contains a larger set of binary problems because of the use of a "do not care" symbol that allows a given classifier to ignore some classes. However, there are no proper studies that analyze the effect of the new symbol at the decoding step. In this paper, we present a taxonomy that embeds all binary and ternary ECOC decoding strategies into four groups. We show that the zero symbol introduces two kinds of biases that require redefinition of the decoding design. A new type of decoding measure is proposed, and two novel decoding strategies are defined. We evaluate the state-of-the-art coding and decoding strategies over a set of UCI Machine Learning Repository data sets and on a real traffic sign categorization problem. The experimental results show that, following the new decoding strategies, the performance of the ECOC design is significantly improved.
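    The zero-symbol issue is visible at the decoding step: a plain Hamming distance penalizes classes whose codewords use few positions. A minimal ternary decoder that normalizes by the number of used positions (one simple illustration of the bias correction, not the paper's exact strategies) could be:

```python
import numpy as np

def ecoc_decode(preds, code_matrix):
    """Ternary ECOC decoding: compare the vector of binary predictions (+1/-1)
    against each class codeword, ignoring 'do not care' (0) positions and
    normalizing by the number of positions each codeword actually uses."""
    used = code_matrix != 0                               # positions the class cares about
    disagree = (preds[None, :] != code_matrix) & used     # mismatches at used positions
    dist = disagree.sum(axis=1) / used.sum(axis=1)        # per-class normalized distance
    return int(dist.argmin())
```

    Without the normalization, a class with many zeros would accumulate spurious "disagreements" at positions its classifiers never trained on.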

  6. Multicategory Composite Least Squares Classifiers

    PubMed Central

    Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul

    2010-01-01

    Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128

  7. Single image super-resolution based on approximated Heaviside functions and iterative refinement

    PubMed Central

    Wang, Xin-Yu; Huang, Ting-Zhu; Deng, Liang-Jian

    2018-01-01

    One method of solving the single-image super-resolution problem is to use Heaviside functions. This has been done previously by making a binary classification of image components as “smooth” and “non-smooth”, describing these with approximated Heaviside functions (AHFs), and iteration including l1 regularization. We now introduce a new method in which the binary classification of image components is extended to different degrees of smoothness and non-smoothness, these components being represented by various classes of AHFs. Taking into account the sparsity of the non-smooth components, their coefficients are l1 regularized. In addition, to pick up more image details, the new method uses an iterative refinement for the residuals between the original low-resolution input and the downsampled resulting image. Experimental results showed that the new method is superior to the original AHF method and to four other published methods. PMID:29329298

  8. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

    PubMed

    Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

    2017-09-01

    Imbalanced classification refers to problems with an uneven distribution of instances among classes. The problem becomes even harder when instances are located in the overlapped areas. Current solutions for both issues often focus on the binary case, as multi-class datasets require additional effort to be addressed. In this research, we overcome these problems by combining feature and instance selection. Feature selection simplifies the overlapping areas, easing the generation of rules to distinguish among the classes. Selection of instances from all classes addresses the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. To obtain an optimal joint set of features and instances, we embedded the search for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as the baseline classifier in this wrapper approach. The multi-objective scheme offers a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted against several state-of-the-art solutions on imbalanced classification, showing excellent results in both binary and multi-class problems.

  9. GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

    PubMed

    Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann

    2012-09-24

    Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is no unique modeling approach that can be successfully applied to the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
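The abstract notes that the final models reduce to a weighted sum of the outputs of single-feature classifiers. As a generic illustration of that scheme only (a from-scratch AdaBoost over one-feature decision stumps on an invented toy data set, not the GA(M)E-QSAR code):

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Single-feature classifier: returns +1/-1 from one threshold test."""
    return polarity if x[feature] >= threshold else -polarity

def train_adaboost(X, y, n_rounds=10):
    """AdaBoost over one-feature stumps; returns a list of (alpha, f, t, pol)."""
    n = len(X)
    w = [1.0 / n] * n                      # uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        best = None                        # (weighted error, feature, threshold, polarity)
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for pol in (+1, -1):
                    err = sum(wi for wi, x, yi in zip(w, X, y)
                              if stump_predict(x, f, t, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol)
        err, f, t, pol = best
        err = max(err, 1e-10)              # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, pol))
        # re-weight: boost the misclassified samples
        w = [wi * math.exp(-alpha * yi * stump_predict(x, f, t, pol))
             for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted sum of single-feature classifier outputs, then a sign."""
    score = sum(a * stump_predict(x, f, t, pol) for a, f, t, pol in ensemble)
    return 1 if score >= 0 else -1

# toy "actives" (+1) vs "inactives" (-1), separable on feature 0
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
y = [-1, -1, +1, +1]
model = train_adaboost(X, y, n_rounds=3)
print([predict(model, x) for x in X])  # [-1, -1, 1, 1]
```

The raw weighted score (before taking the sign) is exactly the kind of continuous output the abstract proposes as a ranking criterion for prioritizing compounds.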

  10. Minimal perceptrons for memorizing complex patterns

    NASA Astrophysics Data System (ADS)

    Pastor, Marissa; Song, Juyong; Hoang, Danh-Tai; Jo, Junghyo

    2016-11-01

    Feedforward neural networks have been investigated to understand learning and memory, as well as applied to numerous practical problems in pattern classification. It is a rule of thumb that more complex tasks require larger networks. However, the design of optimal network architectures for specific tasks is still an unsolved fundamental problem. In this study, we consider three-layered neural networks for memorizing binary patterns. We developed a new complexity measure of binary patterns, and estimated the minimal network size for memorizing them as a function of their complexity. We formulated the minimal network size for regular, random, and complex patterns. In particular, the minimal size for complex patterns, which are neither ordered nor disordered, was predicted by measuring their Hamming distances from known ordered patterns. Our predictions agree with simulations based on the back-propagation algorithm.

  11. The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics

    PubMed Central

    Wei, Qiong; Dunbrack, Roland L.

    2013-01-01

    Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand. PMID:23874456
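The metrics named above are straightforward to compute from the binary confusion matrix; a minimal sketch on an invented imbalanced toy example:

```python
import math

def confusion(y_true, y_pred):
    """Counts for a binary problem with labels 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def balanced_accuracy(y_true, y_pred):
    """Average of True Positive Rate and True Negative Rate."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def mcc(y_true, y_pred):
    """Matthews correlation coefficient (0 when a marginal count is empty)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# a classifier that always predicts the majority class reaches 80% plain
# accuracy here, yet the class-balance-aware metrics expose it
y_true = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5
print(mcc(y_true, y_pred))                # 0.0
```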

  12. Comparison of support vector machine classification to partial least squares dimension reduction with logistic discrimination of hyperspectral data

    NASA Astrophysics Data System (ADS)

    Wilson, Machelle; Ustin, Susan L.; Rocke, David

    2003-03-01

    Remote sensing technologies with high spatial and spectral resolution show a great deal of promise in addressing critical environmental monitoring issues, but the ability to analyze and interpret the data lags behind the technology. Robust analytical methods are required before the wealth of data available through remote sensing can be applied to the wide range of environmental problems for which remote detection is the best method. In this study we compare the classification effectiveness of two relatively new techniques on data consisting of leaf-level reflectance from plants that have been exposed to varying levels of heavy metal toxicity. If these methodologies work well on leaf-level data, then there is some hope that they will also work well on data from airborne and space-borne platforms. The classification methods compared were support vector machine classification of exposed and non-exposed plants based on the reflectance data, and partial least squares compression of the reflectance data followed by classification using logistic discrimination (PLS/LD). PLS/LD was performed in two ways: using the continuous concentration data as the response during compression, followed by the binary response required during logistic discrimination, and using a binary response during compression followed by logistic discrimination. The statistic we used to compare the effectiveness of the methodologies was the leave-one-out cross-validation estimate of the prediction error.
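As a hedged illustration of the PLS/LD idea, the sketch below uses a single PLS component and a hand-rolled one-dimensional logistic fit on synthetic data standing in for the study's leaf-level reflectance spectra (real PLS uses several components via deflation):

```python
import numpy as np

def pls_component(X, y):
    """First PLS weight vector: the direction of maximal covariance with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    return w                               # compressed scores are Xc @ w

def fit_logistic_1d(t, y, lr=0.5, n_iter=2000):
    """Logistic discrimination on a single compressed score t."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * t + b)))
        a -= lr * np.mean((p - y) * t)     # gradient of the log-loss
        b -= lr * np.mean(p - y)
    return a, b

rng = np.random.default_rng(0)
# toy "reflectance spectra": 40 samples x 20 bands; exposure shifts band means
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 20)) + y[:, None] * 1.5

w = pls_component(X, y)
t = (X - X.mean(axis=0)) @ w
a, b = fit_logistic_1d(t, y)
pred = (1.0 / (1.0 + np.exp(-(a * t + b))) > 0.5).astype(float)
print((pred == y).mean())  # near 1.0 on this easy toy problem
```

In the study's second variant, the binary labels themselves would replace the continuous response inside `pls_component`.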

  13. Automatic intelligibility classification of sentence-level pathological speech

    PubMed Central

    Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas; Li, Ming; Narayanan, Shrikanth S.

    2014-01-01

    Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could assist experts in diagnosis and treatment design, the many sources and types of variability often make it a very challenging computational processing problem. In this work we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality and pronunciation aspects of pathological speech. In addition, we propose a post-classification posterior smoothing scheme which refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusion and subsystem decision fusion to arrive at a final intelligibility decision. Performance is tested on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by evaluating classification accuracy without overlapping subjects’ data between training and test partitions. Results show that the feature sets of the voice quality, prosodic, and pronunciation subsystems each offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance (73.5% unweighted and 72.8% weighted average recall over the binary classes). PMID:25414544

  14. Treelets Binary Feature Retrieval for Fast Keypoint Recognition.

    PubMed

    Zhu, Jianke; Wu, Chenxia; Chen, Chun; Cai, Deng

    2015-10-01

    Fast keypoint recognition is essential to many vision tasks. In contrast to classification-based approaches, we directly formulate keypoint recognition as an image patch retrieval problem, which has the merit of finding the matched keypoint and its pose simultaneously. To effectively extract binary features from each patch surrounding a keypoint, we make use of the treelets transform, which can group highly correlated data together and reduce noise through local analysis. Treelets is a multiresolution analysis tool that provides an orthogonal basis reflecting the geometry of the noise-free data. To facilitate real-world applications, we propose two novel approaches. One is convolutional treelets, which capture image patch information locally and globally while reducing the computational cost. The other is higher-order treelets, which reflect the relationship between the rows and columns within an image patch. An efficient sub-signature-based locality sensitive hashing scheme is employed for fast approximate nearest neighbor search during patch retrieval. Experimental evaluations on both synthetic data and the real-world Oxford dataset show that the proposed treelets binary feature retrieval methods outperform state-of-the-art feature descriptors and classification-based approaches.

  15. Many local pattern texture features: which is better for image-based multilabel human protein subcellular localization classification?

    PubMed

    Yang, Fan; Xu, Ying-Ying; Shen, Hong-Bin

    2014-01-01

    Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Given the significant progress in digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including the completed local binary pattern, the local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern and local tetra pattern are more discriminative for describing IHC images than the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of the final binary relevance classification model trained on the combined feature space demonstrates that the different features are complementary to each other and thus capable of improving the accuracy of classification.

  16. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

    PubMed Central

    Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Noble, William Stafford; Leslie, Christina

    2007-01-01

    Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. 
In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
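The incomparability of raw one-vs-the-rest scores can be sketched with toy numbers; the per-class weights below are hand-picked stand-ins for the learned code weights described above, not the SVM-Fold algorithm itself:

```python
import numpy as np

def one_vs_rest_predict(scores, weights=None):
    """Pick the class whose binary classifier gives the highest score,
    optionally after per-class re-weighting to put scores on one scale."""
    if weights is not None:
        scores = scores * weights
    return scores.argmax(axis=1)

# Two one-vs-the-rest classifiers whose raw outputs are not comparable:
# class 0's classifier happens to emit scores on a ten-times larger scale.
scores = np.array([[8.0, -0.5],    # sample truly of class 0
                   [3.0,  0.9]])   # sample truly of class 1
print(one_vs_rest_predict(scores).tolist())                        # [0, 0]
print(one_vs_rest_predict(scores, np.array([0.1, 1.0])).tolist())  # [0, 1]
```

Plain argmax mislabels the second sample because of the scale mismatch; a per-class weight restores a comparable footing, which is the effect learned code weighting aims for.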

  17. A Fast Optimization Method for General Binary Code Learning.

    PubMed

    Shen, Fumin; Zhou, Xiang; Yang, Yang; Song, Jingkuan; Shen, Heng; Tao, Dacheng

    2016-09-22

    Hashing or binary code learning has been recognized as a way to accomplish efficient nearest neighbor search, and has thus attracted broad interest in recent retrieval, vision and learning studies. One main challenge of learning to hash arises from the involvement of discrete variables in binary code optimization. While the widely used continuous relaxation may achieve high learning efficiency, the pursued codes are typically less effective due to accumulated quantization error. In this work, we propose a novel binary code optimization method, dubbed Discrete Proximal Linearized Minimization (DPLM), which directly handles the discrete constraints during the learning process. Specifically, the discrete (thus nonsmooth and nonconvex) problem is reformulated as minimizing the sum of a smooth loss term and a nonsmooth indicator function. The resulting problem is then efficiently solved by an iterative procedure in which each iteration admits an analytical discrete solution, and which is shown to converge very fast. In addition, the proposed method supports a large family of empirical loss functions, instantiated in this work by both supervised and unsupervised hashing losses, together with bit uncorrelation and balance constraints. In particular, the proposed DPLM with a supervised ℓ2 loss encodes the whole NUS-WIDE database into 64-bit binary codes within 10 seconds on a standard desktop computer. The proposed approach is extensively evaluated on several large-scale datasets, and the generated binary codes are shown to achieve very promising results on both retrieval and classification tasks.
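For background, the relax-then-quantize baseline that DPLM improves on can be sketched as sign-quantized linear projections followed by Hamming-distance retrieval (an assumed generic setup with random projections, not DPLM itself):

```python
import numpy as np

def binary_codes(X, W):
    """Hash real vectors to {0,1} codes via the sign of linear projections;
    each thresholding is where quantization error enters."""
    return (X @ W > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int((a != b).sum())

rng = np.random.default_rng(2)
W = rng.normal(size=(8, 16))               # 16-bit codes from 8-dim features
X = rng.normal(size=(100, 8))              # database items
codes = binary_codes(X, W)

query = X[0] + 0.01 * rng.normal(size=8)   # near-duplicate of item 0
qcode = binary_codes(query[None, :], W)[0]
ranked = sorted(range(100), key=lambda i: hamming(qcode, codes[i]))
print(ranked[:3])                          # item 0 should appear at/near the top
```

Learning-to-hash methods such as DPLM replace the random `W` with an optimized one, and handle the discrete constraint inside the optimization rather than by this post-hoc sign step.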

  18. Classification of multispectral image data by the Binary Diamond neural network and by nonparametric, pixel-by-pixel methods

    NASA Technical Reports Server (NTRS)

    Salu, Yehuda; Tilton, James

    1993-01-01

    The classification of multispectral image data obtained from satellites has become an important tool for generating ground cover maps. This study deals with the application of nonparametric pixel-by-pixel classification methods in the classification of pixels, based on their multispectral data. A new neural network, the Binary Diamond, is introduced, and its performance is compared with a nearest neighbor algorithm and a back-propagation network. The Binary Diamond is a multilayer, feed-forward neural network, which learns from examples in unsupervised, 'one-shot' mode. It recruits its neurons according to the actual training set, as it learns. The comparisons of the algorithms were done using a realistic database consisting of approximately 90,000 Landsat 4 Thematic Mapper pixels. The Binary Diamond and nearest neighbor performances were close, with some advantages to the Binary Diamond. The performance of the back-propagation network lagged behind. An efficient nearest neighbor algorithm, the binned nearest neighbor, is described. Ways of improving performance, such as merging categories and analyzing nonboundary pixels, are addressed and evaluated.

  19. Model selection for anomaly detection

    NASA Astrophysics Data System (ADS)

    Burnaev, E.; Erofeev, P.; Smolyakov, D.

    2015-12-01

    Anomaly detection based on one-class classification algorithms is broadly used in many applied domains like image processing (e.g. detecting whether a patient is "cancerous" or "healthy" from a mammography image), network intrusion detection, etc. The performance of an anomaly detection algorithm crucially depends on the kernel used to measure similarity in the feature space. The standard approaches to kernel selection used in two-class classification problems (e.g. cross-validation) cannot be applied directly due to the specific nature of the data (the absence of data for the second, abnormal class). In this paper we generalize several kernel selection methods from the two-class case to the case of one-class classification and perform an extensive comparison of these approaches using both synthetic and real-world data.
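A minimal sketch of the one-class setting itself (an assumed RBF similarity score with a rejection quantile fitted on the normal class only; this illustrates why only normal data is available at training time, not the paper's kernel selection methods):

```python
import numpy as np

def rbf_novelty_score(X_train, x, gamma):
    """Average RBF similarity of x to the training (normal) class."""
    d2 = ((X_train - x) ** 2).sum(axis=1)
    return np.exp(-gamma * d2).mean()

def fit_threshold(X_train, gamma, quantile=0.05):
    """Threshold chosen so ~5% of the normal training points would be flagged."""
    scores = np.array([rbf_novelty_score(X_train, x, gamma) for x in X_train])
    return np.quantile(scores, quantile)

rng = np.random.default_rng(1)
X_normal = rng.normal(size=(200, 2))      # only "healthy" data is available
gamma = 0.5                               # the kernel parameter under selection
thr = fit_threshold(X_normal, gamma)

inlier = np.array([0.1, -0.2])
outlier = np.array([6.0, 6.0])
print(rbf_novelty_score(X_normal, inlier, gamma) >= thr)   # True
print(rbf_novelty_score(X_normal, outlier, gamma) >= thr)  # False
```

The kernel selection problem the paper addresses is precisely how to pick `gamma` when no abnormal examples exist to cross-validate against.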

  20. Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding.

    PubMed

    Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston

    2016-10-28

    Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
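One way to picture the hybrid Y-block is on an invented two-factor design (a binary strain factor crossed with a continuous dose factor; the names and values here are illustrative, not from the study's datasets):

```python
import numpy as np

# six samples from an assumed 2-strain x 3-dose experimental design
strain = ["wt", "wt", "wt", "mut", "mut", "mut"]
dose   = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]

# classic PLS-DA coding: binary class membership only (dose is ignored)
Y_da = np.array([[1.0 if s == "wt" else 0.0] for s in strain])

# hybrid coding: one column per experimental factor, so a single PLS2 model
# sees both the categorical factor and the continuous one
Y_hybrid = np.column_stack([Y_da[:, 0], dose])
print(Y_hybrid.shape)  # (6, 2)
```

Treating the same data as pure PLS-DA (first column only) or pure PLS-R (second column only) discards one factor, which is the ambiguity the hybrid target matrix is meant to avoid.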

  1. A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification

    PubMed Central

    Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong

    2016-01-01

    Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs). PMID:26985826

  3. Underwater object classification using scattering transform of sonar signals

    NASA Astrophysics Data System (ADS)

    Saito, Naoki; Weber, David S.

    2017-08-01

    In this paper, we apply the scattering transform (ST), a nonlinear map based on a convolutional neural network (CNN), to the classification of underwater objects using sonar signals. The ST formalizes the observation that the filters learned by a CNN have wavelet-like structure. We achieve effective binary classification both on a real dataset of unexploded ordnance (UXO) and on synthetically generated examples. We also explore the effects on the waveforms of changes in the object domain (e.g., translation, rotation, and acoustic impedance) and examine the consequences of theoretical results for the scattering transform. We show that the scattering transform is capable of excellent classification on both the synthetic and real problems, thanks to quasi-invariance properties well suited to translation and rotation of the object.

  4. Binary Image Classification: A Genetic Programming Approach to the Problem of Limited Training Instances.

    PubMed

    Al-Sahaf, Harith; Zhang, Mengjie; Johnston, Mark

    2016-01-01

    In the computer vision and pattern recognition fields, image classification represents an important yet difficult task. It is a challenge to build effective computer models to replicate the remarkable ability of the human visual system, which relies on only one or a few instances to learn a completely new class or an object of a class. Recently we proposed two genetic programming (GP) methods, one-shot GP and compound-GP, that aim to evolve a program for the task of binary classification in images. The two methods are designed to use only one or a few instances per class to evolve the model. In this study, we investigate these two methods in terms of performance, robustness, and complexity of the evolved programs. We use ten data sets that vary in difficulty to evaluate these two methods. We also compare them with two other GP and six non-GP methods. The results show that one-shot GP and compound-GP outperform or achieve results comparable to competitor methods. Moreover, the features extracted by these two methods improve the performance of other classifiers with handcrafted features and those extracted by a recently developed GP-based method in most cases.

  5. Improved semi-supervised online boosting for object tracking

    NASA Astrophysics Data System (ADS)

    Li, Yicui; Qi, Lin; Tan, Shukun

    2016-10-01

    Online semi-supervised boosting methods treat object tracking as a classification problem and have the advantage of training a binary classifier from both labeled and unlabeled examples, with appropriate object features selected based on real-time changes in the object. However, online semi-supervised boosting faces one key problem: traditional self-training, which uses the classifier's own results to update the classifier, often leads to drifting or tracking failure because of the error accumulated during each update of the tracker. To overcome this disadvantage, the contribution of this paper is an improved online semi-supervised boosting method in which the learning process is guided by positive (P) and negative (N) constraints, termed P-N constraints, which restrict the labeling of the unlabeled samples. First, we train the classifier by online semi-supervised boosting. Then, this classifier is used to process the next frame. Finally, its output is analyzed by the P-N constraints, which verify whether the labels assigned to the unlabeled data are in line with the assumptions made about positive and negative samples. The proposed algorithm can effectively improve the discriminative ability of the classifier and significantly alleviate the drifting problem in tracking applications. In the experiments, we demonstrate real-time operation of our tracker on several challenging test sequences, where it outperforms other related online tracking methods and achieves promising tracking performance.

  6. Towards a ternary NIRS-BCI: single-trial classification of verbal fluency task, Stroop task and unconstrained rest

    NASA Astrophysics Data System (ADS)

    Schudlo, Larissa C.; Chau, Tom

    2015-12-01

    Objective. The majority of near-infrared spectroscopy (NIRS) brain-computer interface (BCI) studies have investigated binary classification problems. Limited work has considered differentiation of more than two mental states, or multi-class differentiation of higher-level cognitive tasks using measurements outside of the anterior prefrontal cortex. Improvements in accuracies are needed to deliver effective communication with a multi-class NIRS system. We investigated the feasibility of a ternary NIRS-BCI that supports mental states corresponding to verbal fluency task (VFT) performance, Stroop task performance, and unconstrained rest using prefrontal and parietal measurements. Approach. Prefrontal and parietal NIRS signals were acquired from 11 able-bodied adults during rest and performance of the VFT or Stroop task. Classification was performed offline using bagging with a linear discriminant base classifier trained on a 10 dimensional feature set. Main results. VFT, Stroop task and rest were classified at an average accuracy of 71.7% ± 7.9%. The ternary classification system provided a statistically significant improvement in information transfer rate relative to a binary system controlled by either mental task (0.87 ± 0.35 bits/min versus 0.73 ± 0.24 bits/min). Significance. These results suggest that effective communication can be achieved with a ternary NIRS-BCI that supports VFT, Stroop task and rest via measurements from the frontal and parietal cortices. Further development of such a system is warranted. Accurate ternary classification can enhance communication rates offered by NIRS-BCIs, improving the practicality of this technology.
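The information transfer rate comparison above can be reproduced in form with the standard Wolpaw ITR formula; the trial rate of 2 selections/min below is an assumed illustrative value, not the study's, and the binary accuracy of 80% is likewise invented:

```python
import math

def wolpaw_itr(n_classes, accuracy, trials_per_min):
    """Information transfer rate (bits/min) under the standard Wolpaw model:
    bits/trial = log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1))."""
    n, p = n_classes, accuracy
    bits = math.log2(n)
    if 0 < p < 1:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * trials_per_min

# ternary system at the paper's 71.7% accuracy vs a binary one at an
# assumed 80% accuracy, both at an assumed 2 trials/min
print(round(wolpaw_itr(3, 0.717, 2.0), 2))  # 0.88
print(round(wolpaw_itr(2, 0.80, 2.0), 2))   # 0.56
```

The sketch shows the abstract's point in miniature: a third class can raise the bit rate even though its per-trial accuracy is lower than a comparable binary system's.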

  7. TEMPORAL CORRELATION OF CLASSIFICATIONS IN REMOTE SENSING

    EPA Science Inventory

    A bivariate binary model is developed for estimating the change in land cover from satellite images obtained at two different times. The binary classifications of a pixel at the two times are modeled as potentially correlated random variables, conditional on the true states of th...

  8. Texture Classification by Texton: Statistical versus Binary

    PubMed Central

    Guo, Zhenhua; Zhang, Zhongcheng; Li, Xiu; Li, Qin; You, Jane

    2014-01-01

    Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, there are two limitations when using these methods. First, they need a training stage to build a texton library, so the recognition accuracy is highly dependent on the training samples; second, during feature extraction, a local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library is large and the feature dimension is high. To address these two issues, in this paper three binary texton counterpart methods are proposed: Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode the local feature into a binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary textons can achieve sound results with fast feature extraction, especially when the images are not large and their quality is not poor. PMID:24520346
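The canonical direct binary encoding is the local binary pattern itself; a minimal sketch of the 8-neighbour code (texture features are then histograms of these codes over the image, with no texton library to train):

```python
def lbp_code(img, r, c):
    """Standard 8-neighbour local binary pattern code for pixel (r, c):
    each neighbour contributes one bit, set when it is >= the centre."""
    center = img[r][c]
    neighbours = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                  img[r][c+1],   img[r+1][c+1], img[r+1][c],
                  img[r+1][c-1], img[r][c-1]]
    return sum((1 << i) for i, n in enumerate(neighbours) if n >= center)

img = [[10, 10, 10],
       [10, 20, 30],
       [10, 10, 10]]
print(lbp_code(img, 1, 1))  # only the neighbour 30 >= 20 -> bit 3 -> 8
```

Variants such as the completed LBP add sign/magnitude components on top of this same thresholding idea.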

  9. A Three-Dimensional Receiver Operator Characteristic Surface Diagnostic Metric

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.

    2011-01-01

    Receiver Operator Characteristic (ROC) curves are commonly applied as metrics for quantifying the performance of binary fault detection systems. An ROC curve provides a visual representation of a detection system's True Positive Rate versus False Positive Rate sensitivity as the detection threshold is varied. The area under the curve provides a measure of fault detection performance independent of the applied detection threshold. While the standard ROC curve is well suited for quantifying binary fault detection performance, it is not suitable for quantifying the classification performance of multi-fault classification problems. Furthermore, it does not provide a measure of diagnostic latency. To address these shortcomings, a novel three-dimensional receiver operator characteristic (3D ROC) surface metric has been developed. This is done by generating and applying two separate curves: the standard ROC curve reflecting fault detection performance, and a second curve reflecting fault classification performance. A third dimension, diagnostic latency, is added, giving rise to 3D ROC surfaces. Applying numerical integration techniques, the volumes under and between the surfaces are calculated to produce metrics of the diagnostic system's detection and classification performance. This paper describes the 3D ROC surface metric in detail and presents an example of its application for quantifying the performance of aircraft engine gas path diagnostic methods. Metric limitations and potential enhancements are also discussed.
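The area under an ROC curve equals the probability that a randomly chosen positive outscores a randomly chosen negative, which gives a compact way to compute the threshold-independent measure without explicitly sweeping thresholds; a sketch on invented detector scores:

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the rank statistic:
    the fraction of (positive, negative) pairs the detector orders
    correctly, with ties counting one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# detector scores: faults (label 1) mostly outscore nominal data (label 0)
scores = [0.9, 0.8, 0.35, 0.7, 0.2, 0.1]
labels = [1,   1,   1,    0,   0,   0]
print(roc_auc(scores, labels))  # 8 of 9 pairs ordered correctly
```

A volume under a surface, as in the 3D ROC metric, extends this same integration idea to the added classification and latency dimensions.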

  10. Classification of octet AB-type binary compounds using dynamical charges: A materials informatics perspective

    DOE PAGES

    Pilania, G.; Gubernatis, J. E.; Lookman, T.

    2015-12-03

    The role of dynamical (or Born effective) charges in the classification of octet AB-type binary compounds between four-fold (zincblende/wurtzite crystal structures) and six-fold (rocksalt crystal structure) coordinated systems is discussed. We show that the difference in the dynamical charges of the fourfold and sixfold coordinated structures, in combination with Harrison's polarity, serves as an excellent feature to classify the coordination of 82 sp-bonded binary octet compounds. We use a support vector machine classifier to estimate the average classification accuracy and the associated variance in our model, where a decision boundary is learned in a supervised manner. Lastly, we compare the out-of-sample classification accuracy achieved by our feature pair with those reported previously.

  11. Comparison of rule induction, decision trees and formal concept analysis approaches for classification

    NASA Astrophysics Data System (ADS)

    Kotelnikov, E. V.; Milov, V. R.

    2018-05-01

    Rule-based learning algorithms offer greater transparency and are easier to interpret than neural networks and deep learning algorithms. These properties make it possible to use such algorithms effectively for the descriptive tasks of data mining. The choice of an algorithm, however, also depends on its ability to solve predictive tasks. This article compares the quality of solutions to binary and multiclass classification problems based on experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), and In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best classification quality in comparison with Ripper and C4.5; however, the latter two generate more compact rule sets.

  12. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a multiclass classification problem. One approach to solving it is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a small number of samples. The application of machine learning to microarray gene expression data mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, extended to multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is applied to each subset. Then, the features selected from each subset are given to separate classifiers. This enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While an ordinary SVM finds a single optimum hyperplane, the main objective of TWSVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method correctly classifies 71.4% of the overall test data, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method classifies the normal and MD classes with 100% accuracy.
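
    The backward-elimination loop at the heart of SVM-RFE is simple to sketch. The skeleton below is a hedged illustration: in the paper the per-feature weights come from a trained linear SVM, while here a toy covariance-with-label scorer stands in so the example stays self-contained.

    ```python
    def rfe(fit_weights, X, y, n_keep):
        """Recursive feature elimination: repeatedly fit a linear model on
        the surviving columns and drop the feature with the smallest
        absolute weight, until only n_keep features remain.
        `fit_weights(X, y)` must return one weight per column of X
        (the SVM coefficients, in SVM-RFE proper)."""
        active = list(range(len(X[0])))          # indices of surviving features
        while len(active) > n_keep:
            Xs = [[row[j] for j in active] for row in X]
            w = fit_weights(Xs, y)
            worst = min(range(len(active)), key=lambda j: abs(w[j]))
            del active[worst]
        return active

    def corr_weights(X, y):
        """Toy stand-in for SVM coefficients: the covariance of each
        feature column with the labels."""
        n = len(y)
        ybar = sum(y) / n
        weights = []
        for j in range(len(X[0])):
            xbar = sum(row[j] for row in X) / n
            weights.append(sum((row[j] - xbar) * (yi - ybar)
                               for row, yi in zip(X, y)) / n)
        return weights
    ```

    With an informative column 0 and an uninformative column 1, the loop keeps the informative one.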

  13. Statistical learning from nonrecurrent experience with discrete input variables and recursive-error-minimization equations

    NASA Astrophysics Data System (ADS)

    Carter, Jeffrey R.; Simon, Wayne E.

    1990-08-01

    Neural networks are trained using Recursive Error Minimization (REM) equations to perform statistical classification. Using REM equations with continuous input variables reduces the required number of training experiences by one to two orders of magnitude over standard back propagation. Replacing the continuous input variables with discrete binary representations reduces the number of connections by a factor proportional to the number of variables, reducing the required number of experiences by another order of magnitude. Undesirable effects of using recurrent experience to train neural networks for statistical classification problems are demonstrated, and nonrecurrent experience is used to avoid these undesirable effects. 1. THE I-4I PROBLEM. The statistical classification problem which we address is that of assigning points in d-dimensional space to one of two classes. The first class has a covariance matrix of I (the identity matrix); the covariance matrix of the second class is 4I. For this reason the problem is known as the I-4I problem. Both classes have equal probability of occurrence, and samples from both classes may appear anywhere throughout the d-dimensional space. Most samples near the origin of the coordinate system will be from the first class, while most samples away from the origin will be from the second class. Since the two classes completely overlap, it is impossible to have a classifier with zero error. The minimum possible error is known as the Bayes error and

  14. On multi-site damage identification using single-site training data

    NASA Astrophysics Data System (ADS)

    Barthorpe, R. J.; Manson, G.; Worden, K.

    2017-11-01

    This paper proposes a methodology for developing multi-site damage location systems for engineering structures that can be trained using single-site damaged state data only. The methodology involves training a sequence of binary classifiers based upon single-site damage data and combining the developed classifiers into a robust multi-class damage locator. In this way, the multi-site damage identification problem may be decomposed into a sequence of binary decisions. In this paper Support Vector Classifiers are adopted as the means of making these binary decisions. The proposed methodology represents an advancement on the state of the art in multi-site damage identification, where existing approaches require either: (1) full damaged state data from single- and multi-site damage cases or (2) the development of a physics-based model to make multi-site model predictions. The potential benefit of the proposed methodology is that a significantly reduced number of recorded damage states may be required in order to train a multi-site damage locator without recourse to physics-based model predictions. In this paper it is first demonstrated that Support Vector Classification represents an appropriate approach to the multi-site damage location problem, with methods for combining binary classifiers discussed. Next, the proposed methodology is demonstrated and evaluated through application to a real engineering structure - a Piper Tomahawk trainer aircraft wing - with its performance compared to classifiers trained using the full damaged-state dataset.
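
    One common way to combine per-site binary classifiers into a multi-class locator is one-vs-rest score voting. The sketch below is our generic illustration, not the paper's exact combination scheme: each binary decision function scores "its damage site vs. everything else", and the site whose function is most confident wins.

    ```python
    def one_vs_rest(classifiers, x):
        """Combine per-class binary decision functions into a multi-class
        prediction.  `classifiers` maps a class label to a score function;
        the label whose function returns the highest score is predicted."""
        return max(classifiers, key=lambda label: classifiers[label](x))

    # Toy decision functions for three hypothetical damage sites: each is
    # most confident when the measured feature x lies near its own centre.
    damage_sites = {
        "site_A": lambda x: -abs(x - 1.0),
        "site_B": lambda x: -abs(x - 2.0),
        "site_C": lambda x: -abs(x - 3.0),
    }
    ```

    In practice each score function would be a trained Support Vector Classifier's decision function rather than a hand-written distance.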

  15. Classification of skin cancer images using local binary pattern and SVM classifier

    NASA Astrophysics Data System (ADS)

    Adjed, Faouzi; Faye, Ibrahima; Ababsa, Fakhreddine; Gardezi, Syed Jamal; Dass, Sarat Chandra

    2016-11-01

    In this paper, a classification method for melanoma and non-melanoma skin cancer images has been presented using the local binary patterns (LBP). The LBP computes the local texture information from the skin cancer images, which is later used to compute some statistical features that have capability to discriminate the melanoma and non-melanoma skin tissues. Support vector machine (SVM) is applied on the feature matrix for classification into two skin image classes (malignant and benign). The method achieves good classification accuracy of 76.1% with sensitivity of 75.6% and specificity of 76.7%.
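
    The basic 8-neighbour LBP operator used as the texture descriptor here is standard; the sketch below follows the common definition. The resulting 256-bin histogram over the image is the kind of feature vector that is then handed to an SVM.

    ```python
    def lbp_code(img, r, c):
        """Basic 8-neighbour local binary pattern: each neighbour of pixel
        (r, c) contributes a 1 bit if its value is >= the centre value,
        giving a code in 0..255."""
        centre = img[r][c]
        neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                      (1, 1), (1, 0), (1, -1), (0, -1)]
        code = 0
        for bit, (dr, dc) in enumerate(neighbours):
            if img[r + dr][c + dc] >= centre:
                code |= 1 << bit
        return code

    def lbp_histogram(img):
        """256-bin histogram of LBP codes over all interior pixels."""
        hist = [0] * 256
        for r in range(1, len(img) - 1):
            for c in range(1, len(img[0]) - 1):
                hist[lbp_code(img, r, c)] += 1
        return hist
    ```

    A flat patch produces code 255 (all neighbours tie with the centre), while an isolated bright centre pixel produces code 0.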

  16. Support vector machines-based fault diagnosis for turbo-pump rotor

    NASA Astrophysics Data System (ADS)

    Yuan, Sheng-Fa; Chu, Fu-Lei

    2006-05-01

    Most artificial intelligence methods used in fault diagnosis are based on the empirical risk minimisation principle and have poor generalisation when fault samples are few. The support vector machine (SVM) is a general machine-learning tool based on the structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since the basic SVM is originally designed for two-class classification, while most fault diagnosis problems are multi-class cases, a new multi-class SVM classification algorithm named 'one to others' is presented to solve multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority; it is simple, involves little repeated training, and speeds up both training and recognition. The effectiveness of the method is verified by application to fault diagnosis for a turbo-pump rotor.

  17. A comparison of fitness-case sampling methods for genetic programming

    NASA Astrophysics Data System (ADS)

    Martínez, Yuliana; Naredo, Enrique; Trujillo, Leonardo; Legrand, Pierrick; López, Uriel

    2017-11-01

    Genetic programming (GP) is an evolutionary computation paradigm for automatic program induction. GP has produced impressive results but it still needs to overcome some practical limitations, particularly its high computational cost, overfitting and excessive code growth. Recently, many researchers have proposed fitness-case sampling methods to overcome some of these problems, with mixed results in several limited tests. This paper presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world data-sets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. Comparisons are carried out using non-parametric multigroup tests and post hoc pairwise statistical tests. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification.
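
    Of the four sampling methods compared, Lexicase Selection is the easiest to state compactly. The sketch below follows the commonly published form of the algorithm: stream the fitness cases in random order and keep only the individuals that are elite on each case in turn.

    ```python
    import random

    def lexicase_select(population, errors, rng):
        """Lexicase selection.  `errors[i][j]` is the error of individual i
        on fitness case j; lower is better.  Cases are considered in a
        random order, and only individuals that are best on each case in
        turn survive to the next case."""
        cases = list(range(len(errors[0])))
        rng.shuffle(cases)
        pool = list(range(len(population)))
        for c in cases:
            best = min(errors[i][c] for i in pool)
            pool = [i for i in pool if errors[i][c] == best]
            if len(pool) == 1:
                break
        return population[rng.choice(pool)]
    ```

    An individual that is elite on every fitness case is selected regardless of the case ordering, which is the property that makes the method attractive for problems with many distinct cases.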

  18. Methodology for the Evaluation of the Algorithms for Text Line Segmentation Based on Extended Binary Classification

    NASA Astrophysics Data System (ADS)

    Brodic, D.

    2011-01-01

    Text line segmentation is a key element of the optical character recognition process; hence, the testing of text line segmentation algorithms has substantial relevance. All previously proposed testing methods deal mainly with a text database as a template, which is used both for testing and for evaluating the text segmentation algorithm. In this manuscript, a methodology for evaluating text line segmentation algorithms based on extended binary classification is proposed. It is established on various multiline text samples linked with text segmentation, whose results are distributed according to a binary classification. The final result is obtained by a comparative analysis of the cross-linked data. Its suitability for different types of scripts represents its main advantage.
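
    An evaluation that distributes segmentation outcomes according to a binary classification ultimately reduces to rates derived from a confusion matrix. As a generic reminder (not the paper's exact metric), the usual summary rates are:

    ```python
    def binary_rates(tp, fp, fn, tn):
        """Standard summary rates from a binary confusion matrix of
        outcomes (e.g. correctly vs. incorrectly segmented text lines)."""
        return {
            "precision": tp / (tp + fp),
            "recall": tp / (tp + fn),
            "accuracy": (tp + tn) / (tp + fp + fn + tn),
        }
    ```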

  19. Orbit classification in an equal-mass non-spinning binary black hole pseudo-Newtonian system

    NASA Astrophysics Data System (ADS)

    Zotos, Euaggelos E.; Dubeibe, Fredy L.; González, Guillermo A.

    2018-07-01

    The dynamics of a test particle in a non-spinning binary black hole system of equal masses is numerically investigated. The binary system is modelled in the context of the pseudo-Newtonian circular restricted three-body problem, such that the primaries are separated by a fixed distance and move in a circular orbit around each other. In particular, the Paczyński-Wiita potential is used for describing the gravitational field of the two non-Newtonian primaries. The orbital properties of the test particle are determined through the classification of the initial conditions of the orbits, using several values of the Jacobi constant, in the Hill's regions of possible motion. The initial conditions are classified into three main categories: (i) bounded, (ii) escaping, and (iii) displaying close encounters. Using the smaller alignment index chaos indicator, we further classify bounded orbits into regular, sticky, or chaotic. To gain a complete view of the dynamics of the system, we define grids of initial conditions on different types of two-dimensional planes. The orbital structure of the configuration plane, along with the corresponding distributions of the escape and collision/close encounter times, allow us to observe the transition from the classical Newtonian to the pseudo-Newtonian regime. Our numerical results reveal a strong dependence of the properties of the considered basins with the Jacobi constant as well as with the Schwarzschild radius of the black holes.

  20. Particle Swarm Optimization approach to defect detection in armour ceramics.

    PubMed

    Kesharaju, Manasa; Nagarajah, Romesh

    2017-03-01

    In this research, various extracted features were used in the development of an automated, ultrasonic-sensor-based inspection system that enables defect classification in each ceramic component prior to despatch to the field. Classification is an important task, and the large number of irrelevant, redundant features commonly introduced into a dataset reduces a classifier's performance. Feature selection aims to reduce the dimensionality of the dataset while improving the performance of a classification system. In the context of a multi-criteria optimization problem (i.e., minimizing the classification error rate while reducing the number of features) such as the one discussed in this research, the literature suggests that evolutionary algorithms offer good results. Moreover, Particle Swarm Optimization (PSO) has scarcely been explored in the field of classification of high-frequency ultrasonic signals. Hence, a binary-coded Particle Swarm Optimization (BPSO) technique is investigated for feature subset selection and to optimize the classification error rate. In the proposed method, the population data is used as input to an Artificial Neural Network (ANN) based classification system to obtain the error rate, as the ANN serves as the evaluator of the PSO fitness function. Copyright © 2016. Published by Elsevier B.V.
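
    A minimal binary PSO of the kind described (bits resampled through a sigmoid of a real-valued velocity, after Kennedy and Eberhart's discrete PSO) can be sketched as follows. The fitness used in the test is a toy bit-counting function; in the paper it is the error rate of an ANN classifier, and all parameter values below are illustrative assumptions, not the paper's settings.

    ```python
    import math
    import random

    def bpso(fitness, n_bits, n_particles=10, iters=30, seed=0):
        """Minimal binary PSO: velocities are real-valued and clamped, and
        each bit is resampled to 1 with probability sigmoid(velocity)."""
        rng = random.Random(seed)
        sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
        pos = [[rng.randint(0, 1) for _ in range(n_bits)]
               for _ in range(n_particles)]
        vel = [[0.0] * n_bits for _ in range(n_particles)]
        pbest = [p[:] for p in pos]
        pfit = [fitness(p) for p in pos]
        g = max(range(n_particles), key=lambda i: pfit[i])
        gbest, gfit = pbest[g][:], pfit[g]
        for _ in range(iters):
            for i in range(n_particles):
                for d in range(n_bits):
                    v = vel[i][d] + (2.0 * rng.random() * (pbest[i][d] - pos[i][d])
                                     + 2.0 * rng.random() * (gbest[d] - pos[i][d]))
                    vel[i][d] = max(-4.0, min(4.0, v))   # standard Vmax clamp
                    pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
                f = fitness(pos[i])
                if f > pfit[i]:
                    pbest[i], pfit[i] = pos[i][:], f
                if f > gfit:
                    gbest, gfit = pos[i][:], f
        return gbest, gfit
    ```

    For feature subset selection, each bit marks whether the corresponding feature is kept, and the fitness function would train and score the classifier on that subset.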

  1. Learning Rotation-Invariant Local Binary Descriptor.

    PubMed

    Duan, Yueqi; Lu, Jiwen; Feng, Jianjiang; Zhou, Jie

    2017-08-01

    In this paper, we propose a rotation-invariant local binary descriptor (RI-LBD) learning method for visual recognition. Compared with hand-crafted local binary descriptors, such as local binary pattern and its variants, which require strong prior knowledge, local binary feature learning methods are more efficient and data-adaptive. Unlike existing learning-based local binary descriptors, such as compact binary face descriptor and simultaneous local binary feature learning and encoding, which are susceptible to rotations, our RI-LBD first categorizes each local patch into a rotational binary pattern (RBP), and then jointly learns the orientation for each pattern and the projection matrix to obtain RI-LBDs. As all the rotation variants of a patch belong to the same RBP, they are rotated into the same orientation and projected into the same binary descriptor. Then, we construct a codebook by a clustering method on the learned binary codes, and obtain a histogram feature for each image as the final representation. In order to exploit higher order statistical information, we extend our RI-LBD to the triple rotation-invariant co-occurrence local binary descriptor (TRICo-LBD) learning method, which learns a triple co-occurrence binary code for each local patch. Extensive experimental results on four different visual recognition tasks, including image patch matching, texture classification, face recognition, and scene classification, show that our RI-LBD and TRICo-LBD outperform most existing local descriptors.

  2. Hierarchical Spatio-Temporal Probabilistic Graphical Model with Multiple Feature Fusion for Binary Facial Attribute Classification in Real-World Face Videos.

    PubMed

    Demirkus, Meltem; Precup, Doina; Clark, James J; Arbel, Tal

    2016-06-01

    Recent literature shows that facial attributes, i.e., contextual facial information, can be beneficial for improving the performance of real-world applications, such as face verification, face recognition, and image search. Examples of face attributes include gender, skin color, facial hair, etc. How to robustly obtain these facial attributes (traits) is still an open problem, especially in the presence of the challenges of real-world environments: non-uniform illumination conditions, arbitrary occlusions, motion blur and background clutter. What makes this problem even more difficult is the enormous variability presented by the same subject, due to arbitrary face scales, head poses, and facial expressions. In this paper, we focus on the problem of facial trait classification in real-world face videos. We have developed a fully automatic hierarchical and probabilistic framework that models the collective set of frame class distributions and feature spatial information over a video sequence. The experiments are conducted on a large real-world face video database that we have collected, labelled and made publicly available. The proposed method is flexible enough to be applied to any facial classification problem. Experiments on a large, real-world video database McGillFaces [1] of 18,000 video frames reveal that the proposed framework outperforms alternative approaches, by up to 16.96 and 10.13%, for the facial attributes of gender and facial hair, respectively.

  3. Predicting Chemically Induced Duodenal Ulcer and Adrenal Necrosis with Classification Trees

    NASA Astrophysics Data System (ADS)

    Giampaolo, Casimiro; Gray, Andrew T.; Olshen, Richard A.; Szabo, Sandor

    1991-07-01

    Binary tree-structured statistical classification algorithms and properties of 56 model alkyl nucleophiles were brought to bear on two problems of experimental pharmacology and toxicology. Each rat of a learning sample of 745 was administered one compound and autopsied to determine the presence of duodenal ulcer or adrenal hemorrhagic necrosis. The cited statistical classification schemes were then applied to these outcomes and 67 features of the compounds to ascertain those characteristics that are associated with biologic activity. For predicting duodenal ulceration, dipole moment, melting point, and solubility in octanol are particularly important, while for predicting adrenal necrosis, important features include the number of sulfhydryl groups and double bonds. These methods may constitute inexpensive but powerful ways to screen untested compounds for possible organ-specific toxicity. Mechanisms for the etiology and pathogenesis of the duodenal and adrenal lesions are suggested, as are additional avenues for drug design.

  4. Retargeted Least Squares Regression Algorithm.

    PubMed

    Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin

    2015-09-01

    This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to learn the regression targets directly from the data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large margin constraint for the requirement of correct classification of each data point. Compared with the traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model; hence, there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure including regression and retargeting as substeps. The experimental evaluation over a range of databases confirms the validity of our method.

  5. Deep classification hashing for person re-identification

    NASA Astrophysics Data System (ADS)

    Wang, Jiabao; Li, Yang; Zhang, Xiancai; Miao, Zhuang; Tao, Gang

    2018-04-01

    With the development of public surveillance, person re-identification is becoming more and more important. Large-scale databases call for efficient computation and storage, and hashing is one of the most important techniques for this. In this paper, we propose a new deep classification hashing network, created by introducing a new binary appropriation layer into traditional ImageNet pre-trained CNN models. It outputs binary-appropriate features, which can easily be quantized into binary hash codes for Hamming similarity comparison. Experiments show that our deep hashing method outperforms the state-of-the-art methods on the public CUHK03 and Market1501 datasets.
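
    The retrieval step such hash codes enable is just binarisation followed by Hamming distance. A minimal sketch, with our own helper names and an assumed zero threshold:

    ```python
    def to_hash(features, thresh=0.0):
        """Quantise real-valued network outputs into a bit string (the
        hash code) by thresholding each component."""
        return [1 if f > thresh else 0 for f in features]

    def hamming(code_a, code_b):
        """Hamming distance between two hash codes -- the cheap similarity
        measure used at retrieval time instead of a float dot product."""
        return sum(a != b for a, b in zip(code_a, code_b))
    ```

    Because comparison reduces to counting differing bits, large galleries can be ranked far faster than with real-valued descriptors.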

  6. Mining EEG with SVM for Understanding Cognitive Underpinnings of Math Problem Solving Strategies

    PubMed Central

    López, Julio

    2018-01-01

    We have developed a new methodology for examining and extracting patterns from brain electric activity by using data mining and machine learning techniques. Data was collected from experiments focused on the study of cognitive processes that might evoke different specific strategies in the resolution of math problems. A binary classification problem was constructed using correlations and phase synchronization between different electroencephalographic channels as characteristics and, as labels or classes, the math performances of individuals participating in specially designed experiments. The proposed methodology is based on using well-established procedures of feature selection, which were used to determine a suitable brain functional network size related to math problem solving strategies and also to discover the most relevant links in this network without including noisy connections or excluding significant connections. PMID:29670667

  7. Mining EEG with SVM for Understanding Cognitive Underpinnings of Math Problem Solving Strategies.

    PubMed

    Bosch, Paul; Herrera, Mauricio; López, Julio; Maldonado, Sebastián

    2018-01-01

    We have developed a new methodology for examining and extracting patterns from brain electric activity by using data mining and machine learning techniques. Data was collected from experiments focused on the study of cognitive processes that might evoke different specific strategies in the resolution of math problems. A binary classification problem was constructed using correlations and phase synchronization between different electroencephalographic channels as characteristics and, as labels or classes, the math performances of individuals participating in specially designed experiments. The proposed methodology is based on using well-established procedures of feature selection, which were used to determine a suitable brain functional network size related to math problem solving strategies and also to discover the most relevant links in this network without including noisy connections or excluding significant connections.

  8. Case-based statistical learning applied to SPECT image classification

    NASA Astrophysics Data System (ADS)

    Górriz, Juan M.; Ramírez, Javier; Illán, I. A.; Martínez-Murcia, Francisco J.; Segovia, Fermín.; Salas-Gonzalez, Diego; Ortiz, A.

    2017-03-01

    Statistical learning and decision theory play a key role in many areas of science and engineering. Some examples include time series regression and prediction, optical character recognition, signal detection in communications, and biomedical applications for diagnosis and prognosis. This paper deals with learning from biomedical image data in the classification problem. In a typical scenario we have a training set that is employed to fit a prediction model or learner, and a testing set to which the learner is applied in order to predict the outcome for new unseen patterns. The two processes are usually kept completely separate to avoid over-fitting and because, in practice, the unseen new objects (testing set) have unknown outcomes. However, the outcome takes one of a discrete set of values, i.e. the binary diagnosis problem. Thus, assumptions on these outcome values can be established to obtain the most likely prediction model at the training stage, which could improve the overall classification accuracy on the testing set, or at least keep its performance at the level of the selected statistical classifier. In this sense, a novel case-based learning (c-learning) procedure is proposed which combines hypothesis testing from a discrete set of expected outcomes and a cross-validated classification stage.

  9. LBP and SIFT based facial expression recognition

    NASA Astrophysics Data System (ADS)

    Sumer, Omer; Gunes, Ece O.

    2015-02-01

    This study compares the performance of local binary patterns (LBP) and the scale invariant feature transform (SIFT) with support vector machines (SVM) in the automatic classification of discrete facial expressions. Facial expression recognition is a multiclass classification problem, and seven classes are distinguished: happiness, anger, sadness, disgust, surprise, fear and contempt. Using SIFT feature vectors and a linear SVM, 93.1% mean accuracy is obtained on the CK+ database. On the other hand, the performance of the LBP-based classifier with a linear SVM is reported on SFEW using the strictly person independent (SPI) protocol. The seven-class mean accuracy on SFEW is 59.76%. Experiments on both databases showed that LBP features can be used in a fairly descriptive way if good localization of facial points and a suitable partitioning strategy are followed.

  10. Hyperspectral image classification based on local binary patterns and PCANet

    NASA Astrophysics Data System (ADS)

    Yang, Huizhen; Gao, Feng; Dong, Junyu; Yang, Yang

    2018-04-01

    Hyperspectral image classification has been well acknowledged as one of the challenging tasks of hyperspectral data processing. In this paper, we propose a novel hyperspectral image classification framework based on local binary pattern (LBP) features and PCANet. In the proposed method, linear prediction error (LPE) is first employed to select a subset of informative bands, and LBP is utilized to extract texture features. Then, spectral and texture features are stacked into a high-dimensional vector. Next, the extracted features of a specified position are transformed into a 2-D image. The resulting images for all pixels are fed into PCANet for classification. Experimental results on a real hyperspectral dataset demonstrate the effectiveness of the proposed method.

  11. Visual Recognition Software for Binary Classification and Its Application to Spruce Pollen Identification

    PubMed Central

    Tcheng, David K.; Nayak, Ashwin K.; Fowlkes, Charless C.; Punyasena, Surangi W.

    2016-01-01

    Discriminating between black and white spruce (Picea mariana and Picea glauca) is a difficult palynological classification problem that, if solved, would provide valuable data for paleoclimate reconstructions. We developed an open-source visual recognition software (ARLO, Automated Recognition with Layered Optimization) capable of differentiating between these two species at an accuracy on par with human experts. The system applies pattern recognition and machine learning to the analysis of pollen images and discovers general-purpose image features, defined by simple features of lines and grids of pixels taken at different dimensions, size, spacing, and resolution. It adapts to a given problem by searching for the most effective combination of both feature representation and learning strategy. This results in a powerful and flexible framework for image classification. We worked with images acquired using an automated slide scanner. We first applied a hash-based “pollen spotting” model to segment pollen grains from the slide background. We next tested ARLO’s ability to reconstruct black to white spruce pollen ratios using artificially constructed slides of known ratios. We then developed a more scalable hash-based method of image analysis that was able to distinguish between the pollen of black and white spruce with an estimated accuracy of 83.61%, comparable to human expert performance. Our results demonstrate the capability of machine learning systems to automate challenging taxonomic classifications in pollen analysis, and our success with simple image representations suggests that our approach is generalizable to many other object recognition problems. PMID:26867017

  12. Orbit classification in an equal-mass non-spinning binary black hole pseudo-Newtonian system

    NASA Astrophysics Data System (ADS)

    Zotos, Euaggelos E.; Dubeibe, F. L.; González, Guillermo A.

    2018-04-01

    The dynamics of a test particle in a non-spinning binary black hole system of equal masses is numerically investigated. The binary system is modeled in the context of the pseudo-Newtonian circular restricted three-body problem, such that the primaries are separated by a fixed distance and move in a circular orbit around each other. In particular, the Paczyński-Wiita potential is used for describing the gravitational field of the two non-Newtonian primaries. The orbital properties of the test particle are determined through the classification of the initial conditions of the orbits, using several values of the Jacobi constant, in the Hill's regions of possible motion. The initial conditions are classified into three main categories: (i) bounded, (ii) escaping and (iii) displaying close encounters. Using the smaller alignment index (SALI) chaos indicator, we further classify bounded orbits into regular, sticky or chaotic. To gain a complete view of the dynamics of the system, we define grids of initial conditions on different types of two-dimensional planes. The orbital structure of the configuration plane, along with the corresponding distributions of the escape and collision/close encounter times, allow us to observe the transition from the classical Newtonian to the pseudo-Newtonian regime. Our numerical results reveal a strong dependence of the properties of the considered basins with the Jacobi constant as well as with the Schwarzschild radius of the black holes.

  13. Locally Weighted Score Estimation for Quantile Classification in Binary Regression Models

    PubMed Central

    Rice, John D.; Taylor, Jeremy M. G.

    2016-01-01

    One common use of binary response regression methods is classification based on an arbitrary probability threshold dictated by the particular application. Since this threshold is given a priori, it is sensible to incorporate it into the estimation procedure. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. The bandwidth for the weight function is selected by cross validation of a novel hybrid loss function that combines classification error and a continuous measure of divergence between observed and fitted values; other possible cross-validation functions based on more common binary classification metrics are also examined. This work has much in common with robust estimation, but differs from previous approaches in this area in its focus on prediction, specifically classification into high- and low-risk groups. Simulation results are given showing the reduction in error rates that can be obtained with this method when compared with maximum likelihood estimation, especially under certain forms of model misspecification. Analysis of a melanoma data set is presented to illustrate the use of the method in practice. PMID:28018492

  14. A novel hybrid auditory BCI paradigm combining ASSR and P300.

    PubMed

    Kaongoen, Netiwit; Jo, Sungho

    2017-03-01

    Brain-computer interface (BCI) is a technology that provides an alternative way of communication by translating brain activities into digital commands. Because vision-dependent BCIs cannot be used by patients with visual impairment, auditory stimuli have been used to substitute the conventional visual stimuli. This paper introduces a hybrid auditory BCI that utilizes and combines auditory steady state response (ASSR) and spatial-auditory P300 BCI to improve the performance of the auditory BCI system. The system works by simultaneously presenting auditory stimuli with different pitches and amplitude modulation (AM) frequencies to the user, with beep sounds occurring randomly between all sound sources. Attention to different auditory stimuli yields different ASSR, and beep sounds trigger the P300 response when they occur in the target channel, so the system can utilize both features for classification. The proposed ASSR/P300-hybrid auditory BCI system achieves 85.33% accuracy with 9.11 bits/min information transfer rate (ITR) in a binary classification problem. The proposed system outperformed the P300 BCI system (74.58% accuracy with 4.18 bits/min ITR) and the ASSR BCI system (66.68% accuracy with 2.01 bits/min ITR) in the binary-class problem. The system is completely vision-independent. This work demonstrates that combining ASSR and P300 BCI into a hybrid system can result in better performance and can help in the development of future auditory BCIs. Copyright © 2017 Elsevier B.V. All rights reserved.
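    Bits/min figures like those above are conventionally computed with Wolpaw's information transfer rate formula. A sketch for the binary case; the per-trial duration used to convert bits per selection into bits per minute is an assumption here, not a figure taken from the paper:

```python
import math

# Wolpaw ITR: bits carried by one selection among n classes at accuracy p,
#   B = log2(n) + p*log2(p) + (1-p)*log2((1-p)/(n-1)),
# scaled by the number of selections per minute.

def bits_per_selection(n, p):
    """Wolpaw bits per selection for n classes at accuracy p (0 < p <= 1)."""
    if p >= 1.0:
        return math.log2(n)
    return (math.log2(n) + p * math.log2(p)
            + (1.0 - p) * math.log2((1.0 - p) / (n - 1)))

def itr_bits_per_min(n, p, trial_seconds):
    """ITR in bits/min for trials lasting trial_seconds each."""
    return bits_per_selection(n, p) * 60.0 / trial_seconds

# Binary classification at the reported 85.33% accuracy; the 2.6 s trial
# length below is purely illustrative.
print(bits_per_selection(2, 0.8533))
print(itr_bits_per_min(2, 0.8533, 2.6))
```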

  15. Building Change Detection from Bi-Temporal Dense-Matching Point Clouds and Aerial Images.

    PubMed

    Pang, Shiyan; Hu, Xiangyun; Cai, Zhongliang; Gong, Jinqi; Zhang, Mi

    2018-03-24

    In this work, a novel building change detection method from bi-temporal dense-matching point clouds and aerial images is proposed to address two major problems, namely, the robust acquisition of the changed objects above ground and the automatic classification of changed objects into buildings or non-buildings. For the acquisition of changed objects above ground, the change detection problem is converted into a binary classification, in which the changed area above ground is regarded as the foreground and the other area as the background. For the gridded points of each period, the graph cuts algorithm is adopted to classify the points into foreground and background, followed by the region-growing algorithm to form candidate changed building objects. A novel structural feature that was extracted from aerial images is constructed to classify the candidate changed building objects into buildings and non-buildings. The changed building objects are further classified as "newly built", "taller", "demolished", and "lower" by combining the classification and the digital surface models of two periods. Finally, three typical areas from a large dataset are used to validate the proposed method. Numerous experiments demonstrate the effectiveness of the proposed algorithm.

  16. Extension of mixture-of-experts networks for binary classification of hierarchical data.

    PubMed

    Ng, Shu-Kay; McLachlan, Geoffrey J

    2007-09-01

    For many applied problems in the context of medically relevant artificial intelligence, the data collected exhibit a hierarchical or clustered structure. Ignoring the interdependence between hierarchical data can result in misleading classification. In this paper, we extend the mechanism for mixture-of-experts (ME) networks for binary classification of hierarchical data. Another extension is to quantify cluster-specific information on data hierarchy by random effects via the generalized linear mixed-effects model (GLMM). The extension of ME networks is implemented by allowing for correlation in the hierarchical data in both the gating and expert networks via the GLMM. The proposed model is illustrated using a real thyroid disease data set. In our study, we consider 7652 thyroid diagnosis records from 1984 to early 1987 with complete information on 20 attribute values. We obtain 10 independent random splits of the data into a training set and a test set in the proportions 85% and 15%. The test sets are used to assess the generalization performance of the proposed model, based on the percentage of misclassifications. For comparison, the results obtained from the ME network with independence assumption are also included. With the thyroid disease data, the misclassification rate on test sets for the extended ME network is 8.9%, compared to 13.9% for the ME network. In addition, based on model selection methods described in Section 2, a network with two experts is selected. These two expert networks can be considered as modeling two groups of patients with high and low incidence rates. Significant variation among the predicted cluster-specific random effects is detected in the patient group with low incidence rate. It is shown that the extended ME network outperforms the ME network for binary classification of hierarchical data. 
With the thyroid disease data, useful information on the relative log odds of patients with diagnosed conditions at different periods can be evaluated. This information can be taken into consideration for the assessment of treatment planning of the disease. The proposed extended ME network thus facilitates a more general approach to incorporate data hierarchy mechanism in network modeling.

  17. Stretchy binary classification.

    PubMed

    Toh, Kar-Ann; Lin, Zhiping; Sun, Lei; Li, Zhengguo

    2018-01-01

    In this article, we introduce an analytic formulation for compressive binary classification. The formulation seeks the least ℓp-norm of the parameter vector subject to a classification error constraint. An analytic and stretchable estimation is conjectured, in which the estimation can be viewed as an extension of the pseudoinverse with left and right constructions. Our variance analysis indicates that the estimation based on the left pseudoinverse is unbiased and the estimation based on the right pseudoinverse is biased. Sparseness can be obtained for the biased estimation under certain mild conditions. The proposed estimation is investigated numerically using both synthetic and real-world data. Copyright © 2017 Elsevier Ltd. All rights reserved.
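    The left and right pseudoinverse constructions referred to above correspond to the familiar least-squares and minimum-norm closed forms for a linear model Xw = y. A sketch on synthetic data; this illustrates only the two closed forms, not the paper's full ℓp-constrained estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined case (more samples than features): left pseudoinverse,
# i.e. the least-squares solution w = (X^T X)^{-1} X^T y.
X_tall = rng.normal(size=(20, 5))
y_tall = rng.normal(size=20)
w_left = np.linalg.inv(X_tall.T @ X_tall) @ X_tall.T @ y_tall

# Underdetermined case (more features than samples): right pseudoinverse,
# i.e. the minimum-norm interpolating solution w = X^T (X X^T)^{-1} y.
X_wide = rng.normal(size=(5, 20))
y_wide = rng.normal(size=5)
w_right = X_wide.T @ np.linalg.inv(X_wide @ X_wide.T) @ y_wide

# Both constructions agree with the Moore-Penrose pseudoinverse
# in their respective full-rank regimes.
assert np.allclose(w_left, np.linalg.pinv(X_tall) @ y_tall)
assert np.allclose(w_right, np.linalg.pinv(X_wide) @ y_wide)
print("both constructions match np.linalg.pinv")
```

    In the underdetermined regime the right construction fits the training labels exactly while keeping the parameter norm minimal, which is the regime where the sparsity behaviour discussed in the abstract arises.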

  18. Learning classification models with soft-label information.

    PubMed

    Nguyen, Quang; Valizadegan, Hamed; Hauskrecht, Milos

    2014-01-01

    Learning of classification models in medicine often relies on data labeled by a human expert. Since labeling of clinical data may be time-consuming, finding ways of alleviating the labeling costs is critical for our ability to automatically learn such models. In this paper we propose a new machine learning approach that is able to learn improved binary classification models more efficiently by refining the binary class information in the training phase with soft labels that reflect how strongly the human expert feels about the original class labels. Two types of methods that can learn improved binary classification models from soft labels are proposed. The first relies on probabilistic/numeric labels, the other on ordinal categorical labels. We study and demonstrate the benefits of these methods for learning an alerting model for heparin induced thrombocytopenia. The experiments are conducted on the data of 377 patient instances labeled by three different human experts. The methods are compared using the area under the receiver operating characteristic curve (AUC) score. Our AUC results show that the new approach is capable of learning classification models more efficiently compared to traditional learning methods. The improvement in AUC is most remarkable when the number of examples we learn from is small. A new classification learning framework that lets us learn from auxiliary soft-label information provided by a human expert is a promising new direction for learning classification models from expert labels, reducing the time and cost needed to label data.
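    The difference between learning from hard 0/1 labels and from probabilistic soft labels can be sketched with a plain logistic model whose cross-entropy is taken against the expert's graded label instead of a binarized target. The synthetic data, noise level, and gradient solver below are assumptions for illustration, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
logits = X @ np.array([0.0, 2.0])
p_true = 1.0 / (1.0 + np.exp(-logits))
# Expert's graded confidence (soft label) vs. its binarized version.
soft = np.clip(p_true + rng.normal(scale=0.05, size=n), 0.0, 1.0)
hard = (soft > 0.5).astype(float)

def fit_logistic(X, targets, lr=0.1, iters=3000):
    """Gradient ascent on the cross-entropy likelihood; targets may be
    soft values in [0, 1] rather than hard 0/1 labels."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (targets - p) / len(targets)
    return beta

beta_soft = fit_logistic(X, soft)
beta_hard = fit_logistic(X, hard)
print("soft-label fit:", beta_soft)
print("hard-label fit:", beta_hard)
```

    The gradient has the same form in both cases; only the targets change, which is why soft labels can be folded into standard training with little extra cost.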

  19. EEG Responses to Auditory Stimuli for Automatic Affect Recognition

    PubMed Central

    Hettich, Dirk T.; Bolinger, Elaina; Matuz, Tamara; Birbaumer, Niels; Rosenstiel, Wolfgang; Spüler, Martin

    2016-01-01

    Brain state classification for communication and control has been well established in the area of brain-computer interfaces over the last decades. Recently, the passive and automatic extraction of additional information regarding the psychological state of users from neurophysiological signals has gained increased attention in the interdisciplinary field of affective computing. We investigated how well specific emotional reactions, induced by auditory stimuli, can be detected in EEG recordings. We introduce an auditory emotion induction paradigm based on the International Affective Digitized Sounds 2nd Edition (IADS-2) database also suitable for disabled individuals. Stimuli are grouped in three valence categories: unpleasant, neutral, and pleasant. Significant differences in time-domain event-related potentials are found in the electroencephalogram (EEG) between unpleasant and neutral, as well as pleasant and neutral conditions over midline electrodes. Time-domain data were classified in three binary classification problems using a linear support vector machine (SVM) classifier. We discuss three classification performance measures in the context of affective computing and outline some strategies for conducting and reporting affect classification studies. PMID:27375410

  20. Kernel Wiener filter and its application to pattern recognition.

    PubMed

    Yoshino, Hirokazu; Dong, Chen; Washizawa, Yoshikazu; Yamashita, Yukihiko

    2010-11-01

    The Wiener filter (WF) is widely used for inverse problems. Among linear operators, it provides the estimate of the original signal that is best with respect to the squared error averaged over the original and observed signals. The kernel WF (KWF), extended directly from WF, has the problem that additive noise has to be handled by samples. Since the computational complexity of kernel methods depends on the number of samples, this case incurs a huge computational cost. By using the first-order approximation of kernel functions, we realize a KWF that can handle such noise not by samples but as a random variable. We also propose an error estimation method for kernel filters by using the approximations. In order to show the advantages of the proposed methods, we conducted experiments to denoise images and estimate errors. We also apply KWF to classification, since KWF can provide an approximated result of the maximum a posteriori classifier, which provides the best recognition accuracy. The noise term in the criterion can be used for classification in the presence of noise or as a new regularization to suppress changes in the input space, whereas the ordinary regularization for kernel methods suppresses changes in the feature space. In order to show the advantages of the proposed methods, we conducted experiments of binary and multiclass classification and of classification in the presence of noise.

  1. Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification.

    PubMed

    Spinnato, J; Roubaud, M-C; Burle, B; Torrésani, B

    2015-06-01

    The main goal of this work is to develop a model for multisensor signals, such as magnetoencephalography or electroencephalography (EEG) signals, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle small and unbalanced datasets, as often encountered in BCI-type experiments. The method involves the linear mixed effects statistical model, wavelet transform, and spatial filtering, and aims at the characterization of localized discriminant features in multisensor signals. After discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial channels subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e., discriminant) and background noise, using a very simple Gaussian linear mixed model. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class-covariance matrices are obtained from small sample sizes and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data in a very unbalanced situation (detection of rare events). Classification results prove the relevance of the proposed approach in such a context. The combination of the linear mixed model, wavelet transform and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which is proven to be effective. This paper improves upon earlier results on similar problems, and the three main ingredients all play an important role.
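    The role of the wavelet step can be illustrated with a hand-rolled single-level Haar transform: a localized, evoked-potential-like transient concentrates its energy in a few detail coefficients, which is what makes projection onto a small wavelet subspace effective for dimension reduction. The paper's actual wavelet family and decomposition depth are not specified here; Haar and the toy signal are assumptions:

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar wavelet transform."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # smooth part
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # localized differences
    return approx, detail

signal = np.zeros(16)
signal[6:8] = [1.0, -1.0]          # a short, evoked-potential-like transient
approx, detail = haar_step(signal)
print("detail coefficients:", detail)
```

    The transform is orthonormal, so signal energy is preserved exactly, yet here all of it lands in a single detail coefficient; keeping only the few large coefficients is the dimension-reduction step.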

  2. Support-vector-machine tree-based domain knowledge learning toward automated sports video classification

    NASA Astrophysics Data System (ADS)

    Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin

    2010-12-01

    We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such an SVM tree presents a number of advantages, including: (1) low computing cost; (2) integrated learning and classification while preserving the individual SVMs' learning strength; and (3) flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments show that the proposed SVM tree achieves good performance in sports video classification.

  3. A comparative study for chest radiograph image retrieval using binary texture and deep learning classification.

    PubMed

    Anavi, Yaron; Kogan, Ilya; Gelbart, Elad; Geva, Ofer; Greenspan, Hayit

    2015-08-01

    In this work various approaches are investigated for X-ray image retrieval and specifically chest pathology retrieval. Given a query image taken from a data set of 443 images, the objective is to rank images according to similarity. Different features, including binary features, texture features, and deep learning (CNN) features are examined. In addition, two approaches are investigated for the retrieval task. One approach is based on the distance of image descriptors using the above features (hereon termed the "descriptor"-based approach); the second approach ("classification"-based approach) is based on a probability descriptor, generated by a pair-wise classification of each two classes (pathologies) and their decision values using an SVM classifier. Best results are achieved using deep learning features in a classification scheme.

  4. Cirrhosis Diagnosis and Liver Fibrosis Staging: Transient Elastometry Versus Cirrhosis Blood Test.

    PubMed

    Calès, Paul; Boursier, Jérôme; Oberti, Frédéric; Bardou, Derek; Zarski, Jean-Pierre; de Lédinghen, Victor

    2015-07-01

    Elastometry is more accurate than blood tests for cirrhosis diagnosis. However, blood tests were developed for significant fibrosis, with the exception of CirrhoMeter developed for cirrhosis. We compared the performance of Fibroscan and CirrhoMeter, and classic binary cirrhosis diagnosis versus new fibrosis staging for cirrhosis diagnosis. The diagnostic population included 679 patients with hepatitis C and liver biopsy (Metavir staging and morphometry), Fibroscan, and CirrhoMeter. The prognostic population included 1110 patients with chronic liver disease and both tests. Binary diagnosis: AUROCs for cirrhosis were: Fibroscan: 0.905; CirrhoMeter: 0.857; and P=0.041. Accuracy (Youden cutoff) was: Fibroscan: 85.4%; CirrhoMeter: 79.2%; and P<0.001. Fibrosis classification provided 6 classes (F0/1, F1/2, F2±1, F3±1, F3/4, and F4). Accuracy was: Fibroscan: 88.2%; CirrhoMeter: 88.8%; and P=0.77. A simplified fibrosis classification comprised 3 categories: discrete (F1±1), moderate (F2±1), and severe (F3/4) fibrosis. Using this simplified classification, CirrhoMeter predicted survival better than Fibroscan (respectively, χ²=37.9 and 19.7 by log-rank test), but both predicted it well (P<0.001 by log-rank test). Comparison: binary diagnosis versus fibrosis classification, respectively, overall accuracy: CirrhoMeter: 79.2% versus 88.8% (P<0.001); Fibroscan: 85.4% versus 88.2% (P=0.127); positive predictive value for cirrhosis by Fibroscan: Youden cutoff (11.1 kPa): 49.1% versus cutoffs of F3/4 (17.6 kPa): 67.6% and F4 classes (25.7 kPa): 82.4%. Fibroscan's usual binary cutoffs for cirrhosis diagnosis are not sufficiently accurate. Fibrosis classification should be preferred over binary diagnosis. A cirrhosis-specific blood test markedly attenuates the accuracy deficit for cirrhosis diagnosis of usual blood tests versus transient elastometry, and may offer better prognostication.

  5. Natural Language Processing Based Instrument for Classification of Free Text Medical Records

    PubMed Central

    2016-01-01

    According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system is to be introduced in the near future. In this context arises the problem of structuring and classifying documents containing the full history of medical services provided. The present work introduces an instrument for the classification of medical records in the Georgian language. It is the first attempt at such classification of Georgian-language medical records. In total, 24,855 examination records were studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with SVM performing slightly better. In the process of classification a “shrink” method, based on feature selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, at the second stage of classification into subclasses 23% of all documents could not be linked to a single definite subclass (liver or biliary system) due to common features characterizing these subclasses. The overall results of the study were successful. PMID:27668260

  6. A feature dictionary supporting a multi-domain medical knowledge base.

    PubMed

    Naeymi-Rad, F

    1989-01-01

    Because different terminology is used by physicians of different specialties in different locations to refer to the same feature (signs, symptoms, test results), it is essential that our knowledge development tools provide a means to access a common pool of terms. This paper discusses the design of an online medical dictionary that provides a solution to this problem for developers of multi-domain knowledge bases for MEDAS (Medical Emergency Decision Assistance System). Our Feature Dictionary supports phrase equivalents for features, feature interactions, feature classifications, and translations to the binary features generated by the expert during knowledge creation. It is also used in the conversion of a domain knowledge to the database used by the MEDAS inference diagnostic sessions. The Feature Dictionary also provides capabilities for complex queries across multiple domains using the supported relations. The Feature Dictionary supports three methods for feature representation: (1) for binary features, (2) for continuous valued features, and (3) for derived features.

  7. Periodic activation function and a modified learning algorithm for the multivalued neuron.

    PubMed

    Aizenberg, Igor

    2010-12-01

    In this paper, we consider a new periodic activation function for the multivalued neuron (MVN). The MVN is a neuron with complex-valued weights and inputs/output, which are located on the unit circle. Although the MVN outperforms many other neurons and MVN-based neural networks have shown their high potential, the MVN still has a limited capability of learning highly nonlinear functions. A periodic activation function, which is introduced in this paper, makes it possible to learn nonlinearly separable problems and non-threshold multiple-valued functions using a single multivalued neuron. We call this neuron a multivalued neuron with a periodic activation function (MVN-P). The MVN-P's functionality is much higher than that of the regular MVN. The MVN-P is more efficient in solving various classification problems. A learning algorithm based on the error-correction rule for the MVN-P is also presented. It is shown that a single MVN-P can easily learn and solve those benchmark classification problems that were considered unsolvable using a single neuron. It is also shown that a universal binary neuron, which can learn nonlinearly separable Boolean functions, and a regular MVN are particular cases of the MVN-P.
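    The discrete MVN activation maps the phase of the weighted sum to one of the k-th roots of unity; the periodic variant splits the unit circle into l·k sectors whose k outputs repeat l times around it. A minimal sketch; the sector-numbering convention is an assumption here:

```python
import cmath
import math

def mvn_activation(z, k, l=1):
    """Map the weighted sum z to one of k root-of-unity outputs.

    With l = 1 this is the regular discrete MVN activation; with l > 1
    the circle is split into k*l sectors and the k outputs repeat l times
    (the periodic MVN-P variant)."""
    angle = cmath.phase(z) % (2.0 * math.pi)
    sector = int(angle / (2.0 * math.pi / (k * l)))  # which of the k*l sectors
    j = sector % k                                   # periodic repetition
    return cmath.exp(2j * math.pi * j / k)

z = complex(1.0, 1.0)                 # phase pi/4
print(mvn_activation(z, k=4))         # plain MVN with four outputs
print(mvn_activation(z, k=2, l=2))    # periodic MVN-P, two repetitions
```

    With k = 2 and l = 2 the two output classes alternate around the circle, which is what allows a single neuron to realize nonlinearly separable functions that a simple threshold unit cannot.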

  8. Joint Sparse Recovery With Semisupervised MUSIC

    NASA Astrophysics Data System (ADS)

    Wen, Zaidao; Hou, Biao; Jiao, Licheng

    2017-05-01

    Discrete multiple signal classification (MUSIC), with its low computational cost and mild condition requirement, has become a significant noniterative algorithm for joint sparse recovery (JSR). However, it fails in the rank-defective problem caused by coherent or a limited number of multiple measurement vectors (MMVs). In this letter, we provide a novel perspective for addressing this problem by interpreting JSR as a binary classification problem with respect to atoms. Meanwhile, MUSIC essentially constructs a supervised classifier based on the labeled MMVs, so that its performance will heavily depend on the quality and quantity of these training samples. From this viewpoint, we develop a semisupervised MUSIC (SS-MUSIC) in the spirit of machine learning, which declares that the insufficient supervised information in the training samples can be compensated from those unlabeled atoms. Instead of constructing a classifier in a fully supervised manner, we iteratively refine a semisupervised classifier by exploiting the labeled MMVs and some reliable unlabeled atoms simultaneously. In this way, the required conditions can be greatly relaxed and the number of iterations reduced. Numerical experimental results demonstrate that SS-MUSIC can achieve much better recovery performance than other MUSIC-extended algorithms, as well as some typical greedy algorithms for JSR, in terms of iterations and recovery probability.

  9. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails.
Secondly, the hierarchical classification structure allows easy evaluation and visualization of the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.

  10. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails.
Secondly, the hierarchical classification structure allows easy evaluation and visualization of the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515

  11. AllergenFP: allergenicity prediction by descriptor fingerprints.

    PubMed

    Dimitrov, Ivan; Naneva, Lyudmila; Doytchinova, Irini; Bangov, Ivan

    2014-03-15

    Allergenicity, like antigenicity and immunogenicity, is a property encoded linearly and non-linearly, and therefore alignment-based approaches are not able to identify this property unambiguously. A novel alignment-free descriptor-based fingerprint approach is presented here and applied to identify allergens and non-allergens. The approach was implemented in a four-step algorithm. Initially, the protein sequences are described by amino acid principal properties such as hydrophobicity, size, relative abundance, and helix- and β-strand-forming propensities. Then, the generated strings of different length are converted into vectors of equal length by auto- and cross-covariance (ACC). The vectors were transformed into binary fingerprints and compared in terms of the Tanimoto coefficient. The approach was applied to a set of 2427 known allergens and 2427 non-allergens and correctly identified 88% of them, with a Matthews correlation coefficient of 0.759. The descriptor fingerprint approach presented here is universal. It could be applied to any classification problem in computational biology. The set of E-descriptors is able to capture the main structural and physicochemical properties of the amino acids building the proteins. The ACC transformation overcomes the main problem in alignment-based comparative studies arising from the different length of the aligned protein sequences. The conversion of protein ACC values into binary descriptor fingerprints allows similarity search and classification. The algorithm described in the present study was implemented in a specially designed Web site, named AllergenFP (FP stands for FingerPrint). AllergenFP is written in Python, with a GUI in HTML. It is freely accessible at http://ddg-pharmfac.net/AllergenFP. idoytchinova@pharmfac.net or ivanbangov@shu-bg.net.
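    The Tanimoto comparison of binary fingerprints used in the final step has a simple set formulation; the fingerprint bit sets below are made up for illustration:

```python
# Tanimoto (Jaccard) similarity between binary fingerprints, represented as
# sets of the "on" bit positions: T = c / (a + b - c), where a and b are the
# bit counts of the two fingerprints and c is the number of common bits.

def tanimoto(fp_a, fp_b):
    common = len(fp_a & fp_b)
    denom = len(fp_a) + len(fp_b) - common
    return common / denom if denom else 1.0  # two empty fingerprints match

query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}
print(tanimoto(query, candidate))   # 3 common bits / (4 + 5 - 3) = 0.5
```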

  12. Vehicle classification using mobile sensors.

    DOT National Transportation Integrated Search

    2013-04-01

    In this research, the feasibility of using mobile traffic sensors for binary vehicle classification on arterial roads is investigated. Features (e.g. : speed related, acceleration/deceleration related, etc.) are extracted from vehicle traces (passeng...

  13. An enhancement of binary particle swarm optimization for gene selection in classifying cancer classes

    PubMed Central

    2013-01-01

    Background Gene expression data could likely be a momentous help in the progress of proficient cancer diagnoses and classification platforms. Lately, many researchers analyze gene expression data using diverse computational intelligence methods, for selecting a small subset of informative genes from the data for cancer classification. Many computational methods face difficulties in selecting small subsets due to the small number of samples compared to the huge number of genes (high-dimension), irrelevant genes, and noisy genes. Methods We propose an enhanced binary particle swarm optimization to perform the selection of small subsets of informative genes which is significant for cancer classification. Particle speed, rule, and modified sigmoid function are introduced in this proposed method to increase the probability of the bits in a particle’s position to be zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results The performance of the proposed method proved to be superior to other previous related works, including the conventional version of binary particle swarm optimization (BPSO) in terms of classification accuracy and the number of selected genes. The proposed method also requires lower computational time compared to BPSO. PMID:23617960
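    For reference, the conventional BPSO bit update that the proposed method modifies: a velocity is squashed through a sigmoid and used as the probability that a bit (a selected gene) becomes 1. The enhanced rule in the paper biases this probability toward 0 so that fewer genes are selected; its exact modified function is not reproduced here:

```python
import math
import random

random.seed(0)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def update_bit(velocity):
    """Standard BPSO rule: the bit is 1 with probability sigmoid(velocity)."""
    return 1 if random.random() < sigmoid(velocity) else 0

# Large negative velocities make selection of the corresponding gene unlikely;
# large positive velocities make it near-certain.
velocities = [-4.0, 0.0, 4.0]
position = [update_bit(v) for v in velocities]
print(position)
```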

  14. LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team

    2011-01-01

    The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.

  15. A robust probabilistic collaborative representation based classification for multimodal biometrics

    NASA Astrophysics Data System (ADS)

    Zhang, Jing; Liu, Huanxi; Ding, Derui; Xiao, Jianli

    2018-04-01

    Most traditional biometric recognition systems perform recognition with a single biometric indicator. These systems suffer from noisy data, interclass variations, unacceptable error rates, forged identities, and so on. Because of these inherent problems, enhancing the performance of unimodal biometric systems built on single features is difficult, and multimodal biometrics has therefore been investigated to reduce some of these defects. This paper proposes a new multimodal biometric recognition approach that fuses faces and fingerprints. For more recognizable features, the proposed method extracts block local binary pattern features for all modalities and then combines them into a single framework. For better classification, it employs the robust probabilistic collaborative representation based classifier to recognize individuals. Experimental results indicate that the proposed method improves recognition accuracy compared to unimodal biometrics.

  16. Markerless gating for lung cancer radiotherapy based on machine learning techniques

    NASA Astrophysics Data System (ADS)

    Lin, Tong; Li, Ruijiang; Tang, Xiaoli; Dy, Jennifer G.; Jiang, Steve B.

    2009-03-01

    In lung cancer radiotherapy, radiation to a mobile target can be delivered by respiratory gating, for which we need to know whether the target is inside or outside a predefined gating window at any time point during the treatment. This can be achieved by tracking one or more fiducial markers implanted inside or near the target, either fluoroscopically or electromagnetically. However, the clinical implementation of marker tracking for lung cancer radiotherapy is limited mainly by the risk of pneumothorax. Therefore, gating without implanted fiducial markers is a promising clinical direction. We have developed several template-matching methods for fluoroscopic markerless gating. Recently, we have modeled the gating problem as a binary pattern classification problem, in which principal component analysis (PCA) and a support vector machine (SVM) are combined to perform the classification task. Following the same framework, we investigated different combinations of dimensionality reduction techniques (PCA and four nonlinear manifold learning methods) and two machine learning classification methods (artificial neural networks (ANN) and SVM). Performance was evaluated on ten fluoroscopic image sequences of nine lung cancer patients. Among all combinations of dimensionality reduction techniques and classification methods, PCA combined with either ANN or SVM achieved better performance than the combinations based on nonlinear manifold learning methods. ANN combined with PCA achieves better performance than SVM in terms of classification accuracy and recall rate, although target coverage is similar for the two classification methods. Furthermore, the running time for both ANN and SVM with PCA is within tolerance for real-time applications. Overall, ANN combined with PCA is a better candidate than the other combinations investigated in this work for real-time gated radiotherapy.
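    The PCA-plus-classifier framework described above can be sketched with NumPy alone; here a simple nearest-centroid rule stands in for the SVM/ANN stage, and all names and data are illustrative:

    ```python
    import numpy as np

    def pca_fit(X, k):
        """Return the mean and top-k principal axes of the row-wise samples X."""
        mu = X.mean(axis=0)
        # SVD of the centered data; rows of Vt are the principal directions
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        return mu, Vt[:k]

    def nearest_centroid_predict(Z_train, y_train, Z_test):
        """Binary decision in PCA space (toy stand-in for the SVM/ANN stage)."""
        c0 = Z_train[y_train == 0].mean(axis=0)
        c1 = Z_train[y_train == 1].mean(axis=0)
        d0 = np.linalg.norm(Z_test - c0, axis=1)
        d1 = np.linalg.norm(Z_test - c1, axis=1)
        return (d1 < d0).astype(int)

    # Synthetic "inside/outside gating window" data: two well-separated classes
    np.random.seed(0)
    X = np.vstack([np.random.randn(20, 10), np.random.randn(20, 10) + 5.0])
    y = np.array([0] * 20 + [1] * 20)
    mu, W = pca_fit(X, 2)
    Z = (X - mu) @ W.T          # project onto the top 2 principal components
    pred = nearest_centroid_predict(Z, y, Z)
    ```

    On this toy data the PCA projection preserves the class separation, so the in-sample accuracy is essentially perfect.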

  17. Scattering features for lung cancer detection in fibered confocal fluorescence microscopy images.

    PubMed

    Rakotomamonjy, Alain; Petitjean, Caroline; Salaün, Mathieu; Thiberville, Luc

    2014-06-01

    To assess the feasibility of lung cancer diagnosis using the fibered confocal fluorescence microscopy (FCFM) imaging technique and scattering features for pattern recognition. FCFM is a new medical imaging technique whose interest for diagnosis has yet to be established. This paper addresses the problem of lung cancer detection using FCFM images and, as a first contribution, assesses the feasibility of computer-aided diagnosis through these images. Towards this aim, we have built a pattern recognition scheme that involves a feature extraction stage and a classification stage. The second contribution concerns the features used for discrimination. Indeed, we have employed the so-called scattering transform to extract discriminative features that are robust to small deformations in the images. We have also compared and combined these features with classical yet powerful features such as local binary patterns (LBP) and their variants known as local quinary patterns (LQP). We show that scattering features yielded better recognition performance than classical features such as LBP and their LQP variants for the FCFM image classification problems. Another finding is that LBP-based and scattering-based features provide complementary discriminative information, and in some situations we empirically establish that performance can be improved by jointly using LBP, LQP and scattering features. In this work we analyze the joint capability of FCFM images and scattering features for lung cancer diagnosis. The proposed method achieves a good recognition rate for this diagnosis problem and also performs well when used in conjunction with other features on other classical medical imaging classification problems. Copyright © 2014 Elsevier B.V. All rights reserved.
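    For reference, the classical LBP features that the scattering features are compared against can be computed as below; a minimal 3×3 sketch (bit ordering is one common convention, not necessarily the one used in the paper):

    ```python
    import numpy as np

    def lbp_code(patch):
        """Basic 3x3 local binary pattern: threshold the 8 neighbours against
        the centre pixel and pack the resulting bits, starting at the top-left
        and proceeding clockwise."""
        c = patch[1, 1]
        neigh = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
        return sum(1 << i for i, p in enumerate(neigh) if p >= c)
    ```

    A histogram of these codes over the image (or over image blocks) forms the LBP feature vector.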

  18. GPU-Based Point Cloud Superpositioning for Structural Comparisons of Protein Binding Sites.

    PubMed

    Leinweber, Matthias; Fober, Thomas; Freisleben, Bernd

    2018-01-01

    In this paper, we present a novel approach to solve the labeled point cloud superpositioning problem for performing structural comparisons of protein binding sites. The solution is based on a parallel evolution strategy that operates on large populations and runs on GPU hardware. The proposed evolution strategy reduces the likelihood of getting stuck in a local optimum of the multimodal real-valued optimization problem represented by labeled point cloud superpositioning. The performance of the GPU-based parallel evolution strategy is compared to a previously proposed CPU-based sequential approach for labeled point cloud superpositioning, indicating that the GPU-based parallel evolution strategy leads to qualitatively better results and significantly shorter runtimes, with speed improvements of up to a factor of 1,500 for large populations. Binary classification tests based on the ATP, NADH, and FAD protein subsets of CavBase, a database containing putative binding sites, show average classification rate improvements from about 92 percent (CPU) to 96 percent (GPU). Further experiments indicate that the proposed GPU-based labeled point cloud superpositioning approach can be superior to traditional protein comparison approaches based on sequence alignments.

  19. Designing boosting ensemble of relational fuzzy systems.

    PubMed

    Scherer, Rafał

    2010-10-01

    A method frequently used in classification systems to improve classification accuracy is to combine the outputs of several classifiers. Among various types of classifiers, fuzzy ones are tempting because they use intelligible fuzzy if-then rules. In this paper we build an AdaBoost ensemble of relational neuro-fuzzy classifiers. Relational fuzzy systems bind input and output fuzzy linguistic values by a binary relation; thus, compared to traditional fuzzy systems, the fuzzy rules have additional weights - elements of a fuzzy relation matrix. Thanks to this, the system adjusts better to data during learning. The problem is that such an ensemble contains separate rule bases which cannot be directly merged. As the systems are separate, we cannot treat fuzzy rules coming from different systems as rules from the same (single) system. The problem is addressed here by a novel design of the fuzzy systems constituting the ensemble, resulting in normalization of the individual rule bases during learning. The method described in the paper is tested on several known benchmarks and compared with other machine learning solutions from the literature.

  20. Spectroscopic classification of X-ray sources in the Galactic Bulge Survey

    NASA Astrophysics Data System (ADS)

    Wevers, T.; Torres, M. A. P.; Jonker, P. G.; Nelemans, G.; Heinke, C.; Mata Sánchez, D.; Johnson, C. B.; Gazer, R.; Steeghs, D. T. H.; Maccarone, T. J.; Hynes, R. I.; Casares, J.; Udalski, A.; Wetuski, J.; Britt, C. T.; Kostrzewa-Rutkowska, Z.; Wyrzykowski, Ł.

    2017-10-01

    We present the classification of 26 optical counterparts to X-ray sources discovered in the Galactic Bulge Survey. We use (time-resolved) photometric and spectroscopic observations to classify the X-ray sources based on their multiwavelength properties. We find a variety of source classes, spanning different phases of stellar/binary evolution. We classify CX21 as a quiescent cataclysmic variable (CV) below the period gap, and CX118 as a high accretion rate (nova-like) CV. CXB12 displays excess UV emission, and could contain a compact object with a giant star companion, making it a candidate symbiotic binary or quiescent low-mass X-ray binary (although other scenarios cannot be ruled out). CXB34 is a magnetic CV (polar) that shows photometric evidence for a change in accretion state. The magnetic classification is based on the detection of X-ray pulsations with a period of 81 ± 2 min. CXB42 is identified as a young stellar object, namely a weak-lined T Tauri star exhibiting (to date unexplained) UX Ori-like photometric variability. The optical spectrum of CXB43 contains two (resolved) unidentified double-peaked emission lines. No known scenario, such as an active galactic nucleus or symbiotic binary, can easily explain its characteristics. We additionally classify 20 objects as likely active stars based on optical spectroscopy, their X-ray to optical flux ratios and photometric variability. In four cases we identify the sources as binary stars.

  1. Texture operator for snow particle classification into snowflake and graupel

    NASA Astrophysics Data System (ADS)

    Nurzyńska, Karolina; Kubo, Mamoru; Muramoto, Ken-ichiro

    2012-11-01

    In order to improve the estimation of precipitation, the coefficients of the Z-R relation should be determined for each snow type. It is therefore necessary to identify the type of falling snow, and this research addresses the problem of automatically classifying snow particles into snowflake and graupel (the most common types in the study region). With precipitation events correctly classified, it is believed that the related parameters can be estimated accurately. The automatic classification system presented here describes the images with texture operators. Some are well known from the literature: first-order features, the co-occurrence matrix, the grey-tone difference matrix, the run length matrix, and the local binary pattern; in addition, a novel approach to designing simple local statistic operators is introduced. In this work the following texture operators are defined: the mean histogram, the min-max histogram, and the mean-variance histogram. Moreover, building a feature vector based on the intermediate structures created by many of the mentioned algorithms is also suggested. For classification, the k-nearest neighbour classifier was applied. The results showed that correct classification accuracy above 80% is achievable with most of the techniques. The best result, 86.06%, was achieved for an operator built from the intermediate structure of the co-occurrence matrix calculation. It was also noticed that describing an image with two texture operators does not improve the classification results considerably. In the best case, the correct classification efficiency was 87.89% for a pair of texture operators created from the local binary pattern and the intermediate structure of the grey-tone difference matrix calculation. This also suggests that the information gathered by each texture operator is redundant.
    Therefore, principal component analysis was applied to remove the unnecessary information and additionally reduce the length of the feature vectors. Improvement of the correct classification efficiency to up to 100% is possible for the following methods: the min-max histogram, the texture operator built from the intermediate structure of the co-occurrence matrix calculation, the texture operator built from the intermediate structure of the grey-tone difference matrix creation, and the texture operator based on a histogram, when the feature vector stores 99% of the initial information.

  2. Improved opponent color local binary patterns: an effective local image descriptor for color texture classification

    NASA Astrophysics Data System (ADS)

    Bianconi, Francesco; Bello-Cerezo, Raquel; Napoletano, Paolo

    2018-01-01

    Texture classification plays a major role in many computer vision applications. Local binary pattern (LBP) encoding schemes have largely been proven to be very effective for this task. Improved LBP (ILBP) are conceptually simple, easy to implement, and highly effective LBP variants based on a point-to-average thresholding scheme instead of a point-to-point one. We propose the use of this encoding scheme for extracting intra- and interchannel features for color texture classification. We experimentally evaluated the resulting improved opponent color LBP, alone and concatenated with the ILBP of the local color contrast map, on a set of image classification tasks over 9 datasets of generic color textures and 11 datasets of biomedical textures. The proposed approach outperformed other grayscale and color LBP variants in nearly all the datasets considered and proved competitive even against image features from latest-generation convolutional neural networks, particularly for the classification of biomedical images.
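    The point-to-average thresholding that distinguishes ILBP from classical LBP can be sketched in a few lines; a simplified illustration of the idea, not the authors' implementation:

    ```python
    import numpy as np

    def ilbp_code(patch):
        """ILBP-style encoding of a 3x3 patch: threshold every pixel
        (including the centre) against the patch mean (point-to-average),
        then pack the 9 bits into one integer."""
        bits = (patch.flatten() >= patch.mean()).astype(int)
        return int(sum(b << i for i, b in enumerate(bits)))
    ```

    Because at least one pixel always equals or exceeds the mean, the all-zero code never occurs, so the code space differs slightly from classical 8-bit LBP.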

  3. Performance of fusion algorithms for computer-aided detection and classification of mines in very shallow water obtained from testing in navy Fleet Battle Exercise-Hotel 2000

    NASA Astrophysics Data System (ADS)

    Ciany, Charles M.; Zurawski, William; Kerfoot, Ian

    2001-10-01

    The performance of Computer Aided Detection/Computer Aided Classification (CAD/CAC) fusion algorithms on side-scan sonar images was evaluated using data taken at the Navy's Fleet Battle Exercise-Hotel, held in Panama City, Florida, in August 2000. A 2-of-3 binary fusion algorithm is shown to provide robust performance. The algorithm accepts the classification decisions and associated contact locations from three different CAD/CAC algorithms, clusters the contacts based on Euclidean distance, and then declares a valid target when a clustered contact is declared by at least 2 of the 3 individual algorithms. This simple binary fusion provided a 96 percent probability of correct classification at a false alarm rate of 0.14 false alarms per image per side. This performance represented a 3.8:1 reduction in false alarms over the best-performing single CAD/CAC algorithm, with no loss in probability of correct classification.
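    The 2-of-3 fusion rule is simple enough to sketch directly: cluster contacts by Euclidean distance, then keep only clusters reported by at least two distinct algorithms. The greedy clustering and all names below are illustrative, not the fielded implementation:

    ```python
    import math

    def fuse_2_of_3(contacts, radius):
        """contacts: list of (x, y, algorithm_id). Greedily cluster contacts
        that fall within `radius` of each other; a cluster is a declared
        target when it contains contacts from at least 2 distinct algorithms."""
        clusters = []                      # each cluster is a list of contacts
        for c in contacts:
            for cl in clusters:
                if any(math.dist(c[:2], d[:2]) <= radius for d in cl):
                    cl.append(c)
                    break
            else:
                clusters.append([c])
        return [cl for cl in clusters if len({d[2] for d in cl}) >= 2]
    ```

    Two nearby contacts from algorithms A and B form a declared target; an isolated contact from a single algorithm is rejected as a probable false alarm.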

  4. Method for protein structure alignment

    DOEpatents

    Blankenbecler, Richard; Ohlsson, Mattias; Peterson, Carsten; Ringner, Markus

    2005-02-22

    This invention provides a method for protein structure alignment. More particularly, the present invention provides a method for identification, classification and prediction of protein structures. The present invention involves two key ingredients. First, an energy or cost function formulation of the problem simultaneously in terms of binary (Potts) assignment variables and real-valued atomic coordinates. Second, a minimization of the energy or cost function by an iterative method, where in each iteration (1) a mean field method is employed for the assignment variables and (2) exact rotation and/or translation of atomic coordinates is performed, weighted with the corresponding assignment variables.

  5. Multiscale Rotation-Invariant Convolutional Neural Networks for Lung Texture Classification.

    PubMed

    Wang, Qiangchang; Zheng, Yuanjie; Yang, Gongping; Jin, Weidong; Chen, Xinjian; Yin, Yilong

    2018-01-01

    We propose a new multiscale rotation-invariant convolutional neural network (MRCNN) model for classifying various lung tissue types on high-resolution computed tomography. MRCNN employs Gabor-local binary pattern that introduces a good property in image analysis-invariance to image scales and rotations. In addition, we offer an approach to deal with the problems caused by imbalanced number of samples between different classes in most of the existing works, accomplished by changing the overlapping size between the adjacent patches. Experimental results on a public interstitial lung disease database show a superior performance of the proposed method to state of the art.

  6. A multi-label, semi-supervised classification approach applied to personality prediction in social media.

    PubMed

    Lima, Ana Carolina E S; de Castro, Leandro Nunes

    2014-10-01

    Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media have attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user's behaviour within social media. Traditional personality prediction relies on users' profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature in that it works with groups of texts, instead of single texts, and does not take users' profiles into account. Also, the proposed approach extracts meta-attributes from texts and does not work directly with the content of the messages. The set of possible personality traits is taken from the Big Five model and allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, due to the difficulty in annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms, namely a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict the personality of tweets taken from three datasets available in the literature and achieved approximately 83% prediction accuracy, with some personality traits presenting better individual classification rates than others. Copyright © 2014 Elsevier Ltd. All rights reserved.
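    The problem-transformation step above (multi-label task → five independent binary tasks, often called binary relevance) can be sketched generically; `majority_fit` below is a toy stand-in for the Naïve Bayes/SVM/MLP learners actually used, and all names are illustrative:

    ```python
    def binary_relevance_fit(X, Y, fit_binary):
        """Train one independent binary classifier per label column of Y
        (for the Big Five model, Y would have five columns)."""
        return [fit_binary(X, [row[j] for row in Y]) for j in range(len(Y[0]))]

    def binary_relevance_predict(models, x):
        """Predict each label with its own classifier."""
        return [m(x) for m in models]

    def majority_fit(X, y):
        """Toy binary learner: always predict the majority class of y."""
        label = 1 if 2 * sum(y) >= len(y) else 0
        return lambda x: label
    ```

    Any binary learner that returns a `predict(x)` callable can be dropped in for `majority_fit` without changing the transformation itself.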

  7. Classifying Black Hole States with Machine Learning

    NASA Astrophysics Data System (ADS)

    Huppenkothen, Daniela

    2018-01-01

    Galactic black hole binaries are known to go through different states with apparent signatures in both X-ray light curves and spectra, leading to important implications for accretion physics as well as our knowledge of General Relativity. Existing frameworks of classification are usually based on human interpretation of low-dimensional representations of the data, and generally only apply to fairly small data sets. Machine learning, in contrast, allows for rapid classification of large, high-dimensional data sets. In this talk, I will report on advances made in classification of states observed in Black Hole X-ray Binaries, focusing on the two sources GRS 1915+105 and Cygnus X-1, and show both the successes and limitations of using machine learning to derive physical constraints on these systems.

  8. The Sequential Probability Ratio Test and Binary Item Response Models

    ERIC Educational Resources Information Center

    Nydick, Steven W.

    2014-01-01

    The sequential probability ratio test (SPRT) is a common method for terminating item response theory (IRT)-based adaptive classification tests. To decide whether a classification test should stop, the SPRT compares a simple log-likelihood ratio, based on the classification bound separating two categories, to prespecified critical values. As has…
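    The SPRT stopping rule compares the running log-likelihood ratio against two critical values derived from nominal error rates α and β; a generic sketch using Wald's textbook thresholds, not tied to any specific IRT model:

    ```python
    import math

    def sprt_decision(log_lr, alpha=0.05, beta=0.05):
        """Sequential probability ratio test: compare the running
        log-likelihood ratio to critical values log((1-beta)/alpha)
        and log(beta/(1-alpha))."""
        upper = math.log((1 - beta) / alpha)
        lower = math.log(beta / (1 - alpha))
        if log_lr >= upper:
            return "accept_upper"   # classify above the bound
        if log_lr <= lower:
            return "accept_lower"   # classify below the bound
        return "continue"           # administer another item
    ```

    With α = β = 0.05 the critical values are ±log 19 ≈ ±2.94, so the test keeps presenting items until the evidence crosses either bound.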

  9. Modified Mahalanobis Taguchi System for Imbalance Data Classification

    PubMed Central

    2017-01-01

    The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification. In this paper, a nonlinear optimization model named the Modified Mahalanobis Taguchi System (MMTS) is formulated based on minimizing the distance between the MTS Receiver Operating Characteristic (ROC) curve and the theoretical optimal point. To validate the classification efficacy of MMTS, it has been benchmarked against Support Vector Machines (SVMs), Naive Bayes (NB), Probabilistic Mahalanobis Taguchi Systems (PTM), the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes algorithms. MMTS outperforms the benchmarked algorithms, especially when the imbalance ratio is greater than 400. A real-life case study from the manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance with the Mahalanobis Genetic Algorithm (MGA). PMID:28811820
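    The threshold criterion described above, minimizing the distance from a point on the ROC curve to the ideal corner (FPR, TPR) = (0, 1), can be sketched as a brute-force search over candidate cut-offs; an illustration of the criterion, not the paper's nonlinear optimization model:

    ```python
    import math

    def best_threshold(scores, labels):
        """Pick the cut-off whose ROC point (FPR, TPR) lies closest to (0, 1).
        scores: classifier scores; labels: 0/1 ground truth."""
        pos = sum(labels)
        neg = len(labels) - pos
        best, best_d = None, float("inf")
        for t in sorted(set(scores)):
            tpr = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t) / pos
            fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / neg
            d = math.hypot(fpr, 1 - tpr)    # Euclidean distance to (0, 1)
            if d < best_d:
                best, best_d = t, d
        return best
    ```

    A cut-off that separates the classes perfectly sits exactly at the corner (distance 0) and is always selected.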

  10. Gaia eclipsing binary and multiple systems. Supervised classification and self-organizing maps

    NASA Astrophysics Data System (ADS)

    Süveges, M.; Barblan, F.; Lecoeur-Taïbi, I.; Prša, A.; Holl, B.; Eyer, L.; Kochoska, A.; Mowlavi, N.; Rimoldini, L.

    2017-07-01

    Context. Large surveys producing tera- and petabyte-scale databases require machine-learning and knowledge discovery methods to deal with the overwhelming quantity of data and the difficulties of extracting concise, meaningful information with reliable assessment of its uncertainty. This study investigates the potential of a few machine-learning methods for the automated analysis of eclipsing binaries in the data of such surveys. Aims: We aim to aid the extraction of samples of eclipsing binaries from such databases and to provide basic information about the objects. We intend to estimate class labels according to two different, well-known classification systems, one based on the light curve morphology (EA/EB/EW classes) and the other based on the physical characteristics of the binary system (system morphology classes; detached through overcontact systems). Furthermore, we explore low-dimensional surfaces along which the light curves of eclipsing binaries are concentrated, and consider their use in the characterization of the binary systems and in the exploration of biases of the full unknown Gaia data with respect to the training sets. Methods: We have explored the performance of principal component analysis (PCA), linear discriminant analysis (LDA), Random Forest classification and self-organizing maps (SOM) for the above aims. We pre-processed the photometric time series by combining a double Gaussian profile fit and a constrained smoothing spline, in order to de-noise and interpolate the observed light curves. We achieved further denoising, and selected the most important variability elements from the light curves using PCA. Supervised classification was performed using Random Forest and LDA based on the PC decomposition, while SOM gives a continuous 2-dimensional manifold of the light curves arranged by a few important features. 
We estimated the uncertainty of the supervised methods due to the specific finite training set using ensembles of models constructed on randomized training sets. Results: We obtain excellent results (about 5% global error rate) with classification into light curve morphology classes on the Hipparcos data. The classification into system morphology classes using the Catalog and Atlas of Eclipsing binaries (CALEB) has a higher error rate (about 10.5%), most importantly due to the (sometimes strong) similarity of the photometric light curves originating from physically different systems. When trained on CALEB and then applied to Kepler-detected eclipsing binaries subsampled according to Gaia observing times, LDA and SOM provide tractable, easy-to-visualize subspaces of the full (functional) space of light curves that summarize the most important phenomenological elements of the individual light curves. The sequence of light curves ordered by their first linear discriminant coefficient is compared to results obtained using local linear embedding. The SOM method proves able to find a 2-dimensional embedded surface in the space of the light curves which separates the system morphology classes in its different regions, and also identifies a few other phenomena, such as the asymmetry of the light curves due to spots, eccentric systems, and systems with a single eclipse. Furthermore, when data from other surveys are projected to the same SOM surface, the resulting map yields a good overview of the general biases and distortions due to differences in time sampling or population.

  11. Electrode channel selection based on backtracking search optimization in motor imagery brain-computer interfaces.

    PubMed

    Dai, Shengfa; Wei, Qingguo

    2017-01-01

    The common spatial pattern algorithm is widely used to estimate spatial filters in motor imagery based brain-computer interfaces. However, using a large number of channels makes common spatial pattern prone to over-fitting and the classification of electroencephalographic signals time-consuming. To overcome these problems, it is necessary to choose an optimal subset of all channels to save computational time and improve classification accuracy. In this paper, a novel method named the backtracking search optimization algorithm is proposed to automatically select the optimal channel set for common spatial pattern. Each individual in the population is an N-dimensional binary vector, with each component representing one channel. A population of binary codes is generated randomly at the beginning, and channels are then selected according to the evolution of these codes. The number and positions of 1's in a code denote the number and positions of the chosen channels. The objective function of the backtracking search optimization algorithm is defined as a combination of the classification error rate and the relative number of channels. Experimental results suggest that higher classification accuracy can be achieved with far fewer channels compared to standard common spatial pattern with all channels.
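    The objective function, a combination of error rate and the relative number of selected channels, might look like the sketch below; the weight `w` and the exact penalty form are assumptions, since the paper defines its own combination:

    ```python
    def channel_objective(code, error_rate, w=0.9):
        """Fitness of a candidate channel subset: weighted sum of the
        classification error rate and the fraction of channels kept.
        code: 0/1 list, one entry per EEG channel (1 = channel selected).
        w: hypothetical trade-off weight between error and channel count."""
        n_selected = sum(code)
        if n_selected == 0:
            return float("inf")      # an empty subset is invalid
        return w * error_rate + (1 - w) * n_selected / len(code)
    ```

    Under this fitness, a subset achieving the same error rate with fewer channels always scores better, which is what drives the search toward compact channel sets.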

  12. Real-time, resource-constrained object classification on a micro-air vehicle

    NASA Astrophysics Data System (ADS)

    Buck, Louis; Ray, Laura

    2013-12-01

    A real-time embedded object classification algorithm is developed through the novel combination of binary feature descriptors, a bag-of-visual-words object model and the cortico-striatal loop (CSL) learning algorithm. The BRIEF, ORB and FREAK binary descriptors are tested and compared to SIFT descriptors with regard to their respective classification accuracies, execution times, and memory requirements when used with CSL on a 12.6 g ARM Cortex embedded processor running at 800 MHz. Additionally, the effect of χ² feature mapping and opponent-color representations used with these descriptors is examined. These tests are performed on four data sets of varying sizes and difficulty, and the BRIEF descriptor is found to yield the best combination of speed and classification accuracy. Its use with CSL achieves accuracies between 67% and 95% of those achieved with SIFT descriptors and allows for the embedded classification of a 128x192 pixel image in 0.15 seconds, 60 times faster than classification with SIFT. χ² mapping is found to provide substantial improvements in classification accuracy for all of the descriptors at little cost, while opponent-color descriptors offer accuracy improvements only on colorful datasets.
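    With binary descriptors, assigning a feature to its bag-of-visual-words centre reduces to Hamming-distance matching (XOR plus popcount), which is part of what makes BRIEF-style descriptors so fast on embedded hardware. A minimal sketch with illustrative names:

    ```python
    def nearest_visual_word(descriptor, vocabulary):
        """Assign a binary descriptor (an integer bit-string, as produced by
        BRIEF/ORB/FREAK) to the closest visual-word centre by Hamming distance."""
        def hamming(a, b):
            return bin(a ^ b).count("1")    # XOR then popcount
        return min(range(len(vocabulary)),
                   key=lambda i: hamming(descriptor, vocabulary[i]))
    ```

    Counting the nearest-word index for every descriptor in an image yields the bag-of-visual-words histogram fed to the classifier.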

  13. Solid phase excitation-emission fluorescence method for the classification of complex substances: Cortex Phellodendri and other traditional Chinese medicines as examples.

    PubMed

    Gu, Yao; Ni, Yongnian; Kokot, Serge

    2012-09-13

    A novel, simple and direct fluorescence method for analysis of complex substances and their potential substitutes has been researched and developed. Measurements involved excitation and emission (EEM) fluorescence spectra of powdered, complex, medicinal herbs, Cortex Phellodendri Chinensis (CPC) and the similar Cortex Phellodendri Amurensis (CPA); these substances were compared and discriminated from each other and the potentially adulterated samples (Caulis mahoniae (CM) and David poplar bark (DPB)). Different chemometrics methods were applied for resolution of the complex spectra, and the excitation spectra were found to be the most informative; only the rank-ordering PROMETHEE method was able to classify the samples with single ingredients (CPA, CPC, CM) or those with binary mixtures (CPA/CPC, CPA/CM, CPC/CM). Interestingly, it was essential to use the geometrical analysis for interactive aid (GAIA) display for a full understanding of the classification results. However, these two methods, like the other chemometrics models, were unable to classify composite spectral matrices consisting of data from samples of single ingredients and binary mixtures; this suggested that the excitation spectra of the different samples were very similar. However, the method is useful for classification of single-ingredient samples and, separately, their binary mixtures; it may also be applied for similar classification work with other complex substances.

  14. Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor

    PubMed Central

    Alamedine, D.; Khalil, M.; Marque, C.

    2013-01-01

    Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The two last methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification. PMID:24454536

  15. Multiclass Classification by Adaptive Network of Dendritic Neurons with Binary Synapses Using Structural Plasticity

    PubMed Central

    Hussain, Shaista; Basu, Arindam

    2016-01-01

    The development of power-efficient neuromorphic devices presents the challenge of designing spike pattern classification algorithms which can be implemented on low-precision hardware and can also achieve state-of-the-art performance. In our pursuit of meeting this challenge, we present a pattern classification model which uses a sparse connection matrix and exploits the mechanism of nonlinear dendritic processing to achieve high classification accuracy. A rate-based structural learning rule for multiclass classification is proposed which modifies a connectivity matrix of binary synaptic connections by choosing the best “k” out of “d” inputs to make connections on every dendritic branch (k < < d). Because learning only modifies connectivity, the model is well suited for implementation in neuromorphic systems using address-event representation (AER). We develop an ensemble method which combines several dendritic classifiers to achieve enhanced generalization over individual classifiers. We have two major findings: (1) Our results demonstrate that an ensemble created with classifiers comprising a moderate number of dendrites performs better than both ensembles of perceptrons and of complex dendritic trees. (2) To determine the moderate number of dendrites required for a specific classification problem, a two-step solution is proposed. First, an adaptive approach is proposed which scales the relative size of the dendritic trees of neurons for each class. It works by progressively adding dendrites with a fixed number of synapses to the network, thereby allocating synaptic resources according to the complexity of the given problem. As a second step, theoretical capacity calculations are used to convert each neuronal dendritic tree to its optimal topology, where dendrites of each class are assigned different numbers of synapses.
    The performance of the model is evaluated on classification of handwritten digits from the benchmark MNIST dataset and compared with other spike classifiers. We show that our system can achieve classification accuracy within 1-2% of other reported spike-based classifiers while using far fewer synaptic resources (only 7% of those used by other methods). Further, an ensemble classifier created with adaptively learned sizes can attain an accuracy of 96.4%, which is on par with the best reported performance of spike-based classifiers. Moreover, the proposed method achieves this using about 20% of the synapses used by other spike algorithms. We also present results of applying our algorithm to classify the MNIST-DVS dataset collected from a real spike-based image sensor and show results comparable to the best reported ones (88.1% accuracy). For VLSI implementations, we show that the reduced synaptic memory can save up to 4X in area compared to conventional crossbar topologies. Finally, we also present a biologically realistic spike-based version for calculating the correlations required by the structural learning rule and demonstrate the correspondence between the rate-based and spike-based methods of learning. PMID:27065782
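
    The best-"k"-of-"d" connection rule at the heart of the structural learning can be sketched generically (the per-input fitness scores, e.g. the correlations computed by the learning rule, are assumed given; this is not the authors' implementation):

```python
def choose_connections(scores, k):
    """Structural-plasticity step: keep the k highest-scoring afferents
    on a branch (scores = fitness of each of the d candidate inputs).
    Returns a binary connectivity vector: 1 = synapse formed."""
    d = len(scores)
    chosen = set(sorted(range(d), key=lambda i: scores[i], reverse=True)[:k])
    return [1 if i in chosen else 0 for i in range(d)]
```

Because learning only flips entries of this binary vector, storage and updates map naturally onto address-event hardware.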

  16. From wide to close binaries?

    NASA Astrophysics Data System (ADS)

    Eggleton, Peter P.

    The mechanisms by which the periods of wide binaries (masses of 8 solar masses or less and periods of 10-3000 d) are lengthened or shortened are discussed, synthesizing the results of recent theoretical investigations. A system of nomenclature involving seven evolutionary states, three geometrical states, and 10 types of orbital-period evolution is developed and applied; classifications of 71 binaries are presented in a table along with the basic observational parameters. Evolutionary processes in wide binaries (single-star-type winds, magnetic braking with tidal friction, and companion-reinforced attrition), late case B systems, low-mass X-ray binaries, and triple systems are examined in detail, and possible evolutionary paths are shown in diagrams.

  17. Assessment of various supervised learning algorithms using different performance metrics

    NASA Astrophysics Data System (ADS)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    This work compares the performance of supervised machine learning algorithms on a binary classification task. The algorithms considered are Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbour (KNN), Naïve Bayes (NB) and Random Forest (RF). The paper focuses on comparing the performance of these algorithms on one binary classification task by analysing metrics such as accuracy, F-measure, G-measure, precision, misclassification rate, false positive rate, true positive rate, specificity and prevalence.
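
    All of these metrics derive from a single confusion matrix; a generic sketch (not tied to the paper's datasets, and note that G-measure definitions vary, the geometric mean of precision and recall is assumed here):

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn)                 # recall / sensitivity
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification_rate": 1 - accuracy,
        "precision": precision,
        "tpr": tpr,
        "fpr": fp / (fp + tn),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / total,
        "f_measure": 2 * precision * tpr / (precision + tpr),
        "g_measure": math.sqrt(precision * tpr),  # one common definition
    }
```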

  18. Robust feature extraction for rapid classification of damage in composites

    NASA Astrophysics Data System (ADS)

    Coelho, Clyde K.; Reynolds, Whitney; Chattopadhyay, Aditi

    2009-03-01

    The ability to detect anomalies in signals from sensors is imperative for structural health monitoring (SHM) applications. Many of the candidate algorithms for these applications either require a lot of training examples or are very computationally inefficient for large sample sizes. The damage detection framework presented in this paper uses a combination of Linear Discriminant Analysis (LDA) along with Support Vector Machines (SVM) to obtain a computationally efficient classification scheme for rapid damage state determination. LDA was used for feature extraction of damage signals from piezoelectric sensors on a composite plate and these features were used to train the SVM algorithm in parts, reducing the computational intensity associated with the quadratic optimization problem that needs to be solved during training. SVM classifiers were organized into a binary tree structure to speed up classification, which also reduces the total training time required. This framework was validated on composite plates that were impacted at various locations. The results show that the algorithm was able to correctly predict the different impact damage cases in composite laminates using less than 21 percent of the total available training data after data reduction.

  19. Texture classification using non-Euclidean Minkowski dilation

    NASA Astrophysics Data System (ADS)

    Florindo, Joao B.; Bruno, Odemir M.

    2018-03-01

    This study presents a new method to extract meaningful descriptors of gray-scale texture images using Minkowski morphological dilation based on the Lp metric. The proposed approach is motivated by the success previously achieved by Bouligand-Minkowski fractal descriptors on texture classification. In essence, such descriptors are directly derived from the morphological dilation of a three-dimensional representation of the gray-level pixels using the classical Euclidean metric. In this way, we generalize the dilation to different values of p in the Lp metric (Euclidean being the particular case p = 2) and obtain the descriptors from the cumulative distribution of the distance transform computed over the texture image. The proposed method is compared to other state-of-the-art approaches (such as local binary patterns and textons) on the classification of two benchmark data sets (UIUC and Outex). The proposed descriptors outperformed all the other approaches in terms of the rate of images correctly classified. These results suggest the potential of the descriptors in this type of task, with a wide range of possible applications to real-world problems.
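
    The generalized metric at the core of the method is the Lp (Minkowski) distance; a rough sketch follows, with a naive brute-force stand-in for the dilation volume over a discrete grid (not the paper's distance-transform implementation):

```python
def minkowski_distance(u, v, p=2.0):
    """Lp (Minkowski) distance between points u and v; p = 2 recovers the
    Euclidean metric of the classical Bouligand-Minkowski descriptors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def dilation_volume(points, radius, grid, p=2.0):
    """Count grid cells within Lp-distance `radius` of any surface point:
    a discrete stand-in for the volume of the morphological dilation,
    from which descriptors are read off as the radius grows."""
    return sum(
        1 for g in grid
        if any(minkowski_distance(g, s, p) <= radius for s in points)
    )
```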

  20. Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning

    PubMed Central

    Gönen, Mehmet

    2014-01-01

    Coupled training of dimensionality reduction and classification has previously been proposed to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find the intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance in terms of Hamming loss, average AUC, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and in semi-supervised learning tasks. PMID:24532862

  1. Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning.

    PubMed

    Gönen, Mehmet

    2014-03-01

    Coupled training of dimensionality reduction and classification has previously been proposed to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find the intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance in terms of Hamming loss, average AUC, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and in semi-supervised learning tasks.

  2. Artificial intelligence techniques for embryo and oocyte classification.

    PubMed

    Manna, Claudio; Nanni, Loris; Lumini, Alessandra; Pappalardo, Sebastiana

    2013-01-01

    One of the most relevant aspects in assisted reproduction technology is the possibility of characterizing and identifying the most viable oocytes or embryos. In most cases, embryologists select them by visual examination and their evaluation is totally subjective. Recently, due to the rapid growth in the capacity to extract texture descriptors from a given image, a growing interest has been shown in the use of artificial intelligence methods for embryo or oocyte scoring/selection in IVF programmes. This work concentrates on the possible prediction of the quality of embryos and oocytes in order to improve the performance of assisted reproduction technology, starting from their images. The artificial intelligence system proposed in this work is based on a set of Levenberg-Marquardt neural networks trained using textural descriptors (the local binary patterns). The proposed system was tested on two data sets of 269 oocytes and 269 corresponding embryos from 104 women and compared with other machine learning methods already proposed in the past for similar classification problems. Although the results are only preliminary, they show an interesting classification performance. This technique may be of particular interest in those countries where legislation restricts embryo selection. Copyright © 2012 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.

  3. The search for an elusive cutoff remains: Problems of binary classification of heavy drinking as an endpoint for alcohol clinical trials.

    PubMed

    Pearson, Matthew R; Bravo, Adrian J; Kirouac, Megan; Witkiewitz, Katie

    2017-02-01

    To examine whether a clinically meaningful alcohol consumption cutoff can be created for clinical samples, we used receiver operating characteristic (ROC) curves to derive gender-specific consumption cutoffs that maximized sensitivity and specificity in the prediction of a wide range of negative consequences from drinking. We conducted secondary data analyses using data from two large clinical trials targeting alcohol use disorders: Project MATCH (n=1726) and COMBINE (n=1383). In both studies, we found that the ideal cutoff for men and women that maximized sensitivity/specificity varied substantially both across different alcohol consumption variables and alcohol consequence outcomes. Further, the levels of sensitivity/specificity were poor across all consequences. These results fail to provide support for a clinically meaningful alcohol consumption cutoff and suggest that binary classification of levels of alcohol consumption is a poor proxy for maximizing sensitivity/specificity in the prediction of negative consequences from drinking. Future research examining consumption-consequence associations should take advantage of continuous measures of alcohol consumption and alternative approaches for assessing the link between levels of consumption and consequences (e.g., ecological momentary assessment). Clinical researchers should consider focusing more directly on the consequences they aim to reduce instead of relying on consumption as a proxy for more clinically meaningful outcomes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
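
    The criterion described, jointly maximizing sensitivity and specificity on an ROC curve, is equivalent to maximizing Youden's J statistic; a minimal sketch with hypothetical scores and outcome labels (not the study's data):

```python
def best_cutoff(scores, labels):
    """Pick the threshold maximizing sensitivity + specificity - 1
    (Youden's J), scanning each observed score as a candidate cutoff.
    labels: 1 = consequence present, 0 = absent."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos + (neg - fp) / neg - 1   # sensitivity + specificity - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

The study's negative finding is, in effect, that no single cutoff chosen this way generalizes across consumption variables and consequence outcomes.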

  4. The Search for an Elusive Cutoff Remains: Problems of Binary Classification of Heavy Drinking as an Endpoint for Alcohol Clinical Trials

    PubMed Central

    Pearson, Matthew R.; Bravo, Adrian J.; Kirouac, Megan; Witkiewitz, Katie

    2017-01-01

    Background To examine whether a clinically meaningful alcohol consumption cutoff can be created for clinical samples, we used receiver operating characteristic (ROC) curves to derive gender-specific consumption cutoffs that maximized sensitivity and specificity in the prediction of a wide range of negative consequences from drinking. Methods We conducted secondary data analyses using data from two large clinical trials targeting alcohol use disorders: Project MATCH (n = 1,726) and COMBINE (n = 1,383). Results In both studies, we found that the ideal cutoff for men and women that maximized sensitivity/specificity varied substantially both across different alcohol consumption variables and alcohol consequence outcomes. Further, the levels of sensitivity/specificity were poor across all consequences. Conclusions These results fail to provide support for a clinically meaningful alcohol consumption cutoff and suggest that binary classification of levels of alcohol consumption is a poor proxy for maximizing sensitivity/specificity in the prediction of negative consequences from drinking. Future research examining consumption-consequence associations should take advantage of continuous measures of alcohol consumption and alternative approaches for assessing the link between levels of consumption and consequences (e.g., ecological momentary assessment). Clinical researchers should consider focusing more directly on the consequences they aim to reduce instead of relying on consumption as a proxy for more clinically meaningful outcomes. PMID:28038361

  5. Dynamic Inertia Weight Binary Bat Algorithm with Neighborhood Search

    PubMed Central

    2017-01-01

    Binary bat algorithm (BBA) is a binary version of the bat algorithm (BA). It has been proven that BBA is competitive compared to other binary heuristic algorithms. Since the velocity update process of the algorithm is consistent with BA, in some cases the algorithm also faces the premature convergence problem. This paper proposes an improved binary bat algorithm (IBBA) to solve this problem. To evaluate the performance of IBBA, standard benchmark functions and zero-one knapsack problems have been employed. The numerical results obtained in the benchmark function experiments show that the proposed approach greatly outperforms the original BBA and binary particle swarm optimization (BPSO). Comparison with several other heuristic algorithms on zero-one knapsack problems also verifies that the proposed algorithm is better able to avoid local minima. PMID:28634487

  6. Dynamic Inertia Weight Binary Bat Algorithm with Neighborhood Search.

    PubMed

    Huang, Xingwang; Zeng, Xuewen; Han, Rui

    2017-01-01

    Binary bat algorithm (BBA) is a binary version of the bat algorithm (BA). It has been proven that BBA is competitive compared to other binary heuristic algorithms. Since the velocity update process of the algorithm is consistent with BA, in some cases the algorithm also faces the premature convergence problem. This paper proposes an improved binary bat algorithm (IBBA) to solve this problem. To evaluate the performance of IBBA, standard benchmark functions and zero-one knapsack problems have been employed. The numerical results obtained in the benchmark function experiments show that the proposed approach greatly outperforms the original BBA and binary particle swarm optimization (BPSO). Comparison with several other heuristic algorithms on zero-one knapsack problems also verifies that the proposed algorithm is better able to avoid local minima.
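
    Binary variants such as BBA and BPSO map a continuous velocity to a bit update through a transfer function; the sketch below shows the two common families (the IBBA's dynamic inertia weight and neighborhood search are not reproduced here, this is a generic illustration):

```python
import math
import random

def s_transfer(v):
    """S-shaped transfer: probability that the bit is set to 1
    (the classic BPSO/BBA mapping)."""
    return 1.0 / (1.0 + math.exp(-v))

def v_transfer(v):
    """V-shaped transfer: probability that the bit flips; it tends to
    preserve good bits and so delays premature convergence."""
    return abs(math.tanh(v))

def update_bit(bit, velocity, rng=random.random):
    """One binary position update using the V-shaped rule."""
    return 1 - bit if rng() < v_transfer(velocity) else bit
```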

  7. Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data

    PubMed Central

    Zhao, Xin; Cheung, Leo Wang-Kit

    2007-01-01

    Background Designing appropriate machine learning methods for identifying genes that have significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at the genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods also tend to introduce false-positive significant features. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large, which leads to numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods suffer from two critical problems, model selection and model parameter tuning, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potential for achieving this goal. Results A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences.
    Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound not only in the case of a linear Bayesian classifier but also in the case of a very non-linear Bayesian classifier. This sheds light on its broader usability for microarray data analysis problems, especially those for which linear methods work poorly. The KIGP was also applied to four published microarray datasets, and the results showed that the KIGP performed better than, or at least as well as, the referenced state-of-the-art methods in all of these cases. Conclusion Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently. PMID:17328811

  8. Multiclass fMRI data decoding and visualization using supervised self-organizing maps.

    PubMed

    Hausfeld, Lars; Valente, Giancarlo; Formisano, Elia

    2014-08-01

    When multivariate pattern decoding is applied to fMRI studies entailing more than two experimental conditions, the most common approach is to transform the multiclass classification problem into a series of binary problems. Furthermore, for decoding analyses, classification accuracy is often the only outcome reported, although the topology of activation patterns in the high-dimensional feature space may provide additional insights into underlying brain representations. Here we propose to decode and visualize voxel patterns of fMRI datasets consisting of multiple conditions with a supervised variant of self-organizing maps (SSOMs). Using simulations and real fMRI data, we evaluated the performance of our SSOM-based approach. Specifically, the analysis of simulated fMRI data with varying signal-to-noise and contrast-to-noise ratio suggested that SSOMs perform better than a k-nearest-neighbor classifier for medium and large numbers of features (i.e. 250 to 1000 or more voxels) and similar to support vector machines (SVMs) for small and medium numbers of features (i.e. 100 to 600 voxels). However, for a larger number of features (>800 voxels), SSOMs performed worse than SVMs. When applied to a challenging 3-class fMRI classification problem with datasets collected to examine the neural representation of three human voices at the individual speaker level, the SSOM-based algorithm was able to decode speaker identity from auditory cortical activation patterns. Classification performances were similar between SSOMs and other decoding algorithms; however, the ability to visualize decoding models and the underlying data topology of SSOMs promotes a more comprehensive understanding of classification outcomes. We further illustrated this visualization ability of SSOMs with a re-analysis of a dataset examining the representation of visual categories in the ventral visual cortex (Haxby et al., 2001). 
This analysis showed that SSOMs could retrieve and visualize topography and neighborhood relations of the brain representation of eight visual categories. We conclude that SSOMs are particularly suited for decoding datasets consisting of more than two classes and are optimally combined with approaches that reduce the number of voxels used for classification (e.g. region-of-interest or searchlight approaches). Copyright © 2014. Published by Elsevier Inc.
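
    The core of a self-organizing map, which the supervised SSOM variant builds on, is a best-matching-unit search followed by a neighborhood-weighted pull toward the input. A minimal unsupervised 1-D sketch (the supervised label term of the SSOM is omitted):

```python
import math

def bmu(weights, x):
    """Index of the best-matching unit (closest weight vector to x)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)))

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One SOM update: move every unit toward x, weighted by a Gaussian
    neighborhood around the best-matching unit (1-D map for brevity).
    Mutates `weights` in place and returns the BMU index."""
    b = bmu(weights, x)
    for i, w in enumerate(weights):
        h = math.exp(-((i - b) ** 2) / (2 * sigma ** 2))
        weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return b
```

It is this neighborhood structure that lets the trained map be visualized, preserving topology of the decoded classes.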

  9. The construction of support vector machine classifier using the firefly algorithm.

    PubMed

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in support vector machines (SVMs) is very important with regard to their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). Feature selection is not considered in this tool, because an SVM combined with feature selection is not well suited to multiclass classification, especially the one-against-all multiclass SVM. In the experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten benchmark data sets from the University of California, Irvine (UCI) machine learning repository are used; additionally, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method with grid search and to the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification tasks where maximum accuracy is sought.

  10. The Construction of Support Vector Machine Classifier Using the Firefly Algorithm

    PubMed Central

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in support vector machines (SVMs) is very important with regard to their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). Feature selection is not considered in this tool, because an SVM combined with feature selection is not well suited to multiclass classification, especially the one-against-all multiclass SVM. In the experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten benchmark data sets from the University of California, Irvine (UCI) machine learning repository are used; additionally, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method with grid search and to the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification tasks where maximum accuracy is sought. PMID:25802511

  11. Ground-based cloud classification by learning stable local binary patterns

    NASA Astrophysics Data System (ADS)

    Wang, Yu; Shi, Cunzhao; Wang, Chunheng; Xiao, Baihua

    2018-07-01

    Feature selection and extraction is the first step in implementing pattern classification, and the same is true for ground-based cloud classification. Histogram features based on local binary patterns (LBPs) are widely used to classify texture images. However, the conventional uniform LBP approach cannot capture all the dominant patterns in cloud texture images, resulting in low classification performance. In this study, a robust feature extraction method that learns stable LBPs is proposed, based on the averaged ranks of the occurrence frequencies of all rotation-invariant patterns defined in the LBPs of cloud images. The proposed method is validated on a ground-based cloud classification database comprising five cloud types. Experimental results demonstrate that the proposed method achieves significantly higher classification accuracy than the uniform LBP, local texture pattern (LTP), dominant LBP (DLBP), completed LBP (CLBP) and salient LBP (SaLBP) methods on this cloud image database and under different noise conditions. The performance of the proposed method is comparable to that of the popular deep convolutional neural network (DCNN) method, but with lower computational complexity. Furthermore, the proposed method also achieves superior performance on an independent test data set.
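
    For reference, the basic 8-neighbour LBP code and the "uniform" test that the conventional approach relies on can be sketched as follows (a generic illustration; the stable-LBP rank averaging itself is not reproduced):

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for the centre pixel of a 3x3 patch:
    each neighbour >= centre contributes one bit."""
    c = patch[1][1]
    # neighbours listed clockwise from the top-left corner
    neigh = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
             patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum((1 << i) for i, n in enumerate(neigh) if n >= c)

def is_uniform(code):
    """A pattern is 'uniform' if its circular 8-bit string has at most
    two 0/1 transitions; conventional uniform LBP pools all non-uniform
    codes into a single histogram bin, which is what loses the dominant
    non-uniform patterns in cloud textures."""
    bits = [(code >> i) & 1 for i in range(8)]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return transitions <= 2
```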

  12. On the Complexity of Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.

    PubMed

    Kordi, Misagh; Bansal, Mukul S

    2017-01-01

    Duplication-Transfer-Loss (DTL) reconciliation has emerged as a powerful technique for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation takes as input a gene family phylogeny and the corresponding species phylogeny, and reconciles the two by postulating speciation, gene duplication, horizontal gene transfer, and gene loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. However, gene trees are frequently non-binary. With such non-binary gene trees, the reconciliation problem seeks to find a binary resolution of the gene tree that minimizes the reconciliation cost. Given the prevalence of non-binary gene trees, many efficient algorithms have been developed for this problem in the context of the simpler Duplication-Loss (DL) reconciliation model. Yet, no efficient algorithms exist for DTL reconciliation with non-binary gene trees and the complexity of the problem remains unknown. In this work, we resolve this open question by showing that the problem is, in fact, NP-hard. Our reduction applies to both the dated and undated formulations of DTL reconciliation. By resolving this long-standing open problem, this work will spur the development of both exact and heuristic algorithms for this important problem.

  13. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    Various kinds of data mining algorithms are continually being proposed as related disciplines develop, and their applicable scopes and performances differ. Hence, finding a suitable algorithm for a dataset is becoming an important concern for biomedical researchers seeking to solve practical problems promptly. In this paper, seven widely used algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on unbalanced datasets in multi-class tasks. Simple algorithms, such as naïve Bayes and the logistic regression model, are suitable for small datasets with high correlation between the task variable and the other, non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on balanced small datasets for binary-class tasks. No algorithm can maintain the best performance on all datasets. The applicability of the seven data mining algorithms on datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.

  14. Detecting nonsense for Chinese comments based on logistic regression

    NASA Astrophysics Data System (ADS)

    Zhuolin, Ren; Guang, Chen; Shu, Chen

    2016-07-01

    To understand cyber citizens' opinions accurately in Chinese news comments, a clear definition of nonsense is presented, and a detection model based on logistic regression (LR) is proposed. The detection of nonsense can be treated as a binary classification problem. In addition to traditional lexical features, we propose three kinds of features in terms of emotion, structure and relevance. With these features, we train an LR model and demonstrate its effectiveness in understanding Chinese news comments. We find that each of the proposed features significantly improves the result. In our experiments, we achieve a prediction accuracy of 84.3%, which improves on the 77.3% baseline by 7 percentage points.
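
    As a hedged illustration of the LR model underlying this record, the following sketch fits a logistic regression by gradient descent on a toy dataset; the two "comment features" (an emotion score and a relevance score) are hypothetical stand-ins for the features the authors actually propose.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit w, b for P(y=1|x) = sigmoid(w.x + b) by gradient descent on log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi                       # gradient of the log-loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Hypothetical comments as (emotion score, relevance score) -> nonsense?
X = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.2)]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
print([predict(w, b, x) for x in X])  # -> [0, 0, 1, 1]
```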

  15. Seismic event classification system

    DOEpatents

    Dowla, F.U.; Jarpe, S.P.; Maurer, W.

    1994-12-13

    In the computer interpretation of seismic data, the critical first step is to identify the general class of an unknown event. For example, the classification might be: teleseismic, regional, local, vehicular, or noise. Self-organizing neural networks (SONNs) can be used for classifying such events. Both Kohonen and Adaptive Resonance Theory (ART) SONNs are useful for this purpose. Given the detection of a seismic event and the corresponding signal, computation is made of: the time-frequency distribution, its binary representation, and finally a shift-invariant representation, which is the magnitude of the two-dimensional Fourier transform (2-D FFT) of the binary time-frequency distribution. This pre-processed input is fed into the SONNs. These neural networks are able to group events that look similar. The ART SONN has an advantage in classifying the event because the types of cluster groups do not need to be pre-defined. The results from the SONNs together with an expert seismologist's classification are then used to derive event classification probabilities. 21 figures.
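
    The key preprocessing claim above, that the magnitude of the Fourier transform of the binary time-frequency distribution is shift-invariant, can be checked on a 1-D analogue (the patent uses a 2-D FFT; a naive 1-D DFT suffices to demonstrate the property for circular shifts):

```python
import cmath

def dft_mag(seq):
    """Magnitudes of the discrete Fourier transform (naive O(N^2) version)."""
    n = len(seq)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(seq)))
            for k in range(n)]

# A binary "time-frequency" row and a circularly shifted copy of it.
row = [0, 1, 1, 0, 0, 0, 1, 0]
shifted = row[3:] + row[:3]

m1, m2 = dft_mag(row), dft_mag(shifted)
print(all(abs(a - b) < 1e-9 for a, b in zip(m1, m2)))  # -> True
```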

  16. Seismic event classification system

    DOEpatents

    Dowla, Farid U.; Jarpe, Stephen P.; Maurer, William

    1994-01-01

    In the computer interpretation of seismic data, the critical first step is to identify the general class of an unknown event. For example, the classification might be: teleseismic, regional, local, vehicular, or noise. Self-organizing neural networks (SONNs) can be used for classifying such events. Both Kohonen and Adaptive Resonance Theory (ART) SONNs are useful for this purpose. Given the detection of a seismic event and the corresponding signal, computation is made of: the time-frequency distribution, its binary representation, and finally a shift-invariant representation, which is the magnitude of the two-dimensional Fourier transform (2-D FFT) of the binary time-frequency distribution. This pre-processed input is fed into the SONNs. These neural networks are able to group events that look similar. The ART SONN has an advantage in classifying the event because the types of cluster groups do not need to be pre-defined. The results from the SONNs together with an expert seismologist's classification are then used to derive event classification probabilities.

  17. Cascaded discrimination of normal, abnormal, and confounder classes in histopathology: Gleason grading of prostate cancer

    PubMed Central

    2012-01-01

    Background Automated classification of histopathology involves identification of multiple classes, including benign, cancerous, and confounder categories. The confounder tissue classes can often mimic and share attributes with both the diseased and normal tissue classes, and can be particularly difficult to identify, both manually and by automated classifiers. In the case of prostate cancer, there may be several confounding tissue types present in a biopsy sample, posing as major sources of diagnostic error for pathologists. Two common multi-class approaches are one-shot classification (OSC), where all classes are identified simultaneously, and one-versus-all (OVA), where a “target” class is distinguished from all “non-target” classes. OSC is typically unable to handle discrimination of classes of varying similarity (e.g. with images of prostate atrophy and high grade cancer), while OVA forces several heterogeneous classes into a single “non-target” class. In this work, we present a cascaded (CAS) approach to classifying prostate biopsy tissue samples, where images from different classes are grouped to maximize intra-group homogeneity while maximizing inter-group heterogeneity. Results We apply the CAS approach to categorize 2000 tissue samples taken from 214 patient studies into seven classes: epithelium, stroma, atrophy, prostatic intraepithelial neoplasia (PIN), and prostate cancer Gleason grades 3, 4, and 5. A series of increasingly granular binary classifiers are used to split the different tissue classes until the images have been categorized into a single unique class. Our automatically-extracted image feature set includes architectural features based on location of the nuclei within the tissue sample as well as texture features extracted on a per-pixel level. The CAS strategy yields a positive predictive value (PPV) of 0.86 in classifying the 2000 tissue images into one of 7 classes, compared with the OVA (0.77 PPV) and OSC approaches (0.76 PPV).
Conclusions Use of the CAS strategy increases the PPV for a multi-category classification system over two common alternative strategies. In classification problems such as histopathology, where multiple class groups exist with varying degrees of heterogeneity, the CAS system can intelligently assign class labels to objects by performing multiple binary classifications according to domain knowledge. PMID:23110677
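
    The cascaded idea, a series of increasingly granular binary classifiers that each either assign a label or pass the sample on, can be sketched as follows; the single "texture score" feature and its thresholds are hypothetical placeholders for the real image features and trained binary classifiers.

```python
def cascade_classify(x, splitters):
    """Walk a cascade of binary splitters; each stage either assigns a
    final label or passes the sample to the next, more granular stage."""
    for is_member, label in splitters:
        if is_member(x):
            return label
    return "unclassified"

# Hypothetical 1-D "texture score": coarse groups first, fine grades last.
splitters = [
    (lambda x: x < 2.0, "benign"),     # stage 1: benign vs rest
    (lambda x: x < 5.0, "grade 3"),    # stage 2: low vs high grade
    (lambda x: x < 8.0, "grade 4"),    # stage 3: grade 4 vs grade 5
    (lambda x: True,    "grade 5"),
]
print([cascade_classify(x, splitters) for x in (1.0, 3.0, 6.0, 9.0)])
# -> ['benign', 'grade 3', 'grade 4', 'grade 5']
```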

  18. Use of Binary Partition Tree and energy minimization for object-based classification of urban land cover

    NASA Astrophysics Data System (ADS)

    Li, Mengmeng; Bijker, Wietske; Stein, Alfred

    2015-04-01

    Two main challenges are faced when classifying urban land cover from very high resolution satellite images: obtaining an optimal image segmentation and distinguishing buildings from other man-made objects. For optimal segmentation, this work proposes a hierarchical representation of an image by means of a Binary Partition Tree (BPT) and an unsupervised evaluation of image segmentations by energy minimization. For building extraction, we apply fuzzy sets to create a fuzzy landscape of shadows, which in turn involves a two-step procedure. The first step is a preliminary image classification at a fine segmentation level to generate vegetation and shadow information. The second step models the directional relationship between building and shadow objects to extract building information at the optimal segmentation level. We conducted the experiments on two datasets of Pléiades images from Wuhan City, China. To demonstrate its performance, the proposed classification is compared at the optimal segmentation level with Maximum Likelihood Classification and Support Vector Machine classification. The results show that the proposed classification produced the highest overall accuracies and kappa coefficients, and the smallest over-classification and under-classification geometric errors. We conclude first that integrating BPT with energy minimization offers an effective means for image segmentation. Second, we conclude that the directional relationship between building and shadow objects represented by a fuzzy landscape is important for building extraction.

  19. Main-sequence magnetic CP stars: II. Physical parameters and chemical composition of the atmosphere

    NASA Astrophysics Data System (ADS)

    Romanyuk, I. I.

    2007-03-01

    This paper continues a series of reviews dedicated to magnetic CP stars. The occurrence frequency of CP stars among B5–F0-type main-sequence stars is shown to be about 15–20%. The problems of identification and classification of these objects are addressed. We prefer the classification of Preston, which subdivides chemically peculiar stars into the following groups: Am, λ Boo, Ap/Bp, Hg-Mn, He-weak, and He-strong stars. The main characteristic features of objects of each group are briefly analyzed. The rotation velocities of CP stars are shown to be about three times lower than those of normal stars of the same spectral types (except for λ Boo and He-strong objects). The rotation periods of CP stars range from 0.5 to 100 days; however, there is also a small group of objects with especially long (up to several tens of years) variability periods. All kinds of peculiar stars can be found in visual binaries, with Am- and Hg-Mn-type stars occurring mostly in short-period binaries with P < 10 days, and the binary rate of these stars is close to normal. The percentage of binaries among magnetic stars (20%) is lower than among normal stars. A rather large fraction of CP1- and CP2-type stars was found to occur in young clusters (with ages smaller than 10^7 years). Photometric and spectral variability of peculiar stars of various types is discussed, and it is shown that only objects possessing magnetic fields exhibit light and spectral variations. The chemical composition of the atmospheres of CP stars of various types is considered. The abundances of various elements are usually determined by comparing the line profiles in the observed spectrum with those of synthetic spectra computed for various model atmospheres. Different mechanisms are shown to contribute to chemical inhomogeneity at the star’s surface, and the hypothesis of selective diffusion of atoms in a stable atmosphere is developed.
    Attention is also paid to the problems of the determination of local chemical composition, including the stratification of elements. Some of the coolest SrCrEu peculiar stars are found to exhibit fast light variations with periods ranging from 6 to 15 min. These variations are unassociated with rotation, but are due to nonradial pulsations. The final part of the review considers the fundamental parameters of CP stars. The effective temperatures, luminosities, radii, and masses of these objects are shown to agree with the corresponding physical parameters of normal main-sequence stars of the same spectral types.

  20. Orientation selectivity based structure for texture classification

    NASA Astrophysics Data System (ADS)

    Wu, Jinjian; Lin, Weisi; Shi, Guangming; Zhang, Yazhong; Lu, Liu

    2014-10-01

    Local structure, e.g., the local binary pattern (LBP), is widely used in texture classification. However, LBP is too sensitive to disturbance. In this paper, we introduce a novel structure for texture classification. Research in cognitive neuroscience indicates that the primary visual cortex exhibits remarkable orientation selectivity for visual information extraction. Inspired by this, we investigate the orientation similarities among neighboring pixels, and propose an orientation-selectivity-based pattern for local structure description. Experimental results on texture classification demonstrate that the proposed structure descriptor is quite robust to disturbance.

  1. How Binary Skills Obscure the Transition from Non-Mastery to Mastery

    ERIC Educational Resources Information Center

    Karelitz, Tzur M.

    2008-01-01

    What is the nature of latent predictors that facilitate diagnostic classification? Rupp and Templin (this issue) suggest that these predictors should be multidimensional, categorical variables that can be combined in various ways. Diagnostic Classification Models (DCM) typically use multiple categorical predictors to classify respondents into…

  2. An approach for combining airborne LiDAR and high-resolution aerial color imagery using Gaussian processes

    NASA Astrophysics Data System (ADS)

    Liu, Yansong; Monteiro, Sildomar T.; Saber, Eli

    2015-10-01

    Changes in vegetation cover, building construction, road networks and traffic conditions caused by urban expansion affect the human habitat as well as the natural environment in rapidly developing cities. It is crucial to assess these changes and respond accordingly by identifying man-made and natural structures with accurate classification algorithms. With the increasing use of multi-sensor remote sensing systems, researchers are able to obtain a more complete description of the scene of interest, and by utilizing multi-sensor data the accuracy of classification algorithms can be improved. In this paper, we propose a method for combining 3D LiDAR point clouds and high-resolution color images to classify urban areas using Gaussian processes (GP). GP classification is a powerful non-parametric classification method that yields probabilistic classification results and makes predictions in a way that addresses the uncertainty of the real world. In this paper, we attempt to identify man-made and natural objects in urban areas including buildings, roads, trees, grass, water and vehicles. LiDAR features are derived from the 3D point clouds, and the spatial and color features are extracted from RGB images. For classification, we use the Laplace approximation for GP binary classification on the new combined feature space. Multiclass classification is implemented using a one-vs-all binary classification strategy. Results of applying support vector machine (SVM) and logistic regression (LR) classifiers are also provided for comparison. Our experiments show a clear improvement in classification results when the two sensors are combined instead of used separately. We also found that the GP approach handles the uncertainty in the classification result without compromising accuracy compared with SVM, which is considered the state-of-the-art classification method.
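
    The one-vs-all strategy mentioned above is classifier-agnostic; the sketch below illustrates it with a trivial centroid-distance scorer standing in for the per-class GP binary classifier (the class names and toy feature vectors are hypothetical):

```python
import math

def train_ova(X, y):
    """One-vs-all: one binary scorer per class; the highest "this class vs
    the rest" score wins. Here each scorer is a stand-in (negative distance
    to the class centroid); a GP or SVM would replace it in practice."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        pts = [x for x, yi in zip(X, y) if yi == c]
        centroids[c] = tuple(sum(col) / len(pts) for col in zip(*pts))
    def predict(x):
        return max(classes, key=lambda c: -math.dist(x, centroids[c]))
    return predict

# Toy 2-D features for three urban classes.
X = [(0, 0), (1, 1), (10, 0), (11, 1), (0, 10), (1, 11)]
y = ["road", "road", "building", "building", "tree", "tree"]
predict = train_ova(X, y)
print(predict((0.5, 0.5)), predict((10.5, 0.5)), predict((0.5, 10.5)))
# -> road building tree
```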

  3. Entropy coders for image compression based on binary forward classification

    NASA Astrophysics Data System (ADS)

    Yoo, Hoon; Jeong, Jechang

    2000-12-01

    Entropy coders, as a noiseless compression method, are widely used as the final compression step for images, and there have been many contributions toward increasing entropy coder performance and reducing entropy coder complexity. In this paper, we propose entropy coders based on binary forward classification (BFC). The BFC requires overhead for the classification, but there is no change between the amount of input information and the total amount of classified output information, a property we prove in this paper. Using this property, we propose entropy coders in which the BFC is followed by Golomb-Rice coders (BFC+GR) or by arithmetic coders (BFC+A). The proposed entropy coders introduce negligible additional complexity due to the BFC. Simulation results also show better performance than other entropy coders of similar complexity.
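
    For reference, a minimal Golomb-Rice coder of the kind used in the BFC+GR configuration can be written as follows (this is the generic code with divisor 2**k, not the authors' implementation):

```python
def rice_encode(n, k):
    """Golomb-Rice code for non-negative n with parameter k (divisor 2**k):
    unary-coded quotient, a '0' terminator, then k remainder bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    rem = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + rem

def rice_decode(bits, k):
    """Invert rice_encode: count the unary '1's, then read k remainder bits."""
    q = 0
    while bits[q] == "1":
        q += 1
    r = int(bits[q + 1:q + 1 + k] or "0", 2)
    return (q << k) | r

code = rice_encode(19, k=2)
print(code)                    # -> '1111011': quotient 4 in unary, remainder 11
print(rice_decode(code, k=2))  # -> 19
```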

  4. Performances of Machine Learning Algorithms for Binary Classification of Network Anomaly Detection System

    NASA Astrophysics Data System (ADS)

    Nawir, Mukrimah; Amir, Amiza; Lynn, Ong Bi; Yaakob, Naimah; Badlishah Ahmad, R.

    2018-05-01

    The rapid growth of networked technologies exposes them to various network attacks, owing both to the nature of the data, which are frequently exchanged over the Internet, and to the large-scale data that must be handled. Moreover, network anomaly detection using machine learning is hindered by the scarcity of publicly available labelled network datasets, which has led many researchers to keep using the most common network dataset (KDDCup99), one that is no longer suitable for evaluating machine learning (ML) algorithms for classification. Several issues regarding the available labelled network datasets are discussed in this paper. The aim of this paper is to build a network anomaly detection system using machine learning algorithms that is efficient, effective and fast. The findings show that the AODE algorithm performs well in terms of accuracy and processing time for binary classification on the UNSW-NB15 dataset.

  5. Improvements on ν-Twin Support Vector Machine.

    PubMed

    Khemchandani, Reshma; Saigal, Pooja; Chandra, Suresh

    2016-07-01

    In this paper, we propose two novel binary classifiers, termed "Improvements on ν-Twin Support Vector Machine": Iν-TWSVM and Iν-TWSVM (Fast), that are motivated by the ν-Twin Support Vector Machine (ν-TWSVM). Similar to ν-TWSVM, Iν-TWSVM determines two nonparallel hyperplanes such that each is closer to its respective class and at least ρ distance away from the other class. The significant advantage of Iν-TWSVM over ν-TWSVM is that Iν-TWSVM solves one smaller-sized Quadratic Programming Problem (QPP) and one Unconstrained Minimization Problem (UMP), as compared to solving two related QPPs in ν-TWSVM. Further, Iν-TWSVM (Fast) avoids solving the smaller-sized QPP, instead transforming it into a unimodal function that can be solved using line search methods; as in Iν-TWSVM, the other problem is solved as a UMP. Due to their novel formulations, the proposed classifiers are faster than ν-TWSVM and have comparable generalization ability. Iν-TWSVM also implements the structural risk minimization (SRM) principle by introducing a regularization term along with minimizing the empirical risk. The other properties of Iν-TWSVM, related to support vectors (SVs), are similar to those of ν-TWSVM. To test the efficacy of the proposed method, experiments have been conducted on a wide range of UCI datasets and a skewed variation of the NDC datasets. We also give an application of Iν-TWSVM as a binary classifier for pixel classification of color images. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Binary classification of items of interest in a repeatable process

    DOEpatents

    Abell, Jeffrey A.; Spicer, John Patrick; Wincek, Michael Anthony; Wang, Hui; Chakraborty, Debejyo

    2014-06-24

    A system includes host and learning machines in electrical communication with sensors positioned with respect to an item of interest, e.g., a weld, and memory. The host executes instructions from memory to predict a binary quality status of the item. The learning machine receives signals from the sensor(s), identifies candidate features, and extracts features from the candidates that are more predictive of the binary quality status relative to other candidate features. The learning machine maps the extracted features to a dimensional space that includes most of the items from a passing binary class and excludes all or most of the items from a failing binary class. The host also compares the received signals for a subsequent item of interest to the dimensional space to thereby predict, in real time, the binary quality status of the subsequent item of interest.

  7. Computer assisted optical biopsy for colorectal polyps

    NASA Astrophysics Data System (ADS)

    Navarro-Avila, Fernando J.; Saint-Hill-Febles, Yadira; Renner, Janis; Klare, Peter; von Delius, Stefan; Navab, Nassir; Mateus, Diana

    2017-03-01

    We propose a method for computer-assisted optical biopsy for colorectal polyps, with the final goal of assisting the medical expert during colonoscopy. In particular, we target the problem of automatic classification of polyp images into two classes: adenomatous vs. non-adenomatous. Our approach is based on recent advances in convolutional neural networks (CNN) for image representation. In the paper, we describe and compare four different methodologies to address the binary classification task: a baseline with classical features and a Random Forest classifier, two methods based on features obtained from a pre-trained network, and finally, end-to-end training of a CNN. With the pre-trained network, we show the feasibility of transferring a feature extraction mechanism trained on millions of natural images to the task of classifying adenomatous polyps. We then demonstrate further performance improvements when training the CNN for our specific classification task. In our study, 776 polyp images were acquired and histologically analyzed after polyp resection. We report a performance increase of the CNN-based approaches with respect to both the conventional engineered features and a state-of-the-art method based on videos and 3D shape features.

  8. Resampling approach for anomalous change detection

    NASA Astrophysics Data System (ADS)

    Theiler, James; Perkins, Simon

    2007-04-01

    We investigate the problem of identifying pixels in pairs of co-registered images that correspond to real changes on the ground. Changes that are due to environmental differences (illumination, atmospheric distortion, etc.) or sensor differences (focus, contrast, etc.) will be widespread throughout the image, and the aim is to avoid these changes in favor of changes that occur in only one or a few pixels. Formal outlier detection schemes (such as the one-class support vector machine) can identify rare occurrences, but will be confounded by pixels that are "equally rare" in both images: they may be anomalous, but they are not changes. We describe a resampling scheme we have developed that formally addresses both of these issues, and reduces the problem to a binary classification, a problem for which a large variety of machine learning tools have been developed. In principle, the effects of misregistration will manifest themselves as pervasive changes, and our method will be robust against them - but in practice, misregistration remains a serious issue.
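
    The reduction to binary classification can be sketched as follows: actual co-registered pixel pairs form one class, and "resampled" pairs, in which the second image is permuted so that pervasive (environmental or sensor) relationships survive but pixel-wise correspondence is broken, form the other. The toy pixel lists below are illustrative only; a real implementation would resample more carefully than by a simple permutation.

```python
import random

def make_resampled_training_set(img_x, img_y, seed=0):
    """Build a binary-classification dataset for anomalous change detection:
    class 0 = actual co-registered pixel pairs (x_i, y_i),
    class 1 = resampled pairs (x_i, y_j) with the second image permuted,
    which keeps each image's marginal distribution but breaks correspondence."""
    rng = random.Random(seed)
    actual = [(x, y, 0) for x, y in zip(img_x, img_y)]
    perm = img_y[:]
    rng.shuffle(perm)
    resampled = [(x, y, 1) for x, y in zip(img_x, perm)]
    return actual + resampled

# Two toy "images" flattened to pixel intensity lists.
img_x = [0.1, 0.2, 0.8, 0.9]
img_y = [0.1, 0.3, 0.7, 0.9]
data = make_resampled_training_set(img_x, img_y)
print(len(data), sum(1 for *_, c in data if c == 1))  # -> 8 4
```

    Any off-the-shelf binary classifier trained on `data` then scores how distinguishable a pixel pair is from the resampled background, which is the paper's route to flagging anomalous changes.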

  9. Odds Ratio, Delta, ETS Classification, and Standardization Measures of DIF Magnitude for Binary Logistic Regression

    ERIC Educational Resources Information Center

    Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.

    2007-01-01

    Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…

  10. Stable Sparse Classifiers Identify qEEG Signatures that Predict Learning Disabilities (NOS) Severity

    PubMed Central

    Bosch-Bayard, Jorge; Galán-García, Lídice; Fernandez, Thalia; Lirio, Rolando B.; Bringas-Vega, Maria L.; Roca-Stappung, Milene; Ricardo-Garcell, Josefina; Harmony, Thalía; Valdes-Sosa, Pedro A.

    2018-01-01

    In this paper, we present a novel methodology to solve the classification problem, based on sparse (data-driven) regressions combined with techniques for ensuring stability, especially useful for high-dimensional datasets and small sample sizes. The sensitivity and specificity of the classifiers are assessed by a stable ROC procedure, which uses a non-parametric algorithm for estimating the area under the ROC curve. This method allows assessing the performance of the classification by the ROC technique when more than two groups are involved in the classification problem, i.e., when the gold standard is not binary. We apply this methodology to EEG spectral signatures to find biomarkers that allow discriminating between (and predicting pertinence to) different subgroups of children diagnosed with Not Otherwise Specified Learning Disabilities (LD-NOS) disorder. Children with LD-NOS have notable learning difficulties, which affect education but cannot be assigned to a specific category such as reading (Dyslexia), mathematics (Dyscalculia), or writing (Dysgraphia). By using the EEG spectra, we aim to identify EEG patterns that may be related to specific learning disabilities in an individual case. This could be useful for developing subject-based methods of therapy, based on information provided by the EEG. Here we study 85 LD-NOS children, divided into three subgroups previously selected by a clustering technique over the scores of cognitive tests. The classification equation produced stable marginal areas under the ROC of 0.71 for discrimination between Group 1 vs. Group 2; 0.91 for Group 1 vs. Group 3; and 0.75 for Group 2 vs. Group 1. A discussion of the EEG characteristics of each group related to the cognitive scores is also presented. PMID:29379411

  11. Stable Sparse Classifiers Identify qEEG Signatures that Predict Learning Disabilities (NOS) Severity.

    PubMed

    Bosch-Bayard, Jorge; Galán-García, Lídice; Fernandez, Thalia; Lirio, Rolando B; Bringas-Vega, Maria L; Roca-Stappung, Milene; Ricardo-Garcell, Josefina; Harmony, Thalía; Valdes-Sosa, Pedro A

    2017-01-01

    In this paper, we present a novel methodology to solve the classification problem, based on sparse (data-driven) regressions combined with techniques for ensuring stability, especially useful for high-dimensional datasets and small sample sizes. The sensitivity and specificity of the classifiers are assessed by a stable ROC procedure, which uses a non-parametric algorithm for estimating the area under the ROC curve. This method allows assessing the performance of the classification by the ROC technique when more than two groups are involved in the classification problem, i.e., when the gold standard is not binary. We apply this methodology to EEG spectral signatures to find biomarkers that allow discriminating between (and predicting pertinence to) different subgroups of children diagnosed with Not Otherwise Specified Learning Disabilities (LD-NOS) disorder. Children with LD-NOS have notable learning difficulties, which affect education but cannot be assigned to a specific category such as reading (Dyslexia), mathematics (Dyscalculia), or writing (Dysgraphia). By using the EEG spectra, we aim to identify EEG patterns that may be related to specific learning disabilities in an individual case. This could be useful for developing subject-based methods of therapy, based on information provided by the EEG. Here we study 85 LD-NOS children, divided into three subgroups previously selected by a clustering technique over the scores of cognitive tests. The classification equation produced stable marginal areas under the ROC of 0.71 for discrimination between Group 1 vs. Group 2; 0.91 for Group 1 vs. Group 3; and 0.75 for Group 2 vs. Group 1. A discussion of the EEG characteristics of each group related to the cognitive scores is also presented.

  12. Unveiling relevant non-motor Parkinson's disease severity symptoms using a machine learning approach.

    PubMed

    Armañanzas, Rubén; Bielza, Concha; Chaudhuri, Kallol Ray; Martinez-Martin, Pablo; Larrañaga, Pedro

    2013-07-01

    Is it possible to predict the severity staging of a Parkinson's disease (PD) patient using scores of non-motor symptoms? This is the kickoff question for a machine learning approach to classify two widely known PD severity indexes using individual tests from a broad set of non-motor PD clinical scales only. The Hoehn & Yahr index and clinical impression of severity index are global measures of PD severity. They constitute the labels to be assigned in two supervised classification problems using only non-motor symptom tests as predictor variables. Such predictors come from a wide range of PD symptoms, such as cognitive impairment, psychiatric complications, autonomic dysfunction or sleep disturbance. The classification was coupled with a feature subset selection task using an advanced evolutionary algorithm, namely an estimation of distribution algorithm. Results show how five different classification paradigms using a wrapper feature selection scheme are capable of predicting each of the class variables with estimated accuracy in the range of 72-92%. In addition, classification into the main three severity categories (mild, moderate and severe) was split into dichotomic problems where binary classifiers perform better and select different subsets of non-motor symptoms. The number of jointly selected symptoms throughout the whole process was low, suggesting a link between the selected non-motor symptoms and the general severity of the disease. Quantitative results are discussed from a medical point of view, reflecting a clear translation to the clinical manifestations of PD. Moreover, results include a brief panel of non-motor symptoms that could help clinical practitioners to identify patients who are at different stages of the disease from a limited set of symptoms, such as hallucinations, fainting, inability to control body sphincters or believing in unlikely facts. Copyright © 2013 Elsevier B.V. All rights reserved.

  13. Using classification tree analysis to predict oak wilt distribution in Minnesota and Texas

    Treesearch

    Marla c. Downing; Vernon L. Thomas; Jennifer Juzwik; David N. Appel; Robin M. Reich; Kim Camilli

    2008-01-01

    We developed a methodology and compared results for predicting the potential distribution of Ceratocystis fagacearum (causal agent of oak wilt), in both Anoka County, MN, and Fort Hood, TX. The Potential Distribution of Oak Wilt (PDOW) utilizes a binary classification tree statistical technique that incorporates: geographical information systems (GIS...

  14. Mammogram classification scheme using 2D-discrete wavelet and local binary pattern for detection of breast cancer

    NASA Astrophysics Data System (ADS)

    Adi Putra, Januar

    2018-04-01

    In this paper, we propose a new mammogram classification scheme to classify breast tissue as normal or abnormal. A feature matrix is generated by applying the Local Binary Pattern to all the detail coefficients from the 2D-DWT of the region of interest (ROI) of a mammogram. Feature selection is done by selecting the relevant features that affect the classification; it reduces the dimensionality of the data and discards features that are not relevant. In this paper, the F-test and T-test are applied to the feature-extraction dataset to reduce and select the relevant features. The best features are used in a Neural Network classifier for classification. In this research we use the MIAS and DDSM databases. In addition to the suggested scheme, competing schemes are also simulated for comparative analysis. It is observed that the proposed scheme performs better with respect to accuracy, specificity and sensitivity. Based on the experiments, the proposed scheme can produce a high accuracy of 92.71%, while the lowest accuracy obtained is 77.08%.
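
    The Local Binary Pattern operator used for the feature matrix can be illustrated on a single 3x3 patch; the neighbour ordering below is one common convention, not necessarily the one used by the authors:

```python
def lbp_code(patch):
    """Basic 3x3 local binary pattern: threshold the 8 neighbours against
    the centre pixel and read the results as an 8-bit code."""
    c = patch[1][1]
    # Neighbours traversed clockwise from the top-left corner.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum((1 << i) for i, p in enumerate(neighbours) if p >= c)

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # -> 241
```

    A histogram of these codes over an image region is what typically serves as the texture feature vector.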

  15. Gender classification from face images by using local binary pattern and gray-level co-occurrence matrix

    NASA Astrophysics Data System (ADS)

    Uzbaş, Betül; Arslan, Ahmet

    2018-04-01

    Gender classification is an important step in human-computer interaction and identification. The human face image is one of the most important sources for determining gender. In the present study, gender classification is performed automatically from facial images. To classify gender, we propose a combination of features extracted from the face, eye and lip regions using a hybrid of the Local Binary Pattern and the Gray-Level Co-Occurrence Matrix. The features are extracted from automatically obtained face, eye and lip regions, then combined and given as input parameters to classification methods (Support Vector Machine, Artificial Neural Networks, Naive Bayes and k-Nearest Neighbor) for gender classification. The Nottingham Scan face database, which consists of frontal face images of 100 people (50 male and 50 female), is used for this purpose. In the experimental studies, the highest success rate, 98%, was achieved using Support Vector Machine. The experimental results illustrate the efficacy of our proposed method.

  16. AUC-Maximizing Ensembles through Metalearning.

    PubMed

    LeDell, Erin; van der Laan, Mark J; Petersen, Maya

    2016-05-01

    Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or if the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximizing metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms to maximize the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, outperform non-AUC-maximizing metalearning methods with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger degree.
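    As a toy illustration of AUC-maximizing metalearning (not the Super Learner implementation, which optimizes over many base learners with nonlinear optimizers), the sketch below grid-searches the convex combination of two base-learner predictions that maximizes a rank-based AUC; the data and names are made up:

```python
import numpy as np

def auc(scores, y):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def auc_metalearner(Z, y, steps=101):
    """Pick the convex weight w for w*Z[:,0] + (1-w)*Z[:,1] that maximises AUC."""
    best_w, best_auc = 0.0, -1.0
    for w in np.linspace(0.0, 1.0, steps):
        a = auc(w * Z[:, 0] + (1.0 - w) * Z[:, 1], y)
        if a > best_auc:
            best_w, best_auc = w, a
    return best_w, best_auc

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
Z = np.column_stack([y + rng.normal(0.0, 1.0, 200),   # informative base learner
                     rng.normal(0.0, 1.0, 200)])      # uninformative base learner
w, a = auc_metalearner(Z, y)   # never worse than either base learner alone
```

    Because w=0 and w=1 are in the grid, the ensemble AUC is at least that of the better base learner; in practice Z would hold cross-validated predictions to avoid overfitting the weights.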

  17. AUC-Maximizing Ensembles through Metalearning

    PubMed Central

    LeDell, Erin; van der Laan, Mark J.; Petersen, Maya

    2016-01-01

    Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or if the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximizing metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms to maximize the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, outperform non-AUC-maximizing metalearning methods with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger degree. PMID:27227721

  18. Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification.

    PubMed

    Mirza, Bilal; Lin, Zhiping

    2016-08-01

    In this paper, a meta-cognitive online sequential extreme learning machine (MOS-ELM) is proposed for class imbalance and concept drift learning. In MOS-ELM, meta-cognition is used to self-regulate the learning by selecting suitable learning strategies for class imbalance and concept drift problems. MOS-ELM is the first sequential learning method to alleviate the imbalance problem for both binary class and multi-class data streams with concept drift. In MOS-ELM, a new adaptive window approach is proposed for concept drift learning. A single output update equation is also proposed which unifies various application specific OS-ELM methods. The performance of MOS-ELM is evaluated under different conditions and compared with methods each specific to some of the conditions. On most of the datasets in comparison, MOS-ELM outperforms the competing methods. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. A novel encoding scheme for effective biometric discretization: Linearly Separable Subcode.

    PubMed

    Lim, Meng-Hui; Teoh, Andrew Beng Jin

    2013-02-01

    Separability in a code is crucial in guaranteeing a decent Hamming-distance separation among the codewords. In multibit biometric discretization where a code is used for quantization-intervals labeling, separability is necessary for preserving distance dissimilarity when feature components are mapped from a discrete space to a Hamming space. In this paper, we examine separability of Binary Reflected Gray Code (BRGC) encoding and reveal its inadequacy in tackling interclass variation during the discrete-to-binary mapping, leading to a tradeoff between classification performance and entropy of binary output. To overcome this drawback, we put forward two encoding schemes exhibiting full-ideal and near-ideal separability capabilities, known as Linearly Separable Subcode (LSSC) and Partially Linearly Separable Subcode (PLSSC), respectively. These encoding schemes convert the conventional entropy-performance tradeoff into an entropy-redundancy tradeoff in the increase of code length. Extensive experimental results vindicate the superiority of our schemes over the existing encoding schemes in discretization performance. This opens up possibilities of achieving much greater classification performance with high output entropy.
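    The separability notion can be illustrated by comparing BRGC labels with a thermometer-style labelling, which behaves like LSSC in this respect (a rough sketch of the distance property, not the paper's exact construction): with 8 quantization intervals, Hamming distance between thermometer codewords grows linearly with interval separation, while BRGC distances collapse for far-apart intervals.

```python
def gray(i, bits):
    """Binary Reflected Gray Code label of interval i, as a bit string."""
    return format(i ^ (i >> 1), f'0{bits}b')

def unary(i, n):
    """Thermometer-style label: i ones then zeros (LSSC-like separability)."""
    return '1' * i + '0' * (n - i)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Hamming distance from interval 0 to interval i, for 8 intervals
dg = [hamming(gray(0, 3), gray(i, 3)) for i in range(8)]    # BRGC: 3 bits
du = [hamming(unary(0, 7), unary(i, 7)) for i in range(8)]  # thermometer: 7 bits
# dg == [0, 1, 2, 1, 2, 3, 2, 1]  -> interval 7 looks as close as interval 1
# du == [0, 1, 2, 3, 4, 5, 6, 7]  -> distance tracks interval separation
```

    The longer thermometer code is exactly the entropy-redundancy tradeoff the abstract mentions: full separability costs N-1 bits for N intervals instead of log2(N).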

  20. Data fusion with artificial neural networks (ANN) for classification of earth surface from microwave satellite measurements

    NASA Technical Reports Server (NTRS)

    Lure, Y. M. Fleming; Grody, Norman C.; Chiou, Y. S. Peter; Yeh, H. Y. Michael

    1993-01-01

    A data fusion system with artificial neural networks (ANN) is used for fast and accurate classification of five earth surface conditions and surface changes, based on seven SSMI multichannel microwave satellite measurements. The measurements include brightness temperatures at 19, 22, 37, and 85 GHz at both H and V polarizations (only V at 22 GHz). The seven channel measurements are processed through a convolution computation such that all measurements are located on the same grid. Five surface classes, including non-scattering surface, precipitation over land, precipitation over ocean, snow, and desert, are identified from ground-truth observations. The system processes sensory data in three consecutive phases: (1) pre-processing to extract feature vectors and enhance separability among detected classes; (2) preliminary classification of Earth surface patterns using two separate, parallel-acting classifiers: back-propagation neural network and binary decision tree classifiers; and (3) data fusion of the results from the preliminary classifiers to obtain optimal overall classification performance. Both the binary decision tree classifier and the fusion processing centers are implemented as neural network architectures. The fusion system configuration is a hierarchical neural network architecture, in which each functional neural net handles a different processing phase in a pipelined fashion. There are a total of around 13,500 samples in this analysis, of which 4 percent are used as the training set and 96 percent as the testing set. After training, this classification system brings the detection accuracy up to 94 percent, compared with 88 percent for back-propagation artificial neural networks and 80 percent for binary decision tree classifiers. Work is currently in progress to integrate the neural network data fusion classifier into an image processing system at NOAA and to implement it in a prototype of a massively parallel and dynamically reconfigurable Modular Neural Ring (MNR).

  1. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

    PubMed

    Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I

    2017-01-01

    A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffers when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.

  2. Binary classification of items of interest in a repeatable process

    DOEpatents

    Abell, Jeffrey A; Spicer, John Patrick; Wincek, Michael Anthony; Wang, Hui; Chakraborty, Debejyo

    2015-01-06

    A system includes host and learning machines. Each machine has a processor in electrical communication with at least one sensor. Instructions for predicting a binary quality status of an item of interest during a repeatable process are recorded in memory. The binary quality status includes passing and failing binary classes. The learning machine receives signals from the at least one sensor and identifies candidate features. Features are extracted from the candidate features, each more predictive of the binary quality status. The extracted features are mapped to a dimensional space having a number of dimensions proportional to the number of extracted features. The dimensional space includes most of the passing class and excludes at least 90 percent of the failing class. Received signals are compared to the boundaries of the recorded dimensional space to predict, in real time, the binary quality status of a subsequent item of interest.
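    A much-simplified reading of this idea: fit a region in feature space around the passing class and predict "fail" for anything outside it. The axis-aligned quantile box below is our own stand-in for the patented boundary construction, run on made-up data:

```python
import numpy as np

def fit_box(passing, q=0.05):
    """Axis-aligned box covering the central mass of the passing class."""
    lo = np.quantile(passing, q, axis=0)
    hi = np.quantile(passing, 1.0 - q, axis=0)
    return lo, hi

def predict_pass(x, lo, hi):
    """Predict 'pass' only if the feature vector falls inside the box."""
    return bool(np.all(x >= lo) and np.all(x <= hi))

rng = np.random.default_rng(2)
passing = rng.normal(0.0, 1.0, size=(500, 3))   # synthetic passing-class features
failing = rng.normal(4.0, 1.0, size=(500, 3))   # synthetic failing-class features
lo, hi = fit_box(passing)
excluded = sum(not predict_pass(f, lo, hi) for f in failing) / len(failing)
# with well-separated classes, 'excluded' comfortably exceeds the 90% target
```

    The patent's dimensional space is more general than a box, but the acceptance test is the same shape: include most of the passing class, exclude at least 90 percent of the failing class.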

  3. Toward more intuitive brain-computer interfacing: classification of binary covert intentions using functional near-infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Hwang, Han-Jeong; Choi, Han; Kim, Jeong-Youn; Chang, Won-Du; Kim, Do-Won; Kim, Kiwoong; Jo, Sungho; Im, Chang-Hwan

    2016-09-01

    In traditional brain-computer interface (BCI) studies, binary communication systems have generally been implemented using two mental tasks arbitrarily assigned to "yes" or "no" intentions (e.g., mental arithmetic calculation for "yes"). A recent pilot study performed with one paralyzed patient showed the possibility of a more intuitive paradigm for binary BCI communications, in which the patient's internal yes/no intentions were directly decoded from functional near-infrared spectroscopy (fNIRS). We investigated whether such an "fNIRS-based direct intention decoding" paradigm can be reliably used for practical BCI communications. Eight healthy subjects participated in this study, and each participant was administered 70 disjunctive questions. Brain hemodynamic responses were recorded using a multichannel fNIRS device, while the participants were internally expressing "yes" or "no" intentions to each question. Different feature types, feature numbers, and time window sizes were tested to investigate optimal conditions for classifying the internal binary intentions. About 75% of the answers were correctly classified when the individual best feature set was employed (75.89% ±1.39 and 74.08% ±2.87 for oxygenated and deoxygenated hemoglobin responses, respectively), which was significantly higher than a random chance level (68.57% for p<0.001). The kurtosis feature showed the highest mean classification accuracy among all feature types. The grand-averaged hemodynamic responses showed that wide brain regions are associated with the processing of binary implicit intentions. Our experimental results demonstrated that direct decoding of internal binary intention has the potential to be used for implementing more intuitive and user-friendly communication systems for patients with motor disabilities.
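    Since kurtosis was the best-performing feature type, here is a minimal sketch of computing excess kurtosis for a signal window (generic numpy, with synthetic windows standing in for fNIRS channel responses):

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis of a 1-D window (0 for a Gaussian signal)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (z ** 4).mean() / z.var() ** 2 - 3.0

rng = np.random.default_rng(3)
flat = rng.normal(size=5000)                     # baseline-like window
spiky = np.concatenate([rng.normal(size=4990),
                        10.0 * np.ones(10)])     # window with a strong transient
# excess_kurtosis(flat) is near 0; excess_kurtosis(spiky) is large
```

    In a classification pipeline such a scalar would be computed per channel and time window and stacked into the feature vector passed to the classifier.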

  4. Speech Segregation based on Binary Classification

    DTIC Science & Technology

    2016-07-15

    including the IBM, the target binary mask (TBM), the IRM, the short-time Fourier transform spectral magnitude (FFT-MAG) and its corresponding mask (FFT... complementary features and a fixed DNN as the discriminative learning machine. For evaluation metrics, besides SNR, we use the Short-Time Objective... target analysis is a recent successful intelligibility test conducted on both normal-hearing (NH) and hearing-impaired (HI) listeners. The speech

  5. Multi-task linear programming discriminant analysis for the identification of progressive MCI individuals.

    PubMed

    Yu, Guan; Liu, Yufeng; Thung, Kim-Han; Shen, Dinggang

    2014-01-01

    Accurately identifying mild cognitive impairment (MCI) individuals who will progress to Alzheimer's disease (AD) is very important for making early interventions. Many classification methods focus on integrating multiple imaging modalities such as magnetic resonance imaging (MRI) and fluorodeoxyglucose positron emission tomography (FDG-PET). However, the main challenge for MCI classification using multiple imaging modalities is the large amount of missing data: for example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, almost half of the subjects do not have PET images. In this paper, we propose a new and flexible binary classification method, namely Multi-task Linear Programming Discriminant (MLPD) analysis, for incomplete multi-source feature learning. Specifically, we decompose the classification problem into different classification tasks, i.e., one for each combination of available data sources. To solve all the classification tasks jointly, our proposed MLPD method links them together by constraining them to achieve similar estimated mean differences between the two classes for the shared features. Compared with the state-of-the-art incomplete Multi-Source Feature (iMSF) learning method, which constrains different classification tasks to choose a common feature subset for the shared features, MLPD can flexibly and adaptively choose different feature subsets for different classification tasks. Furthermore, MLPD can be efficiently implemented by linear programming. To validate our MLPD method, we perform experiments on the ADNI baseline dataset with incomplete MRI and PET images from 167 progressive MCI (pMCI) subjects and 226 stable MCI (sMCI) subjects. We further compare our method with the iMSF method (using incomplete MRI and PET images) and with single-task classification (using only MRI, or only subjects with both MRI and PET images). Experimental results show very promising performance of our proposed MLPD method.

  6. Multi-Task Linear Programming Discriminant Analysis for the Identification of Progressive MCI Individuals

    PubMed Central

    Yu, Guan; Liu, Yufeng; Thung, Kim-Han; Shen, Dinggang

    2014-01-01

    Accurately identifying mild cognitive impairment (MCI) individuals who will progress to Alzheimer's disease (AD) is very important for making early interventions. Many classification methods focus on integrating multiple imaging modalities such as magnetic resonance imaging (MRI) and fluorodeoxyglucose positron emission tomography (FDG-PET). However, the main challenge for MCI classification using multiple imaging modalities is the large amount of missing data: for example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, almost half of the subjects do not have PET images. In this paper, we propose a new and flexible binary classification method, namely Multi-task Linear Programming Discriminant (MLPD) analysis, for incomplete multi-source feature learning. Specifically, we decompose the classification problem into different classification tasks, i.e., one for each combination of available data sources. To solve all the classification tasks jointly, our proposed MLPD method links them together by constraining them to achieve similar estimated mean differences between the two classes for the shared features. Compared with the state-of-the-art incomplete Multi-Source Feature (iMSF) learning method, which constrains different classification tasks to choose a common feature subset for the shared features, MLPD can flexibly and adaptively choose different feature subsets for different classification tasks. Furthermore, MLPD can be efficiently implemented by linear programming. To validate our MLPD method, we perform experiments on the ADNI baseline dataset with incomplete MRI and PET images from 167 progressive MCI (pMCI) subjects and 226 stable MCI (sMCI) subjects. We further compare our method with the iMSF method (using incomplete MRI and PET images) and with single-task classification (using only MRI, or only subjects with both MRI and PET images). Experimental results show very promising performance of our proposed MLPD method. PMID:24820966

  7. Searching for Unresolved Binary Brown Dwarfs

    NASA Astrophysics Data System (ADS)

    Albretsen, Jacob; Stephens, Denise

    2007-10-01

    There are currently L and T brown dwarfs (BDs) with errors in their classification of +/- 1 to 2 spectral types. Metallicity and gravitational differences have accounted for some of these discrepancies, and recent studies have shown that unresolved binary BDs may offer some explanation as well. However, limitations in technology and resources often make it difficult to clearly resolve an object that may be binary in nature. Stephens and Noll (2006) identified statistically strong binary source candidates from Hubble Space Telescope (HST) images of Trans-Neptunian Objects (TNOs) that were apparently unresolved, using model point-spread functions for single and binary sources. The HST archive contains numerous observations of BDs using the Near Infrared Camera and Multi-Object Spectrometer (NICMOS) that have never been rigorously analyzed for binary properties. Using the methods developed by Stephens and Noll (2006), BD observations from the HST data archive are being analyzed for possible unresolved binaries. Preliminary results will be presented. This technique will identify potential candidates for future observations to determine orbital information.

  8. Optimal Methods for Classification of Digitally Modulated Signals

    DTIC Science & Technology

    2013-03-01

    of using a ratio of likelihood functions, the proposed approach uses the Kullback-Leibler (KL) divergence. KL...58 List of Acronyms ALRT Average LRT BPSK Binary Phase Shift Keying BPSK-SS BPSK Spread Spectrum or CDMA DKL Kullback-Leibler Information Divergence...blind demodulation to develop classification algorithms for a wider set of signal types. Two methodologies were used: Likelihood Ratio Test

  9. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    PubMed Central

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we present the first attempt at applying the ABC algorithm to the analysis of microarray gene expression profiles. In addition, we combine the minimum redundancy maximum relevance (mRMR) feature selection algorithm with an ABC algorithm, yielding mRMR-ABC, to select informative genes from microarray profiles. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques, reimplementing two of them with the same parameters for the sake of a fair comparison: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes, outperforming the previously suggested methods on the tested datasets. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028

  10. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    PubMed

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we present the first attempt at applying the ABC algorithm to the analysis of microarray gene expression profiles. In addition, we combine the minimum redundancy maximum relevance (mRMR) feature selection algorithm with an ABC algorithm, yielding mRMR-ABC, to select informative genes from microarray profiles. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques, reimplementing two of them with the same parameters for the sake of a fair comparison: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes, outperforming the previously suggested methods on the tested datasets. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  11. Geometry-based ensembles: toward a structural characterization of the classification boundary.

    PubMed

    Pujol, Oriol; Masip, David

    2009-06-01

    This paper introduces a novel binary discriminative learning technique based on the approximation of the nonlinear decision boundary by a piecewise linear smooth additive model. The decision border is geometrically defined by means of the characterizing boundary points-points that belong to the optimal boundary under a certain notion of robustness. Based on these points, a set of locally robust linear classifiers is defined and assembled by means of a Tikhonov regularized optimization procedure in an additive model to create a final lambda-smooth decision rule. As a result, a very simple and robust classifier with a strong geometrical meaning and nonlinear behavior is obtained. The simplicity of the method allows its extension to cope with some of today's machine learning challenges, such as online learning, large-scale learning or parallelization, with linear computational complexity. We validate our approach on the UCI database, comparing with several state-of-the-art classification techniques. Finally, we apply our technique in online and large-scale scenarios and in six real-life computer vision and pattern recognition problems: gender recognition based on face images, intravascular ultrasound tissue classification, speed traffic sign detection, Chagas' disease myocardial damage severity detection, old musical scores clef classification, and action recognition using 3D accelerometer data from a wearable device. The results are promising and this paper opens a line of research that deserves further attention.

  12. Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data.

    PubMed

    Nourani, Esmaeil; Khunjush, Farshad; Durmuş, Saliha

    2016-05-24

    Pathogenic microorganisms exploit host cellular mechanisms and evade host defense mechanisms through molecular pathogen-host interactions (PHIs). Therefore, comprehensive analysis of these PHI networks should be an initial step for developing effective therapeutics against infectious diseases. Computational prediction of PHI data is in increasing demand because of the scarcity of experimental data. Prediction of protein-protein interactions (PPIs) within PHI systems can be formulated as a classification problem, which normally requires knowledge of non-interacting protein pairs. This is a restricting requirement, since we lack datasets that report non-interacting protein pairs. In this study, we formulated the "computational prediction of PHI data" problem using kernel embedding of heterogeneous data. This eliminates the abovementioned requirement and enables us to predict new interactions without randomly labeling protein pairs as non-interacting. Domain-domain associations are used to filter the predicted results, leading to 175 novel PHIs between 170 human proteins and 105 viral proteins. To compare our results with state-of-the-art studies that use a binary classification formulation, we modified our settings to consider the same formulation. Detailed evaluations show that our results improve accuracy and AUC (area under the receiver operating characteristic curve) by more than 10 percent in comparison with state-of-the-art methods.

  13. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
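    A nearest-neighbor probability machine can be sketched in a few lines: the estimated P(Y=1 | x) is simply the positive fraction among the k nearest training points. This is a simplified illustration on synthetic data (the paper treats random forests and formal consistency in detail):

```python
import numpy as np

def knn_probability(X_train, y_train, x, k=50):
    """Estimate P(Y=1 | x) as the positive fraction among the k nearest points."""
    d = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argsort(d)[:k]].mean()

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(2000, 1))
y = (rng.uniform(size=2000) < (X[:, 0] + 1.0) / 2.0).astype(int)  # P(Y=1|x)=(x+1)/2
p = knn_probability(X, y, np.array([0.5]))   # true probability at x=0.5 is 0.75
```

    A regression random forest applied to the 0/1 outcome yields probability estimates in the same way: its prediction is an average of training labels over data-adaptive neighborhoods.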

  14. Median Robust Extended Local Binary Pattern for Texture Classification.

    PubMed

    Liu, Li; Lao, Songyang; Fieguth, Paul W; Guo, Yulan; Wang, Xiaogang; Pietikäinen, Matti

    2016-03-01

    Local binary patterns (LBP) are considered among the most computationally efficient high-performance texture features. However, the LBP method is very sensitive to image noise and is unable to capture macrostructure information. To address these disadvantages, in this paper we introduce a novel descriptor for texture classification, the median robust extended LBP (MRELBP). Different from the traditional LBP and many LBP variants, MRELBP compares regional image medians rather than raw image intensities. A multiscale LBP-type descriptor is computed by efficiently comparing image medians over a novel sampling scheme, which can capture both microstructure and macrostructure texture information. A comprehensive evaluation on benchmark data sets reveals MRELBP's high performance: it is robust to grayscale variations, rotation changes and noise, yet has a low computational cost. MRELBP produces the best classification scores of 99.82%, 99.38%, and 99.77% on three popular Outex test suites. More importantly, MRELBP is shown to be highly robust to image noise, including Gaussian noise, Gaussian blur, salt-and-pepper noise, and random pixel corruption.
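    The key idea, comparing patch medians instead of raw pixels, can be illustrated minimally (a simplified single-scale 8-neighbor sketch with radius-2 sampling, not the full multiscale MRELBP):

```python
import numpy as np

def patch_median(img, y, x, r=1):
    """Median of the (2r+1)x(2r+1) patch centred at (y, x)."""
    return np.median(img[y - r:y + r + 1, x - r:x + r + 1])

OFFS = [(-2, -2), (-2, 0), (-2, 2), (0, 2), (2, 2), (2, 0), (2, -2), (0, -2)]

def median_code(img, y, x):
    """LBP-style code comparing neighbour-patch medians to the centre median."""
    c = patch_median(img, y, x)
    return sum(1 << b for b, (dy, dx) in enumerate(OFFS)
               if patch_median(img, y + dy, x + dx) >= c)

def raw_code(img, y, x):
    """Classic LBP-style code comparing raw neighbour pixels to the centre."""
    return sum(1 << b for b, (dy, dx) in enumerate(OFFS)
               if img[y + dy, x + dx] >= img[y, x])

img = np.arange(81, dtype=float).reshape(9, 9)   # smooth ramp image
noisy = img.copy()
noisy[2, 2] = 1000.0                             # one corrupted pixel
# median_code(img, 4, 4) == median_code(noisy, 4, 4)  -> code survives the noise
# raw_code(img, 4, 4)    != raw_code(noisy, 4, 4)     -> raw comparison flips a bit
```

    A single salt pixel shifts a patch median by at most one order statistic, so the median-based comparison keeps its sign while the raw-intensity comparison flips.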

  15. Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection.

    PubMed

    Cheng, Tiejun; Li, Qingliang; Wang, Yanli; Bryant, Stephen H

    2011-02-28

    Aqueous solubility is recognized as a critical parameter in both early- and late-stage drug discovery. Therefore, in silico modeling of solubility has attracted extensive interest in recent years. Most previous studies have been limited to relatively small data sets with limited diversity, which in turn limits the predictability of derived models. In this work, we present a support vector machines model for the binary classification of solubility by taking advantage of the largest known public data set, which contains over 46 000 compounds with experimental solubility. Our model was optimized in combination with a reduction and recombination feature selection strategy. The best model demonstrated robust performance in both cross-validation and prediction of two independent test sets, indicating it could be a practical tool to select soluble compounds for screening, purchasing, and synthesizing. Moreover, our work may be used for comparative evaluation of solubility classification studies, owing to its use of completely public resources.

  16. SVM-based tree-type neural networks as a critic in adaptive critic designs for control.

    PubMed

    Deb, Alok Kanti; Jayadeva; Gopal, Madan; Chandra, Suresh

    2007-07-01

    In this paper, we use the approach of adaptive critic design (ACD) for control, specifically, the action-dependent heuristic dynamic programming (ADHDP) method. A least squares support vector machine (SVM) regressor has been used for generating the control actions, while an SVM-based tree-type neural network (NN) is used as the critic. After a failure occurs, the critic and action networks are retrained in tandem using the failure data. Failure data is binary classification data in which the number of failure states is very small compared to the number of no-failure states. The difficulty of conventional multilayer feedforward NNs in learning this type of classification data has been overcome by using the SVM-based tree-type NN, which, owing to its ability to add neurons to learn misclassified data, can learn any binary classification data without an a priori choice of the number of neurons or the structure of the network. The capability of the trained controller to handle unforeseen situations is demonstrated.

  17. Using Fractal and Local Binary Pattern Features for Classification of ECOG Motor Imagery Tasks Obtained from the Right Brain Hemisphere.

    PubMed

    Xu, Fangzhou; Zhou, Weidong; Zhen, Yilin; Yuan, Qi; Wu, Qi

    2016-09-01

    The feature extraction and classification of brain signals are very significant in brain-computer interfaces (BCIs). In this study, we describe an algorithm for motor imagery (MI) classification in an electrocorticogram (ECoG)-based BCI. The proposed approach employs multi-resolution fractal measures and local binary pattern (LBP) operators to form a combined feature characterizing an ECoG epoch recorded from the right hemisphere of the brain. A classifier is trained by using gradient boosting in conjunction with the ordinary least squares (OLS) method. The fractal intercept, lacunarity and LBP features are extracted to classify imagined movements of either the left small finger or the tongue. Experimental results on dataset I of BCI competition III demonstrate the superior performance of our method: the cross-validation accuracy and test accuracy are 90.6% and 95%, respectively. Furthermore, the low computational burden of this method makes it a promising candidate for real-time BCI systems.

  18. Generation and Termination of Binary Decision Trees for Nonparametric Multiclass Classification.

    DTIC Science & Technology

    1984-10-01

    [Garbled DTIC report-documentation scan; recoverable content:] LIDS-P-1411, October 1984: Generation and Termination of Binary Decision Trees for Nonparametric Multiclass Classification. "... minimizes the Bayes risk. Tree generation and termination are based on the training and test samples, respectively."

  19. Fast optimization of binary clusters using a novel dynamic lattice searching method.

    PubMed

    Wu, Xia; Cheng, Wen

    2014-09-28

    Global optimization of binary clusters has remained difficult despite much effort and many efficient methods. To handle the two element types in binary clusters (the homotop problem), two classes of virtual dynamic lattices are constructed and a modified dynamic lattice searching (DLS) method, the binary DLS (BDLS) method, is developed. However, it was found that BDLS can only be used to optimize binary clusters of small sizes, because the homotop problem is hard to solve without an atomic exchange operation. Therefore, the iterated local search (ILS) method is adopted to solve the homotop problem, and an efficient method based on BDLS and ILS, named BDLS-ILS, is presented for global optimization of binary clusters. To assess the efficiency of the proposed method, binary Lennard-Jones clusters with up to 100 atoms are investigated, and the results show the method to be efficient. Furthermore, the BDLS-ILS method is also adopted to study the geometrical structures of (AuPd)79 clusters with DFT-fit parameters of the Gupta potential.
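    The ILS idea that BDLS-ILS builds on can be sketched with a toy homotop-style problem: atoms of two types on a chain, an energy that counts like-type neighbours, greedy pairwise-exchange local search, and random exchanges as perturbations. All of these are toy choices; the real method operates on 3-D clusters with Lennard-Jones or Gupta energies.

```python
import random

def energy(config):
    """Toy homotop-style energy: number of adjacent like-type pairs on a
    chain (a perfectly alternating arrangement has energy 0)."""
    return sum(a == b for a, b in zip(config, config[1:]))

def local_search(config):
    """Greedy exchange moves: keep swapping unlike atoms while it helps."""
    config = list(config)
    improved = True
    while improved:
        improved = False
        e = energy(config)
        for i in range(len(config)):
            for j in range(i + 1, len(config)):
                if config[i] != config[j]:
                    config[i], config[j] = config[j], config[i]
                    if energy(config) < e:
                        e = energy(config)
                        improved = True
                    else:  # undo a non-improving swap
                        config[i], config[j] = config[j], config[i]
    return config

def perturb(config, rng):
    """Random exchange of two atoms (the ILS kick move)."""
    config = list(config)
    i, j = rng.sample(range(len(config)), 2)
    config[i], config[j] = config[j], config[i]
    return config

def iterated_local_search(x0, iters=30, seed=0):
    """ILS loop: local search, perturb, re-search, keep if better."""
    rng = random.Random(seed)
    best = local_search(x0)
    for _ in range(iters):
        cand = local_search(perturb(best, rng))
        if energy(cand) < energy(best):
            best = cand
    return best
```

    The perturbation plays the role of the atomic exchange operation the abstract identifies as essential for escaping homotop local minima.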

  20. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data.

    PubMed

    Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data comprising thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from the large number of genes. Lately, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulty in selecting small subsets for cancer classification, owing to the huge number of genes (high dimension) compared to the small number of samples, as well as noisy and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs global search, and tabu search (TS), which conducts a fine-tuned search. To verify the performance of HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The proposed method proved superior to related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes.
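    The TS half of the hybrid can be sketched on its own. Below, a toy per-feature relevance score stands in for the real classifier-based fitness, the neighbourhood is a single-feature flip, and recently flipped features are tabu; this is an illustrative sketch, not the authors' HICATS implementation.

```python
from collections import deque

def tabu_search(scores, penalty=0.5, tenure=3, iters=40):
    """Tabu search over feature subsets (bit masks). `scores` is a toy
    per-feature relevance score standing in for classifier accuracy;
    `penalty` discourages large subsets. Recently flipped features are
    tabu for `tenure` iterations."""
    n = len(scores)

    def fitness(mask):
        return sum(s for s, m in zip(scores, mask) if m) - penalty * sum(mask)

    mask = [0] * n
    best, best_f = mask[:], fitness(mask)
    tabu = deque(maxlen=tenure)
    for _ in range(iters):
        # evaluate all admissible single flips (aspiration: a tabu move
        # is allowed if it beats the best solution found so far)
        cands = []
        for i in range(n):
            nb = mask[:]
            nb[i] ^= 1
            f = fitness(nb)
            if i not in tabu or f > best_f:
                cands.append((f, i, nb))
        if not cands:
            break
        f, i, mask = max(cands)  # move to the best admissible neighbour
        tabu.append(i)
        if f > best_f:
            best, best_f = mask[:], f
    return best, best_f
```

    Note that tabu search accepts the best neighbour even when it worsens fitness, which is what lets it escape local optima that a pure hill-climber would get stuck in.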

  1. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    PubMed Central

    Aorigele; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data comprising thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from the large number of genes. Lately, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulty in selecting small subsets for cancer classification, owing to the huge number of genes (high dimension) compared to the small number of samples, as well as noisy and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs global search, and tabu search (TS), which conducts a fine-tuned search. To verify the performance of HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The proposed method proved superior to related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes. PMID:27579323

  2. Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy.

    PubMed

    Welikala, R A; Fraz, M M; Dehmeshki, J; Hoppe, A; Tah, V; Mann, S; Williamson, T H; Barman, S A

    2015-07-01

    Proliferative diabetic retinopathy (PDR) is a condition that carries a high risk of severe visual impairment. The hallmark of PDR is the growth of abnormal new vessels. In this paper, an automated method for the detection of new vessels from retinal images is presented. This method is based on a dual classification approach. Two vessel segmentation approaches are applied to create two separate binary vessel maps, each of which holds vital information. Local morphology features are measured from each binary vessel map to produce two separate 4-D feature vectors. Independent classification is performed for each feature vector using a support vector machine (SVM) classifier. The system then combines these individual outcomes to produce a final decision. This is followed by the creation of additional features to generate 21-D feature vectors, which feed into a genetic algorithm based feature selection approach with the objective of finding feature subsets that improve the performance of the classification. Sensitivity and specificity results using a dataset of 60 images are 0.9138 and 0.9600, respectively, on a per patch basis and 1.000 and 0.975, respectively, on a per image basis. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Structural classification of proteins using texture descriptors extracted from the cellular automata image.

    PubMed

    Kavianpour, Hamidreza; Vasighi, Mahdi

    2017-02-01

    Nowadays, knowledge of the cellular attributes of proteins plays an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of a protein's structural class is used by various methods to better understand protein functionality and folding patterns. Computational methods and intelligent systems can play an important role in the structural classification of proteins. Most protein sequences are stored in databanks as character strings, so a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced, based on reduced amino acid alphabets derived from the surrounding hydrophobicity index. Many important features hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help reveal inherent features deeply hidden in protein sequences and improve the quality of protein structural class prediction.
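    The encoding pipeline can be sketched in two toy steps: reduce the sequence to a binary string via a hydrophobicity grouping, then evolve it with a cellular automaton to obtain an image whose rows are CA generations. Both the amino acid grouping and the CA rule below are illustrative assumptions, not the paper's exact surrounding-hydrophobicity threshold or update rule.

```python
# Hypothetical two-letter reduction: amino acids in a high-hydrophobicity
# group map to 1, all others to 0 (an assumed grouping for illustration).
HYDROPHOBIC = set("AVLIMFWC")

def binary_sequence(protein):
    """Reduce an amino acid string to a 0/1 sequence."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in protein]

def ca_image(row, steps, rule=30):
    """Evolve the binary sequence with an elementary cellular automaton
    (rule 30 chosen arbitrarily here) under periodic boundaries; the
    returned list of rows is the 'cellular automata image'."""
    table = [(rule >> k) & 1 for k in range(8)]
    image = [list(row)]
    for _ in range(steps):
        prev = image[-1]
        nxt = []
        for i in range(len(prev)):
            l = prev[i - 1]                  # wraps to the end at i = 0
            c = prev[i]
            r = prev[(i + 1) % len(prev)]
            nxt.append(table[(l << 2) | (c << 1) | r])
        image.append(nxt)
    return image
```

    Texture descriptors (in the paper, SVM input features) would then be computed from the resulting 2-D binary image.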

  4. Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature.

    PubMed

    Wang, Xinglong; Rak, Rafal; Restificar, Angelo; Nobata, Chikashi; Rupp, C J; Batista-Navarro, Riza Theresa B; Nawaz, Raheel; Ananiadou, Sophia

    2011-10-03

    The selection of relevant articles for curation, and the linking of those articles to the experimental techniques confirming their findings, were among the primary subjects of the recent BioCreative III contest. The contest's Protein-Protein Interaction (PPI) task consisted of two sub-tasks: the Article Classification Task (ACT) and the Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task's development dataset. We also explored ways to combine this new approach with more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, the Matthews correlation coefficient and AUC iP/R; for ACT, our best classifier was ranked second as measured by AUC iP/R, and was also competitive according to other metrics. Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. 
For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance.
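    The multi-class, multi-label to binary conversion highlighted above can be sketched schematically: each (text phrase, candidate method) pair becomes one binary instance. The identifiers below are made up, and the real system attaches rich contextual features to each pair rather than just the raw strings.

```python
def to_binary_instances(phrases, candidate_methods, gold_pairs):
    """Recast multi-label method detection as binary decisions: one
    instance per (phrase, candidate method) pair, labelled 1 when the
    gold annotations say that phrase evidences that method."""
    return [((p, m), 1 if (p, m) in gold_pairs else 0)
            for p in phrases
            for m in candidate_methods]
```

    A single binary classifier trained on such instances then covers every method label at once, which is what makes the reformulation attractive for IMT-style tasks.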

  5. Decision Manifold Approximation for Physics-Based Simulations

    NASA Technical Reports Server (NTRS)

    Wong, Jay Ming; Samareh, Jamshid A.

    2016-01-01

    With the recent surge of success in big-data driven deep learning problems, many of these frameworks focus on the notion of architecture design and utilizing massive databases. However, in some scenarios massive sets of data may be difficult, and in some cases infeasible, to acquire. In this paper we discuss a trajectory-based framework that quickly learns the underlying decision manifold of binary simulation classifications while judiciously selecting exploratory target states to minimize the number of required simulations. Furthermore, we draw particular attention to the simulation prediction application idealized to the case where failures in simulations can be predicted and avoided, providing machine intelligence to novice analysts. We demonstrate this framework in various forms of simulations and discuss its efficacy.

  6. Toward more intuitive brain-computer interfacing: classification of binary covert intentions using functional near-infrared spectroscopy.

    PubMed

    Hwang, Han-Jeong; Choi, Han; Kim, Jeong-Youn; Chang, Won-Du; Kim, Do-Won; Kim, Kiwoong; Jo, Sungho; Im, Chang-Hwan

    2016-09-01

    In traditional brain-computer interface (BCI) studies, binary communication systems have generally been implemented using two mental tasks arbitrarily assigned to “yes” or “no” intentions (e.g., mental arithmetic calculation for “yes”). A recent pilot study performed with one paralyzed patient showed the possibility of a more intuitive paradigm for binary BCI communications, in which the patient’s internal yes/no intentions were directly decoded from functional near-infrared spectroscopy (fNIRS). We investigated whether such an “fNIRS-based direct intention decoding” paradigm can be reliably used for practical BCI communications. Eight healthy subjects participated in this study, and each participant was administered 70 disjunctive questions. Brain hemodynamic responses were recorded using a multichannel fNIRS device, while the participants were internally expressing “yes” or “no” intentions to each question. Different feature types, feature numbers, and time window sizes were tested to investigate optimal conditions for classifying the internal binary intentions. About 75% of the answers were correctly classified when the individual best feature set was employed (75.89% ± 1.39 and 74.08% ± 2.87 for oxygenated and deoxygenated hemoglobin responses, respectively), which was significantly higher than a random chance level (68.57% for p < 0.001). The kurtosis feature showed the highest mean classification accuracy among all feature types. The grand-averaged hemodynamic responses showed that wide brain regions are associated with the processing of binary implicit intentions. Our experimental results demonstrated that direct decoding of internal binary intention has the potential to be used for implementing more intuitive and user-friendly communication systems for patients with motor disabilities.
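    Among the feature types compared, kurtosis is simple to state; below is a minimal sketch computed over one hemodynamic response window. Whether the study used excess kurtosis (i.e. subtracted 3) is an assumption made here for illustration.

```python
def kurtosis(x):
    """Sample excess kurtosis of one signal window: the fourth central
    moment over the squared second central moment, minus 3 so that a
    Gaussian scores approximately zero."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2) - 3.0
```

    One such value per channel and time window would form part of the feature vector fed to the binary intention classifier.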

  7. Efficient algorithms for dilated mappings of binary trees

    NASA Technical Reports Server (NTRS)

    Iqbal, M. Ashraf

    1990-01-01

    The problem addressed is to find a 1-1 mapping of the vertices of a binary tree onto those of a target binary tree such that the son of a node in the first tree is mapped onto a descendant of the image of that node in the second tree. There are two natural measures of the cost of this mapping: the dilation cost, i.e., the maximum distance in the target binary tree between the images of vertices that are adjacent in the original tree; and the expansion cost, defined as the number of extra nodes/edges that must be added to the target binary tree to ensure a 1-1 mapping. An efficient algorithm to find a mapping of one binary tree onto another is described, and it is shown that one cost of the mapping can be minimized at the expense of the other. This problem arises when designing pipelined arithmetic logic units (ALUs) for special purpose computers. The pipeline is composed of ALU chips connected in the form of a binary tree. The operands are supplied to the leaf nodes of the binary tree, which process them and pass the results up to their parents; the final result is available at the root. As each new application may require a distinct nesting of operations, it is useful to be able to find a good mapping of a new binary tree onto the existing ALU tree. A different problem arises if every required binary tree is known beforehand: there it is useful to hardwire the pipeline in the form of a minimal supertree that contains all required binary trees.
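    The dilation cost is easy to state operationally: map each source edge through the vertex mapping phi and take the maximum distance between the image endpoints in the target tree. This is a sketch of the definition only, not of the paper's mapping algorithm; trees are represented here as child-to-parent maps.

```python
def tree_distance(parent, u, v):
    """Distance between two nodes of a tree given as a child -> parent map."""
    def ancestors(x):
        path = [x]
        while x in parent:
            x = parent[x]
            path.append(x)
        return path
    pu, pv = ancestors(u), ancestors(v)
    depth_of = {x: d for d, x in enumerate(pu)}
    for d, x in enumerate(pv):
        if x in depth_of:            # lowest common ancestor reached
            return depth_of[x] + d
    raise ValueError("nodes are in different trees")

def dilation_cost(source_edges, target_parent, phi):
    """Max target-tree distance between the images (under phi) of the
    endpoints of each source-tree edge."""
    return max(tree_distance(target_parent, phi[a], phi[b])
               for a, b in source_edges)
```

    Minimising this quantity while keeping the mapping 1-1 is exactly the trade-off against expansion cost described in the abstract.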

  8. Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.

    2013-09-01

    This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.
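    The GA driver can be sketched independently of the tree-based fitness: individuals are feature bit strings, and search proceeds by tournament selection, one-point crossover and bit-flip mutation. The additive relevance score used below is a toy stand-in for the paper's conditional-probability objective.

```python
import random

def ga_select(n_features, fitness, pop_size=20, gens=30, p_mut=0.05, seed=1):
    """Minimal generational GA over feature bit strings: tournament
    selection, one-point crossover, and per-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (rng.random() < p_mut) for b in child]  # mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)  # best feature mask in the final population
```

    In the paper's setting, `fitness` would train and score a binary classification tree on the features selected by the mask.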

  9. A Ternary Brain-Computer Interface Based on Single-Trial Readiness Potentials of Self-initiated Fine Movements: A Diversified Classification Scheme

    PubMed Central

    Abou Zeid, Elias; Rezazadeh Sereshkeh, Alborz; Schultz, Benjamin; Chau, Tom

    2017-01-01

    In recent years, the readiness potential (RP), a type of pre-movement neural activity, has been investigated for asynchronous electroencephalogram (EEG)-based brain-computer interfaces (BCIs). Since the RP is attenuated for involuntary movements, a BCI driven by RP alone could facilitate intentional control amid a plethora of unintentional movements. Previous studies have mainly attempted binary single-trial classification of RP. An RP-based BCI with three or more states would expand the options for functional control. Here, we propose a ternary BCI based on single-trial RPs. This BCI classifies among an idle state and self-initiated left- and right-hand fine movements. A pipeline of spatio-temporal filtering with per-participant parameter optimization was used for feature extraction. The ternary classification was decomposed into binary classifications using a decision-directed acyclic graph (DDAG). For each class pair in the DDAG structure, an ordered diversified classifier system (ODCS-DDAG) was used to select the best among various classification algorithms or to combine the results of different classification algorithms. Using EEG data from 14 participants performing self-initiated left or right key presses, punctuated with rest periods, we compared the performance of ODCS-DDAG to a ternary classifier and four popular multiclass decomposition methods using only a single classification algorithm. ODCS-DDAG had the highest performance (0.769 Cohen's Kappa score) and was significantly better than the ternary classifier and two of the four multiclass decomposition methods. Our work supports further study of RP-based BCI for intuitive asynchronous environmental control or augmentative communication. PMID:28596725
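    The DDAG decomposition used here has a compact core: pairwise binary classifiers eliminate one class per node until a single class remains. Below is a minimal sketch with toy nearest-prototype pairwise classifiers; the prototypes and class names are illustrative, not the study's EEG features.

```python
def ddag_predict(x, binary_clfs, classes):
    """Decision-directed acyclic graph over pairwise classifiers.
    `binary_clfs[(a, b)]` returns a or b for input x; at each DAG node
    the losing class is eliminated until one class remains."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = binary_clfs[(a, b)](x)
        remaining.remove(a if winner == b else b)  # drop the loser
    return remaining[0]
```

    For K classes this evaluates only K-1 of the K(K-1)/2 trained pairwise classifiers per prediction, which is one reason DDAGs are popular for multiclass decomposition.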

  10. Classification of brain tumours using short echo time 1H MR spectra

    NASA Astrophysics Data System (ADS)

    Devos, A.; Lukas, L.; Suykens, J. A. K.; Vanhamme, L.; Tate, A. R.; Howe, F. A.; Majós, C.; Moreno-Torres, A.; van der Graaf, M.; Arús, C.; Van Huffel, S.

    2004-09-01

    The purpose was to objectively compare the application of several techniques and the use of several input features for brain tumour classification using Magnetic Resonance Spectroscopy (MRS). Short echo time 1H MRS signals from patients with glioblastomas (n = 87), meningiomas (n = 57), metastases (n = 39), and astrocytomas grade II (n = 22) were provided by six centres in the European Union funded INTERPRET project. Linear discriminant analysis, least squares support vector machines (LS-SVM) with a linear kernel and LS-SVM with a radial basis function kernel were applied and evaluated over 100 stratified random splittings of the dataset into training and test sets. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of binary classifiers, while the percentage of correct classifications was used to evaluate the multiclass classifiers. The influence of several factors on the classification performance was tested: L2- vs. water normalization, magnitude vs. real spectra and baseline correction. The effect of input feature reduction was also investigated by using only the selected frequency regions containing the most discriminatory information, and peak integrated values. Using L2-normalized complete spectra the automated binary classifiers reached a mean test AUC of more than 0.95, except for glioblastomas vs. metastases. Similar results were obtained for all classification techniques and input features except for water normalized spectra, where classification performance was lower. This indicates that data acquisition and processing can be simplified for classification purposes, excluding the need for separate water signal acquisition, baseline correction or phasing.
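    The AUC used to score the binary classifiers has a simple rank-statistic form: it is the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. A direct, O(n²) sketch:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    fraction of (positive, negative) pairs ranked correctly,
    ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Averaging this quantity over the 100 stratified train/test splits yields the mean test AUC figures quoted in the abstract.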

  11. Comprehensive decision tree models in bioinformatics.

    PubMed

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of the reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach in which no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for the less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from that of the usually more complex models built using the default settings of the classical decision tree algorithm. 
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

  12. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of the reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach in which no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for the less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from that of the usually more complex models built using the default settings of the classical decision tree algorithm. 
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449

  13. Satellite Image Classification of Building Damages Using Airborne and Satellite Image Samples in a Deep Learning Approach

    NASA Astrophysics Data System (ADS)

    Duarte, D.; Nex, F.; Kerle, N.; Vosselman, G.

    2018-05-01

    The localization and detailed assessment of damaged buildings after a disastrous event is of utmost importance to guide response operations, recovery tasks or for insurance purposes. Several remote sensing platforms and sensors are currently used for the manual detection of building damages. However, there is an overall interest in the use of automated methods to perform this task, regardless of the platform used. Owing to its synoptic coverage and predictable availability, satellite imagery is currently used as input for the identification of building damages by the International Charter, as well as by the Copernicus Emergency Management Service for the production of damage grading and reference maps. Recently proposed methods to perform image classification of building damages rely on convolutional neural networks (CNN). These are usually trained with only satellite image samples in a binary classification problem; however, the number of samples derived from these images is often limited, affecting the quality of the classification results. The use of up/down-sampled image samples during the training of a CNN has been demonstrated to improve several image recognition tasks in remote sensing. However, it is currently unclear whether this multi-resolution information can also be captured from images with different spatial resolutions, such as satellite and airborne imagery (from both manned and unmanned platforms). In this paper, a CNN framework using residual connections and dilated convolutions is used, considering both manned and unmanned aerial image samples, to perform the satellite image classification of building damages. Three network configurations, trained with multi-resolution image samples, are compared against two benchmark networks where only satellite image samples are used. 
Combining feature maps generated from airborne and satellite image samples, and refining these using only the satellite image samples, improved the overall satellite image classification of building damages by nearly 4 %.

  14. 3D multi-view convolutional neural networks for lung nodule classification

    PubMed Central

    Kang, Guixia; Hou, Beibei; Zhang, Ningbo

    2017-01-01

    The 3D convolutional neural network (CNN) is able to make full use of the spatial 3D context information of lung nodules, and the multi-view strategy has been shown to be useful for improving the performance of 2D CNN in classifying lung nodules. In this paper, we explore the classification of lung nodules using the 3D multi-view convolutional neural networks (MV-CNN) with both chain architecture and directed acyclic graph architecture, including 3D Inception and 3D Inception-ResNet. All networks employ the multi-view-one-network strategy. We conduct a binary classification (benign and malignant) and a ternary classification (benign, primary malignant and metastatic malignant) on Computed Tomography (CT) images from Lung Image Database Consortium and Image Database Resource Initiative database (LIDC-IDRI). All results are obtained via 10-fold cross validation. As regards the MV-CNN with chain architecture, results show that the performance of 3D MV-CNN surpasses that of 2D MV-CNN by a significant margin. Finally, a 3D Inception network achieved an error rate of 4.59% for the binary classification and 7.70% for the ternary classification, both of which represent superior results for the corresponding task. We compare the multi-view-one-network strategy with the one-view-one-network strategy. The results reveal that the multi-view-one-network strategy can achieve a lower error rate than the one-view-one-network strategy. PMID:29145492

  15. Automatic Author Profiling of Online Chat Logs

    DTIC Science & Technology

    2007-03-01

    [Garbled table-of-contents scan; recoverable content:] Age: binary classification with prior, evaluated on all test data and on extracted test data pairing teens with 20s, 30s, 40s, and 50s.

  16. A Proposed Methodology to Classify Frontier Capital Markets

    DTIC Science & Technology

    2011-07-31

    [Garbled DTIC scan; recoverable content:] This project involves machine learning; the algorithm consists of a unique binary classifier mechanism that combines three methods, among them k-Nearest Neighbors (kNN) and ensemble classification techniques. Report sections include capital market classification through kNN ensemble classification techniques, and capital market classification based on capital flows and trading architecture.

  17. A Proposed Methodology to Classify Frontier Capital Markets

    DTIC Science & Technology

    2011-07-31

    [Garbled DTIC scan; recoverable content:] The project involves identification and machine learning; the algorithm consists of a unique binary classifier mechanism that combines three methods, among them k-Nearest Neighbors (kNN). Report sections include support through kNN ensemble classification techniques, and capital market classification based on capital flows and trading architecture.

  18. Complex extreme learning machine applications in terahertz pulsed signals feature sets.

    PubMed

    Yin, X-X; Hadjiloucas, S; Zhang, Y

    2014-11-01

    This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly, a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that, although fairly indistinguishable in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object, as would be required within a tomographic setting, and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis. 
Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
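The extreme learning machine at the heart of this record is simple enough to sketch. Below is a minimal real-valued version (the paper's classifier is complex-valued and operates on amplitude and phase spectra; the toy data and all parameter values here are illustrative): a random, untrained hidden layer followed by a closed-form least-squares solve for the output weights.

```python
import numpy as np

def elm_train(X, y, n_hidden=50, seed=0):
    """Minimal extreme learning machine: a random, untrained hidden layer,
    with output weights solved in closed form by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))    # random input weights, never trained
    b = rng.normal(size=n_hidden)                  # random biases
    H = np.tanh(X @ W + b)                         # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy binary problem: two well-separated Gaussian blobs, labels in {-1, +1}
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 5)), rng.normal(2.0, 1.0, (100, 5))])
y = np.array([-1.0] * 100 + [1.0] * 100)
W, b, beta = elm_train(X, y)
accuracy = np.mean(np.sign(elm_predict(X, W, b, beta)) == y)
```

Because only the output weights are fitted, and in closed form, training is very fast on large data sets, which is the property the paper exploits.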

  19. Enhancing and Archiving the APS Catalog of the POSS I

    NASA Technical Reports Server (NTRS)

    Humphreys, Roberta M.

    2003-01-01

    We have worked on two different projects: 1) Archiving the APS Catalog of the POSS I for distribution to NASA's NED at IPAC, SIMBAD in France, and individual astronomers and 2) The automated morphological classification of galaxies. We have completed archiving the Catalog into easily readable binary files. The database together with the software to read it has been distributed on DVDs to the national and international data centers and to individual astronomers. The archived Catalog contains more than 89 million objects in 632 fields in the first epoch Palomar Observatory Sky Survey. Additional image parameters not available in the original on-line version are also included in the archived version. The archived Catalog is also available and can be queried at the APS web site (URL: http://aps.umn.edu) which has been improved with a much faster and more efficient querying system. The Catalog can be downloaded as binary datafiles with the source code for reading it. It is also being integrated into the SkyQuery system which includes the Sloan Digital Sky Survey, 2MASS, and the FIRST radio sky survey. We experimented with different classification algorithms to automate the morphological classification of galaxies. This is an especially difficult problem because there are not only a large number of attributes or parameters and measurement uncertainties, but also the added complication of human disagreement about the adopted types. To solve this problem we used 837 galaxy images from nine POSS I fields at the North Galactic Pole classified by two independent astronomers for which they agree on the morphological types. The initial goal was to separate the galaxies into the three broad classes relevant to issues of large scale structure and galaxy formation and evolution: early (ellipticals and lenticulars), spirals, and late (irregulars) with an accuracy or success rate that rivals the best astronomer classifiers. 
    We also needed to identify a set of parameters derived from the digitized images that separate the galaxies by type. The human eye can easily recognize complicated patterns in images such as spiral arms which can be spotty, blotchy affairs that are difficult for automated techniques. A galaxy image can potentially be described by hundreds of parameters, all of which may have some relation to the morphological type. In the set of initial experiments we used 624 such parameters, in two colors, blue and red. These parameters include the surface brightness and color measured at different radii, ratios of these parameters at different radii, concentration indices, Fourier transforms and wavelet decomposition coefficients. We experimented with three different classes of classification algorithms: decision trees, k-nearest neighbors, and support vector machines (SVM). A range of experiments was conducted and we eventually narrowed the parameters to 23 selected parameters. SVM consistently outperformed the other algorithms with both sets of features. By combining the results from the different algorithms in a weighted scheme we achieved an overall classification success of 86%.
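The weighted combination of decision trees, k-nearest neighbors, and SVM described above can be sketched with a soft-voting ensemble. This is a generic illustration, not the authors' exact scheme: the synthetic data stands in for the 23 selected image parameters and three morphological classes, and the vote weights (favouring the SVM, which performed best standalone) are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# stand-in for 23 image parameters and 3 broad morphological classes
X, y = make_classification(n_samples=600, n_features=23, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# soft vote over predicted class probabilities, SVM weighted double
ensemble = VotingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier(n_neighbors=5)),
                ("svm", SVC(probability=True, random_state=0))],
    voting="soft", weights=[1, 1, 2])
ensemble.fit(X_tr, y_tr)
accuracy = ensemble.score(X_te, y_te)
```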

  20. Latent feature representation with stacked auto-encoder for AD/MCI diagnosis

    PubMed Central

    Lee, Seong-Whan

    2014-01-01

    Recently, there has been great interest in computer-aided diagnosis of Alzheimer’s disease (AD) and its prodromal stage, mild cognitive impairment (MCI). Unlike previous methods that considered simple low-level features such as gray matter tissue volumes from MRI and mean signal intensities from PET, in this paper we propose a deep learning-based latent feature representation with a stacked auto-encoder (SAE). We believe that there exist latent, complicated non-linear patterns inherent in the low-level features, such as relations among features. Combining the latent information with the original features helps build a robust model for AD/MCI classification with high diagnostic accuracy. Furthermore, thanks to the unsupervised character of the pre-training in deep learning, we can benefit from target-unrelated samples to initialize the parameters of the SAE, thus finding optimal parameters in fine-tuning with the target-related samples, and further enhancing the classification performance across four binary classification problems: AD vs. healthy normal control (HC), MCI vs. HC, AD vs. MCI, and MCI converter (MCI-C) vs. MCI non-converter (MCI-NC). In our experiments on the ADNI dataset, we validated the effectiveness of the proposed method, showing accuracies of 98.8, 90.7, 83.7, and 83.3 % for AD/HC, MCI/HC, AD/MCI, and MCI-C/MCI-NC classification, respectively. We believe that deep learning can shed new light on neuroimaging data analysis, and our work presented the applicability of this method to brain disease diagnosis. PMID:24363140
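The idea of augmenting low-level features with a learned latent representation can be sketched as follows. This is a heavily simplified stand-in for the paper's stacked auto-encoder: one hidden layer trained to reconstruct its own input (no stacking, pre-training corpus, or supervised fine-tuning), with synthetic data in place of MRI/PET features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# stand-in for low-level neuroimaging features (e.g. regional volumes/intensities)
X, y = make_classification(n_samples=400, n_features=30, n_informative=12,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# one-layer "auto-encoder": an MLP trained to reconstruct its own input;
# a true SAE stacks several such layers and fine-tunes end to end
ae = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
ae.fit(X_tr, X_tr)

def latent(X):
    # hidden-layer (ReLU) activations = the learned latent representation
    return np.maximum(0.0, X @ ae.coefs_[0] + ae.intercepts_[0])

# combine original and latent features, as the record advocates
clf = LogisticRegression(max_iter=1000)
clf.fit(np.hstack([X_tr, latent(X_tr)]), y_tr)
accuracy = clf.score(np.hstack([X_te, latent(X_te)]), y_te)
```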

  1. Artistic image analysis using graph-based learning approaches.

    PubMed

    Carneiro, Gustavo

    2013-08-01

    We introduce a new methodology for the problem of artistic image analysis, which among other tasks, involves the automatic identification of visual classes present in an artwork. In this paper, we advocate the idea that artistic image analysis must explore a graph that captures the network of artistic influences by computing the similarities in terms of appearance and manual annotation. One of the novelties of our methodology is the proposed formulation that is a principled way of combining these two similarities in a single graph. Using this graph, we show that an efficient random walk algorithm based on an inverted label propagation formulation produces more accurate annotation and retrieval results compared with the following baseline algorithms: bag of visual words, label propagation, matrix completion, and structural learning. We also show that the proposed approach leads to more efficient inference and training procedures. This experiment is run on a database containing 988 artistic images (with 49 visual classification problems divided into a multiclass problem with 27 classes and 48 binary problems), where we show the inference and training running times, and quantitative comparisons with respect to several retrieval and annotation performance measures.
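As a rough illustration of label propagation over a similarity graph (standard propagation only; the paper's inverted label propagation additionally fuses appearance and annotation similarities into one graph), a few labelled examples can be spread to many unlabelled ones. The toy data and all parameter values here are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

# toy stand-in: 200 "images", only 10 labelled; -1 marks unlabelled points
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)
y = np.full(200, -1)
rng = np.random.default_rng(0)
idx0 = rng.choice(np.where(y_true == 0)[0], 5, replace=False)
idx1 = rng.choice(np.where(y_true == 1)[0], 5, replace=False)
seed_idx = np.r_[idx0, idx1]
y[seed_idx] = y_true[seed_idx]

# propagate the few known labels through an RBF similarity graph
model = LabelPropagation(kernel="rbf", gamma=20)
model.fit(X, y)
accuracy = np.mean(model.transduction_ == y_true)
```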

  2. Multi-class Mode of Action Classification of Toxic Compounds Using Logic Based Kernel Methods.

    PubMed

    Lodhi, Huma; Muggleton, Stephen; Sternberg, Mike J E

    2010-09-17

    Toxicity prediction is essential for drug design and the development of effective therapeutics. In this paper we present an in silico strategy to identify the mode of action of toxic compounds, based on the use of a novel logic-based kernel method. The technique uses support vector machines in conjunction with kernels constructed from first-order rules induced by an Inductive Logic Programming system. It constructs multi-class models using a divide-and-conquer reduction strategy that splits the multi-class problem into binary subproblems and solves each one recursively, thereby generating an underlying decision list structure. In order to evaluate the effectiveness of the approach for chemoinformatics problems like predictive toxicology, we apply it to toxicity classification in aquatic systems. The method is used to identify and classify 442 compounds with respect to the mode of action. The experimental results show that the technique successfully classifies toxic compounds and can be useful in assessing environmental risks. Experimental comparison of the performance of the proposed multi-class scheme with the standard multi-class Inductive Logic Programming algorithm and multi-class Support Vector Machine yields statistically significant results and demonstrates the potential power and benefits of the approach in identifying compounds of various toxic mechanisms. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
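The divide-and-conquer reduction described above, peeling off one class at a time to form a decision list of binary classifiers, can be sketched generically. This uses plain SVMs on synthetic data, not the paper's ILP-derived kernels; class order and parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def fit_decision_list(X, y):
    """Peel off one class at a time with a binary SVM, recursing on the rest;
    the result is an ordered decision list of one-vs-remaining classifiers."""
    rules = []
    classes = list(np.unique(y))
    while len(classes) > 1:
        c = classes[0]
        clf = SVC().fit(X, (y == c).astype(int))  # c vs. the remaining classes
        rules.append((c, clf))
        keep = y != c                             # recurse on what is left
        X, y = X[keep], y[keep]
        classes.pop(0)
    return rules, classes[0]                      # the default class ends the list

def predict_decision_list(rules, default, X):
    out = np.full(len(X), default)
    undecided = np.ones(len(X), dtype=bool)
    for c, clf in rules:                          # walk the list top-down
        hit = undecided & (clf.predict(X) == 1)
        out[hit] = c
        undecided &= ~hit
    return out

X, y = make_classification(n_samples=450, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
rules, default = fit_decision_list(X, y)
accuracy = np.mean(predict_decision_list(rules, default, X) == y)
```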

  3. Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database.

    PubMed

    Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick

    2012-01-01

    We report on the original integration of an automatic text categorization pipeline, called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical document classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can essentially be described as a binary classification task, where a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign, and finally a set of answering components and entity recognizers for diseases and chemicals. The main components of the pipeline are publicly available both as a web application and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.

  4. Benign-malignant mass classification in mammogram using edge weighted local texture features

    NASA Astrophysics Data System (ADS)

    Rabidas, Rinku; Midya, Abhishek; Sadhu, Anup; Chakraborty, Jayasree

    2016-03-01

    This paper introduces the novel Discriminative Robust Local Binary Pattern (DRLBP) and Discriminative Robust Local Ternary Pattern (DRLTP) for the classification of mammographic masses as benign or malignant. Masses are a common yet challenging sign of breast cancer in mammography, and their diagnosis is a difficult task. DRLBP and DRLTP overcome the drawbacks of the Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) by discriminating a brighter object against a dark background and vice versa, while preserving edge information along with texture information; in this study, several edge-preserving texture features are therefore extracted from DRLBP and DRLTP. Finally, a Fisher Linear Discriminant Analysis method is applied to the discriminating features, selected by a stepwise logistic regression method, for the classification of benign and malignant masses. The performance characteristics of DRLBP and DRLTP features are evaluated using a ten-fold cross-validation technique with 58 masses from the mini-MIAS database, and the best result is observed with DRLBP, having an area under the receiver operating characteristic curve of 0.982.
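For reference, the plain Local Binary Pattern that DRLBP/DRLTP build on is easy to compute: each interior pixel's 8 neighbours are thresholded at the pixel's own value, the resulting bits are packed into an 8-bit code, and the histogram of codes serves as the texture feature vector. A minimal sketch (random image for illustration):

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 local binary pattern: threshold the 8 neighbours of every
    interior pixel at the centre's value and pack the bits into a code 0-255."""
    c = img[1:-1, 1:-1]
    # neighbour offsets in a fixed clockwise order starting at the top-left
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros(c.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[dy:dy + c.shape[0], dx:dx + c.shape[1]]
        codes |= (nb >= c).astype(np.int32) << bit
    return codes

def lbp_histogram(img):
    # normalised 256-bin histogram of codes = the texture feature vector
    h = np.bincount(lbp_image(img).ravel(), minlength=256)
    return h / h.sum()

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.int32)
feat = lbp_histogram(img)
```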

  5. Multiple confidence estimates as indices of eyewitness memory.

    PubMed

    Sauer, James D; Brewer, Neil; Weber, Nathan

    2008-08-01

    Eyewitness identification decisions are vulnerable to various influences on witnesses' decision criteria that contribute to false identifications of innocent suspects and failures to choose perpetrators. An alternative procedure using confidence estimates to assess the degree of match between novel and previously viewed faces was investigated. Classification algorithms were applied to participants' confidence data to determine when a confidence value or pattern of confidence values indicated a positive response. Experiment 1 compared confidence group classification accuracy with a binary decision control group's accuracy on a standard old-new face recognition task and found superior accuracy for the confidence group for target-absent trials but not for target-present trials. Experiment 2 used a face mini-lineup task and found reduced target-present accuracy offset by large gains in target-absent accuracy. Using a standard lineup paradigm, Experiments 3 and 4 also found improved classification accuracy for target-absent lineups and, with a more sophisticated algorithm, for target-present lineups. This demonstrates the accessibility of evidence for recognition memory decisions and points to a more sensitive index of memory quality than is afforded by binary decisions.

  6. The effect of class imbalance on case selection for case-based classifiers: An empirical study in the context of medical decision support

    PubMed Central

    Malof, Jordan M.; Mazurowski, Maciej A.; Tourassi, Georgia D.

    2013-01-01

    Case selection is a useful approach for increasing the efficiency and performance of case-based classifiers. Multiple techniques have been designed to perform case selection. This paper empirically investigates how class imbalance in the available set of training cases can impact the performance of the resulting classifier as well as properties of the selected set. In this study, the experiments are performed using a dataset for the problem of detecting breast masses in screening mammograms. The classification problem was binary and we used a k-nearest neighbor classifier. The classifier’s performance was evaluated using the Receiver Operating Characteristic (ROC) area under the curve (AUC) measure. The experimental results indicate that although class imbalance reduces the performance of the derived classifier and the effectiveness of selection at improving overall classifier performance, case selection can still be beneficial, regardless of the level of class imbalance. PMID:21820273
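The evaluation setup above, a k-nearest neighbor classifier scored by ROC AUC on an imbalanced binary problem, can be reproduced in miniature. Synthetic data stands in for the mammography cases and all parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# imbalanced binary problem (about 10% positives), echoing the mass-detection setup
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=7).fit(X_tr, y_tr)
# ROC AUC scores the ranking of predicted probabilities, so it is far more
# informative than raw accuracy when one class dominates
auc = roc_auc_score(y_te, knn.predict_proba(X_te)[:, 1])
```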

  7. BOREAS TE-18 Landsat TM Maximum Likelihood Classification Image of the NSA

    NASA Technical Reports Server (NTRS)

    Hall, Forrest G. (Editor); Knapp, David

    2000-01-01

    The BOREAS TE-18 team focused its efforts on using remotely sensed data to characterize the successional and disturbance dynamics of the boreal forest for use in carbon modeling. The objective of this classification is to provide the BOREAS investigators with a data product that characterizes the land cover of the NSA. A Landsat-5 TM image from 20-Aug-1988 was used to derive this classification. A standard supervised maximum likelihood classification approach was used to produce this classification. The data are provided in a binary image format file. The data files are available on a CD-ROM (see document number 20010000884), or from the Oak Ridge National Laboratory (ORNL) Distributed Activity Archive Center (DAAC).

  8. Beyond the Binary: Dexterous Teaching and Knowing in Mathematics Education

    ERIC Educational Resources Information Center

    Adam, Raoul; Chigeza, Philemon

    2015-01-01

    This paper identifies binary oppositions in the discourse of mathematics education and introduces a binary-epistemic model for (re)conceptualising these oppositions and the epistemic-pedagogic problems they represent. The model is attentive to the contextual relationships between pedagogically relevant binaries (e.g., traditional/progressive,…

  9. A fast image retrieval method based on SVM and imbalanced samples in filtering multimedia message spam

    NASA Astrophysics Data System (ADS)

    Chen, Zhang; Peng, Zhenming; Peng, Lingbing; Liao, Dongyi; He, Xin

    2011-11-01

    With the rapid development of the Multimedia Messaging Service (MMS), filtering Multimedia Message (MM) spam effectively in real time has become an urgent task. Because most MMs contain images or videos, this paper presents an image-retrieval-based method for filtering MM spam. The detection method combines skin-color detection, texture detection, and face detection, and the classifier for this imbalanced problem is a fast multi-class scheme combining support vector machines (SVM) with a unilateral binary decision tree. Experiments on 3 test sets show that the proposed method is effective, with an interception rate of up to 60% and an average detection time of less than 1 second per image.

  10. A Scatter-Based Prototype Framework and Multi-Class Extension of Support Vector Machines

    PubMed Central

    Jenssen, Robert; Kloft, Marius; Zien, Alexander; Sonnenburg, Sören; Müller, Klaus-Robert

    2012-01-01

    We provide a novel interpretation of the dual of support vector machines (SVMs) in terms of scatter with respect to class prototypes and their mean. As a key contribution, we extend this framework to multiple classes, providing a new joint Scatter SVM algorithm, on par with its binary counterpart in the number of optimization variables. This enables us to implement computationally efficient solvers based on sequential minimal and chunking optimization. As a further contribution, the primal problem formulation is developed in terms of regularized risk minimization and the hinge loss, revealing the score function to be used in the actual classification of test patterns. We investigate Scatter SVM properties related to generalization ability, computational efficiency, sparsity and sensitivity maps, and report promising results. PMID:23118845

  11. High-Throughput Classification of Radiographs Using Deep Convolutional Neural Networks.

    PubMed

    Rajkomar, Alvin; Lingam, Sneha; Taylor, Andrew G; Blum, Michael; Mongan, John

    2017-02-01

    The study aimed to determine if computer vision techniques rooted in deep learning can use a small set of radiographs to perform clinically relevant image classification with high fidelity. One thousand eight hundred eighty-five chest radiographs on 909 patients obtained between January 2013 and July 2015 at our institution were retrieved and anonymized. The source images were manually annotated as frontal or lateral and randomly divided into training, validation, and test sets. Training and validation sets were augmented to over 150,000 images using standard image manipulations. We then pre-trained a series of deep convolutional networks based on the open-source GoogLeNet with various transformations of the open-source ImageNet (non-radiology) images. These trained networks were then fine-tuned using the original and augmented radiology images. The model with highest validation accuracy was applied to our institutional test set and a publicly available set. Accuracy was assessed by using the Youden Index to set a binary cutoff for frontal or lateral classification. This retrospective study was IRB approved prior to initiation. A network pre-trained on 1.2 million greyscale ImageNet images and fine-tuned on augmented radiographs was chosen. The binary classification method correctly classified 100 % (95 % CI 99.73-100 %) of both our test set and the publicly available images. Classification was rapid, at 38 images per second. A deep convolutional neural network created using non-radiological images, and an augmented set of radiographs is effective in highly accurate classification of chest radiograph view type and is a feasible, rapid method for high-throughput annotation.
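The Youden-index cutoff used above is the threshold that maximises J = sensitivity + specificity - 1, i.e. TPR - FPR along the ROC curve. A small sketch on illustrative scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

# toy scores: the positive class tends to score higher (illustrative data)
rng = np.random.default_rng(0)
y_true = np.r_[np.zeros(500, dtype=int), np.ones(500, dtype=int)]
scores = np.r_[rng.normal(0.3, 0.15, 500), rng.normal(0.7, 0.15, 500)]

# Youden's J = TPR - FPR; the threshold that maximises J is the usual
# "Youden index" operating point on the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, scores)
j = tpr - fpr
cutoff = thresholds[np.argmax(j)]
y_pred = (scores >= cutoff).astype(int)
```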

  12. Drawing a baseline in aesthetic quality assessment

    NASA Astrophysics Data System (ADS)

    Rubio, Fernando; Flores, M. Julia; Puerta, Jose M.

    2018-04-01

    Aesthetic classification of images is an inherently subjective task. There is no validated collection of images/photographs labeled by experts as having good or bad quality. At present, the closest approximation is to use databases of photos where a group of users rates each image. Hence, there is not a unique good/bad label but a rating distribution produced by users' votes. Due to this peculiarity, the problem of binary aesthetic supervised classification cannot be stated as directly as in other computer vision tasks. Recent literature follows an approach in which researchers take the average user rating for each image and establish an arbitrary threshold to determine its class or label: images above the threshold are considered of good quality, while images below it are considered of bad quality. This paper analyzes the current literature and reviews the attributes able to represent an image, grouping them into three families: specific, general and deep features. Among those which have proved most competitive, we have selected a representative subset, our main goal being to establish a clear experimental framework. Finally, once features were selected, we used them on the full AVA dataset. We remark that for validation we report not only accuracy values, which are not very informative in this case, but also metrics able to evaluate classification power on imbalanced datasets. We have conducted a series of experiments in which distinct well-known classifiers are learned from the data. In this way, the paper provides what we consider valuable and valid baseline results for the given problem.
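The labelling convention described above, thresholding per-image mean ratings to obtain binary classes, together with imbalance-aware metrics, can be sketched as follows. The ratings and "predictions" are simulated; the 5.0 cutoff mirrors a common choice in the literature but is, as the record notes, arbitrary.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score

# simulated per-image mean user ratings on a 1-10 scale (AVA-style)
rng = np.random.default_rng(0)
mean_ratings = np.clip(rng.normal(5.4, 1.2, size=1000), 1, 10)

threshold = 5.0                                   # the (arbitrary) literature cutoff
labels = (mean_ratings > threshold).astype(int)   # 1 = good quality, 0 = bad

# stand-in predictions (a real system would predict from image features)
pred = (mean_ratings + rng.normal(0, 0.8, 1000) > threshold).astype(int)

# report imbalance-aware metrics alongside plain accuracy
accuracy = np.mean(pred == labels)
bacc = balanced_accuracy_score(labels, pred)
f1 = f1_score(labels, pred)
```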

  13. Iterative quantization: a Procrustean approach to learning binary codes for large-scale image retrieval.

    PubMed

    Gong, Yunchao; Lazebnik, Svetlana; Gordo, Albert; Perronnin, Florent

    2013-12-01

    This paper addresses the problem of learning similarity-preserving binary codes for efficient similarity search in large-scale image collections. We formulate this problem in terms of finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube, and propose a simple and efficient alternating minimization algorithm to accomplish this task. This algorithm, dubbed iterative quantization (ITQ), has connections to multiclass spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). The resulting binary codes significantly outperform several other state-of-the-art methods. We also show that further performance improvements can result from transforming the data with a nonlinear kernel mapping prior to PCA or CCA. Finally, we demonstrate an application of ITQ to learning binary attributes or "classemes" on the ImageNet data set.
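The ITQ alternating minimisation is compact enough to sketch: project onto the top principal components, then alternate between quantizing to the nearest binary hypercube vertex and solving an orthogonal Procrustes problem for the rotation. A minimal unsupervised (PCA-based) version on random data; code length and iteration count are illustrative.

```python
import numpy as np

def itq(X, n_bits=16, n_iter=50, seed=0):
    """Iterative quantization: PCA-project, then learn a rotation that
    minimises the quantization error to binary hypercube vertices."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # zero-centre the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Xc @ Vt[:n_bits].T                   # PCA projection to n_bits dims
    R, _ = np.linalg.qr(rng.normal(size=(n_bits, n_bits)))  # random orthogonal init
    for _ in range(n_iter):
        B = np.sign(V @ R)                   # fix R: quantize to +/-1
        B[B == 0] = 1
        U, _, Wt = np.linalg.svd(V.T @ B)    # fix B: orthogonal Procrustes
        R = U @ Wt                           # rotation minimising ||B - V R||_F
    codes = (V @ R > 0)                      # final binary codes
    return codes, R

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 64))
codes, R = itq(X, n_bits=16)
```

The supervised variant in the paper simply replaces the PCA embedding with a CCA embedding; the alternating loop is unchanged.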

  14. Application of local binary pattern and human visual Fibonacci texture features for classification different medical images

    NASA Astrophysics Data System (ADS)

    Sanghavi, Foram; Agaian, Sos

    2017-05-01

    The goal of this paper is to (a) test the nuclei-based Computer Aided Cancer Detection system using a human visual based system on histopathology images and (b) compare the results of the proposed system with the Local Binary Pattern and modified Fibonacci-p pattern systems. The system performance is evaluated using different parameters such as accuracy, specificity, sensitivity, positive predictive value, and negative predictive value on 251 prostate histopathology images. An accuracy of 96.69% was observed for cancer detection using the proposed human visual based system, compared to 87.42% and 94.70% observed for Local Binary Patterns and the modified Fibonacci-p patterns, respectively.

  15. Discriminative Learning of Receptive Fields from Responses to Non-Gaussian Stimulus Ensembles

    PubMed Central

    Meyer, Arne F.; Diepenbrock, Jan-Philipp; Happel, Max F. K.; Ohl, Frank W.; Anemüller, Jörn

    2014-01-01

    Analysis of sensory neurons' processing characteristics requires simultaneous measurement of presented stimuli and concurrent spike responses. The functional transformation from high-dimensional stimulus space to the binary space of spike and non-spike responses is commonly described with linear-nonlinear models, whose linear filter component describes the neuron's receptive field. From a machine learning perspective, this corresponds to the binary classification problem of discriminating spike-eliciting from non-spike-eliciting stimulus examples. The classification-based receptive field (CbRF) estimation method proposed here adapts a linear large-margin classifier to optimally predict experimental stimulus-response data and subsequently interprets learned classifier weights as the neuron's receptive field filter. Computational learning theory provides a theoretical framework for learning from data and guarantees optimality in the sense that the risk of erroneously assigning a spike-eliciting stimulus example to the non-spike class (and vice versa) is minimized. Efficacy of the CbRF method is validated with simulations and for auditory spectro-temporal receptive field (STRF) estimation from experimental recordings in the auditory midbrain of Mongolian gerbils. Acoustic stimulation is performed with frequency-modulated tone complexes that mimic properties of natural stimuli, specifically non-Gaussian amplitude distribution and higher-order correlations. Results demonstrate that the proposed approach successfully identifies correct underlying STRFs, even in cases where second-order methods based on the spike-triggered average (STA) do not. Applied to small data samples, the method is shown to converge on smaller amounts of experimental recordings and with lower estimation variance than the generalized linear model and recent information theoretic methods. 
Thus, CbRF estimation may prove useful for investigation of neuronal processes in response to natural stimuli and in settings where rapid adaptation is induced by experimental design. PMID:24699631
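The core CbRF recipe, train a linear large-margin classifier to separate spike-eliciting from non-spike-eliciting stimuli and read off its weight vector as the receptive-field estimate, can be sketched on simulated data. The toy neuron below is a linear-nonlinear model with an assumed sinusoidal filter; Gaussian stimuli are used for brevity, whereas the paper emphasises non-Gaussian ensembles.

```python
import numpy as np
from sklearn.svm import LinearSVC

# simulate a neuron: linear filter (the "receptive field") + threshold nonlinearity
rng = np.random.default_rng(0)
true_rf = np.sin(np.linspace(0, 3 * np.pi, 40))   # assumed ground-truth filter
stimuli = rng.normal(size=(3000, 40))             # stimulus ensemble
drive = stimuli @ true_rf + rng.normal(0, 1.0, 3000)
spikes = (drive > 1.0).astype(int)                # spike / no-spike labels

# CbRF idea: large-margin classifier weights serve as the RF estimate
clf = LinearSVC(C=0.1, max_iter=10000).fit(stimuli, spikes)
rf_estimate = clf.coef_.ravel()
similarity = np.corrcoef(rf_estimate, true_rf)[0, 1]
```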

  16. Discriminative learning of receptive fields from responses to non-Gaussian stimulus ensembles.

    PubMed

    Meyer, Arne F; Diepenbrock, Jan-Philipp; Happel, Max F K; Ohl, Frank W; Anemüller, Jörn

    2014-01-01

    Analysis of sensory neurons' processing characteristics requires simultaneous measurement of presented stimuli and concurrent spike responses. The functional transformation from high-dimensional stimulus space to the binary space of spike and non-spike responses is commonly described with linear-nonlinear models, whose linear filter component describes the neuron's receptive field. From a machine learning perspective, this corresponds to the binary classification problem of discriminating spike-eliciting from non-spike-eliciting stimulus examples. The classification-based receptive field (CbRF) estimation method proposed here adapts a linear large-margin classifier to optimally predict experimental stimulus-response data and subsequently interprets learned classifier weights as the neuron's receptive field filter. Computational learning theory provides a theoretical framework for learning from data and guarantees optimality in the sense that the risk of erroneously assigning a spike-eliciting stimulus example to the non-spike class (and vice versa) is minimized. Efficacy of the CbRF method is validated with simulations and for auditory spectro-temporal receptive field (STRF) estimation from experimental recordings in the auditory midbrain of Mongolian gerbils. Acoustic stimulation is performed with frequency-modulated tone complexes that mimic properties of natural stimuli, specifically non-Gaussian amplitude distribution and higher-order correlations. Results demonstrate that the proposed approach successfully identifies correct underlying STRFs, even in cases where second-order methods based on the spike-triggered average (STA) do not. Applied to small data samples, the method is shown to converge on smaller amounts of experimental recordings and with lower estimation variance than the generalized linear model and recent information theoretic methods. 
Thus, CbRF estimation may prove useful for investigation of neuronal processes in response to natural stimuli and in settings where rapid adaptation is induced by experimental design.

  17. Gene masking - a technique to improve accuracy for cancer classification with high dimensionality in microarray data.

    PubMed

    Saini, Harsh; Lal, Sunil Pranit; Naidu, Vimal Vikash; Pickering, Vincel Wince; Singh, Gurmeet; Tsunoda, Tatsuhiko; Sharma, Alok

    2016-12-05

    High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Gene masking is implemented via a binary-encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby allowing researchers to isolate features that may have special significance. This technique was applied on publicly available datasets whereby it substantially reduced the number of features used for classification while maintaining high accuracies. The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers.
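Gene masking via a binary-encoded genetic algorithm can be sketched generically: each individual is a bit mask over features, fitness is the cross-validated accuracy of a classifier on the masked data, and standard selection, crossover, and mutation evolve the masks. All parameter values below (population size, rates, classifier) are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# dataset with many irrelevant "genes" (features)
X, y = make_classification(n_samples=200, n_features=40, n_informative=6,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated accuracy of a classifier restricted to masked features."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# tiny GA over binary masks: tournament selection, uniform crossover, bit-flip mutation
pop = rng.random((20, 40)) < 0.5
for _ in range(15):
    scores = np.array([fitness(m) for m in pop])
    new = [pop[scores.argmax()].copy()]                 # elitism: keep the best
    while len(new) < len(pop):
        i, j = rng.integers(0, len(pop), 2)
        p1 = pop[i] if scores[i] >= scores[j] else pop[j]
        i, j = rng.integers(0, len(pop), 2)
        p2 = pop[i] if scores[i] >= scores[j] else pop[j]
        child = np.where(rng.random(40) < 0.5, p1, p2)  # uniform crossover
        child ^= rng.random(40) < 0.02                  # bit-flip mutation
        new.append(child)
    pop = np.array(new)

best = pop[np.array([fitness(m) for m in pop]).argmax()]
masked_score = fitness(best)
```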

  18. A comparison of blood vessel features and local binary patterns for colorectal polyp classification

    NASA Astrophysics Data System (ADS)

    Gross, Sebastian; Stehle, Thomas; Behrens, Alexander; Auer, Roland; Aach, Til; Winograd, Ron; Trautwein, Christian; Tischendorf, Jens

    2009-02-01

    Colorectal cancer is the third leading cause of cancer deaths in the United States of America for both women and men. By means of early detection, the five-year survival rate can be up to 90%. Polyps can be grouped into three different classes: hyperplastic, adenomatous, and carcinomatous polyps. Hyperplastic polyps are benign and are not likely to develop into cancer. Adenomas, on the other hand, are known to grow into cancer (adenoma-carcinoma sequence). Carcinomas are fully developed cancers and can be easily distinguished from adenomas and hyperplastic polyps. A recent narrow band imaging (NBI) study by Tischendorf et al. has shown that hyperplastic polyps and adenomas can be discriminated by their blood vessel structure. We designed a computer-aided system for the differentiation between hyperplastic and adenomatous polyps. Our development aim is to provide the medical practitioner with an additional objective interpretation of the available image data as well as a confidence measure for the classification. We propose classification features calculated on the basis of the extracted blood vessel structure. We use the combined length of the detected blood vessels, the average perimeter of the vessels and their average gray level value. We achieve a successful classification rate of more than 90% on 102 polyps from our polyp database. The classification results based on these features are compared to the results of Local Binary Patterns (LBP). The results indicate that the implemented features are superior to LBP.

  19. The iron complex in high mass X-ray binaries

    NASA Astrophysics Data System (ADS)

    Giménez-García, A.; Torrejón, J. M.; Martínez-Núñez, S.; Rodes-Rocas, J. J.; Bernabéu, G.

    2013-05-01

    An X-ray binary system consists of a compact object (a white dwarf, a neutron star or a black hole) accreting material from an optical companion star. The spectral type of the optical component strongly affects the mass transfer to the compact object. This is why X-ray binary systems are usually divided into High Mass X-ray Binaries (companion of O or B type, denoted HMXB) and Low Mass X-ray Binaries (companion of type A or later). The HMXB are divided into two main groups according to the companion's luminosity class: the Supergiant X-ray Binaries (SGXB) and the Be X-ray Binaries (BeXB). We introduce the spectral characterization of a sample of 9 High Mass X-ray Binaries in the iron complex (˜ 6-7 keV). This spectral range is a fundamental tool in the study of the material surrounding these systems. The sources have been divided into three main groups according to their current standard classification: SGXB, BeXB and γ Cassiopeae-like. The purpose of this work is to look for qualitative patterns in the iron complex, around 6-7 keV, in order to discern between the different classes that currently make up the group of HMXB. We find significant spectral patterns for each of the sets, reflecting differences in their accretion physics.

  20. Resonant dynamics of gravitationally bound pair of binaries: the case of 1:1 resonance

    NASA Astrophysics Data System (ADS)

    Breiter, Slawomir; Vokrouhlický, David

    2018-04-01

    The work presents a study of the 1:1 resonance case in a hierarchical quadruple stellar system of the 2+2 type. The resonance appears if the orbital periods of the two binaries are approximately equal. It is assumed that both periods are significantly shorter than the period of the principal orbit of one binary with respect to the other. In these circumstances, the problem can be treated as three independent Kepler problems perturbed by mutual gravitational interactions. By means of canonical perturbation methods, the planar problem is reduced to a secular system with 1 degree of freedom involving a resonance angle (the difference of the mean longitudes of the binaries) and its conjugate momentum (involving the ratio of the orbital period in one binary to the period of the principal orbit). The resonant model is supplemented with short-period perturbation expressions and verified by comparison with numerical integration of the original equations of motion. Estimates of the variations of the binaries' periods indicate that the effect is rather weak, but possibly detectable if it occurs in a moderately compact system. However, the analysis of resonance capture scenarios implies that the 1:1 resonance should be exceptional amongst the 2+2 quadruples.

  1. Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure

    PubMed Central

    Berisha, Visar; Wisler, Alan; Hero, Alfred O.; Spanias, Andreas

    2015-01-01

    Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks. PMID:26807014
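A nonparametric divergence of the kind used for such bounds can be estimated from a Euclidean minimum spanning tree over the pooled samples, via the Friedman-Rafsky cross-match count. The following is a hedged sketch of that idea (the plug-in form `1 - R(m+n)/(2mn)` and the function name are assumptions, not necessarily the paper's exact estimator):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def dp_divergence(X, Y):
    """MST-based divergence sketch: pool the two samples, build a
    Euclidean minimum spanning tree, and count edges that join points
    from different samples. Few cross edges => well-separated samples."""
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    mst = minimum_spanning_tree(cdist(Z, Z))
    rows, cols = mst.nonzero()
    cross = np.sum(labels[rows] != labels[cols])  # cross-sample MST edges
    n, m = len(X), len(Y)
    # plug-in estimate, clipped at zero; equals ~0 for identical
    # distributions and approaches 1 for disjoint supports
    return max(0.0, 1.0 - cross * (n + m) / (2.0 * n * m))
```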

  2. The EB factory project. I. A fast, neural-net-based, general purpose light curve classifier optimized for eclipsing binaries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paegert, Martin; Stassun, Keivan G.; Burger, Dan M.

    2014-08-01

    We describe a new neural-net-based light curve classifier and provide it with documentation as a ready-to-use tool for the community. While optimized for identification and classification of eclipsing binary stars, the classifier is general purpose, and has been developed for speed in the context of upcoming massive surveys such as the Large Synoptic Survey Telescope. A challenge for classifiers in the context of neural-net training and massive data sets is to minimize the number of parameters required to describe each light curve. We show that a simple and fast geometric representation that encodes the overall light curve shape, together with a chi-square parameter to capture higher-order morphology information, results in efficient yet robust light curve classification, especially for eclipsing binaries. Testing the classifier on the ASAS light curve database, we achieve a retrieval rate of 98% and a false-positive rate of 2% for eclipsing binaries. We achieve similarly high retrieval rates for most other periodic variable-star classes, including RR Lyrae, Mira, and delta Scuti. However, the classifier currently has difficulty discriminating between different sub-classes of eclipsing binaries, and suffers a relatively low (∼60%) retrieval rate for multi-mode delta Cepheid stars. We find that it is imperative to train the classifier's neural network with exemplars that include the full range of light curve quality on which the classifier will be expected to perform; the classifier performs well on noisy light curves only when trained with noisy exemplars. The classifier source code, ancillary programs, a trained neural net, and a guide for use are provided.

  3. Compact and Hybrid Feature Description for Building Extraction

    NASA Astrophysics Data System (ADS)

    Li, Z.; Liu, Y.; Hu, Y.; Li, P.; Ding, Y.

    2017-05-01

    Building extraction in aerial orthophotos is crucial for various applications. Currently, deep learning has been shown to be successful in addressing building extraction with high accuracy and high robustness. However, a large number of samples is required to train a classifier when using a deep learning model. In order to realize accurate and semi-interactive labelling, the performance of the feature description is crucial, as it has a significant effect on the accuracy of classification. In this paper, we bring forward a compact and hybrid feature description method in order to guarantee desirable classification accuracy for the corners on building roof contours. The proposed descriptor is a hybrid description of an image patch constructed from 4 sets of binary intensity tests. Experiments show that, benefiting from binary description and making full use of the color channels, this descriptor is not only computationally frugal but also more accurate than SURF for building extraction.
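A descriptor built from binary intensity tests can be sketched in the BRIEF style: compare intensities at random point pairs inside a patch and pack the outcomes into bits. This is a single-channel simplification (the paper uses 4 sets of tests over color channels); the patch size, test count, and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
PATCH, N_TESTS = 32, 256        # patch side and descriptor bits (assumed)
# random test-point pairs, drawn once and reused for every patch
pairs = rng.integers(0, PATCH, size=(N_TESTS, 4))

def describe(patch):
    """BRIEF-style descriptor: one bit per pairwise intensity comparison."""
    r1, c1, r2, c2 = pairs.T
    return np.packbits(patch[r1, c1] < patch[r2, c2])

def hamming(d1, d2):
    """Descriptor distance: number of differing bits."""
    return int(np.unpackbits(d1 ^ d2).sum())
```

Matching such descriptors reduces to XOR plus a popcount, which is what makes binary descriptors computationally frugal.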

  4. Hyperspectral imaging for differentiation of foreign materials from pinto beans

    NASA Astrophysics Data System (ADS)

    Mehrubeoglu, Mehrube; Zemlan, Michael; Henry, Sam

    2015-09-01

    Food safety and quality in packaged products are paramount in the food processing industry. To ensure that packaged products are free of foreign materials, such as debris and pests, unwanted materials mixed with the targeted products must be detected before packaging. A portable hyperspectral imaging system in the visible-to-NIR range has been used to acquire hyperspectral data cubes from pinto beans that have been mixed with foreign matter. Bands and band ratios have been identified as effective features to develop a classification scheme for detection of foreign materials in pinto beans. A support vector machine has been implemented with a quadratic kernel to separate pinto beans and background (Class 1) from all other materials (Class 2) in each scene. After creating a binary classification map for the scene, further analysis of these binary images allows separation of false positives from true positives for proper removal action during packaging.
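The classification step described above (an SVM with a quadratic kernel separating Class 1 from Class 2) can be sketched with scikit-learn's polynomial kernel of degree 2. The synthetic band-ratio features below are stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# stand-ins for per-pixel band / band-ratio feature vectors
beans = rng.normal(0.0, 0.3, (200, 3))       # Class 1: beans + background
foreign = rng.normal(2.0, 0.3, (200, 3))     # Class 2: foreign material
X = np.vstack([beans, foreign])
y = np.r_[np.zeros(200), np.ones(200)]

# quadratic kernel: polynomial kernel of degree 2
clf = SVC(kernel="poly", degree=2, coef0=1.0).fit(X, y)
# score some new "foreign material" pixels
pred = clf.predict(rng.normal(2.0, 0.3, (10, 3)))
```

Thresholding the per-pixel predictions over a scene yields the binary classification map that is then post-processed to remove false positives.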

  5. The Upper and Lower Bounds of the Prediction Accuracies of Ensemble Methods for Binary Classification

    PubMed Central

    Wang, Xueyi; Davidson, Nicholas J.

    2011-01-01

    Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we establish several results about the prediction accuracies of ensemble methods for binary classification that have been missed or misinterpreted in the previous literature. First we show the upper and lower bounds of the prediction accuracies (i.e. the best and worst possible prediction accuracies) of ensemble methods. Next we show that an ensemble method can achieve > 0.5 prediction accuracy even when the individual classifiers have < 0.5 prediction accuracies. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. We perform two experiments to verify the results and show that the upper- and lower-bound accuracies are hard to achieve with random individual classifiers, so better algorithms need to be developed. PMID:21853162
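The basic phenomenon behind such bounds is easy to simulate. The sketch below only covers the simplest case of independent voters, each correct with probability 0.6, so it illustrates why an ensemble can beat its members; the paper's bounds concern the general (dependent) case.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clf, n_trials, p = 25, 100_000, 0.6
# each of 25 independent classifiers is correct with probability 0.6
correct = rng.random((n_clf, n_trials)) < p
# the ensemble answers correctly when a majority of voters is correct
ensemble_acc = (correct.sum(axis=0) > n_clf // 2).mean()
# ensemble accuracy clearly exceeds every individual's 0.6
```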

  6. Gravitational Wave (GW) Classification, Space GW Detection Sensitivities and AMIGO (Astrodynamical Middle-frequency Interferometric GW Observatory)

    NASA Astrophysics Data System (ADS)

    Ni, Wei-Tou

    2018-01-01

    After first reviewing the gravitational wave (GW) spectral classification, we discuss the sensitivities of GW detection in space aimed at the low-frequency band (100 nHz-100 mHz) and the middle-frequency band (100 mHz-10 Hz). The science goals are to detect GWs from (i) Supermassive Black Holes; (ii) Extreme-Mass-Ratio Black Hole Inspirals; (iii) Intermediate-Mass Black Holes; (iv) Galactic Compact Binaries; (v) Stellar-Size Black Hole Binaries; and (vi) the Relic GW Background. The detector proposals have arm lengths ranging from 100 km to 1.35 × 10^9 km (9 AU), including (a) Solar orbiting detectors and (b) Earth orbiting detectors. We discuss especially the sensitivities in the frequency band 0.1-10 μHz and the middle frequency band (0.1 Hz-10 Hz). We propose and discuss AMIGO as an Astrodynamical Middle-frequency Interferometric GW Observatory.

  7. Optimal threshold estimation for binary classifiers using game theory.

    PubMed

    Sanchez, Ignacio Enrique

    2016-01-01

    Many bioinformatics algorithms can be understood as binary classifiers. They are usually compared using the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1 - FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of "specificity equals sensitivity" maximizes robustness against uncertainties in the abundance of positives in nature and in classification costs.
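The operating point described above, where the ROC curve crosses the descending diagonal, is simply the threshold at which sensitivity equals specificity (TPR = 1 - FPR). A minimal empirical sketch (the function name is ours, not the paper's):

```python
import numpy as np

def minimax_threshold(scores, labels):
    """Return the score threshold where sensitivity equals specificity,
    i.e. where the empirical ROC curve crosses the descending diagonal.
    Predicting 'positive' means score >= threshold."""
    order = np.argsort(scores)
    s, y = scores[order], labels[order]
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    # TPR/FPR when thresholding at each sorted score s[i]
    tpr = (n_pos - np.cumsum(y) + y) / n_pos
    fpr = (n_neg - np.cumsum(1 - y) + (1 - y)) / n_neg
    # pick the threshold minimizing |TPR - (1 - FPR)|
    i = np.argmin(np.abs(tpr - (1.0 - fpr)))
    return s[i]
```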

  8. Hyperspectral feature mapping classification based on mathematical morphology

    NASA Astrophysics Data System (ADS)

    Liu, Chang; Li, Junwei; Wang, Guangping; Wu, Jingli

    2016-03-01

    This paper proposes a hyperspectral feature mapping classification algorithm based on mathematical morphology. Without a priori information such as a spectral library, the spectral and spatial information can be used to realize hyperspectral feature mapping classification. Mathematical morphological erosion and dilation operations are performed to extract endmembers. The spectral feature mapping algorithm is then used to carry out hyperspectral image classification. A hyperspectral image collected by AVIRIS is used to evaluate the proposed algorithm, which is compared with the minimum Euclidean distance mapping algorithm, the minimum Mahalanobis distance mapping algorithm, the SAM algorithm and the binary encoding mapping algorithm. The experiments show that the proposed algorithm performs better than the other algorithms under the same conditions and has higher classification accuracy.
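One of the baselines mentioned, the spectral angle mapper (SAM), assigns each pixel spectrum to the endmember with the smallest angle between the two spectra. A compact sketch of SAM assignment (names are ours):

```python
import numpy as np

def sam_classify(pixels, endmembers):
    """Spectral Angle Mapper: assign each pixel spectrum (row of
    `pixels`) to the endmember whose spectral angle to it is smallest.
    The angle is invariant to per-pixel illumination scaling."""
    p = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    e = endmembers / np.linalg.norm(endmembers, axis=1, keepdims=True)
    angles = np.arccos(np.clip(p @ e.T, -1.0, 1.0))  # (n_pixels, n_end)
    return angles.argmin(axis=1)
```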

  9. Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature

    PubMed Central

    2011-01-01

    Background The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings, became one of the primary subjects of the recent BioCreative III contest. The contest's Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. Results We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task's development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthews Correlation Coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and was also competitive according to other metrics. Conclusions Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT.
Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance. PMID:22151769
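The conversion described in the conclusions, turning a multi-class, multi-label problem into binary decisions over (document, candidate label) pairs, can be sketched as below. All names here are hypothetical illustrations, not the authors' code.

```python
def to_binary_instances(docs, methods, annotations):
    """Convert a multi-label task to binary classification: every
    (document, candidate method) pair becomes one example, labelled
    positive iff that method is annotated for that document."""
    X, y = [], []
    for doc in docs:
        for method in methods:
            X.append((doc, method))
            y.append(int(method in annotations.get(doc, set())))
    return X, y
```

A single binary classifier trained on such pairs can then score any candidate method for any document, which is what allows pair-wise relation features to be used.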

  10. Public challenge and endorsement of sex category ambiguity in online debate: 'The sooner people stop thinking that gender is a matter of choice the better'.

    PubMed

    Sweeting, Helen; Maycock, Matthew William; Walker, Laura; Hunt, Kate

    2017-03-01

    Despite academic feminist debate over several decades, the binary nature of sex as a (perhaps the) primary social classification is often taken for granted, as is the assumption that individuals can be unproblematically assigned a biological sex at birth. This article presents analysis of online debate on the BBC news website in November 2013, comprising 864 readers' responses to an article entitled 'Germany allows 'indeterminate' gender at birth'. It explores how discourse reflecting Western essentialist beliefs about people having one sex or 'the other' is maintained in debates conducted in this online public space. Comments were coded thematically and are presented under five sub-headings: overall evaluation of the German law; discussing and disputing statistics and 'facts'; binary categorisations; religion and politics; and 'conversations' and threads. Although for many the mapping of binary sex onto gender was unquestionable, this view was strongly disputed by commentators who questioned the meanings of 'natural' and 'normal', raised the possibility of removing societal binary male-female distinctions or saw maleness-femaleness as a continuum. While recognising that online commentators are anonymous and can control their self-presentation, this animated discussion suggests that social classifications as male or female, even if questioned, remain fundamental in public debate in the early 21st century. © 2016 The Authors. Sociology of Health & Illness published by John Wiley & Sons Ltd on behalf of Foundation for SHIL.

  11. Observations, Analysis, and Spectroscopic Classification of HO Piscium: A Bright Shallow-Contact Binary with G- and M-Type Components

    NASA Astrophysics Data System (ADS)

    Samec, Ronald G.; Smith, Paul M.; Robb, Russell; Faulkner, Danny R.; Van Hamme, W.

    2012-07-01

    We present a spectrum and a photometric analysis of the newly discovered, high-amplitude, solar-type, eclipsing binary HO Piscium. A spectroscopic identification, a period study, q-search, and a simultaneous UBVRc Ic light-curve solution are presented. The spectra and our photometric solution indicate that HO Psc is a W-type W UMa shallow-contact (fill-out ˜8%) binary system. The primary component has a G6V spectral type with an apparently precontact spectral type of M2V for the secondary component. The small fill-out indicates that the system has not yet achieved thermal contact and thus has recently come into physical contact. This may mean that this solar-type binary system has not attained its ˜0.4 mass ratio via a long period of magnetic braking, as would normally be assumed.

  12. Combinatorial Optimization Algorithms for Dynamic Multiple Fault Diagnosis in Automotive and Aerospace Applications

    NASA Astrophysics Data System (ADS)

    Kodali, Anuradha

    In this thesis, we develop dynamic multiple fault diagnosis (DMFD) algorithms to diagnose faults that are sporadic and coupled. Firstly, we formulate a coupled factorial hidden Markov model-based (CFHMM) framework to diagnose dependent faults occurring over time (dynamic case). Here, we implement a mixed memory Markov coupling model to determine the most likely sequence of (dependent) fault states, the one that best explains the observed test outcomes over time. An iterative Gauss-Seidel coordinate ascent optimization method is proposed for solving the problem. A soft Viterbi algorithm is also implemented within the framework for decoding dependent fault states over time. We demonstrate the algorithm on simulated and real-world systems with coupled faults; the results show that this approach improves the correct isolation rate as compared to the formulation where independent fault states are assumed. Secondly, we formulate a generalization of set-covering, termed dynamic set-covering (DSC), which involves a series of coupled set-covering problems over time. The objective of the DSC problem is to infer the most probable time sequence of a parsimonious set of failure sources that explains the observed test outcomes over time. The DSC problem is NP-hard and intractable due to the fault-test dependency matrix that couples the failed tests and faults via the constraint matrix, and the temporal dependence of failure sources over time. Here, the DSC problem is motivated from the viewpoint of a dynamic multiple fault diagnosis problem, but it has wide applications in operations research, e.g., the facility location problem. Thus, we also formulated the DSC problem in the context of a dynamically evolving facility location problem. Here, a facility can be opened, closed, or can be temporarily unavailable at any time for a given requirement of demand points. 
These activities are associated with costs or penalties, viz., phase-in or phase-out for the opening or closing of a facility, respectively. The set-covering matrix encapsulates the relationship among the rows (tests or demand points) and columns (faults or locations) of the system at each time. By relaxing the coupling constraints using Lagrange multipliers, the DSC problem can be decoupled into independent subproblems, one for each column. Each subproblem is solved using the Viterbi decoding algorithm, and a primal feasible solution is constructed by modifying the Viterbi solutions via a heuristic. The proposed Viterbi-Lagrangian relaxation algorithm (VLRA) provides a measure of suboptimality via an approximate duality gap. As a major practical extension of the above problem, we also consider the problem of diagnosing faults with delayed test outcomes, termed delay-dynamic set-covering (DDSC), and experiment with real-world problems that exhibit masking faults. Also, we present simulation results on OR-library datasets (set-covering formulations are predominantly validated on these matrices in the literature), posed as facility location problems. Finally, we implement these algorithms to solve problems in aerospace and automotive applications. Firstly, we address the diagnostic ambiguity problem in aerospace and automotive applications by developing a dynamic fusion framework that includes dynamic multiple fault diagnosis algorithms. This improves the correct fault isolation rate, while minimizing the false alarm rates, by considering multiple faults instead of the traditional data-driven techniques based on single fault (class)-single epoch (static) assumption. The dynamic fusion problem is formulated as a maximum a posteriori decision problem of inferring the fault sequence based on uncertain outcomes of multiple binary classifiers over time. 
The fusion process involves three steps: the first step transforms the multi-class problem into dichotomies using error correcting output codes (ECOC), thereby solving the concomitant binary classification problems; the second step fuses the outcomes of multiple binary classifiers over time using a sliding window or block dynamic fusion method that exploits temporal data correlations over time. We solve this NP-hard optimization problem via a Lagrangian relaxation (variational) technique. The third step optimizes the classifier parameters, viz., probabilities of detection and false alarm, using a genetic algorithm. The proposed algorithm is demonstrated by computing the diagnostic performance metrics on a twin-spool commercial jet engine, an automotive engine, and UCI datasets (problems with high classification error are specifically chosen for experimentation). We show that the primal-dual optimization framework performed consistently better than any traditional fusion technique, even when it is forced to give a single fault decision across a range of classification problems. Secondly, we implement the inference algorithms to diagnose faults in vehicle systems that are controlled by a network of electronic control units (ECUs). The faults, originating from various interactions and especially between hardware and software, are particularly challenging to address. Our basic strategy is to divide the fault universe of such cyber-physical systems in a hierarchical manner, and monitor the critical variables/signals that have impact at different levels of interactions. The proposed diagnostic strategy is validated on an electrical power generation and storage system (EPGS) controlled by two ECUs in an environment with CANoe/MATLAB co-simulation. Eleven faults are injected with the failures originating in actuator hardware, sensor, controller hardware and software components. 
A diagnostic matrix is established via simulations to represent the relationship between the faults and the test outcomes (also known as fault signatures). The results show that the proposed diagnostic strategy is effective in addressing interaction-caused faults.

  13. Automatic target recognition and detection in infrared imagery under cluttered background

    NASA Astrophysics Data System (ADS)

    Gundogdu, Erhan; Koç, Aykut; Alatan, A. Aydın.

    2017-10-01

    Visual object classification has long been studied in the visible spectrum by utilizing conventional cameras. Since the number of labeled images has recently increased, it is possible to train deep Convolutional Neural Networks (CNN) with a significant number of parameters. As infrared (IR) sensor technology has improved during the last two decades, labeled images extracted from IR sensors have started to be used for object detection and recognition tasks. We address the problem of infrared object recognition and detection by exploiting 15K real-field images from long-wave and mid-wave IR sensors. For feature learning, a stacked denoising autoencoder is trained on this IR dataset. To recognize the objects, the trained stacked denoising autoencoder is fine-tuned according to the binary classification loss of the target object. Once training is completed, the test samples are propagated through the network, and the probability of each test sample belonging to a class is computed. Moreover, the trained classifier is utilized in a detect-by-classification method, where classification is performed on a set of candidate object boxes and the maximum confidence score in a particular location is accepted as the score of the detected object. To decrease the computational complexity, the detection step at every frame is avoided by running an efficient correlation filter based tracker; detection is performed only when the tracker confidence falls below a pre-defined threshold. The experiments conducted on the real-field images demonstrate that the proposed detection and tracking framework presents satisfactory results for detecting tanks under cluttered background.
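The detect-by-classification step can be sketched as scoring a grid of candidate boxes with any trained classifier and keeping the highest-confidence location. The grid layout, names, and the `classify` callable (standing in for the fine-tuned network) are our assumptions.

```python
import numpy as np

def detect_by_classification(image, classify, box=32, stride=16):
    """Score candidate boxes on a regular grid with `classify`, a
    function mapping a patch to a confidence in [0, 1], and return the
    top-left corner and score of the best box."""
    best, best_score = None, -np.inf
    h, w = image.shape
    for r in range(0, h - box + 1, stride):
        for c in range(0, w - box + 1, stride):
            score = classify(image[r:r + box, c:c + box])
            if score > best_score:
                best, best_score = (r, c), score
    return best, best_score
```

In the paper's framework this exhaustive scan runs only when the tracker's confidence drops below threshold, which is what keeps per-frame cost low.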

  14. Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey.

    PubMed

    Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T

    2006-08-01

    The purpose of this study is to evaluate the most important sociodemographic factors affecting the smoking status of high school students, using a broad randomised epidemiological survey. Using an in-class, self-administered questionnaire about sociodemographic variables and smoking behaviour, a representative sample of 3304 students in the preparatory, 9th, 10th, and 11th grades from 22 randomly selected schools in Mersin was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated the combined effects of these factors using classification and regression tree (CART) methodology, a relatively new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% reported current smoking, with male predominance and prevalence increasing with age. Secondhand smoke exposure was reported at a frequency of 74.3%, predominantly from fathers (56.6%). The factors significantly associated with current smoking in this age group were larger household size, late birth rank, certain school types, low academic performance, greater secondhand smoke exposure, and stress (especially reported as separation from a close friend or violence at home). Classification and regression tree methodology highlighted the importance of some neglected sociodemographic factors with good classification capacity. It was concluded that smoking, being closely related to sociocultural factors, is a common problem in this young population, generating an important academic and social burden in youth life, and that with increasing data about this behaviour and the use of new statistical methods, effective coping strategies could be composed.
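Classification-tree methodology of the kind used here can be sketched with scikit-learn. The synthetic predictors and the rule tying them to the outcome are invented for illustration only; they are not the survey's data or findings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000
# hypothetical sociodemographic predictors
household_size = rng.integers(2, 9, n)
secondhand = rng.integers(0, 2, n)   # secondhand smoke exposure, 0/1
# synthetic outcome: an interaction of the two predictors, which is
# exactly the kind of combined effect a tree can surface
smoker = ((household_size > 5) & (secondhand == 1)).astype(int)

X = np.c_[household_size, secondhand]
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, smoker)
acc = tree.score(X, smoker)
```

Unlike a single logistic regression, the tree recovers the interaction without it being specified in advance, which is the methodological point of the study.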

  15. A black hole-white dwarf compact binary model for long gamma-ray bursts without supernova association

    NASA Astrophysics Data System (ADS)

    Dong, Yi-Ze; Gu, Wei-Min; Liu, Tong; Wang, Junfeng

    2018-03-01

    Gamma-ray bursts (GRBs) are luminous and violent phenomena in the Universe. Traditionally, long GRBs are expected to be produced by the collapse of massive stars and associated with supernovae. However, some low-redshift long GRBs have no detection of supernova association, such as GRBs 060505, 060614, and 111005A. It is hard to classify these events convincingly according to usual classifications, and the lack of the supernova implies a non-massive star origin. We propose a new path to produce long GRBs without supernova association, the unstable and extremely violent accretion in a contact binary system consisting of a stellar-mass black hole and a white dwarf, which fills an important gap in compact binary evolution.

  16. Integrating Human and Machine Intelligence in Galaxy Morphology Classification Tasks

    NASA Astrophysics Data System (ADS)

    Beck, Melanie Renee

    The large flood of data flowing from observatories presents significant challenges to astronomy and cosmology--challenges that will only be magnified by projects currently under development. Growth in both the volume and velocity of astrophysics data is accelerating: whereas the Sloan Digital Sky Survey (SDSS) has produced 60 terabytes of data in the last decade, the upcoming Large Synoptic Survey Telescope (LSST) plans to register 30 terabytes per night starting in the year 2020. Additionally, the Euclid Mission will acquire imaging for 5 × 10^7 resolvable galaxies. The field of galaxy evolution faces a particularly challenging future as complete understanding often cannot be reached without analysis of detailed morphological galaxy features. Historically, morphological analysis has relied on visual classification by astronomers, accessing the human brain's capacity for advanced pattern recognition. However, this accurate but inefficient method falters when confronted with many thousands (or millions) of images. In the SDSS era, efforts to automate morphological classifications of galaxies (e.g., Conselice et al., 2000; Lotz et al., 2004) are reasonably successful and can distinguish between elliptical and disk-dominated galaxies with accuracies of 80%. While this is statistically very useful, a key problem with these methods is that they often cannot say which 80% of their samples are accurate. Furthermore, when confronted with the more complex task of identifying key substructure within galaxies, automated classification algorithms begin to fail. The Galaxy Zoo project uses a highly innovative approach to solving the scalability problem of visual classification. Displaying images of SDSS galaxies to volunteers via a simple and engaging web interface, www.galaxyzoo.org asks people to classify images by eye. Within the first year hundreds of thousands of members of the general public had classified each of the 1 million SDSS galaxies an average of 40 times. 
Galaxy Zoo thus solved both the visual classification problem of time efficiency and improved accuracy by producing a distribution of independent classifications for each galaxy. While crowd-sourced galaxy classifications have proven their worth, challenges remain before establishing this method as a critical and standard component of the data processing pipelines for the next generation of surveys. In particular, though innovative, crowd-sourcing techniques do not have the capacity to handle the data volume and rates expected in the next generation of surveys. Automated algorithms will instead be delegated to handle the majority of the classification tasks, freeing citizen scientists to contribute their efforts on subtler and more complex assignments. This thesis presents a solution through an integration of visual and automated classifications, preserving the best features of both human and machine. We demonstrate the effectiveness of such a system through a re-analysis of visual galaxy morphology classifications collected during the Galaxy Zoo 2 (GZ2) project. We reprocess the top-level question of the GZ2 decision tree with a Bayesian classification aggregation algorithm dubbed SWAP, originally developed for the Space Warps gravitational lens project. Through a simple binary classification scheme we increase the classification rate nearly 5-fold, classifying 226,124 galaxies in 92 days of GZ2 project time while reproducing labels derived from GZ2 classification data with 95.7% accuracy. We next combine this with a Random Forest machine learning algorithm that learns on a suite of non-parametric morphology indicators widely used for automated morphologies. We develop a decision engine that delegates tasks between human and machine and demonstrate that the combined system provides a factor of 11.4 increase in the classification rate, classifying 210,803 galaxies in just 32 days of GZ2 project time with 93.1% accuracy. 
As the Random Forest algorithm requires a minimal amount of computational cost, this result has important implications for galaxy morphology identification tasks in the era of Euclid and other large-scale surveys.

  17. Cellular automata rule characterization and classification using texture descriptors

    NASA Astrophysics Data System (ADS)

    Machicao, Jeaneth; Ribas, Lucas C.; Scabini, Leonardo F. S.; Bruno, Odermir M.

    2018-05-01

Cellular automata (CA) spatio-temporal patterns have attracted attention from many researchers, since they exhibit emergent behavior resulting from the dynamics of each individual cell. In this manuscript, we propose a texture image analysis approach to characterize and classify CA rules. The proposed method converts a CA spatio-temporal pattern into a gray-scale image. The gray-scale value is obtained by forming a binary number from the 8-connected neighborhood of each cell of the CA spatio-temporal pattern. We demonstrate that this technique enhances CA rule characterization and allows the use of different texture image analysis algorithms. Thus, various texture descriptors were evaluated in a supervised training approach aiming to characterize the CA's global evolution. Our results show the efficiency of the proposed method for the classification of elementary CA (ECAs), reaching a maximum accuracy of 99.57% under the Li-Packard scheme (6 classes) and 94.36% under the 88-rule classification scheme. Moreover, within the image analysis context, we found that the method performs better by means of this transformation of the binary states to a gray-scale.
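The binary-to-grayscale conversion described above can be sketched directly: each pixel's gray value is the 8-bit number formed by the states of its 8-connected neighbours. The bit ordering and zero-padded border handling here are assumptions for illustration, not necessarily the paper's exact choices.

```python
import numpy as np

def ca_to_grayscale(pattern):
    """Map a binary CA spatio-temporal pattern (2-D 0/1 array) to a
    gray-scale image: each pixel becomes the 8-bit number read from
    its 8-connected neighbourhood (zero-padded at the borders)."""
    p = np.pad(pattern.astype(np.uint8), 1)
    # offsets of the 8 neighbours, read clockwise from the top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    gray = np.zeros_like(pattern, dtype=np.uint8)
    h, w = p.shape
    for bit, (dy, dx) in enumerate(offs):
        gray |= p[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] << (7 - bit)
    return gray
```

A cell whose eight neighbours are all alive maps to 255; texture descriptors can then be computed on the resulting gray-scale image.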

  18. A liver cirrhosis classification on B-mode ultrasound images by the use of higher order local autocorrelation features

    NASA Astrophysics Data System (ADS)

    Sasaki, Kenya; Mitani, Yoshihiro; Fujita, Yusuke; Hamamoto, Yoshihiko; Sakaida, Isao

    2017-02-01

In this paper, in order to classify liver cirrhosis in regions of interest (ROIs) from B-mode ultrasound images, we propose the use of higher order local autocorrelation (HLAC) features. In a previous study, we tried to classify liver cirrhosis using a Gabor-filter-based approach. However, our preliminary experimental results showed that the classification performance of the Gabor feature was poor. In order to classify liver cirrhosis accurately, we examined the use of HLAC features for liver cirrhosis classification. The experimental results show the effectiveness of the HLAC features compared with the Gabor feature. Furthermore, by using a binary image produced by an adaptive thresholding method, the classification performance of the HLAC features improved further.

  19. Chaotic particle swarm optimization with mutation for classification.

    PubMed

    Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza

    2015-01-01

In this paper, a chaotic particle swarm optimization with a mutation-based classifier (mutation-based classifier particle swarm optimization) is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allow us to overcome the problem of early convergence to a local minimum associated with particle swarm optimization algorithms. That is, the mutation operator sharpens the convergence and tunes the best possible solution. Furthermore, to remove irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using a binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of our proposed classifier, mutation-based classifier particle swarm optimization, it is evaluated on three classification datasets, namely Wisconsin diagnostic breast cancer, Wisconsin breast cancer, and heart-statlog, with different feature-vector dimensions. The proposed algorithm is compared with different classifier algorithms, including the k-nearest neighbor, as a conventional classifier, and the particle swarm classifier, genetic algorithm, and imperialist competitive algorithm classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity, and Matthews correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms.

  20. An embedded system for face classification in infrared video using sparse representation

    NASA Astrophysics Data System (ADS)

    Saavedra M., Antonio; Pezoa, Jorge E.; Zarkesh-Ha, Payman; Figueroa, Miguel

    2017-09-01

We propose a platform for robust face recognition in Infrared (IR) images using Compressive Sensing (CS). In line with CS theory, the classification problem is solved using a sparse representation framework, where test images are modeled by means of a linear combination of the training set. Because the training set constitutes an over-complete dictionary, we identify new images by finding their sparsest representation based on the training set, using standard l1-minimization algorithms. Unlike conventional face-recognition algorithms, feature extraction is performed using random projections with a precomputed binary matrix, as proposed in the CS literature. This random sampling reduces the effects of noise and occlusions such as facial hair, eyeglasses, and disguises, which are notoriously challenging in IR images. Thus, the performance of our framework is robust to these noise and occlusion factors, achieving an average accuracy of approximately 90% when the UCHThermalFace database is used for training and testing purposes. We implemented our framework on a high-performance embedded digital system, where the computation of the sparse representation of IR images was performed by dedicated hardware using a deeply pipelined architecture on a Field-Programmable Gate Array (FPGA).
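The random-projection feature extraction amounts to one matrix multiply by a precomputed binary sensing matrix. The dimensions and the 0/1 matrix below are illustrative assumptions; the paper does not publish its exact matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_features = 64 * 64, 256   # illustrative image and feature sizes

# Precomputed random binary sensing matrix, in the spirit of CS-style
# random projections.
phi = rng.integers(0, 2, size=(n_features, n_pixels))

def extract_features(image):
    """Project a flattened IR face image into a low-dimensional
    feature space with a single matrix multiply."""
    return phi @ image.reshape(-1)

feat = extract_features(rng.random((64, 64)))
```

The projected vector would then be fed to the l1-minimization step to recover its sparsest representation over the training dictionary; a single fixed matrix multiply is also what makes the method attractive for a pipelined FPGA implementation.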

  1. On the orbital evolution of radiating binary systems

    NASA Astrophysics Data System (ADS)

    Bekov, A. A.; Momynov, S. B.

    2018-05-01

The evolution of the dynamic parameters of radiating binary systems with variable mass is studied. As a dynamic model, the problem of two gravitating and radiating bodies is considered, taking into account the gravitational attraction and the light pressure of the interacting bodies, with the additional assumption of isotropic variability of their masses. The problem combines the Gylden-Meshchersky problem, acquiring a new physical meaning, and the two-body photogravitational Radzievsky problem. The evolving orbit is described, unlike the Keplerian case, by varying orbital elements - the parameter and the eccentricity - defined by the parameter µ(t), the area integral C, and the quasi-integral of energy h(t). Adiabatic invariants of the problem, which are of interest for the slow evolution of orbits, are determined. The general course of the evolution of the orbits of binary systems with radiation is determined by the change of the parameter µ(t) and the total energy of the system.

  2. Robust online tracking via adaptive samples selection with saliency detection

    NASA Astrophysics Data System (ADS)

    Yan, Jia; Chen, Xi; Zhu, QiuPing

    2013-12-01

Online tracking has proven successful in tracking previously unknown objects. However, two important factors lead to the drift problem in online tracking: one is how to select correctly labeled samples even when the target locations are inaccurate, and the other is how to handle confusors that have features similar to the target's. In this article, we propose a robust online tracking algorithm with adaptive sample selection based on saliency detection to overcome the drift problem. To avoid degrading the classifiers with misaligned samples, we introduce a saliency detection method into our tracking problem. Saliency maps and the strong classifiers are combined to extract the most reliable positive samples. Our approach employs a simple yet effective saliency detection algorithm based on image spectral residual analysis. Furthermore, instead of using random patches as negative samples, we propose a reasonable selection criterion in which both saliency confidence and similarity are considered, with the benefit that confusors in the surrounding background are incorporated into the classifier update process before drift occurs. The tracking task is formulated as binary classification via an online boosting framework. Experimental results on several challenging video sequences demonstrate the accuracy and stability of our tracker.
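The spectral-residual saliency computation can be sketched in a few lines: subtract a locally averaged log-amplitude spectrum from the original, keep the phase, and transform back. This follows the classic spectral-residual recipe; the 3x3 box filters used here (instead of the usual averaging and Gaussian kernels) are simplifying assumptions to keep the sketch dependency-free.

```python
import numpy as np

def box3(a):
    """3x3 box average with wrap-around borders."""
    return sum(np.roll(np.roll(a, dy, 0), dx, 1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

def spectral_residual_saliency(img):
    """Saliency map from the image's spectral residual: the
    log-amplitude spectrum minus its local average, recombined with
    the original phase and transformed back to the image domain."""
    f = np.fft.fft2(img)
    log_amp = np.log(np.abs(f) + 1e-12)
    residual = log_amp - box3(log_amp)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
    return box3(sal)  # light smoothing in place of a Gaussian blur

sal = spectral_residual_saliency(np.random.default_rng(0).random((32, 32)))
```

In the tracker, high-saliency regions around the estimated target location would be preferred when harvesting positive samples for the classifier update.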

  3. Unary and binary multisystems; topologic classification of phase diagrams and relation to Euler's theorem on polyhedra.

    USGS Publications Warehouse

    Roseboom, E.H.; Zen, E.-A.

    1982-01-01

A representation polyhedron summarizing the topology of a large number of possible nets previously devised by Zen (M.A. 18-167) is extended from (n + 3)-phase to (n + 6)-phase unary systems. A general method for constructing (n + 4)-phase nets is outlined. With the technique described, 62 multisystems are recognized, of which 26 contain all 16 possible divariant fields and represent the most nearly complete closed nets possible for a binary six-phase (n + 4) multisystem.-M.S.

  4. A Catalog of Eclipsing Binaries and Variable Stars Observed with ASTEP 400 from Dome C, Antarctica

    NASA Astrophysics Data System (ADS)

    Chapellier, E.; Mékarnia, D.; Abe, L.; Guillot, T.; Agabi, K.; Rivet, J.-P.; Schmider, F.-X.; Crouzet, N.; Aristidi, E.

    2016-10-01

    We used the large photometric database of the ASTEP program, whose primary goal was to detect exoplanets in the southern hemisphere from Antarctica, to search for eclipsing binaries (EcBs) and variable stars. 673 EcBs and 1166 variable stars were detected, including 31 previously known stars. The resulting online catalogs give the identification, the classification, the period, and the depth or semi-amplitude of each star. Data and light curves for each object are available at http://astep-vo.oca.eu.

  5. Hα and Gaia-RVS domain spectroscopy of Be stars and interacting binaries with Ondřejov 2m telescope

    NASA Astrophysics Data System (ADS)

    Koubský, P.; Kotková, L.; Votruba, V.

    2011-12-01

    A long term project to investigate the spectral appearance over the Gaia RVS domain of a large sample of Be stars and interacting binaries has been undertaken. The aim of the Ondřejov project is to create sufficient amounts of training data in the RVS wavelength domain to complement the Bp/Rp classification of Be stars which may be observed with Gaia. The project's current status is described and sample spectra in both the Hα and RVS wavelength domains are presented and discussed.

  6. Separated Fringe Packet Observations with the Chara Array. 1. Methods and New Orbits for chi Draconis, HD 184467, and HD 198084

    DTIC Science & Technology

    2010-06-01

similar experiments using the Infrared Optical Telescope Array (IOTA) on the well-studied, widely separated binary ζ Hercules, in an attempt to revive... SFPs with IOTA: As noted by Dyck et al. (1995), for a binary star for which both components are within the field of view of the interferometer, it is

  7. Searching for Genotype-Phenotype Structure: Using Hierarchical Log-Linear Models in Crohn Disease

    PubMed Central

    Chapman, Juliet M.; Onnie, Clive M.; Prescott, Natalie J.; Fisher, Sheila A.; Mansfield, John C.; Mathew, Christopher G.; Lewis, Cathryn M.; Verzilli, Claudio J.; Whittaker, John C.

    2009-01-01

There has been considerable recent success in the detection of gene-disease associations. We consider here the development of tools that facilitate the more detailed characterization of the effect of a genetic variant on disease. We replace the simplistic classification of individuals according to a single binary disease indicator with classification according to a number of subphenotypes. This more accurately reflects the underlying biological complexity of the disease process, but it poses additional analytical difficulties. Notably, the subphenotypes that make up a particular disease are typically highly associated, and it becomes difficult to distinguish which genes might be causing which subphenotypes. Such problems arise in many complex diseases. Here, we concentrate on an application to Crohn disease (CD). We consider this problem as one of model selection based upon log-linear models, fitted in a Bayesian framework via a reversible-jump Metropolis-Hastings approach. We evaluate the performance of our suggested approach with a simple simulation study and then apply the method to a real data example in CD, revealing a sparse disease structure. Most notably, the associated NOD2.908G→R mutation appears to be directly related to more severe disease behaviors, whereas the other two associated NOD2 variants, 1007L→FS and 702R→W, are more generally related to disease in the small bowel (ileum and jejunum). The ATG16L1.300T→A variant appears to be directly associated with only disease of the small bowel. PMID:19185283

  8. Dynamic detection-rate-based bit allocation with genuine interval concealment for binary biometric representation.

    PubMed

    Lim, Meng-Hui; Teoh, Andrew Beng Jin; Toh, Kar-Ann

    2013-06-01

Biometric discretization is a key component in biometric cryptographic key generation. It converts an extracted biometric feature vector into a binary string via typical steps such as segmentation of each feature element into a number of labeled intervals, mapping of each interval-captured feature element onto a binary space, and concatenation of the resulting binary outputs of all feature elements into a binary string. Currently, the detection rate optimized bit allocation (DROBA) scheme is one of the most effective biometric discretization schemes in terms of its capability to assign binary bits dynamically to user-specific features with respect to their discriminability. However, we find that DROBA suffers from potential discriminative feature misdetection and underdiscretization in its bit allocation process. This paper highlights such drawbacks and improves upon DROBA with a novel two-stage algorithm: 1) a dynamic search method to efficiently recapture such misdetected features and to optimize the bit allocation of underdiscretized features, and 2) a genuine interval concealment technique to alleviate crucial information leakage resulting from the dynamic search. Improvements in classification accuracy on two popular face data sets vindicate the feasibility of our approach compared with DROBA.

  9. Monsoon Forecasting based on Imbalanced Classification Techniques

    NASA Astrophysics Data System (ADS)

    Ribera, Pedro; Troncoso, Alicia; Asencio-Cortes, Gualberto; Vega, Inmaculada; Gallego, David

    2017-04-01

Monsoonal systems are quasiperiodic processes of the climatic system that control seasonal precipitation over different regions of the world. The Western North Pacific Summer Monsoon (WNPSM) is one such monsoon, and it is known to have a great impact both on the global climate and on the total precipitation of very densely populated areas. The interannual variability of the WNPSM over the last 50-60 years has been related to different climatic indices such as El Niño, El Niño Modoki, the Indian Ocean Dipole, and the Pacific Decadal Oscillation. Recently, a new and longer series characterizing the monthly evolution of the WNPSM, the WNP Directional Index (WNPDI), has been developed, extending its previous length from about 50 years to more than 100 years (1900-2007). Imbalanced classification techniques have been applied to the WNPDI in order to check the capability of traditional climate indices to capture and forecast the evolution of the WNPSM. The forecasting problem has been transformed into a binary classification problem in which the positive class represents the occurrence of an extreme monsoon event. Given that the number of extreme monsoons is much lower than the number of non-extreme monsoons, the resulting classification problem is highly imbalanced. The complete dataset is composed of 1296 instances, of which only 71 (5.47%) correspond to extreme monsoons. Twenty predictor variables based on the cited climatic indices have been proposed, and models based on trees, black-box models such as neural networks, support vector machines, and nearest neighbors, and ensemble techniques such as random forests have been used to forecast the occurrence of extreme monsoons. It can be concluded that the methodology proposed here reports promising results according to the quality parameters evaluated and predicts extreme monsoons for a temporal horizon of a month with high accuracy.
From a climatological point of view, models based on trees show that the El Niño Modoki index in the months preceding an extreme monsoon acts as its best predictor. In most cases, the Indian Ocean Dipole index acts as a second-order classifier, while the El Niño index (more frequently) or the Pacific Decadal Oscillation index (in only one case) also modulates the intensity of the WNPSM.
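With only 71 positive instances out of 1296, the classifiers above need some rebalancing strategy. The abstract does not specify which imbalanced-learning technique was used, so random oversampling of the minority class is shown here as one representative, minimal sketch.

```python
import numpy as np

def random_oversample(X, y, rng=np.random.default_rng(0)):
    """Balance a binary problem (extreme monsoon = 1, rare) by
    resampling the minority class with replacement until both
    classes have the same number of instances."""
    X, y = np.asarray(X), np.asarray(y)
    minority = 1 if (y == 1).sum() < (y == 0).sum() else 0
    idx = np.flatnonzero(y == minority)
    need = int((y != minority).sum()) - idx.size
    extra = rng.choice(idx, size=need, replace=True)
    keep = np.concatenate([np.arange(y.size), extra])
    return X[keep], y[keep]
```

The balanced set would then be fed to any of the listed classifiers (trees, neural networks, SVMs, nearest neighbors, or random forests); class-weighting inside the classifier is a common alternative that avoids duplicating rows.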

  10. A subgradient approach for constrained binary optimization via quantum adiabatic evolution

    NASA Astrophysics Data System (ADS)

    Karimi, Sahar; Ronagh, Pooya

    2017-08-01

An outer approximation method has been proposed in the literature for solving the Lagrangian dual of a constrained binary quadratic programming problem via quantum adiabatic evolution. This should be an efficient prescription for solving the Lagrangian dual problem in the presence of an ideally noise-free quantum adiabatic system. However, current implementations of quantum annealing systems demand methods that are efficient at handling possible sources of noise. In this paper, we consider a subgradient method for finding an optimal primal-dual pair for the Lagrangian dual of a constrained binary polynomial programming problem. We then study the quadratic stable set (QSS) problem as a case study. We see that this method applied to the QSS problem can be viewed as an instance-dependent penalty-term approach that avoids large penalty coefficients. Finally, we report our experimental results of using the D-Wave 2X quantum annealer and conclude that our approach helps this quantum processor succeed more often in solving these problems compared to the usual penalty-term approaches.

  11. Fast and reliable symplectic integration for planetary system N-body problems

    NASA Astrophysics Data System (ADS)

    Hernandez, David M.

    2016-06-01

    We apply one of the exactly symplectic integrators, which we call HB15, of Hernandez & Bertschinger, along with the Kepler problem solver of Wisdom & Hernandez, to solve planetary system N-body problems. We compare the method to Wisdom-Holman (WH) methods in the MERCURY software package, the MERCURY switching integrator, and others and find HB15 to be the most efficient method or tied for the most efficient method in many cases. Unlike WH, HB15 solved N-body problems exhibiting close encounters with small, acceptable error, although frequent encounters slowed the code. Switching maps like MERCURY change between two methods and are not exactly symplectic. We carry out careful tests on their properties and suggest that they must be used with caution. We then use different integrators to solve a three-body problem consisting of a binary planet orbiting a star. For all tested tolerances and time steps, MERCURY unbinds the binary after 0 to 25 years. However, in the solutions of HB15, a time-symmetric HERMITE code, and a symplectic Yoshida method, the binary remains bound for >1000 years. The methods' solutions are qualitatively different, despite small errors in the first integrals in most cases. Several checks suggest that the qualitative binary behaviour of HB15's solution is correct. The Bulirsch-Stoer and Radau methods in the MERCURY package also unbind the binary before a time of 50 years, suggesting that this dynamical error is due to a MERCURY bug.

  12. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

    PubMed

    Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

    2014-07-01

Probability estimation for binary and multicategory outcomes using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs, we first review the classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches to the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables, we demonstrate the general validity of the machine learning methods and compare them with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
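The simplest of the nonparametric probability machines mentioned above is k-NN: the class-probability estimate for a new point is just the fraction of its k nearest training points labelled 1. A minimal sketch, with illustrative data:

```python
import numpy as np

def knn_prob(X_train, y_train, x, k=5):
    """Nonparametric estimate of P(Y = 1 | x): the proportion of 1s
    among the k nearest training points under Euclidean distance."""
    d = np.linalg.norm(np.asarray(X_train, dtype=float)
                       - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    return np.asarray(y_train)[nearest].mean()
```

Bagged nearest neighbors and random forests refine this same idea by averaging such local proportions over resampled training sets or trees, which is what places all three in the nonparametric-regression framework the paper reviews.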

  13. Teaching Non-Recursive Binary Searching: Establishing a Conceptual Framework.

    ERIC Educational Resources Information Center

    Magel, E. Terry

    1989-01-01

    Discusses problems associated with teaching non-recursive binary searching in computer language classes, and describes a teacher-directed dialog based on dictionary use that helps students use their previous searching experiences to conceptualize the binary search process. Algorithmic development is discussed and appropriate classroom discussion…
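The non-recursive binary search the article teaches can be written as a short loop that repeatedly halves the range still to be searched, in the same spirit as flipping to the middle of a dictionary:

```python
def binary_search(sorted_list, target):
    """Iterative (non-recursive) binary search.
    Returns the index of target in sorted_list, or -1 if absent."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # middle of the remaining range
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1              # discard the lower half
        else:
            hi = mid - 1              # discard the upper half
    return -1
```

Each iteration discards half of the remaining candidates, so the loop runs at most about log2(n) times for a list of n items.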

  14. Di-codon Usage for Gene Classification

    NASA Astrophysics Data System (ADS)

    Nguyen, Minh N.; Ma, Jianmin; Fogel, Gary B.; Rajapakse, Jagath C.

    Classification of genes into biologically related groups facilitates inference of their functions. Codon usage bias has been described previously as a potential feature for gene classification. In this paper, we demonstrate that di-codon usage can further improve classification of genes. By using both codon and di-codon features, we achieve near perfect accuracies for the classification of HLA molecules into major classes and sub-classes. The method is illustrated on 1,841 HLA sequences which are classified into two major classes, HLA-I and HLA-II. Major classes are further classified into sub-groups. A binary SVM using di-codon usage patterns achieved 99.95% accuracy in the classification of HLA genes into major HLA classes; and multi-class SVM achieved accuracy rates of 99.82% and 99.03% for sub-class classification of HLA-I and HLA-II genes, respectively. Furthermore, by combining codon and di-codon usages, the prediction accuracies reached 100%, 99.82%, and 99.84% for HLA major class classification, and for sub-class classification of HLA-I and HLA-II genes, respectively.
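The di-codon features described above can be computed by sliding over adjacent codon pairs and normalising the counts so sequences of different lengths are comparable. The normalisation choice here is an assumption; the paper may scale its features differently before feeding them to the SVM.

```python
from collections import Counter

def dicodon_features(seq):
    """Relative frequencies of overlapping di-codons (adjacent codon
    pairs) in a coding sequence; trailing bases that do not complete
    a codon are dropped."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    pairs = Counter(a + b for a, b in zip(codons, codons[1:]))
    total = sum(pairs.values()) or 1
    return {p: n / total for p, n in pairs.items()}
```

Each HLA sequence thus maps to a sparse vector over the 61x61-odd possible di-codons (plus the 64 plain codon frequencies when both feature types are combined), which is the input representation for the binary and multi-class SVMs.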

  15. Studies of Horst's Procedure for Binary Data Analysis.

    ERIC Educational Resources Information Center

    Gray, William M.; Hofmann, Richard J.

    Most responses to educational and psychological test items may be represented in binary form. However, such dichotomously scored items present special problems when an analysis of correlational interrelationships among the items is attempted. Two general methods of analyzing binary data are proposed by Horst to partial out the effects of…

  16. A binary method for simple and accurate two-dimensional cursor control from EEG with minimal subject training.

    PubMed

    Kayagil, Turan A; Bai, Ou; Henriquez, Craig S; Lin, Peter; Furlani, Stephen J; Vorbach, Sherry; Hallett, Mark

    2009-05-06

    Brain-computer interfaces (BCI) use electroencephalography (EEG) to interpret user intention and control an output device accordingly. We describe a novel BCI method to use a signal from five EEG channels (comprising one primary channel with four additional channels used to calculate its Laplacian derivation) to provide two-dimensional (2-D) control of a cursor on a computer screen, with simple threshold-based binary classification of band power readings taken over pre-defined time windows during subject hand movement. We tested the paradigm with four healthy subjects, none of whom had prior BCI experience. Each subject played a game wherein he or she attempted to move a cursor to a target within a grid while avoiding a trap. We also present supplementary results including one healthy subject using motor imagery, one primary lateral sclerosis (PLS) patient, and one healthy subject using a single EEG channel without Laplacian derivation. For the four healthy subjects using real hand movement, the system provided accurate cursor control with little or no required user training. The average accuracy of the cursor movement was 86.1% (SD 9.8%), which is significantly better than chance (p = 0.0015). The best subject achieved a control accuracy of 96%, with only one incorrect bit classification out of 47. The supplementary results showed that control can be achieved under the respective experimental conditions, but with reduced accuracy. The binary method provides naïve subjects with real-time control of a cursor in 2-D using dichotomous classification of synchronous EEG band power readings from a small number of channels during hand movement. The primary strengths of our method are simplicity of hardware and software, and high accuracy when used by untrained subjects.
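The core of the method above is threshold-based binary classification of band power over a time window. A minimal sketch follows; the 8-12 Hz band, sampling rate, and threshold are illustrative assumptions, not the study's calibrated values.

```python
import numpy as np

def band_power(signal, fs, lo, hi):
    """Power of a signal segment in the [lo, hi] Hz band via the FFT."""
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    spec = np.abs(np.fft.rfft(signal)) ** 2
    return spec[(freqs >= lo) & (freqs <= hi)].sum()

def classify_bit(signal, fs, threshold, lo=8.0, hi=12.0):
    """Binary classification of one pre-defined time window: 1 if the
    band power exceeds the threshold, else 0."""
    return int(band_power(signal, fs, lo, hi) > threshold)

t = np.arange(256) / 256.0            # one second at 256 Hz
bit = classify_bit(np.sin(2 * np.pi * 10 * t), fs=256, threshold=10.0)
```

In the paper's 2-D paradigm, two such bits classified from successive windows (after Laplacian derivation of the primary channel) would select the horizontal and vertical cursor moves.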

  17. Classification of solid dispersions: correlation to (i) stability and solubility (ii) preparation and characterization techniques.

    PubMed

    Meng, Fan; Gala, Urvi; Chauhan, Harsh

    2015-01-01

    Solid dispersion has been a topic of interest in recent years for its potential in improving oral bioavailability, especially for poorly water soluble drugs where dissolution could be the rate-limiting step of oral absorption. Understanding the physical state of the drug and polymers in solid dispersions is essential as it influences both the stability and solubility of these systems. This review emphasizes on the classification of solid dispersions based on the physical states of drug and polymer. Based on this classification, stability aspects such as crystallization tendency, glass transition temperature (Tg), drug polymer miscibility, molecular mobility, etc. and solubility aspects have been discussed. In addition, preparation and characterization methods for binary solid dispersions based on the classification have also been discussed.

  18. exprso: an R-package for the rapid implementation of machine learning algorithms.

    PubMed

    Quinn, Thomas; Tylee, Daniel; Glatt, Stephen

    2016-01-01

Machine learning plays a major role in many scientific investigations. However, non-expert programmers may struggle to implement the elaborate pipelines necessary to build highly accurate and generalizable models. We introduce exprso, a new R package that is an intuitive machine learning suite designed specifically for non-expert programmers. Built initially for the classification of high-dimensional data, exprso uses an object-oriented framework to encapsulate a number of common analytical methods into a series of interchangeable modules. This includes modules for feature selection, classification, high-throughput parameter grid-searching, elaborate cross-validation schemes (e.g., Monte Carlo and nested cross-validation), ensemble classification, and prediction. In addition, exprso also supports multi-class classification (through the 1-vs-all generalization of binary classifiers) and the prediction of continuous outcomes.

  19. A multi-pattern hash-binary hybrid algorithm for URL matching in the HTTP protocol.

    PubMed

    Zeng, Ping; Tan, Qingping; Meng, Xiankai; Shao, Zeming; Xie, Qinzheng; Yan, Ying; Cao, Wei; Xu, Jianjun

    2017-01-01

    In this paper, based on our previous multi-pattern uniform resource locator (URL) binary-matching algorithm called HEM, we propose an improved multi-pattern matching algorithm called MH that is based on hash tables and binary tables. The MH algorithm can be applied to the fields of network security, data analysis, load balancing, cloud robotic communications, and so on-all of which require string matching from a fixed starting position. Our approach effectively solves the performance problems of the classical multi-pattern matching algorithms. This paper explores ways to improve string matching performance under the HTTP protocol by using a hash method combined with a binary method that transforms the symbol-space matching problem into a digital-space numerical-size comparison and hashing problem. The MH approach has a fast matching speed, requires little memory, performs better than both the classical algorithms and HEM for matching fields in an HTTP stream, and it has great promise for use in real-world applications.
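The hash-plus-binary idea can be illustrated with a much-simplified sketch: group patterns by length in a hash table, keep each group sorted, and locate candidates by binary search from the fixed starting position of the URL. This is an illustration of the general hybrid approach, not the published MH algorithm.

```python
import bisect

class PrefixMatcher:
    """Simplified multi-pattern matcher: a hash table keyed on pattern
    length, with each bucket sorted so a candidate prefix is located
    by binary search rather than scanned linearly."""

    def __init__(self, patterns):
        self.by_len = {}
        for p in patterns:
            self.by_len.setdefault(len(p), []).append(p)
        for group in self.by_len.values():
            group.sort()

    def match(self, url):
        """Return the longest registered pattern that prefixes url,
        or None if no pattern matches."""
        for n in sorted(self.by_len, reverse=True):
            group = self.by_len[n]
            head = url[:n]
            i = bisect.bisect_left(group, head)
            if i < len(group) and group[i] == head:
                return head
        return None
```

Matching from a fixed starting position is what makes this bucketing sound: every candidate comparison is against the same leading slice of the URL, so the symbol-space problem reduces to ordered comparisons within a bucket.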

  20. A multi-pattern hash-binary hybrid algorithm for URL matching in the HTTP protocol

    PubMed Central

    Tan, Qingping; Meng, Xiankai; Shao, Zeming; Xie, Qinzheng; Yan, Ying; Cao, Wei; Xu, Jianjun

    2017-01-01

    In this paper, based on our previous multi-pattern uniform resource locator (URL) binary-matching algorithm called HEM, we propose an improved multi-pattern matching algorithm called MH that is based on hash tables and binary tables. The MH algorithm can be applied to the fields of network security, data analysis, load balancing, cloud robotic communications, and so on—all of which require string matching from a fixed starting position. Our approach effectively solves the performance problems of the classical multi-pattern matching algorithms. This paper explores ways to improve string matching performance under the HTTP protocol by using a hash method combined with a binary method that transforms the symbol-space matching problem into a digital-space numerical-size comparison and hashing problem. The MH approach has a fast matching speed, requires little memory, performs better than both the classical algorithms and HEM for matching fields in an HTTP stream, and it has great promise for use in real-world applications. PMID:28399157

  1. Compensatory neurofuzzy model for discrete data classification in biomedical

    NASA Astrophysics Data System (ADS)

    Ceylan, Rahime

    2015-03-01

Biomedical data fall into two main categories: signals and discrete data. Accordingly, studies in this area concern either biomedical signal classification or biomedical discrete data classification. Artificial intelligence models exist for the classification of ECG, EMG, or EEG signals. Likewise, many models exist in the literature for the classification of discrete data, such as sample values obtained from blood analysis or biopsy during medical procedures. No single algorithm has achieved a high accuracy rate on the classification of both signals and discrete data. In this study, a compensatory neurofuzzy network model is presented for the classification of discrete data in the biomedical pattern recognition area. The compensatory neurofuzzy network is a hybrid binary classifier in which the parameters of the fuzzy systems are updated by the backpropagation algorithm. The realized classifier model is applied to two benchmark datasets (the Wisconsin Breast Cancer dataset and the Pima Indian Diabetes dataset). Experimental studies show that the compensatory neurofuzzy network model achieved a 96.11% accuracy rate in classification of the breast cancer dataset, and a 69.08% accuracy rate was obtained in experiments on the diabetes dataset with only 10 iterations.

  2. UNDERSTANDING AND APPLYING ENVIRONMENTAL RELATIVE MOLDINESS INDEX - ERMI

    EPA Science Inventory

    This study compared two binary classification methods to evaluate the mold condition in 271 homes of infants, 144 of which later developed symptoms of respiratory illness. A method using on-site visual mold inspection was compared to another method using a quantitative index of ...

  3. Probabilistic detection of volcanic ash using a Bayesian approach

    NASA Astrophysics Data System (ADS)

    Mackie, Shona; Watson, Matthew

    2014-03-01

    Airborne volcanic ash can pose a hazard to aviation, agriculture, and both human and animal health. It is therefore important that ash clouds are monitored both day and night, even when they travel far from their source. Infrared satellite data provide perhaps the only means of doing this, and since the hugely expensive ash crisis that followed the 2010 Eyjafjallajökull eruption, much research has been carried out into techniques for discriminating ash in such data and for deriving key properties. Such techniques are generally specific to data from particular sensors, and most approaches result in a binary classification of pixels into "ash" and "ash free" classes with no indication of the classification certainty for individual pixels. Furthermore, almost all operational methods rely on expert-set thresholds to determine what constitutes "ash" and can therefore be criticized for being subjective and dependent on expertise that may not remain with an institution. Very few existing methods exploit available contemporaneous atmospheric data to inform the detection, despite the sensitivity of most techniques to atmospheric parameters. The Bayesian method proposed here does exploit such data and gives a probabilistic, physically based classification. We provide an example of the method's implementation for a scene containing both land and sea observations, and a large area of desert dust (often misidentified as ash by other methods). The technique has already been successfully applied to other detection problems in remote sensing, and this work shows that it will be a useful and effective tool for ash detection.

  4. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification.

    PubMed

    Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L

    2016-01-01

    An imbalanced dataset is a training dataset with disproportionate numbers of samples in the class of interest and the remaining classes. In biomedical applications, samples from the class of interest are often rare in a population, as with medical anomalies, positive clinical tests, and particular diseases. When the target samples in the original dataset are few, a classification model induced over such training data yields poor prediction performance because the minority class is insufficiently represented during training. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced-dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling within a swarm optimisation algorithm and adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with other versions of the SMOTE algorithm, significant improvements, including higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines two rebalancing techniques: it re-allocates the majority class across clusters and dynamically optimises the two parameters of SMOTE to synthesise a reasonable number of minority-class samples for each clustered sub-imbalanced dataset. The proposed method ultimately outperforms the conventional methods and attains higher credibility together with greater classification accuracy.
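    A minimal sketch of the core SMOTE idea this method builds on, synthesising minority samples by interpolating between a minority point and one of its nearest minority-class neighbours. This is not ASCB_DmSMOTE itself; all names and data are invented:

```python
import random

def smote_sample(minority, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating between a point
    and one of its k nearest minority-class neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest neighbours of a within the minority class (excluding a)
        neighbours = sorted(
            (p for p in minority if p is not a),
            key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)),
        )[:k]
        b = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote_sample(minority, n_new=4)
```

    Because each synthetic point lies on a segment between two minority samples, it stays inside the minority region; the paper's contribution is choosing how many such points to make, and where, per cluster.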

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Apellániz, J. Maíz; Sota, A.; Alfaro, E. J.

    This is the third installment of the Galactic O-Star Spectroscopic Survey (GOSSS), a massive spectroscopic survey of Galactic O stars, based on new homogeneous, high signal-to-noise ratio, R ∼ 2500 digital observations selected from the Galactic O-Star Catalog. In this paper, we present 142 additional stellar systems with O stars from both hemispheres, bringing the total of O-type systems published within the project to 590. Among the new objects, there are 20 new O stars. We also identify 11 new double-lined spectroscopic binaries, 6 of which are of O+O type and 5 of O+B type, and an additional new triple-lined spectroscopic binary of O+O+B type. We also revise some of the previous GOSSS classifications, present some egregious examples of stars erroneously classified as O-type in the past, introduce the use of luminosity class IV at spectral types O4-O5.5, and adapt the classification scheme to the work of Arias et al.

  6. A practical approach for writer-dependent symbol recognition using a writer-independent symbol recognizer.

    PubMed

    LaViola, Joseph J; Zeleznik, Robert C

    2007-11-01

    We present a practical technique for using a writer-independent recognition engine to improve the accuracy and speed while reducing the training requirements of a writer-dependent symbol recognizer. Our writer-dependent recognizer uses a set of binary classifiers based on the AdaBoost learning algorithm, one for each possible pairwise symbol comparison. Each classifier consists of a set of weak learners, one of which is based on a writer-independent handwriting recognizer. During online recognition, we also use the n-best list of the writer-independent recognizer to prune the set of possible symbols and thus reduce the number of required binary classifications. In this paper, we describe the geometric and statistical features used in our recognizer and our all-pairs classification algorithm. We also present the results of experiments that quantify the effect incorporating a writer-independent recognition engine into a writer-dependent recognizer has on accuracy, speed, and user training time.
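    The all-pairs voting scheme with n-best pruning can be sketched as follows. The pairwise "classifiers" here are trivial distance comparisons standing in for the paper's AdaBoost binary classifiers, and the symbol set is made up:

```python
from itertools import combinations

PROTOTYPES = {"alpha": 0.0, "beta": 5.0, "gamma": 9.0}  # hypothetical symbols

def pairwise_vote(x, candidates):
    """Run one binary comparison per candidate pair and tally the votes."""
    votes = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        winner = a if abs(x - PROTOTYPES[a]) <= abs(x - PROTOTYPES[b]) else b
        votes[winner] += 1
    return max(votes, key=votes.get)

def recognize(x, n_best=2):
    """Prune with an n-best list before all-pairs voting, so the number of
    binary classifications drops from C(k, 2) to C(n_best, 2)."""
    candidates = sorted(PROTOTYPES, key=lambda c: abs(x - PROTOTYPES[c]))[:n_best]
    return pairwise_vote(x, candidates)

print(recognize(4.2))  # → beta
```

    The pruning step mirrors the paper's use of the writer-independent recognizer's n-best list: with k symbols, full all-pairs needs k(k-1)/2 binary decisions, while pruning to n candidates needs only n(n-1)/2.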

  7. Equilibrium points and associated periodic orbits in the gravity of binary asteroid systems: (66391) 1999 KW4 as an example

    NASA Astrophysics Data System (ADS)

    Shi, Yu; Wang, Yue; Xu, Shijie

    2018-04-01

    The motion of a massless particle in the gravity of a binary asteroid system, referred to as the restricted full three-body problem (RF3BP), is fundamental, not only for the evolution of the binary system, but also for the design of relevant space missions. In this paper, equilibrium points and associated periodic orbit families in the gravity of a binary system are investigated, with the binary (66391) 1999 KW4 as an example. The polyhedron shape model is used to describe irregular shapes and corresponding gravity fields of the primary and secondary of (66391) 1999 KW4, which is more accurate than the ellipsoid shape model in previous studies and provides a high-fidelity representation of the gravitational environment. Both the synchronous and non-synchronous states of the binary system are considered. For the synchronous binary system, the equilibrium points and their stability are determined, and periodic orbit families emanating from each equilibrium point are generated by using the shooting (multiple shooting) method and the homotopy method, where the homotopy function connects the circular restricted three-body problem and the RF3BP. In the non-synchronous binary system, trajectories of equivalent equilibrium points are calculated, and the associated periodic orbits are obtained by using the homotopy method, where the homotopy function connects the synchronous and non-synchronous systems. Although only the binary (66391) 1999 KW4 is considered, our methods are also applicable to other binary systems with polyhedron shape data. Our results on equilibrium points and associated periodic orbits provide general insights into the dynamical environment and orbital behaviors in proximity of small binary asteroids and enable trajectory design and mission operations in future binary system explorations.

  8. Binary star orbits from speckle interferometry. 5: A combined speckle/spectroscopic study of the O star binary 15 Monocerotis

    NASA Technical Reports Server (NTRS)

    Gies, Douglas R.; Mason, Brian D.; Hartkopf, William I.; Mcalister, Harold A.; Frazin, Richard A.; Hahula, Michael E.; Penny, Laura R.; Thaller, Michelle L.; Fullerton, Alexander W.; Shara, Michael M.

    1993-01-01

    We report on the discovery of a speckle binary companion to the O7 V (f) star 15 Monocerotis. A study of published radial velocities in conjunction with new measurements from Kitt Peak National Observatory (KPNO) and IUE suggests that the star is also a spectroscopic binary with a period of 25 years and a large eccentricity. Thus, 15 Mon is the first O star to bridge the gap between the spectroscopic and visual separation regimes. We have used the star's membership in the cluster NGC 2264 together with the cluster distance to derive masses of 34 and 19 solar mass for the primary and secondary, respectively. Several of the He I line profiles display a broad shallow component which we associate with the secondary, and we estimate the secondary's classification to be O9.5 Vn. The new orbit leads to several important predictions that can be tested over the next few years.

  9. Visual Recognition Software for Binary Classification and its Application to Pollen Identification

    NASA Astrophysics Data System (ADS)

    Punyasena, S. W.; Tcheng, D. K.; Nayak, A.

    2014-12-01

    An underappreciated source of uncertainty in paleoecology is the uncertainty of palynological identifications. The confidence of any given identification is not regularly reported in published results, and so cannot be incorporated into subsequent meta-analyses. Automated identification systems potentially provide a means of objectively measuring the confidence of a given count or single identification, as well as a mechanism for increasing sample sizes and throughput. We developed the software ARLO (Automated Recognition with Layered Optimization) to tackle difficult visual classification problems such as pollen identification. ARLO applies pattern recognition and machine learning to the analysis of pollen images. The features that the system discovers are not the traditional features of pollen morphology. Instead, general purpose image features, such as pixel lines and grids of different dimensions, size, spacing, and resolution, are used. ARLO adapts to a given problem by searching for the most effective combination of feature representation and learning strategy. We present a two phase approach which uses our machine learning process to first segment pollen grains from the background and then classify pollen pixels and report species ratios. We conducted two separate experiments that utilized two distinct sets of algorithms and optimization procedures. The first analysis focused on reconstructing black and white spruce pollen ratios, training and testing our classification model at the slide level. This allowed us to directly compare our automated counts and expert counts to slides of known spruce ratios. Our second analysis focused on maximizing classification accuracy at the individual pollen grain level. Instead of predicting ratios of given slides, we predicted the species represented in a given image window. The resulting analysis was more scalable, as we were able to adapt the most efficient parts of the methodology from our first analysis.
ARLO was able to distinguish between the pollen of black and white spruce with an accuracy of ~83.61%. This compared favorably to human expert performance. At the writing of this abstract, we are also experimenting with the analysis of higher diversity samples, including modern tropical pollen material collected from ground pollen traps.

  10. Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey

    PubMed Central

    Özge, C; Toros, F; Bayramkaya, E; Çamdeviren, H; Şaşmaz, T

    2006-01-01

    Background The purpose of this study is to evaluate the most important sociodemographic factors on smoking status of high school students using a broad randomised epidemiological survey. Methods Using an in-class, self-administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of 3304 students in the preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools of Mersin, was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. Results The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking, with male predominance and increasing prevalence by age. Second hand smoking was reported at a 74.3% frequency with father predominance (56.6%). The significant factors affecting current smoking in these age groups were larger household size, late birth rank, certain school types, low academic performance, increased second hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. Conclusions It was concluded that, as closely related with sociocultural factors, smoking was a common problem in this young population, generating important academic and social burden in youth life, and that with increasing data about this behaviour and using new statistical methods, effective coping strategies could be composed. PMID:16891446
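    The core step of classification and regression tree (CART) methodology is choosing the binary split that minimises the weighted Gini impurity of the two child nodes. A sketch with invented toy numbers (not data from the survey):

```python
def gini(labels):
    """Gini impurity of a set of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """One CART step: the threshold on a numeric feature that minimises
    the weighted Gini impurity of the two child nodes."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# toy data: household size vs. current-smoking label (invented numbers)
sizes = [2, 3, 3, 4, 5, 6, 7, 8]
smoker = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_split(sizes, smoker))  # → (4, 0.0): a pure split at size <= 4
```

    A full tree recurses on each child node until the nodes are pure or too small, which is what lets CART surface interactions between factors that a single regression model may miss.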

  11. A signature dissimilarity measure for trabecular bone texture in knee radiographs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woloszynski, T.; Podsiadlo, P.; Stachowiak, G. W.

    Purpose: The purpose of this study is to develop a dissimilarity measure for the classification of trabecular bone (TB) texture in knee radiographs. Problems associated with the traditional extraction and selection of texture features and with the invariance to imaging conditions such as image size, anisotropy, noise, blur, exposure, magnification, and projection angle were addressed. Methods: In the method developed, called a signature dissimilarity measure (SDM), a sum of earth mover's distances calculated for roughness and orientation signatures is used to quantify dissimilarities between textures. Scale-space theory was used to ensure scale and rotation invariance. The effects of image size, anisotropy, noise, and blur on the SDM developed were studied using computer generated fractal texture images. The invariance of the measure to image exposure, magnification, and projection angle was studied using x-ray images of human tibia head. For the studies, Mann-Whitney tests with significance level of 0.01 were used. A comparison study between the performances of a SDM based classification system and two other systems in the classification of Brodatz textures and the detection of knee osteoarthritis (OA) was conducted. The other systems are based on weighted neighbor distance using compound hierarchy of algorithms representing morphology (WND-CHARM) and local binary patterns (LBP). Results: Results obtained indicate that the SDM developed is invariant to image exposure (2.5-30 mA s), magnification (x1.00-x1.35), noise associated with film graininess and quantum mottle (<25%), blur generated by a sharp film screen, and image size (>64x64 pixels). However, the measure is sensitive to changes in projection angle (>5 deg.), image anisotropy (>30 deg.), and blur generated by a regular film screen. For the classification of Brodatz textures, the SDM based system produced comparable results to the LBP system.
For the detection of knee OA, the SDM based system achieved 78.8% classification accuracy and outperformed the WND-CHARM system (64.2%). Conclusions: The SDM is well suited for the classification of TB texture images in knee OA detection and may be useful for the texture classification of medical images in general.

  12. Kernel and divergence techniques in high energy physics separations

    NASA Astrophysics Data System (ADS)

    Bouř, Petr; Kůs, Václav; Franc, Jiří

    2017-10-01

    Binary decision trees under the Bayesian decision technique are used for supervised classification of high-dimensional data. We demonstrate the great potential of adaptive kernel density estimation as the nested separation method of the supervised binary divergence decision tree. We also provide a proof of an alternative computing approach for kernel estimates utilizing the Fourier transform. Further, we apply our method to a Monte Carlo data set from the particle accelerator Tevatron at the DØ experiment at Fermilab and provide final top-antitop signal separation results. We have achieved up to 82% AUC while using the restricted feature selection entering the signal separation procedure.
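    A minimal sketch of kernel density estimation used as the class-conditional density in a Bayesian decision rule. This is a simplified stand-in for the adaptive KDE in the paper; the data, bandwidth, and names are invented:

```python
import math

def kde(x, samples, bandwidth=0.5):
    """Gaussian kernel density estimate at point x."""
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

def classify(x, signal, background, prior_signal=0.5):
    """Bayesian decision rule with KDE class-conditional densities:
    pick the class with the larger posterior."""
    ps = prior_signal * kde(x, signal)
    pb = (1 - prior_signal) * kde(x, background)
    return "signal" if ps >= pb else "background"

signal = [2.8, 3.0, 3.1, 3.3]      # invented 1-D feature values
background = [0.9, 1.0, 1.2, 1.4]
print(classify(3.0, signal, background))  # → signal
```

    An adaptive KDE would vary the bandwidth with the local sample density rather than fixing it; the decision rule itself is unchanged.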

  13. CNN universal machine as classification platform: an ART-like clustering algorithm.

    PubMed

    Bálya, David

    2003-12-01

    Fast and robust classification of feature vectors is a crucial task in a number of real-time systems. A cellular neural/nonlinear network universal machine (CNN-UM) can be very efficient as a feature detector. The next step is to post-process the results for object recognition. This paper shows how a robust classification scheme based on adaptive resonance theory (ART) can be mapped to the CNN-UM. Moreover, this mapping is general enough to include different types of feed-forward neural networks. The designed analogic CNN algorithm is capable of classifying the extracted feature vectors while keeping the advantages of the ART networks, such as robust, plastic and fault-tolerant behaviors. An analogic algorithm is presented for unsupervised classification with tunable sensitivity and automatic new class creation. The algorithm is extended for supervised classification. The presented binary feature vector classification is implemented on the existing standard CNN-UM chips for fast classification. The experimental evaluation shows promising performance, with 100% accuracy on the training set.
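    An ART-like assignment loop with a vigilance threshold and automatic new-class creation can be sketched as follows. This is a plain-Python toy, not the analogic CNN-UM algorithm; the vectors and the vigilance value are invented:

```python
def art_cluster(vectors, vigilance=0.6):
    """Assign each binary vector to the best-matching prototype when the
    overlap ratio reaches the vigilance threshold; otherwise create a new
    class (the ART-style plasticity/stability trade-off)."""
    prototypes, labels = [], []
    for v in vectors:
        best, best_match = None, -1.0
        for i, p in enumerate(prototypes):
            overlap = sum(a & b for a, b in zip(v, p)) / max(1, sum(v))
            if overlap > best_match:
                best, best_match = i, overlap
        if best is not None and best_match >= vigilance:
            # resonance: shrink the prototype toward the intersection
            prototypes[best] = [a & b for a, b in zip(v, prototypes[best])]
            labels.append(best)
        else:
            prototypes.append(list(v))  # automatic new class creation
            labels.append(len(prototypes) - 1)
    return labels, prototypes

vecs = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
labels, protos = art_cluster(vecs)
print(labels)  # → [0, 0, 1]
```

    Raising the vigilance makes the algorithm more sensitive, creating more and tighter classes, which corresponds to the "tunable sensitivity" mentioned in the abstract.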

  14. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets.

    PubMed

    Saito, Takaya; Rehmsmeier, Marc

    2015-01-01

    Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
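    The contrast can be reproduced with a small computation: with a single positive ranked second among ten samples, the rank-based ROC AUC still looks strong while the average precision (area under the PRC) is only mediocre. A sketch with invented scores:

```python
def roc_auc(scores, labels):
    """AUC via the rank (Mann-Whitney) formulation: the probability that a
    random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Area under the precision-recall curve (step-wise average precision)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, fp, ap = 0, 0, 0.0
    n_pos = sum(labels)
    for i in order:
        if labels[i] == 1:
            tp += 1
            ap += tp / (tp + fp) / n_pos  # precision at each recall step
        else:
            fp += 1
    return ap

# one positive in ten, ranked second-highest:
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(roc_auc(scores, labels))            # ≈ 0.889, looks strong
print(average_precision(scores, labels))  # = 0.5, reveals the weak precision
```

    The ROC AUC is high because the lone positive outranks eight of nine negatives, yet at the operating point that recovers it, half the predicted positives are false, which only the precision-recall view exposes.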

  15. Graph Theory-Based Brain Connectivity for Automatic Classification of Multiple Sclerosis Clinical Courses.

    PubMed

    Kocevar, Gabriel; Stamile, Claudio; Hannoun, Salem; Cotton, François; Vukusic, Sandra; Durand-Dubief, Françoise; Sappey-Marinier, Dominique

    2016-01-01

    Purpose: In this work, we introduce a method to classify Multiple Sclerosis (MS) patients into four clinical profiles using structural connectivity information. For the first time, we try to solve this question in a fully automated way using a computer-based method. The main goal is to show how the combination of graph-derived metrics with machine learning techniques constitutes a powerful tool for a better characterization and classification of MS clinical profiles. Materials and Methods: Sixty-four MS patients [12 Clinically Isolated Syndrome (CIS), 24 Relapsing Remitting (RR), 24 Secondary Progressive (SP), and 17 Primary Progressive (PP)] along with 26 healthy controls (HC) underwent MR examination. T1 and diffusion tensor imaging (DTI) were used to obtain structural connectivity matrices for each subject. Global graph metrics, such as density and modularity, were estimated and compared between subjects' groups. These metrics were further used to classify patients using a tuned Support Vector Machine (SVM) combined with a Radial Basis Function (RBF) kernel. Results: When comparing MS patients to HC subjects, a greater assortativity, transitivity, and characteristic path length as well as a lower global efficiency were found. Using all graph metrics, the best F-Measures (91.8, 91.8, 75.6, and 70.6%) were obtained for the binary (HC-CIS, CIS-RR, RR-PP) and multi-class (CIS-RR-SP) classification tasks, respectively. When using only one graph metric, the best F-Measures (83.6, 88.9, and 70.7%) were achieved for modularity with the previous binary classification tasks. Conclusion: Based on a simple DTI acquisition associated with structural brain connectivity analysis, this automatic method allowed an accurate classification of different MS patients' clinical profiles.
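    Two of the global graph metrics mentioned, density and characteristic path length, can be computed directly from an edge list. This is a sketch on a toy unweighted graph, not the authors' connectivity pipeline:

```python
from collections import deque

def density(n_nodes, edges):
    """Fraction of possible undirected edges that are present."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))

def characteristic_path_length(n_nodes, edges):
    """Mean shortest-path length over all connected node pairs (BFS)."""
    adj = {v: set() for v in range(n_nodes)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    total = pairs = 0
    for src in range(n_nodes):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for v, d in dist.items() if v != src)
        pairs += len(dist) - 1
    return total / pairs

edges = [(0, 1), (1, 2), (2, 3)]  # a 4-node path graph
print(density(4, edges), characteristic_path_length(4, edges))
```

    In a connectivity study each subject's graph yields one such feature vector of global metrics, which is then fed to the SVM.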

  16. Neural network ensemble based CAD system for focal liver lesions from B-mode ultrasound.

    PubMed

    Virmani, Jitendra; Kumar, Vinod; Kalra, Naveen; Khandelwal, Niranjan

    2014-08-01

    A neural network ensemble (NNE) based computer-aided diagnostic (CAD) system to assist radiologists in differential diagnosis between focal liver lesions (FLLs), including (1) typical and atypical cases of Cyst, hemangioma (HEM) and metastatic carcinoma (MET) lesions, (2) small and large hepatocellular carcinoma (HCC) lesions, along with (3) normal (NOR) liver tissue, is proposed in the present work. Expert radiologists visualize the textural characteristics of regions inside and outside the lesions to differentiate between different FLLs; accordingly, texture features computed from inside-lesion regions of interest (IROIs) and texture ratio features computed from IROIs and surrounding-lesion regions of interest (SROIs) are taken as input. Principal component analysis (PCA) is used for reducing the dimensionality of the feature space before classifier design. The first step of the classification module consists of a five-class PCA-NN based primary classifier which yields probability outputs for five liver image classes. The second step of the classification module consists of ten binary PCA-NN based secondary classifiers for the NOR/Cyst, NOR/HEM, NOR/HCC, NOR/MET, Cyst/HEM, Cyst/HCC, Cyst/MET, HEM/HCC, HEM/MET and HCC/MET classes. The probability outputs of the five-class PCA-NN based primary classifier are used to determine the two most probable classes for a test instance, based on which it is directed to the corresponding binary PCA-NN based secondary classifier for crisp classification between the two classes. By including the second step of the classification module, classification accuracy increases from 88.7% to 95%. The promising results obtained by the proposed system indicate its usefulness in assisting radiologists in the differential diagnosis of FLLs.
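    The two-step scheme, a primary multiclass model producing class probabilities followed by a dedicated binary classifier for the two most probable classes, can be sketched as follows. The probability table and the stand-in secondary classifier are invented, not PCA-NN outputs:

```python
def two_step_classify(primary_probs, secondary):
    """primary_probs: class -> probability from the primary classifier.
    secondary: frozenset({a, b}) -> binary decision function for that pair."""
    top2 = sorted(primary_probs, key=primary_probs.get, reverse=True)[:2]
    decide = secondary[frozenset(top2)]
    return decide(top2)

# invented probability outputs standing in for the five-class primary net
primary_probs = {"NOR": 0.05, "Cyst": 0.40, "HEM": 0.45, "HCC": 0.06, "MET": 0.04}
# one stand-in secondary classifier; the real system has ten binary nets
secondary = {frozenset({"Cyst", "HEM"}): lambda pair: "HEM"}
print(two_step_classify(primary_probs, secondary))  # → HEM
```

    The design concentrates modelling effort where the primary classifier is least certain: only the hardest pairwise decision is re-examined by a specialist binary model.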

  17. Low-resolution expression recognition based on central oblique average CS-LBP with adaptive threshold

    NASA Astrophysics Data System (ADS)

    Han, Sheng; Xi, Shi-qiong; Geng, Wei-dong

    2017-11-01

    In order to solve the problem of low recognition rate of traditional feature extraction operators under low-resolution images, a novel algorithm of expression recognition is proposed, named central oblique average center-symmetric local binary pattern (CS-LBP) with adaptive threshold (ATCS-LBP). Firstly, the features of face images can be extracted by the proposed operator after pretreatment. Secondly, the obtained feature image is divided into blocks. Thirdly, the histogram of each block is computed independently and all histograms can be connected serially to create a final feature vector. Finally, expression classification is achieved by using support vector machine (SVM) classifier. Experimental results on Japanese female facial expression (JAFFE) database show that the proposed algorithm can achieve a recognition rate of 81.9% when the resolution is as low as 16×16, which is much better than that of the traditional feature extraction operators.
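    The CS-LBP code itself compares the four center-symmetric pixel pairs of each 8-neighbourhood and packs the results into a 4-bit code. A sketch with a fixed threshold (the paper's adaptive threshold and central-oblique averaging are not reproduced here):

```python
def cs_lbp(image, r=1, threshold=0):
    """Center-symmetric LBP: compare the 4 center-symmetric pixel pairs of
    each 8-neighbourhood and pack the comparisons into a 4-bit code."""
    h, w = len(image), len(image[0])
    offsets = [(-r, -r), (-r, 0), (-r, r), (0, r)]  # 4 of the 8 neighbours
    codes = []
    for y in range(r, h - r):
        row = []
        for x in range(r, w - r):
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                a = image[y + dy][x + dx]
                b = image[y - dy][x - dx]  # the diametrically opposite pixel
                if a - b > threshold:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

img = [[9, 1, 1],
       [1, 5, 1],
       [1, 1, 1]]
print(cs_lbp(img))  # → [[1]]: only the first diagonal pair (9 vs 1) sets a bit
```

    With 4 bits instead of LBP's 8, the per-block histogram has 16 bins rather than 256, which is one reason CS-LBP variants hold up better on low-resolution images.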

  18. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates.

    PubMed

    LeDell, Erin; Petersen, Maya; van der Laan, Mark

    In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.
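    The cross-validated AUC point estimate under discussion can be sketched as follows; the influence-curve variance step, which is the paper's contribution, is not reproduced here, and the toy learner and data are invented:

```python
def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cv_auc(xs, ys, fit, k=5):
    """k-fold cross-validated AUC: train on k-1 folds, score the held-out
    fold, and average the per-fold AUCs. Folds lacking both classes are
    skipped; the data is assumed to be pre-shuffled."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]
    aucs = []
    for fold in folds:
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_labels = [ys[i] for i in fold]
        if 0 < sum(fold_labels) < len(fold_labels):
            aucs.append(auc([model(xs[i]) for i in fold], fold_labels))
    return sum(aucs) / len(aucs)

# toy "learner" whose score is the feature value itself
fit = lambda X, Y: (lambda x: x)
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.25, 0.75]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(cv_auc(xs, ys, fit))  # → 1.0 (perfectly separable toy data)
```

    Bootstrapping this whole procedure would repeat the k model fits hundreds of times, which is the computational burden the influence-curve approach avoids.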

  19. Gender Recognition from Human-Body Images Using Visible-Light and Thermal Camera Videos Based on a Convolutional Neural Network for Image Feature Extraction

    PubMed Central

    Nguyen, Dat Tien; Kim, Ki Wan; Hong, Hyung Gil; Koo, Ja Hyung; Kim, Min Cheol; Park, Kang Ryoung

    2017-01-01

    Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images. PMID:28335510

  20. Gender Recognition from Human-Body Images Using Visible-Light and Thermal Camera Videos Based on a Convolutional Neural Network for Image Feature Extraction.

    PubMed

    Nguyen, Dat Tien; Kim, Ki Wan; Hong, Hyung Gil; Koo, Ja Hyung; Kim, Min Cheol; Park, Kang Ryoung

    2017-03-20

    Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.

  1. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates

    PubMed Central

    Petersen, Maya; van der Laan, Mark

    2015-01-01

    In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC. PMID:26279737

  2. New Results on Contact Binary Stars

    NASA Astrophysics Data System (ADS)

    He, J.; Qian, S.; Zhu, L.; Liu, L.; Liao, W.

    2014-08-01

    Contact binaries are close binary systems with the strongest interaction among binary systems, and their formation and evolution are unsolved problems in astrophysics. Since 2000, our group has observed and studied more than fifty contact binaries. In this report, I summarize our new results for some contact binary stars (e.g. UZ CMi, GSC 03526-01995, FU Dra, GSC 0763-0572, V524 Mon, MR Com). They are as follows: (1) we discovered that V524 Mon and MR Com are shallow-contact binaries with decreasing periods; (2) GSC 03526-01995 is a middle-contact binary whose period is neither continuously increasing nor decreasing; (3) UZ CMi, GSC 0763-0572 and FU Dra are middle-contact binaries with continuously increasing periods; (4) UZ CMi, GSC 03526-01995, FU Dra and V524 Mon show period oscillations, which may imply the presence of additional components in these contact binaries.

  3. Classification of Stellar Orbits in Axisymmetric Galaxies

    NASA Astrophysics Data System (ADS)

    Li, Baile; Holley-Bockelmann, Kelly; Khan, Fazeel Mahmood

    2015-09-01

    It is known that two supermassive black holes (SMBHs) cannot merge in a spherical galaxy within a Hubble time; an emerging picture is that galaxy geometry, rotation, and large potential perturbations may usher the SMBH binary through the critical three-body scattering phase and ultimately drive the SMBH to coalesce. We explore the orbital content within an N-body model of a mildly flattened, non-rotating, SMBH-embedded elliptical galaxy. When used as the foundation for a study on the SMBH binary coalescence, the black holes bypassed the binary stalling often seen within spherical galaxies and merged on gigayear timescales. Using both frequency-mapping and angular momentum criteria, we identify a wealth of resonant orbits in the axisymmetric model, including saucers, that are absent from an otherwise identical spherical system and that can potentially interact with the binary. We quantified the set of orbits that could be scattered by the SMBH binary, and found that the axisymmetric model contained nearly six times the number of these potential loss cone orbits compared to our equivalent spherical model. In this flattened model, the mass of these orbits is more than three times that of the SMBH, which is consistent with what the SMBH binary needs to scatter to transition into the gravitational wave regime.

  4. CLASSIFICATION OF STELLAR ORBITS IN AXISYMMETRIC GALAXIES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Baile; Holley-Bockelmann, Kelly; Khan, Fazeel Mahmood, E-mail: baile.li@vanderbilt.edu, E-mail: k.holley@vanderbilt.edu, E-mail: khanfazeel.ist@gmail.com

    2015-09-20

    It is known that two supermassive black holes (SMBHs) cannot merge in a spherical galaxy within a Hubble time; an emerging picture is that galaxy geometry, rotation, and large potential perturbations may usher the SMBH binary through the critical three-body scattering phase and ultimately drive the SMBH to coalesce. We explore the orbital content within an N-body model of a mildly flattened, non-rotating, SMBH-embedded elliptical galaxy. When used as the foundation for a study on the SMBH binary coalescence, the black holes bypassed the binary stalling often seen within spherical galaxies and merged on gigayear timescales. Using both frequency-mapping and angular momentum criteria, we identify a wealth of resonant orbits in the axisymmetric model, including saucers, that are absent from an otherwise identical spherical system and that can potentially interact with the binary. We quantified the set of orbits that could be scattered by the SMBH binary, and found that the axisymmetric model contained nearly six times the number of these potential loss cone orbits compared to our equivalent spherical model. In this flattened model, the mass of these orbits is more than three times that of the SMBH, which is consistent with what the SMBH binary needs to scatter to transition into the gravitational wave regime.

  5. Using multiclass classification to automate the identification of patient safety incident reports by type and severity.

    PubMed

    Wang, Ying; Coiera, Enrico; Runciman, William; Magrabi, Farah

    2017-06-12

    Approximately 10% of admissions to acute-care hospitals are associated with an adverse event. Analysis of incident reports helps to understand how and why incidents occur and can inform policy and practice for safer care. Unfortunately, our capacity to monitor and respond to incident reports in a timely manner is limited by the sheer volume of data collected. In this study, we aim to evaluate the feasibility of using multiclass classification to automate the identification of patient safety incidents in hospitals. Text-based classifiers were applied to identify 10 incident types and 4 severity levels. Using the one-versus-one (OvsO) and one-versus-all (OvsA) ensemble strategies, we evaluated regularized logistic regression, linear support vector machine (SVM) and SVM with a radial-basis function (RBF) kernel. Classifiers were trained and tested with "balanced" datasets (n_Type = 2860, n_SeverityLevel = 1160) from a state-wide incident reporting system. Testing was also undertaken with imbalanced "stratified" datasets (n_Type = 6000, n_SeverityLevel = 5950) from the state-wide system and an independent hospital reporting system. Classifier performance was evaluated using a confusion matrix, as well as F-score, precision and recall. The most effective combination was an OvsO ensemble of binary SVM RBF classifiers with binary count feature extraction. For incident type, classifiers performed well on balanced and stratified datasets (F-scores: 78.3% and 73.9%), but worse on independent datasets (68.5%). Reports about falls, medications, pressure injury, aggression and blood products were identified with high recall and precision. "Documentation" was the hardest type to identify. For severity level, the F-score for severity assessment code (SAC) 1 (extreme risk) was 87.3% and 64% for SAC4 (low risk) on balanced data. With stratified data, high recall was achieved for SAC1 (82.8-84%) but precision was poor (6.8-11.2%). High-risk incidents (SAC2) were confused with medium-risk incidents (SAC3). Binary classifier ensembles appear to be a feasible method for identifying incidents by type and severity level. Automated identification should enable safety problems to be detected and addressed in a more timely manner. Multi-label classifiers may be necessary for reports that relate to more than one incident type.
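The one-versus-one ensemble strategy above trains one binary classifier per pair of classes and predicts by majority vote. A minimal sketch follows; the 1-D toy data, class names and nearest-centroid base learner are illustrative assumptions, standing in for the study's SVM classifiers and text features:

```python
from itertools import combinations

def train_centroid(xs, ys):
    """Toy binary base learner: one centroid per label, classify by distance."""
    cent = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        cent[label] = sum(pts) / len(pts)
    a, b = sorted(cent)
    return lambda x: a if abs(x - cent[a]) <= abs(x - cent[b]) else b

def ovo_train(xs, ys):
    """One-versus-one: fit one binary classifier for each pair of classes."""
    models = []
    for a, b in combinations(sorted(set(ys)), 2):
        sub = [(x, y) for x, y in zip(xs, ys) if y in (a, b)]
        models.append(train_centroid([x for x, _ in sub], [y for _, y in sub]))
    return models

def ovo_predict(models, x):
    """Majority vote across all pairwise classifiers."""
    votes = {}
    for m in models:
        votes[m(x)] = votes.get(m(x), 0) + 1
    return max(votes, key=votes.get)

# Three well-separated toy "incident types" on a 1-D feature
xs = [0.1, 0.3, 4.9, 5.2, 9.8, 10.1]
ys = ['falls', 'falls', 'med', 'med', 'doc', 'doc']
models = ovo_train(xs, ys)          # 3 classes -> 3 pairwise classifiers
print(ovo_predict(models, 5.0))     # → med
```

With k classes, OvsO trains k(k-1)/2 classifiers but each sees only two classes' data, which is why the study could keep each base problem binary.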

  6. Hunting for brown dwarf binaries with X-Shooter

    NASA Astrophysics Data System (ADS)

    Manjavacas, E.; Goldman, B.; Alcalá, J. M.; Zapatero-Osorio, M. R.; Béjar, B. J. S.; Homeier, D.; Bonnefoy, M.; Smart, R. L.; Henning, T.; Allard, F.

    2015-05-01

    The refinement of the brown dwarf binary fraction may contribute to the understanding of substellar formation mechanisms. Peculiar brown dwarf spectra, or a discrepancy between optical and near-infrared spectral type classifications, may indicate unresolved brown dwarf binary systems. We obtained medium-resolution spectra of 22 brown dwarfs that are potential binary candidates using X-Shooter at the VLT, with the aim of selecting brown dwarf binary candidates. We also tested whether BT-Settl 2014 atmospheric models reproduce the physics in the atmospheres of these objects. To find spectral binaries with components of different spectral types, we used spectral indices and compared the selected candidates to single spectra, and to composites of two single spectra, from libraries, to try to reproduce our X-Shooter spectra. We also created artificial binaries within the same spectral class, and tried to find them using the same method as for brown dwarf binaries of different spectral types. We compared our spectra to the BT-Settl 2014 models. We selected six candidates that may be combinations of L- plus T-type brown dwarfs. All candidates except one are better reproduced by a combination of two single brown dwarf spectra than by a single spectrum; the one-sided F-test discarded that object as a binary candidate. We were not able to recover the artificial binaries with components of the same spectral type using the method used for L plus T pairs. Best matches to models gave effective temperatures between 950 K and 1900 K and gravities between 4.0 and 5.5. Some best matches corresponded to supersolar metallicity.

  7. Logistic Regression: Concept and Application

    ERIC Educational Resources Information Center

    Cokluk, Omay

    2010-01-01

    The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
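The binary logistic model referred to above estimates P(member | x) = 1 / (1 + e^-(b0 + b1·x)). A minimal gradient-descent sketch on made-up, centered 1-D data; the data, learning rate and epoch count are assumptions for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Hypothetical dichotomous outcome: membership in the "1" group rises with x
xs = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
predict = lambda x: 1 if sigmoid(b0 + b1 * x) >= 0.5 else 0
print([predict(x) for x in xs])  # → [0, 0, 0, 1, 1, 1]
```

The fitted b1 is the change in log-odds of group membership per unit of x, which is the quantity logistic regression studies typically interpret.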

  8. Pulsars in binary systems: probing binary stellar evolution and general relativity.

    PubMed

    Stairs, Ingrid H

    2004-04-23

    Radio pulsars in binary orbits often have short millisecond spin periods as a result of mass transfer from their companion stars. They therefore act as very precise, stable, moving clocks that allow us to investigate a large set of otherwise inaccessible astrophysical problems. The orbital parameters derived from high-precision binary pulsar timing provide constraints on binary evolution, characteristics of the binary pulsar population, and the masses of neutron stars with different mass-transfer histories. These binary systems also test gravitational theories, setting strong limits on deviations from general relativity. Surveys for new pulsars yield new binary systems that increase our understanding of all these fields and may open up whole new areas of physics, as most spectacularly evidenced by the recent discovery of an extremely relativistic double-pulsar system.

  9. Psychology Problem Classification for Children and Youth.

    ERIC Educational Resources Information Center

    Minnesota Systems Research, Inc., Washington, DC.

    The development of Psychology Problem Classification is an early step in the direction of providing a uniform nomenclature for classifying the needs and problems of children and youth. There are many potential uses for a diagnostic classification and coding system. The two most important uses for the practitioner are problem identification and…

  10. Quantum Support Vector Machine for Big Data Classification

    NASA Astrophysics Data System (ADS)

    Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth

    2014-09-01

    Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.

  11. An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

    PubMed

    Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam

    2012-01-01

    Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than performing the two tasks independently. Yet, for some microarray datasets, both classification accuracy and the stability of the gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on its own. We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace the internal cross-validation in the previous MFMW model; 2) wrongly labelled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier detected all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were shown to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) on the same synthetic datasets; MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier is sufficiently powerful to detect outliers in high-dimensional microarray datasets.

  12. Supernova Photometric Lightcurve Classification

    NASA Astrophysics Data System (ADS)

    Zaidi, Tayeb; Narayan, Gautham

    2016-01-01

    This is a preliminary report on photometric supernova classification. We first explore the properties of supernova light curves, and attempt to restructure the unevenly sampled and sparse data from assorted datasets to allow for processing and classification. The data was primarily drawn from the Dark Energy Survey (DES) simulated data, created for the Supernova Photometric Classification Challenge. This poster shows a method for producing a non-parametric representation of the light curve data, and applying a Random Forest classifier algorithm to distinguish between supernova types. We examine the impact of Principal Component Analysis to reduce the dimensionality of the dataset, for future classification work. The classification code will be used in a stage of the ANTARES pipeline, created for use on the Large Synoptic Survey Telescope alert data and other wide-field surveys. The final figure-of-merit for the DES data in the r band was 60% for binary classification (Type I vs II). Zaidi was supported by the NOAO/KPNO Research Experiences for Undergraduates (REU) Program which is funded by the National Science Foundation Research Experiences for Undergraduates Program (AST-1262829).

  13. Non-binary or genderqueer genders.

    PubMed

    Richards, Christina; Bouman, Walter Pierre; Seal, Leighton; Barker, Meg John; Nieder, Timo O; T'Sjoen, Guy

    2016-01-01

    Some people have a gender which is neither male nor female and may identify as both male and female at one time, as different genders at different times, as no gender at all, or dispute the very idea of only two genders. The umbrella terms for such genders are 'genderqueer' or 'non-binary' genders. Such gender identities outside of the binary of female and male are increasingly being recognized in legal, medical and psychological systems and diagnostic classifications in line with the emerging presence and advocacy of these groups of people. Population-based studies show a small percentage--but a sizable proportion in terms of raw numbers--of people who identify as non-binary. While such genders have been extant historically and globally, they remain marginalized, and as such--while not being disorders or pathological in themselves--people with such genders remain at risk of victimization and of minority or marginalization stress as a result of discrimination. This paper therefore reviews the limited literature on this field and considers ways in which (mental) health professionals may assist the people with genderqueer and non-binary gender identities and/or expressions they may see in their practice. Treatment options and associated risks are discussed.

  14. Individually adapted imagery improves brain-computer interface performance in end-users with disability.

    PubMed

    Scherer, Reinhold; Faller, Josef; Friedrich, Elisabeth V C; Opisso, Eloy; Costa, Ursula; Kübler, Andrea; Müller-Putz, Gernot R

    2015-01-01

    Brain-computer interfaces (BCIs) translate oscillatory electroencephalogram (EEG) patterns into action. Different mental activities modulate spontaneous EEG rhythms in various ways. Non-stationarity and inherent variability of EEG signals, however, make reliable recognition of modulated EEG patterns challenging. Able-bodied individuals who use a BCI for the first time achieve - on average - binary classification performance of about 75%. Performance in users with central nervous system (CNS) tissue damage is typically lower. User training generally enhances reliability of EEG pattern generation and thus also robustness of pattern recognition. In this study, we investigated the impact of mental tasks on binary classification performance in BCI users with CNS tissue damage, such as persons with stroke or spinal cord injury (SCI). Motor imagery (MI), that is, the kinesthetic imagination of movement (e.g. squeezing a rubber ball with the right hand), is the "gold standard" and mainly used to modulate EEG patterns. Based on our recent results in able-bodied users, we hypothesized that pair-wise combination of "brain-teaser" (e.g. mental subtraction and mental word association) and "dynamic imagery" (e.g. hand and feet MI) tasks significantly increases classification performance of induced EEG patterns in the selected end-user group. Within-day (How stable is the classification within a day?) and between-day (How well does a model trained on day one perform on unseen data of day two?) analysis of variability of mental task pair classification in nine individuals confirmed the hypothesis. We found that the use of the classical MI task pair hand vs. feet leads to significantly lower classification accuracy - on average up to 15% lower - in most users with stroke or SCI. User-specific selection of task pairs was again essential to enhance performance. We expect that the gained evidence will significantly contribute to making imagery-based BCI technology accessible to a larger population of users, including individuals with special needs due to CNS damage.

  15. Individually Adapted Imagery Improves Brain-Computer Interface Performance in End-Users with Disability

    PubMed Central

    Scherer, Reinhold; Faller, Josef; Friedrich, Elisabeth V. C.; Opisso, Eloy; Costa, Ursula; Kübler, Andrea; Müller-Putz, Gernot R.

    2015-01-01

    Brain-computer interfaces (BCIs) translate oscillatory electroencephalogram (EEG) patterns into action. Different mental activities modulate spontaneous EEG rhythms in various ways. Non-stationarity and inherent variability of EEG signals, however, make reliable recognition of modulated EEG patterns challenging. Able-bodied individuals who use a BCI for the first time achieve - on average - binary classification performance of about 75%. Performance in users with central nervous system (CNS) tissue damage is typically lower. User training generally enhances reliability of EEG pattern generation and thus also robustness of pattern recognition. In this study, we investigated the impact of mental tasks on binary classification performance in BCI users with CNS tissue damage, such as persons with stroke or spinal cord injury (SCI). Motor imagery (MI), that is, the kinesthetic imagination of movement (e.g. squeezing a rubber ball with the right hand), is the "gold standard" and mainly used to modulate EEG patterns. Based on our recent results in able-bodied users, we hypothesized that pair-wise combination of "brain-teaser" (e.g. mental subtraction and mental word association) and "dynamic imagery" (e.g. hand and feet MI) tasks significantly increases classification performance of induced EEG patterns in the selected end-user group. Within-day (How stable is the classification within a day?) and between-day (How well does a model trained on day one perform on unseen data of day two?) analysis of variability of mental task pair classification in nine individuals confirmed the hypothesis. We found that the use of the classical MI task pair hand vs. feet leads to significantly lower classification accuracy - on average up to 15% lower - in most users with stroke or SCI. User-specific selection of task pairs was again essential to enhance performance. We expect that the gained evidence will significantly contribute to making imagery-based BCI technology accessible to a larger population of users, including individuals with special needs due to CNS damage. PMID:25992718

  16. A Be-type star with a black-hole companion.

    PubMed

    Casares, J; Negueruela, I; Ribó, M; Ribas, I; Paredes, J M; Herrero, A; Simón-Díaz, S

    2014-01-16

    Stellar-mass black holes have all been discovered through X-ray emission, which arises from the accretion of gas from their binary companions (this gas is either stripped from low-mass stars or supplied as winds from massive ones). Binary evolution models also predict the existence of black holes accreting from the equatorial envelope of rapidly spinning Be-type stars (stars of the Be type are hot blue irregular variables showing characteristic spectral emission lines of hydrogen). Of the approximately 80 Be X-ray binaries known in the Galaxy, however, only pulsating neutron stars have been found as companions. A black hole was formally allowed as a solution for the companion to the Be star MWC 656 (ref. 5; also known as HD 215227), although that conclusion was based on a single radial velocity curve of the Be star, a mistaken spectral classification and rough estimates of the inclination angle. Here we report observations of an accretion disk line mirroring the orbit of MWC 656. This, together with an improved radial velocity curve of the Be star through fitting sharp Fe II profiles from the equatorial disk, and a refined Be classification (to that of a B1.5-B2 III star), indicates that a black hole of 3.8 to 6.9 solar masses orbits MWC 656, the candidate counterpart of the γ-ray source AGL J2241+4454 (refs 5, 6). The black hole is X-ray quiescent and fed by a radiatively inefficient accretion flow giving a luminosity less than 1.6 × 10^-7 times the Eddington luminosity. This implies that Be binaries with black-hole companions are difficult to detect in conventional X-ray surveys.

  17. Social Work Problem Classification for Children and Youth.

    ERIC Educational Resources Information Center

    Minnesota Systems Research, Inc., Washington, DC.

    The development of the Social Work Problem Classification is an early step in the provision of a uniform nomenclature for classifying the needs and problems of children and youth. There are many potential uses for a diagnostic classification and coding system. The two most important for the practitioner are: (1) problem identification and…

  18. Optimizing area under the ROC curve using semi-supervised learning

    PubMed Central

    Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M.

    2014-01-01

    Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results. PMID:25395692
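The AUC being optimized equals the probability that a randomly drawn positive sample is scored above a randomly drawn negative one (the Mann-Whitney statistic), so it can be computed directly from classifier scores without tracing the curve. A minimal sketch; the scores below are made up for illustration:

```python
def auc(scores_pos, scores_neg):
    """AUC = P(score_pos > score_neg); ties count 1/2 (Mann-Whitney U / (n+ * n-))."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

pos = [0.9, 0.8, 0.4]   # classifier scores for positive samples
neg = [0.7, 0.3, 0.2]   # classifier scores for negative samples
print(auc(pos, neg))    # 8 of 9 pairs ranked correctly → 0.888...
```

This pairwise-ranking view is exactly why the paper can express AUC maximization through ranking constraints between labeled (and unlabeled) samples.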

  19. Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers.

    PubMed

    Borchani, Hanen; Bielza, Concha; Toro, Carlos; Larrañaga, Pedro

    2013-03-01

    Our aim is to use multi-dimensional Bayesian network classifiers in order to predict the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors given an input set of respective resistance mutations that an HIV patient carries. Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models especially designed to solve multi-dimensional classification problems, where each input instance in the data set has to be assigned simultaneously to multiple output class variables that are not necessarily binary. In this paper, we introduce a new method, named MB-MBC, for learning MBCs from data by determining the Markov blanket around each class variable using the HITON algorithm. Our method is applied to both reverse transcriptase and protease data sets obtained from the Stanford HIV-1 database. Regarding the prediction of antiretroviral combination therapies, the experimental study shows promising results in terms of classification accuracy compared with state-of-the-art MBC learning algorithms. For reverse transcriptase inhibitors, we get 71% and 11% in mean and global accuracy, respectively; while for protease inhibitors, we get more than 84% and 31% in mean and global accuracy, respectively. In addition, the analysis of MBC graphical structures lets us gain insight into both known and novel interactions between reverse transcriptase and protease inhibitors and their respective resistance mutations. The MB-MBC algorithm is a valuable tool to analyze the HIV-1 reverse transcriptase and protease inhibitor prediction problem and to discover interactions within and between these two classes of inhibitors. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. Optimizing area under the ROC curve using semi-supervised learning.

    PubMed

    Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M

    2015-01-01

    Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.

  1. DEVELOPMENT OF THE EPA RELATIVE MOLDINESS INDEX© AND ITS RELATIONSHIP TO HEALTH

    EPA Science Inventory

    This study compared two binary classification methods to evaluate the mold condition in 271 homes of infants, 144 of which later developed symptoms of respiratory illness. A method using on-site visual mold inspection was compared to another method using a quantitative index of ...

  2. Learning to rank image tags with limited training examples.

    PubMed

    Songhe Feng; Zheyun Feng; Rong Jin

    2015-04-01

    With an increasing number of images that are available in social media, image annotation has emerged as an important research topic due to its application in image matching and retrieval. Most studies cast image annotation into a multilabel classification problem. The main shortcoming of this approach is that it requires a large number of training images with clean and complete annotations in order to learn a reliable model for tag prediction. We address this limitation by developing a novel approach that combines the strength of tag ranking with the power of matrix recovery. Instead of having to make a binary decision for each tag, our approach ranks tags in the descending order of their relevance to the given image, significantly simplifying the problem. In addition, the proposed method aggregates the prediction models for different tags into a matrix, and casts tag ranking into a matrix recovery problem. It introduces the matrix trace norm to explicitly control the model complexity, so that a reliable prediction model can be learned for tag ranking even when the tag space is large and the number of training images is limited. Experiments on multiple well-known image data sets demonstrate the effectiveness of the proposed framework for tag ranking compared with the state-of-the-art approaches for image annotation and tag ranking.

  3. Automated particle identification through regression analysis of size, shape and colour

    NASA Astrophysics Data System (ADS)

    Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.

    2016-04-01

    Rapid point-of-care diagnostic tests, and tests to provide therapeutic information, are now available for a range of specific conditions, from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity, these tests are often backed up by more conventional lab-based diagnostic methods: for example, a card agglutination test may be carried out for a suspected parasitic infection in the field and, if positive, a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding in the diagnostic process; this system uses a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked, using a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After subtracting the objects of interest from the background, the next challenge is to predict whether a given object belongs to a certain category or not. This is a classification problem, and the output of the algorithm is a Boolean value (true/false): the program should be able to "predict" with a reasonable level of confidence whether a given particle belongs to the kind we are looking for. We show the use of a binary logistic regression analysis with three continuous predictors: size, shape and colour histogram. The results suggest these variables could be very useful in a logistic regression equation, as they proved to have relatively high predictive value on their own.

  4. A binary linear programming formulation of the graph edit distance.

    PubMed

    Justice, Derek; Hero, Alfred

    2006-08-01

    A binary linear programming formulation of the graph edit distance for unweighted, undirected graphs with vertex attributes is derived and applied to a graph recognition problem. A general formulation for editing graphs is used to derive a graph edit distance that is proven to be a metric, provided the cost function for individual edit operations is a metric. Then, a binary linear program is developed for computing this graph edit distance, and polynomial time methods for determining upper and lower bounds on the solution of the binary program are derived by applying solution methods for standard linear programming and the assignment problem. A recognition problem of comparing a sample input graph to a database of known prototype graphs in the context of a chemical information system is presented as an application of the new method. The costs associated with various edit operations are chosen by using a minimum normalized variance criterion applied to pairwise distances between nearest neighbors in the database of prototypes. The new metric is shown to perform quite well in comparison to existing metrics when applied to a database of chemical graphs.
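The paper computes the edit distance exactly via a binary linear program; for intuition, the same quantity can be brute-forced on tiny graphs by searching over node mappings. The unit costs and the example graphs below are illustrative assumptions; the paper's formulation additionally handles vertex attributes and general metric edit costs, which this sketch omits:

```python
from itertools import permutations

def ged(n1, e1, n2, e2):
    """Exact edit distance between two tiny unlabeled graphs with unit costs.

    n1, n2 are node counts; e1, e2 are sets of frozenset({u, v}) edges.
    The smaller graph is padded with null nodes so that every candidate
    mapping is a bijection; pairing a real node with a null node costs 1
    (node deletion/insertion), and every edge present in one graph but not
    in the mapped image of the other costs 1 (edge deletion/insertion).
    """
    size = max(n1, n2)
    best = float('inf')
    for perm in permutations(range(size)):
        # node cost: a real node paired with a null (padding) node
        cost = sum(1 for i in range(size) if (i >= n1) != (perm[i] >= n2))
        mapped = {frozenset({perm[u], perm[v]}) for u, v in map(tuple, e1)}
        cost += len(mapped ^ e2)  # symmetric difference of edge sets
        best = min(best, cost)
    return best

# Path 0-1-2 versus triangle: one edge insertion turns the path into the triangle
path = {frozenset({0, 1}), frozenset({1, 2})}
tri = {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})}
print(ged(3, path, 3, tri))  # → 1
```

The factorial search makes this usable only for a handful of nodes, which is precisely why the paper resorts to binary linear programming with polynomial-time assignment-based bounds.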

  5. The application of improved NeuroEvolution of Augmenting Topologies neural network in Marcellus Shale lithofacies prediction

    NASA Astrophysics Data System (ADS)

    Wang, Guochang; Cheng, Guojian; Carr, Timothy R.

    2013-04-01

    The organic-rich Marcellus Shale was deposited in a foreland basin during the Middle Devonian. In terms of mineral composition and organic-matter richness, we define seven mudrock lithofacies: three organic-rich lithofacies and four organic-poor lithofacies. The 3D lithofacies model is very helpful for determining geologic and engineering sweet spots, and consequently for designing horizontal well trajectories and stimulation strategies. NeuroEvolution of Augmenting Topologies (NEAT) is a relatively new approach to the design of neural networks and is well suited to classification tasks such as Marcellus Shale lithofacies prediction. We have successfully enhanced the capability and efficiency of NEAT in three respects. First, we introduced two new attributes of the node gene, node location and recurrent connection (RCC), to increase calculation efficiency. Second, we evolved the population size from a small initial value to a larger one, instead of using a constant value, which saves time and computer memory, especially for complex learning tasks. Third, for multiclass pattern recognition problems, we combined feature selection of input variables with a modular neural network to automatically select input variables and optimize the network topology for each binary classifier. These improvements were tested and verified on exclusive-or (XOR) experiments and proved powerful for classification.
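    The XOR benchmark mentioned above asks a network to output 1 exactly when its two binary inputs differ, a function no single-layer network can realize. A minimal hand-wired 2-2-1 threshold network (fixed weights chosen for illustration, not evolved by NEAT) shows the target behaviour:

```python
def step(z):
    """Heaviside threshold activation."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Hand-set 2-2-1 threshold network computing XOR: the hidden units
    compute OR and AND, and the output fires for OR AND NOT(AND)."""
    h1 = step(x1 + x2 - 0.5)        # OR gate
    h2 = step(x1 + x2 - 1.5)        # AND gate
    return step(h1 - h2 - 0.5)      # OR but not AND
```

    NEAT's task is to discover such a topology and its weights automatically; the hand-set thresholds above only define the benchmark being evolved against.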

  6. A distributed approach for optimizing cascaded classifier topologies in real-time stream mining systems.

    PubMed

    Foo, Brian; van der Schaar, Mihaela

    2010-11-01

    In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.

  7. Chaotic Particle Swarm Optimization with Mutation for Classification

    PubMed Central

    Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza

    2015-01-01

    In this paper, a chaotic particle swarm optimization with mutation (a mutation-based classifier particle swarm optimization) is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allow us to overcome the problem of early convergence into a local minimum associated with particle swarm optimization algorithms: the mutation operator sharpens the convergence and tunes the best possible solution. Furthermore, to remove irrelevant data and reduce the dimensionality of medical datasets, a feature-selection approach using a binary version of the proposed particle swarm optimization is introduced. To demonstrate the effectiveness of the proposed classifier, it is evaluated on three classification datasets with different feature-vector dimensions: Wisconsin diagnostic breast cancer, Wisconsin breast cancer, and heart-statlog. The proposed algorithm is compared with several classifiers, including k-nearest neighbor as a conventional classifier, and particle swarm classifier, genetic algorithm, and imperialist competitive algorithm classifier as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms. PMID:25709937
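    The four evaluation measures named above follow directly from the 2x2 confusion matrix. A small self-contained sketch of the generic formulas (not the paper's code):

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and Matthews correlation
    coefficient computed from the 2x2 confusion matrix of a binary
    classifier (labels are 0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    spec = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sens, spec, mcc
```

    Unlike accuracy, the MCC stays informative on imbalanced datasets, which is why it is reported alongside sensitivity and specificity.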

  8. Planet Formation in Binaries

    NASA Astrophysics Data System (ADS)

    Thebault, P.; Haghighipour, N.

    Spurred by the discovery of numerous exoplanets in multiple systems, binaries have in recent years become one of the main topics in planet-formation research. Numerous studies have investigated to what extent the presence of a stellar companion can affect the planet formation process. Such studies have implications that reach beyond the sole context of binaries, as they allow certain aspects of the planet-formation scenario to be tested by submitting them to extreme environments. We review here the current understanding of this complex problem. We show in particular how each of the different stages of the planet-formation process is affected differently by binary perturbations. We focus especially on the intermediate stage of kilometre-sized planetesimal accretion, which has proven to be the most sensitive to binarity and for which the presence of some exoplanets observed in tight binaries is difficult to explain by in-situ formation following the "standard" planet-formation scenario. Some tentative solutions to this apparent paradox are presented. The last part of our review presents a thorough description of the problem of planet habitability, for which the binary environment creates a complex situation because of the presence of two irradiation sources at varying distance.

  9. Satisfiability modulo theory and binary puzzle

    NASA Astrophysics Data System (ADS)

    Utomo, Putranto

    2017-06-01

    The binary puzzle is a sudoku-like puzzle with values in each cell taken from the set {0, 1}. We look at the mathematical theory behind it. A solved binary puzzle is an n × n binary array, where n is even, that satisfies the following conditions: (1) no three consecutive ones and no three consecutive zeros in any row or column; (2) every row and column is balanced, that is, the numbers of ones and zeros are equal in each row and in each column; (3) every two rows and every two columns are distinct. The binary puzzle has been proven to be an NP-complete problem [5]. Research concerning the satisfiability of formulas with respect to some background theory is called satisfiability modulo theories (SMT). An SMT solver is an extension of a satisfiability (SAT) solver. SMT can be used for solving various problems in mathematics and industry, such as formal verification and operations research [1, 7]. In this paper we apply SMT to solve binary puzzles. In addition, we experiment with puzzles of different sizes and different numbers of blanks. We also make comparisons with two other approaches, namely a SAT solver and exhaustive search.
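    The three conditions defining a solved puzzle are easy to state programmatically. The sketch below solves a small instance by exhaustive search purely to illustrate the constraints; the paper's point is that SMT and SAT solvers handle sizes where this enumeration becomes hopeless:

```python
from itertools import permutations, product

def valid_rows(n):
    """All balanced rows of length n with no three equal symbols in a row
    (conditions 1 and 2, applied row-wise)."""
    rows = []
    for r in product((0, 1), repeat=n):
        if sum(r) != n // 2:
            continue                       # row must be balanced
        if any(r[i] == r[i + 1] == r[i + 2] for i in range(n - 2)):
            continue                       # no triple within the row
        rows.append(r)
    return rows

def solve(n):
    """Exhaustive search for one solved n x n binary puzzle."""
    rows = valid_rows(n)
    for grid in permutations(rows, n):     # rows distinct by construction
        cols = list(zip(*grid))
        if len(set(cols)) != n:
            continue                       # columns must be distinct
        if any(sum(c) != n // 2 for c in cols):
            continue                       # columns must be balanced
        if any(c[i] == c[i + 1] == c[i + 2]
               for c in cols for i in range(n - 2)):
            continue                       # no triple within any column
        return [list(r) for r in grid]
    return None
```

    solve(4) returns one solved 4 × 4 grid; the search space grows far too quickly for this approach beyond toy sizes, which motivates the SMT encoding.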

  10. Invariant correlation to position and rotation using a binary mask applied to binary and gray images

    NASA Astrophysics Data System (ADS)

    Álvarez-Borrego, Josué; Solorza, Selene; Bueno-Ibarra, Mario A.

    2013-05-01

    In this paper, further alternative ways to generate binary ring masks are studied, and a new methodology is presented for cases in which the image to be analyzed is distorted by rotation. The new algorithm has a low computational cost. Signature vectors of the target, like those of the object to be recognized in the problem image, are obtained using a binary ring mask constructed from either the real or the imaginary part of the Fourier transform, analyzing two different conditions for each. In this manner, each target or problem image has four unique binary ring masks. The four variants are analyzed and the best is chosen. In addition, because any rotated image includes some distortion, the best transect in the Fourier plane is chosen in order to obtain the best signature among the different ways of building the binary mask. The methodology is applied to two cases: identifying alphabetic letters in Arial font, and identifying fossil diatom images. Considering the great similarity between diatom images, the results obtained are excellent.

  11. Rotation-invariant image and video description with local binary pattern features.

    PubMed

    Zhao, Guoying; Ahonen, Timo; Matas, Jiří; Pietikäinen, Matti

    2012-04-01

    In this paper, we propose a novel approach to compute rotation-invariant features from histograms of local noninvariant patterns. We apply this approach to both static and dynamic local binary pattern (LBP) descriptors. For static-texture description, we present LBP histogram Fourier (LBP-HF) features, and for dynamic-texture recognition, we present two rotation-invariant descriptors computed from the LBPs from three orthogonal planes (LBP-TOP) features in the spatiotemporal domain. LBP-HF is a novel rotation-invariant image descriptor computed from discrete Fourier transforms of LBP histograms. The approach can also be generalized to embed any uniform features into this framework, and combining supplementary information, e.g., the sign and magnitude components of the LBP, can improve the description ability. Moreover, two rotation-invariant variants are proposed for the LBP-TOP, an effective descriptor for dynamic-texture recognition (as shown by its recent success in different application problems) that is not itself rotation invariant. In the experiments, it is shown that the LBP-HF and its extensions outperform noninvariant and earlier versions of the rotation-invariant LBP in rotation-invariant texture classification. In experiments on two dynamic-texture databases with rotations or view variations, the proposed video features can effectively deal with rotation variations of dynamic textures (DTs). They are also robust with respect to changes in viewpoint, outperforming recent methods proposed for view-invariant recognition of DTs.

  12. Building and Solving Odd-One-Out Classification Problems: A Systematic Approach

    ERIC Educational Resources Information Center

    Ruiz, Philippe E.

    2011-01-01

    Classification problems ("find the odd-one-out") are frequently used as tests of inductive reasoning to evaluate human or animal intelligence. This paper introduces a systematic method for building the set of all possible classification problems, followed by a simple algorithm for solving the problems of the R-ASCM, a psychometric test derived…

  13. Bayesian Redshift Classification of Emission-line Galaxies with Photometric Equivalent Widths

    NASA Astrophysics Data System (ADS)

    Leung, Andrew S.; Acquaviva, Viviana; Gawiser, Eric; Ciardullo, Robin; Komatsu, Eiichiro; Malz, A. I.; Zeimann, Gregory R.; Bridge, Joanna S.; Drory, Niv; Feldmeier, John J.; Finkelstein, Steven L.; Gebhardt, Karl; Gronwall, Caryl; Hagen, Alex; Hill, Gary J.; Schneider, Donald P.

    2017-07-01

    We present a Bayesian approach to the redshift classification of emission-line galaxies when only a single emission line is detected spectroscopically. We consider the case of surveys for high-redshift Lyα-emitting galaxies (LAEs), which have traditionally been classified via an inferred rest-frame equivalent width (EW) W_Lyα greater than 20 Å. Our Bayesian method relies on known prior probabilities in measured emission-line luminosity functions and EW distributions for the galaxy populations, and returns the probability that an object in question is an LAE given the characteristics observed. This approach will be directly relevant for the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), which seeks to classify ~10^6 emission-line galaxies into LAEs and low-redshift [O II] emitters. For a simulated HETDEX catalog with realistic measurement noise, our Bayesian method recovers 86% of LAEs missed by the traditional W_Lyα > 20 Å cutoff over 2 < z < 3, outperforming the EW cut in both contamination and incompleteness. This is due to the method's ability to trade off between the two types of binary classification error by adjusting the stringency of the probability requirement for classifying an observed object as an LAE. In our simulations of HETDEX, this method reduces the uncertainty in cosmological distance measurements by 14% with respect to the EW cut, equivalent to recovering 29% more cosmological information. Rather than using binary object labels, this method enables the use of classification probabilities in large-scale structure analyses. It can be applied to narrowband emission-line surveys as well as upcoming large spectroscopic surveys including Euclid and WFIRST.
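    The classification rule can be illustrated with a toy version of the Bayes computation: combine a prior over the two populations with a likelihood for the observed EW, and report the posterior probability of the LAE class. All numbers below (class fractions, exponential EW scales, noise level) are hypothetical stand-ins for the measured luminosity functions and EW distributions the paper actually uses:

```python
import math

def posterior_lae(ew_obs, sigma=15.0):
    """Toy Bayesian classifier: P(LAE | observed EW) from Gaussian
    measurement noise and illustrative exponential priors on the
    rest-frame EW distributions of LAEs and [O II] emitters."""
    p_lae, p_oii = 0.5, 0.5           # hypothetical prior class fractions
    w_lae, w_oii = 100.0, 10.0        # hypothetical exponential EW scales

    def likelihood(ew, scale):
        # marginalise the exponential EW prior over Gaussian noise
        # (crude numerical integration on a unit grid of true EWs)
        return sum(
            math.exp(-t / scale) / scale
            * math.exp(-0.5 * ((ew - t) / sigma) ** 2)
            for t in range(0, 500)
        )

    num = p_lae * likelihood(ew_obs, w_lae)
    den = num + p_oii * likelihood(ew_obs, w_oii)
    return num / den
```

    Thresholding this posterior at different levels is what lets the method trade contamination against incompleteness, instead of committing to a single hard EW cut.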

  14. PCANet: A Simple Deep Learning Baseline for Image Classification?

    PubMed

    Chan, Tsung-Han; Jia, Kui; Gao, Shenghua; Lu, Jiwen; Zeng, Zinan; Ma, Yi

    2015-12-01

    In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) binary hashing; and 3) blockwise histograms. In the proposed architecture, the PCA is employed to learn multistage filter banks. This is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be extremely easily and efficiently designed and learned. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual data sets for different tasks, including Labeled Faces in the Wild (LFW) for face verification; the MultiPIE, Extended Yale B, AR, Facial Recognition Technology (FERET) data sets for face recognition; and MNIST for hand-written digit recognition. Surprisingly, for all tasks, such a seemingly naive PCANet model is on par with the state-of-the-art features either prefixed, highly hand-crafted, or carefully learned [by deep neural networks (DNNs)]. Even more surprisingly, the model sets new records for many classification tasks on the Extended Yale B, AR, and FERET data sets and on MNIST variations. Additional experiments on other public data sets also demonstrate the potential of PCANet to serve as a simple but highly competitive baseline for texture classification and object recognition.
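    The first processing component, learning a filter bank by PCA, can be sketched in a few lines: collect mean-removed patches from the training images and keep the leading eigenvectors of their covariance as convolution filters. This is a simplified single-stage sketch of the idea, not the authors' implementation:

```python
import numpy as np

def pca_filters(images, k=4, patch=5):
    """One stage of PCANet-style filter learning: gather mean-removed
    patches and take the top-k eigenvectors of their covariance matrix
    as patch-sized convolution filters."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - patch + 1):
            for j in range(w - patch + 1):
                p = img[i:i + patch, j:j + patch].ravel()
                patches.append(p - p.mean())     # remove the patch mean
    X = np.array(patches)
    cov = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(cov)             # ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]    # leading k components
    return top.T.reshape(k, patch, patch)

rng = np.random.default_rng(1)
imgs = [rng.normal(size=(12, 12)) for _ in range(3)]  # toy training images
filters = pca_filters(imgs)
```

    The returned filters are orthonormal by construction; PCANet cascades a second such stage before the binary hashing and blockwise histogramming described above.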

  15. An Ultrasonographic Periodontal Probe

    NASA Astrophysics Data System (ADS)

    Bertoncini, C. A.; Hinders, M. K.

    2010-02-01

    Periodontal disease, commonly known as gum disease, affects millions of people. The current method of detecting periodontal pocket depth is painful, invasive, and inaccurate. As an alternative to manual probing, an ultrasonographic periodontal probe is being developed to use ultrasound echo waveforms to measure periodontal pocket depth, which is the main measure of periodontal disease. Wavelet transforms and pattern classification techniques are implemented in artificial intelligence routines that can automatically detect pocket depth. The main pattern classification technique used here, called a binary classification algorithm, compares test objects with only two possible pocket depth measurements at a time and relies on dimensionality reduction for the final determination. This method correctly identifies up to 90% of the ultrasonographic probe measurements within the manual probe's tolerance.

  16. The Spot Variability and Related Brightness variations of the Solar Type PreContact W UMa Binary System V1001 Cas

    NASA Astrophysics Data System (ADS)

    Samec, Ronald George; Koenke, Sam S.; Faulkner, Danny R.

    2015-08-01

    A new classification of eclipsing binary has emerged: pre-contact W UMa binaries (PCWBs; Samec et al. 2012). These solar-type systems are usually detached or semidetached, with one or both components underfilling their critical Roche lobes. They usually have EA- or EB-type light curves (unequal eclipse depths, indicating components with substantially different temperatures). The accepted scenario for these W UMa binaries is that they are undergoing steady but slow angular momentum losses due to magnetic braking, as stellar winds blow radially away on stiff bipolar field lines. These binaries are believed to come into stable contact and eventually coalesce into blue-straggler-type single, fast-rotating A-type stars (Guinan and Bradstreet, 1988). High-precision 2012 and 2009 light curves are compared for the very short-period (~0.43 d) pre-contact W UMa binary V1001 Cas. This is the shortest-period PCWB found so far. Its short period, similar to the majority of W UMa's, in contrast to its distinct Algol-type light curve, makes it a very rare and interesting system. Our solutions of light curves separated by some three years give approximately the same physical parameters. However, the spots change radically in temperature, area and position, causing a distinctive variation in the shape of the light curves. We conclude that spots are very active on this solar-type dwarf system and that it may mimic its larger cousins, the RS CVn binaries.

  17. The massive star binary fraction in young open clusters - II. NGC6611 (Eagle Nebula)

    NASA Astrophysics Data System (ADS)

    Sana, H.; Gosset, E.; Evans, C. J.

    2009-12-01

    Based on a set of over 100 medium- to high-resolution optical spectra collected from 2003 to 2009, we investigate the properties of the O-type star population in NGC6611 in the core of the Eagle Nebula (M16). Using a much more extended data set than previously available, we revise the spectral classification and multiplicity status of the nine O-type stars in our sample. We confirm two suspected binaries and derive the first SB2 orbital solutions for two systems. We further report that two other objects are displaying a composite spectrum, suggesting possible long-period binaries. Our analysis is supported by a set of Monte Carlo simulations, allowing us to estimate the detection biases of our campaign and showing that the latter do not affect our conclusions. The absolute minimal binary fraction in our sample is fmin = 0.44 but could be as high as 0.67 if all the binary candidates are confirmed. As in NGC6231 (see Paper I), up to 75 per cent of the O star population in NGC6611 are found in an O+OB system, thus implicitly excluding random pairing from a classical IMF as a process to describe the companion association in massive binaries. No statistical difference could be further identified in the binary fraction, mass-ratio and period distributions between NGC6231 and NGC 6611, despite the difference in age and environment of the two clusters.

  18. Full Ionisation In Binary-Binary Encounters With Small Positive Energies

    NASA Astrophysics Data System (ADS)

    Sweatman, W. L.

    2006-08-01

    Interactions between binary stars and single stars, and between binary stars and other binary stars, play a key role in the dynamics of a dense stellar system. Energy can be transferred between the internal dynamics of a binary and the larger-scale dynamics of the interacting objects. Binaries can be destroyed and created by the interaction. In a binary-binary encounter, full ionisation occurs when both of the binary stars are destroyed in the interaction to create four single stars. This is only possible when the total energy of the system is positive. For very small energies the probability of this occurring is very low, and it tends towards zero as the total energy tends towards zero. Here the case is considered in which all the stars have equal masses. An asymptotic power law is predicted relating the probability of full ionisation to the total energy when the latter quantity is small. The exponent, which is approximately 2.31, is compared with the results from numerical scattering experiments. The theoretical approach taken is similar to one used previously in the three-body problem. It makes use of the fact that the most dramatic changes in scale and energies of a few-body system occur when its components pass near to a central configuration. The positions and number of these configurations are not known for the general four-body problem; with equal masses, however, there are known to be exactly five different cases. Separate consideration and comparison of the properties of orbits close to each of these five central configurations enables prediction of the form of the cross-section for full ionisation in the case of small positive total energy, i.e., the relation between total energy and the probability of full ionisation described above.

  19. A hybrid binary particle swarm optimization for large capacitated multi item multi level lot sizing (CMIMLLS) problem

    NASA Astrophysics Data System (ADS)

    Mishra, S. K.; Sahithi, V. V. D.; Rao, C. S. P.

    2016-09-01

    The lot-sizing problem deals with finding optimal order quantities that minimize the ordering and holding costs of a product mix. When multiple items at multiple levels are considered with capacity restrictions, the lot-sizing problem becomes NP-hard. Many heuristics developed in the past have failed due to problem size, computational complexity, and time. The authors, however, have developed a PSO-based technique, an iterative-improvement binary particle swarm technique, to address very large capacitated multi-item multi-level lot-sizing (CMIMLLS) problems. First, a binary particle swarm optimization (BPSO) algorithm is used to find a solution in a reasonable time; an iterative-improvement local search mechanism is then employed to improve the solution obtained by the BPSO algorithm. This hybrid mechanism of applying local search to the global solution improves the quality of solutions with respect to time; the IIBPSO method is thus found to perform best and shows excellent results.

  20. Choice of optimal working fluid for binary power plants at extremely low temperature brine

    NASA Astrophysics Data System (ADS)

    Tomarov, G. V.; Shipkov, A. A.; Sorokina, E. V.

    2016-12-01

    Problems of geothermal energy development based on binary power plants utilizing low-potential geothermal resources are considered. It is shown that one possible way of increasing the efficiency of heat utilization of geothermal brine over a wide temperature range is the use of multistage power systems with series-connected binary power plants based on incremental primary energy conversion. Practically significant results are presented from design-analytical investigations of the physicochemical properties of various organic substances and their influence on the main parameters of the flowsheet and on the technical and operational characteristics of heat-mechanical and heat-exchange equipment for a binary power plant operating on extremely low-temperature (70°C) geothermal brine. The calculated specific flow rate of geothermal brine, net capacity, and other operating characteristics of 2.5 MW binary power plants using various organic substances are of practical interest. It is shown that the choice of working fluid significantly influences the flowsheet parameters and the operational characteristics of the binary power plant, and that selecting a working fluid amounts to a search for a compromise among efficiency, safety, and ecological criteria. For investigations of working-fluid selection, it is proposed to plot multiaxis diagrams of the relative parameters and characteristics of binary power plants. Examples of plotting and analyzing such diagrams are given for the case in which the efficiency of geothermal brine utilization is taken as the main priority.

  1. "When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.

    PubMed

    Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit

    2016-10-24

    To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. 
One-tailed t test was used to determine if differences in F-scores were statistically significant. In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid-related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.

  2. Characterizing X-ray Sources in the Rich Open Cluster NGC 7789 Using XMM-Newton

    NASA Astrophysics Data System (ADS)

    Farner, William; Pooley, David

    2018-01-01

    It is well established that globular clusters exhibit a correlation between their population of exotic binaries and their rate of stellar encounters, but little work has been done to characterize this relationship in rich open clusters. X-ray observations are the most efficient means to find various types of close binaries, and optical (and radio) identifications can provide secure source classifications. We report on an observation of the rich open cluster NGC 7789 using the XMM-Newton observatory. We present the X-ray and optical imaging data, source lists, and preliminary characterization of the sources based on their X-ray and multiwavelength properties.

  3. Compact binary hashing for music retrieval

    NASA Astrophysics Data System (ADS)

    Seo, Jin S.

    2014-03-01

    With the huge volume of music clips available for protection, browsing, and indexing, there is increasing attention on retrieving the information content of music archives. Music-similarity computation is an essential building block for browsing, retrieval, and indexing of digital music archives. In practice, as the number of songs available for searching and indexing increases, the storage cost in retrieval systems becomes a serious problem. This paper addresses the storage problem by extending the supervector concept with binary hashing. We utilize similarity-preserving binary embedding to generate a hash code from the supervector of each music clip. In particular, we compare the performance of various binary hashing methods on music retrieval tasks over a widely used genre dataset and an in-house singer dataset. Through this evaluation, we find an effective way of generating hash codes for music-similarity estimation that improves retrieval performance.
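    A common similarity-preserving binary embedding of the kind compared in such studies is sign random projection, where each bit records which side of a random hyperplane a supervector falls on: codes of similar supervectors then differ in few bits. A minimal sketch with synthetic vectors (the dimensions and perturbation level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

def hash_bits(x, planes):
    """Sign random projection: one bit per hyperplane. The Hamming
    distance between codes approximates the angle between vectors, so
    near-duplicate clips get near-identical codes."""
    return (planes @ x > 0).astype(np.uint8)

dim, n_bits = 128, 64
planes = rng.normal(size=(n_bits, dim))     # random projection hyperplanes

# hypothetical supervectors: a clip, a slightly perturbed copy, a stranger
clip = rng.normal(size=dim)
near = clip + 0.05 * rng.normal(size=dim)
far = rng.normal(size=dim)

d_near = int(np.sum(hash_bits(clip, planes) != hash_bits(near, planes)))
d_far = int(np.sum(hash_bits(clip, planes) != hash_bits(far, planes)))
```

    Comparing 64-bit codes by Hamming distance replaces full supervector comparisons, which is the storage and speed gain the paper is after.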

  4. VizieR Online Data Catalog: Kepler Mission. VII. Eclipsing binaries in DR3 (Kirk+, 2016)

    NASA Astrophysics Data System (ADS)

    Kirk, B.; Conroy, K.; Prsa, A.; Abdul-Masih, M.; Kochoska, A.; Matijevic, G.; Hambleton, K.; Barclay, T.; Bloemen, S.; Boyajian, T.; Doyle, L. R.; Fulton, B. J.; Hoekstra, A. J.; Jek, K.; Kane, S. R.; Kostov, V.; Latham, D.; Mazeh, T.; Orosz, J. A.; Pepper, J.; Quarles, B.; Ragozzine, D.; Shporer, A.; Southworth, J.; Stassun, K.; Thompson, S. E.; Welsh, W. F.; Agol, E.; Derekas, A.; Devor, J.; Fischer, D.; Green, G.; Gropp, J.; Jacobs, T.; Johnston, C.; Lacourse, D. M.; Saetre, K.; Schwengeler, H.; Toczyski, J.; Werner, G.; Garrett, M.; Gore, J.; Martinez, A. O.; Spitzer, I.; Stevick, J.; Thomadis, P. C.; Vrijmoet, E. H.; Yenawine, M.; Batalha, N.; Borucki, W.

    2016-07-01

    The Kepler Eclipsing Binary Catalog lists the stellar parameters from the Kepler Input Catalog (KIC) augmented by: primary and secondary eclipse depth, eclipse width, separation of eclipse, ephemeris, morphological classification parameter, and principal parameters determined by geometric analysis of the phased light curve. The previous release of the Catalog (Paper II; Slawson et al. 2011, cat. J/AJ/142/160) contained 2165 objects, through the second Kepler data release (Q0-Q2). In this release, 2878 objects are identified and analyzed from the entire data set of the primary Kepler mission (Q0-Q17). The online version of the Catalog is currently maintained at http://keplerEBs.villanova.edu/. A static version of the online Catalog associated with this paper is maintained at MAST https://archive.stsci.edu/kepler/eclipsing_binaries.html. (10 data files).

  5. Double-lined M dwarf eclipsing binaries from Catalina Sky Survey and LAMOST

    NASA Astrophysics Data System (ADS)

    Lee, Chien-Hsiu; Lin, Chien-Cheng

    2017-02-01

    Eclipsing binaries provide a unique opportunity to determine fundamental stellar properties. In the era of wide-field cameras and all-sky imaging surveys, thousands of eclipsing binaries have been reported through light curve classification, yet their basic properties remain unexplored due to the extensive efforts needed to follow them up spectroscopically. In this paper we investigate three M2-M3 type double-lined eclipsing binaries discovered by cross-matching eclipsing binaries from the Catalina Sky Survey with spectroscopically classified M dwarfs from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope survey data releases one and two. Because these three M dwarf binaries are faint, we further acquire radial velocity measurements using GMOS on the Gemini North telescope with R ~ 4000, enabling us to determine the mass and radius of the individual stellar components. By jointly fitting the light and radial velocity curves of these systems, we derive the masses and radii of the primary and secondary components of these three systems, in the range between 0.28-0.42M_⊙ and 0.29-0.67R_⊙, respectively. Future observations with a high resolution spectrograph will help us pin down the uncertainties in their stellar parameters, and render these systems benchmarks to study M dwarfs, providing inputs to improve stellar models in the low mass regime, or to establish an empirical mass-radius relation for M dwarf stars.

  6. Cell classification using big data analytics plus time stretch imaging (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Jalali, Bahram; Chen, Claire L.; Mahjoubfar, Ata

    2016-09-01

    We show that blood cells can be classified with high accuracy and high throughput by combining machine learning with time stretch quantitative phase imaging. Our diagnostic system captures quantitative phase images in a flow microscope at millions of frames per second and extracts multiple biophysical features from individual cells including morphological characteristics, light absorption and scattering parameters, and protein concentration. These parameters form a hyperdimensional feature space in which supervised learning and cell classification are performed. We show binary classification of T-cells against colon cancer cells, as well as classification of algae cell strains with high and low lipid content. The label-free screening averts the negative impact of staining reagents on cellular viability or cell signaling. The combination of time stretch machine vision and learning offers unprecedented cell analysis capabilities for cancer diagnostics, drug development and liquid biopsy for personalized genomics.

  7. An ordinal classification approach for CTG categorization.

    PubMed

    Georgoulas, George; Karvelis, Petros; Gavrilis, Dimitris; Stylios, Chrysostomos D; Nikolakopoulos, George

    2017-07-01

    Evaluation of the cardiotocogram (CTG) is a standard approach employed during pregnancy and delivery. However, its interpretation requires high-level expertise to decide whether the recording is Normal, Suspicious or Pathological. Therefore, a number of attempts have been made over the past three decades to develop sophisticated automated systems. These systems are usually (multiclass) classification systems that assign a category to the respective CTG. However, most of these systems do not take into consideration the natural ordering of the categories associated with CTG recordings. In this work, an algorithm that explicitly takes into account the ordering of CTG categories, based on a binary decomposition method, is investigated. Results obtained using the C4.5 decision tree as the base classifier show that the ordinal classification approach is marginally better, on several performance criteria, than the traditional multiclass classification approach that utilizes the standard C4.5 algorithm.
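    One standard way to exploit the ordering via binary decomposition is the Frank-Hall scheme: train K-1 binary classifiers, one per ordered split, each estimating P(class > k), then recombine. The abstract does not name the exact scheme, so the sketch below is an assumed, minimal illustration of that idea:

```python
# Ordered CTG categories, indexed 0 < 1 < 2
CLASSES = ["Normal", "Suspicious", "Pathological"]

def decompose(labels):
    """Frank-Hall decomposition: one binary target per ordered split.
    For split k, the target is 1 iff the class index exceeds k."""
    n_splits = len(CLASSES) - 1
    return [[1 if y > split else 0 for y in labels] for split in range(n_splits)]

def combine(p_greater):
    """Recover per-class probabilities from the P(class > k) estimates:
    P(class = k) = P(class > k-1) - P(class > k)."""
    probs, prev = [], 1.0
    for p in p_greater + [0.0]:
        probs.append(prev - p)
        prev = p
    return probs

# Labels Normal, Normal, Suspicious, Pathological become two binary problems.
assert decompose([0, 0, 1, 2]) == [[0, 0, 1, 1], [0, 0, 0, 1]]
# Given P(>Normal)=0.7 and P(>Suspicious)=0.2, the class probabilities follow.
assert [round(p, 10) for p in combine([0.7, 0.2])] == [0.3, 0.5, 0.2]
```

Any base classifier (C4.5 in the paper) can be trained on each binary problem independently; only the label recoding and the recombination step are ordinal-specific.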

  8. Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems.

    PubMed

    Cho, Ming-Yuan; Hoang, Thi Thom

    2017-01-01

    Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.

  9. Decoding memory features from hippocampal spiking activities using sparse classification models.

    PubMed

    Dong Song; Hampson, Robert E; Robinson, Brian S; Marmarelis, Vasilis Z; Deadwyler, Sam A; Berger, Theodore W

    2016-08-01

    To understand how memory information is encoded in the hippocampus, we build classification models to decode memory features from hippocampal CA3 and CA1 spatio-temporal patterns of spikes recorded from epilepsy patients performing a memory-dependent delayed match-to-sample task. The classification model consists of a set of B-spline basis functions for extracting memory features from the spike patterns, and a sparse logistic regression classifier for generating binary categorical output of memory features. Results show that classification models can extract a significant amount of memory information with respect to the types of memory tasks and the categories of sample images used in the task, despite the high level of variability in prediction accuracy due to the small sample size. These results support the hypothesis that memories are encoded in hippocampal activities and have important implications for the development of hippocampal memory prostheses.

  10. Classification of X-ray sources in the direction of M31

    NASA Astrophysics Data System (ADS)

    Vasilopoulos, G.; Hatzidimitriou, D.; Pietsch, W.

    2012-01-01

    M31 is our nearest spiral galaxy, at a distance of 780 kpc. Identification of X-ray sources in nearby galaxies is important for interpreting the properties of more distant ones, mainly because we can classify nearby sources using both X-ray and optical data, while more distant ones can be classified via X-rays alone. The XMM-Newton Large Project for M31 has produced an abundant sample of about 1900 X-ray sources in the direction of M31. Most of them remain elusive, offering few clues to their origin. Our goal is to classify these sources using criteria based on the properties of already identified ones. In particular we construct candidate lists of high mass X-ray binaries, low mass X-ray binaries, X-ray binaries correlated with globular clusters, and AGN, based on their X-ray emission and the properties of their optical counterparts, if any. Our main methodology consists of identifying particular loci of X-ray sources on X-ray hardness ratio diagrams and on the color magnitude diagrams of their optical counterparts. Finally, we examine the X-ray luminosity function of the X-ray binary populations.

  11. A Novel Design of 4-Class BCI Using Two Binary Classifiers and Parallel Mental Tasks

    PubMed Central

    Geng, Tao; Gan, John Q.; Dyson, Matthew; Tsui, Chun SL; Sepulveda, Francisco

    2008-01-01

    A novel 4-class single-trial brain computer interface (BCI) based on two (rather than four or more) binary linear discriminant analysis (LDA) classifiers is proposed, which is called a “parallel BCI.” Unlike other BCIs where mental tasks are executed and classified in a serial way one after another, the parallel BCI uses properly designed parallel mental tasks that are executed on both sides of the subject body simultaneously, which is the main novelty of the BCI paradigm used in our experiments. Each of the two binary classifiers only classifies the mental tasks executed on one side of the subject body, and the results of the two binary classifiers are combined to give the result of the 4-class BCI. Data was recorded in experiments with both real movement and motor imagery in 3 able-bodied subjects. Artifacts were not detected or removed. Offline analysis has shown that, in some subjects, the parallel BCI can generate a higher accuracy than a conventional 4-class BCI, although both of them have used the same feature selection and classification algorithms. PMID:18584040
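    The combination step described above can be sketched directly: each binary LDA classifier contributes one decision bit (one per body side), and the pair of bits indexes one of the four classes. The class mapping below is illustrative, not the paper's actual task assignment:

```python
def combine_parallel(left_bit, right_bit):
    """Combine two binary decisions, one per body side, into a 4-class output.
    The (bit, bit) -> class mapping is an assumed example."""
    mapping = {
        (0, 0): "class 1",
        (0, 1): "class 2",
        (1, 0): "class 3",
        (1, 1): "class 4",
    }
    return mapping[(left_bit, right_bit)]

assert combine_parallel(0, 0) == "class 1"
assert combine_parallel(1, 0) == "class 3"
```

The appeal of this design is that each binary classifier only has to solve a two-way problem on its own side's signals, while the joint output space still covers four classes.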

  12. Contributions to "k"-Means Clustering and Regression via Classification Algorithms

    ERIC Educational Resources Information Center

    Salman, Raied

    2012-01-01

    The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…

  13. Prediction of occult invasive disease in ductal carcinoma in situ using computer-extracted mammographic features

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2017-03-01

    Predicting the risk of occult invasive disease in ductal carcinoma in situ (DCIS) is an important task to help address the overdiagnosis and overtreatment problems associated with breast cancer. In this work, we investigated the feasibility of using computer-extracted mammographic features to predict occult invasive disease in patients with biopsy proven DCIS. We proposed a computer-vision algorithm based approach to extract mammographic features from magnification views of full field digital mammography (FFDM) for patients with DCIS. After an expert breast radiologist provided a region of interest (ROI) mask for the DCIS lesion, the proposed approach is able to segment individual microcalcifications (MCs), detect the boundary of the MC cluster (MCC), and extract 113 mammographic features from MCs and MCC within the ROI. In this study, we extracted mammographic features from 99 patients with DCIS (74 pure DCIS; 25 DCIS plus invasive disease). The predictive power of the mammographic features was demonstrated through binary classifications between pure DCIS and DCIS with invasive disease using linear discriminant analysis (LDA). Before classification, the minimum redundancy Maximum Relevance (mRMR) feature selection method was first applied to choose subsets of useful features. The generalization performance was assessed using Leave-One-Out Cross-Validation and Receiver Operating Characteristic (ROC) curve analysis. Using the computer-extracted mammographic features, the proposed model was able to distinguish DCIS with invasive disease from pure DCIS, with an average classification performance of AUC = 0.61 +/- 0.05. Overall, the proposed computer-extracted mammographic features are promising for predicting occult invasive disease in DCIS.

  14. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.

    PubMed

    Ma, Li; Fan, Suohai

    2017-03-14

    The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness against overfitting. However, there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that combining Clustering Using Representatives (CURE) with the original synthetic minority oversampling technique (SMOTE) is effective compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out-of-bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, the hybrid genetic-random forests algorithm, the hybrid particle swarm-random forests algorithm and the hybrid fish swarm-random forests algorithm, can achieve the minimum OOB error and show the best generalization ability. The training set produced by the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced by this feasible and effective algorithm. Moreover, the hybrid algorithms' F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
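    The core SMOTE step that CURE-SMOTE builds on, interpolating synthetic minority samples between nearest minority neighbours, can be sketched in a few lines. This is a plain-SMOTE illustration, not the CURE-enhanced variant, and the toy points are invented:

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating each chosen point
    toward one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new = smote_like(minority, n_new=5)
assert len(new) == 5
# Each synthetic point lies on a segment between two minority points,
# hence inside their bounding box.
for p in new:
    assert all(0.9 <= c <= 1.2 for c in p)
```

CURE's contribution, per the abstract, is to pick better seed points (cluster representatives rather than noisy outliers) before this interpolation step, which is why the resulting training set stays closer to the original distribution.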

  15. Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure-Activity Relationship and Machine Learning Methods

    EPA Science Inventory

    ABSTRACT: There are thousands of environmental chemicals subject to regulatory decisions for endocrine disrupting potential. A promising approach to manage this large universe of untested chemicals is to use a prioritization filter that combines in vitro assays with in silico QSA...

  16. Anomalous change detection in imagery

    DOEpatents

    Theiler, James P [Los Alamos, NM; Perkins, Simon J [Santa Fe, NM

    2011-05-31

    A distribution-based anomaly detection platform is described that identifies a non-flat background that is specified in terms of the distribution of the data. A resampling approach is also disclosed employing scrambled resampling of the original data with one class specified by the data and the other by the explicit distribution, and solving using binary classification.

  17. A taxonomy-based approach to shed light on the babel of mathematical analogies for rice simulation

    USDA-ARS?s Scientific Manuscript database

    For most biophysical domains, different models are available and the extent to which their structures differ with respect to differences in outputs was never quantified. We use a taxonomy-based approach to address the question with thirteen rice models. Classification keys and binary attributes for ...

  18. Identification and Classification of Orthogonal Frequency Division Multiple Access (OFDMA) Signals Used in Next Generation Wireless Systems

    DTIC Science & Technology

    2012-03-01

    advanced antenna systems; AMC: adaptive modulation and coding; AWGN: additive white Gaussian noise; BPSK: binary phase shift keying; BS: base station; BTC: ... QAM-16, and QAM-64, and coding types include convolutional coding (CC), convolutional turbo coding (CTC), block turbo coding (BTC), zero-terminating

  19. Bayesian Hierarchical Classes Analysis

    ERIC Educational Resources Information Center

    Leenen, Iwin; Van Mechelen, Iven; Gelman, Andrew; De Knop, Stijn

    2008-01-01

    Hierarchical classes models are models for "N"-way "N"-mode data that represent the association among the "N" modes and simultaneously yield, for each mode, a hierarchical classification of its elements. In this paper we present a stochastic extension of the hierarchical classes model for two-way two-mode binary data. In line with the original…

  20. An application to pulmonary emphysema classification based on model of texton learning by sparse representation

    NASA Astrophysics Data System (ADS)

    Zhang, Min; Zhou, Xiangrong; Goshima, Satoshi; Chen, Huayue; Muramatsu, Chisako; Hara, Takeshi; Yokoyama, Ryojiro; Kanematsu, Masayuki; Fujita, Hiroshi

    2012-03-01

    We aim to use a new texton-based texture classification method for the classification of pulmonary emphysema in computed tomography (CT) images of the lungs. Unlike conventional computer-aided diagnosis (CAD) methods for pulmonary emphysema classification, in this paper, firstly, the texton dictionary is learned by applying sparse representation (SR) to image patches in the training dataset. Then the SR coefficients of the test images over the dictionary are used to construct histograms for texture representation. Finally, classification is performed using a nearest neighbor classifier with a histogram dissimilarity measure as the distance. The proposed approach is tested on 3840 annotated regions of interest consisting of normal tissue and mild, moderate and severe pulmonary emphysema of three subtypes. The performance of the proposed system, with an accuracy of about 88%, is higher than that of the state-of-the-art method based on basic rotation-invariant local binary pattern histograms and of the texture classification method based on texton learning by k-means, which performs almost the best among other approaches in the literature.

  1. Extending cluster Lot Quality Assurance Sampling designs for surveillance programs

    PubMed Central

    Hund, Lauren; Pagano, Marcello

    2014-01-01

    Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance based on the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible non-parametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. PMID:24633656
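    The binary LQAS decision rule and the design-effect inflation described above can be made concrete as follows. The sample size n, decision threshold d, and design effect here are placeholder values, not the Kenya/South Sudan design parameters:

```python
import math
from math import comb

def lqas_acceptance_prob(n, d, p):
    """P(at most d 'failures' among n sampled) when the true failure
    prevalence is p, i.e. the chance the area is classified acceptable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(d + 1))

def inflate_for_clustering(n_srs, deff):
    """Inflate a simple-random-sample size by a design effect to account
    for within-cluster correlation in a two-stage cluster design."""
    return math.ceil(n_srs * deff)

# Placeholder design: sample n = 19, classify acceptable if failures <= 4.
assert lqas_acceptance_prob(19, 4, 0.0) == 1.0   # a perfect area always passes
# Areas with lower true prevalence pass more often.
assert lqas_acceptance_prob(19, 4, 0.1) > lqas_acceptance_prob(19, 4, 0.3)
# An assumed design effect of 1.5 inflates n from 19 to 29.
assert inflate_for_clustering(19, 1.5) == 29
```

Design parameters (n, d) are chosen so that the two misclassification risks, passing a truly poor area and failing a truly acceptable one, are both acceptably small; the clustering adjustment then inflates n without changing the binary decision rule.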

  2. Automatic Extraction of Road Markings from Mobile Laser-Point Cloud Using Intensity Data

    NASA Astrophysics Data System (ADS)

    Yao, L.; Chen, Q.; Qin, C.; Wu, H.; Zhang, S.

    2018-04-01

    With the development of intelligent transportation, high-precision road information has been widely applied in many fields. This paper proposes a concise and practical way to extract road marking information from point cloud data collected by a mobile mapping system (MMS). The method contains three steps. First, the road surface is segmented through edge detection on the scan lines. Second, an intensity image is generated by inverse distance weighted (IDW) interpolation, and the road markings are extracted using adaptive threshold segmentation based on an integral image, without intensity calibration; noise is reduced by removing small plaque pixels from the binary image. Finally, the point cloud mapped from the binary image is clustered into marking objects according to Euclidean distance, and a series of algorithms, including template matching and feature-attribute filtering, is used to classify linear markings, arrow markings and guidelines. Processing point cloud data collected by a RIEGL VUX-1 in the case area, the results show that the F-score of marking extraction is 0.83 and the average classification rate is 0.9.
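    The adaptive threshold segmentation based on an integral image can be sketched as follows: a summed-area table makes the local mean around every pixel an O(1) lookup, and a pixel is marked as foreground when it exceeds that mean. The window radius and bias offset below are assumptions, since the abstract does not give the exact rule:

```python
def integral_image(img):
    """Summed-area table with a zero border: S[i][j] = sum of img[:i][:j]."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            S[i + 1][j + 1] = img[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S

def local_mean(S, i, j, r, h, w):
    """Mean intensity in a (2r+1)-square window around (i, j), clipped at borders,
    computed from four corners of the summed-area table."""
    i0, j0 = max(i - r, 0), max(j - r, 0)
    i1, j1 = min(i + r + 1, h), min(j + r + 1, w)
    total = S[i1][j1] - S[i0][j1] - S[i1][j0] + S[i0][j0]
    return total / ((i1 - i0) * (j1 - j0))

def adaptive_binarize(img, r=1, bias=10):
    """Mark a pixel as foreground if it exceeds its local mean by `bias`
    (an assumed offset; the paper's exact rule is not given)."""
    h, w = len(img), len(img[0])
    S = integral_image(img)
    return [[1 if img[i][j] > local_mean(S, i, j, r, h, w) + bias else 0
             for j in range(w)] for i in range(h)]

# Toy intensity grid: one bright "marking" pixel on a dark road surface.
img = [[10, 10, 10],
       [10, 200, 10],
       [10, 10, 10]]
out = adaptive_binarize(img)
assert out[1][1] == 1 and out[0][0] == 0
```

Because the threshold adapts to the local mean, the method tolerates intensity variation across the cloud, which is why no global intensity calibration is needed.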

  3. Extending cluster lot quality assurance sampling designs for surveillance programs.

    PubMed

    Hund, Lauren; Pagano, Marcello

    2014-07-20

    Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance on the basis of the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible nonparametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. Copyright © 2014 John Wiley & Sons, Ltd.

  4. A fast learning method for large scale and multi-class samples of SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class support vector machine (SVM) classification, based on a binary tree, is presented to address the low learning efficiency of SVM when processing large-scale multi-class samples. A bottom-up method is adopted to set up the binary-tree hierarchy, and according to the resulting hierarchy, each node's sub-classifier learns from the corresponding samples. During learning, several class clusters are generated after a first clustering of the training samples. First, central points are extracted from the class clusters that contain only one type of sample. For those that contain two types of samples, the cluster numbers of the positive and negative samples are set according to their degree of mixture, a secondary clustering is undertaken, and central points are then extracted from the resulting sub-class clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by integrating the extracted central points. Simulation experiments show that this fast learning method, based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce sample numbers and effectively improve learning efficiency.

  5. EEG Correlates of Ten Positive Emotions.

    PubMed

    Hu, Xin; Yu, Jianwen; Song, Mengdi; Yu, Chun; Wang, Fei; Sun, Pei; Wang, Daifa; Zhang, Dan

    2017-01-01

    Compared with the well documented neurophysiological findings on negative emotions, much less is known about positive emotions. In the present study, we explored the EEG correlates of ten different positive emotions (joy, gratitude, serenity, interest, hope, pride, amusement, inspiration, awe, and love). A group of 20 participants were invited to watch 30 short film clips with their EEGs simultaneously recorded. Distinct topographical patterns for different positive emotions were found for the correlation coefficients between the subjective ratings on the ten positive emotions per film clip and the corresponding EEG spectral powers in different frequency bands. Based on the similarities of the participants' ratings on the ten positive emotions, these emotions were further clustered into three representative clusters, as 'encouragement' for awe, gratitude, hope, inspiration, pride, 'playfulness' for amusement, joy, interest, and 'harmony' for love, serenity. Using the EEG spectral powers as features, both the binary classification on the higher and lower ratings on these positive emotions and the binary classification between the three positive emotion clusters, achieved accuracies of approximately 80% and above. To our knowledge, our study provides the first piece of evidence on the EEG correlates of different positive emotions.

  6. Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines.

    PubMed

    Wijaya, Sony Hartono; Afendi, Farit Mochamad; Batubara, Irmanida; Darusman, Latifah K; Altaf-Ul-Amin, Md; Kanaya, Shigehiko

    2016-12-07

    The binary similarity and dissimilarity measures have critical roles in the processing of data consisting of binary vectors in various fields including bioinformatics and chemometrics. These metrics express the similarity and dissimilarity values between two binary vectors in terms of the positive matches, absence mismatches or negative matches. To our knowledge, there is no published work presenting a systematic way of finding an appropriate equation to measure binary similarity that performs well for a certain data type or application. A proper method to select a suitable binary similarity or dissimilarity measure is needed to obtain better classification results. In this study, we proposed a novel approach to select binary similarity and dissimilarity measures. We collected 79 binary similarity and dissimilarity equations by extensive literature search and implemented those equations as an R package called bmeasures. We applied these metrics to quantify the similarity and dissimilarity between herbal medicine formulas belonging to the Indonesian Jamu and Japanese Kampo separately. We assessed the capability of the binary equations to classify herbal medicine pairs into match and mismatch efficacies based on their similarity or dissimilarity coefficients using Receiver Operating Characteristic (ROC) curve analysis. According to the area under the ROC curve results, we found that the Indonesian Jamu and Japanese Kampo datasets yield different rankings of the binary similarity and dissimilarity measures. Out of all the equations, the Forbes-2 similarity and the Variant of Correlation similarity measures are recommended for studying the relationship between Jamu formulas and Kampo formulas, respectively. The selection of binary similarity and dissimilarity measures for multivariate analysis is data dependent. The proposed method can be used to find the most suitable binary similarity and dissimilarity equation for a particular dataset. Our finding suggests that all four types of matching quantities in the Operational Taxonomic Unit (OTU) table are important to calculate the similarity and dissimilarity coefficients between herbal medicine formulas. Also, the binary similarity and dissimilarity measures that include the negative match quantity d achieve better capability to separate herbal medicine pairs compared to equations that exclude d.
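    The four OTU matching quantities (a, b, c, d) and the with-d/without-d distinction can be made concrete with two classic measures: Jaccard (excludes the negative matches d) and simple matching (includes d). The paper's recommended Forbes-2 and Variant of Correlation equations are not reproduced here; the vectors are invented:

```python
def otu_counts(x, y):
    """Counts over two binary vectors: a = 1/1 (positive) matches,
    b = 1/0 mismatches, c = 0/1 mismatches, d = 0/0 (negative) matches."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d

def jaccard(x, y):
    # Excludes negative matches d
    a, b, c, _ = otu_counts(x, y)
    return a / (a + b + c)

def simple_matching(x, y):
    # Includes negative matches d
    a, b, c, d = otu_counts(x, y)
    return (a + d) / (a + b + c + d)

x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 1, 0]
assert otu_counts(x, y) == (2, 1, 0, 3)
assert jaccard(x, y) == 2 / 3
assert simple_matching(x, y) == 5 / 6
```

The gap between the two scores on the same pair illustrates the paper's point: whether shared absences (d) count as evidence of similarity is a modelling choice, and the best choice is data dependent.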

  7. Capacity improvement using simulation optimization approaches: A case study in the thermotechnology industry

    NASA Astrophysics Data System (ADS)

    Yelkenci Köse, Simge; Demir, Leyla; Tunalı, Semra; Türsel Eliiyi, Deniz

    2015-02-01

    In manufacturing systems, optimal buffer allocation has a considerable impact on capacity improvement. This study presents a simulation optimization procedure to solve the buffer allocation problem in a heat exchanger production plant so as to improve the capacity of the system. For optimization, three metaheuristic-based search algorithms, i.e. a binary genetic algorithm (B-GA), a binary simulated annealing algorithm (B-SA) and a binary tabu search algorithm (B-TS), are proposed. These algorithms are integrated with the simulation model of the production line. The simulation model, which captures the stochastic and dynamic nature of the production line, is used as the evaluation function for the proposed metaheuristics. The experimental study with benchmark problem instances from the literature and the real-life problem shows that the proposed B-TS algorithm outperforms B-GA and B-SA in terms of solution quality.
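    The binary simulated annealing idea behind B-SA can be sketched generically: flip one bit of the current solution per move and accept worse solutions with a temperature-controlled probability. In the paper the cost function is the stochastic production-line simulation; a trivial stand-in cost is used here, and all parameters are illustrative:

```python
import math
import random

def binary_sa(cost, n_bits, iters=2000, t0=2.0, alpha=0.995, seed=0):
    """Generic binary simulated annealing over bit vectors of length n_bits."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    cur_cost = cost(x)
    best, best_cost = x[:], cur_cost
    t = t0
    for _ in range(iters):
        i = rng.randrange(n_bits)
        x[i] ^= 1  # flip one bit
        new_cost = cost(x)
        # Always accept improvements; accept worse moves with prob exp(-delta/t).
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / t):
            cur_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = x[:], new_cost
        else:
            x[i] ^= 1  # undo the rejected flip
        t *= alpha  # geometric cooling
    return best, best_cost

# Toy stand-in for the simulation: bits encode open buffer slots and the
# cost penalizes deviation from an assumed target of 4 open slots.
best, c = binary_sa(lambda b: abs(sum(b) - 4), n_bits=10)
assert c == 0 and sum(best) == 4
```

In the actual procedure, each cost evaluation is a simulation run, so the metaheuristic's job is to reach a good allocation within a small budget of evaluations.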

  8. On the existence of binary simplex codes. [using combinatorial construction

    NASA Technical Reports Server (NTRS)

    Taylor, H.

    1977-01-01

    Using a simple combinatorial construction, the existence of a binary simplex code with m codewords for all m ≥ 1 is proved. The problem of the shortest possible length is left open.

  9. Recursive heuristic classification

    NASA Technical Reports Server (NTRS)

    Wilkins, David C.

    1994-01-01

    The author will describe a new problem-solving approach called recursive heuristic classification, whereby a subproblem of heuristic classification is itself formulated and solved by heuristic classification. This allows the construction of more knowledge-intensive classification programs in a way that yields a clean organization. Further, standard knowledge acquisition and learning techniques for heuristic classification can be used to create, refine, and maintain the knowledge base associated with the recursively called classification expert system. The method of recursive heuristic classification was used in the Minerva blackboard shell for heuristic classification. Minerva recursively calls itself every problem-solving cycle to solve the important blackboard scheduler task, which involves assigning a desirability rating to alternative problem-solving actions. Knowing these ratings is critical to the use of an expert system as a component of a critiquing or apprenticeship tutoring system. One innovation of this research is a method called dynamic heuristic classification, which allows selection among dynamically generated classification categories instead of requiring them to be pre-enumerated.

  10. Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets

    NASA Astrophysics Data System (ADS)

    Bazzica, A.; van Gemert, J. C.; Liem, C. C. S.; Hanjalic, A.

    2017-05-01

    Acoustic events often have a visual counterpart. Knowledge of visual information can aid the understanding of complex auditory scenes, even when only a stereo mixdown is available in the audio domain, e.g., identifying which musicians are playing in large musical ensembles. In this paper, we consider a vision-based approach to note onset detection. As a case study we focus on challenging, real-world clarinetist videos and carry out preliminary experiments on a 3D convolutional neural network based on multiple streams and purposely avoiding temporal pooling. We release an audiovisual dataset with 4.5 hours of clarinetist videos together with cleaned annotations which include about 36,000 onsets and the coordinates for a number of salient points and regions of interest. By performing several training trials on our dataset, we learned that the problem is challenging. We found that the CNN model is highly sensitive to the optimization algorithm and hyper-parameters, and that treating the problem as binary classification may prevent the joint optimization of precision and recall. To encourage further research, we publicly share our dataset, annotations and all models and detail which issues we came across during our preliminary experiments.
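    The precision/recall tension noted above presupposes the standard frame-wise definitions for binary onset detection, which can be computed as follows (the toy label sequences are invented):

```python
def precision_recall_f1(y_true, y_pred):
    """Frame-wise binary detection metrics from 0/1 label sequences."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy frame labels: 1 marks an onset frame.
y_true = [0, 1, 0, 0, 1, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
assert (p, r) == (2 / 3, 2 / 3)
```

A plain binary cross-entropy objective on highly imbalanced frames (onsets are rare) pushes the model toward trivial all-negative predictions, which is one way treating onset detection as binary classification can block joint optimization of both metrics.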

  11. Unsupervised universal steganalyzer for high-dimensional steganalytic features

    NASA Astrophysics Data System (ADS)

    Hou, Xiaodan; Zhang, Tao

    2016-11-01

    The research in developing steganalytic features has been highly successful. These features are extremely powerful when applied to supervised binary classification problems. However, they are incompatible with unsupervised universal steganalysis because the unsupervised method cannot distinguish embedding distortion from varying levels of noise caused by cover variation. This study attempts to alleviate the problem by introducing similarity retrieval of image statistical properties (SRISP), with the specific aim of mitigating the effect of cover variation on the existing steganalytic features. First, cover images with some statistical properties similar to those of a given test image are searched from a retrieval cover database to establish an aided sample set. Then, unsupervised outlier detection is performed on a test set composed of the given test image and its aided sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called SRISP-aided unsupervised outlier detection, requires no training. Thus, it does not suffer from model mismatch. Compared with prior unsupervised outlier detectors that do not consider SRISP, the proposed framework not only retains the universality but also exhibits superior performance when applied to high-dimensional steganalytic features.

  12. The search for structure - Object classification in large data sets. [for astronomers]

    NASA Technical Reports Server (NTRS)

    Kurtz, Michael J.

    1988-01-01

    Research concerning object classification schemes is reviewed, focusing on large data sets. Classification techniques are discussed, including syntactic and decision-theoretic methods, fuzzy techniques, and stochastic and fuzzy grammars. Consideration is given to the automation of MK classification (Morgan and Keenan, 1973) and other problems associated with the classification of spectra. In addition, the classification of galaxies is examined, including the problems of systematic errors, blended objects, galaxy types, and galaxy clusters.

  13. Beyond logistic regression: structural equations modelling for binary variables and its application to investigating unobserved confounders.

    PubMed

    Kupek, Emil

    2006-03-15

    Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
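    The Yule transformation named in this abstract is simple enough to sketch directly; the helpers below are illustrative code, not the paper's implementation:

    ```python
    def odds_ratio(a, b, c, d):
        """Odds ratio from a 2x2 table with cell counts a (+/+), b (+/-), c (-/+), d (-/-)."""
        return (a * d) / (b * c)

    def yule_q(odds):
        """Yule's Q-metric, (OR-1)/(OR+1): maps an odds ratio onto [-1, 1],
        approximating a Pearson correlation between two binary variables."""
        return (odds - 1.0) / (odds + 1.0)
    ```

    An odds ratio of 1 (no association) maps to Q = 0; the approximate correlations computed this way can be assembled into the matrix that SEM then analyses.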

  14. Robust image region descriptor using local derivative ordinal binary pattern

    NASA Astrophysics Data System (ADS)

    Shang, Jun; Chen, Chuanbo; Pei, Xiaobing; Liang, Hu; Tang, He; Sarem, Mudar

    2015-05-01

    Binary image descriptors have received a lot of attention in recent years, since they provide numerous advantages, such as a low memory footprint and an efficient matching strategy. However, they utilize intermediate representations and are generally less discriminative than floating-point descriptors. We propose an image region descriptor, namely the local derivative ordinal binary pattern, for object recognition and image categorization. In order to preserve more local contrast and edge information, we quantize the intensity differences between the central pixels and their neighbors of the detected local affine covariant regions in an adaptive way. These differences are then sorted, mapped into binary codes, and histogrammed, weighted by the sum of the absolute differences. Furthermore, the gray level of the central pixel is quantized to further improve the discriminative ability. Finally, we combine them to form a joint histogram to represent the features of the image. We observe that our descriptor preserves more local brightness and edge information than traditional binary descriptors. Also, our descriptor is robust to rotation, illumination variations, and other geometric transformations. We conduct extensive experiments on the standard ETHZ and Kentucky datasets for object recognition and PASCAL for image classification. The experimental results show that our descriptor outperforms existing state-of-the-art methods.

  15. Non-Routine Problems in Primary Mathematics Workbooks from Romania

    ERIC Educational Resources Information Center

    Marchis, Iuliana

    2012-01-01

    The aim of this paper is to present research on Hungarian 3rd-grade primary school textbooks from Romania. These textbooks are analyzed using two classifications. The first classification is based on how much creativity and problem-solving skill pupils need to solve a given task. In this classification, problems are grouped in three categories:…

  16. Supernovae in Binary Systems: An Application of Classical Mechanics.

    ERIC Educational Resources Information Center

    Mitalas, R.

    1980-01-01

    Presents the supernova explosion in a binary system as an application of classical mechanics. This presentation is intended to illustrate the power of the equivalent one-body problem and provide undergraduate students with a variety of insights into elementary classical mechanics. (HM)

  17. Binary optics: Trends and limitations

    NASA Technical Reports Server (NTRS)

    Farn, Michael W.; Veldkamp, Wilfrid B.

    1993-01-01

    We describe the current state of binary optics, addressing both the technology and the industry (i.e., marketplace). With respect to the technology, the two dominant aspects are optical design methods and fabrication capabilities, with the optical design problem being limited by human innovation in the search for new applications and the fabrication issue being limited by the availability of resources required to improve fabrication capabilities. With respect to the industry, the current marketplace does not favor binary optics as a separate product line and so we expect that companies whose primary purpose is the production of binary optics will not represent the bulk of binary optics production. Rather, binary optics' more natural role is as an enabling technology - a technology which will directly result in a competitive advantage in a company's other business areas - and so we expect that the majority of binary optics will be produced for internal use.

  18. Optimal two-phase sampling design for comparing accuracies of two binary classification rules.

    PubMed

    Xu, Huiping; Hui, Siu L; Grannis, Shaun

    2014-02-10

    In this paper, we consider the design for comparing the performance of two binary classification rules, for example, two record linkage algorithms or two screening tests. Statistical methods are well developed for comparing these accuracy measures when the gold standard is available for every unit in the sample, or in a two-phase study when the gold standard is ascertained only in the second phase in a subsample using a fixed sampling scheme. However, these methods do not attempt to optimize the sampling scheme to minimize the variance of the estimators of interest. In comparing the performance of two classification rules, the parameters of primary interest are the difference in sensitivities, specificities, and positive predictive values. We derived the analytic variance formulas for these parameter estimates and used them to obtain the optimal sampling design. The efficiency of the optimal sampling design is evaluated through an empirical investigation that compares the optimal sampling with simple random sampling and with proportional allocation. Results of the empirical study show that the optimal sampling design is similar for estimating the difference in sensitivities and in specificities, and both achieve a substantial amount of variance reduction with an over-sample of subjects with discordant results and under-sample of subjects with concordant results. A heuristic rule is recommended when there is no prior knowledge of individual sensitivities and specificities, or the prevalence of the true positive findings in the study population. The optimal sampling is applied to a real-world example in record linkage to evaluate the difference in classification accuracy of two matching algorithms. Copyright © 2013 John Wiley & Sons, Ltd.
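    For concreteness, the accuracy measures being compared reduce to simple ratios of confusion counts; the sketch below uses hypothetical counts (not the paper's data) to form the difference parameters of primary interest:

    ```python
    def sensitivity(tp, fn):
        """True-positive rate: fraction of gold-standard positives detected."""
        return tp / (tp + fn)

    def specificity(tn, fp):
        """True-negative rate: fraction of gold-standard negatives correctly cleared."""
        return tn / (tn + fp)

    # Hypothetical confusion counts for two classification rules A and B
    a = {"tp": 90, "fn": 10, "tn": 80, "fp": 20}
    b = {"tp": 85, "fn": 15, "tn": 88, "fp": 12}

    diff_sens = sensitivity(a["tp"], a["fn"]) - sensitivity(b["tp"], b["fn"])
    diff_spec = specificity(a["tn"], a["fp"]) - specificity(b["tn"], b["fp"])
    ```

    The optimal two-phase design then chooses which units to verify against the gold standard so as to minimize the variance of estimators of these differences.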

  19. EEG-based Affect and Workload Recognition in a Virtual Driving Environment for ASD Intervention

    PubMed Central

    Wade, Joshua W.; Key, Alexandra P.; Warren, Zachary E.; Sarkar, Nilanjan

    2017-01-01

    Objective: To build group-level classification models capable of recognizing affective states and mental workload of individuals with autism spectrum disorder (ASD) during driving skill training. Methods: Twenty adolescents with ASD participated in a six-session virtual reality driving simulator based experiment, during which their electroencephalogram (EEG) data were recorded alongside driving events and a therapist’s rating of their affective states and mental workload. Five feature generation approaches including statistical features, fractal dimension features, higher order crossings (HOC)-based features, power features from frequency bands, and power features from bins (Δf = 2 Hz) were applied to extract relevant features. Individual differences were removed with a two-step feature calibration method. Finally, binary classification results based on the k-nearest neighbors algorithm and univariate feature selection method were evaluated by leave-one-subject-out nested cross-validation to compare feature types and identify discriminative features. Results: The best classification results were achieved using power features from bins for engagement (0.95) and boredom (0.78), and HOC-based features for enjoyment (0.90), frustration (0.88), and workload (0.86). Conclusion: Offline EEG-based group-level classification models are feasible for recognizing binary low and high intensity of affect and workload of individuals with ASD in the context of driving. However, while promising, the models require further development before they can be applied in an online adaptive driving task. Significance: The developed models provide a basis for an EEG-based passive brain-computer interface system that has the potential to benefit individuals with ASD with an affect- and workload-based individualized driving skill training intervention. PMID:28422647

  20. Single aflatoxin contaminated corn kernel analysis with fluorescence hyperspectral image

    NASA Astrophysics Data System (ADS)

    Yao, Haibo; Hruska, Zuzana; Kincaid, Russell; Ononye, Ambrose; Brown, Robert L.; Cleveland, Thomas E.

    2010-04-01

    Aflatoxins are toxic secondary metabolites of the fungi Aspergillus flavus and Aspergillus parasiticus, among others. Aflatoxin contaminated corn is toxic to domestic animals when ingested in feed and is a known carcinogen associated with liver and lung cancer in humans. Consequently, aflatoxin levels in food and feed are regulated by the Food and Drug Administration (FDA) in the US, allowing 20 ppb (parts per billion) limits in food and 100 ppb in feed for interstate commerce. Currently, aflatoxin detection and quantification methods are based on analytical tests including thin-layer chromatography (TLC) and high performance liquid chromatography (HPLC). These analytical tests require the destruction of samples, and are costly and time consuming. Thus, the ability to detect aflatoxin in a rapid, nondestructive way is crucial to the grain industry, particularly to the corn industry. Hyperspectral imaging technology offers a non-invasive approach toward screening for food safety inspection and quality control based on spectral signatures. The focus of this paper is to classify aflatoxin contaminated single corn kernels using fluorescence hyperspectral imagery. Field inoculated corn kernels were used in the study. Contaminated and control kernels under long wavelength ultraviolet excitation were imaged using a visible near-infrared (VNIR) hyperspectral camera. The imaged kernels were chemically analyzed to provide reference information for image analysis. This paper describes a procedure to process corn kernels located in different images for statistical training and classification. Two classification algorithms, Maximum Likelihood and Binary Encoding, were used to classify each corn kernel into "control" or "contaminated" through pixel classification. The Binary Encoding approach had slightly better performance, with accuracy equal to 87% or 88% when 20 ppb or 100 ppb was used as the classification threshold, respectively.

  1. Improving Predictions of Multiple Binary Models in ILP

    PubMed Central

    2014-01-01

    Despite the success of ILP systems in learning first-order rules from small numbers of examples and complexly structured data in various domains, they struggle to deal with multiclass problems. In most cases they reduce a multiclass problem to multiple black-box binary problems following the one-versus-one or one-versus-rest binarisation techniques and learn a theory for each one. When evaluating the learned theories of multiclass problems in the one-versus-rest paradigm particularly, the default rule introduces a bias toward the negative classes, leading to unrealistically high performance, and there is no guarantee of prediction integrity between the theories. Here we discuss the problems of using the one-versus-rest binarisation technique when evaluating multiclass data and propose several methods to remedy them. We also illustrate the methods and highlight their link to binary trees and Formal Concept Analysis (FCA). Our methods allow learning of a simple, consistent, and reliable multiclass theory by combining the rules of the multiple one-versus-rest theories into one rule-list or rule-set theory. Empirical evaluation over a number of data sets shows that our proposed methods produce coherent and accurate rule models from the rules learned by the ILP system Aleph. PMID:24696657
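    The one-versus-rest binarisation that this entry builds on can be sketched in a few lines (an illustrative helper, not Aleph's implementation):

    ```python
    def one_vs_rest(labels):
        """Binarise multiclass labels into one binary problem per class:
        the target class becomes positive (1) and all other classes negative (0)."""
        return {c: [1 if y == c else 0 for y in labels]
                for c in sorted(set(labels))}
    ```

    Each resulting 0/1 vector defines one binary learning task; the paper's contribution is combining the rules learned on these separate tasks back into a single coherent multiclass theory.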

  2. Mass loss in the 96 day binary UU Cancri

    NASA Technical Reports Server (NTRS)

    Eaton, Joel A.; Hall, Douglas S.; Honeycutt, R. K.

    1991-01-01

    A series of 16 high-dispersion spectra at H-alpha have been obtained for the tidally distorted K giant-containing long-period binary UU Cnc, in order to both study the K giant's Doppler profiles and search for the effects of accretion onto the second component. While Doppler profiles of the system for a semidetached configuration fit the observations very well, those for existing overcontact light-curve solutions all yield poorer fits. Although the H-alpha line is always stronger than those of the common giants, its equivalent width is consistent with a K4 II classification reflective of the star's 50 solar radius size. Emission wings in H-alpha are possible evidence for an accretion disk.

  3. Identification of Alzheimer's disease and mild cognitive impairment using multimodal sparse hierarchical extreme learning machine.

    PubMed

    Kim, Jongin; Lee, Boreom

    2018-05-07

    Different modalities such as structural MRI, FDG-PET, and CSF have complementary information, which is likely to be very useful for diagnosis of AD and MCI. Therefore, it is possible to develop a more effective and accurate AD/MCI automatic diagnosis method by integrating complementary information of different modalities. In this paper, we propose the multi-modal sparse hierarchical extreme learning machine (MSH-ELM). We used volume and mean intensity extracted from 93 regions of interest (ROIs) as features of MRI and FDG-PET, respectively, and used p-tau, t-tau, and Aβ42 as CSF features. In detail, high-level representation was individually extracted from each of MRI, FDG-PET, and CSF using a stacked sparse extreme learning machine auto-encoder (sELM-AE). Then, another stacked sELM-AE was devised to acquire a joint hierarchical feature representation by fusing the high-level representations obtained from each modality. Finally, we classified the joint hierarchical feature representation using a kernel-based extreme learning machine (KELM). The results of MSH-ELM were compared with those of conventional ELM, single kernel support vector machine (SK-SVM), multiple kernel support vector machine (MK-SVM) and stacked auto-encoder (SAE). Performance was evaluated through 10-fold cross-validation. In the classification of the AD vs. HC and MCI vs. HC problems, the proposed MSH-ELM method showed mean balanced accuracies of 96.10% and 86.46%, respectively, which is much better than those of competing methods. In summary, the proposed algorithm exhibits consistently better performance than SK-SVM, ELM, MK-SVM and SAE in the two binary classification problems (AD vs. HC and MCI vs. HC). © 2018 Wiley Periodicals, Inc.

  4. Dementia in Adults with Mental Retardation: Assessment at a Single Point in Time

    ERIC Educational Resources Information Center

    Silverman, Wayne; Schupf, Nicole; Zigman, Warren; Devenny, Darlynne; Miezejeski, Charles; Schubert, Romaine; Ryan, Robert

    2004-01-01

    Dementia status of 273 adults with mental retardation was rated based upon two extensive evaluations conducted 18 months apart. Overall, 184 individuals did not have dementia, 33 had possible or definite dementia, and 66 had findings suggesting uncertain or questionable status. These ratings were compared to binary classifications (dementia vs. no…

  5. An unconventional approach to ecosystem unit classification in western North Carolina, USA

    Treesearch

    W. Henry McNab; Sara A. Browning; Steven A. Simon; Penelope E. Fouts

    1999-01-01

    The authors used an unconventional combination of data transformation and multivariate analyses to reduce subjectivity in identification of ecosystem units in a mountainous region of western North Carolina, USA. Vegetative cover and environmental variables were measured on 79 stratified, randomly located, 0.1 ha sample plots in a 4000 ha watershed. Binary...

  6. Generalized Partial Least Squares Approach for Nominal Multinomial Logit Regression Models with a Functional Covariate

    ERIC Educational Resources Information Center

    Albaqshi, Amani Mohammed H.

    2017-01-01

    Functional Data Analysis (FDA) has attracted substantial attention for the last two decades. Within FDA, classifying curves into two or more categories is consistently of interest to scientists, but multi-class prediction within FDA is challenged in that most classification tools have been limited to binary response applications. The functional…

  7. Infalling clouds on to supermassive black hole binaries - II. Binary evolution and the final parsec problem

    NASA Astrophysics Data System (ADS)

    Goicovic, Felipe G.; Sesana, Alberto; Cuadra, Jorge; Stasyszyn, Federico

    2017-11-01

    The formation of massive black hole binaries (MBHBs) is an unavoidable outcome of galaxy evolution via successive mergers. However, the mechanism that drives their orbital evolution from parsec separations down to the gravitational wave dominated regime is poorly understood, and their final fate is still unclear. If such binaries are embedded in gas-rich and turbulent environments, as observed in remnants of galaxy mergers, the interaction with gas clumps (such as molecular clouds) may efficiently drive their orbital evolution. Using numerical simulations, we test this hypothesis by studying the dynamical evolution of an equal mass, circular MBHB accreting infalling molecular clouds. We investigate different orbital configurations, modelling a total of 13 systems to explore different possible impact parameters and relative inclinations of the cloud-binary encounter. We focus our study on the prompt, transient phase during the first few orbits when the dynamical evolution of the binary is fastest, finding that this evolution is dominated by the exchange of angular momentum through gas capture by the individual black holes and accretion. Building on these results, we construct a simple model for evolving an MBHB interacting with a sequence of clouds, which are randomly drawn from reasonable populations with different levels of anisotropy in their angular momenta distributions. We show that the binary efficiently evolves down to the gravitational wave emission regime within a few hundred million years, overcoming the 'final parsec' problem regardless of the stellar distribution.

  8. Efficient Decoding of Compressed Data.

    ERIC Educational Resources Information Center

    Bassiouni, Mostafa A.; Mukherjee, Amar

    1995-01-01

    Discusses the problem of enhancing the speed of Huffman decoding of compressed data. Topics addressed include the Huffman decoding tree; multibit decoding; binary string mapping problems; and algorithms for solving mapping problems. (22 references) (LRW)
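    As context, the baseline that multibit decoding schemes accelerate is a bit-at-a-time walk down the Huffman tree; a minimal sketch with a hypothetical code table:

    ```python
    def huffman_decode(bits, tree):
        """Walk a Huffman tree one bit at a time; leaves hold symbols.
        tree: nested dict keyed by '0'/'1'; non-dict nodes are decoded symbols."""
        out, node = [], tree
        for bit in bits:
            node = node[bit]
            if not isinstance(node, dict):  # leaf reached: emit symbol, restart at root
                out.append(node)
                node = tree
        return "".join(out)

    # Hypothetical prefix code: a=0, b=10, c=11
    TREE = {"0": "a", "1": {"0": "b", "1": "c"}}
    ```

    Multibit methods trade memory for speed by consuming several bits per table lookup instead of one tree edge per bit.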

  9. What we learn from eclipsing binaries in the ultraviolet

    NASA Technical Reports Server (NTRS)

    Guinan, Edward F.

    1990-01-01

    Recent results on stars and stellar physics from IUE (International Ultraviolet Explorer) observations of eclipsing binaries are discussed. Several case studies are presented, including V 444 Cyg, Aur stars, V 471 Tau and AR Lac. Topics include stellar winds and mass loss, stellar atmospheres, stellar dynamos, and surface activity. Studies of binary star dynamics and evolution are discussed. The progress made with IUE in understanding the complex dynamical and evolutionary processes taking place in W UMa-type binaries and Algol systems is highlighted. The initial results of intensive studies of the W UMa star VW Cep and three representative Algol-type binaries (in different stages of evolution) focused on gas flows and accretion, are included. The future prospects of eclipsing binary research are explored. Remaining problems are surveyed and the next challenges are presented. The roles that eclipsing binaries could play in studies of stellar evolution, cluster dynamics, galactic structure, mass luminosity relations for extra galactic systems, cosmology, and even possible detection of extra solar system planets using eclipsing binaries are discussed.

  10. Ab-initio study of liquid systems: Concentration dependence of electrical resistivity of binary liquid alloy Rb1-xCsx

    NASA Astrophysics Data System (ADS)

    Thakur, Anil; Sharma, Nalini; Chandel, Surjeet; Ahluwalia, P. K.

    2013-02-01

    The electrical resistivity (ρL) of Rb1-XCsX binary alloys has been calculated using Troullier-Martins ab-initio pseudopotentials. The present results for the electrical resistivity (ρL) of Rb1-XCsX binary alloys are in good agreement with the experimental results. These results suggest that the ab-initio approach for calculating electrical resistivity is quite successful in explaining the electronic transport properties of binary liquid alloys. Hence ab-initio pseudopotentials can be used instead of model pseudopotentials, which suffer from transferability problems.

  11. Multimodal Discriminative Binary Embedding for Large-Scale Cross-Modal Retrieval.

    PubMed

    Wang, Di; Gao, Xinbo; Wang, Xiumei; He, Lihuo; Yuan, Bo

    2016-10-01

    Multimodal hashing, which conducts effective and efficient nearest neighbor search across heterogeneous data on large-scale multimedia databases, has been attracting increasing interest, given the explosive growth of multimedia content on the Internet. Recent multimodal hashing research mainly aims at learning compact binary codes to preserve semantic information given by labels. The overwhelming majority of these methods are similarity-preserving approaches which approximate the pairwise similarity matrix with Hamming distances between the to-be-learnt binary hash codes. However, these methods ignore the discriminative property in the hash learning process, which leaves hash codes from different classes indistinguishable and therefore reduces the accuracy and robustness of the nearest neighbor search. To this end, we present a novel multimodal hashing method, named multimodal discriminative binary embedding (MDBE), which focuses on learning discriminative hash codes. First, the proposed method formulates hash function learning in terms of classification, where the binary codes generated by the learned hash functions are expected to be discriminative. Then, it exploits the label information to discover the shared structures inside heterogeneous data. Finally, the learned structures are preserved for hash codes to produce similar binary codes in the same class. Hence, the proposed MDBE can preserve both discriminability and similarity for hash codes, and will enhance retrieval accuracy. Thorough experiments on benchmark data sets demonstrate that the proposed method achieves excellent accuracy and competitive computational efficiency compared with the state-of-the-art methods for the large-scale cross-modal retrieval task.
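    The Hamming-distance approximation at the heart of similarity-preserving hashing is cheap to compute; a sketch for codes packed into integers (illustrative, not the MDBE implementation):

    ```python
    def hamming_distance(code_a: int, code_b: int) -> int:
        """Hamming distance between two binary hash codes packed as integers:
        XOR exposes the differing bits, and the popcount tallies them."""
        return bin(code_a ^ code_b).count("1")
    ```

    Retrieval then ranks database items by this distance to the query's hash code, which is why keeping same-class codes close in Hamming space matters for accuracy.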

  12. A Telescopic Binary Learning Machine for Training Neural Networks.

    PubMed

    Brunato, Mauro; Battiti, Roberto

    2017-03-01

    This paper proposes a new algorithm based on multiscale stochastic local search with binary representation for training neural networks [binary learning machine (BLM)]. We study the effects of neighborhood evaluation strategies, the effect of the number of bits per weight and that of the maximum weight range used for mapping binary strings to real values. Following this preliminary investigation, we propose a telescopic multiscale version of local search, where the number of bits is increased in an adaptive manner, leading to a faster search and to local minima of better quality. An analysis related to adapting the number of bits in a dynamic way is presented. The control on the number of bits, which happens in a natural manner in the proposed method, is effective to increase the generalization performance. The learning dynamics are discussed and validated on a highly nonlinear artificial problem and on real-world tasks in many application domains; BLM is finally applied to a problem requiring either feedforward or recurrent architectures for feedback control.

  13. Performance Enhancement of Radial Distributed System with Distributed Generators by Reconfiguration Using Binary Firefly Algorithm

    NASA Astrophysics Data System (ADS)

    Rajalakshmi, N.; Padma Subramanian, D.; Thamizhavel, K.

    2015-03-01

    The extent of real power loss and voltage deviation associated with overloaded feeders in a radial distribution system can be reduced by reconfiguration. Reconfiguration is normally achieved by changing the open/closed state of tie/sectionalizing switches. Finding the optimal switch combination is a complicated problem as there are many switching combinations possible in a distribution system. Hence optimization techniques are finding greater importance in reducing the complexity of the reconfiguration problem. This paper presents the application of the firefly algorithm (FA) for optimal reconfiguration of a radial distribution system with distributed generators (DG). The algorithm is tested on the IEEE 33 bus system installed with DGs and the results are compared with a binary genetic algorithm. It is found that binary FA is more effective than the binary genetic algorithm in achieving real power loss reduction and improving the voltage profile, and hence in enhancing the performance of the radial distribution system. Results are found to be optimum when DGs are added to the test system, demonstrating the impact of DGs on the distribution system.

  14. Visualization of the significance of Receiver Operating Characteristics based on confidence ellipses

    NASA Astrophysics Data System (ADS)

    Sarlis, Nicholas V.; Christopoulos, Stavros-Richard G.

    2014-03-01

    The Receiver Operating Characteristics (ROC) is used for the evaluation of prediction methods in various disciplines like meteorology, geophysics, complex system physics, medicine etc. The estimation of the significance of a binary prediction method, however, remains a cumbersome task and is usually done by repeating the calculations by Monte Carlo. The FORTRAN code provided here simplifies this problem by evaluating the significance of binary predictions for a family of ellipses which are based on confidence ellipses and cover the whole ROC space. Catalogue identifier: AERY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERY_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 11511 No. of bytes in distributed program, including test data, etc.: 72906 Distribution format: tar.gz Programming language: FORTRAN. Computer: Any computer supporting a GNU FORTRAN compiler. Operating system: Linux, MacOS, Windows. RAM: 1Mbyte Classification: 4.13, 9, 14. Nature of problem: The Receiver Operating Characteristics (ROC) is used for the evaluation of prediction methods in various disciplines like meteorology, geophysics, complex system physics, medicine etc. The estimation of the significance of a binary prediction method, however, remains a cumbersome task and is usually done by repeating the calculations by Monte Carlo. The FORTRAN code provided here simplifies this problem by evaluating the significance of binary predictions for a family of ellipses which are based on confidence ellipses and cover the whole ROC space. Solution method: Using the statistics of random binary predictions for a given value of the predictor threshold ɛt, one can construct the corresponding confidence ellipses. 
The envelope of these corresponding confidence ellipses is estimated when ɛt varies from 0 to 1. This way a new family of ellipses is obtained, named k-ellipses, which covers the whole ROC plane and leads to a well defined Area Under the Curve (AUC). For the latter quantity, Mason and Graham [1] have shown that it follows the Mann-Whitney U-statistics [2] which can be applied [3] for the estimation of the statistical significance of each k-ellipse. As the transformation is invertible, any point on the ROC plane corresponds to a unique value of k, thus to a unique p-value to obtain this point by chance. The present FORTRAN code provides this p-value field on the ROC plane as well as the k-ellipses corresponding to the (p=)10%, 5% and 1% significance levels using as input the number of the positive (P) and negative (Q) cases to be predicted. Unusual features: In some machines, the compiler directive -O2 or -O3 should be used to avoid NaN’s in some points of the p-field along the diagonal. Running time: Depending on the application, e.g., 4s for an Intel(R) Core(TM)2 CPU E7600 at 3.06 GHz with 2 GB RAM for the examples presented here References: [1] S.J. Mason, N.E. Graham, Quart. J. Roy. Meteor. Soc. 128 (2002) 2145. [2] H.B. Mann, D.R. Whitney, Ann. Math. Statist. 18 (1947) 50. [3] L.C. Dinneen, B.C. Blakesley, J. Roy. Stat. Soc. Ser. C Appl. Stat. 22 (1973) 269.
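    The Mann-Whitney connection cited in the references means the AUC can be computed directly as the fraction of correctly ordered positive/negative pairs; an illustrative Python rendering (the distributed program itself is FORTRAN):

    ```python
    def auc_mann_whitney(pos_scores, neg_scores):
        """AUC as the normalised Mann-Whitney U statistic: the probability that a
        randomly chosen positive outscores a randomly chosen negative (ties count half)."""
        wins = 0.0
        for p in pos_scores:
            for n in neg_scores:
                wins += 1.0 if p > n else 0.5 if p == n else 0.0
        return wins / (len(pos_scores) * len(neg_scores))
    ```

    An AUC of 0.5 corresponds to random prediction, the diagonal of the ROC plane along which the significance field is evaluated.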

  15. A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection.

    PubMed

    Yasui, Yutaka; Pepe, Margaret; Thompson, Mary Lou; Adam, Bao-Ling; Wright, George L; Qu, Yinsheng; Potter, John D; Winget, Marcy; Thornquist, Mark; Feng, Ziding

    2003-07-01

    With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. 
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
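
    The reduction of intensity traces to binary peak indicators can be sketched as follows; the window half-width and the "local maximum" rule here are illustrative choices, not the paper's exact preprocessing:

```python
import numpy as np

def binary_peak_indicators(y, half_window=5):
    """Reduce an intensity trace y (one spectrum) to 0/1 indicators:
    1 where y[i] is the maximum of its +/- half_window neighborhood
    along the mass-per-charge axis."""
    n = len(y)
    out = np.zeros(n, dtype=int)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        if y[i] == y[lo:hi].max():
            out[i] = 1
    return out

# toy spectrum with two local intensity peaks
spectrum = np.array([0., 1., 3., 1., 0., 0., 2., 5., 2., 0., 1.])
peaks = binary_peak_indicators(spectrum, half_window=2)
```

    The resulting binary vectors are the kind of predictors that a boosting step can then combine into a summary classifier.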

  16. Variable Selection for Support Vector Machines in Moderately High Dimensions

    PubMed Central

    Zhang, Xiang; Wu, Yichao; Wang, Lan; Li, Runze

    2015-01-01

    The support vector machine (SVM) is a powerful binary classification tool with high accuracy and great flexibility. It has achieved great success, but its performance can be seriously impaired if many redundant covariates are included. Some efforts have been devoted to studying variable selection for SVMs, but asymptotic properties, such as variable selection consistency, are largely unknown when the number of predictors diverges to infinity. In this work, we establish a unified theory for a general class of nonconvex penalized SVMs. We first prove that, in ultra-high dimensions, the objective function of a nonconvex penalized SVM has a local minimizer possessing the desired oracle property. We further address the problem of nonunique local minimizers by showing that the local linear approximation algorithm is guaranteed to converge to the oracle estimator, even in the ultra-high dimensional setting, provided an appropriate initial estimator is available. This condition on the initial estimator is verified to hold automatically as long as the dimensions are moderately high. Numerical examples provide supportive evidence. PMID:26778916
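
    As a concrete (and much simpler) illustration of penalized-SVM variable selection, the following sketch minimizes a convex l1-penalized hinge loss by sub-gradient descent; the paper's theory concerns nonconvex penalties, for which this is only a stand-in:

```python
import numpy as np

def l1_svm_subgradient(X, y, lam=0.1, lr=0.01, epochs=500):
    """Sub-gradient descent on the l1-penalized hinge loss
       (1/n) * sum_i max(0, 1 - y_i * x_i @ w) + lam * ||w||_1.
    A convex stand-in for the nonconvex penalties analysed in the paper."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1                          # points violating the margin
        grad = -(X[active] * y[active][:, None]).sum(axis=0) / n
        grad = grad + lam * np.sign(w)
        w = w - lr * grad
    return w

# toy data: only the first 2 of 10 covariates carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200))
w = l1_svm_subgradient(X, y)
```

    The l1 penalty drives the redundant coefficients toward zero while the two informative ones stay large, which is the variable-selection behaviour the paper studies for sharper nonconvex penalties.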

  17. Thermodynamic efficiency of learning a rule in neural networks

    NASA Astrophysics Data System (ADS)

    Goldt, Sebastian; Seifert, Udo

    2017-11-01

    Biological systems have to build models from their sensory input data that allow them to efficiently process previously unseen inputs. Here, we study a neural network learning a binary classification rule for these inputs from examples provided by a teacher. We analyse the ability of the network to apply the rule to new inputs, that is to generalise from past experience. Using stochastic thermodynamics, we show that the thermodynamic costs of the learning process provide an upper bound on the amount of information that the network is able to learn from its teacher for both batch and online learning. This allows us to introduce a thermodynamic efficiency of learning. We analytically compute the dynamics and the efficiency of a noisy neural network performing online learning in the thermodynamic limit. In particular, we analyse three popular learning algorithms, namely Hebbian, Perceptron and AdaTron learning. Our work extends the methods of stochastic thermodynamics to a new type of learning problem and might form a suitable basis for investigating the thermodynamics of decision-making.
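
    A teacher-student simulation of the setting described above can be sketched as follows; the Hebbian update and the sizes are illustrative choices, and the generalisation error is computed from the teacher-student overlap in the usual way:

```python
import numpy as np

def hebbian_generalisation(N=500, steps=10000, seed=0):
    """Teacher-student perceptron: the teacher vector T defines the rule
    y = sign(T @ x); the student w is trained online with the Hebbian
    update w += y * x / sqrt(N). The student's generalisation error is
    eps = arccos(R) / pi, where R is the teacher-student overlap."""
    rng = np.random.default_rng(seed)
    T = rng.normal(size=N)
    T /= np.linalg.norm(T)
    w = np.zeros(N)
    for _ in range(steps):
        x = rng.normal(size=N)
        w += np.sign(T @ x) * x / np.sqrt(N)       # Hebbian online update
    R = (w @ T) / np.linalg.norm(w)
    return float(np.arccos(np.clip(R, -1.0, 1.0)) / np.pi)

eps = hebbian_generalisation()   # alpha = steps/N = 20 examples per weight
```

    At 20 examples per weight the Hebbian student generalises well below chance error, matching the standard analytic decay of the error with the example load.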

  18. Stars with relativistic speeds in the Hills scenario

    NASA Astrophysics Data System (ADS)

    Dremova, G. N.; Dremov, V. V.; Tutukov, A. V.

    2017-07-01

    The dynamical capture of a binary system consisting of a supermassive black hole (SMBH) and an ordinary star in the gravitational field of a central (more massive) SMBH is considered in the three-body problem in the framework of a modified Hills scenario. The results of numerical simulations predict the existence of objects whose spatial speeds are comparable to the speed of light. The conditions for and constraints imposed on the ejection speeds realized in a classical scenario and the modified Hills scenario are analyzed. The star is modeled using an N-body approach, making it possible to treat it as a structured object, enabling estimation of the probability that the object survives when it is ejected with relativistic speed as a function of the mass of the star, the masses of both SMBHs, and the pericenter distance. It is possible that the modern kinematic classification for stars with anomalously high spatial velocities will be augmented with a new class—stars with relativistic speeds.

  19. Characterization of Early Cortical Neural Network ...

    EPA Pesticide Factsheets

    We examined the development of neural network activity using microelectrode array (MEA) recordings made in multi-well MEA plates (mwMEAs) over the first 12 days in vitro (DIV). In primary cortical cultures made from postnatal rats, action potential spiking activity was essentially absent on DIV 2 and developed rapidly between DIV 5 and 12. Spiking activity was primarily sporadic and unorganized at early DIV, and became progressively more organized with time in culture, with bursting parameters, synchrony and network bursting increasing between DIV 5 and 12. We selected 12 features to describe network activity and principal components analysis using these features demonstrated a general segregation of data by age at both the well and plate levels. Using a combination of random forest classifiers and Support Vector Machines, we demonstrated that 4 features (CV of within burst ISI, CV of IBI, network spike rate and burst rate) were sufficient to predict the age (either DIV 5, 7, 9 or 12) of each well recording with >65% accuracy. When restricting the classification problem to a binary decision, we found that classification improved dramatically, e.g. 95% accuracy for discriminating DIV 5 vs DIV 12 wells. Further, we present a novel resampling approach to determine the number of wells that might be needed for conducting comparisons of different treatments using mwMEA plates. Overall, these results demonstrate that network development on mwMEA plates is similar to
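
    The binary DIV 5 vs DIV 12 well classification can be illustrated with a toy sketch; the feature values below are invented, and a nearest-centroid rule stands in for the random forests and SVMs used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-ins for the four selected features (CV of within-burst
# ISI, CV of IBI, network spike rate, burst rate) for two ages; the
# means, spreads, and well counts are illustrative only.
div5 = rng.normal(loc=[1.0, 1.2, 0.5, 0.2], scale=0.3, size=(40, 4))
div12 = rng.normal(loc=[0.6, 0.8, 2.0, 1.0], scale=0.3, size=(40, 4))
X = np.vstack([div5, div12])
y = np.array([0] * 40 + [1] * 40)            # 0 = DIV 5, 1 = DIV 12

# random train/test split, then a nearest-centroid binary classifier
idx = rng.permutation(len(y))
train, test = idx[:60], idx[60:]
c0 = X[train][y[train] == 0].mean(axis=0)
c1 = X[train][y[train] == 1].mean(axis=0)
d0 = np.linalg.norm(X[test] - c0, axis=1)
d1 = np.linalg.norm(X[test] - c1, axis=1)
pred = (d1 < d0).astype(int)
accuracy = float((pred == y[test]).mean())
```

    With well-separated ages even this simple rule classifies held-out wells accurately, consistent with the easy binary case reported in the study.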

  20. Federated learning of predictive models from federated Electronic Health Records.

    PubMed

    Brisimi, Theodora S; Chen, Ruidi; Mela, Theofanie; Olshevsky, Alex; Paschalidis, Ioannis Ch; Shi, Wei

    2018-04-01

    In an era of "big data," computationally efficient and privacy-aware solutions for large-scale machine learning problems become crucial, especially in the healthcare domain, where large amounts of data are stored in different locations and owned by different entities. Past research has focused on centralized algorithms, which assume the existence of a central data repository (database) that stores and can process the data from all participants. Such an architecture, however, can be impractical when data are not centrally located: it does not scale well to very large datasets, and it introduces single-point-of-failure risks which could compromise the integrity and privacy of the data. With data widely spread across hospitals and individuals, a decentralized, computationally scalable methodology is much needed. We aim at solving a binary supervised classification problem to predict hospitalizations for cardiac events using a distributed algorithm. We seek to develop a general decentralized optimization framework enabling multiple data holders to collaborate and converge to a common predictive model, without explicitly exchanging raw data. We focus on the soft-margin l1-regularized sparse Support Vector Machine (sSVM) classifier. We develop an iterative cluster Primal Dual Splitting (cPDS) algorithm for solving the large-scale sSVM problem in a decentralized fashion. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the data holders to collaborate while keeping every participant's data private. We test cPDS on the problem of predicting hospitalizations due to heart diseases within a calendar year, based on information in the patients' Electronic Health Records prior to that year. cPDS converges faster than centralized methods at the cost of some communication between agents. 
It also converges faster and with less communication overhead compared to an alternative distributed algorithm. In both cases, it achieves similar prediction accuracy measured by the Area Under the Receiver Operating Characteristic Curve (AUC) of the classifier. We extract important features discovered by the algorithm that are predictive of future hospitalizations, thus providing a way to interpret the classification results and inform prevention efforts. Copyright © 2018 Elsevier B.V. All rights reserved.
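
    A naive sketch of the decentralized idea, with each data holder contributing only sub-gradients of the l1-regularized hinge loss on its own shard (cPDS itself is a primal-dual splitting method; this gradient-averaging loop is only a simplified stand-in):

```python
import numpy as np

def federated_hinge_step(w, shards, lam=0.05, lr=0.05):
    """One round of a naive federated sub-gradient step for the
    l1-regularized hinge loss: every data holder computes a sub-gradient
    on its own (X, y) shard and only the gradients are shared and
    averaged, so raw records never leave a holder."""
    grads = []
    for X, y in shards:
        margins = y * (X @ w)
        active = margins < 1
        g = -(X[active] * y[active][:, None]).sum(axis=0) / len(y)
        grads.append(g + lam * np.sign(w))
    return w - lr * np.mean(grads, axis=0)

# three hypothetical hospitals, each with its own private shard
rng = np.random.default_rng(2)
w_true = np.array([1.5, -2.0, 0.0, 0.0, 0.0])
shards = []
for _ in range(3):
    X = rng.normal(size=(100, 5))
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=100))
    shards.append((X, y))

w = np.zeros(5)
for _ in range(400):
    w = federated_hinge_step(w, shards)
```

    The shared model converges toward the signs and sparsity pattern of the underlying rule even though no shard's raw data is ever exchanged.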

  1. Bi-temporal analysis of landscape changes in the easternmost mediterranean deltas using binary and classified change information.

    PubMed

    Alphan, Hakan

    2013-03-01

    The aim of this study is (1) to quantify landscape changes in the easternmost Mediterranean deltas using a bi-temporal binary change detection approach and (2) to analyze relationships between conservation/management designations and various categories of change that indicate the type, degree and severity of human impact. For this purpose, image differencing and ratioing were applied to Landsat TM images of 1984 and 2006. A total of 136 candidate change images, including normalized difference vegetation index (NDVI) and principal component analysis (PCA) difference images, were tested to understand the performance of bi-temporal pre-classification analysis procedures in the Mediterranean delta ecosystems. Results showed that visible-band image algebra provided higher accuracies than NDVI and PCA differencing. On the other hand, Band 5 differencing had one of the lowest change detection performances. Seven superclasses of change were identified using from/to change categories between the earlier and later dates. These classes were used to understand the spatial character of anthropogenic impacts in the study area and to derive qualitative and quantitative change information within and outside of the conservation/management areas. Change analysis indicated that natural site and wildlife reserve designations fell short of protecting sand dunes from agricultural expansion in the west. The east of the study area, however, was exposed to the least human impact, as its nature conservation status kept human interference to a minimum. Implications of these changes are discussed and solutions are proposed to deal with management problems leading to environmental change.
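
    Bi-temporal NDVI differencing with a threshold can be sketched as below; the k-sigma thresholding rule is a common convention and not necessarily the one used in the study:

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index, (NIR - RED) / (NIR + RED)."""
    return (nir - red) / (nir + red + 1e-9)

def binary_change_map(red_t1, nir_t1, red_t2, nir_t2, k=2.0):
    """Bi-temporal NDVI differencing: difference the two NDVI images and
    flag pixels deviating more than k standard deviations from the mean
    of the difference image as 'change'."""
    d = ndvi(red_t2, nir_t2) - ndvi(red_t1, nir_t1)
    thr = k * d.std()
    return np.abs(d - d.mean()) > thr

# toy 4x4 scene where one pixel loses vegetation between the two dates
red1 = np.full((4, 4), 0.1)
nir1 = np.full((4, 4), 0.5)
red2 = red1.copy()
nir2 = nir1.copy()
red2[2, 2], nir2[2, 2] = 0.4, 0.2   # vegetated -> bare at one pixel
change = binary_change_map(red1, nir1, red2, nir2)
```

    The single altered pixel is the only one flagged in the binary change map.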

  2. An unbalanced spectra classification method based on entropy

    NASA Astrophysics Data System (ADS)

    Liu, Zhong-bao; Zhao, Wen-juan

    2017-05-01

    How to distinguish the minority spectra from the majority of the spectra is quite an important problem in astronomy. In view of this, an unbalanced spectra classification method based on entropy (USCM) is proposed in this paper to deal with the unbalanced spectra classification problem. USCM greatly improves the performance of traditional classifiers in distinguishing the minority spectra, as it takes the data distribution into consideration in the process of classification. However, its time complexity is exponential in the training size, and therefore it can only deal with small- and medium-scale classification problems. How to solve the large-scale classification problem is thus quite important for USCM. It can be shown by mathematical computation that the dual form of USCM is equivalent to the minimum enclosing ball (MEB) problem; by introducing the core vector machine (CVM), USCM based on CVM is proposed to deal with the large-scale classification problem. Several comparative experiments on the 4 subclasses of K-type spectra, 3 subclasses of F-type spectra and 3 subclasses of G-type spectra from the Sloan Digital Sky Survey (SDSS) verify that USCM and USCM based on CVM perform better than kNN (k nearest neighbor) and SVM (support vector machine) in dealing with the problem of rare spectra mining, respectively, on the small- and medium-scale datasets and on the large-scale datasets.

  3. Probing Ultracool Atmospheres and Substellar Interiors with Dynamical Masses

    NASA Astrophysics Data System (ADS)

    Dupuy, Trent

    2010-09-01

    After years of patient orbital monitoring, there is now a large sample of very low-mass stars and brown dwarfs with precise (≈5%) dynamical masses. These binaries represent the gold standard for testing substellar theoretical models. Work to date has identified problems with the model-predicted broad-band colors, effective temperatures, and possibly even luminosity evolution with age. However, our ability to test models is currently limited by how well the individual components of these highly prized binaries are characterized. To solve this problem, we propose to use NICMOS and STIS to characterize this first large sample of ultracool binaries with well-determined dynamical masses. We will use NICMOS multi-band photometry to measure the SEDs of the binary components and thereby precisely estimate their spectral types and effective temperatures. We will use STIS to obtain resolved spectroscopy of the Li I doublet at 6708 A for a subset of three binaries whose masses lie very near the theoretical mass limit for lithium burning. The STIS data will provide the first ever resolved lithium measurements for brown dwarfs of known mass, enabling a direct probe of substellar interiors. Our proposed HST observations to characterize the components of these binaries is much less daunting in comparison to the years of orbital monitoring needed to yield dynamical masses, but these HST data are equally vital for robust tests of theory.

  4. Rotational properties of hypermassive neutron stars from binary mergers

    NASA Astrophysics Data System (ADS)

    Hanauske, Matthias; Takami, Kentaro; Bovard, Luke; Rezzolla, Luciano; Font, José A.; Galeazzi, Filippo; Stöcker, Horst

    2017-08-01

    Determining the differential-rotation law of compact stellar objects produced in binary neutron stars mergers or core-collapse supernovae is an old problem in relativistic astrophysics. Addressing this problem is important because it impacts directly on the maximum mass these objects can attain and, hence, on the threshold to black-hole formation under realistic conditions. Using the results from a large number of numerical simulations in full general relativity of binary neutron star mergers described with various equations of state and masses, we study the rotational properties of the resulting hypermassive neutron stars. We find that the angular-velocity distribution shows only a modest dependence on the equation of state, thus exhibiting the traits of "quasiuniversality" found in other aspects of compact stars, both isolated and in binary systems. The distributions are characterized by an almost uniformly rotating core and a "disk." Such a configuration is significantly different from the j -constant differential-rotation law that is commonly adopted in equilibrium models of differentially rotating stars. Furthermore, the rest-mass contained in such a disk can be quite large, ranging from ≃0.03 M⊙ in the case of high-mass binaries with stiff equations of state, up to ≃0.2 M⊙ for low-mass binaries with soft equations of state. We comment on the astrophysical implications of our findings and on the long-term evolutionary scenarios that can be conjectured on the basis of our simulations.

  5. Manifold regularized multitask learning for semi-supervised multilabel image classification.

    PubMed

    Luo, Yong; Tao, Dacheng; Geng, Bo; Xu, Chao; Maybank, Stephen J

    2013-02-01

    It is a significant challenge to classify images with multiple labels by using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes, by comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.

  6. Local binary pattern texture-based classification of solid masses in ultrasound breast images

    NASA Astrophysics Data System (ADS)

    Matsumoto, Monica M. S.; Sehgal, Chandra M.; Udupa, Jayaram K.

    2012-03-01

    Breast cancer is one of the leading causes of cancer mortality among women. Ultrasound examination can be used to assess breast masses, complementarily to mammography. Ultrasound images reveal tissue information in its echoic patterns. Therefore, pattern recognition techniques can facilitate classification of lesions and thereby reduce the number of unnecessary biopsies. Our hypothesis was that image texture features on the boundary of a lesion and its vicinity can be used to classify masses. We have used intensity-independent and rotation-invariant texture features, known as Local Binary Patterns (LBP). The classifier selected was K-nearest neighbors. Our breast ultrasound image database consisted of 100 patient images (50 benign and 50 malignant cases). The determination of whether the mass was benign or malignant was done through biopsy and pathology assessment. The training set consisted of sixty images, randomly chosen from the database of 100 patients. The testing set consisted of forty images to be classified. The results with a multi-fold cross validation of 100 iterations produced a robust evaluation. The highest performance was observed for feature LBP with 24 symmetrically distributed neighbors over a circle of radius 3 (LBP24,3) with an accuracy rate of 81.0%. We also investigated an approach with a score of malignancy assigned to the images in the test set. This approach provided an ROC curve with Az of 0.803. The analysis of texture features over the boundary of solid masses showed promise for malignancy classification in ultrasound breast images.
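
    The LBP idea can be sketched for the basic 8-neighbour, radius-1 case; the study's best performer, LBP24,3, uses 24 interpolated neighbours on a circle of radius 3, which this minimal version omits:

```python
import numpy as np

def lbp8(img):
    """Basic local binary pattern LBP_{8,1} for interior pixels: each of
    the 8 neighbours contributes one bit when neighbour >= centre, and
    the bits are packed into a code in 0..255."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    out = np.zeros((h - 2, w - 2), dtype=int)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out += (nb >= centre).astype(int) << bit
    return out

img = np.array([[5, 5, 5],
                [5, 1, 5],
                [5, 5, 5]])
codes = lbp8(img)   # the single interior pixel is darker than all neighbours
```

    The histogram of such codes over a lesion boundary region is what feeds the K-nearest-neighbors classifier; the codes are invariant to monotone intensity shifts by construction.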

  7. Relative Moldiness Index as Predictor of Childhood Respiratory Illness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vesper, Sephen J.; McKinstry, Craig A.; Haugland, Richard A.

    2007-01-01

    This study compared two classification methods to evaluate the mold condition in 271 homes of infants, 144 of which later developed symptoms of respiratory illness. A method using on-site visual mold inspection was compared to a method using a quantitative index of moldiness, the EPA relative moldiness index© (ERMI©), calculated from mold-specific quantitative PCR (MSQPCR) measurements of the concentrations of 36 mold species in floor dust samples. The binary classification of homes as either moldy or non-moldy by on-site visual home inspection was not predictive of the development of wheeze and/or rhinitis. The odds ratio of moldy vs. non-moldy homes for experiencing respiratory illness was estimated at 1.33 (p=0.27, Fisher’s exact test). Further, this method offers little flexibility in how it may be applied in support of decisions on mold remediation. On the other hand, a method developed and validated in this paper, which fits the ERMI© index to a logistic function, can be used to predict the occurrence of illness in homes and allows stakeholders to choose among various levels of risk. An example is given where an ERMI© value of -4.29 is used as a threshold for binary classification of homes, producing an odds ratio of 2.53 (p=0.003, Fisher’s exact test). The ERMI©-based methods presented here provide a new and more flexible platform to support mold remediation decisions.
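
    The threshold-based binary classification and the odds-ratio comparison can be sketched as follows (the 2x2 counts behind the reported odds ratios are not given in the abstract, so none are assumed here):

```python
def classify_home(ermi, threshold=-4.29):
    """Binary moldiness call from an ERMI value; -4.29 is the example
    threshold reported in the study (True = 'moldy')."""
    return ermi > threshold

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = ill / not ill among moldy
    homes; c, d = ill / not ill among non-moldy homes."""
    return (a * d) / (b * c)

flagged = classify_home(-2.0)   # ERMI above the threshold -> moldy
```

    Moving the threshold trades sensitivity against specificity, which is the flexibility the logistic-function approach offers over a fixed visual call.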

  8. Libration of arguments of circumbinary-planet orbits at resonance

    NASA Astrophysics Data System (ADS)

    Schubart, Joachim

    2017-06-01

    The paper refers to fictitious resonant orbits of planet type that surround both components of a binary system. In case of 16 studied examples a suitable choice of the starting values leads to a process of libration of special angular arguments and to an evolution with an at least temporary stay of the planet in the resonant orbit. The ratio of the periods of revolution of the binary and a planet is equal to 1:5. Eight orbits depend on the ratio 1:5 of the masses of the binary components, but two other ratios appear as well. The basis of this study is the planar, elliptic or circular restricted problem of three bodies, but remarks at the end of the text refer to a four-body problem.

  9. Various forms of indexing HDMR for modelling multivariate classification problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aksu, Çağrı; Tunga, M. Alper

    2014-12-10

    The Indexing HDMR method was recently developed for modelling multivariate interpolation problems. The method uses the Plain HDMR philosophy in partitioning the given multivariate data set into less variate data sets and then constructing an analytical structure through these partitioned data sets to represent the given multidimensional problem. Indexing HDMR makes HDMR applicable to classification problems having real-world data. Mostly, we do not know all possible class values in the domain of the given problem; that is, we have a non-orthogonal data structure. However, Plain HDMR needs an orthogonal data structure in the problem to be modelled. In this sense, the main idea of this work is to offer various forms of Indexing HDMR to successfully model these real-life classification problems. To test these different forms, several well-known multivariate classification problems given in the UCI Machine Learning Repository were used, and it was observed that the accuracy results lie between 80% and 95%, which is very satisfactory.

  10. New Approach to Analyzing Physics Problems: A Taxonomy of Introductory Physics Problems

    ERIC Educational Resources Information Center

    Teodorescu, Raluca E.; Bennhold, Cornelius; Feldman, Gerald; Medsker, Larry

    2013-01-01

    This paper describes research on a classification of physics problems in the context of introductory physics courses. This classification, called the Taxonomy of Introductory Physics Problems (TIPP), relates physics problems to the cognitive processes required to solve them. TIPP was created in order to design educational objectives, to develop…

  11. Mexican Hat Wavelet Kernel ELM for Multiclass Classification.

    PubMed

    Wang, Jie; Song, Yi-Fan; Ma, Tian-Lei

    2017-01-01

    Kernel extreme learning machine (KELM) is a novel feedforward neural network, which is widely used in classification problems. To some extent, it solves the existing problems of invalid nodes and large computational complexity in ELM. However, the traditional KELM classifier usually has a low test accuracy when it faces multiclass classification problems. In order to solve this problem, a new classifier, the Mexican Hat wavelet KELM classifier, is proposed in this paper. The proposed classifier successfully improves the training accuracy and reduces the training time in multiclass classification problems. Moreover, the validity of the Mexican Hat wavelet as a kernel function of ELM is rigorously proved. Experimental results on different data sets show that the performance of the proposed classifier is significantly superior to the compared classifiers.
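
    A Mexican Hat wavelet kernel can be sketched with the usual tensor-product wavelet-kernel construction; the paper's exact parameterisation may differ:

```python
import numpy as np

def mexican_hat(t):
    """Mexican Hat (Ricker) mother wavelet, up to normalisation:
    psi(t) = (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def mexihat_kernel(x, y, a=1.0):
    """Tensor-product wavelet kernel K(x, y) = prod_i psi((x_i - y_i)/a),
    with dilation parameter a; an assumed form of the paper's kernel."""
    t = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) / a
    return float(np.prod(mexican_hat(t)))

k_same = mexihat_kernel([0.3, -1.2], [0.3, -1.2])   # identical inputs -> 1
```

    Such a kernel would replace the usual RBF kernel in the KELM output-weight computation; unlike the RBF it can take negative values for well-separated inputs.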

  12. Editorial: Let's talk about sex - the gender binary revisited.

    PubMed

    Oldehinkel, Albertine J

    2017-08-01

    Sex refers to biological differences and gender to socioculturally delineated masculine and feminine roles. Sex or gender are included as a covariate or effect modifier in the majority of child psychology and psychiatry studies, and differences found between boys and girls have inspired many researchers to postulate underlying mechanisms. Empirical tests of whether including these proposed explanatory variables actually reduces the variance explained by gender are lagging behind somewhat. That is a pity, because a lot can be gained from a greater focus on the active agents of specific gender differences. As opposed to biological sex as such, some of the processes explaining why a specific outcome shows gender differences may be changeable and thus possible prevention targets. Moreover, while the sex binary may be reasonably adequate as a classification variable, the gender binary is far from perfect. Gender is a multidimensional, partly context-dependent factor, and the dichotomy generally used in research does not do justice to the diversity existing within boys and girls. © 2017 Association for Child and Adolescent Mental Health.

  13. Logic regression and its extensions.

    PubMed

    Schwender, Holger; Ruczinski, Ingo

    2010-01-01

    Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
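
    A fixed example of the Boolean "logic tree" predictors that logic regression searches over, evaluated over binary covariates (logic regression itself also fits the tree adaptively inside a GLM):

```python
import numpy as np

def logic_term(X):
    """Evaluate one Boolean 'logic tree' over binary predictors,
    L(x) = (x1 AND NOT x2) OR x3, row by row. In logic regression such
    terms enter a generalized linear model, e.g.
    logit P(Y=1) = b0 + b1 * L(x), and the tree itself is searched
    adaptively rather than fixed as here."""
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return (x1 & ~x2) | x3

# four observations of three binary covariates (e.g. SNP indicators)
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 0]], dtype=bool)
L = logic_term(X)
```

    Because the fitted object is a Boolean expression, the discovered interactions are directly readable, which is the appeal of the method for SNP studies.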

  14. Challenging cisgenderism in the ageing and aged care sector: Meeting the needs of older people of trans and/or non-binary experience.

    PubMed

    Ansara, Y Gavriel

    2015-10-01

    Recent Australian legislative and policy changes can benefit people of trans and/or non-binary experience (e.g. men assigned female with stereotypically 'female' bodies, women assigned male with stereotypically 'male' bodies, and people who identify as genderqueer, agender [having no gender], bi-gender [having two genders] or another gender option). These populations often experience cisgenderism, which previous research defined as 'the ideology that invalidates people's own understanding of their genders and bodies'. Some documented forms of cisgenderism include pathologising (treating people's genders and bodies as disordered) and misgendering (disregarding people's own understanding and classifications of their genders and bodies). This system of classifying people's lived experiences of gender and body invalidation is called the cisgenderism framework. Applying the cisgenderism framework in the ageing and aged care sector can enhance service providers' ability to meet the needs of older people of trans and/or non-binary experience. © 2015 AJA Inc.

  15. The 26.3-h orbit and multiwavelength properties of the `redback' millisecond pulsar PSR J1306-40

    NASA Astrophysics Data System (ADS)

    Linares, Manuel

    2018-01-01

    We present the discovery of the variable optical and X-ray counterparts to the radio millisecond pulsar (MSP) PSR J1306-40, recently discovered by Keane et al. We find that both the optical and X-ray fluxes are modulated with the same period, which allows us to measure for the first time the orbital period Porb = 1.097 16[6] d. The optical properties are consistent with a main-sequence companion with spectral type G to mid K and, together with the X-ray luminosity (8.8 × 1031 erg s-1 in the 0.5-10 keV band, for a distance of 1.2 kpc), confirm the redback classification of this pulsar. Our results establish the binary nature of PSR J1306-40, which has the longest Porb among all known compact binary MSPs in the Galactic disc. We briefly discuss these findings in the context of irradiation and intrabinary shock emission in compact binary MSPs.

  16. Electrode replacement does not affect classification accuracy in dual-session use of a passive brain-computer interface for assessing cognitive workload

    PubMed Central

    Estepp, Justin R.; Christensen, James C.

    2015-01-01

    The passive brain-computer interface (pBCI) framework has been shown to be a very promising construct for assessing cognitive and affective state in both individuals and teams. There is a growing body of work that focuses on solving the challenges of transitioning pBCI systems from the research laboratory environment to practical, everyday use. An interesting issue is what impact methodological variability may have on the ability to reliably identify (neuro)physiological patterns that are useful for state assessment. This work aimed at quantifying the effects of methodological variability in a pBCI design for detecting changes in cognitive workload. Specific focus was directed toward the effects of replacing electrodes over dual sessions (thus inducing changes in placement, electromechanical properties, and/or impedance between the electrode and skin surface) on the accuracy of several machine learning approaches in a binary classification problem. In investigating these methodological variables, it was determined that the removal and replacement of the electrode suite between sessions does not impact the accuracy of a number of learning approaches when trained on one session and tested on a second. This finding was confirmed by comparing to a control group for which the electrode suite was not replaced between sessions. This result suggests that sensors (both neurological and peripheral) may be removed and replaced over the course of many interactions with a pBCI system without affecting its performance. Future work on multi-session and multi-day pBCI system use should seek to replicate this (lack of) effect between sessions in other tasks, temporal time courses, and data analytic approaches while also focusing on non-stationarity and variable classification performance due to intrinsic factors. PMID:25805963

  17. Classification of diabetic retinopathy using fractal dimension analysis of eye fundus image

    NASA Astrophysics Data System (ADS)

    Safitri, Diah Wahyu; Juniati, Dwi

    2017-08-01

Diabetes Mellitus (DM) is a metabolic disorder in which the pancreas produces inadequate insulin or the body resists insulin action, so the blood glucose level is high. One of the most common complications of diabetes mellitus is diabetic retinopathy, which can lead to vision problems. Diabetic retinopathy can be recognized by abnormalities in the eye fundus, characterized by microaneurysms, hemorrhages, hard exudates, cotton wool spots, and venous changes. Diabetic retinopathy is graded according to the abnormalities present in the eye fundus: grade 1 if there is a microaneurysm only; grade 2 if there are microaneurysms and hemorrhages; and grade 3 if there are microaneurysms, hemorrhages, and neovascularization. This study proposed a method for processing eye fundus images to classify diabetic retinopathy using fractal analysis and K-Nearest Neighbor (KNN). The first phase was an image segmentation process using the green channel, CLAHE, morphological opening, matched filtering, masking, and morphological opening of the binary image. After segmentation, the fractal dimension was calculated using the box-counting method and the fractal dimension values were analyzed to classify the diabetic retinopathy. Tests were carried out using k-fold cross validation with k=5, and each test evaluated ten different values of K for KNN. The accuracy of the method is 89.17% with K=3 or K=4, the best result among the K values tested. Based on these results, it can be concluded that the classification of diabetic retinopathy using fractal analysis and KNN performs well.
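The two computational steps named in this record, a box-counting estimate of fractal dimension followed by KNN voting, can be sketched as follows (an illustrative reconstruction, not the authors' code; the box sizes and the one-dimensional-feature KNN are simplifying assumptions):

```python
import numpy as np

def box_counting_dimension(binary_img, sizes=(2, 4, 8, 16, 32)):
    """Estimate the fractal dimension of a binary (0/1) image by box counting."""
    counts = []
    for s in sizes:
        h, w = binary_img.shape
        # pad so the image tiles evenly into s x s boxes
        H, W = -(-h // s) * s, -(-w // s) * s
        padded = np.zeros((H, W), dtype=bool)
        padded[:h, :w] = binary_img > 0
        boxes = padded.reshape(H // s, s, W // s, s)
        counts.append(np.any(boxes, axis=(1, 3)).sum())
    # slope of log N(s) versus log(1/s) estimates the dimension
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

def knn_classify(train_x, train_y, x, k=3):
    """KNN on a scalar feature: majority vote among the k nearest training samples."""
    idx = np.argsort(np.abs(np.asarray(train_x) - x))[:k]
    votes = np.asarray(train_y)[idx]
    return int(np.bincount(votes).argmax())
```

For a space-filling binary mask the estimated dimension approaches 2, while a thin vessel-like pattern yields a value between 1 and 2, which is what makes the dimension usable as a KNN feature.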

  18. Probabilistic detection of volcanic ash using a Bayesian approach

    PubMed Central

    Mackie, Shona; Watson, Matthew

    2014-01-01

Airborne volcanic ash can pose a hazard to aviation, agriculture, and both human and animal health. It is therefore important that ash clouds are monitored both day and night, even when they travel far from their source. Infrared satellite data provide perhaps the only means of doing this, and since the hugely expensive ash crisis that followed the 2010 Eyjafjallajökull eruption, much research has been carried out into techniques for discriminating ash in such data and for deriving key properties. Such techniques are generally specific to data from particular sensors, and most approaches result in a binary classification of pixels into “ash” and “ash free” classes with no indication of the classification certainty for individual pixels. Furthermore, almost all operational methods rely on expert-set thresholds to determine what constitutes “ash” and can therefore be criticized for being subjective and dependent on expertise that may not remain with an institution. Very few existing methods exploit available contemporaneous atmospheric data to inform the detection, despite the sensitivity of most techniques to atmospheric parameters. The Bayesian method proposed here does exploit such data and gives a probabilistic, physically based classification. We provide an example of the method's implementation for a scene containing both land and sea observations, and a large area of desert dust (often misidentified as ash by other methods). The technique has already been successfully applied to other detection problems in remote sensing, and this work shows that it will be a useful and effective tool for ash detection. Key Points: Presentation of a probabilistic volcanic ash detection scheme; Method for calculation of probability density function for ash observations; Demonstration of a remote sensing technique for monitoring volcanic ash hazards. PMID:25844278

  19. Electrode replacement does not affect classification accuracy in dual-session use of a passive brain-computer interface for assessing cognitive workload.

    PubMed

    Estepp, Justin R; Christensen, James C

    2015-01-01

    The passive brain-computer interface (pBCI) framework has been shown to be a very promising construct for assessing cognitive and affective state in both individuals and teams. There is a growing body of work that focuses on solving the challenges of transitioning pBCI systems from the research laboratory environment to practical, everyday use. An interesting issue is what impact methodological variability may have on the ability to reliably identify (neuro)physiological patterns that are useful for state assessment. This work aimed at quantifying the effects of methodological variability in a pBCI design for detecting changes in cognitive workload. Specific focus was directed toward the effects of replacing electrodes over dual sessions (thus inducing changes in placement, electromechanical properties, and/or impedance between the electrode and skin surface) on the accuracy of several machine learning approaches in a binary classification problem. In investigating these methodological variables, it was determined that the removal and replacement of the electrode suite between sessions does not impact the accuracy of a number of learning approaches when trained on one session and tested on a second. This finding was confirmed by comparing to a control group for which the electrode suite was not replaced between sessions. This result suggests that sensors (both neurological and peripheral) may be removed and replaced over the course of many interactions with a pBCI system without affecting its performance. Future work on multi-session and multi-day pBCI system use should seek to replicate this (lack of) effect between sessions in other tasks, temporal time courses, and data analytic approaches while also focusing on non-stationarity and variable classification performance due to intrinsic factors.

  20. Probabilistic detection of volcanic ash using a Bayesian approach.

    PubMed

    Mackie, Shona; Watson, Matthew

    2014-03-16

Airborne volcanic ash can pose a hazard to aviation, agriculture, and both human and animal health. It is therefore important that ash clouds are monitored both day and night, even when they travel far from their source. Infrared satellite data provide perhaps the only means of doing this, and since the hugely expensive ash crisis that followed the 2010 Eyjafjallajökull eruption, much research has been carried out into techniques for discriminating ash in such data and for deriving key properties. Such techniques are generally specific to data from particular sensors, and most approaches result in a binary classification of pixels into "ash" and "ash free" classes with no indication of the classification certainty for individual pixels. Furthermore, almost all operational methods rely on expert-set thresholds to determine what constitutes "ash" and can therefore be criticized for being subjective and dependent on expertise that may not remain with an institution. Very few existing methods exploit available contemporaneous atmospheric data to inform the detection, despite the sensitivity of most techniques to atmospheric parameters. The Bayesian method proposed here does exploit such data and gives a probabilistic, physically based classification. We provide an example of the method's implementation for a scene containing both land and sea observations, and a large area of desert dust (often misidentified as ash by other methods). The technique has already been successfully applied to other detection problems in remote sensing, and this work shows that it will be a useful and effective tool for ash detection. Key Points: Presentation of a probabilistic volcanic ash detection scheme; Method for calculation of probability density function for ash observations; Demonstration of a remote sensing technique for monitoring volcanic ash hazards.
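The core of such a Bayesian pixel classification can be sketched in a few lines (a minimal illustration assuming Gaussian class-conditional likelihoods for the 11-12 um brightness temperature difference; the means, widths, and prior below are placeholders, not values from the paper):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Gaussian probability density, used as a class-conditional likelihood."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def ash_posterior(btd, prior_ash=0.05, ash=(-1.5, 1.0), clear=(0.5, 0.8)):
    """Posterior P(ash | BTD) for a single pixel via Bayes' rule.

    Ash typically drives the brightness temperature difference (BTD)
    negative; the (mean, sigma) pairs here are illustrative only.
    """
    like_ash = gauss_pdf(btd, *ash)
    like_clear = gauss_pdf(btd, *clear)
    num = like_ash * prior_ash
    return num / (num + like_clear * (1.0 - prior_ash))
```

Unlike a fixed expert-set threshold, the output is a per-pixel probability, so downstream users can choose their own certainty cut-off.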

  1. Conditional High-Order Boltzmann Machines for Supervised Relation Learning.

    PubMed

    Huang, Yan; Wang, Wei; Wang, Liang; Tan, Tieniu

    2017-09-01

    Relation learning is a fundamental problem in many vision tasks. Recently, high-order Boltzmann machine and its variants have shown their great potentials in learning various types of data relation in a range of tasks. But most of these models are learned in an unsupervised way, i.e., without using relation class labels, which are not very discriminative for some challenging tasks, e.g., face verification. In this paper, with the goal to perform supervised relation learning, we introduce relation class labels into conventional high-order multiplicative interactions with pairwise input samples, and propose a conditional high-order Boltzmann Machine (CHBM), which can learn to classify the data relation in a binary classification way. To be able to deal with more complex data relation, we develop two improved variants of CHBM: 1) latent CHBM, which jointly performs relation feature learning and classification, by using a set of latent variables to block the pathway from pairwise input samples to output relation labels and 2) gated CHBM, which untangles factors of variation in data relation, by exploiting a set of latent variables to multiplicatively gate the classification of CHBM. To reduce the large number of model parameters generated by the multiplicative interactions, we approximately factorize high-order parameter tensors into multiple matrices. Then, we develop efficient supervised learning algorithms, by first pretraining the models using joint likelihood to provide good parameter initialization, and then finetuning them using conditional likelihood to enhance the discriminant ability. We apply the proposed models to a series of tasks including invariant recognition, face verification, and action similarity labeling. Experimental results demonstrate that by exploiting supervised relation labels, our models can greatly improve the performance.

  2. An Automatic Diagnosis Method of Facial Acne Vulgaris Based on Convolutional Neural Network.

    PubMed

    Shen, Xiaolei; Zhang, Jiachi; Yan, Chenjun; Zhou, Hong

    2018-04-11

In this paper, we present a new automatic diagnosis method for facial acne vulgaris based on convolutional neural networks (CNNs), intended to overcome the shortcoming of previous methods: the inability to classify enough types of acne vulgaris. The core of our method is to extract image features with CNNs and to perform classification with dedicated classifiers. A binary skin-versus-non-skin classifier is used to detect the skin area, and a seven-class classifier handles the classification task over facial acne vulgaris types and healthy skin. In the experiments, we compare the effectiveness of our own CNN with the VGG16 neural network pre-trained on the ImageNet data set. We use a ROC curve to evaluate the performance of the binary classifier and a normalized confusion matrix to evaluate the seven-class classifier. The results of our experiments show that the pre-trained VGG16 neural network is effective in extracting features from facial acne vulgaris images, and that these features are very useful for the subsequent classifiers. Finally, we apply both classifiers, based on the pre-trained VGG16 neural network, to assist doctors in facial acne vulgaris diagnosis.

  3. Convolution Comparison Pattern: An Efficient Local Image Descriptor for Fingerprint Liveness Detection

    PubMed Central

    Gottschlich, Carsten

    2016-01-01

    We present a new type of local image descriptor which yields binary patterns from small image patches. For the application to fingerprint liveness detection, we achieve rotation invariant image patches by taking the fingerprint segmentation and orientation field into account. We compute the discrete cosine transform (DCT) for these rotation invariant patches and attain binary patterns by comparing pairs of two DCT coefficients. These patterns are summarized into one or more histograms per image. Each histogram comprises the relative frequencies of pattern occurrences. Multiple histograms are concatenated and the resulting feature vector is used for image classification. We name this novel type of descriptor convolution comparison pattern (CCP). Experimental results show the usefulness of the proposed CCP descriptor for fingerprint liveness detection. CCP outperforms other local image descriptors such as LBP, LPQ and WLD on the LivDet 2013 benchmark. The CCP descriptor is a general type of local image descriptor which we expect to prove useful in areas beyond fingerprint liveness detection such as biological and medical image processing, texture recognition, face recognition and iris recognition, liveness detection for face and iris images, and machine vision for surface inspection and material classification. PMID:26844544
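A minimal sketch of the descriptor pipeline described here, a DCT per patch, pairwise coefficient comparisons yielding a binary pattern, and a histogram of relative pattern frequencies, might look as follows (the patch size, the coefficient pairs, and the omission of the rotation-invariance step are simplifying assumptions, not the CCP paper's exact choices):

```python
import numpy as np

def dct2(patch):
    """2-D DCT-II of a square patch, built from the orthonormal DCT matrix."""
    n = patch.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m @ patch @ m.T

def ccp_like_histogram(image, patch=8, pairs=((0, 1), (1, 2), (2, 3))):
    """Binary patterns from pairwise DCT-coefficient comparisons, histogrammed.

    Each patch yields one len(pairs)-bit code; the histogram holds the
    relative frequency of each code over all patches in the image.
    """
    h = np.zeros(2 ** len(pairs))
    H, W = image.shape
    for r in range(0, H - patch + 1, patch):
        for c in range(0, W - patch + 1, patch):
            coeffs = dct2(image[r:r + patch, c:c + patch]).ravel()
            code = 0
            for a, b in pairs:
                code = (code << 1) | int(coeffs[a] > coeffs[b])
            h[code] += 1
    return h / h.sum()
```

The resulting normalized histogram (or a concatenation of several) is the feature vector passed to an image classifier.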

  4. Classification of ROTSE Variable Stars using Machine Learning

    NASA Astrophysics Data System (ADS)

    Wozniak, P. R.; Akerlof, C.; Amrose, S.; Brumby, S.; Casperson, D.; Gisler, G.; Kehoe, R.; Lee, B.; Marshall, S.; McGowan, K. E.; McKay, T.; Perkins, S.; Priedhorsky, W.; Rykoff, E.; Smith, D. A.; Theiler, J.; Vestrand, W. T.; Wren, J.; ROTSE Collaboration

    2001-12-01

    We evaluate several Machine Learning algorithms as potential tools for automated classification of variable stars. Using the ROTSE sample of ~1800 variables from a pilot study of 5% of the whole sky, we compare the effectiveness of a supervised technique (Support Vector Machines, SVM) versus unsupervised methods (K-means and Autoclass). There are 8 types of variables in the sample: RR Lyr AB, RR Lyr C, Delta Scuti, Cepheids, detached eclipsing binaries, contact binaries, Miras and LPVs. Preliminary results suggest a very high ( ~95%) efficiency of SVM in isolating a few best defined classes against the rest of the sample, and good accuracy ( ~70-75%) for all classes considered simultaneously. This includes some degeneracies, irreducible with the information at hand. Supervised methods naturally outperform unsupervised methods, in terms of final error rate, but unsupervised methods offer many advantages for large sets of unlabeled data. Therefore, both types of methods should be considered as promising tools for mining vast variability surveys. We project that there are more than 30,000 periodic variables in the ROTSE-I data base covering the entire local sky between V=10 and 15.5 mag. This sample size is already stretching the time capabilities of human analysts.

  5. Rapid Crop Cover Mapping for the Conterminous United States.

    PubMed

    Dahal, Devendra; Wylie, Bruce; Howard, Danny

    2018-06-05

Timely crop cover maps with sufficient resolution are important components of various environmental planning and research applications. Through the modification and use of a previously developed crop classification model (CCM), originally developed to generate historical annual crop cover maps, we hypothesized that such crop cover maps could be generated rapidly during the growing season. By incrementally removing weekly and monthly independent variables from the CCM and implementing a 'two model mapping' approach, we found it viable to generate conterminous United States-wide rapid crop cover maps at a resolution of 250 m for the current year by the month of September. In this approach, we divided the CCM into one 'crop type model' to handle the classification of nine specific crops and a second, binary model to classify the presence or absence of 'other' crops. Under the two model mapping approach, the training errors were 0.8% and 1.5% for the crop type and binary models, respectively, while test errors were 5.5% and 6.4%, respectively. With spatial mapping accuracies for annual maps reaching upwards of 70%, this approach demonstrated a strong potential for generating rapid crop cover maps by the 1st of September.

  6. EEG Correlates of Ten Positive Emotions

    PubMed Central

    Hu, Xin; Yu, Jianwen; Song, Mengdi; Yu, Chun; Wang, Fei; Sun, Pei; Wang, Daifa; Zhang, Dan

    2017-01-01

    Compared with the well documented neurophysiological findings on negative emotions, much less is known about positive emotions. In the present study, we explored the EEG correlates of ten different positive emotions (joy, gratitude, serenity, interest, hope, pride, amusement, inspiration, awe, and love). A group of 20 participants were invited to watch 30 short film clips with their EEGs simultaneously recorded. Distinct topographical patterns for different positive emotions were found for the correlation coefficients between the subjective ratings on the ten positive emotions per film clip and the corresponding EEG spectral powers in different frequency bands. Based on the similarities of the participants’ ratings on the ten positive emotions, these emotions were further clustered into three representative clusters, as ‘encouragement’ for awe, gratitude, hope, inspiration, pride, ‘playfulness’ for amusement, joy, interest, and ‘harmony’ for love, serenity. Using the EEG spectral powers as features, both the binary classification on the higher and lower ratings on these positive emotions and the binary classification between the three positive emotion clusters, achieved accuracies of approximately 80% and above. To our knowledge, our study provides the first piece of evidence on the EEG correlates of different positive emotions. PMID:28184194
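The spectral-power features underlying these binary classifications can be sketched as a simple periodogram band integrator (the band edges are the conventional ones; this is an illustrative feature extractor, not the authors' pipeline):

```python
import numpy as np

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(signal, fs):
    """Per-band spectral power of one EEG channel via the periodogram.

    Returns a dict of band name -> summed power, one feature per band;
    stacking these over channels gives the classifier's feature vector.
    """
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * len(signal))
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum()
            for name, (lo, hi) in BANDS.items()}
```

A pure 10 Hz oscillation, for example, concentrates its power in the alpha band, which is the kind of topographic spectral contrast the correlation analysis in this record exploits.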

  7. The topology of the regularized integral surfaces of the 3-body problem

    NASA Technical Reports Server (NTRS)

    Easton, R.

    1971-01-01

    Momentum, angular momentum, and energy of integral surfaces in the planar three-body problem are considered. The end points of orbits which cross an isolating block are identified. It is shown that this identification has a unique extension to an identification which pairs the end points of orbits entering the block and which end in a binary collision with the end points of orbits leaving the block and which come from a binary collision. The problem of regularization is that of showing that the identification of the end points of crossing orbits has a continuous, unique extension. The regularized phase space for the three-body problem was obtained, as were regularized integral surfaces for the problem on which the three-body equations of motion induce flows. Finally the topology of these surfaces is described.

  8. WHU at TREC KBA Vital Filtering Track 2014

    DTIC Science & Technology

    2014-11-01

view the problem as a classification problem and use Stanford NLP Toolkit to extract necessary information. Various kinds of features are leveraged to...profile of an entity.

  9. Lifetime of binary asteroids versus gravitational encounters and collisions

    NASA Technical Reports Server (NTRS)

    Chauvineau, Bertrand; Farinella, Paolo; Mignard, F.

    1992-01-01

    We investigate the effect on the dynamics of a binary asteroid in the case of a near encounter with a third body. The dynamics of the binary is modeled as a two-body problem perturbed by an approaching body in the following ways: near encounters and collisions with a component of the system. In each case, the typical value of the two-body energy variation is estimated, and a random walk for the cumulative effect is assumed. Results are applied to some binary asteroid candidates. The main conclusion is that the collisional disruption is the dominant effect, giving lifetimes comparable to or larger than the age of the solar system.

  10. Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores.

    PubMed

    Rios, Anthony; Kavuluru, Ramakanth

    2017-11-01

    The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance specific predictions. Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance specific prediction interpretation by identifying words that led to a particular decision. 
In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification, specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches.
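One standard way to turn a classifier into an ordinal regressor, and to score it with the macro mean absolute error used in the shared task, is the cumulative (threshold) encoding sketched below; the paper's exact ordinal loss may differ from this construction:

```python
import numpy as np

# severity scale: absent=0, mild=1, moderate=2, severe=3

def ordinal_encode(level_idx, n_levels=4):
    """Encode ordinal level k as cumulative binary targets: [1]*k + [0]*(n_levels-1-k)."""
    return np.array([1.0 if level_idx > t else 0.0 for t in range(n_levels - 1)])

def ordinal_decode(threshold_probs):
    """Predicted level = number of threshold probabilities above 0.5."""
    return int(np.sum(np.asarray(threshold_probs) > 0.5))

def mmae(y_true, y_pred, n_levels=4):
    """Macro mean absolute error: MAE averaged over the true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(np.abs(y_pred[y_true == c] - c))
                 for c in range(n_levels) if np.any(y_true == c)]
    return float(np.mean(per_class))
```

Because each of the n_levels-1 threshold outputs is a binary task, misclassifying "severe" as "absent" violates three thresholds while "severe" as "moderate" violates one, giving the distance-sensitive penalty that ordinal regression requires.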

  11. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models

    PubMed Central

    Chen, Han; Wang, Chaolong; Conomos, Matthew P.; Stilp, Adrienne M.; Li, Zilin; Sofer, Tamar; Szpiro, Adam A.; Chen, Wei; Brehm, John M.; Celedón, Juan C.; Redline, Susan; Papanicolaou, George J.; Thornton, Timothy A.; Laurie, Cathy C.; Rice, Kenneth; Lin, Xihong

    2016-01-01

    Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM’s constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs. PMID:27018471
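The score-test idea at the heart of GMMAT can be illustrated without the random effects: fit the covariates-only null logistic model once, then test each variant with a score statistic (a simplified sketch of the fixed-effects case only, not the mixed-model GMMAT itself):

```python
import numpy as np

def logistic_score_test(y, x_cov, g):
    """Score test for adding variant g to a logistic null model.

    Fits the null model (intercept + covariate, no random effects) by
    Newton-Raphson, then forms U = g'(y - mu) and its null variance.
    Returns U^2 / Var(U), approximately chi-square(1) under the null.
    """
    X = np.column_stack([np.ones_like(y, dtype=float), x_cov])
    beta = np.zeros(X.shape[1])
    for _ in range(25):  # Newton-Raphson iterations for the null MLE
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        W = mu * (1.0 - mu)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    W = mu * (1.0 - mu)
    U = g @ (y - mu)
    # variance of U after projecting out the fitted covariates
    XtWX_inv = np.linalg.inv(X.T @ (W[:, None] * X))
    V = g @ (W * g) - (g * W) @ X @ XtWX_inv @ X.T @ (W * g)
    return U ** 2 / V
```

Fitting the null model once and score-testing every variant is what makes this design one fit per GWAS rather than one fit per variant.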

  12. Embedding intensity image into a binary hologram with strong noise resistant capability

    NASA Astrophysics Data System (ADS)

    Zhuang, Zhaoyong; Jiao, Shuming; Zou, Wenbin; Li, Xia

    2017-11-01

A digital hologram can be employed as a host image in image watermarking applications to protect information security. Past research demonstrates that a gray-level intensity image can be embedded into a binary Fresnel hologram by the error diffusion method or the bit truncation coding method. However, the fidelity of the watermark image retrieved from a binary hologram is generally not satisfactory, especially when the binary hologram is contaminated with noise. To address this problem, we propose a JPEG-BCH encoding method in this paper. First, we employ the JPEG standard to compress the intensity image into a binary bit stream. Next, we encode the binary bit stream with a BCH code to obtain error correction capability. Finally, the JPEG-BCH code is embedded into the binary hologram. In this way, the intensity image can be retrieved with high fidelity by a BCH-JPEG decoder even if the binary hologram suffers from serious noise contamination. Numerical simulation results show that the image quality of the retrieved intensity image with our proposed method is superior to the state-of-the-art work reported.
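The error-correction idea, protecting the compressed watermark bit stream so that bit flips in the noisy hologram remain recoverable, can be illustrated with the simpler Hamming(7,4) code standing in for BCH (an illustrative substitute: Hamming corrects one error per 7-bit block, while BCH codes are designed to correct multiple errors per block):

```python
import numpy as np

# Generator and parity-check matrices for the systematic Hamming(7,4) code.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(nibble):
    """Map 4 data bits to a 7-bit codeword."""
    return (np.array(nibble) @ G) % 2

def decode(word):
    """Correct up to one flipped bit, then return the 4 data bits."""
    word = np.array(word) % 2
    syndrome = (H @ word) % 2
    if syndrome.any():
        # the syndrome matches the column of H at the flipped position
        for pos in range(7):
            if np.array_equal(H[:, pos], syndrome):
                word[pos] ^= 1
                break
    return word[:4]
```

The 7/4 rate overhead is the price of robustness, the same trade-off the JPEG-BCH scheme makes between hologram capacity and noise resistance.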

  13. Nutrition Problem Classification for Children and Youth.

    ERIC Educational Resources Information Center

    Health Services Administration (DHEW/PHS), Rockville, MD. Bureau of Community Health Services.

    This nutrition problem classification system is an attempt to classify the nutritional needs and problems of children and youth. Its two most important uses are problem identification and monitoring for individual patients and creation of an information base for developing program plans for intervention in a service population. The classification…

  14. Binary optimization for source localization in the inverse problem of ECG.

    PubMed

    Potyagaylo, Danila; Cortés, Elisenda Gil; Schulze, Walther H W; Dössel, Olaf

    2014-09-01

The goal of ECG-imaging (ECGI) is to reconstruct heart electrical activity from body surface potential maps. The problem is ill-posed, which means that it is extremely sensitive to measurement and modeling errors. The most commonly used method to tackle this obstacle is Tikhonov regularization, which consists in converting the original problem into a well-posed one by adding a penalty term. The method, despite all its practical advantages, has however a serious drawback: the obtained solution is often over-smoothed, which can hinder precise clinical diagnosis and treatment planning. In this paper, we apply a binary optimization approach to the transmembrane voltage (TMV)-based problem. For this, we assume the TMV to take one of two possible values according to the heart abnormality under consideration. In this work, we investigate the localization of simulated ischemic areas and ectopic foci and one clinical infarction case. This affects only the choice of the binary values, while the core of the algorithms remains the same, making the approximation easily adjustable to the application needs. Two methods were tested: a hybrid metaheuristic approach and the difference of convex functions (DC) algorithm. For this purpose, we performed realistic heart simulations for a complex thorax model and applied the proposed techniques to the obtained ECG signals. Both methods enabled localization of the areas of interest, hence showing their potential for application in ECGI. For the metaheuristic algorithm, it was necessary to subdivide the heart into regions in order to obtain a stable solution unsusceptible to errors, while the analytical DC scheme can be efficiently applied to higher dimensional problems. With the DC method, we also successfully reconstructed the activation pattern and origin of a simulated extrasystole. In addition, the DC algorithm enables iterative adjustment of the binary values, ensuring robust performance.

  15. An Optimization-based Framework to Learn Conditional Random Fields for Multi-label Classification

    PubMed Central

    Naeini, Mahdi Pakdaman; Batal, Iyad; Liu, Zitao; Hong, CharmGil; Hauskrecht, Milos

    2015-01-01

This paper studies the multi-label classification problem in which data instances are associated with multiple, possibly high-dimensional, label vectors. This problem is especially challenging when labels are dependent and one cannot decompose the problem into a set of independent classification problems. To address the problem and properly represent label dependencies we propose and study a pairwise conditional random field (CRF) model. We develop a new approach for learning the structure and parameters of the CRF from data. The approach maximizes the pseudo-likelihood of observed labels and relies on fast proximal gradient descent for learning the structure and limited-memory BFGS for learning the parameters of the model. Empirical results on several datasets show that our approach outperforms several multi-label classification baselines, including recently published state-of-the-art methods. PMID:25927015

  16. A chance-constrained stochastic approach to intermodal container routing problems.

    PubMed

    Zhao, Yi; Liu, Ronghui; Zhang, Xi; Whiteing, Anthony

    2018-01-01

    We consider a container routing problem with stochastic time variables in a sea-rail intermodal transportation system. The problem is formulated as a binary integer chance-constrained programming model including stochastic travel times and stochastic transfer time, with the objective of minimising the expected total cost. Two chance constraints are proposed to ensure that the container service satisfies ship fulfilment and cargo on-time delivery with pre-specified probabilities. A hybrid heuristic algorithm is employed to solve the binary integer chance-constrained programming model. Two case studies are conducted to demonstrate the feasibility of the proposed model and to analyse the impact of stochastic variables and chance-constraints on the optimal solution and total cost.
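Under a normality assumption (an assumption made for this sketch; the paper's travel-time distributions may differ), each chance constraint P(T <= d) >= p has a deterministic equivalent mean + z_p * sigma <= d, which is how such models are typically made solvable:

```python
from statistics import NormalDist

def deterministic_deadline(mean_time, std_time, on_time_prob):
    """Deterministic equivalent of the chance constraint P(T <= d) >= p.

    For normal T, P(T <= d) >= p  <=>  mean + z_p * std <= d, so this
    returns the smallest deadline the service can guarantee at level p.
    """
    z = NormalDist().inv_cdf(on_time_prob)
    return mean_time + z * std_time

def route_feasible(legs, deadline, on_time_prob=0.95):
    """Check a route, given as (mean, variance) legs, against a delivery deadline.

    Legs are assumed independent, so their variances add.
    """
    mean = sum(m for m, _ in legs)
    std = sum(v for _, v in legs) ** 0.5
    return deterministic_deadline(mean, std, on_time_prob) <= deadline
```

In the full model this feasibility check becomes a constraint on the binary route-selection variables, leaving a deterministic binary integer program for the heuristic to solve.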

  17. A chance-constrained stochastic approach to intermodal container routing problems

    PubMed Central

    Zhao, Yi; Zhang, Xi; Whiteing, Anthony

    2018-01-01

    We consider a container routing problem with stochastic time variables in a sea-rail intermodal transportation system. The problem is formulated as a binary integer chance-constrained programming model including stochastic travel times and stochastic transfer time, with the objective of minimising the expected total cost. Two chance constraints are proposed to ensure that the container service satisfies ship fulfilment and cargo on-time delivery with pre-specified probabilities. A hybrid heuristic algorithm is employed to solve the binary integer chance-constrained programming model. Two case studies are conducted to demonstrate the feasibility of the proposed model and to analyse the impact of stochastic variables and chance-constraints on the optimal solution and total cost. PMID:29438389

  18. Geomorphic Flood Area (GFA): a QGIS tool for a cost-effective delineation of the floodplains

    NASA Astrophysics Data System (ADS)

    Samela, Caterina; Albano, Raffaele; Sole, Aurelia; Manfreda, Salvatore

    2017-04-01

    The importance of delineating flood hazard and risk areas at a global scale has been highlighted for many years. However, its complete achievement regularly encounters practical difficulties, above all the lack of data and implementation costs. In conditions of scarce data availability (e.g. ungauged basins, large-scale analyses), a fast and cost-effective floodplain delineation can be carried out using geomorphic methods (e.g., Manfreda et al., 2011; 2014). In particular, an automatic DEM-based procedure has been implemented in an open-source QGIS plugin named Geomorphic Flood Area - tool (GFA - tool). This tool performs a linear binary classification based on the recently proposed Geomorphic Flood Index (GFI), which exhibited high classification accuracy and reliability in several test sites located in Europe, United States and Africa (Manfreda et al., 2015; Samela et al., 2016, 2017; Samela, 2016). The GFA - tool is designed to make available to all users the proposed procedure, that includes a number of operations requiring good geomorphic and GIS competences. It allows computing the GFI through terrain analysis, turning it into a binary classifier, and training it on the base of a standard inundation map derived for a portion of the river basin (a minimum of 2% of the river basin's area is suggested) using detailed methods of analysis (e.g. flood hazard maps produced by emergency management agencies or river basin authorities). Finally, GFA - tool allows to extend the classification outside the calibration area to delineate the flood-prone areas across the entire river basin. The full analysis has been implemented in this plugin with a user-friendly interface that should make it easy to all user to apply the approach and produce the desired results. Keywords: flood susceptibility; data scarce environments; geomorphic flood index; linear binary classification; Digital elevation models (DEMs). References Manfreda, S., Di Leo, M., Sole, A., (2011). 
Detection of Flood Prone Areas using Digital Elevation Models, Journal of Hydrologic Engineering, 16(10), 781-790. Manfreda, S., Nardi, F., Samela, C., Grimaldi, S., Taramasso, A. C., Roth, G., & Sole, A. (2014). Investigation on the Use of Geomorphic Approaches for the Delineation of Flood Prone Areas, Journal of Hydrology, 517, 863-876. Manfreda, S., Samela, C., Gioia, A., Consoli, G., Iacobellis, V., Giuzio, L., & Sole, A. (2015). Flood-prone areas assessment using linear binary classifiers based on flood maps obtained from 1D and 2D hydraulic models. Natural Hazards, Vol. 79 (2), pp 735-754. Samela, C. (2016), 100-year flood susceptibility maps for the continental U.S. derived with a geomorphic method. University of Basilicata. Dataset. Samela, C., Manfreda, S., Paola, F. D., Giugni, M., Sole, A., & Fiorentino, M. (2016). DEM-Based Approaches for the Delineation of Flood-Prone Areas in an Ungauged Basin in Africa. Journal of Hydrologic Engineering, 21(2), 1-10. Samela, C., Troy, T.J., Manfreda, S. (2017). Geomorphic classifiers for flood-prone areas delineation for data-scarce environments, Advances in Water Resources (under review).
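    The calibrate-then-extend step described in the abstract above (compute an index, fit a threshold against a reference flood map on a calibration subset, then classify the whole basin) can be sketched as follows. This is a minimal illustration on synthetic values, not the GFA-tool's actual implementation; all variable names and distributions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: GFI values for basin cells, with flood-prone cells
# tending to have higher GFI than cells outside the reference flood map.
gfi_flood = rng.normal(1.0, 0.5, 500)   # cells inside the reference flood map
gfi_dry = rng.normal(-1.0, 0.5, 500)    # cells outside it
gfi = np.concatenate([gfi_flood, gfi_dry])
labels = np.concatenate([np.ones(500), np.zeros(500)])

# Calibration: choose the GFI threshold tau that minimises the total
# misclassification rate on the calibration cells.
candidates = np.unique(gfi)
errors = [np.mean((gfi >= tau).astype(float) != labels) for tau in candidates]
tau = candidates[int(np.argmin(errors))]

# Extension: classify every cell of the basin with the calibrated threshold.
flood_prone = gfi >= tau
print(f"tau = {tau:.3f}, calibration error = {min(errors):.3f}")
```

In the real tool the calibration subset covers only a portion of the basin (at least 2% of its area), and the threshold is then applied basin-wide.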

  19. The dynamical mass of a classical Cepheid variable star in an eclipsing binary system.

    PubMed

    Pietrzyński, G; Thompson, I B; Gieren, W; Graczyk, D; Bono, G; Udalski, A; Soszyński, I; Minniti, D; Pilecki, B

    2010-11-25

    Stellar pulsation theory provides a means of determining the masses of pulsating classical Cepheid supergiants-it is the pulsation that causes their luminosity to vary. Such pulsational masses are found to be smaller than the masses derived from stellar evolution theory: this is the Cepheid mass discrepancy problem, for which a solution is missing. An independent, accurate dynamical mass determination for a classical Cepheid variable star (as opposed to type-II Cepheids, low-mass stars with a very different evolutionary history) in a binary system is needed in order to determine which is correct. The accuracy of previous efforts to establish a dynamical Cepheid mass from Galactic single-lined non-eclipsing binaries was typically about 15-30% (refs 6, 7), which is not good enough to resolve the mass discrepancy problem. In spite of many observational efforts, no firm detection of a classical Cepheid in an eclipsing double-lined binary has hitherto been reported. Here we report the discovery of a classical Cepheid in a well detached, double-lined eclipsing binary in the Large Magellanic Cloud. We determine the mass to a precision of 1% and show that it agrees with its pulsation mass, providing strong evidence that pulsation theory correctly and precisely predicts the masses of classical Cepheids.
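    The dynamical mass determination for a double-lined eclipsing binary rests on Kepler's third law: the period and the two radial-velocity semi-amplitudes give the total mass, and the amplitude ratio splits it between the components. The sketch below uses illustrative numbers, not the paper's measured values.

```python
import math

G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30    # solar mass, kg

def binary_masses(P_days, K1, K2, sin_i=1.0):
    """Component masses (kg) of a double-lined spectroscopic binary from the
    orbital period (days) and radial-velocity semi-amplitudes K1, K2 (m/s).
    For an eclipsing system sin(i) is close to 1, which is what makes the
    mass determination essentially model-independent."""
    P = P_days * 86400.0
    m_tot = P * (K1 + K2) ** 3 / (2.0 * math.pi * G * sin_i ** 3)
    m1 = m_tot * K2 / (K1 + K2)   # the more massive star moves more slowly
    m2 = m_tot * K1 / (K1 + K2)
    return m1, m2

# Illustrative (hypothetical) values, not the LMC Cepheid's:
m1, m2 = binary_masses(P_days=310.0, K1=28e3, K2=32e3)
print(m1 / M_SUN, m2 / M_SUN)
```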

  20. Muxstep: an open-source C++ multiplex HMM library for making inferences on multiple data types.

    PubMed

    Veličković, Petar; Liò, Pietro

    2016-08-15

    With the development of experimental methods and technology, we are able to reliably gain access to data in larger quantities, dimensions and types. This has great potential for the improvement of machine learning (as the learning algorithms have access to a larger space of information). However, conventional machine learning approaches used thus far on single-dimensional data inputs are unlikely to be expressive enough to accurately model the problem in higher dimensions; in fact, it should generally be most suitable to represent our underlying models as some form of complex networks with nontrivial topological features. As a first step in establishing such a trend, we present muxstep, an open-source library utilising multiplex networks for the purposes of binary classification on multiple data types. The library is designed to be used out-of-the-box for developing models based on the multiplex network framework, as well as to be easily modifiable to suit problem modelling needs that may differ significantly from the default approach described. The full source code is available on GitHub: https://github.com/PetarV-/muxstep. Supplementary data are available at Bioinformatics online.
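    muxstep itself combines per-layer HMM likelihoods over sequences; as a loose, hypothetical stand-in for that idea, the sketch below fits one simple generative model per data type and classifies by summing the per-type class log-likelihoods. The data and model choice (Gaussian naive Bayes instead of HMMs) are illustrative assumptions, not the library's method.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Two data types per sample (e.g. two experimental assays), synthetic data.
n = 200
y = rng.integers(0, 2, n)
type_a = rng.normal(y[:, None] * 1.5, 1.0, (n, 5))   # informative assay
type_b = rng.normal(y[:, None] * 0.8, 1.0, (n, 3))   # weakly informative assay

# One generative model per data type; the joint class score is the sum of the
# per-type class log-likelihoods, analogous in spirit to combining the layers
# of a multiplex model.
models = [GaussianNB().fit(X, y) for X in (type_a, type_b)]
joint = sum(m.predict_log_proba(X) for m, X in zip(models, (type_a, type_b)))
pred = joint.argmax(axis=1)
acc = (pred == y).mean()
print(acc)
```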

  1. A Robust Random Forest-Based Approach for Heart Rate Monitoring Using Photoplethysmography Signal Contaminated by Intense Motion Artifacts.

    PubMed

    Ye, Yalan; He, Wenwen; Cheng, Yunfei; Huang, Wenxia; Zhang, Zhilin

    2017-02-16

    The estimation of heart rate (HR) with wearable devices is of interest in fitness. Photoplethysmography (PPG) is a promising approach to HR estimation due to its low cost; however, it is easily corrupted by motion artifacts (MA). In this work, a robust two-stage approach based on random forests is proposed for accurately estimating HR from PPG signals contaminated by intense motion artifacts. Stage 1 proposes a hybrid method to effectively remove MA at low computational complexity, in which two MA removal algorithms are combined by an accurate binary decision algorithm that decides whether or not to invoke the second MA removal algorithm. Stage 2 proposes a random-forest-based spectral peak-tracking algorithm that locates the spectral peak corresponding to HR, formulating spectral peak tracking as a pattern classification problem. Experiments on PPG datasets from the 22 subjects used in the 2015 IEEE Signal Processing Cup showed that the proposed approach achieved an average absolute error of 1.65 beats per minute (BPM). Compared to state-of-the-art approaches, the proposed approach has better accuracy and robustness to intense motion artifacts, indicating its potential use in wearable sensors for health monitoring and fitness tracking.
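    Casting spectral peak tracking as classification means scoring each candidate spectral peak with a classifier. A minimal sketch of that idea follows, with hypothetical per-peak features (peak amplitude and distance from the previous HR estimate) and synthetic labels; the paper's actual feature set is not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Synthetic training set: each row describes one candidate spectral peak,
# and the label says whether it is the true HR peak.
n = 400
is_hr = rng.integers(0, 2, n)
amp = rng.normal(1.0 + 0.8 * is_hr, 0.3, n)        # HR peaks tend to be stronger
dist = np.abs(rng.normal(0.32 * (1 - is_hr) + 0.02, 0.05, n))  # and lie near the previous HR
X = np.column_stack([amp, dist])

# A random forest scores candidate peaks; at run time the highest-scoring
# candidate in each spectrum would be taken as the HR peak.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, is_hr)
acc = clf.score(X, is_hr)
print(acc)
```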

  2. A novel one-class SVM based negative data sampling method for reconstructing proteome-wide HTLV-human protein interaction networks.

    PubMed

    Mei, Suyu; Zhu, Hao

    2015-01-26

    Protein-protein interaction (PPI) prediction is generally treated as a binary classification problem in which negative data sampling remains an open issue. The commonly used random sampling is prone to yield less representative negative data with considerable false negatives. Meanwhile, rational constraints are seldom imposed on model selection to reduce the risk of false positive predictions in most existing computational methods. In this work, we propose a novel negative data sampling method based on a one-class SVM (support vector machine) to predict proteome-wide protein interactions between the HTLV retrovirus and Homo sapiens: the one-class SVM is used to choose reliable and representative negative data, and a two-class SVM is used to yield proteome-wide outcomes as predictive feedback for rational model selection. Computational results suggest that the one-class SVM is better suited to negative data sampling than a two-class PPI predictor, and that the feedback-constrained model selection helps to yield a rational predictive model that reduces the risk of false positive predictions. Some predictions have been validated by the recent literature. Lastly, gene-ontology-based clustering of the predicted PPI networks is conducted to provide valuable cues for the pathogenesis of the HTLV retrovirus.
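    The two-step scheme (one-class SVM to sample negatives, two-class SVM to predict) can be sketched as below. The feature vectors are synthetic stand-ins for protein-pair descriptors; the hedged assumption is that unlabelled pairs scored as outliers with respect to the positive class make reliable negatives, as the abstract argues.

```python
import numpy as np
from sklearn.svm import OneClassSVM, SVC

rng = np.random.default_rng(3)

# Hypothetical protein-pair feature vectors: known positives plus a large
# pool of unlabelled candidate pairs from which negatives must be sampled.
pos = rng.normal(1.0, 1.0, (100, 8))
unlabelled = rng.normal(-0.5, 1.5, (500, 8))

# Step 1: fit a one-class SVM on the positives; unlabelled pairs flagged as
# outliers relative to the positive class are kept as reliable negatives.
occ = OneClassSVM(nu=0.1, gamma="scale").fit(pos)
neg = unlabelled[occ.predict(unlabelled) == -1]

# Step 2: train the usual two-class SVM on positives plus sampled negatives.
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(len(neg), clf.score(X, y))
```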

  3. The difference between a dynamic and mechanical approach to stroke treatment.

    PubMed

    Helgason, Cathy M

    2007-06-01

    The current classification of stroke is based on causation, also called pathogenesis, and relies on binary logic faithful to the Aristotelian tradition. Accordingly, a pathology is or is not the cause of the stroke, is considered independent of others, and is the target for treatment. It is the subject for large double-blind randomized clinical therapeutic trials. The scientific view behind clinical trials is the fundamental concept that information is statistical, and causation is determined by probabilities. Therefore, the cause and effect relation will be determined by probability-theory-based statistics. This is the basis of evidence-based medicine, which calls for the results of such trials to be the basis for physician decisions regarding diagnosis and treatment. However, there are problems with the methodology behind evidence-based medicine. Calculations using probability-theory-based statistics regarding cause and effect are performed within an automatic system where there are known inputs and outputs. This method of research provides a framework of certainty with no surprise elements or outcomes. However, it is not a system or method that will come up with previously unknown variables, concepts, or universal principles; it is not a method that will give a new outcome; and it is not a method that allows for creativity, expertise, or new insight for problem solving.

  4. Aeronautic instruments. Section I : general classification of instruments and problems including bibliography

    NASA Technical Reports Server (NTRS)

    Hersey, Mayo D

    1923-01-01

    This report is intended as a technical introduction to the series of reports on aeronautic instruments. It presents a discussion of those subjects which are common to all instruments. First, a general classification is given, embracing all types of instruments used in aeronautics. Finally, a classification is given of the various problems confronted by the instrument expert and investigator. In this way the following groups of problems are brought up for consideration: problems of mechanical design, human factor, manufacturing problems, supply and selection of instruments, problems concerning the technique of testing, problems of installation, problems concerning the use of instruments, problems of maintenance, and physical research problems. This enumeration of problems which are common to instruments in general serves to indicate the different points of view which should be kept in mind in approaching the study of any particular instrument.

  5. Identification the Relation between Active Basketball Classification Referees' Empathetic Tendencies and Their Problem Solving Abilities

    ERIC Educational Resources Information Center

    Karaçam, Aydin; Pulur, Atilla

    2016-01-01

    This study aims to determine the relation between basketball classification referees' problem solving ability and empathetic tendencies. Research model of the study is relational screening model. Sampling of the study is constituted by 124 male and 18 female basketball classification referees who made active refereeing within Turkish Basketball…

  6. Application of the SNoW machine learning paradigm to a set of transportation imaging problems

    NASA Astrophysics Data System (ADS)

    Paul, Peter; Burry, Aaron M.; Wang, Yuheng; Kozitsky, Vladimir

    2012-01-01

    Machine learning methods have been successfully applied to image object classification problems where there is a clear distinction between classes and where comprehensive training samples and ground truth are readily available. The transportation domain is an area where machine learning methods are particularly applicable, since the classification problems typically have well-defined class boundaries and, due to high traffic volumes in most applications, massive roadway data are available. Though the classes tend to be well defined, the particular image noise and variations can be challenging. Another challenge is the extremely high accuracy typically required in most traffic applications: incorrect assignment of fines or tolls due to imaging mistakes is not acceptable. For the front-seat vehicle occupancy detection problem, classification amounts to determining whether one face (driver only) or two faces (driver + passenger) are detected in the front seat of a vehicle on a roadway. For automatic license plate recognition, the classification problem is a type of optical character recognition encompassing multi-class classification. The SNoW machine learning classifier using local SMQT features is shown to be successful in these two transportation imaging applications.

  7. Predicting tree species presence and basal area in Utah: A comparison of stochastic gradient boosting, generalized additive models, and tree-based methods

    Treesearch

    Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards

    2006-01-01

    Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...

  8. Binary Classification of an Unknown Object through Atmospheric Turbulence Using a Polarimetric Blind-Deconvolution Algorithm Augmented with Adaptive Degree of Linear Polarization Priors

    DTIC Science & Technology

    2012-03-01

    geometry of reflection from a smooth (or mirror-like) surface [27]. In passive polarimetry, the angle of polarization (AoP) provides information about... "polarimetry for remote sensing applications". Appl. Opt., 45(22):5453-5469, Aug 2006. URL http://ao.osa.org/abstract.cfm?URI=ao-45-22-5453.

  9. Mining Predictors of Success in Air Force Flight Training Regiments via Semantic Analysis of Instructor Evaluations

    DTIC Science & Technology

    2018-03-01

    We apply our methodology to the criticism text written in the flight-training program student evaluations in order to construct a model that... factors. ... D. BINARY CLASSIFICATION AND FEATURE SELECTION ... III. METHODOLOGY

  10. BOREAS TE-18 Landsat TM Physical Classification Image of the NSA

    NASA Technical Reports Server (NTRS)

    Hall, Forrest G. (Editor); Knapp, David

    2000-01-01

    The BOREAS TE-18 team focused its efforts on using remotely sensed data to characterize the successional and disturbance dynamics of the boreal forest for use in carbon modeling. The objective of this classification is to provide the BOREAS investigators with a data product that characterizes the land cover of the NSA. A Landsat-5 TM image from 21-Jun-1995 was used to derive the classification. A technique was implemented that uses reflectances of various land cover types along with a geometric optical canopy model to produce spectral trajectories. These trajectories are used in a way that is similar to training data to classify the image into the different land cover classes. The data are provided in a binary image file format. The data files are available on a CD-ROM (see document number 20010000884), or from the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC).

  11. BOREAS TE-18 Landsat TM Physical Classification Image of the SSA

    NASA Technical Reports Server (NTRS)

    Hall, Forrest G. (Editor); Knapp, David

    2000-01-01

    The BOREAS TE-18 team focused its efforts on using remotely sensed data to characterize the successional and disturbance dynamics of the boreal forest for use in carbon modeling. The objective of this classification is to provide the BOREAS investigators with a data product that characterizes the land cover of the SSA. A Landsat-5 TM image from 02-Sep-1994 was used to derive the classification. A technique was implemented that uses reflectances of various land cover types along with a geometric optical canopy model to produce spectral trajectories. These trajectories are used as training data to classify the image into the different land cover classes. These data are provided in a binary image file format. The data files are available on a CD-ROM (see document number 20010000884), or from the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC).

  12. Quantitative EEG features selection in the classification of attention and response control in the children and adolescents with attention deficit hyperactivity disorder.

    PubMed

    Bashiri, Azadeh; Shahmoradi, Leila; Beigy, Hamid; Savareh, Behrouz A; Nosratabadi, Masood; N Kalhori, Sharareh R; Ghazisaeedi, Marjan

    2018-06-01

    Quantitative EEG gives valuable information in the clinical evaluation of psychological disorders. The purpose of the present study is to identify the most prominent features of quantitative electroencephalography (QEEG) that affect attention and response control parameters in children with attention deficit hyperactivity disorder. The QEEG features and the Integrated Visual and Auditory Continuous Performance Test (IVA-CPT) results of 95 attention deficit hyperactivity disorder subjects were preprocessed with the Independent Evaluation Criterion for Binary Classification. The importance of the selected features in classifying the desired outputs was then evaluated using an artificial neural network. The findings identified the highest-ranked QEEG features for each IVA-CPT parameter related to attention and response control. The designed model could help therapists determine, from QEEG alone, the presence or absence of deficits in attention and response control.
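    The filter-then-evaluate pattern (rank features with an independent criterion, then let a neural network assess the selected subset) can be sketched as follows. Mutual information stands in for the abstract's evaluation criterion, and the feature table is synthetic; both are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(9)

# Hypothetical QEEG feature table: 95 subjects x 20 features, of which only
# a few actually drive the binary outcome.
n, p = 95, 20
X = rng.normal(0, 1, (n, p))
y = (X[:, 0] + 0.8 * X[:, 3] + 0.3 * rng.normal(0, 1, n) > 0).astype(int)

# Filter step: rank features by mutual information with the outcome and keep
# the top k; then an ANN evaluates how well the selected subset classifies.
scores = mutual_info_classif(X, y, random_state=0)
top = np.argsort(scores)[::-1][:5]
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X[:, top], y)
print(clf.score(X[:, top], y))
```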

  13. BOREAS TE-18 Landsat TM Maximum Likelihood Classification Image of the SSA

    NASA Technical Reports Server (NTRS)

    Hall, Forrest G. (Editor); Knapp, David

    2000-01-01

    The BOREAS TE-18 team focused its efforts on using remotely sensed data to characterize the successional and disturbance dynamics of the boreal forest for use in carbon modeling. The objective of this classification is to provide the BOREAS investigators with a data product that characterizes the land cover of the SSA. A Landsat-5 TM image from 02-Sep-1994 was used to derive the classification. A technique was implemented that uses reflectances of various land cover types along with a geometric optical canopy model to produce spectral trajectories. These trajectories are used as training data to classify the image into the different land cover classes. These data are provided in a binary image file format. The data files are available on a CD-ROM (see document number 20010000884), or from the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC).

  14. Constructing binary black hole initial data with high mass ratios and spins

    NASA Astrophysics Data System (ADS)

    Ossokine, Serguei; Foucart, Francois; Pfeiffer, Harald; Szilagyi, Bela; Simulating Extreme Spacetimes Collaboration

    2015-04-01

    Binary black hole systems have now been successfully modelled in full numerical relativity by many groups. In order to explore high-mass-ratio (larger than 1:10), high-spin systems (above 0.9 of the maximal BH spin), we revisit the initial-data problem for binary black holes. The initial-data solver in the Spectral Einstein Code (SpEC) was not able to solve for such initial data reliably and robustly. I will present recent improvements to this solver, among them adaptive mesh refinement and control of motion of the center of mass of the binary, and will discuss the much larger region of parameter space this code can now address.

  15. Predicting allergic contact dermatitis: a hierarchical structure activity relationship (SAR) approach to chemical classification using topological and quantum chemical descriptors

    NASA Astrophysics Data System (ADS)

    Basak, Subhash C.; Mills, Denise; Hawkins, Douglas M.

    2008-06-01

    A hierarchical classification study was carried out based on a set of 70 chemicals—35 which produce allergic contact dermatitis (ACD) and 35 which do not. This approach was implemented using a regular ridge regression computer code, followed by conversion of regression output to binary data values. The hierarchical descriptor classes used in the modeling include topostructural (TS), topochemical (TC), and quantum chemical (QC), all of which are based solely on chemical structure. The concordance, sensitivity, and specificity are reported. The model based on the TC descriptors was found to be the best, while the TS model was extremely poor.
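    The regression-to-classification conversion the abstract describes (ridge regression on coded labels, then thresholding the continuous output into a binary call) can be sketched as below. The descriptor matrix is synthetic; the real study used topostructural, topochemical, and quantum chemical descriptors of 70 chemicals.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)

# Hypothetical descriptor matrix: 70 chemicals, labels +1 (ACD) / -1 (non-ACD).
X = rng.normal(0, 1, (70, 10))
w = rng.normal(0, 1, 10)
y = np.where(X @ w > 0, 1.0, -1.0)

# Ridge regression on +/-1 targets, then converting the continuous output to
# a binary call by thresholding at zero.
model = Ridge(alpha=1.0).fit(X, y)
pred = np.where(model.predict(X) >= 0, 1.0, -1.0)
concordance = (pred == y).mean()   # fraction of correct calls
print(concordance)
```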

  16. Testing of the Support Vector Machine for Binary-Class Classification

    NASA Technical Reports Server (NTRS)

    Scholten, Matthew

    2011-01-01

    The Support Vector Machine is a powerful algorithm, useful in classifying data into classes. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single-kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering, were used. These SVM algorithms were tested as classifiers under varying conditions: image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
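    The report does not detail how K-means is combined with the SVM. One common variant, sketched here purely under that assumption, compresses each class to a handful of k-means centroids and trains the SVM on the centroids, shrinking the training set while preserving class structure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Synthetic two-class data standing in for target/non-target feature vectors.
X0 = rng.normal(-1.5, 1.0, (300, 2))
X1 = rng.normal(+1.5, 1.0, (300, 2))

# Compress each class to 10 centroids, then train the SVM on the centroids.
cent0 = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X0).cluster_centers_
cent1 = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X1).cluster_centers_
Xc = np.vstack([cent0, cent1])
yc = np.array([0] * 10 + [1] * 10)
clf = SVC(kernel="rbf", gamma="scale").fit(Xc, yc)

# Evaluate the centroid-trained SVM on the full data.
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)
print(clf.score(X, y))
```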

  17. glmnetLRC f/k/a lrc package: Logistic Regression Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2016-06-09

    Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds model-fitting features to the existing glmnet and bestglm R packages. It was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research, 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross-validation with a customizable loss function. It also identifies the optimal threshold for binary classification.
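    A rough Python analogue of the package's idea (the package itself is in R): fit an elastic-net logistic regression, then choose the classification threshold that minimises a custom, asymmetric loss under cross-validation. The cost values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Elastic-net logistic regression with cross-validated probabilities.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

# Custom loss: missing a positive costs 5x a false alarm (hypothetical).
fn_cost, fp_cost = 5.0, 1.0
thresholds = np.linspace(0.05, 0.95, 91)
loss = [fn_cost * np.mean((proba < t) & (y == 1)) +
        fp_cost * np.mean((proba >= t) & (y == 0)) for t in thresholds]
best = thresholds[int(np.argmin(loss))]
print(best)
```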

  18. Solute transport with multisegment, equilibrium-controlled, classical reactions: Problem solvability and feed forward method's applicability for complex segments of at most binary participants

    USGS Publications Warehouse

    Rubin, Jacob

    1992-01-01

    The feed forward (FF) method derives efficient operational equations for simulating transport of reacting solutes. It has been shown to be applicable in the presence of networks with any number of homogeneous and/or heterogeneous, classical reaction segments that consist of three, at most binary participants. Using a sequential (network type after network type) exploration approach and, independently, theoretical explanations, it is demonstrated for networks with classical reaction segments containing more than three, at most binary participants that if any one of such networks leads to a solvable transport problem then the FF method is applicable. Ways of helping to avoid networks that produce problem insolvability are developed and demonstrated. A previously suggested algebraic, matrix rank procedure has been adapted and augmented to serve as the main, easy-to-apply solvability test for already postulated networks. Four network conditions that often generate insolvability have been identified and studied. Their early detection during network formulation may help to avoid postulation of insolvable networks.

  19. Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

    NASA Astrophysics Data System (ADS)

    Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

    2018-03-01

    This paper discusses feature selection using genetic algorithms (GA) on datasets for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and decision tree models handle the classification problem on datasets whose features have been selected by the GA, and then compare the performance of both models to see whether accuracy increases. The results show an increase in accuracy when features are selected using the GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The datasets tested in this paper are taken from the UCI Machine Learning Repository.
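    The GADT/GANB scheme can be sketched as a minimal genetic algorithm over feature-subset bitmasks, with cross-validated accuracy as the fitness. The GA parameters (population size, generations, mutation rate) and the dataset are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in UCI-style dataset
rng = np.random.default_rng(0)
n_feat, pop_size, n_gen = X.shape[1], 12, 5

def fitness(mask, model):
    # Fitness of a feature subset = cross-validated accuracy of the model.
    if not mask.any():
        return 0.0
    return cross_val_score(model, X[:, mask], y, cv=3).mean()

def ga_select(model):
    pop = rng.random((pop_size, n_feat)) < 0.5      # random initial bitmasks
    for _ in range(n_gen):
        scores = np.array([fitness(m, model) for m in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # selection
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, n_feat)           # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child = child ^ (rng.random(n_feat) < 0.05)   # mutation
            children.append(child)
        pop = np.vstack([elite, children])
    scores = np.array([fitness(m, model) for m in pop])
    return pop[int(np.argmax(scores))], scores.max()

mask_dt, acc_gadt = ga_select(DecisionTreeClassifier(random_state=0))
mask_nb, acc_ganb = ga_select(GaussianNB())
print(acc_gadt, acc_ganb)
```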

  20. Identification and characterization of neutrophil extracellular trap shapes in flow cytometry

    NASA Astrophysics Data System (ADS)

    Ginley, Brandon; Emmons, Tiffany; Sasankan, Prabhu; Urban, Constantin; Segal, Brahm H.; Sarder, Pinaki

    2017-03-01

    Neutrophil extracellular trap (NET) formation is an alternate immunologic weapon used mainly by neutrophils: chromatin backbones fused with granule-derived proteins are shot like projectiles onto foreign invaders. This mechanism is thought to be highly anti-microbial, to aid in preventing bacterial dissemination, to break down structures several times larger than neutrophils themselves, and possibly to serve further uses as yet unknown. NETs have been implicated in a wide array of systemic host immune defenses, including sepsis, autoimmune diseases, and cancer. Existing methods for visually quantifying NETotic versus non-NETotic shapes are extremely time-consuming and subject to user bias. These limitations are obstacles to developing NETs as prognostic biomarkers and therapeutic targets. We propose an automated pipeline for quantitatively detecting neutrophil and NET shapes captured with a flow cytometry imaging system. Our method uses contrast-limited adaptive histogram equalization to improve signal intensity in dimly illuminated NETs. From the contrast-improved image, fixed-value thresholding converts the image to binary. Feature extraction is performed on the resulting binary image by calculating region properties of the foreground structures, and the resulting features are classified with a support vector machine. Our method classifies NETs versus neutrophils without traps at 0.97/0.96 sensitivity/specificity on n = 387 images and is 1500X faster than manual classification per sample. The method can be extended to rapidly analyze whole-slide immunofluorescence tissue images for NET classification, and has potential to streamline the quantification of NETs for patients with diseases associated with cancer and autoimmunity.
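    The threshold, label, extract-region-properties, classify-with-SVM part of the pipeline can be sketched as below. The contrast-equalization step is omitted for brevity, the images are synthetic blobs and streaks, and the two region features (area and bounding-box elongation) are a simplified stand-in for the paper's region properties.

```python
import numpy as np
from scipy import ndimage
from sklearn.svm import SVC

rng = np.random.default_rng(6)

def shape_features(binary_img):
    # Label connected foreground objects and compute per-object features:
    # pixel area and bounding-box elongation.
    labelled, _ = ndimage.label(binary_img)
    feats = []
    for sl in ndimage.find_objects(labelled):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        area = int((labelled[sl] > 0).sum())
        feats.append([area, max(h, w) / min(h, w)])
    return feats

def blob(rng):      # compact square: a neutrophil-like shape (hypothetical)
    img = np.zeros((32, 32))
    s = int(rng.integers(6, 10))
    img[10:10 + s, 10:10 + s] = 1
    return img

def streak(rng):    # thin elongated structure: a NET-like shape (hypothetical)
    img = np.zeros((32, 32))
    L = int(rng.integers(20, 28))
    img[15:17, 2:2 + L] = 1
    return img

samples = [(blob(rng), 0) for _ in range(20)] + [(streak(rng), 1) for _ in range(20)]
X = np.array([shape_features(img)[0] for img, _ in samples])
y = np.array([lab for _, lab in samples])

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.score(X, y))
```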

  1. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII (haemophilia A) or factor IX (haemophilia B). As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regression for binary outcomes and multiple linear regression for continuous ones. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to explain didactically how, why, and when to use classification and regression tree (CART) analysis in haemophilia research. The CART method, developed by Breiman in 1984, is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups according to a splitting criterion. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few studies using it specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are explained didactically and in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, which facilitates medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable.
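    A classification-tree sketch on a public dataset (standing in for haemophilia data, which is not available here) shows the repeated binary partitioning and the human-readable if/else rules that make CART easy to interpret.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow classification tree and print its splitting rules.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(export_text(tree, max_depth=2))   # the interpretable if/else partition
print(tree.score(X_te, y_te))
```

The printed rules are exactly the "repeated partitioning of a sample into subgroups" the abstract describes, which is what makes the method attractive to non-statisticians.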

  2. Automated detection of tuberculosis on sputum smeared slides using stepwise classification

    NASA Astrophysics Data System (ADS)

    Divekar, Ajay; Pangilinan, Corina; Coetzee, Gerrit; Sondh, Tarlochan; Lure, Fleming Y. M.; Kennedy, Sean

    2012-03-01

    Routine visual screening of stained sputum slides under a microscope to identify tuberculosis (TB) bacilli is a tedious, labor-intensive task and can miss up to 50% of TB. Based on the Shannon cofactor expansion of a Boolean classification function, a stepwise classification (SWC) algorithm is developed to remove different types of false positives, one type at a time, and to increase the detection of TB bacilli at different concentrations. Both bacilli and non-bacilli objects are first analyzed and classified into several categories, including scanty positive, high-concentration positive, and several non-bacilli categories: small bright objects, beaded objects, dim elongated objects, etc. Morphological and contrast features are extracted based on a priori clinical knowledge. The SWC is composed of several individual classifiers. The classifier that increases bacilli counts utilizes an adaptive algorithm based on a microbiologist's heuristic statistical decision process. The classifier that reduces false positives is developed by minimizing a binary decision tree that separates different types of true and false positives based on feature vectors. Finally, the detection algorithm was tested on 102 independent confirmed negative and 74 positive cases. A multi-class task analysis shows accordance rates for negative, scanty, and high-concentration cases of 88.24%, 56.00%, and 97.96%, respectively. A binary-class task analysis using a receiver operating characteristic method with the area under the curve (Az) is also used to analyze the performance of the detection algorithm, showing superior detection performance on the high-concentration cases (Az = 0.913) and on cases mixing high-concentration and scanty cases (Az = 0.878).
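    The "remove one false-positive type at a time" idea is a cascade: each stage is a binary classifier trained against one false-positive category, and an object is accepted only if it survives every stage. The sketch below illustrates this with synthetic 2-D features and two hypothetical false-positive types; it is not the paper's Boolean-expansion construction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Synthetic candidate objects: true bacilli (label 1) plus two distinct
# false-positive types, each handled by its own stage classifier.
bacilli = rng.normal([2, 2], 0.4, (100, 2))
fp_bright = rng.normal([2, -1], 0.4, (100, 2))   # e.g. small bright objects
fp_dim = rng.normal([-1, 2], 0.4, (100, 2))      # e.g. dim elongated objects

X = np.vstack([bacilli, fp_bright, fp_dim])
y = np.array([1] * 100 + [0] * 200)

# Each stage is trained to reject one false-positive type.
stage1 = LogisticRegression().fit(np.vstack([bacilli, fp_bright]),
                                  [1] * 100 + [0] * 100)
stage2 = LogisticRegression().fit(np.vstack([bacilli, fp_dim]),
                                  [1] * 100 + [0] * 100)

# An object is called a bacillus only if it survives every stage.
pred = (stage1.predict(X) == 1) & (stage2.predict(X) == 1)
acc = (pred == (y == 1)).mean()
print(acc)
```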

  3. Effects of truck traffic on crash injury severity on rural highways in Wyoming using Bayesian binary logit models.

    PubMed

    Ahmed, Mohamed M; Franke, Rebecca; Ksaibati, Khaled; Shinstine, Debbie S

    2018-08-01

    Roadway safety is an integral part of a functioning infrastructure. A major use of the highway system is the transport of goods, and the United States has experienced constant growth in the amount of freight transported by truck in recent years. Wyoming is experiencing a large increase in truck traffic on its local and county roads due to growth in oil and gas production. This study explores the involvement of heavy trucks in crashes, their significance as a predictor of crash severity, and the effect that large-truck traffic has on roadway safety across road classifications. Studies have examined the factors involved in heavy truck crashes and their causation, but none address the effect of roadway classification on truck crashes. Binary logit models (BLM) with Bayesian inference were used to classify heavy-truck involvement in severe and non-severe crashes using ten years (2002-2011) of historical crash data from the State of Wyoming. In the final main-effects model, several interactions proved significant in predicting crash severity, varying with roadway classification. The results indicated that the odds of a severe crash increase by factors of 2.3 and 4.5 when a heavy truck is involved on state and interstate highways, respectively. Crash severity also increased significantly when road surfaces were not clear, on icy roads, and in snowy weather.
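    In a binary logit, the exponentiated coefficient of an indicator variable is its odds ratio, which is how figures like "2.3 times the odds" arise. The sketch below fits a plain maximum-likelihood logit (the study used Bayesian inference) to synthetic crash records generated with a known truck odds ratio of 2.3, and recovers it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)

# Synthetic crash records with two binary predictors (hypothetical data).
n = 5000
truck = rng.integers(0, 2, n)
icy = rng.integers(0, 2, n)
# True model: log-odds = -2 + log(2.3)*truck + 0.8*icy
logit = -2 + np.log(2.3) * truck + 0.8 * icy
severe = rng.random(n) < 1 / (1 + np.exp(-logit))

# Near-unpenalized logistic regression; exp(coef) gives the odds ratios.
X = np.column_stack([truck, icy])
model = LogisticRegression(C=1e6).fit(X, severe)
odds_ratios = np.exp(model.coef_[0])
print(odds_ratios)   # the truck odds ratio should come out near 2.3
```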

  4. On the Relative Signs of "ROT-Effects" in Ternary and Binary Fission of 233U and 235U Nuclei Induced by Polarized Cold Neutrons

    NASA Astrophysics Data System (ADS)

    Danilyan, G. V.

    2018-02-01

    The signs of the ROT effects in ternary fission of 233U and 235U measured by the PNPI group are the same, whereas those in binary fission measured by the ITEP group are opposite. This contradiction cannot be explained by errors in the experiments of either group, since such instrumental effects would be too large to go unnoticed. The answer to this problem must therefore be sought in the differences between the ternary and binary fission mechanisms.

  5. A 15.7-Minute AM CVn Binary Discovered in K2

    NASA Astrophysics Data System (ADS)

    Green, M. J.; Hermes, J. J.; Marsh, T. R.; Steeghs, D. T. H.; Bell, Keaton J.; Littlefair, S. P.; Parsons, S. G.; Dennihy, E.; Fuchs, J. T.; Reding, J. S.; Kaiser, B. C.; Ashley, R. P.; Breedt, E.; Dhillon, V. S.; Gentile Fusillo, N. P.; Kerry, P.; Sahman, D. I.

    2018-04-01

    We present the discovery of SDSS J135154.46-064309.0, a short-period variable observed using 30-minute cadence photometry in K2 Campaign 6. Follow-up spectroscopy and high-speed photometry support a classification as a new member of the rare class of ultracompact accreting binaries known as AM CVn stars. The spectroscopic orbital period of 15.65 ± 0.12 minutes makes this system the fourth-shortest period AM CVn known, and the second system of this type to be discovered by the Kepler spacecraft. The K2 data show photometric periods at 15.7306 ± 0.0003 minutes, 16.1121 ± 0.0004 minutes and 664.82 ± 0.06 minutes, which we identify as the orbital period, superhump period, and disc precession period, respectively. From the superhump and orbital periods we estimate the binary mass ratio q = M2/M1 = 0.111 ± 0.005, though this method of mass ratio determination may not be well calibrated for helium-dominated binaries. This system is likely to be a bright foreground source of gravitational waves in the frequency range detectable by LISA, and may be of use as a calibration source if future studies are able to constrain the masses of its stellar components.
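
    The mass-ratio estimate can be illustrated with one widely used calibration, Patterson's superhump period-excess relation ε = 0.18q + 0.29q², where ε = (P_sh − P_orb)/P_orb. This relation is an assumption here (the abstract does not state which calibration was used, and itself cautions that the method may be poorly calibrated for helium-dominated binaries), but applying it to the quoted periods reproduces roughly the quoted q:

```python
import math

def mass_ratio_from_superhump(p_orb, p_sh):
    """Invert Patterson's epsilon(q) = 0.18 q + 0.29 q**2 for q, where
    epsilon is the fractional superhump period excess (P_sh - P_orb)/P_orb."""
    eps = (p_sh - p_orb) / p_orb
    # Positive root of 0.29 q^2 + 0.18 q - eps = 0
    return (-0.18 + math.sqrt(0.18**2 + 4 * 0.29 * eps)) / (2 * 0.29)

q = mass_ratio_from_superhump(15.7306, 16.1121)  # periods in minutes
print(round(q, 3))
```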

  6. A 15.7-min AM CVn binary discovered in K2

    NASA Astrophysics Data System (ADS)

    Green, M. J.; Hermes, J. J.; Marsh, T. R.; Steeghs, D. T. H.; Bell, Keaton J.; Littlefair, S. P.; Parsons, S. G.; Dennihy, E.; Fuchs, J. T.; Reding, J. S.; Kaiser, B. C.; Ashley, R. P.; Breedt, E.; Dhillon, V. S.; Gentile Fusillo, N. P.; Kerry, P.; Sahman, D. I.

    2018-07-01

    We present the discovery of SDSS J135154.46-064309.0, a short-period variable observed using 30-min cadence photometry in K2 Campaign 6. Follow-up spectroscopy and high-speed photometry support a classification as a new member of the rare class of ultracompact accreting binaries known as AM CVn stars. The spectroscopic orbital period of 15.65 ± 0.12 min makes this system the fourth-shortest-period AM CVn known, and the second system of this type to be discovered by the Kepler spacecraft. The K2 data show photometric periods at 15.7306 ± 0.0003 min, 16.1121 ± 0.0004 min, and 664.82 ± 0.06 min, which we identify as the orbital period, superhump period, and disc precession period, respectively. From the superhump and orbital periods we estimate the binary mass ratio q = M2/M1 = 0.111 ± 0.005, though this method of mass ratio determination may not be well calibrated for helium-dominated binaries. This system is likely to be a bright foreground source of gravitational waves in the frequency range detectable by the Laser Interferometer Space Antenna (LISA), and may be of use as a calibration source if future studies are able to constrain the masses of its stellar components.

  7. Single classifier, OvO, OvA and RCC multiclass classification method in handheld based smartphone gait identification

    NASA Astrophysics Data System (ADS)

    Raziff, Abdul Rafiez Abdul; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran

    2017-10-01

    Gait recognition is widely used in many applications. In person identification from gait, the number of classes (people) is large and may exceed 20. With so many classes, a single direct classification mapping may not be suitable, as most existing algorithms are designed for binary classification. Furthermore, having many classes in a dataset raises the likelihood of highly overlapped class boundaries. This paper discusses the application of multiclass classifier mappings such as one-vs-all (OvA), one-vs-one (OvO) and random correction code (RCC) to handheld smartphone gait signals for person identification. The results are then compared with a single J48 decision tree as a benchmark. The results show that the multiclass classification mappings partially improve the overall accuracy, especially OvO and RCC with a width factor above 4. For OvA, accuracy is worse than the single J48 due to the high number of classes.
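
    The OvO and OvA mappings can be sketched with a toy example (pure Python; a trivial nearest-mean binary learner stands in for the real classifiers, and the 1-D data are invented, not gait signals):

```python
from itertools import combinations
from statistics import mean

# Toy 1-D data: three well-separated classes
train = {0: [0.1, 0.3, 0.2], 1: [5.0, 5.2, 4.8], 2: [10.1, 9.9, 10.0]}

def nearest_mean(pos, neg):
    """Trivial binary classifier: score > 0 when x lies closer to the
    mean of `pos` than to the mean of `neg`."""
    mp, mn = mean(pos), mean(neg)
    return lambda x: abs(x - mn) - abs(x - mp)

def predict_ovo(x):
    """One-vs-one: one classifier per class pair, majority vote."""
    votes = {c: 0 for c in train}
    for a, b in combinations(train, 2):
        votes[a if nearest_mean(train[a], train[b])(x) > 0 else b] += 1
    return max(votes, key=votes.get)

def predict_ova(x):
    """One-vs-all: one classifier per class against the rest pooled,
    predict the class with the highest score."""
    score = lambda c: nearest_mean(
        train[c], [v for k, vs in train.items() if k != c for v in vs])(x)
    return max(train, key=score)

assert predict_ovo(5.0) == 1 and predict_ova(5.0) == 1
assert predict_ovo(9.9) == 2 and predict_ova(0.0) == 0
```

    With k classes, OvO trains k(k−1)/2 pairwise classifiers while OvA trains only k, but each OvA classifier faces a heavily imbalanced "rest" pool, which is one reason OvA can degrade as the class count grows.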

  8. Classification scheme for phenomenological universalities in growth problems in physics and other sciences.

    PubMed

    Castorina, P; Delsanto, P P; Guiot, C

    2006-05-12

    A classification in universality classes of broad categories of phenomenologies, belonging to physics and other disciplines, may be very useful for cross-fertilization among them and for pattern recognition and interpretation of experimental data. We present here a simple scheme for the classification of nonlinear growth problems. The success of the scheme in predicting and characterizing the well-known Gompertz, West, and logistic models suggests the study of a hitherto unexplored class of nonlinear growth problems.

  9. Comparing Linear Discriminant Function with Logistic Regression for the Two-Group Classification Problem.

    ERIC Educational Resources Information Center

    Fan, Xitao; Wang, Lin

    The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…

  10. RESEARCH ON THE USE OF A COUNSELEE PROBLEM CLASSIFICATION PLAN AT THE JUNIOR HIGH LEVEL.

    ERIC Educational Resources Information Center

    BLAKSLEE, ROBERT W.

    A classification plan for maintaining confidential records was constructed to help the counselor save time and still have a usable record of his sessions and, when all records are analyzed and summarized, to provide information about the types of problems experienced by students. The school counselor classification categories consisted of (1) two…

  11. An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals

    ERIC Educational Resources Information Center

    Verhelst, Norman D.

    2008-01-01

    Uniform sampling of binary matrices with fixed margins is known as a difficult problem. Two classes of algorithms to sample from a distribution not too different from the uniform are studied in the literature: importance sampling and Markov chain Monte Carlo (MCMC). Existing MCMC algorithms converge slowly, require a long burn-in period and yield…

  12. A Binary System of Tertiary Education: Past Ideas, Contemporary Policy and Future Possibilities

    ERIC Educational Resources Information Center

    Beddie, Francesca M.

    2015-01-01

    This paper draws on a project examining the binary policy of higher education formulated in Australia in the mid-1960s. Its purpose is to discuss history as a policy tool and research impact. The historical analysis identified several enduring problems--beyond the central matter of funding--in tertiary education: insufficient diversity; obstacles…

  13. Resettable binary latch mechanism for use with paraffin linear motors

    NASA Technical Reports Server (NTRS)

    Maus, Daryl; Tibbitts, Scott

    1991-01-01

    A new resettable Binary Latch Mechanism was developed utilizing a paraffin actuator as the motor. This linear actuator alternately latches between extended and retracted positions, maintaining either position with zero power consumption. The design evolution and kinematics of the latch mechanism are presented, as well as the development problems and lessons that were learned.

  14. Angular velocity of gravitational radiation from precessing binaries and the corotating frame

    NASA Astrophysics Data System (ADS)

    Boyle, Michael

    2013-05-01

    This paper defines an angular velocity for time-dependent functions on the sphere and applies it to gravitational waveforms from compact binaries. Because it is geometrically meaningful and has a clear physical motivation, the angular velocity is uniquely useful in helping to solve an important—and largely ignored—problem in models of compact binaries: the inverse problem of deducing the physical parameters of a system from the gravitational waves alone. It is also used to define the corotating frame of the waveform. When decomposed in this frame, the waveform has no rotational dynamics and is therefore as slowly evolving as possible. The resulting simplifications lead to straightforward methods for accurately comparing waveforms and constructing hybrids. As formulated in this paper, the methods can be applied robustly to both precessing and nonprecessing waveforms, providing a clear, comprehensive, and consistent framework for waveform analysis. Explicit implementations of all these methods are provided in accompanying computer code.

  15. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models.

    PubMed

    Chen, Han; Wang, Chaolong; Conomos, Matthew P; Stilp, Adrienne M; Li, Zilin; Sofer, Tamar; Szpiro, Adam A; Chen, Wei; Brehm, John M; Celedón, Juan C; Redline, Susan; Papanicolaou, George J; Thornton, Timothy A; Laurie, Cathy C; Rice, Kenneth; Lin, Xihong

    2016-04-07

    Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs. Copyright © 2016 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  16. Extremal Optimization for Quadratic Unconstrained Binary Problems

    NASA Astrophysics Data System (ADS)

    Boettcher, S.

    We present an implementation of τ-EO for quadratic unconstrained binary optimization (QUBO) problems. To this end, we transform QUBO from its conventional Boolean presentation into a spin glass with a random external field on each site. These fields tend to be rather large compared to the typical coupling, presenting EO with a challenging two-scale problem: exploring small differences in couplings effectively while aligning sufficiently with the strong external fields. However, we also find a simple solution to this problem, which indicates that those external fields tilt the energy landscape to such a degree that global minima become easier to find than those of spin glasses with no (or very small) fields. We explore the impact of the weight distributions of QUBO formulations found in the operations research literature and analyze their meaning in spin-glass language. This is significant because QUBO problems are considered among the main contenders for NP-hard problems that could be solved efficiently on a quantum computer such as D-Wave.
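
    The Boolean-to-spin-glass transformation referred to above is the standard substitution x_i = (1 + s_i)/2. A minimal sketch showing how the diagonal and off-diagonal QUBO weights become the external fields and couplings (a generic derivation, not the paper's τ-EO code):

```python
from itertools import product

def qubo_to_ising(Q):
    """Map a QUBO energy E(x) = sum_ij Q[i][j] x_i x_j  (x_i in {0, 1})
    onto an Ising form E(s) = sum_{i<j} J[i][j] s_i s_j + sum_i h[i] s_i + c
    (s_i in {-1, +1}) via the substitution x_i = (1 + s_i) / 2."""
    n = len(Q)
    J = [[0.0] * n for _ in range(n)]
    h = [0.0] * n
    c = 0.0
    for i in range(n):
        # Diagonal terms: x_i**2 = x_i = (1 + s_i) / 2
        h[i] += Q[i][i] / 2.0
        c += Q[i][i] / 2.0
        for j in range(i + 1, n):
            q = Q[i][j] + Q[j][i]      # merge both off-diagonal entries
            J[i][j] += q / 4.0         # spin-spin coupling
            h[i] += q / 4.0            # these sums form the "external fields"
            h[j] += q / 4.0
            c += q / 4.0
    return J, h, c

# Sanity check: both forms give identical energies on every configuration
Q = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0], [0.0, 0.0, -2.0]]
J, h, c = qubo_to_ising(Q)
for x in product((0, 1), repeat=3):
    s = [2 * xi - 1 for xi in x]
    e_qubo = sum(Q[i][j] * x[i] * x[j] for i in range(3) for j in range(3))
    e_ising = (sum(J[i][j] * s[i] * s[j] for i in range(3) for j in range(3))
               + sum(h[i] * s[i] for i in range(3)) + c)
    assert abs(e_qubo - e_ising) < 1e-12
```

    Note how each field h[i] accumulates a quarter of every weight touching site i; when the Q entries are large, these sums dominate the individual couplings, which is the two-scale structure the abstract describes.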

  17. VizieR Online Data Catalog: Photometric study of fourteen low-mass binaries (Korda+, 2017)

    NASA Astrophysics Data System (ADS)

    Korda, D.; Zasche, P.; Wolf, M.; Kucakova, H.; Honkova, K.; Vrastil, J.

    2018-05-01

    All new photometric observations of 14 binaries were carried out at the Ondrejov Observatory in the Czech Republic with the 0.65 m reflecting telescope and the G2-3200 CCD camera. Observations were collected from 2015 February to 2016 November in the I, R, and V filters (Bessell 1990PASP..102.1181B). Some of the older observations, obtained only in the R filter, were used to refine the individual orbital periods. The stars were primarily chosen from the catalog of Hoffman et al. (2008, J/AJ/136/1067) using several selection criteria. Each star's classification as a low-mass binary was based on the photometric indices J-H and H-K from the 2MASS survey (Cutri et al. 2003, Cat. II/246), requiring J-H>0.25 and H-K>0.07 (Pecaut & Mamajek 2013, J/ApJS/208/9; www.pas.rochester.edu/~emamajek/EEMdwarfUBVIJHKcolorsTeff.txt). Furthermore, we selected binary systems with short orbital periods (P<1.5 days) and declinations above +30°. The last criterion was that the systems must not have been analyzed in detail before. We chose 11 systems from Hoffman's catalog (2008, J/AJ/136/1067), 2 more were found in the measured fields (one of them on the edge of the criteria), and 1 star was added later. (6 data files).

  18. A Wide-field Survey for Transiting Hot Jupiters and Eclipsing Pre-main-sequence Binaries in Young Stellar Associations

    NASA Astrophysics Data System (ADS)

    Oelkers, Ryan J.; Macri, Lucas M.; Marshall, Jennifer L.; DePoy, Darren L.; Lambas, Diego G.; Colazo, Carlos; Stringer, Katelyn

    2016-09-01

    The past two decades have seen a significant advancement in the detection, classification, and understanding of exoplanets and binaries. This is due, in large part, to the increase in use of small-aperture telescopes (<20 cm) to survey large areas of the sky to milli-mag precision with rapid cadence. The vast majority of the planetary and binary systems studied to date consists of main-sequence or evolved objects, leading to a dearth of knowledge of properties at early times (<50 Myr). Only a dozen binaries and one candidate transiting Hot Jupiter are known among pre-main-sequence objects, yet these are the systems that can provide the best constraints on stellar formation and planetary migration models. The deficiency in the number of well characterized systems is driven by the inherent and aperiodic variability found in pre-main-sequence objects, which can mask and mimic eclipse signals. Hence, a dramatic increase in the number of young systems with high-quality observations is highly desirable to guide further theoretical developments. We have recently completed a photometric survey of three nearby (<150 pc) and young (<50 Myr) moving groups with a small-aperture telescope. While our survey reached the requisite photometric precision, the temporal coverage was insufficient to detect Hot Jupiters. Nevertheless, we discovered 346 pre-main-sequence binary candidates, including 74 high-priority objects for further study. This paper includes data taken at The McDonald Observatory of The University of Texas at Austin.

  19. Fitness Probability Distribution of Bit-Flip Mutation.

    PubMed

    Chicano, Francisco; Sutton, Andrew M; Whitley, L Darrell; Alba, Enrique

    2015-01-01

    Bit-flip mutation is a common mutation operator for evolutionary algorithms applied to optimize functions over binary strings. In this paper, we develop results from the theory of landscapes and Krawtchouk polynomials to exactly compute the probability distribution of fitness values of a binary string undergoing uniform bit-flip mutation. We prove that this probability distribution can be expressed as a polynomial in p, the probability of flipping each bit. We analyze these polynomials and provide closed-form expressions for an easy linear problem (Onemax) and an NP-hard problem, MAX-SAT. We also discuss a connection of the results with runtime analysis.
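
    For the Onemax case, the fitness distribution under uniform bit-flip mutation can be computed exactly as the convolution of two binomials (ones flipped off, zeros flipped on), each term a polynomial in p as the abstract states. A sketch of that elementary route (not the authors' Krawtchouk-polynomial derivation), verified against brute-force enumeration:

```python
from itertools import product
from math import comb

def binom_pmf(k, n, p):
    """P[Binomial(n, p) = k]."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def onemax_mutation_pmf(n, k, p):
    """Exact distribution of the Onemax fitness (number of ones) after
    uniform bit-flip mutation of an n-bit string with k ones, flip prob p.
    New fitness = k - D1 + D0, with D1 ~ Bin(k, p) ones flipped off and
    D0 ~ Bin(n - k, p) zeros flipped on."""
    pmf = [0.0] * (n + 1)
    for d1 in range(k + 1):
        for d0 in range(n - k + 1):
            pmf[k - d1 + d0] += binom_pmf(d1, k, p) * binom_pmf(d0, n - k, p)
    return pmf

# Verify against brute-force enumeration of all 2^n flip masks (small n)
n, k, p = 6, 4, 0.3
x = [1] * k + [0] * (n - k)
brute = [0.0] * (n + 1)
for mask in product((0, 1), repeat=n):
    prob = p ** sum(mask) * (1 - p) ** (n - sum(mask))
    brute[sum(xi ^ mi for xi, mi in zip(x, mask))] += prob
exact = onemax_mutation_pmf(n, k, p)
assert all(abs(a - b) < 1e-12 for a, b in zip(exact, brute))
```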

  20. Solidification of a binary mixture

    NASA Technical Reports Server (NTRS)

    Antar, B. N.

    1982-01-01

    The time dependent concentration and temperature profiles of a finite layer of a binary mixture are investigated during solidification. The coupled time dependent Stefan problem is solved numerically using an implicit finite differencing algorithm with the method of lines. Specifically, the temporal operator is approximated via an implicit finite difference operator resulting in a coupled set of ordinary differential equations for the spatial distribution of the temperature and concentration for each time. Since the resulting differential equations set form a boundary value problem with matching conditions at an unknown spatial point, the method of invariant imbedding is used for its solution.

  1. Genetic programming and serial processing for time series classification.

    PubMed

    Alfaro-Cid, Eva; Sharman, Ken; Esparcia-Alcázar, Anna I

    2014-01-01

    This work describes an approach devised by the authors for time series classification. In our approach, genetic programming is used in combination with serial processing of the data, where the last output is the classification result. The use of genetic programming for classification, although still a field where more research is needed, is not new. However, genetic programming is normally applied to classification tasks by treating the input data as a feature vector; to the best of our knowledge, there are no examples in the genetic programming literature of approaches where the time series data are processed serially and the last output is taken as the classification result. The serial processing approach presented here fills that gap in the existing literature. The approach was tested on three different problems. Two of them are real-world problems whose data were gathered for online or conference competitions; since results for these two problems have been published, we can compare the performance of our approach against top-performing methods. Serial processing of data in combination with genetic programming obtained competitive results in both competitions, showing its potential for solving time series classification problems. The main advantage of our serial processing approach is that it can easily handle very large datasets.

  2. Active learning strategies for the deduplication of electronic patient data using classification trees.

    PubMed

    Sariyar, M; Borg, A; Pommerening, K

    2012-10-01

    Supervised record linkage methods often require a clerical review to gain informative training data. Active learning means actively prompting the user to label data with special characteristics in order to minimise the review costs. We conducted an empirical evaluation to investigate whether a simple active learning strategy using binary comparison patterns is sufficient, or whether string metrics together with a more sophisticated algorithm are necessary to achieve high accuracy with a small training set. Based on medical registry data with different numbers of attributes, we used active learning to acquire training sets for classification trees, which were then used to classify the remaining data. Active learning for binary patterns means that every distinct comparison pattern represents a stratum from which one item is sampled. Active learning for patterns consisting of Levenshtein string metric values uses an iterative process in which the most informative and representative examples are added to the training set; in this context, we extended the active learning strategy of Sarawagi and Bhamidipaty (2002). On the original data set, active learning based on binary comparison patterns leads to the best results. When dropping four or six attributes, using string metrics leads to better results. In both cases, not more than 200 manually reviewed training examples are necessary. In record linkage applications where only forename, name and birthday are available as attributes, we suggest the sophisticated active learning strategy based on string metrics in order to achieve highly accurate results. We recommend the simple strategy if more attributes are available, as in our study. In both cases, active learning significantly reduces the amount of manual involvement in training data selection compared to usual record linkage settings. Copyright © 2012 Elsevier Inc. All rights reserved.
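
    A minimal implementation of the Levenshtein metric used for the string-valued comparison patterns (unit-cost insert/delete/substitute edits; a standard dynamic programme, not the registry-specific code):

```python
def levenshtein(a, b):
    """Edit distance between strings a and b with unit costs,
    computed with a rolling one-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# e.g. name-spelling variants typical of record linkage
assert levenshtein("meier", "meyer") == 1
assert levenshtein("jon", "john") == 1
assert levenshtein("kitten", "sitting") == 3
```

    In a linkage setting, each candidate record pair yields a vector of such distances (one per attribute), which either gets thresholded into a binary comparison pattern or kept as raw metric values for the more sophisticated strategy.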

  3. Describing three-class task performance: three-class linear discriminant analysis and three-class ROC analysis

    NASA Astrophysics Data System (ADS)

    He, Xin; Frey, Eric C.

    2007-03-01

    Binary ROC analysis has solid decision-theoretic foundations and a close relationship to linear discriminant analysis (LDA). In particular, for the case of Gaussian equal covariance input data, the area under the ROC curve (AUC) value has a direct relationship to the Hotelling trace. Many attempts have been made to extend binary classification methods to multi-class. For example, Fukunaga extended binary LDA to obtain multi-class LDA, which uses the multi-class Hotelling trace as a figure-of-merit, and we have previously developed a three-class ROC analysis method. This work explores the relationship between conventional multi-class LDA and three-class ROC analysis. First, we developed a linear observer, the three-class Hotelling observer (3-HO). For Gaussian equal covariance data, the 3-HO provides equivalent performance to the three-class ideal observer and, under less strict conditions, maximizes the signal to noise ratio for classification of all pairs of the three classes simultaneously. The 3-HO templates are not the eigenvectors obtained from multi-class LDA. Second, we show that the three-class Hotelling trace, which is the figure-of-merit in the conventional three-class extension of LDA, has significant limitations. Third, we demonstrate that, under certain conditions, there is a linear relationship between the eigenvectors obtained from multi-class LDA and 3-HO templates. We conclude that the 3-HO based on decision theory has advantages both in its decision theoretic background and in the usefulness of its figure-of-merit. Additionally, there exists the possibility of interpreting the two linear features extracted by the conventional extension of LDA from a decision theoretic point of view.
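
    The binary Hotelling observer underlying this discussion uses the template w = S⁻¹(μ₁ − μ₀), where S is the common covariance, and its SNR² = (μ₁ − μ₀)ᵀ S⁻¹ (μ₁ − μ₀) is the two-class Hotelling trace. A small sketch for 2-D equal-covariance Gaussian classes (the numbers are illustrative, not from the paper):

```python
def solve_2x2(S, d):
    """Solve the 2x2 linear system S w = d by Cramer's rule."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [(d[0] * S[1][1] - d[1] * S[0][1]) / det,
            (S[0][0] * d[1] - S[1][0] * d[0]) / det]

mu0, mu1 = [0.0, 0.0], [2.0, 1.0]        # class means
S = [[1.0, 0.3], [0.3, 1.0]]             # shared (equal) covariance

d = [m1 - m0 for m0, m1 in zip(mu0, mu1)]
w = solve_2x2(S, d)                      # Hotelling template w = S^{-1} d
snr2 = sum(wi * di for wi, di in zip(w, d))   # SNR^2 = d^T S^{-1} d
print(w, snr2)
```

    The 3-HO of the abstract generalizes this to three pairs of classes simultaneously; the point of the paper is that its templates are not the multi-class LDA eigenvectors.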

  4. Multistage classification of multispectral Earth observational data: The design approach

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Muasher, M. J.; Landgrebe, D. A.

    1981-01-01

    An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes and taking into account the number of training samples used in estimating each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Results comparing probabilities of error predicted by the proposed algorithm as a function of dimensionality as compared to experimental observations are shown for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.

  5. Multifrequency observations of symbiotic stars

    NASA Technical Reports Server (NTRS)

    Kenyon, Scott J.

    1988-01-01

    The discovery of symbiotic stars is described, and the results of multifrequency observations made during the past two decades are presented. Observational data identify symbiotic stars as long-period binary systems that can be divided into two basic physical classes: detached symbiotics containing a red giant (or a Mira variable), and semidetached symbiotics containing a lobe-filling red giant and a solar-type main sequence star. Three components are typically observed: (1) the cool giant component with an effective temperature of 2500-4000 K, which can be divided by the IR spectral classification into normal M giants (S-types) and heavily reddened Mira variables (D-types); (2) the hot companion displaying a bright blue continuum at UV wavelengths, which is sometimes also an X-ray source; and (3) a gaseous nebula enveloping the binary.

  6. SHAPING POINT- AND MIRROR-SYMMETRIC PROTOPLANETARY NEBULAE BY THE ORBITAL MOTION OF THE CENTRAL BINARY SYSTEM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haro-Corzo, Sinhue A. R.; Velazquez, Pablo F.; Raga, Alejandro C.

    We present three-dimensional hydrodynamical simulations of a jet launched from the secondary star of a binary system inside a protoplanetary nebula. The secondary star moves around the primary in a close eccentric orbit. From the gasdynamic simulations we compute synthetic [N II] lambda 6583 emission maps. Different jet axis inclinations with respect to the orbital plane, as well as different orientations of the flow with respect to the observer, are considered. For some parameter combinations, we obtain structures that show point- or mirror-symmetric morphologies depending on the orientation of the flow with respect to the observer. Furthermore, our models can explain some of the emission distribution asymmetries that are summarized in the classification given by Soker and Hadar.

  7. Biclustering sparse binary genomic data.

    PubMed

    van Uitert, Miranda; Meuleman, Wouter; Wessels, Lodewyk

    2008-12-01

    Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, varying from many rows to few columns and few rows to many columns. It allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.

  8. Multigrid contact detection method

    NASA Astrophysics Data System (ADS)

    He, Kejing; Dong, Shoubin; Zhou, Zhaoyao

    2007-03-01

    Contact detection is a general problem in many physical simulations. This work presents an O(N) multigrid method for general contact detection problems (MGCD), integrating the multigrid idea with contact detection. Both the time complexity and the memory consumption of the MGCD are O(N). Unlike other methods, whose efficiency is strongly influenced by the object size distribution, the performance of the MGCD is insensitive to it. We compare the MGCD with the no binary search (NBS) method and the multilevel boxing method in three dimensions for both time complexity and memory consumption. For objects of similar size, the MGCD is as good as the NBS method, and both outperform the multilevel boxing method in memory consumption. For objects of diverse size, the MGCD outperforms both the NBS method and the multilevel boxing method. We use the MGCD to solve the contact detection problem for a granular simulation system based on the discrete element method. From this granular simulation, we obtain the packing density of monosize packing and of binary packing with size ratio equal to 10. The packing density for monosize particles is 0.636. For binary packing with size ratio 10, when the number of small particles is 300 times the number of big particles, the maximal packing density of 0.824 is achieved.
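
    A single-level uniform-grid ("boxing") broad phase, the building block that multilevel and multigrid schemes extend, can be sketched as follows (a generic 2-D illustration with circles, not the paper's MGCD):

```python
from collections import defaultdict
from itertools import combinations, product

def grid_contacts(circles, cell):
    """Uniform-grid broad phase: hash each circle (x, y, r) into the grid
    cells its bounding box overlaps, then run the exact (narrow-phase)
    overlap test only on pairs sharing a cell.  For objects of similar
    size and a well-chosen cell, this runs in O(N)."""
    buckets = defaultdict(set)
    for idx, (x, y, r) in enumerate(circles):
        x0, x1 = int((x - r) // cell), int((x + r) // cell)
        y0, y1 = int((y - r) // cell), int((y + r) // cell)
        for key in product(range(x0, x1 + 1), range(y0, y1 + 1)):
            buckets[key].add(idx)
    contacts = set()
    for members in buckets.values():
        for i, j in combinations(sorted(members), 2):
            (xi, yi, ri), (xj, yj, rj) = circles[i], circles[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= (ri + rj) ** 2:
                contacts.add((i, j))   # circles actually touch or overlap
    return contacts

circles = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0),
           (10.0, 10.0, 1.0), (10.5, 10.0, 0.4)]
print(sorted(grid_contacts(circles, cell=2.0)))
```

    The size-distribution sensitivity criticized in the abstract shows up here directly: a single cell size suits only one object scale, which is what multilevel and multigrid variants address by using a hierarchy of grids.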

  9. Mutual gravitational potential, force, and torque of a homogeneous polyhedron and an extended body: an application to binary asteroids

    NASA Astrophysics Data System (ADS)

    Shi, Yu; Wang, Yue; Xu, Shijie

    2017-11-01

    Binary systems are quite common within the populations of near-Earth asteroids, main-belt asteroids, and Kuiper belt asteroids. The dynamics of binary systems, which can be modeled as the full two-body problem, is a fundamental problem for their evolution and the design of relevant space missions. This paper proposes a new shape-based model for the mutual gravitational potential of binary asteroids, differing from prior approaches such as inertia integrals, spherical harmonics, or symmetric trace-free tensors. One asteroid is modeled as a homogeneous polyhedron, while the other is modeled as an extended rigid body with arbitrary mass distribution. Since the potential of the polyhedron is precisely described in a closed form, the mutual gravitational potential can be formulated as a volume integral over the extended body. By using Taylor expansion, the mutual potential is then derived in terms of inertia integrals of the extended body, derivatives of the polyhedron's potential, and the relative location and orientation between the two bodies. The gravitational forces and torques acting on the two bodies described in the body-fixed frame of the polyhedron are derived in the form of a second-order expansion. The gravitational model is then used to simulate the evolution of the binary asteroid (66391) 1999 KW4, and compared with previous results in the literature.

  10. Galaxy Rotation and Rapid Supermassive Binary Coalescence

    NASA Astrophysics Data System (ADS)

    Holley-Bockelmann, Kelly; Khan, Fazeel Mahmood

    2015-09-01

    Galaxy mergers usher the supermassive black hole (SMBH) in each galaxy to the center of the potential, where they form an SMBH binary. The binary orbit shrinks by ejecting stars via three-body scattering, but ample work has shown that in spherical galaxy models, the binary separation stalls after ejecting all the stars in its loss cone—this is the well-known final parsec problem. However, it has been shown that SMBH binaries in non-spherical galactic nuclei harden at a nearly constant rate until reaching the gravitational wave regime. Here we use a suite of direct N-body simulations to follow SMBH binary evolution in both corotating and counterrotating flattened galaxy models. For N > 500 K, we find that the evolution of the SMBH binary is convergent and is independent of the particle number. Rotation in general increases the hardening rate of SMBH binaries even more effectively than galaxy geometry alone. SMBH binary hardening rates are similar for co- and counterrotating galaxies. In the corotating case, the center of mass of the SMBH binary settles into an orbit that is in corotation resonance with the background rotating model, and the coalescence time is roughly a few 100 Myr faster than a non-rotating flattened model. We find that counterrotation drives SMBHs to coalesce on a nearly radial orbit promptly after forming a hard binary. We discuss the implications for gravitational wave astronomy, hypervelocity star production, and the effect on the structure of the host galaxy.

  11. GALAXY ROTATION AND RAPID SUPERMASSIVE BINARY COALESCENCE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holley-Bockelmann, Kelly; Khan, Fazeel Mahmood, E-mail: k.holley@vanderbilt.edu

    2015-09-10

    Galaxy mergers usher the supermassive black hole (SMBH) in each galaxy to the center of the potential, where they form an SMBH binary. The binary orbit shrinks by ejecting stars via three-body scattering, but ample work has shown that in spherical galaxy models, the binary separation stalls after ejecting all the stars in its loss cone—this is the well-known final parsec problem. However, it has been shown that SMBH binaries in non-spherical galactic nuclei harden at a nearly constant rate until reaching the gravitational wave regime. Here we use a suite of direct N-body simulations to follow SMBH binary evolution in both corotating and counterrotating flattened galaxy models. For N > 500,000, we find that the evolution of the SMBH binary is convergent and is independent of the particle number. Rotation in general increases the hardening rate of SMBH binaries even more effectively than galaxy geometry alone. SMBH binary hardening rates are similar for co- and counterrotating galaxies. In the corotating case, the center of mass of the SMBH binary settles into an orbit that is in corotation resonance with the background rotating model, and the coalescence time is roughly a few hundred Myr shorter than in a non-rotating flattened model. We find that counterrotation drives SMBHs to coalesce on a nearly radial orbit promptly after forming a hard binary. We discuss the implications for gravitational wave astronomy, hypervelocity star production, and the effect on the structure of the host galaxy.

  12. Automatic crack detection and classification method for subway tunnel safety monitoring.

    PubMed

    Zhang, Wenyu; Zhang, Zhenjiang; Qi, Dapeng; Liu, Yun

    2014-10-16

    Cracks are an important indicator reflecting the safety status of infrastructures. This paper presents an automatic crack detection and classification methodology for subway tunnel safety monitoring. With the application of high-speed complementary metal-oxide-semiconductor (CMOS) industrial cameras, the tunnel surface can be captured and stored in digital images. In the next step, local dark regions with potential crack defects are segmented from the original gray-scale images using morphological image processing techniques and thresholding operations. In the feature extraction process, we present a distance-histogram-based shape descriptor that effectively describes the spatial shape difference between cracks and other irrelevant objects. Along with other features, the classification results successfully remove over 90% of misidentified objects. Also, compared with the original gray-scale images, over 90% of the crack length is preserved in the final output binary images. The proposed approach was tested on the safety monitoring of Beijing Subway Line 1. The experimental results revealed the rules for parameter settings and also proved that the proposed approach is effective and efficient for automatic crack detection and classification.
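
    The segmentation step described above, isolating local dark regions, can be sketched as a simple intensity threshold in plain Python; the patch and the threshold value are illustrative, and the paper's full pipeline additionally applies morphological processing:

```python
def segment_dark_regions(gray, threshold=60):
    """Mark pixels darker than `threshold` as candidate crack pixels (1)
    and everything else as background (0). The threshold is illustrative,
    not a value from the paper."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

# Toy 4x4 grayscale patch: a dark diagonal streak on a bright background.
patch = [
    [ 30, 200, 210, 205],
    [190,  25, 215, 200],
    [200, 210,  20, 195],
    [205, 198, 212,  35],
]
binary = segment_dark_regions(patch)
for row in binary:
    print(row)
```

    The binary map produced here is the input to the feature extraction and classification stages the abstract describes.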

  13. A New Experiment on Bengali Character Recognition

    NASA Astrophysics Data System (ADS)

    Barman, Sumana; Bhattacharyya, Debnath; Jeon, Seung-Whan; Kim, Tai-Hoon; Kim, Haeng-Kon

    This paper presents a method that uses a view-based approach in a Bangla optical character recognition (OCR) system, providing a reduced data set to the ANN classification engine rather than the traditional OCR methods. It describes how Bangla characters are processed, trained, and then recognized with the use of a backpropagation artificial neural network. This is the first published account of using a segmentation-free optical character recognition system for Bangla with a view-based approach. The methodology presented here assumes that the OCR pre-processor has presented the input images to the classification engine described here. The size and the font face used to render the characters are also significant in both training and classification. The images are first converted into greyscale and then to binary images; these images are then scaled to fit a pre-determined area with a fixed but significant number of pixels. The feature vectors are then formed by extracting the characteristic points, which in this case is simply a series of 0s and 1s of fixed length. Finally, an artificial neural network is chosen for the training and classification process.
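
    The pre-processing chain the abstract describes (greyscale to binary, scaling to a fixed grid, flattening to a fixed-length 0/1 vector) can be sketched as follows; the toy glyph, grid size, and nearest-neighbour scaling are assumptions for illustration, not details from the paper:

```python
def binarize(gray, threshold=128):
    """Dark (ink) pixels become 1, light background becomes 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def scale_nearest(img, out_h, out_w):
    """Nearest-neighbour scaling to a fixed grid (the 'pre-determined area')."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def feature_vector(gray, out_h=16, out_w=16):
    """Greyscale -> binary -> fixed-size grid -> flat 0/1 feature vector."""
    scaled = scale_nearest(binarize(gray), out_h, out_w)
    return [px for row in scaled for px in row]

glyph = [[0, 255],
         [255, 0]]                      # toy 2x2 "glyph", not a real character
vec = feature_vector(glyph, out_h=4, out_w=4)
print(len(vec), vec)
```

    The fixed-length vector of 0s and 1s is what the abstract feeds to the neural network for training and classification.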

  14. Classifying machinery condition using oil samples and binary logistic regression

    NASA Astrophysics Data System (ADS)

    Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.

    2015-08-01

    The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time-consuming human elements involved in the classification of machine health. When working with industry, it is important to build an understanding of, and hence some trust in, the classification scheme for those who use the analysis to initiate maintenance tasks. "Black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) are typically difficult to interpret. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight into the drivers of the human classification process and into the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on the predictive performance of logistic regression, ANN and SVM. A real-world oil analysis data set from engines on mining trucks is presented, and using cross-validation we demonstrate that logistic regression outperforms the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
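
    The interpretability argument rests on the logistic model's closed form: each coefficient translates directly into an odds ratio. A minimal sketch with hypothetical coefficients and feature names (not fitted to the paper's oil-analysis data):

```python
import math

def predict_healthy(iron_ppm, viscosity_change,
                    b0=4.0, b_iron=-0.05, b_visc=-0.8):
    """Binary logistic model: P(healthy) = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2))).
    The coefficients and feature names are hypothetical, not fitted values."""
    z = b0 + b_iron * iron_ppm + b_visc * viscosity_change
    return 1.0 / (1.0 + math.exp(-z))

# exp(b_iron) is the factor by which the odds of "healthy" change per
# additional ppm of iron; this direct reading is the interpretability
# advantage the paper argues for.
p = predict_healthy(iron_ppm=20.0, viscosity_change=1.0)
print("P(healthy) = %.3f -> %s" % (p, "healthy" if p >= 0.5 else "not healthy"))
```
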

  15. Automatic Screening and Grading of Age-Related Macular Degeneration from Texture Analysis of Fundus Images

    PubMed Central

    Phan, Thanh Vân; Seoud, Lama; Chakor, Hadi; Cheriet, Farida

    2016-01-01

    Age-related macular degeneration (AMD) is a disease which causes visual impairment and irreversible blindness in the elderly. In this paper, an automatic classification method for AMD is proposed to perform robust and reproducible assessments in a telemedicine context. First, a study was carried out to highlight the most relevant features for AMD characterization based on texture, color, and visual context in fundus images. A support vector machine and a random forest were used to classify images according to the different AMD stages following the AREDS protocol and to evaluate the features' relevance. Experiments were conducted on a database of 279 fundus images coming from a telemedicine platform. The results demonstrate that local binary patterns in multiresolution are the most relevant for AMD classification, regardless of the classifier used. Depending on the classification task, our method achieves promising performance, with areas under the ROC curve between 0.739 and 0.874 for screening and between 0.469 and 0.685 for grading. Moreover, the proposed automatic AMD classification system is robust with respect to image quality. PMID:27190636
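
    For reference, the basic single-scale 8-neighbour local binary pattern computes one code per pixel by thresholding the neighbours against the centre; the paper uses a multiresolution variant, so this sketch only illustrates the core operator:

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour local binary pattern code for pixel (r, c):
    each neighbour whose intensity is >= the centre contributes one bit."""
    centre = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise from top-left
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

patch = [[10, 20, 10],
         [20, 15, 20],
         [10, 20, 10]]
print(lbp_code(patch, 1, 1))  # 170: the four brighter edge neighbours set bits 1, 3, 5, 7
```

    Histograms of these codes over an image region form the texture features used for classification.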

  16. Automatic Crack Detection and Classification Method for Subway Tunnel Safety Monitoring

    PubMed Central

    Zhang, Wenyu; Zhang, Zhenjiang; Qi, Dapeng; Liu, Yun

    2014-01-01

    Cracks are an important indicator reflecting the safety status of infrastructures. This paper presents an automatic crack detection and classification methodology for subway tunnel safety monitoring. With the application of high-speed complementary metal-oxide-semiconductor (CMOS) industrial cameras, the tunnel surface can be captured and stored in digital images. In the next step, local dark regions with potential crack defects are segmented from the original gray-scale images using morphological image processing techniques and thresholding operations. In the feature extraction process, we present a distance-histogram-based shape descriptor that effectively describes the spatial shape difference between cracks and other irrelevant objects. Along with other features, the classification results successfully remove over 90% of misidentified objects. Also, compared with the original gray-scale images, over 90% of the crack length is preserved in the final output binary images. The proposed approach was tested on the safety monitoring of Beijing Subway Line 1. The experimental results revealed the rules for parameter settings and also proved that the proposed approach is effective and efficient for automatic crack detection and classification. PMID:25325337

  17. Rapid crop cover mapping for the conterminous United States

    USGS Publications Warehouse

    Dahal, Devendra; Wylie, Bruce K.; Howard, Daniel

    2018-01-01

    Timely crop cover maps with sufficient resolution are important components to various environmental planning and research applications. Through the modification and use of a previously developed crop classification model (CCM), which was originally developed to generate historical annual crop cover maps, we hypothesized that such crop cover maps could be generated rapidly during the growing season. Through a process of incrementally removing weekly and monthly independent variables from the CCM and implementing a ‘two model mapping’ approach, we found it viable to generate conterminous United States-wide rapid crop cover maps at a resolution of 250 m for the current year by the month of September. In this approach, we divided the CCM model into one ‘crop type model’ to handle the classification of nine specific crops and a second, binary model to classify the presence or absence of ‘other’ crops. Under the two model mapping approach, the training errors were 0.8% and 1.5% for the crop type and binary model, respectively, while test errors were 5.5% and 6.4%, respectively. With spatial mapping accuracies for annual maps reaching upwards of 70%, this approach demonstrated a strong potential for generating rapid crop cover maps by the 1st of September.
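
    The 'two model mapping' idea, one multi-class model for the nine specific crops plus one binary model for the presence of 'other' crops, can be sketched as a simple per-pixel combination rule; the label names and the precedence given to the crop-type model here are assumptions for illustration:

```python
def combine_predictions(crop_type_pred, other_crop_pred):
    """Combine the per-pixel outputs of the two models. The label names
    ("none", "other crop", "no crop") and the precedence given to the
    crop-type model are assumptions, not details from the paper."""
    if crop_type_pred != "none":        # one of the nine specific crops
        return crop_type_pred
    return "other crop" if other_crop_pred else "no crop"

print(combine_predictions("corn", False))   # a specific crop wins outright
print(combine_predictions("none", True))    # binary model flags an 'other' crop
print(combine_predictions("none", False))   # neither model fires
```
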

  18. Change detection and classification in brain MR images using change vector analysis.

    PubMed

    Simões, Rita; Slump, Cornelis

    2011-01-01

    The automatic detection of longitudinal changes in brain images is valuable in the assessment of disease evolution and treatment efficacy. Most existing change detection methods that are currently used in clinical research to monitor patients suffering from neurodegenerative diseases--such as Alzheimer's--focus on large-scale brain deformations. However, such patients often have other brain impairments, such as infarcts, white matter lesions and hemorrhages, which are typically overlooked by the deformation-based methods. Other unsupervised change detection algorithms have been proposed to detect tissue intensity changes. The outcome of these methods is typically a binary change map, which identifies changed brain regions. However, understanding what types of changes these regions underwent is likely to provide equally important information about lesion evolution. In this paper, we present an unsupervised 3D change detection method based on Change Vector Analysis. We compute and automatically threshold the Generalized Likelihood Ratio map to obtain a binary change map. Subsequently, we perform histogram-based clustering to classify the change vectors. We obtain a Kappa Index of 0.82 using various types of simulated lesions. The classification error is 2%. Finally, we are able to detect and discriminate both small changes and ventricle expansions in datasets from Mild Cognitive Impairment patients.
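
    The core of change vector analysis is the per-voxel change vector between co-registered multi-channel scans; a minimal sketch, with a fixed threshold standing in for the paper's automatic Generalized Likelihood Ratio thresholding:

```python
import math

def change_magnitude(voxel_t1, voxel_t2):
    """Euclidean norm of the change vector between the multi-channel
    intensities of one voxel at the two time points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(voxel_t1, voxel_t2)))

def binary_change_map(scan_t1, scan_t2, threshold):
    """Threshold the change-vector magnitude at every voxel. The fixed
    threshold is a stand-in for the paper's automatic GLR thresholding."""
    return [1 if change_magnitude(v1, v2) > threshold else 0
            for v1, v2 in zip(scan_t1, scan_t2)]

# Two toy "scans": three voxels, each with (channel 1, channel 2) intensities.
t1 = [(100, 80), (120, 90), (110, 85)]
t2 = [(100, 80), (150, 130), (111, 86)]
print(binary_change_map(t1, t2, threshold=10.0))  # [0, 1, 0]
```

    The directions of the retained change vectors (not just their magnitudes) are what the paper then clusters to classify the type of change.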

  19. The "Cool Algol" BD+05 706 : Photometric observations of a new eclipsing double-lined spectroscopic binary

    NASA Astrophysics Data System (ADS)

    Marschall, L. A.; Torres, G.; Neuhauser, R.

    1998-05-01

    BVRI observations of the star BD+05 706, carried out between January 1997 and April 1998 using the 0.4 m reflector and Photometrics CCD camera at the Gettysburg College Observatory, show that the star is an eclipsing binary system with a light curve characteristic of a class of semi-detached binaries known as the "cool Algols". These results are in good agreement with the previous report of BD+05 706 as a cool Algol by Torres, Neuhauser, and Wichmann (Astron. J., 115, May 1998), who based their classification on the strong X-ray emission detected by ROSAT and on a series of spectroscopic observations of the radial velocities of both components of the system obtained at the Oak Ridge Observatory, the Fred L. Whipple Observatory, and the Multiple Mirror Telescope. Only 10 other examples of cool Algols are known, and the current photometric light curve, together with the radial velocity curves obtained previously, allows us to derive a complete solution for the physical parameters of each component, providing important constraints on models of these interesting systems.

  20. The Effect of Normalization in Violence Video Classification Performance

    NASA Astrophysics Data System (ADS)

    Ali, Ashikin; Senan, Norhalina

    2017-08-01

    Data pre-processing is an important part of data mining, and normalization is a pre-processing stage for many problems, especially video classification. Video classification is challenging because of heterogeneous content, large variations in video quality, and the complex semantic meanings of the concepts involved. A thorough pre-processing stage that includes normalization therefore supports the robustness of classification performance. Normalization scales all numeric variables into a certain range to make them more meaningful for the later phases of the data mining techniques applied. This paper examines the effect of two normalization techniques, min-max normalization and Z-score, on the classification rate in violence video classification using a multi-layer perceptron (MLP) classifier. With min-max normalization to the range [0,1], accuracy is almost 98%; with min-max normalization to the range [-1,1], accuracy is 59%; and with Z-score, accuracy is 50%.
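
    The two normalization schemes compared here have standard closed forms: min-max scaling maps x to (x - min)(b - a)/(max - min) + a, and Z-score maps x to (x - mean)/std. A minimal pure-Python sketch (with made-up feature values, not the paper's video features):

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def z_score(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

features = [2.0, 4.0, 6.0, 8.0]    # made-up feature values
print(min_max(features))            # into [0, 1]
print(min_max(features, -1, 1))     # into [-1, 1]
print(z_score(features))
```

    Min-max preserves the shape of the distribution within a bounded range, while Z-score makes features comparable in units of standard deviation; the paper's results suggest the choice of range matters for MLP performance.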
