Parallel protein secondary structure prediction based on neural networks.
Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi
2004-01-01
Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are evaluated separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. A new binary classifier for Helix versus not Helix (~H) for DBNN produces a prediction accuracy of 87% when PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to that of the best other prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Because training the neural networks is time consuming, Pthread and OpenMP are employed to parallelize DBNN on the hyperthreading-enabled Intel architecture. The speedup for 16 Pthreads is 4.9 and the speedup for 16 OpenMP threads is 4 on the 4-processor shared memory architecture. The speedups of both OpenMP and Pthread are superior to those reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.
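The orthogonal encoding mentioned in this abstract is commonly implemented as a one-hot vector per residue over a sliding window. The sketch below illustrates that general idea only; the window width, the alphabet ordering, the all-zero padding at sequence ends, and the function name `encode_window` are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of orthogonal (one-hot) window encoding for a protein sequence.
# Assumptions: 20 standard amino acids in alphabetical one-letter order,
# all-zero vectors for positions outside the sequence or unknown residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(sequence, center, window=13):
    """Return a flat 0/1 list of length window * 20 centred on `center`."""
    half = window // 2
    encoded = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:  # unknown residues stay all-zero
                one_hot[idx] = 1
        encoded.extend(one_hot)
    return encoded


if __name__ == "__main__":
    features = encode_window("MKTAYIAKQR", center=4, window=5)
    print(len(features), sum(features))
```

A classifier input for residue i would then be `encode_window(seq, i)`; substitution-matrix or PSSM encodings replace each one-hot row with a row of real-valued scores.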
An information-based network approach for protein classification
Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.
2017-01-01
Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and phylogenetic trees to classify proteins. These methods use binary trees to present protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, the protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835
Yasui, Yutaka; Pepe, Margaret; Thompson, Mary Lou; Adam, Bao-Ling; Wright, George L; Qu, Yinsheng; Potter, John D; Winget, Marcy; Thornquist, Mark; Feng, Ziding
2003-07-01
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. 
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
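The reduction of intensities to binary peak indicators described in this abstract can be pictured with a local-maximum test. This is only a sketch under assumptions: the neighbourhood half-width, the strict-maximum rule, and the name `peak_indicators` are hypothetical, and the paper's actual peak definition and its handling of x-axis shifts are not reproduced here.

```python
# Sketch: turn a vector of intensities into 0/1 peak indicators.
# Assumption: a point is a "peak" if it is strictly greater than every other
# intensity within `half_width` positions on either side.
def peak_indicators(intensities, half_width=1):
    n = len(intensities)
    flags = []
    for i, y in enumerate(intensities):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        neighbours = [intensities[j] for j in range(lo, hi) if j != i]
        is_peak = bool(neighbours) and all(y > other for other in neighbours)
        flags.append(1 if is_peak else 0)
    return flags


if __name__ == "__main__":
    print(peak_indicators([1, 3, 2, 5, 4, 4, 1], half_width=1))
```

The resulting binary columns are the kind of predictor a boosting procedure could then select from.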
AlzhCPI: A knowledge base for predicting chemical-protein interactions towards Alzheimer's disease.
Fang, Jiansong; Wang, Ling; Li, Yecheng; Lian, Wenwen; Pang, Xiaocong; Wang, Hong; Yuan, Dongsheng; Wang, Qi; Liu, Ai-Lin; Du, Guan-Hua
2017-01-01
Alzheimer's disease (AD) is a complicated progressive neurodegenerative disorder. To confront AD, scientists are searching for multi-target-directed ligands (MTDLs) to delay disease progression. The in silico prediction of chemical-protein interactions (CPI) can accelerate target identification and drug discovery. Previously, we developed 100 binary classifiers to predict the CPI for 25 key targets against AD using the multi-target quantitative structure-activity relationship (mt-QSAR) method. In this investigation, we aimed to apply the mt-QSAR method to enlarge the model library to predict CPI towards AD. Another 104 binary classifiers were further constructed to predict the CPI for 26 preclinical AD targets based on the naive Bayesian (NB) and recursive partitioning (RP) algorithms. Internal 5-fold cross-validation and external test set validation were applied to evaluate performance on the training sets and test sets, respectively. The area under the receiver operating characteristic (ROC) curve for the test sets ranged from 0.629 to 1.0, with an average of 0.903. In addition, we developed a web server named AlzhCPI to integrate the comprehensive information of approximately 204 binary classifiers, which has potential applications in network pharmacology and drug repositioning. AlzhCPI is available online at http://rcidm.org/AlzhCPI/index.html. To illustrate the applicability of AlzhCPI, the developed system was employed for the systems pharmacology-based investigation of shichangpu against AD to enhance the understanding of the mechanisms of action of shichangpu from a holistic perspective.
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. 
In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
Tabei, Yasuo; Pauwels, Edouard; Stoven, Véronique; Takemoto, Kazuhiro; Yamanishi, Yoshihiro
2012-01-01
Motivation: Drug effects are mainly caused by the interactions between drug molecules and their target proteins including primary targets and off-targets. Identification of the molecular mechanisms behind overall drug–target interactions is crucial in the drug design process. Results: We develop a classifier-based approach to identify chemogenomic features (the underlying associations between drug chemical substructures and protein domains) that are involved in drug–target interaction networks. We propose a novel algorithm for extracting informative chemogenomic features by using L1 regularized classifiers over the tensor product space of possible drug–target pairs. It is shown that the proposed method can extract a very limited number of chemogenomic features without losing performance in predicting drug–target interactions, and that the extracted features are biologically meaningful. The extracted substructure–domain association network enables us to suggest ligand chemical fragments specific for each protein domain and ligand core substructures important for a wide range of protein families. Availability: Software is available at the supplemental website. Contact: yamanishi@bioreg.kyushu-u.ac.jp Supplementary Information: Datasets and all results are available at http://cbio.ensmp.fr/~yyamanishi/l1binary/ . PMID:22962471
Reduction from cost-sensitive ordinal ranking to weighted binary classification.
Lin, Hsuan-Tien; Li, Ling
2012-05-01
We present a reduction framework from ordinal ranking to binary classification. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranker from the binary classifier. Based on the framework, we show that a weighted 0/1 loss of the binary classifier upper-bounds the mislabeling cost of the ranker, both error-wise and regret-wise. Our framework allows not only the design of good ordinal ranking algorithms based on well-tuned binary classification approaches, but also the derivation of new generalization bounds for ordinal ranking from known bounds for binary classification. In addition, our framework unifies many existing ordinal ranking algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms. In addition, the newly designed algorithms lead to better cost-sensitive ordinal ranking performance, as well as improved listwise ranking performance.
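The three steps listed in this abstract can be sketched as follows. The label rule (is the rank greater than k?) and the weight |C[k] - C[k+1]| follow a common formulation of this kind of reduction and are stated here as assumptions rather than as the paper's exact definitions; the function names and the toy threshold classifier are hypothetical.

```python
# Sketch of reducing ordinal ranking (ranks 1..K) to weighted binary classification.
def extend_examples(x, y, num_ranks, cost=None):
    """Step 1: turn one example (x, y) into K-1 weighted binary examples.

    Each extended example asks "is the rank of x greater than k?".
    `cost[j]` is the cost of predicting rank j+1; absolute cost by default.
    """
    if cost is None:
        cost = [abs(y - r) for r in range(1, num_ranks + 1)]
    extended = []
    for k in range(1, num_ranks):
        label = 1 if y > k else -1
        weight = abs(cost[k - 1] - cost[k])  # assumed weight: |C[k] - C[k+1]|
        extended.append(((x, k), label, weight))
    return extended


def make_ranker(binary_clf, num_ranks):
    """Step 3: build a ranker that counts positive answers of the binary classifier."""
    def rank(x):
        return 1 + sum(1 for k in range(1, num_ranks) if binary_clf(x, k) > 0)
    return rank


if __name__ == "__main__":
    print(extend_examples("x0", 3, 4))
    # Step 2 would train any binary learner on the extended examples; a fixed
    # threshold rule stands in for the trained classifier here.
    toy_clf = lambda x, k: 1 if x > k + 0.5 else -1
    print(make_ranker(toy_clf, 4)(2.7))
```

Any binary learner that accepts per-example weights could be plugged in for step 2, which is what makes the reduction reusable.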
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paegert, Martin; Stassun, Keivan G.; Burger, Dan M.
2014-08-01
We describe a new neural-net-based light curve classifier and provide it with documentation as a ready-to-use tool for the community. While optimized for identification and classification of eclipsing binary stars, the classifier is general purpose, and has been developed for speed in the context of upcoming massive surveys such as the Large Synoptic Survey Telescope. A challenge for classifiers in the context of neural-net training and massive data sets is to minimize the number of parameters required to describe each light curve. We show that a simple and fast geometric representation that encodes the overall light curve shape, together with a chi-square parameter to capture higher-order morphology information, results in efficient yet robust light curve classification, especially for eclipsing binaries. Testing the classifier on the ASAS light curve database, we achieve a retrieval rate of 98% and a false-positive rate of 2% for eclipsing binaries. We achieve similarly high retrieval rates for most other periodic variable-star classes, including RR Lyrae, Mira, and delta Scuti. However, the classifier currently has difficulty discriminating between different sub-classes of eclipsing binaries, and suffers a relatively low (∼60%) retrieval rate for multi-mode delta Cepheid stars. We find that it is imperative to train the classifier's neural network with exemplars that include the full range of light curve quality on which the classifier will be expected to perform; the classifier performs well on noisy light curves only when trained with noisy exemplars. The classifier source code, ancillary programs, a trained neural net, and a guide for use are provided.
A Novel Design of 4-Class BCI Using Two Binary Classifiers and Parallel Mental Tasks
Geng, Tao; Gan, John Q.; Dyson, Matthew; Tsui, Chun SL; Sepulveda, Francisco
2008-01-01
A novel 4-class single-trial brain computer interface (BCI) based on two (rather than four or more) binary linear discriminant analysis (LDA) classifiers is proposed, which is called a “parallel BCI.” Unlike other BCIs where mental tasks are executed and classified in a serial way one after another, the parallel BCI uses properly designed parallel mental tasks that are executed on both sides of the subject body simultaneously, which is the main novelty of the BCI paradigm used in our experiments. Each of the two binary classifiers only classifies the mental tasks executed on one side of the subject body, and the results of the two binary classifiers are combined to give the result of the 4-class BCI. Data was recorded in experiments with both real movement and motor imagery in 3 able-bodied subjects. Artifacts were not detected or removed. Offline analysis has shown that, in some subjects, the parallel BCI can generate a higher accuracy than a conventional 4-class BCI, although both of them have used the same feature selection and classification algorithms. PMID:18584040
The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics
Wei, Qiong; Dunbrack, Roland L.
2013-01-01
Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand. PMID:23874456
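The metrics named in this abstract have standard definitions that are easy to compute from a confusion matrix. The sketch below shows balanced accuracy (the average of true positive rate and true negative rate, as the abstract defines it), the Matthews correlation coefficient, and a simple majority-class undersampler; the function names and the fixed random seed are illustrative assumptions, not the authors' code.

```python
import math
import random


def balanced_accuracy(tp, fn, tn, fp):
    """Average of true positive rate and true negative rate."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def matthews_corrcoef(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def undersample_majority(labels, seed=0):
    """Return sorted indices giving a 50/50 class split (labels are 0 or 1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return sorted(rng.sample(pos, n) + rng.sample(neg, n))


if __name__ == "__main__":
    print(balanced_accuracy(tp=90, fn=10, tn=50, fp=50))
    print(round(matthews_corrcoef(tp=90, fn=10, tn=50, fp=50), 4))
    print(undersample_majority([1, 1, 1] + [0] * 10))
```

Unlike plain accuracy, both metrics stay informative when one class dominates the test set, which is why they suit the imbalanced setting the abstract discusses.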
NASA Astrophysics Data System (ADS)
Liao, Zhijun; Wang, Xinrui; Zeng, Yeting; Zou, Quan
2016-12-01
The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins based only on the amino acid sequences remains unknown. Here, we describe a computational method to segregate DEPDCs and non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequences for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features of DEPDCs and non-DEPDCs and classified them with a random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we designed experimental verification methods of human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDC superfamily can be divided into three clusters. Moreover, the 188D and 20D features can both be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domains were common in human DEPDCs, and DEPDCs were widely expressed in human HCC and the adjacent tissues. However, their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.
Combining multiple decisions: applications to bioinformatics
Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.
2008-01-01
Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
Zhang, Jian; Gao, Bo; Chai, Haiting; Ma, Zhiqiang; Yang, Guifu
2016-08-26
DNA-binding proteins (DBPs) play fundamental roles in many biological processes. Therefore, the development of effective computational tools for identifying DBPs is highly desirable. In this study, we proposed an accurate method for the prediction of DBPs. Firstly, we focused on the challenge of improving DBP prediction accuracy with information solely from the sequence. Secondly, we used multiple informative features to encode the protein. These features included evolutionary conservation profile, secondary structure motifs, and physicochemical properties. Thirdly, we introduced a novel improved Binary Firefly Algorithm (BFA) to remove redundant or noisy features as well as select optimal parameters for the classifier. On two benchmark datasets, our predictor outperformed many state-of-the-art predictors, which revealed the effectiveness of our method. The promising prediction performance on a newly compiled independent testing dataset from PDB and a large-scale dataset from UniProt proved the good generalization ability of our method. In addition, the BFA developed in this research would be of great potential in practical applications in optimization fields, especially in feature selection problems. A highly accurate method was proposed for the identification of DBPs. A user-friendly web-server named iDbP (identification of DNA-binding Proteins) was constructed and provided for academic use.
Kiranyaz, Serkan; Mäkinen, Toni; Gabbouj, Moncef
2012-10-01
In this paper, we propose a novel framework based on a collective network of evolutionary binary classifiers (CNBC) to address the problems of feature and class scalability. The main goal of the proposed framework is to achieve a high classification performance over dynamic audio and video repositories. The proposed framework adopts a "Divide and Conquer" approach in which an individual network of binary classifiers (NBC) is allocated to discriminate each audio class. An evolutionary search is applied to find the best binary classifier in each NBC with respect to a given criterion. Through the incremental evolution sessions, the CNBC framework can dynamically adapt to each new incoming class or feature set without resorting to a full-scale re-training or re-configuration. Therefore, the CNBC framework is particularly designed for dynamically varying databases where no conventional static classifiers can adapt to such changes. In short, it is entirely a novel topology, an unprecedented approach for dynamic, content/data adaptive and scalable audio classification. A large set of audio features can be effectively used in the framework, where the CNBCs make appropriate selections and combinations so as to achieve the highest discrimination among individual audio classes. Experiments demonstrate a high classification accuracy (above 90%) and efficiency of the proposed framework over large and dynamic audio databases. Copyright © 2012 Elsevier Ltd. All rights reserved.
On multi-site damage identification using single-site training data
Barthorpe, R. J.; Manson, G.; Worden, K.
2017-11-01
This paper proposes a methodology for developing multi-site damage location systems for engineering structures that can be trained using single-site damaged state data only. The methodology involves training a sequence of binary classifiers based upon single-site damage data and combining the developed classifiers into a robust multi-class damage locator. In this way, the multi-site damage identification problem may be decomposed into a sequence of binary decisions. In this paper Support Vector Classifiers are adopted as the means of making these binary decisions. The proposed methodology represents an advance over state-of-the-art multi-site damage identification methods, which require either: (1) full damaged state data from single- and multi-site damage cases or (2) the development of a physics-based model to make multi-site model predictions. The potential benefit of the proposed methodology is that a significantly reduced number of recorded damage states may be required in order to train a multi-site damage locator without recourse to physics-based model predictions. In this paper it is first demonstrated that Support Vector Classification represents an appropriate approach to the multi-site damage location problem, with methods for combining binary classifiers discussed. Next, the proposed methodology is demonstrated and evaluated through application to a real engineering structure - a Piper Tomahawk trainer aircraft wing - with its performance compared to classifiers trained using the full damaged-state dataset.
Multiclass classification of microarray data samples with a reduced number of genes
2011-01-01
Background Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
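The "intrinsic parallelism of bitwise operators" mentioned in this abstract can be illustrated by packing one Boolean input per fitness case into the bits of an integer, so that a single bitwise expression evaluates a candidate on all cases at once. This is a sketch of that general trick, not the authors' system; the helper names and the toy target function are assumptions.

```python
# Sketch: evaluate a Boolean candidate on many fitness cases with one bitwise pass.
def pack_column(bits):
    """Pack a list of 0/1 values (one per fitness case) into a single integer."""
    word = 0
    for case, bit in enumerate(bits):
        if bit:
            word |= 1 << case
    return word


def fitness(candidate, inputs, target, n_cases):
    """Count the fitness cases on which `candidate` matches `target`."""
    mask = (1 << n_cases) - 1           # keeps only the bits that hold real cases
    predicted = candidate(*inputs) & mask
    agree = ~(predicted ^ target) & mask  # bit is 1 where prediction == target
    return bin(agree).count("1")


if __name__ == "__main__":
    a = pack_column([0, 0, 1, 1])
    b = pack_column([0, 1, 0, 1])
    target = pack_column([0, 0, 1, 0])  # toy target: a AND (NOT b)
    print(fitness(lambda x, y: x ^ y, (a, b), target, 4))
    print(fitness(lambda x, y: x & ~y, (a, b), target, 4))
```

On fixed-width hardware the same idea processes 32 or 64 cases per machine word; Python integers are unbounded, so the mask plays the role of the word size.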
Agarwal, Shashank; Liu, Feifan; Yu, Hong
2011-10-03
Protein-protein interaction (PPI) is an important biomedical phenomenon. Automatically detecting PPI-relevant articles and identifying methods that are used to study PPI are important text mining tasks. In this study, we have explored domain independent features to develop two open source machine learning frameworks. One performs binary classification to determine whether the given article is PPI relevant or not, named "Simple Classifier", and the other one maps the PPI relevant articles with corresponding interaction method nodes in a standardized PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology, named "OntoNorm". We evaluated our system in the context of BioCreative challenge competition using the standardized data set. Our systems are amongst the top systems reported by the organizers, attaining 60.8% F1-score for identifying relevant documents, and 52.3% F1-score for mapping articles to interaction method ontology. Our results show that domain-independent machine learning frameworks can perform competitively well at the tasks of detecting PPI relevant articles and identifying the methods that were used to study the interaction in such articles. Simple Classifier is available at http://sourceforge.net/p/simpleclassify/home/ and OntoNorm at http://sourceforge.net/p/ontonorm/home/.
Avci, Ertug; Culha, Mustafa
2014-01-01
The size-dependent interactions of eight blood proteins with silver nanoparticles (AgNPs) in their binary mixtures were investigated using surface-enhanced Raman scattering (SERS). Principal component analysis (PCA) was performed on the SERS spectra of each binary mixture, and the differentiation ability of the mixtures was tested. It was found that the effect of relative concentration change on the SERS spectra of the binary mixtures of small proteins could be detected using PCA. However, this change was not observed with the binary mixtures of large proteins. This study demonstrated that the relative interactions of the smaller proteins, unlike those of the large proteins, with AgNPs of 50 nm average size could be monitored, and this information can be used for the detection of proteins in protein mixtures.
Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin
2009-01-01
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
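Aggregating weighted binary classifiers, as this abstract describes, can be pictured as weighted decoding against a code matrix. The sketch below assumes a one-versus-one code matrix for three classes and hand-picked weights; the paper's procedure for tuning the weights from data is not reproduced, and `decode` is a hypothetical name.

```python
# Sketch: weighted decoding of binary classifier outputs into a multiclass label.
# code_matrix[c][j] is +1 / -1 / 0: the role of class c in binary problem j.
def decode(code_matrix, outputs, weights=None):
    """Return the class whose code row best matches the weighted outputs."""
    if weights is None:
        weights = [1.0] * len(outputs)  # plain, unweighted decoding
    scores = [
        sum(w * m * f for w, m, f in zip(weights, row, outputs))
        for row in code_matrix
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # One-versus-one coding for classes 0, 1, 2; columns: (0 vs 1), (0 vs 2), (1 vs 2).
    one_vs_one = [
        [+1, +1, 0],
        [-1, 0, +1],
        [0, -1, -1],
    ]
    outputs = [0.2, -0.9, 0.8]  # signed confidences of the three binary classifiers
    print(decode(one_vs_one, outputs))
    print(decode(one_vs_one, outputs, weights=[1.0, 3.0, 0.5]))
```

With equal weights the example selects class 1, while up-weighting the second classifier switches the decision to class 2, which shows why the choice of weights matters.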
Li, Chuan-Xi; Chen, Peng; Wang, Ru-Jing; Wang, Xiu-Jie; Su, Ya-Ru; Li, Jinyan
2014-01-01
Mining Protein-Protein Interactions (PPIs) from the fast-growing biomedical literature resources has proven to be an effective approach for the identification of biological regulatory networks. This paper presents a novel method based on the idea of Interaction Relation Ontology (IRO), which specifies and organises words describing various protein interaction relationships. Our method is a two-stage PPI extraction method. First, IRO is applied in a binary classifier to determine whether sentences contain a relation or not. Then, IRO is used to guide PPI extraction by building a sentence dependency parse tree. Comprehensive and quantitative evaluations and detailed analyses are used to demonstrate the significant performance of IRO on relation sentence classification and PPI extraction. Our PPI extraction method yielded recalls of around 80% and 90% and F1 scores of around 54% and 66% on the AIMed and BioInfer corpora, respectively, which are superior to those of most existing extraction methods.
Fault detection and multiclassifier fusion for unmanned aerial vehicles (UAVs)
NASA Astrophysics Data System (ADS)
Yan, Weizhong
2001-03-01
UAVs demand more accurate fault accommodation for their mission manager and vehicle control system in order to achieve a reliability level that is comparable to that of a piloted aircraft. This paper attempts to apply multi-classifier fusion techniques to achieve the necessary performance of the fault detection function for the Lockheed Martin Skunk Works (LMSW) UAV Mission Manager. Three different classifiers that meet the design requirements of the fault detection of the UAV are employed. The binary decision outputs from the classifiers are then aggregated using three different classifier fusion schemes, namely, majority vote, weighted majority vote, and naive Bayes combination. All of the three schemes are simple and need no retraining. The three fusion schemes (except the majority vote that gives an average performance of the three classifiers) show the classification performance that is better than or equal to that of the best individual. The unavoidable correlation between the classifiers with binary outputs is observed in this study. We conclude that it is the correlation between the classifiers that limits the fusion schemes to achieve an even better performance.
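Two of the fusion schemes named in this abstract can be sketched in a few lines. The example below is a minimal illustration, assuming binary decisions coded as 0 (no fault) and 1 (fault); the weight values and the tie-breaking rule are assumptions made for the example and are not taken from the paper.

```python
def majority_vote(decisions):
    """decisions: list of 0/1 outputs, one per classifier. Ties go to 0 (no fault)."""
    return 1 if sum(decisions) * 2 > len(decisions) else 0

def weighted_majority_vote(decisions, weights):
    """Each classifier votes with its weight; a fault is declared when the
    weighted 'fault' mass exceeds half of the total weight."""
    fault_mass = sum(w for d, w in zip(decisions, weights) if d == 1)
    return 1 if fault_mass * 2 > sum(weights) else 0

decisions = [1, 0, 0]        # only the first classifier flags a fault
weights = [0.6, 0.2, 0.2]    # hypothetical weights, e.g. derived from validation accuracy
plain = majority_vote(decisions)
weighted = weighted_majority_vote(decisions, weights)
```

Neither function needs retraining of the base classifiers, which matches the abstract's remark that the schemes are simple; the weighted variant can overrule the plain majority when one classifier is trusted much more than the others.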
Recognition Using Hybrid Classifiers.
Osadchy, Margarita; Keren, Daniel; Raviv, Dolev
2016-04-01
A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve an identical or better classification rate than SVM, while requiring far smaller memory and lower computational complexity to train and apply.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2008-12-01
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments outputted by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence times of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, Top-n-gram is a good building block of the protein sequences and can be widely used in many tasks of the computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the prediction of protein binding sites.
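The conversion from a frequency profile to a fixed-dimension count vector can be sketched as follows. This is a simplified reading of the abstract, assuming that a Top-n-gram at a sequence position is the string of the n most frequent residues in that profile column, ordered by decreasing frequency; the toy three-letter alphabet, the profile values, and the function names are invented for illustration, and the LSA and SVM steps are not shown.

```python
from collections import Counter

def top_n_grams(profile, n):
    """profile: list of dicts mapping residue -> frequency, one dict per position.
    Returns one 'word' per position: the n most frequent residues, most frequent first."""
    words = []
    for column in profile:
        # Sort by decreasing frequency; ties are broken alphabetically (an assumption).
        ranked = sorted(column, key=lambda r: (-column[r], r))
        words.append("".join(ranked[:n]))
    return words

def count_vector(words, vocabulary):
    """Fixed-length vector of occurrence counts over a given vocabulary."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]

# Toy 3-letter alphabet and a 3-position profile (made-up numbers).
profile = [
    {"A": 0.7, "C": 0.2, "D": 0.1},
    {"A": 0.1, "C": 0.3, "D": 0.6},
    {"A": 0.6, "C": 0.3, "D": 0.1},
]
words = top_n_grams(profile, 2)
vocabulary = ["AC", "AD", "CA", "CD", "DA", "DC"]
vector = count_vector(words, vocabulary)
```

Because the vocabulary is fixed in advance, sequences of different lengths map to vectors of the same dimension, which is what allows them to be passed to a standard classifier.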
An Automatic Diagnosis Method of Facial Acne Vulgaris Based on Convolutional Neural Network.
Shen, Xiaolei; Zhang, Jiachi; Yan, Chenjun; Zhou, Hong
2018-04-11
In this paper, we present a new automatic diagnosis method for facial acne vulgaris which is based on convolutional neural networks (CNNs). To overcome the main shortcoming of previous methods, namely their inability to classify enough types of acne vulgaris, the core of our method is to extract image features with CNNs and then perform classification with a classifier. A binary classifier of skin versus non-skin is used to detect the skin area, and a seven-class classifier is used to classify facial acne vulgaris and healthy skin. In the experiments, we compare the effectiveness of our CNN and the VGG16 neural network which is pre-trained on the ImageNet data set. We use a ROC curve to evaluate the performance of the binary classifier and a normalized confusion matrix to evaluate the performance of the seven-class classifier. The results of our experiments show that the pre-trained VGG16 neural network is effective in extracting features from facial acne vulgaris images, and that these features are very useful for the follow-up classifiers. Finally, we try applying both classifiers, each based on the pre-trained VGG16 neural network, to assist doctors in facial acne vulgaris diagnosis.
A simple atomic-level hydrophobicity scale reveals protein interfacial structure.
Kapcha, Lauren H; Rossky, Peter J
2014-01-23
Many amino acid residue hydrophobicity scales have been created in an effort to better understand and rapidly characterize water-protein interactions based only on protein structure and sequence. There is surprisingly low consistency in the ranking of residue hydrophobicity between scales, and their ability to provide insightful characterization varies substantially across subject proteins. All current scales characterize hydrophobicity based on entire amino acid residue units. We introduce a simple binary but atomic-level hydrophobicity scale that allows for the classification of polar and non-polar moieties within single residues, including backbone atoms. This simple scale is first shown to capture the anticipated hydrophobic character for those whole residues that align in classification among most scales. Examination of a set of protein binding interfaces establishes good agreement between residue-based and atomic-level descriptions of hydrophobicity for five residues, while the remaining residues produce discrepancies. We then show that the atomistic scale properly classifies the hydrophobicity of functionally important regions where residue-based scales fail. To illustrate the utility of the new approach, we show that the atomic-level scale rationalizes the hydration of two hydrophobic pockets and the presence of a void in a third pocket within a single protein and that it appropriately classifies all of the functionally important hydrophilic sites within two otherwise hydrophobic pores. We suggest that an atomic level of detail is, in general, necessary for the reliable depiction of hydrophobicity for all protein surfaces. The present formulation can be implemented simply in a manner no more complex than current residue-based approaches. © 2013.
Spectral types of four binaries based on photometric observations
NASA Astrophysics Data System (ADS)
Shimanskii, V. V.; Bikmaev, I. F.; Borisov, N. V.; Vlasyuk, V. V.; Galeev, A. I.; Sakhibullin, N. A.; Spiridonova, O. I.
2008-09-01
We present results of photometric and spectroscopic observations of four close binaries with subdwarf B components: PG 0918+029, PG 1000+408, PG 1116+301, PG 0001+275. We discovered that PG 1000+408 is a close binary, with the most probable orbital period being P_orb = 1.041145 days. Based on a comparison of the observed light curves at selected orbital phases and theoretical predictions for their variations, all the systems are classified as doubly degenerate binaries with low-luminosity white-dwarf secondaries.
Spectroscopic classification of X-ray sources in the Galactic Bulge Survey
NASA Astrophysics Data System (ADS)
Wevers, T.; Torres, M. A. P.; Jonker, P. G.; Nelemans, G.; Heinke, C.; Mata Sánchez, D.; Johnson, C. B.; Gazer, R.; Steeghs, D. T. H.; Maccarone, T. J.; Hynes, R. I.; Casares, J.; Udalski, A.; Wetuski, J.; Britt, C. T.; Kostrzewa-Rutkowska, Z.; Wyrzykowski, Ł.
2017-10-01
We present the classification of 26 optical counterparts to X-ray sources discovered in the Galactic Bulge Survey. We use (time-resolved) photometric and spectroscopic observations to classify the X-ray sources based on their multiwavelength properties. We find a variety of source classes, spanning different phases of stellar/binary evolution. We classify CX21 as a quiescent cataclysmic variable (CV) below the period gap, and CX118 as a high accretion rate (nova-like) CV. CXB12 displays excess UV emission, and could contain a compact object with a giant star companion, making it a candidate symbiotic binary or quiescent low-mass X-ray binary (although other scenarios cannot be ruled out). CXB34 is a magnetic CV (polar) that shows photometric evidence for a change in accretion state. The magnetic classification is based on the detection of X-ray pulsations with a period of 81 ± 2 min. CXB42 is identified as a young stellar object, namely a weak-lined T Tauri star exhibiting (to date unexplained) UX Ori-like photometric variability. The optical spectrum of CXB43 contains two (resolved) unidentified double-peaked emission lines. No known scenario, such as an active galactic nucleus or symbiotic binary, can easily explain its characteristics. We additionally classify 20 objects as likely active stars based on optical spectroscopy, their X-ray to optical flux ratios and photometric variability. In four cases we identify the sources as binary stars.
Zheng, Wenjing; Balzer, Laura; van der Laan, Mark; Petersen, Maya
2018-01-30
Binary classification problems are ubiquitous in health and social sciences. In many cases, one wishes to balance two competing optimality considerations for a binary classifier. For instance, in resource-limited settings, a human immunodeficiency virus prevention program based on offering pre-exposure prophylaxis (PrEP) to select high-risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) with the total number of PrEP regimens that is financially and logistically feasible for the program. In this article, we consider a general class of constrained binary classification problems wherein the objective function and the constraint are both monotonic with respect to a threshold. These include the minimization of the rate of positive predictions subject to a minimum sensitivity, the maximization of sensitivity subject to a maximum rate of positive predictions, and the Neyman-Pearson paradigm, which minimizes the type II error subject to an upper bound on the type I error. We propose an ensemble approach to these binary classification problems based on the Super Learner methodology. This approach linearly combines a user-supplied library of scoring algorithms, with combination weights and a discriminating threshold chosen to minimize the constrained optimality criterion. We then illustrate the application of the proposed classifier to develop an individualized PrEP targeting strategy in a resource-limited setting, with the goal of minimizing the number of PrEP offerings while achieving a minimum required sensitivity. This proof of concept data analysis uses baseline data from the ongoing Sustainable East Africa Research in Community Health study. Copyright © 2017 John Wiley & Sons, Ltd.
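The first constrained problem listed in this abstract, minimizing the rate of positive predictions subject to a minimum sensitivity, reduces to a threshold search once a single score per individual is available. The sketch below shows only that threshold step on made-up data; it does not implement the Super Learner combination of scoring algorithms, and the function name `pick_threshold` is hypothetical.

```python
def pick_threshold(scores, labels, min_sensitivity):
    """Return the largest threshold t such that predicting positive when
    score >= t reaches at least min_sensitivity on the given data.
    A larger threshold means fewer positive predictions."""
    n_pos = sum(labels)
    for t in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        if true_pos / n_pos >= min_sensitivity:
            return t
    raise ValueError("min_sensitivity cannot be reached")

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # made-up risk scores
labels = [1, 0, 1, 1, 0, 0]               # 1 = future case, 0 = non-case
threshold = pick_threshold(scores, labels, min_sensitivity=0.6)
positive_rate = sum(1 for s in scores if s >= threshold) / len(scores)
```

Because sensitivity and the positive-prediction rate both decrease as the threshold rises, scanning thresholds from high to low and stopping at the first feasible one gives the smallest positive rate that still meets the sensitivity constraint on this sample.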
Pilania, G.; Gubernatis, J. E.; Lookman, T.
2015-12-03
The role of dynamical (or Born effective) charges in classification of octet AB-type binary compounds between four-fold (zincblende/wurtzite crystal structures) and six-fold (rocksalt crystal structure) coordinated systems is discussed. We show that the difference in the dynamical charges of the fourfold and sixfold coordinated structures, in combination with Harrison’s polarity, serves as an excellent feature to classify the coordination of 82 sp–bonded binary octet compounds. We use a support vector machine classifier to estimate the average classification accuracy and the associated variance in our model where a decision boundary is learned in a supervised manner. Lastly, we compare the out-of-sample classification accuracy achieved by our feature pair with those reported previously.
The binary protein-protein interaction landscape of Escherichia coli
Rajagopala, Seesandra V.; Vlasblom, James; Arnold, Roland; Franca-Koh, Jonathan; Pakala, Suman B.; Phanse, Sadhna; Ceol, Arnaud; Häuser, Roman; Siszler, Gabriella; Wuchty, Stefan; Emili, Andrew; Babu, Mohan; Aloy, Patrick; Pieper, Rembert; Uetz, Peter
2014-01-01
Efforts to map the Escherichia coli interactome have identified several hundred macromolecular complexes, but direct binary protein-protein interactions (PPIs) have not been surveyed on a large scale. Here we performed yeast two-hybrid screens of 3,305 baits against 3,606 preys (~70% of the E. coli proteome) in duplicate to generate a map of 2,234 interactions, approximately doubling the number of known binary PPIs in E. coli. Integration of binary PPIs and genetic interactions revealed functional dependencies among components involved in cellular processes, including envelope integrity, flagellum assembly and protein quality control. Many of the binary interactions that could be mapped within multi-protein complexes were informative regarding internal topology and indicated that interactions within complexes are significantly more conserved than those interactions connecting different complexes. This resource will be useful for inferring bacterial gene function and provides a draft reference of the basic physical wiring network of this evolutionarily significant model microbe. PMID:24561554
Wang, Xueyi; Davidson, Nicholas J.
2011-01-01
Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we achieve a few results about the prediction accuracies of ensemble methods for binary classification that are missed or misinterpreted in previous literature. First we show the upper and lower bounds of the prediction accuracies (i.e. the best and worst possible prediction accuracies) of ensemble methods. Next we show that an ensemble method can achieve > 0.5 prediction accuracy, while individual classifiers have < 0.5 prediction accuracies. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. We perform two experiments to verify the results and show that it is hard to achieve the upper and lower bounds accuracies by random individual classifiers and better algorithms need to be developed. PMID:21853162
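The claim that an ensemble can exceed 0.5 accuracy while every individual classifier stays below 0.5 can be checked on a small constructed example. The data below are invented for illustration and are not from the paper; three classifiers are each right on only two of five samples, but their correct votes are arranged so that a majority is right on three samples.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

def majority(predictions_per_classifier):
    """Majority vote over an odd number of binary (0/1) classifiers."""
    n = len(predictions_per_classifier)
    return [1 if sum(votes) * 2 > n else 0 for votes in zip(*predictions_per_classifier)]

truth = [1, 1, 1, 1, 1]
a = [1, 1, 0, 0, 0]   # correct on samples 0 and 1
b = [1, 0, 1, 0, 0]   # correct on samples 0 and 2
c = [0, 1, 1, 0, 0]   # correct on samples 1 and 2
individual = [accuracy(p, truth) for p in (a, b, c)]
ensemble = accuracy(majority([a, b, c]), truth)
```

Each classifier scores 0.4, while the majority vote scores 0.6. The arrangement is deliberately favourable: every correct vote lands on a sample where exactly one other classifier is also correct, which is the kind of best case that an upper bound on ensemble accuracy describes.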
Optimal threshold estimation for binary classifiers using game theory.
Sanchez, Ignacio Enrique
2016-01-01
Many bioinformatics algorithms can be understood as binary classifiers. They are usually compared using the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1-FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of "specificity equals sensitivity" maximizes robustness against uncertainties in the abundance of positives in nature and classification costs.
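The operating point described in this abstract, where the ROC curve meets the descending diagonal, is the point at which sensitivity equals specificity (TPR = 1 - FPR). A minimal sketch of locating it on a finite sample is given below; the scores and labels are made up, and picking the candidate threshold with the smallest gap between sensitivity and specificity is an assumption for handling empirical ROC curves that do not hit the diagonal exactly.

```python
def balanced_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) for the candidate threshold
    where |sensitivity - specificity| is smallest. Positive if score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        gap = abs(sens - spec)
        if best is None or gap < best[0]:
            best = (gap, t, sens, spec)
    return best[1:]

scores = [0.9, 0.8, 0.6, 0.5, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
threshold, sens, spec = balanced_threshold(scores, labels)
```

At the selected threshold both rates are 2/3 on this toy sample, so the accuracy is also 2/3 whatever the proportion of positives, which is the robustness property the abstract attributes to the "specificity equals sensitivity" rule.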
Class-specific Error Bounds for Ensemble Classifiers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prenger, R; Lemmond, T; Varshney, K
2009-10-06
The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missed detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.
NASA Astrophysics Data System (ADS)
Terheide, Rachel; Zhang, Liyun; Han, Xianming; Lu, Hongpeng
2018-01-01
We present full-phase VRI-band light curves for eclipsing binary 1SWASP J061850.43+220511.9, and full-phase BVRI-band light curves for eclipsing binary 2MASS J07095549+3643564. The observations were conducted using the 0.94-m Holcomb Observatory telescope located on Butler University Campus in Indianapolis, Indiana, and the 0.6-m SARA telescope located at the Cerro Tololo Inter-American Observatory in Chile. We obtained key system parameters for both eclipsing binaries. For 1SWASP J061850.43+220511.9, the period is 0.21482 ±0.00053 days compared to 0.21439 days from an older study (Lohr et al.), the system mass ratio is found to be 2.50 and the system is classified as EW type. Similarly, for 2MASS J07095549+3643564, we obtained a linear ephemeris and a physical model for the first time. We found its period to be 0.22297 ±0.00032 days, as compared to 0.446092 days and 0.11152 days from previous research (Drake et al. 2014, Hartman et al. 2011). 2MASS J07095549+3643564 is classified as a W UMa type eclipsing binary.
Contamination of RR Lyrae stars from Binary Evolution Pulsators
NASA Astrophysics Data System (ADS)
Karczmarek, Paulina; Pietrzyński, Grzegorz; Belczyński, Krzysztof; Stępień, Kazimierz; Wiktorowicz, Grzegorz; Iłkiewicz, Krystian
2016-06-01
Binary Evolution Pulsator (BEP) is an extremely low-mass member of a binary system, which pulsates as a result of a former mass transfer to its companion. BEP mimics RR Lyrae-type pulsations but has different internal structure and evolution history. We present possible evolution channels to produce BEPs, and evaluate the contamination value, i.e. how many objects classified as RR Lyrae stars can be undetected BEPs. In this analysis we use population synthesis code StarTrack.
Classification of X-ray sources in the direction of M31
NASA Astrophysics Data System (ADS)
Vasilopoulos, G.; Hatzidimitriou, D.; Pietsch, W.
2012-01-01
M31 is our nearest spiral galaxy, at a distance of 780 kpc. Identification of X-ray sources in nearby galaxies is important for interpreting the properties of more distant ones, mainly because we can classify nearby sources using both X-ray and optical data, while more distant ones via X-rays alone. The XMM-Newton Large Project for M31 has produced an abundant sample of about 1900 X-ray sources in the direction of M31. Most of them remain elusive, giving us little signs of their origin. Our goal is to classify these sources using criteria based on properties of already identified ones. In particular we construct candidate lists of high mass X-ray binaries, low mass X-ray binaries, X-ray binaries correlated with globular clusters and AGN based on their X-ray emission and the properties of their optical counterparts, if any. Our main methodology consists of identifying particular loci of X-ray sources on X-ray hardness ratio diagrams and the color magnitude diagrams of their optical counterparts. Finally, we examined the X-ray luminosity function of the X-ray binaries populations.
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns.
Haghnegahdar, A A; Kolahi, S; Khojastepour, L; Tajeripour, F
2018-03-01
Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histograms of oriented gradients on the recorded images as a diagnostic tool in TMD assessment. CBCT images of 66 patients (132 joints) with TMD and 66 normal cases (132 joints) were collected and 2 coronal cuts were prepared from each condyle, with images limited to the head of the mandibular condyle. In order to extract features of images, we first use LBP and then the histogram of oriented gradients. To reduce dimensionality, Singular Value Decomposition (SVD) is applied to the feature vector matrix of all images. For evaluation, we used K nearest neighbor (K-NN), Support Vector Machine, Naïve Bayesian and Random Forest classifiers. We used Receiver Operating Characteristic (ROC) to evaluate the hypothesis. The K nearest neighbor classifier achieves a very good accuracy (0.9242) and has desirable sensitivity (0.9470) and specificity (0.9015), whereas the other classifiers have lower accuracy, sensitivity and specificity. We proposed a fully automatic approach to detect TMD using image processing techniques based on local binary patterns and feature extraction. K-NN was the best classifier in our experiments for distinguishing patients from healthy individuals, with 92.42% accuracy, 94.70% sensitivity and 90.15% specificity. The proposed method can help automatically diagnose TMD at its initial stages.
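For readers unfamiliar with the LBP feature used in this abstract, the basic 3x3 operator can be sketched in plain Python. This is the textbook 8-neighbour variant, assumed here for illustration; the abstract does not state which LBP variant, radius, or bit ordering the authors used, and the tiny image is invented.

```python
# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, row, col):
    """8-bit LBP code of an interior pixel: bit k is set when neighbour k
    is at least as bright as the centre pixel."""
    centre = image[row][col]
    code = 0
    for k, (dr, dc) in enumerate(NEIGHBOURS):
        if image[row + dr][col + dc] >= centre:
            code |= 1 << k
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
code = lbp_code(image, 1, 1)
hist = lbp_histogram(image)
```

The histogram, not the individual codes, is what typically serves as the texture feature vector passed on to dimensionality reduction and a classifier such as K-NN.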
Wang, Xinglong; Rak, Rafal; Restificar, Angelo; Nobata, Chikashi; Rupp, C J; Batista-Navarro, Riza Theresa B; Nawaz, Raheel; Ananiadou, Sophia
2011-10-03
The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest's Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task's development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthew's Correlation Coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. 
For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance.
Mapping spatial patterns with morphological image processing
Peter Vogt; Kurt H. Riitters; Christine Estreguil; Jacek Kozak; Timothy G. Wade; James D. Wickham
2006-01-01
We use morphological image processing for classifying spatial patterns at the pixel level on binary land-cover maps. Land-cover pattern is classified as 'perforated,' 'edge,' 'patch,' and 'core' with higher spatial precision and thematic accuracy compared to a previous approach based on image convolution, while retaining the...
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
Amidi, Afshine; Megalooikonomou, Vasileios; Paragios, Nikos
2018-01-01
During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural networks classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet. PMID:29740518
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation.
Amidi, Afshine; Amidi, Shervine; Vlachakis, Dimitrios; Megalooikonomou, Vasileios; Paragios, Nikos; Zacharaki, Evangelia I
2018-01-01
During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural networks classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
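The "binary representation of the protein shape" mentioned in this abstract is an occupancy grid: a voxel is 1 if some atom falls inside it and 0 otherwise. The sketch below shows one simple way to build such a grid from 3D coordinates. The grid size, the cube half-width `max_radius`, and the toy coordinates are assumptions for illustration only; the actual EnzyNet preprocessing (atom selection, orientation, scaling) is documented in the linked repository and is not reproduced here.

```python
import math

def voxelize(coords, grid_size, max_radius):
    """Map 3D points into a grid_size^3 binary occupancy grid.
    coords: list of (x, y, z) already centred on the origin.
    max_radius: half-width of the cube covered by the grid (assumed known)."""
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    scale = grid_size / (2.0 * max_radius)
    for point in coords:
        # floor (not int) so that points below the cube get a negative index and are dropped.
        idx = [math.floor((v + max_radius) * scale) for v in point]
        if all(0 <= i < grid_size for i in idx):
            grid[idx[0]][idx[1]][idx[2]] = 1
    return grid

atoms = [(-1.0, -1.0, -1.0), (0.0, 0.0, 0.0), (0.9, 0.9, 0.9), (5.0, 0.0, 0.0)]
grid = voxelize(atoms, grid_size=4, max_radius=1.0)
occupied = sum(v for plane in grid for row in plane for v in row)
```

The last point lies outside the cube and is ignored, so three voxels end up occupied. A grid of this kind is the sort of fixed-size input that a 3D convolutional network can consume directly.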
OpenCL based machine learning labeling of biomedical datasets
NASA Astrophysics Data System (ADS)
Amoros, Oscar; Escalera, Sergio; Puig, Anna
2011-03-01
In this paper, we propose a two-stage labeling method of large biomedical datasets through a parallel approach in a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or to tag different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to easily analyze biomedical datasets by a non-expert user. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, allowing parallel programming paradigms to be applied in conventional personal computers. The Adaboost classifier is one of the most widely applied methods for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. 
We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and labeling speeds.
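The testing stage described in this abstract, a weighted combination of weak classifiers that each look at a single feature value, can be sketched on the CPU as follows. This is a plain sequential illustration with decision stumps as the weak classifiers; the stump parameters are hypothetical stand-ins for an already-trained model, and the authors' alternative representation and OpenCL kernels are not reproduced.

```python
def stump(sample, feature, threshold, polarity):
    """Weak classifier: returns +1 or -1 from a single feature value."""
    return polarity if sample[feature] >= threshold else -polarity

def adaboost_predict(sample, stumps):
    """stumps: list of (alpha, feature, threshold, polarity) tuples.
    The strong classifier is the sign of the alpha-weighted sum of stump outputs."""
    total = sum(alpha * stump(sample, f, t, p) for alpha, f, t, p in stumps)
    return 1 if total >= 0 else -1

# Hypothetical, already-trained ensemble of three stumps.
stumps = [
    (0.8, 0, 0.5, +1),
    (0.5, 1, 0.3, -1),
    (0.4, 2, 0.7, +1),
]
samples = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.9]]
labels = [adaboost_predict(s, stumps) for s in samples]
```

Each sample is labeled independently of every other sample, and each stump reads only one feature, which is why this stage lends itself to the kind of data-parallel GPU execution the abstract proposes.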
Yang, Fan; Xu, Ying-Ying; Shen, Hong-Bin
2014-01-01
Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Since significant progress has been made on digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with the multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including completed local binary pattern, local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern, and local tetra pattern are more discriminative for describing IHC images when compared to the traditional local binary pattern descriptor. The combination of these two novel local pattern features and the conventional global texture features is also studied. The enhanced performance of final binary relevance classification model trained on the combined feature space demonstrates that different features are complementary to each other and thus capable of improving the accuracy of classification.
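Binary relevance, the multilabel strategy named in this abstract, turns one multilabel problem into one independent binary problem per label. The sketch below shows only that decomposition and the recombination of per-label decisions; the label names and helper functions are illustrative assumptions, and no image features or classifiers from the paper are involved.

```python
def to_binary_problems(label_sets, all_labels):
    """Binary relevance: one 0/1 target vector per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in all_labels}

def combine(binary_predictions):
    """binary_predictions: dict label -> 0/1 decision for ONE sample.
    The predicted label set is every label whose binary classifier said 1."""
    return {lab for lab, decision in binary_predictions.items() if decision == 1}

# Made-up multilabel annotations for three images.
label_sets = [{"nucleus"}, {"nucleus", "cytoplasm"}, {"cytoplasm"}]
targets = to_binary_problems(label_sets, ["nucleus", "cytoplasm"])
predicted = combine({"nucleus": 1, "cytoplasm": 1})
```

Each target vector can then be used to train an ordinary binary classifier on the same feature matrix, so any improvement in the features benefits every per-label model at once.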
Analysis of the statistical thermodynamic model for nonlinear binary protein adsorption equilibria.
Zhou, Xiao-Peng; Su, Xue-Li; Sun, Yan
2007-01-01
The statistical thermodynamic (ST) model was used to study nonlinear binary protein adsorption equilibria on an anion exchanger. Single-component and binary protein adsorption isotherms of bovine hemoglobin (Hb) and bovine serum albumin (BSA) on DEAE Spherodex M were determined by batch adsorption experiments in 10 mM Tris-HCl buffer containing a specific NaCl concentration (0.05, 0.10, and 0.15 M) at pH 7.40. The ST model was found to depict the effect of ionic strength on the single-component equilibria well, with model parameters depending on ionic strength. Moreover, the ST model gave acceptable fitting to the binary adsorption data with the fitted single-component model parameters, leading to the estimation of the binary ST model parameter. The effects of ionic strength on the model parameters are reasonably interpreted by the electrostatic and thermodynamic theories. The effective charge of protein in adsorption phase can be separately calculated from the two categories of the model parameters, and the values obtained from the two methods are consistent. The results demonstrate the utility of the ST model for describing nonlinear binary protein adsorption equilibria.
Properties of M31. V. 298 eclipsing binaries from PAndromeda
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, C.-H.; Koppenhoefer, J.; Seitz, S.
2014-12-10
The goal of this work is to conduct a photometric study of eclipsing binaries in M31. We apply a modified box-fitting algorithm to search for eclipsing binary candidates and determine their period. We classify these candidates into detached, semi-detached, and contact systems using the Fourier decomposition method. We cross-match the position of our detached candidates with the photometry from Local Group Survey and select 13 candidates brighter than 20.5 mag in V. The relative physical parameters of these detached candidates are further characterized with the Detached Eclipsing Binary Light curve fitter (DEBiL) by Devor. We will follow up the detached eclipsing binaries spectroscopically and determine the distance to M31.
Support vector machine as a binary classifier for automated object detection in remotely sensed data
NASA Astrophysics Data System (ADS)
Wardaya, P. D.
2014-02-01
In the present paper, the author proposes the application of the Support Vector Machine (SVM) to the analysis of satellite imagery. One of the advantages of SVM is that, with limited training data, it may generate comparable or even better results than other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic form as a binary classifier that distinguishes two classes, namely object and background. The algorithm aims at effectively detecting an object from its background with minimal training data. A synthetic image containing noise is used for algorithm testing. Furthermore, the algorithm is applied to remote sensing image analysis tasks such as identification of island vegetation, water bodies, and oil spills from satellite imagery. The results indicate that SVM provides fast and accurate analysis with acceptable results.
Slit identification for a uranium slab using a binary classifier based on cosmic-ray muon scattering
NASA Astrophysics Data System (ADS)
Xiao, S.; He, W.; Chen, Y.; Dang, X.; Wu, L.; Shuai, M.
2017-12-01
The traditional muon tomographic method is fraught with difficulty when applied to identify defective high-Z objects or other complicated structures, since it struggles to produce a precise three-dimensional image of such objects. In this paper, we present a binary classifier based on cosmic-ray muon scattering to identify a slit potentially located in a uranium slab. The advantage of this classifier is that it avoids the difficult imaging procedure required by conventional methods. Simulation results demonstrate its capability to spot a horizontal or vertical slit with a reasonable exposure time. The minimum width of a spotted slit is on the level of millimeters or even sub-millimeters. Therefore, this technique is promising for monitoring the long-term status of nuclear storage and facilities in real life.
Planetary Nebulae that Cannot Be Explained by Binary Systems
NASA Astrophysics Data System (ADS)
Bear, Ealeal; Soker, Noam
2017-03-01
We examine the images of hundreds of planetary nebulae (PNe) and find that for about one in six PNe the morphology is too “messy” to be accounted for by models of stellar binary interaction. We speculate that interacting triple stellar systems shaped these PNe. In this preliminary study, we qualitatively classify PNe into one of four categories. (1) PNe that show no need for a tertiary star to account for their morphology. (2) PNe whose structure possesses a pronounced departure from axial-symmetry and/or mirror-symmetry. We classify these, according to our speculation, as “having a triple stellar progenitor.” (3) PNe whose morphology possesses departure from axial-symmetry and/or mirror-symmetry, but not as pronounced as in the previous class, and are classified as “likely shaped by triple stellar system.” (4) PNe with minor departure from axial-symmetry and/or mirror-symmetry that could have been also caused by an eccentric binary system or the interstellar medium. These are classified as “maybe shaped by a triple stellar system.” Assigning weights η_t = 1, η_l = 0.67, and η_m = 0.33 to classes 2, 3, and 4, respectively, we find that, under our assumption, about 13%-21% of PNe have been shaped by triple stellar systems. Although in some evolutionary scenarios not all three stars survive the evolution, we encourage the search for triple stellar systems at the center of some PNe.
Diagnosis of Tempromandibular Disorders Using Local Binary Patterns
Haghnegahdar, A.A.; Kolahi, S.; Khojastepour, L.; Tajeripour, F.
2018-01-01
Background: Temporomandibular joint disorder (TMD) might be manifested as structural changes in bone through modification, adaptation or direct destruction. We propose to use Local Binary Pattern (LBP) characteristics and histograms of oriented gradients on the recorded images as a diagnostic tool in TMD assessment. Material and Methods: CBCT images of 66 patients (132 joints) with TMD and 66 normal cases (132 joints) were collected, and 2 coronal cuts were prepared from each condyle, with images limited to the head of the mandibular condyle. To extract image features, we first use LBP and then the histogram of oriented gradients. To reduce dimensionality, Singular Value Decomposition (SVD) is applied to the matrix of feature vectors of all images. For evaluation, we used K nearest neighbor (K-NN), Support Vector Machine, Naïve Bayesian and Random Forest classifiers. We used Receiver Operating Characteristic (ROC) to evaluate the hypothesis. Results: The K nearest neighbor classifier achieves very good accuracy (0.9242), together with desirable sensitivity (0.9470) and specificity (0.9015), whereas the other classifiers have lower accuracy, sensitivity and specificity. Conclusion: We proposed a fully automatic approach to detect TMD using image processing techniques based on local binary patterns and feature extraction. K-NN was the best classifier in our experiments for distinguishing patients from healthy individuals, with 92.42% accuracy, 94.70% sensitivity and 90.15% specificity. The proposed method can help automatically diagnose TMD at its initial stages. PMID:29732343
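The three reported figures are mutually consistent, which a short calculation makes visible. The sketch below assumes the evaluation unit is the joint (132 TMD, 132 normal) and infers the confusion-matrix counts from the reported rates (0.9470 × 132 ≈ 125 and 0.9015 × 132 ≈ 119); these counts are not stated in the abstract and would scale proportionally if the unit were the image instead.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts inferred from the reported rates (an assumption, see lead-in).
metrics = binary_metrics(tp=125, fn=7, tn=119, fp=13)
print({name: round(value, 4) for name, value in metrics.items()})
# {'sensitivity': 0.947, 'specificity': 0.9015, 'accuracy': 0.9242}
```

With equal class sizes, accuracy is simply the mean of sensitivity and specificity, which is why 0.9242 sits midway between 0.9470 and 0.9015.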
DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information
ERIC Educational Resources Information Center
McCallister, Gary
2005-01-01
The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
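The purine/pyrimidine reading of a sequence can be illustrated with a few lines of code. In this sketch the choice of 1 for purines (double-ring: A, G) and 0 for pyrimidines (single-ring: C, T) is an arbitrary labeling assumed for the example, not something specified in the article.

```python
PURINES = {"A", "G"}       # double-ring bases
PYRIMIDINES = {"C", "T"}   # single-ring bases


def to_binary(sequence: str) -> str:
    """Map a DNA string to a bit string: 1 for purine, 0 for pyrimidine."""
    bits = []
    for base in sequence.upper():
        if base in PURINES:
            bits.append("1")
        elif base in PYRIMIDINES:
            bits.append("0")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(bits)


print(to_binary("ATGGCC"))  # 101100
```

Each codon then collapses to one of 2^3 = 8 binary triplets instead of 4^3 = 64 base triplets, so each binary pattern corresponds to a smaller group of candidate codons.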
Applications Of Binary Image Analysis Techniques
NASA Astrophysics Data System (ADS)
Tropf, H.; Enderle, E.; Kammerer, H. P.
1983-10-01
After discussing the conditions under which binary image analysis techniques can be used, three new applications of the fast binary image analysis system S.A.M. (Sensorsystem for Automation and Measurement) are reported: (1) The human view direction is measured at TV frame rate while the subject's head is freely movable. (2) Industrial parts hanging on a moving conveyor are classified prior to spray painting by robot. (3) In automotive wheel assembly, the eccentricity of the wheel is minimized by turning the tyre relative to the rim in order to balance the eccentricity of the components.
Optical Neural Classification Of Binary Patterns
NASA Astrophysics Data System (ADS)
Gustafson, Steven C.; Little, Gordon R.
1988-05-01
Binary pattern classification that may be implemented using optical hardware and neural network algorithms is considered. Pattern classification problems that have no concise description (as in classifying handwritten characters) or no concise computation (as in NP-complete problems) are expected to be particularly amenable to this approach. For example, optical processors that efficiently classify binary patterns in accordance with their Boolean function complexity might be designed. As a candidate for such a design, an optical neural network model is discussed that is designed for binary pattern classification and that consists of an optical resonator with a dynamic multiplex-recorded reflection hologram and a phase conjugate mirror with thresholding and gain. In this model, learning or training examples of binary patterns may be recorded on the hologram such that one bit in each pattern marks the pattern class. Any input pattern, including one with an unknown class or marker bit, will be modified by a large number of parallel interactions with the reflection hologram and nonlinear mirror. After perhaps several seconds and 100 billion interactions, a steady-state pattern may develop with a marker bit that represents a minimum-Boolean-complexity classification of the input pattern. Computer simulations are presented that illustrate progress in understanding the behavior of this model and in developing a processor design that could have commanding and enduring performance advantages compared to current pattern classification techniques.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patel, Sanjay V.; Jenkins, Mark W.; Hughes, Robert C.
1999-07-19
We demonstrate a "universal solvent sensor" constructed from a small array of carbon/polymer composite chemiresistors that respond to solvents spanning a wide range of Hildebrand solubility parameters. Conductive carbon particles provide electrical continuity in these composite films. When the polymer matrix absorbs solvent vapors, the composite film swells, the average separation between carbon particles increases, and an increase in film resistance results, as some of the conduction pathways are broken. The adverse effects of contact resistance at high solvent concentrations are reported. Solvent vapors including isooctane, ethanol, diisopropylmethylphosphonate (DIMP), and water are correctly identified ("classified") using three chemiresistors, their composite coatings chosen to span the full range of solubility parameters. With the same three sensors, binary mixtures of solvent vapor and water vapor are correctly classified; following classification, two sensors suffice to determine the concentrations of both vapor components. Polyethylene vinylacetate and polyvinyl alcohol (PVA) are two such polymers that are used to classify binary mixtures of DIMP with water vapor; the PVA/carbon-particle-composite films are sensitive to less than 0.25% relative humidity. The Sandia-developed VERI (Visual-Empirical Region of Influence) technique is used as a method of pattern recognition to classify the solvents and mixtures and to distinguish them from water vapor. In many cases, the response of a given composite sensing film to a binary mixture deviates significantly from the sum of the responses to the isolated vapor components at the same concentrations. While these nonlinearities pose significant difficulty for (primarily) linear methods such as principal components analysis, VERI handles both linear and nonlinear data with equal ease. In the present study the maximum speciation accuracy is achieved by an array containing three or four sensor elements, with the addition of more sensors resulting in a measurable accuracy decrease.
Protein domain definition should allow for conditional disorder
Yegambaram, Kavestri; Bulloch, Esther MM; Kingston, Richard L
2013-01-01
Abstract: Proteins are often classified in a binary fashion as either structured or disordered. However, this approach has several deficits. Firstly, protein folding is always conditional on the physicochemical environment. A protein which is structured in some circumstances will be disordered in others. Secondly, it hides a fundamental asymmetry in behavior. While all structured proteins can be unfolded through a change in environment, not all disordered proteins have the capacity for folding. Failure to accommodate these complexities confuses the definition of both protein structural domains and intrinsically disordered regions. We illustrate these points with an experimental study of a family of small binding domains, drawn from the RNA polymerase of mumps virus and its closest relatives. Assessed at face value the domains fall on a structural continuum, with folded, partially folded, and near unstructured members. Yet the disorder present in the family is conditional, and these closely related polypeptides can access the same folded state under appropriate conditions. Any heuristic definition of the protein domain emphasizing conformational stability divides this domain family in two, in a way that makes no biological sense. Structural domains would be better defined by their ability to adopt a specific tertiary structure: a structure that may or may not be realized, dependent on the circumstances. This explicitly allows for the conditional nature of protein folding, and more clearly demarcates structural domains from intrinsically disordered regions that may function without folding. PMID:23963781
NASA Astrophysics Data System (ADS)
Hwang, Han-Jeong; Choi, Han; Kim, Jeong-Youn; Chang, Won-Du; Kim, Do-Won; Kim, Kiwoong; Jo, Sungho; Im, Chang-Hwan
2016-09-01
In traditional brain-computer interface (BCI) studies, binary communication systems have generally been implemented using two mental tasks arbitrarily assigned to "yes" or "no" intentions (e.g., mental arithmetic calculation for "yes"). A recent pilot study performed with one paralyzed patient showed the possibility of a more intuitive paradigm for binary BCI communications, in which the patient's internal yes/no intentions were directly decoded from functional near-infrared spectroscopy (fNIRS). We investigated whether such an "fNIRS-based direct intention decoding" paradigm can be reliably used for practical BCI communications. Eight healthy subjects participated in this study, and each participant was administered 70 disjunctive questions. Brain hemodynamic responses were recorded using a multichannel fNIRS device, while the participants were internally expressing "yes" or "no" intentions to each question. Different feature types, feature numbers, and time window sizes were tested to investigate optimal conditions for classifying the internal binary intentions. About 75% of the answers were correctly classified when the individual best feature set was employed (75.89% ±1.39 and 74.08% ±2.87 for oxygenated and deoxygenated hemoglobin responses, respectively), which was significantly higher than a random chance level (68.57% for p<0.001). The kurtosis feature showed the highest mean classification accuracy among all feature types. The grand-averaged hemodynamic responses showed that wide brain regions are associated with the processing of binary implicit intentions. Our experimental results demonstrated that direct decoding of internal binary intention has the potential to be used for implementing more intuitive and user-friendly communication systems for patients with motor disabilities.
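The 68.57% chance level corresponds to 48 of 70 questions. One plausible reading, assumed here and not confirmed by the abstract, is that it comes from a one-sided binomial test against pure guessing (p = 0.5): 48 correct answers is the largest count that is not yet significant at p < 0.001. The sketch below checks that arithmetic; the function names are illustrative.

```python
from math import comb


def upper_tail(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5), i.e. a guessing classifier."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def smallest_significant_count(n: int, alpha: float) -> int:
    """Smallest number of correct answers whose one-sided p-value is below alpha."""
    for k in range(n + 1):
        if upper_tail(n, k) < alpha:
            return k
    raise ValueError("no count reaches the requested significance level")


n_questions = 70
k_min = smallest_significant_count(n_questions, alpha=0.001)
print(k_min)                                        # 49
print(round(100 * (k_min - 1) / n_questions, 2))    # 68.57
```

Under this reading, any individual accuracy strictly above 68.57% (49 or more correct answers out of 70) would be significantly better than guessing at the stated level.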
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hillwig, Todd C.; Schaub, S. C.; Bond, Howard E.
We explore the photometrically variable central stars of the planetary nebulae HaTr 4 and Hf 2-2. Both have been classified as close binary star systems previously based on their light curves alone. Here, we present additional arguments and data confirming the identification of both as close binaries with an irradiated cool companion to the hot central star. We include updated light curves, orbital periods, and preliminary binary modeling for both systems. We also identify for the first time the central star of HaTr 4 as an eclipsing binary. Neither system has been well studied in the past, but we utilize the small amount of existing data to limit possible binary parameters, including system inclination. These parameters are then compared to nebular parameters to further our knowledge of the relationship between binary central stars of planetary nebulae and nebular shaping and ejection.
Rebehmed, Joseph; Quintus, Flavien; Mornon, Jean-Paul; Callebaut, Isabelle
2016-05-01
Several studies have highlighted the leading role of the sequence periodicity of polar and nonpolar amino acids (binary patterns) in the formation of regular secondary structures (RSS). However, these were based on the analysis of only a few simple cases, with no direct means to correlate binary patterns with the limits of RSS. Here, HCA-derived hydrophobic clusters (HC), which are conditioned binary patterns whose positions fit well those of RSS, were considered. All the HC types, defined by unique binary patterns, which were commonly observed in three-dimensional (3D) structures of globular domains, were analyzed. The 180 HC types with preferences for either α-helices or β-strands distinctly contain basic binary units typical of these RSS. Therefore, a general trend supporting the "binary pattern preference" assumption was observed. HC for which observed RSS are in disagreement with their expected behavior (discordant HC) were also examined. They were separated into HC types with moderate preferences for RSS, having "weak" binary patterns and versatile RSS, and HC types with high preferences for RSS, having "strong" binary patterns and then displaying nonpolar amino acids at the protein surface. It was shown that in both cases, discordant HC could be distinguished from concordant ones by well-differentiated amino acid compositions. The obtained results could thus help to complement the currently available methods for the accurate prediction of secondary structures in proteins from the information of a single amino acid sequence alone. This can be especially useful for characterizing orphan sequences and for assisting protein engineering and design. © 2016 Wiley Periodicals, Inc.
Automatic classification of protein structures relying on similarities between alignments
2012-01-01
Background Identification of protein structural cores requires isolation of sets of proteins all sharing the same subset of structural motifs. In the context of an ever growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. Results When considering a pair of 3D structures, they are stated as similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, classifying proteins into structural families can be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may include in the same cluster a subset of 3D structures that do not share a common substructure. In order to overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose a modification algorithm that eliminates edges from the original graph of similarities and gives a reduced graph in which no ternary constraints are violated. Our approach is then first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply a standard graph clustering algorithm to the reduced graph. This method was used for classifying ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments.
Conclusions We show that filtering similarities prior to standard graph based clustering process by applying ternary similarity constraints i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph based clustering algorithms according to the reference classification SCOP. PMID:22974051
Hasan, Md Mehedi; Khatun, Mst Shamima; Mollah, Md Nurul Haque; Yong, Cao; Guo, Dianjing
2017-01-01
Lysine succinylation, an important type of protein posttranslational modification, plays significant roles in many cellular processes. Accurate identification of succinylation sites can facilitate our understanding about the molecular mechanism and potential roles of lysine succinylation. However, even in well-studied systems, a majority of the succinylation sites remain undetected because the traditional experimental approaches to succinylation site identification are often costly, time-consuming, and laborious. In silico approach, on the other hand, is potentially an alternative strategy to predict succinylation substrates. In this paper, a novel computational predictor SuccinSite2.0 was developed for predicting generic and species-specific protein succinylation sites. This predictor takes the composition of profile-based amino acid and orthogonal binary features, which were used to train a random forest classifier. We demonstrated that the proposed SuccinSite2.0 predictor outperformed other currently existing implementations on a complementarily independent dataset. Furthermore, the important features that make visible contributions to species-specific and cross-species-specific prediction of protein succinylation site were analyzed. The proposed predictor is anticipated to be a useful computational resource for lysine succinylation site prediction. The integrated species-specific online tool of SuccinSite2.0 is publicly accessible.
2011-01-01
Background The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest’s Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. Results We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task’s development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthew’s Correlation Coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. Conclusions Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. 
Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance. PMID:22151769
Automatic construction of a recurrent neural network based classifier for vehicle passage detection
NASA Astrophysics Data System (ADS)
Burnaev, Evgeny; Koptelov, Ivan; Novikov, German; Khanipov, Timur
2017-03-01
Recurrent Neural Networks (RNNs) are extensively used for time-series modeling and prediction. We propose an approach for automatic construction of a binary classifier based on Long Short-Term Memory RNNs (LSTM-RNNs) for detection of a vehicle passage through a checkpoint. As input to the classifier we use multidimensional signals from various sensors installed at the checkpoint. The results demonstrate that the previous approach of handcrafting a classifier, consisting of a set of deterministic rules, can be successfully replaced by automatic RNN training on appropriately labelled data.
Deng, Lei; Wu, Hongjie; Liu, Chuyao; Zhan, Weihua; Zhang, Jingpu
2018-06-01
Long non-coding RNAs (lncRNAs) are involved in many biological processes, such as immune response, development, differentiation and gene imprinting, and are associated with diseases and cancers. However, the functions of the vast majority of lncRNAs are still unknown. Predicting the biological functions of lncRNAs is one of the key challenges in the post-genomic era. In our work, we first build a global network including a lncRNA similarity network, a lncRNA-protein association network and a protein-protein interaction network according to the expressions and interactions, then extract the topological feature vectors of the global network. Using these features, we present an SVM-based machine learning approach, PLNRGO, to annotate human lncRNAs. In PLNRGO, we construct a training data set according to the proteins with GO annotations and train a binary classifier for each GO term. We assess the performance of PLNRGO on our manually annotated lncRNA benchmark and a protein-coding gene benchmark with known functional annotations. As a result, the performance of our method is significantly better than that of other state-of-the-art methods in terms of maximum F-measure and coverage. Copyright © 2018 Elsevier Ltd. All rights reserved.
An ensemble of SVM classifiers based on gene pairs.
Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin
2013-07-01
In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.
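The gene-pair selection step can be illustrated with the usual form of the top scoring pair criterion, where a pair (i, j) is scored by how differently the ordering "gene i < gene j" occurs in the two classes. This is a sketch of that commonly used definition on toy data; it covers only pair ranking for a two-class problem, and the SVM training on each 2-D projection and the GA-based selection of base classifiers are not reproduced. The function names and expression values are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def tsp_score(samples: List[List[float]], labels: List[int], i: int, j: int) -> float:
    """|P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for genes i and j."""
    freq = []
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        freq.append(sum(1 for r in rows if r[i] < r[j]) / len(rows))
    return abs(freq[0] - freq[1])


def top_pairs(samples: List[List[float]], labels: List[int], k: int) -> List[Tuple[int, int]]:
    """Return the k gene pairs with the highest TSP score."""
    n_genes = len(samples[0])
    scored = [(tsp_score(samples, labels, i, j), (i, j))
              for i, j in combinations(range(n_genes), 2)]
    scored.sort(key=lambda item: -item[0])  # stable sort keeps index order on ties
    return [pair for _, pair in scored[:k]]


# Toy expression matrix: 4 samples x 3 genes (hypothetical values).
samples = [[1.0, 2.0, 5.0], [1.5, 3.0, 4.0], [2.5, 1.0, 6.0], [3.0, 0.5, 7.0]]
labels = [0, 0, 1, 1]
print(top_pairs(samples, labels, 1))  # [(0, 1)]
```

Each selected pair defines a 2-D view of the expression data (the two genes' values), which is the space a base classifier in the ensemble would be trained on.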
NASA Astrophysics Data System (ADS)
Suzuki, Izumi; Mikami, Yoshiki; Ohsato, Ario
A technique that acquires documents in the same category as a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or entire Web). The system then adds the marked documents to the training set to learn the set, and this process is repeated until no more documents are marked. Imposing a monotone increasing property on the similarity as it learns enables the system to 1) detect the point at which no more documents remain to be marked and 2) decide the threshold value that the classifier uses. In addition, under the condition that the normalization process is limited to dividing term weights by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity using English and German documents randomly selected from the Web.
Wang, Lei; Pedersen, Peder C; Agu, Emmanuel; Strong, Diane M; Tulu, Bengisu
2017-09-01
The standard chronic wound assessment method based on visual examination is potentially inaccurate and also represents a significant clinical workload. Hence, computer-based systems providing quantitative wound assessment may be valuable for accurately monitoring wound healing status, with the wound area being best suited for automated analysis. Here, we present a novel approach, using support vector machines (SVM) to determine the wound boundaries on foot ulcer images captured with an image capture box, which provides controlled lighting and range. After superpixel segmentation, a cascaded two-stage classifier operates as follows: in the first stage, a set of k binary SVM classifiers is trained and applied to different subsets of the entire training image dataset, and incorrectly classified instances are collected. In the second stage, another binary SVM classifier is trained on the incorrectly classified set. We extracted various color and texture descriptors from superpixels that are used as input for each stage in the classifier training. Specifically, color and bag-of-words representations of local dense scale-invariant feature transform features are descriptors for ruling out irrelevant regions, and color and wavelet-based features are descriptors for distinguishing healthy tissue from wound regions. Finally, the detected wound boundary is refined by applying the conditional random field method. We have implemented the wound classification on a Nexus 5 smartphone platform, except for training, which was done offline. Results are compared with other classifiers and show that our approach provides high global performance rates (average sensitivity = 73.3%, specificity = 94.6%) and is sufficiently efficient for smartphone-based image analysis.
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering, were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G
2018-05-15
Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, predicting AD from MRI biomarkers at an early stage remains a challenging problem for clinicians and machine learning systems alike, and solving it could have a great impact on improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on an ADNI dataset that consists of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset, which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Lure, Y. M. Fleming; Grody, Norman C.; Chiou, Y. S. Peter; Yeh, H. Y. Michael
1993-01-01
A data fusion system with artificial neural networks (ANN) is used for fast and accurate classification of five earth surface conditions and surface changes, based on seven SSMI multichannel microwave satellite measurements. The measurements include brightness temperatures at 19, 22, 37, and 85 GHz at both H and V polarizations (only V at 22 GHz). The seven channel measurements are processed through a convolution computation such that all measurements are located on the same grid. Five surface classes, including non-scattering surface, precipitation over land, precipitation over ocean, snow, and desert, are identified from ground-truth observations. The system processes sensory data in three consecutive phases: (1) pre-processing to extract feature vectors and enhance separability among detected classes; (2) preliminary classification of Earth surface patterns using two separate classifiers acting in parallel: back-propagation neural network and binary decision tree classifiers; and (3) data fusion of results from preliminary classifiers to obtain the optimal performance in overall classification. Both the binary decision tree classifier and the fusion processing centers are implemented by neural network architectures. The fusion system configuration is a hierarchical neural network architecture, in which each functional neural net handles a different processing phase in a pipelined fashion. There is a total of around 13,500 samples for this analysis, of which 4 percent are used as the training set and 96 percent as the testing set. After training, this classification system raises the detection accuracy to 94 percent, compared with 88 percent for back-propagation artificial neural networks and 80 percent for binary decision tree classifiers.
The neural network data fusion classification is currently being integrated into an image processing system at NOAA and implemented in a prototype of a massively parallel and dynamically reconfigurable Modular Neural Ring (MNR).
The binary progenitors of short and long GRBs and their gravitational-wave emission
NASA Astrophysics Data System (ADS)
Rueda, J. A.; Ruffini, R.; Rodriguez, J. F.; Muccino, M.; Aimuratov, Y.; Barres de Almeida, U.; Becerra, L.; Bianco, C. L.; Cherubini, C.; Filippi, S.; Kovacevic, M.; Moradi, R.; Pisani, G. B.; Wang, Y.
2018-01-01
We have sub-classified short and long-duration gamma-ray bursts (GRBs) into seven families according to the binary nature of their progenitors. Short GRBs are produced in mergers of neutron-star binaries (NS-NS) or neutron star-black hole binaries (NS-BH). Long GRBs are produced via the induced gravitational collapse (IGC) scenario occurring in a tight binary system composed of a carbon-oxygen core (COcore) and a NS companion. The COcore explodes as a type Ic supernova (SN), leading to a hypercritical accretion process onto the NS: if the accretion is sufficiently high, the NS reaches the critical mass and collapses to form a BH; otherwise a massive NS is formed. Therefore long GRBs can lead either to NS-BH or to NS-NS binaries, depending on the extent of the accretion. For the above compact-object binaries we discuss: 1) the role of the NS structure and the nuclear equation of state; 2) the occurrence rates obtained from X-ray and gamma-ray observations; 3) the predicted annual number of detections of their gravitational-wave emission by the Advanced LIGO interferometer.
A Proposed Methodology to Classify Frontier Capital Markets
2011-07-31
but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project involves basic...machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ), ensemble...Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F. Horizontal
A Proposed Methodology to Classify Frontier Capital Markets
2011-07-31
out of charity, but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project...identification, and machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ...Support Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F
Cellular-automata-based learning network for pattern recognition
NASA Astrophysics Data System (ADS)
Tzionas, Panagiotis G.; Tsalides, Phillippos G.; Thanailakis, Adonios
1991-11-01
Most classification techniques either adopt an approach based directly on the statistical characteristics of the pattern classes involved, or they transform the patterns into a feature space and try to separate the point clusters in this space. An alternative approach based on memory networks has been presented, its novelty being that it can be implemented in parallel and that it utilizes direct features of the patterns rather than statistical characteristics. This study presents a new approach for pattern classification using pseudo 2-D binary cellular automata (CA). This approach resembles the memory network classifier in the sense that it is based on an adaptive knowledge base formed during a training phase, and also in the fact that both methods utilize pattern features that are directly available. The main advantage of this approach is that the sensitivity of the pattern classifier can be controlled. The proposed pattern classifier has been designed using 1.5-micrometer design rules for an N-well CMOS process. Layout has been achieved using SOLO 1400. Binary pseudo 2-D hybrid additive CA (HACA) is described in the second section of this paper. The third section describes the operation of the pattern classifier and the fourth section presents some possible applications. The VLSI implementation of the pattern classifier is presented in the fifth section and, finally, the sixth section draws conclusions from the results obtained.
Phan, Thanh G; Chen, Jian; Beare, Richard; Ma, Henry; Clissold, Benjamin; Van Ly, John; Srikanth, Velandai
2017-01-01
Prognostication following intracerebral hemorrhage (ICH) has focused on poor outcome at the expense of lumping together mild and moderate disability. We aimed to develop a novel approach to classifying a range of disability following ICH. The Virtual International Stroke Trial Archive collaboration database was searched for patients with ICH and known volume of ICH on baseline CT scans. Disability was partitioned into mild [modified Rankin Scale (mRS) at 90 days of 0-2], moderate (mRS = 3-4), and severe disabilities (mRS = 5-6). We used binary and trichotomy decision tree methodology. The data were randomly divided into training (2/3 of data) and validation (1/3 of data) datasets. The area under the receiver operating characteristic curve (AUC) was used to calculate the accuracy of the decision tree model. We identified 957 patients, age 65.9 ± 12.3 years, 63.7% males, and ICH volume 22.6 ± 22.1 ml. The binary tree showed that lower ICH volume (<13.7 ml), age (<66.5 years), serum glucose (<8.95 mmol/l), and systolic blood pressure (<170 mm Hg) discriminate between mild versus moderate-to-severe disabilities with AUC of 0.79 (95% CI 0.73-0.85). Large ICH volume (>27.9 ml), older age (>69.5 years), and low Glasgow Coma Scale (<15) classify severe disability with AUC of 0.80 (95% CI 0.75-0.86). The trichotomy tree showed that ICH volume, age, and serum glucose can separate mild, moderate, and severe disability groups with AUC 0.79 (95% CI 0.71-0.87). Both the binary and trichotomy methods provide equivalent discrimination of disability outcome after ICH. The trichotomy method can classify three categories at once, which is not possible with the binary method. The trichotomy method may be of use to clinicians and trialists for classifying a range of disability in ICH.
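Since the abstract above reports its results as AUC values, a small sketch of how an AUC can be computed may help. It uses the rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The scores and labels below are made up for illustration and are unrelated to the study data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores, higher means "more likely positive"
    labels: 1 for positive cases, 0 for negative cases
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one positive case is ranked below one negative case.
example = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

Here three of the four positive-negative pairs are ordered correctly, so `example` is 0.75. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.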
Bhaduri, Aritra; Banerjee, Amitava; Roy, Subhrajit; Kar, Sougata; Basu, Arindam
2018-03-01
We present a neuromorphic current mode implementation of a spiking neural classifier with lumped square law dendritic nonlinearity. It has been shown previously in software simulations that such a system with binary synapses can be trained with structural plasticity algorithms to achieve comparable classification accuracy with fewer synaptic resources than conventional algorithms. We show that even in real analog systems with manufacturing imperfections (CV of 23.5% and 14.4% for dendritic branch gains and leaks respectively), this network is able to produce comparable results with fewer synaptic resources. The chip fabricated in [Formula: see text]m complementary metal oxide semiconductor has eight dendrites per cell and uses two opposing cells per class to cancel common-mode inputs. The chip can operate down to a [Formula: see text] V and dissipates 19 nW of static power per neuronal cell and [Formula: see text] 125 pJ/spike. For two-class classification problems of high-dimensional rate encoded binary patterns, the hardware achieves performance comparable to a software implementation of the same system, with only about a 0.5% reduction in accuracy. On two UCI data sets, the integrated circuit (IC) has classification accuracy comparable to standard machine learners like support vector machines and extreme learning machines while using two to five times fewer binary synapses. We also show that the system can operate on mean rate encoded spike patterns, as well as short bursts of spikes. To the best of our knowledge, this is the first attempt in hardware to perform classification exploiting dendritic properties and binary synapses.
Wang, Ying; Coiera, Enrico; Runciman, William; Magrabi, Farah
2017-06-12
Approximately 10% of admissions to acute-care hospitals are associated with an adverse event. Analysis of incident reports helps to understand how and why incidents occur and can inform policy and practice for safer care. Unfortunately, our capacity to monitor and respond to incident reports in a timely manner is limited by the sheer volume of data collected. In this study, we aim to evaluate the feasibility of using multiclass classification to automate the identification of patient safety incidents in hospitals. Text-based classifiers were applied to identify 10 incident types and 4 severity levels. Using the one-versus-one (OvsO) and one-versus-all (OvsA) ensemble strategies, we evaluated regularized logistic regression, linear support vector machine (SVM) and SVM with a radial-basis function (RBF) kernel. Classifiers were trained and tested with "balanced" datasets (n_Type = 2860, n_SeverityLevel = 1160) from a state-wide incident reporting system. Testing was also undertaken with imbalanced "stratified" datasets (n_Type = 6000, n_SeverityLevel = 5950) from the state-wide system and an independent hospital reporting system. Classifier performance was evaluated using a confusion matrix, as well as F-score, precision and recall. The most effective combination was an OvsO ensemble of binary SVM RBF classifiers with binary count feature extraction. For incident type, classifiers performed well on balanced and stratified datasets (F-score: 78.3%, 73.9%), but were worse on independent datasets (68.5%). Reports about falls, medications, pressure injury, aggression and blood products were identified with high recall and precision. "Documentation" was the hardest type to identify. For severity level, the F-score on balanced data was 87.3% for severity assessment code (SAC) 1 (extreme risk) and 64% for SAC4 (low risk). With stratified data, high recall was achieved for SAC1 (82.8-84%) but precision was poor (6.8-11.2%).
High risk incidents (SAC2) were confused with medium risk incidents (SAC3). Binary classifier ensembles appear to be a feasible method for identifying incidents by type and severity level. Automated identification should enable safety problems to be detected and addressed in a more timely manner. Multi-label classifiers may be necessary for reports that relate to more than one incident type.
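The one-versus-one strategy described above reduces a multiclass problem to one binary classifier per pair of classes, combined by majority vote. The sketch below shows only that reduction. As a loudly labeled assumption, it swaps the SVM RBF base classifier for a toy one-dimensional nearest-centroid learner, and the class names and feature values are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def train_centroid(xs_a, xs_b):
    """Toy binary learner (stand-in for an SVM): True if x is nearer class a."""
    centre_a = sum(xs_a) / len(xs_a)
    centre_b = sum(xs_b) / len(xs_b)
    return lambda x: abs(x - centre_a) <= abs(x - centre_b)


def train_ovo(data):
    """Train one binary model per unordered class pair.

    data: {class_name: [feature_value, ...]}
    """
    return {
        (a, b): train_centroid(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


def predict_ovo(models, x):
    """Each pairwise model casts one vote; the most-voted class wins."""
    votes = Counter()
    for (a, b), is_a in models.items():
        votes[a if is_a(x) else b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical 1-D features for three incident types.
data = {
    "falls": [1.0, 1.2],
    "medication": [5.0, 5.5],
    "documentation": [9.0, 9.4],
}
models = train_ovo(data)
prediction = predict_ovo(models, 1.1)
```

With K classes this trains K(K-1)/2 models (three here, 45 for the 10 incident types), whereas a one-versus-all ensemble would train K models, each separating one class from all the others.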
Optimized p53 immunohistochemistry is an accurate predictor of TP53 mutation in ovarian carcinoma.
Köbel, Martin; Piskorz, Anna M; Lee, Sandra; Lui, Shuhong; LePage, Cecile; Marass, Francesco; Rosenfeld, Nitzan; Mes Masson, Anne-Marie; Brenton, James D
2016-10-01
TP53 mutations are ubiquitous in high-grade serous ovarian carcinomas (HGSOC), and the presence of TP53 mutation discriminates between high and low-grade serous carcinomas and is now an important biomarker for clinical trials targeting mutant p53. p53 immunohistochemistry (IHC) is widely used as a surrogate for TP53 mutation but its accuracy has not been established. The objective of this study was to test whether improved methods for p53 IHC could reliably predict TP53 mutations independently identified by next generation sequencing (NGS). Four clinical p53 IHC assays and tagged-amplicon NGS for TP53 were performed on 171 HGSOC and 80 endometrioid carcinomas (EC). p53 expression was scored as overexpression (OE), complete absence (CA), cytoplasmic (CY) or wild type (WT). p53 IHC was evaluated as a binary classifier where any abnormal staining predicted deleterious TP53 mutation and as a ternary classifier where OE, CA or WT staining predicted gain-of-function (GOF or nonsynonymous), loss-of-function (LOF including stopgain, indel, splicing) or no detectable TP53 mutations (NDM), respectively. Deleterious TP53 mutations were detected in 169/171 (99%) HGSOC and 7/80 (8.8%) EC. The overall accuracy for the best performing IHC assay for binary and ternary prediction was 0.94 and 0.91 respectively, which improved to 0.97 (sensitivity 0.96, specificity 1.00) and 0.95 after secondary analysis of discordant cases. The sensitivity for predicting LOF mutations was lower at 0.76 because p53 IHC detected mutant p53 protein in 13 HGSOC with LOF mutations. CY staining associated with LOF was seen in 4 (2.3%) of HGSOC. Optimized p53 IHC can approach 100% specificity for the presence of TP53 mutation and its high negative predictive value is clinically useful as it can exclude the possibility of a low-grade serous tumour. 4.1% of HGSOC cases have detectable WT staining while harboring a TP53 LOF mutation, which limits sensitivity for binary prediction of mutation to 96%.
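The accuracy, sensitivity, specificity and negative predictive value quoted above all derive from the four cells of a binary confusion matrix. A minimal sketch of those formulas follows. The counts in the usage line are invented to give round numbers and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classifier metrics from confusion-matrix counts.

    tp: true positives   fp: false positives
    tn: true negatives   fn: false negatives
    """
    return {
        "sensitivity": tp / (tp + fn),        # recall on positives
        "specificity": tn / (tn + fp),        # recall on negatives
        "ppv": tp / (tp + fp),                # positive predictive value
        "npv": tn / (tn + fn),                # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 100 mutated cases, 50 wild-type cases.
m = binary_metrics(tp=96, fp=0, tn=50, fn=4)
```

With no false positives the specificity and positive predictive value are both 1.0, while the four missed positives cap the sensitivity at 0.96 and pull the negative predictive value below 1.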
Optimized p53 immunohistochemistry is an accurate predictor of TP53 mutation in ovarian carcinoma
Köbel, Martin; Piskorz, Anna M; Lee, Sandra; Lui, Shuhong; LePage, Cecile; Marass, Francesco; Rosenfeld, Nitzan; Mes Masson, Anne‐Marie
2016-01-01
Abstract TP53 mutations are ubiquitous in high‐grade serous ovarian carcinomas (HGSOC), and the presence of TP53 mutation discriminates between high and low‐grade serous carcinomas and is now an important biomarker for clinical trials targeting mutant p53. p53 immunohistochemistry (IHC) is widely used as a surrogate for TP53 mutation but its accuracy has not been established. The objective of this study was to test whether improved methods for p53 IHC could reliably predict TP53 mutations independently identified by next generation sequencing (NGS). Four clinical p53 IHC assays and tagged‐amplicon NGS for TP53 were performed on 171 HGSOC and 80 endometrioid carcinomas (EC). p53 expression was scored as overexpression (OE), complete absence (CA), cytoplasmic (CY) or wild type (WT). p53 IHC was evaluated as a binary classifier where any abnormal staining predicted deleterious TP53 mutation and as a ternary classifier where OE, CA or WT staining predicted gain‐of‐function (GOF or nonsynonymous), loss‐of‐function (LOF including stopgain, indel, splicing) or no detectable TP53 mutations (NDM), respectively. Deleterious TP53 mutations were detected in 169/171 (99%) HGSOC and 7/80 (8.8%) EC. The overall accuracy for the best performing IHC assay for binary and ternary prediction was 0.94 and 0.91 respectively, which improved to 0.97 (sensitivity 0.96, specificity 1.00) and 0.95 after secondary analysis of discordant cases. The sensitivity for predicting LOF mutations was lower at 0.76 because p53 IHC detected mutant p53 protein in 13 HGSOC with LOF mutations. CY staining associated with LOF was seen in 4 (2.3%) of HGSOC. Optimized p53 IHC can approach 100% specificity for the presence of TP53 mutation and its high negative predictive value is clinically useful as it can exclude the possibility of a low‐grade serous tumour. 4.1% of HGSOC cases have detectable WT staining while harboring a TP53 LOF mutation, which limits sensitivity for binary prediction of mutation to 96%. 
PMID:27840695
Clostridium and bacillus binary enterotoxins: bad for the bowels, and eukaryotic being.
Stiles, Bradley G; Pradhan, Kisha; Fleming, Jodie M; Samy, Ramar Perumal; Barth, Holger; Popoff, Michel R
2014-09-05
Some pathogenic spore-forming bacilli employ a binary protein mechanism for intoxicating the intestinal tracts of insects, animals, and humans. These Gram-positive bacteria and their toxins include Clostridium botulinum (C2 toxin), Clostridium difficile (C. difficile toxin or CDT), Clostridium perfringens (ι-toxin and binary enterotoxin, or BEC), Clostridium spiroforme (C. spiroforme toxin or CST), as well as Bacillus cereus (vegetative insecticidal protein or VIP). These gut-acting proteins form an AB complex composed of ADP-ribosyl transferase (A) and cell-binding (B) components that intoxicate cells via receptor-mediated endocytosis and endosomal trafficking. Once inside the cytosol, the A components inhibit normal cell functions by mono-ADP-ribosylation of globular actin, which induces cytoskeletal disarray and death. Important aspects of each bacterium and binary enterotoxin will be highlighted in this review, with particular focus upon the disease process involving the biochemistry and modes of action for each toxin.
Clostridium and Bacillus Binary Enterotoxins: Bad for the Bowels, and Eukaryotic Being
Stiles, Bradley G.; Pradhan, Kisha; Fleming, Jodie M.; Samy, Ramar Perumal; Barth, Holger; Popoff, Michel R.
2014-01-01
Some pathogenic spore-forming bacilli employ a binary protein mechanism for intoxicating the intestinal tracts of insects, animals, and humans. These Gram-positive bacteria and their toxins include Clostridium botulinum (C2 toxin), Clostridium difficile (C. difficile toxin or CDT), Clostridium perfringens (ι-toxin and binary enterotoxin, or BEC), Clostridium spiroforme (C. spiroforme toxin or CST), as well as Bacillus cereus (vegetative insecticidal protein or VIP). These gut-acting proteins form an AB complex composed of ADP-ribosyl transferase (A) and cell-binding (B) components that intoxicate cells via receptor-mediated endocytosis and endosomal trafficking. Once inside the cytosol, the A components inhibit normal cell functions by mono-ADP-ribosylation of globular actin, which induces cytoskeletal disarray and death. Important aspects of each bacterium and binary enterotoxin will be highlighted in this review, with particular focus upon the disease process involving the biochemistry and modes of action for each toxin. PMID:25198129
Gravitational Wave Detection of Compact Binaries Through Multivariate Analysis
NASA Astrophysics Data System (ADS)
Atallah, Dany Victor; Dorrington, Iain; Sutton, Patrick
2017-01-01
The first detection of gravitational waves (GW), GW150914, as produced by a binary black hole merger, has ushered in the era of GW astronomy. The detection technique used to find GW150914 considered only a fraction of the information available describing the candidate event: mainly the detector signal to noise ratios and chi-squared values. In hopes of greatly increasing detection rates, we want to take advantage of all the information available about candidate events. We employ a technique called Multivariate Analysis (MVA) to improve LIGO sensitivity to GW signals. MVA techniques are efficient ways to scan high dimensional data spaces for signal/noise classification. Our goal is to use MVA to classify compact-object binary coalescence (CBC) events composed of any combination of black holes and neutron stars. CBC waveforms are modeled through numerical relativity. Templates of the modeled waveforms are used to search for CBCs and quantify candidate events. Different MVA pipelines are under investigation to look for CBC signals and un-modelled signals, with promising results. One such MVA pipeline used for the un-modelled search can theoretically analyze far more data than the MVA pipelines currently explored for CBCs, potentially making a more powerful classifier. In principle, this extra information could improve the sensitivity to GW signals. We will present the results from our efforts to adapt an MVA pipeline used in the un-modelled search to classify candidate events from the CBC search.
Saito, Takaya; Rehmsmeier, Marc
2015-01-01
Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
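The argument above can be made concrete with a small worked example. The sketch below takes a fixed classifier operating point (the same true positive rate and false positive rate) and evaluates it on a balanced dataset and on one with 100 times more negatives. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate, precision)."""
    tpr = tp / (tp + fn)          # sensitivity / recall, y-axis of ROC
    fpr = fp / (fp + tn)          # 1 - specificity, x-axis of ROC
    precision = tp / (tp + fp)    # PPV, y-axis of a precision-recall plot
    return tpr, fpr, precision


def scale_negatives(tp, fn, fp, tn, factor):
    """Same classifier behaviour, but `factor` times as many negatives."""
    return tp, fn, fp * factor, tn * factor


# Balanced toy dataset: 100 positives, 100 negatives.
balanced = (80, 20, 10, 90)
# Imbalanced toy dataset: 100 positives, 10,000 negatives.
imbalanced = scale_negatives(*balanced, factor=100)

tpr_b, fpr_b, prec_b = rates(*balanced)
tpr_i, fpr_i, prec_i = rates(*imbalanced)
```

The ROC point is identical in both cases (TPR 0.8, FPR 0.1), yet precision falls from about 0.89 to about 0.07, because 1,000 false positives now swamp the 80 true positives. This is the sense in which a ROC plot can look reassuring on imbalanced data while a precision-recall plot reveals how unreliable the positive predictions are.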
Das, Dipak Kumar; Patra, Animesh; Mitra, Rajib Kumar
2016-09-01
We report the changes in the hydration dynamics around a model protein hen egg white lysozyme (HEWL) in water-dimethyl sulfoxide (DMSO) binary mixture using THz time domain spectroscopy (TTDS) technique. DMSO molecules get preferentially solvated at the protein surface, as indicated by circular dichroism (CD) and Fourier transform infrared (FTIR) study in the mid-infrared region, resulting in a conformational change in the protein, which consequently modifies the associated hydration dynamics. As a control we also study the collective hydration dynamics of water-DMSO binary mixture and it is found that it follows a non-ideal behavior owing to the formation of DMSO-water clusters. It is observed that the cooperative dynamics of water at the protein surface does follow the DMSO-mediated conformational modulation of the protein. Copyright © 2016 Elsevier B.V. All rights reserved.
Origin of the computational hardness for learning with binary synapses.
Huang, Haiping; Kabashima, Yoshiyuki
2014-11-01
Through supervised learning in a binary perceptron one is able to classify an extensive number of random patterns by a proper assignment of binary synaptic weights. However, to find such assignments in practice is quite a nontrivial task. The relation between the weight space structure and the algorithmic hardness has not yet been fully understood. To this end, we analytically derive the Franz-Parisi potential for the binary perceptron problem by starting from an equilibrium solution of weights and exploring the weight space structure around it. Our result reveals the geometrical organization of the weight space; the weight space is composed of isolated solutions, rather than clusters of exponentially many close-by solutions. The pointlike clusters far apart from each other in the weight space explain the previously observed glassy behavior of stochastic local search heuristics.
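To make the setting concrete, the sketch below enumerates all binary weight vectors of a tiny perceptron and keeps those that classify every pattern correctly, then measures the Hamming distances between solutions. Brute-force enumeration is used purely for illustration (it scales as 2^N) and is not the analytical method of the paper. The patterns and labels are made up, and N is kept odd so that the weighted sum is never zero.

```python
from itertools import combinations, product


def is_solution(w, patterns, labels):
    """True if sign(w . xi) matches the label for every pattern."""
    for xi, y in zip(patterns, labels):
        field = sum(wi * x for wi, x in zip(w, xi))
        if field * y <= 0:
            return False
    return True


def all_solutions(patterns, labels):
    """Enumerate every weight vector in {-1, +1}^N that fits the data."""
    n = len(patterns[0])
    return [w for w in product((1, -1), repeat=n)
            if is_solution(w, patterns, labels)]


def hamming(u, v):
    """Number of synapses on which two weight vectors differ."""
    return sum(a != b for a, b in zip(u, v))


# Hypothetical toy problem: N = 3 binary synapses, 2 patterns.
patterns = [(1, 1, 1), (1, -1, 1)]
labels = [1, 1]

solutions = all_solutions(patterns, labels)
distances = [hamming(u, v) for u, v in combinations(solutions, 2)]
```

In a problem this small, solutions are easy to find and lie close together. The abstract's claim concerns the large-N regime with an extensive number of random patterns, where typical solutions are reported to be isolated and far apart in Hamming distance.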
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was used, for the first time, to select significant sequence parameters for the identification of β-turns through a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2, in that order. These significant parameters have the greatest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. In a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with the results of other β-turn prediction methods. In conclusion, this study shows that the parameter selection ability of binary logistic regression, together with the prediction capability of neural networks, leads to the development of more precise models for identifying β-turns in proteins. PMID:27418910
A multiple maximum scatter difference discriminant criterion for facial feature extraction.
Song, Fengxi; Zhang, David; Mei, Dayong; Guo, Zhongwei
2007-12-01
Maximum scatter difference (MSD) discriminant criterion was a recently presented binary discriminant criterion for pattern classification that utilizes the generalized scatter difference rather than the generalized Rayleigh quotient as a class separability measure, thereby avoiding the singularity problem when addressing small-sample-size problems. MSD classifiers based on this criterion have been quite effective on face-recognition tasks, but as they are binary classifiers, they are not as efficient on large-scale classification tasks. To address the problem, this paper generalizes the classification-oriented binary criterion to its multiple counterpart--multiple MSD (MMSD) discriminant criterion for facial feature extraction. The MMSD feature-extraction method, which is based on this novel discriminant criterion, is a new subspace-based feature-extraction method. Unlike most other subspace-based feature-extraction methods, the MMSD computes its discriminant vectors from both the range of the between-class scatter matrix and the null space of the within-class scatter matrix. The MMSD is theoretically elegant and easy to calculate. Extensive experimental studies conducted on the benchmark database, FERET, show that the MMSD out-performs state-of-the-art facial feature-extraction methods such as null space method, direct linear discriminant analysis (LDA), eigenface, Fisherface, and complete LDA.
An Enterotoxin-Like Binary Protein from Pseudomonas protegens with Potent Nematicidal Activity.
Wei, Jun-Zhi; Siehl, Daniel L; Hou, Zhenglin; Rosen, Barbara; Oral, Jarred; Taylor, Christopher G; Wu, Gusui
2017-10-01
Soil microbes are a major food source for free-living soil nematodes. It is known that certain soil bacteria have evolved systems to combat predation. We identified the nematode-antagonistic Pseudomonas protegens strain 15G2 from screening of microbes. Through protein purification we identified a binary protein, designated Pp-ANP, which is responsible for the nematicidal activity. This binary protein inhibits Caenorhabditis elegans growth and development by arresting larvae at the L1 stage and killing older-staged worms. The two subunits, Pp-ANP1a and Pp-ANP2a, are active when reconstituted from separate expression in Escherichia coli. The binary toxin also shows strong nematicidal activity against three other free-living nematodes (Pristionchus pacificus, Panagrellus redivivus, and Acrobeloides sp.), but we did not find any activity against insects and fungi under test conditions, indicating specificity for nematodes. Pp-ANP1a has no significant identity to any known proteins, while Pp-ANP2a shows ∼30% identity to E. coli heat-labile enterotoxin (LT) subunit A and cholera toxin (CT) subunit A. Protein modeling indicates that Pp-ANP2a is structurally similar to CT/LT and likely acts as an ADP-ribosyltransferase. Despite the similarity, Pp-ANP shows several characteristics distinct from CT/LT toxins. Our results indicate that Pp-ANP is a new enterotoxin-like binary toxin with potent and specific activity to nematodes. The potency and specificity of Pp-ANP suggest applications in controlling parasitic nematodes and open an avenue for further research on its mechanism of action and role in bacterium-nematode interaction. IMPORTANCE This study reports the discovery of a new enterotoxin-like binary protein, Pp-ANP, from a Pseudomonas protegens strain. Pp-ANP shows strong nematicidal activity against Caenorhabditis elegans larvae and older-staged worms. It also shows strong activity on other free-living nematodes (Pristionchus pacificus, Panagrellus redivivus, and Acrobeloides sp.). The two subunits, Pp-ANP1a and Pp-ANP2a, can be expressed separately and reconstituted to form the active complex. Pp-ANP shows some distinct characteristics compared with other toxins, including Escherichia coli enterotoxin and cholera toxin. The present study indicates that Pp-ANP is a novel binary toxin and that it has potential applications in controlling parasitic nematodes and in studying toxin-host interaction. Copyright © 2017 Wei et al.
Building Multiclass Classifiers for Remote Homology Detection and Fold Recognition
2006-04-05
classes. In this study we evaluate the effectiveness of one of these formulations that was developed by Crammer and Singer [9], which leads to...significantly more complex model can be learned by directly applying the Crammer-Singer multiclass formulation on the outputs of the binary classifiers...will refer to this as the Crammer-Singer (CS) model. Comparing the scaling approach to the Crammer-Singer approach we can see that the Crammer-Singer
Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.
Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
2015-01-01
Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Second, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level, cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that the multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
Wang, Yuchun; Du, Xuezhong
2006-07-04
The miscibility and stability of the binary monolayers of zwitterionic dipalmitoylphosphatidylcholine (DPPC) and cationic dioctadecyldimethylammonium bromide (DOMA) at the air-water interface and the interaction of ferritin with the immobilized monolayers have been studied in detail using surface pressure-area isotherms and surface plasmon resonance technique, respectively. The surface pressure-area isotherms indicated that the binary monolayers of DPPC and DOMA at the air-water interface were miscible and more stable than the monolayers of the two individual components. The surface plasmon resonance studies indicated that ferritin binding to the immobilized monolayers was primarily driven by the electrostatic interaction and that the amount of adsorbed protein at saturation was closely related not only to the number of positive charges in the monolayers but also to the pattern of positive charges at a given mole fraction of DOMA. The protein adsorption kinetics was determined by the properties of the monolayers (i.e., the protein-monolayer interaction) and the structure of preadsorbed protein molecules (i.e., the protein-protein interaction).
Mnguni, Malitsatsi J; Michael, Joseph P; Lemmerer, Andreas
2018-06-01
An analysis and classification of the 2925 neutral binary organic cocrystals in the Cambridge Structural Database is reported, focusing specifically on those both showing polymorphism and containing an active pharmaceutical ingredient (API). The search was confined to molecules having only C, H, N, O, S and halogen atoms. It was found that 400 out of 2925 cocrystals can be classified as pharmaceutical cocrystals, containing at least one API, and that of those, 56 can be classified as being polymorphic cocrystals. In general, the total number of polymorphic cocrystal systems of any type stands at 125. In addition, a new polymorph of the pharmaceutical cocrystal theophylline-3,4-dihydroxybenzoic acid (1/1), C7H8N4O2·C7H6O4, is reported.
Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tamagnini, Paolo; Krause, Josua W.; Dasgupta, Aritra
2017-05-14
To realize the full potential of machine learning in diverse real-world domains, it is necessary for model predictions to be readily interpretable and actionable for the human in the loop. Analysts, who are the users but not the developers of machine learning models, often do not trust a model because of the lack of transparency in associating predictions with the underlying data space. To address this problem, we propose Rivelo, a visual analytic interface that enables analysts to understand the causes behind predictions of binary classifiers by interactively exploring a set of instance-level explanations. These explanations are model-agnostic, treating a model as a black box, and they help analysts in interactively probing the high-dimensional binary data space for detecting features relevant to predictions. We demonstrate the utility of the interface with a case study analyzing a random forest model on the sentiment of Yelp reviews about doctors.
Moghaddasi, Hanie; Nourian, Saeed
2016-06-01
Heart disease is the major cause of death as well as a leading cause of disability in the developed countries. Mitral Regurgitation (MR) is a common heart disease which does not cause symptoms until its end stage. Therefore, early diagnosis of the disease is of crucial importance in the treatment process. Echocardiography is a common method of diagnosis in the severity of MR. Hence, a method which is based on echocardiography videos, image processing techniques and artificial intelligence could be helpful for clinicians, especially in borderline cases. In this paper, we introduce novel features to detect micro-patterns of echocardiography images in order to determine the severity of MR. Extensive Local Binary Pattern (ELBP) and Extensive Volume Local Binary Pattern (EVLBP) are presented as image descriptors which include details from different viewpoints of the heart in feature vectors. Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and Template Matching techniques are used as classifiers to determine the severity of MR based on textural descriptors. The SVM classifier with Extensive Uniform Local Binary Pattern (ELBPU) and Extensive Volume Local Binary Pattern (EVLBP) have the best accuracy with 99.52%, 99.38%, 99.31% and 99.59%, respectively, for the detection of Normal, Mild MR, Moderate MR and Severe MR subjects among echocardiography videos. The proposed method achieves 99.38% sensitivity and 99.63% specificity for the detection of the severity of MR and normal subjects. Copyright © 2016 Elsevier Ltd. All rights reserved.
Inferring Binary and Trinary Stellar Populations in Photometric and Astrometric Surveys
NASA Astrophysics Data System (ADS)
Widmark, Axel; Leistedt, Boris; Hogg, David W.
2018-04-01
Multiple stellar systems are ubiquitous in the Milky Way but are often unresolved and seen as single objects in spectroscopic, photometric, and astrometric surveys. However, modeling them is essential for developing a full understanding of large surveys such as Gaia and connecting them to stellar and Galactic models. In this paper, we address this problem by jointly fitting the Gaia and Two Micron All Sky Survey photometric and astrometric data using a data-driven Bayesian hierarchical model that includes populations of binary and trinary systems. This allows us to classify observations into singles, binaries, and trinaries, in a robust and efficient manner, without resorting to external models. We are able to identify multiple systems and, in some cases, make strong predictions for the properties of their unresolved stars. We will be able to compare such predictions with Gaia Data Release 4, which will contain astrometric identification and analysis of binary systems.
Du, Xuezhong; Wang, Yuchun
2007-03-08
Infrared reflection absorption spectroscopy (IRRAS) and surface plasmon resonance (SPR) techniques have been employed to investigate human serum albumin (HSA) binding to binary monolayers of zwitterionic dipalmitoylphosphatidylcholine (DPPC) and cationic dioctadecyldimethylammonium bromide (DOMA). At the air-water interface, the favorable electrostatic interaction between DPPC and DOMA leads to a dense chain packing. The tilt angle of the hydrocarbon chains decreases with increasing mole fraction of DOMA (X(DOMA)) in the monolayers at the surface pressure 30 mN/m: DPPC (~30°), X(DOMA) = 0.1 (~15°), and X(DOMA) = 0.3 (~0°). Negligible protein binding to the DPPC monolayer is observed in contrast to a significant binding to the binary monolayers. After HSA binding, the hydrocarbon chains at X(DOMA) = 0.1 undergo an increase in tilt angle from 15° to 25-30°, and the chains at X(DOMA) = 0.3 remain almost unchanged. The two components in the monolayers deliver through lateral reorganization, induced by the protein in the subphase, to form multiple interaction sites favorable for protein binding. The surfaces with a high protein affinity are created through the directed assembly of binary monolayers for use in biosensing.
Zheng, Mengge; Chao, Chen; Yu, Jinglin; Copeland, Les; Wang, Shuo; Wang, Shujun
2018-02-28
The effects of chain length and degree of unsaturation of fatty acids (FAs) on structure and in vitro digestibility of starch-protein-FA complexes were investigated in model systems. Studies with the rapid visco analyzer (RVA) showed that the formation of ternary complex resulted in higher viscosities than those of binary complex during the cooling and holding stages. The results of differential scanning calorimetry (DSC), Raman, and X-ray diffraction (XRD) showed that the structural differences for ternary complexes were much less than those for binary complexes. Starch-protein-FA complexes presented lower in vitro enzymatic digestibility compared with starch-FAs complexes. We conclude that shorter chain and lower unsaturation FAs favor the formation of ternary complexes but decrease the thermal stability of these complexes. FAs had a smaller effect on the ordered structures of ternary complexes than on those of binary complexes and little effect on enzymatic digestibility of both binary and ternary complexes.
Clostridial binary toxins: iota and C2 family portraits.
Stiles, Bradley G; Wigelsworth, Darran J; Popoff, Michel R; Barth, Holger
2011-01-01
There are many pathogenic Clostridium species with diverse virulence factors that include protein toxins. Some of these bacteria, such as C. botulinum, C. difficile, C. perfringens, and C. spiroforme, cause enteric problems in animals as well as humans. These often fatal diseases can partly be attributed to binary protein toxins that follow a classic AB paradigm. Within a targeted cell, all clostridial binary toxins destroy filamentous actin via mono-ADP-ribosylation of globular actin by the A component. However, much less is known about B component binding to cell-surface receptors. These toxins share sequence homology amongst themselves and with those produced by another Gram-positive, spore-forming bacterium also commonly associated with soil and disease: Bacillus anthracis. This review focuses upon the iota and C2 families of clostridial binary toxins and includes: (1) basics of the bacterial source; (2) toxin biochemistry; (3) sophisticated cellular uptake machinery; and (4) host-cell responses following toxin-mediated disruption of the cytoskeleton. In summary, these protein toxins aid diverse enteric species within the genus Clostridium.
NASA Astrophysics Data System (ADS)
Dong, Yi-Ze; Gu, Wei-Min; Liu, Tong; Wang, Junfeng
2018-03-01
Gamma-ray bursts (GRBs) are luminous and violent phenomena in the Universe. Traditionally, long GRBs are expected to be produced by the collapse of massive stars and associated with supernovae. However, some low-redshift long GRBs have no detection of supernova association, such as GRBs 060505, 060614, and 111005A. It is hard to classify these events convincingly according to usual classifications, and the lack of the supernova implies a non-massive star origin. We propose a new path to produce long GRBs without supernova association, the unstable and extremely violent accretion in a contact binary system consisting of a stellar-mass black hole and a white dwarf, which fills an important gap in compact binary evolution.
Using color photometry to separate transiting exoplanets from false positives
NASA Astrophysics Data System (ADS)
Tingley, B.
2004-10-01
The radial velocity technique is currently used to classify transiting objects. While capable of identifying grazing binary eclipses, this technique cannot reliably identify blends, a chance overlap of a faint background eclipsing binary with an ordinary foreground star. Blends generally have no observable radial velocity shifts, as the foreground star is brighter by several magnitudes and therefore dominates the spectrum, but their combined light can produce events that closely resemble those produced by transiting exoplanets. The radial velocity technique takes advantage of the mass difference between planets and stars to classify exoplanet candidates. However, the existence of blends renders this difference an unreliable discriminator. Another difference must therefore be utilized for this classification - the physical size of the transiting body. Due to the dependence of limb darkening on color, planets and stars produce subtly different transit shapes. These differences can be relatively weak, little more than 1/10th the transit depth. However, the presence of even small color differences between the individual components of the blend increases this difference. This paper shows that this color difference is capable of discriminating between exoplanets and blends reliably, theoretically capable of classifying even terrestrial-class transits, unlike the radial velocity technique.
Bayesian truthing and experimental validation in homeland security and defense
NASA Astrophysics Data System (ADS)
Jannson, Tomasz; Forrester, Thomas; Wang, Wenjian; Kostrzewski, Andrew; Pradhan, Ranjit
2014-05-01
In this paper we discuss relations between Bayesian Truthing (experimental validation), Bayesian statistics, and Binary Sensing in the context of selected Homeland Security and Intelligence, Surveillance, Reconnaissance (ISR) optical and nonoptical application scenarios. The basic Figure of Merit (FoM) is Positive Predictive Value (PPV), as well as false positives and false negatives. By using these simple binary statistics, we can analyze, classify, and evaluate a broad variety of events including: ISR; natural disasters; QC; and terrorism-related, GIS-related, law enforcement-related, and other C3I events.
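The Positive Predictive Value named above as the basic Figure of Merit can be illustrated with a short calculation. The sketch below is only an illustration of the standard Bayes'-rule definition of PPV, not the authors' procedure; the function names and the example sensitivity, specificity, and prevalence values are hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(event is real | sensor raised an alarm), via Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


def ppv_from_counts(true_positives, false_positives):
    """PPV computed directly from validated (truthed) alarm counts."""
    return true_positives / (true_positives + false_positives)


# Hypothetical binary sensor: 99% sensitive, 99% specific, rare event (0.1%).
ppv_rare = positive_predictive_value(0.99, 0.99, 0.001)

# Hypothetical truthing experiment: 90 confirmed alarms, 10 false alarms.
ppv_counts = ppv_from_counts(90, 10)
```

With a rare event, most alarms are false even for an accurate sensor, which is why PPV rather than raw accuracy is a natural figure of merit for binary sensing.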
About decomposition approach for solving the classification problem
NASA Astrophysics Data System (ADS)
Andrianova, A. A.
2016-11-01
This article describes the application of an algorithm that uses decomposition methods to solve the binary classification problem of constructing a linear classifier based on the Support Vector Machine method. Decomposition reduces the volume of calculations, in particular because it makes it possible to build parallel versions of the algorithm, which is an important advantage for problems with big data. The results of computational experiments conducted using the decomposition approach are analyzed. The experiments use a known data set for the binary classification problem.
The enigmatic star EZ Pegasi - A mystery solved?
NASA Technical Reports Server (NTRS)
Howell, S. B.; Bopp, B. W.
1985-01-01
EZ Peg, a ninth-magnitude G star that has been classified by various authors as an irregular variable, a U Gem system, and a contact binary, is shown to have all the spectroscopic and photometric characteristics of an active-chromosphere RS CVn binary. It is suggested that the reported outburst of 1943, when the spectrum appeared to be that of a B star, never occurred. The strong Ca II H and K reversals, viewed with low spectral resolution, caused the photospheric Ca II absorption to appear abnormally weak, mimicking a much earlier spectral type.
NASA Astrophysics Data System (ADS)
Rebassa-Mansergas, A.; Ren, J. J.; Irawati, P.; García-Berro, E.; Parsons, S. G.; Schreiber, M. R.; Gänsicke, B. T.; Rodríguez-Gil, P.; Liu, X.; Manser, C.; Nevado, S. P.; Jiménez-Ibarra, F.; Costero, R.; Echevarría, J.; Michel, R.; Zorotovic, M.; Hollands, M.; Han, Z.; Luo, A.; Villaver, E.; Kong, X.
2017-12-01
We present the second paper of a series of publications aiming at obtaining a better understanding regarding the nature of type Ia supernovae (SN Ia) progenitors by studying a large sample of detached F, G and K main-sequence stars in close orbits with white dwarf companions (i.e. WD+FGK binaries). We employ the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) data release 4 spectroscopic data base together with Galaxy Evolution Explorer (GALEX) ultraviolet fluxes to identify 1549 WD+FGK binary candidates (1057 of which are new), thus doubling the number of known sources. We measure the radial velocities of 1453 of these binaries from the available LAMOST spectra and/or from spectra obtained by us at a wide variety of different telescopes around the globe. The analysis of the radial velocity data allows us to identify 24 systems displaying more than 3σ radial velocity variation that we classify as close binaries. We also discuss the fraction of close binaries among WD+FGK systems, which we find to be ∼10 per cent, and demonstrate that high-resolution spectroscopy is required to efficiently identify double-degenerate SN Ia progenitor candidates.
EXTRASOLAR BINARY PLANETS. II. DETECTABILITY BY TRANSIT OBSERVATIONS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lewis, K. M.; Ida, S.; Ochiai, H.
2015-05-20
We discuss the detectability of gravitationally bound pairs of gas-giant planets (which we call “binary planets”) in extrasolar planetary systems that are formed through orbital instability followed by planet–planet dynamical tides during their close encounters, based on the results of N-body simulations by Ochiai et al. (Paper I). Paper I showed that the formation probability of a binary is as much as ∼10% for three giant planet systems that undergo orbital instability, and after post-capture long-term tidal evolution, the typical binary separation is three to five times the sum of the physical radii of the planets. The binary planets are stable during the main-sequence lifetime of solar-type stars, if the stellarcentric semimajor axis of the binary is larger than 0.3 AU. We show that detecting modulations of transit light curves is the most promising observational method to detect binary planets. Since the likely binary separations are comparable to the stellar diameter, the shape of the transit light curve is different from transit to transit, depending on the phase of the binary’s orbit. The transit durations and depth for binary planet transits are generally longer and deeper than those for the single planet case. We point out that binary planets could exist among the known inflated gas-giant planets or objects classified as false positive detections at orbital radii ≳0.3 AU, propose a binary planet explanation for the CoRoT candidate SRc01 E2 1066, and show that binary planets are likely to be present in, and could be detected using, Kepler-quality data.
Automated detection of tuberculosis on sputum smeared slides using stepwise classification
NASA Astrophysics Data System (ADS)
Divekar, Ajay; Pangilinan, Corina; Coetzee, Gerrit; Sondh, Tarlochan; Lure, Fleming Y. M.; Kennedy, Sean
2012-03-01
Routine visual slide screening for identification of tuberculosis (TB) bacilli in stained sputum slides under a microscope is a tedious, labor-intensive task and can miss up to 50% of TB. Based on the Shannon cofactor expansion of Boolean functions for classification, a stepwise classification (SWC) algorithm is developed to remove different types of false positives, one type at a time, and to increase the detection of TB bacilli at different concentrations. Both bacilli and non-bacilli objects are first analyzed and classified into several different categories including scanty positive, high concentration positive, and several non-bacilli categories: small bright objects, beaded, dim elongated objects, etc. The morphological and contrast features are extracted based on a priori clinical knowledge. The SWC is composed of several individual classifiers. The individual classifier that increases the bacilli counts uses an adaptive algorithm based on a microbiologist's statistical heuristic decision process. The individual classifier that reduces false positives is developed through minimization from a binary decision tree to classify different types of true and false positives based on feature vectors. Finally, the detection algorithm was tested on 102 independent confirmed negative and 74 positive cases. A multi-class task analysis shows high accordance rates of 88.24%, 56.00%, and 97.96% for negative, scanty, and high-concentration cases, respectively. A binary-class task analysis using a receiver operating characteristics method with the area under the curve (Az) is also utilized to analyze the performance of this detection algorithm, showing superior detection performance on the high-concentration cases (Az=0.913) and on cases mixing high-concentration and scanty cases (Az=0.878).
Crystallisation via novel 3D nanotemplates as a tool for protein purification and bio-separation
NASA Astrophysics Data System (ADS)
Shah, Umang V.; Jahn, Niklas H.; Huang, Shanshan; Yang, Zhongqiang; Williams, Daryl R.; Heng, Jerry Y. Y.
2017-07-01
This study reports an experimental validation of the surface preferential nucleation of proteins on the basis of a relationship between nucleant pore diameter and protein hydrodynamic diameter. The validated correlation was employed for the selection of nucleant pore diameter to crystallise a target protein from binary, equivolume protein mixture. We report proof-of-concept preliminary experimental evidence for the rational approach for crystallisation of a target protein from a binary protein mixture on the surface of 3D nanotemplates with controlled surface porosity and narrow pore-size distribution selected on the basis of a relationship between the nucleant pore diameter and protein hydrodynamic diameter. The outcome of this study opens up an exciting opportunity for exploring protein crystallisation as a potential route for protein purification and bio-separation in both technical and pharmaceutical applications.
On the decoding process in ternary error-correcting output codes.
Escalera, Sergio; Pujol, Oriol; Radeva, Petia
2010-01-01
A common way to model multiclass classification problems is to design a set of binary classifiers and to combine them. Error-Correcting Output Codes (ECOC) represent a successful framework to deal with this type of problem. Recent works in the ECOC framework showed significant performance improvements by means of new problem-dependent designs based on the ternary ECOC framework. The ternary framework contains a larger set of binary problems because of the use of a "do not care" symbol that allows a given classifier to ignore some classes. However, there are no proper studies that analyze the effect of the new symbol at the decoding step. In this paper, we present a taxonomy that embeds all binary and ternary ECOC decoding strategies into four groups. We show that the zero symbol introduces two kinds of biases that require redefinition of the decoding design. A new type of decoding measure is proposed, and two novel decoding strategies are defined. We evaluate the state-of-the-art coding and decoding strategies over a set of UCI Machine Learning Repository data sets and on a real traffic sign categorization problem. The experimental results show that, following the new decoding strategies, the performance of the ECOC design is significantly improved.
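The role of the zero ("do not care") symbol at decoding time can be seen in a small sketch. The code below compares plain Hamming decoding, where every zero entry adds a constant 1/2 to the distance, with a simple variant that counts disagreements only over the non-zero positions of each codeword. This is an illustrative assumption, not the decoding measure or the strategies proposed in the paper; the one-versus-one code matrix and the classifier outputs are hypothetical.

```python
def hamming_decode(code_matrix, outputs):
    """Classic Hamming distance; a zero entry contributes 1/2 regardless of the output."""
    return [sum((1 - m * x) / 2 for m, x in zip(row, outputs))
            for row in code_matrix]


def masked_decode(code_matrix, outputs):
    """Fraction of disagreements, counted only where the codeword is non-zero."""
    distances = []
    for row in code_matrix:
        active = [(m, x) for m, x in zip(row, outputs) if m != 0]
        disagreements = sum(1 for m, x in active if m != x)
        distances.append(disagreements / len(active))
    return distances


# Hypothetical one-versus-one ternary code for 3 classes.
# Columns: (class 0 vs 1), (class 0 vs 2), (class 1 vs 2).
code_matrix = [
    [+1, +1, 0],
    [-1, 0, +1],
    [0, -1, -1],
]
outputs = [+1, +1, -1]  # hypothetical outputs of the three binary classifiers

hamming = hamming_decode(code_matrix, outputs)
masked = masked_decode(code_matrix, outputs)
predicted = min(range(len(masked)), key=masked.__getitem__)
```

Here both decoders pick class 0, but the Hamming distance of the best class is 0.5 rather than 0 purely because of its zero entry, which hints at why codewords with different numbers of zeros are not directly comparable under plain Hamming decoding.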
Gu, Yao; Ni, Yongnian; Kokot, Serge
2012-09-13
A novel, simple and direct fluorescence method for analysis of complex substances and their potential substitutes has been researched and developed. Measurements involved excitation and emission (EEM) fluorescence spectra of powdered, complex, medicinal herbs, Cortex Phellodendri Chinensis (CPC) and the similar Cortex Phellodendri Amurensis (CPA); these substances were compared and discriminated from each other and the potentially adulterated samples (Caulis mahoniae (CM) and David poplar bark (DPB)). Different chemometrics methods were applied for resolution of the complex spectra, and the excitation spectra were found to be the most informative; only the rank-ordering PROMETHEE method was able to classify the samples with single ingredients (CPA, CPC, CM) or those with binary mixtures (CPA/CPC, CPA/CM, CPC/CM). Interestingly, it was essential to use the geometrical analysis for interactive aid (GAIA) display for a full understanding of the classification results. However, these two methods, like the other chemometrics models, were unable to classify composite spectral matrices consisting of data from samples of single ingredients and binary mixtures; this suggested that the excitation spectra of the different samples were very similar. However, the method is useful for classification of single-ingredient samples and, separately, their binary mixtures; it may also be applied for similar classification work with other complex substances.
Automatic annotation of protein motif function with Gene Ontology terms.
Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi; Buchanan, Bruce G
2004-09-02
Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrate that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
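The mutual information feature mentioned above can be sketched from a 2x2 contingency table of sequences. The code below assumes the standard definition of mutual information between two binary variables (sequence matches the motif, sequence is annotated with the GO term); the abstract does not state the exact estimator used, so the function name, the argument layout, and the example counts are hypothetical.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (bits) between motif match and GO-term annotation.

    n11: sequences with motif and term    n10: motif only
    n01: term only                        n00: neither
    """
    total = n11 + n10 + n01 + n00
    motif_yes, motif_no = n11 + n10, n01 + n00
    term_yes, term_no = n11 + n01, n10 + n00
    cells = [
        (n11, motif_yes, term_yes),
        (n10, motif_yes, term_no),
        (n01, motif_no, term_yes),
        (n00, motif_no, term_no),
    ]
    mi = 0.0
    for joint, row_sum, col_sum in cells:
        if joint > 0:  # empty cells contribute nothing (0 * log 0 taken as 0)
            mi += (joint / total) * math.log2(joint * total / (row_sum * col_sum))
    return mi


# Hypothetical counts: motif and term occur independently of each other.
mi_independent = mutual_information(25, 25, 25, 25)

# Hypothetical counts: motif and term always co-occur.
mi_perfect = mutual_information(50, 0, 0, 50)
```

A value near zero indicates that matching the motif says little about the annotation, while larger values indicate a stronger association; such a score could then be combined with frequency-based features in a classifier, as the abstract describes.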
PPCM: Combing multiple classifiers to improve protein-protein interaction prediction
Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan
2015-08-01
Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. Ultimately, this pipeline will be useful for predicting PPI in nonmodel species.
Cell classification using big data analytics plus time stretch imaging (Conference Presentation)
NASA Astrophysics Data System (ADS)
Jalali, Bahram; Chen, Claire L.; Mahjoubfar, Ata
2016-09-01
We show that blood cells can be classified with high accuracy and high throughput by combining machine learning with time stretch quantitative phase imaging. Our diagnostic system captures quantitative phase images in a flow microscope at millions of frames per second and extracts multiple biophysical features from individual cells including morphological characteristics, light absorption and scattering parameters, and protein concentration. These parameters form a hyperdimensional feature space in which supervised learning and cell classification are performed. We show binary classification of T-cells against colon cancer cells, as well as classification of algae cell strains with high and low lipid content. The label-free screening averts the negative impact of staining reagents on cellular viability or cell signaling. The combination of time stretch machine vision and learning offers unprecedented cell analysis capabilities for cancer diagnostics, drug development and liquid biopsy for personalized genomics.
Kachhap, Sangita; Priyadarshini, Pragya; Singh, Balvinder
2017-05-01
Aristaless (Al) and clawless (Cll) homeodomains that are involved in leg development in Drosophila melanogaster are known to bind cooperatively to the 5'-(T/C)TAATTAA(T/A)(T/A)G-3' DNA sequence, but the mechanism of their binding to DNA is unknown. Molecular dynamics (MD) studies have been carried out on binary, ternary, and reconstructed protein-DNA complexes involving Al, Cll, and DNA, along with binding free energy analysis of these complexes. Analysis of MD trajectories of the Cll-3A01 binary complex reveals that the C-terminal end of helixIII of Cll unwinds in the absence of Al and remains so in the reconstructed ternary complex, Cll-3A01-Al. In addition, this change in the secondary structure of Cll does not allow it to form protein-protein interactions with Al in the reconstructed ternary complex. However, the secondary structure of Cll and its interactions are maintained in the other reconstructed ternary complex, Al-3A01-Cll, where Cll binds to the Al-3A01 binary complex to form the ternary complex. These interactions as observed during MD simulations compare well with those observed in the ternary crystal structure. Thus, this study highlights the role of helixIII of Cll and protein-protein interactions while proposing a likely mechanism of recognition in the ternary complex, Al-Cll-DNA.
Have Observatory, Will Travel.
ERIC Educational Resources Information Center
White, James C., II
1996-01-01
Describes several of the labs developed by Project CLEA (Contemporary Laboratory Experiences in Astronomy). The computer labs cover simulated spectrometer use, investigating the moons of Jupiter, radar measurements, energy flow out of the sun, classifying stellar spectra, photoelectric photometry, Doppler effect, eclipsing binary stars, and lunar…
Clostridial Binary Toxins: Iota and C2 Family Portraits
Stiles, Bradley G.; Wigelsworth, Darran J.; Popoff, Michel R.; Barth, Holger
2011-01-01
There are many pathogenic Clostridium species with diverse virulence factors that include protein toxins. Some of these bacteria, such as C. botulinum, C. difficile, C. perfringens, and C. spiroforme, cause enteric problems in animals as well as humans. These often fatal diseases can partly be attributed to binary protein toxins that follow a classic AB paradigm. Within a targeted cell, all clostridial binary toxins destroy filamentous actin via mono-ADP-ribosylation of globular actin by the A component. However, much less is known about B component binding to cell-surface receptors. These toxins share sequence homology amongst themselves and with those produced by another Gram-positive, spore-forming bacterium also commonly associated with soil and disease: Bacillus anthracis. This review focuses upon the iota and C2 families of clostridial binary toxins and includes: (1) basics of the bacterial source; (2) toxin biochemistry; (3) sophisticated cellular uptake machinery; and (4) host–cell responses following toxin-mediated disruption of the cytoskeleton. In summary, these protein toxins aid diverse enteric species within the genus Clostridium. PMID:22919577
Mehta, Chirag M; White, Edward T; Litster, James D
2013-01-01
Interactions measurement is a valuable tool to predict equilibrium phase separation of a desired protein in the presence of unwanted macromolecules. In this study, cross-interactions were measured as the osmotic second virial cross-coefficients (B23 ) for the three binary protein systems involving lysozyme, ovalbumin, and α-amylase in salt solutions (sodium chloride and ammonium sulfate). They were correlated with solubility for the binary protein mixtures. The cross-interaction behavior at different salt concentrations was interpreted by either electrostatic or hydrophobic interaction forces. At low salt concentrations, the protein surface charge dominates cross-interaction behavior as a function of pH. With added ovalbumin, the lysozyme solubility decreased linearly at low salt concentration in sodium chloride and increased at high salt concentration in ammonium sulfate. The B23 value was found to be proportional to the slope of the lysozyme solubility against ovalbumin concentration and the correlation was explained by preferential interaction theory. © 2013 American Institute of Chemical Engineers.
Electrostatic assembly of binary nanoparticle superlattices using protein cages
NASA Astrophysics Data System (ADS)
Kostiainen, Mauri A.; Hiekkataipale, Panu; Laiho, Ari; Lemieux, Vincent; Seitsonen, Jani; Ruokolainen, Janne; Ceci, Pierpaolo
2013-01-01
Binary nanoparticle superlattices are periodic nanostructures with lattice constants much shorter than the wavelength of light and could be used to prepare multifunctional metamaterials. Such superlattices are typically made from synthetic nanoparticles, and although biohybrid structures have been developed, incorporating biological building blocks into binary nanoparticle superlattices remains challenging. Protein-based nanocages provide a complex yet monodisperse and geometrically well-defined hollow cage that can be used to encapsulate different materials. Such protein cages have been used to program the self-assembly of encapsulated materials to form free-standing crystals and superlattices at interfaces or in solution. Here, we show that electrostatically patchy protein cages--cowpea chlorotic mottle virus and ferritin cages--can be used to direct the self-assembly of three-dimensional binary superlattices. The negatively charged cages can encapsulate RNA or superparamagnetic iron oxide nanoparticles, and the superlattices are formed through tunable electrostatic interactions with positively charged gold nanoparticles. Gold nanoparticles and viruses form an AB8fcc crystal structure that is not isostructural with any known atomic or molecular crystal structure and has previously been observed only with large colloidal polymer particles. Gold nanoparticles and empty or nanoparticle-loaded ferritin cages form an interpenetrating simple cubic AB structure (isostructural with CsCl). We also show that these magnetic assemblies provide contrast enhancement in magnetic resonance imaging.
A Binary Approach to Define and Classify Final Ecosystem Goods and Services
The ecosystem services literature decries the lack of consistency and standards in the application of ecosystem services as well as the inability of current approaches to explicitly link ecosystem services to human well-being. Recently, SEEA and CICES have conceptually identifie...
Seismic event classification system
Dowla, F.U.; Jarpe, S.P.; Maurer, W.
1994-12-13
In the computer interpretation of seismic data, the critical first step is to identify the general class of an unknown event. For example, the classification might be: teleseismic, regional, local, vehicular, or noise. Self-organizing neural networks (SONNs) can be used for classifying such events. Both Kohonen and Adaptive Resonance Theory (ART) SONNs are useful for this purpose. Given the detection of a seismic event and the corresponding signal, computation is made of: the time-frequency distribution, its binary representation, and finally a shift-invariant representation, which is the magnitude of the two-dimensional Fourier transform (2-D FFT) of the binary time-frequency distribution. This pre-processed input is fed into the SONNs. These neural networks are able to group events that look similar. The ART SONN has an advantage in classifying the event because the types of cluster groups do not need to be pre-defined. The results from the SONNs together with an expert seismologist's classification are then used to derive event classification probabilities. 21 figures.
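The preprocessing chain described above (time-frequency distribution, binary representation, magnitude of the 2-D Fourier transform) can be illustrated with a small sketch. It assumes a fixed threshold for binarization, which the abstract does not specify, and uses a naive pure-Python DFT in place of an FFT for readability. All function names are hypothetical. The magnitude is exactly invariant only under circular shifts of the grid:

```python
import cmath


def binarize(tfd, threshold):
    """Turn a time-frequency grid into 0/1 values (threshold is an assumption)."""
    return [[1 if value >= threshold else 0 for value in row] for row in tfd]


def dft2_magnitude(grid):
    """Magnitude of the 2-D discrete Fourier transform (naive O(N^4) version)."""
    rows, cols = len(grid), len(grid[0])
    magnitudes = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    acc += grid[r][c] * cmath.exp(1j * angle)
            out_row.append(abs(acc))
        magnitudes.append(out_row)
    return magnitudes


def circular_shift(grid, dr, dc):
    """Shift a grid by dr rows and dc columns with wrap-around."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
            for r in range(rows)]
```

Shifting the binary grid in time or frequency changes only the phase of each Fourier coefficient, so the magnitude grid passed to the networks stays the same for the original and the shifted event.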
LaViola, Joseph J; Zeleznik, Robert C
2007-11-01
We present a practical technique for using a writer-independent recognition engine to improve the accuracy and speed while reducing the training requirements of a writer-dependent symbol recognizer. Our writer-dependent recognizer uses a set of binary classifiers based on the AdaBoost learning algorithm, one for each possible pairwise symbol comparison. Each classifier consists of a set of weak learners, one of which is based on a writer-independent handwriting recognizer. During online recognition, we also use the n-best list of the writer-independent recognizer to prune the set of possible symbols and thus reduce the number of required binary classifications. In this paper, we describe the geometric and statistical features used in our recognizer and our all-pairs classification algorithm. We also present the results of experiments that quantify the effect incorporating a writer-independent recognition engine into a writer-dependent recognizer has on accuracy, speed, and user training time.
Seismic event classification system
Dowla, Farid U.; Jarpe, Stephen P.; Maurer, William
1994-01-01
In the computer interpretation of seismic data, the critical first step is to identify the general class of an unknown event. For example, the classification might be: teleseismic, regional, local, vehicular, or noise. Self-organizing neural networks (SONNs) can be used for classifying such events. Both Kohonen and Adaptive Resonance Theory (ART) SONNs are useful for this purpose. Given the detection of a seismic event and the corresponding signal, computation is made of: the time-frequency distribution, its binary representation, and finally a shift-invariant representation, which is the magnitude of the two-dimensional Fourier transform (2-D FFT) of the binary time-frequency distribution. This pre-processed input is fed into the SONNs. These neural networks are able to group events that look similar. The ART SONN has an advantage in classifying the event because the types of cluster groups do not need to be pre-defined. The results from the SONNs together with an expert seismologist's classification are then used to derive event classification probabilities.
NASA Astrophysics Data System (ADS)
Sheikhan, Mansour; Abbasnezhad Arabi, Mahdi; Gharavian, Davood
2015-10-01
Artificial neural networks are efficient models in pattern recognition applications, but their performance is dependent on employing suitable structure and connection weights. This study used a hybrid method for obtaining the optimal weight set and architecture of a recurrent neural emotion classifier based on gravitational search algorithm (GSA) and its binary version (BGSA), respectively. By considering the features of speech signal that were related to prosody, voice quality, and spectrum, a rich feature set was constructed. To select more efficient features, a fast feature selection method was employed. The performance of the proposed hybrid GSA-BGSA method was compared with similar hybrid methods based on particle swarm optimisation (PSO) algorithm and its binary version, PSO and discrete firefly algorithm, and hybrid of error back-propagation and genetic algorithm that were used for optimisation. Experimental tests on Berlin emotional database demonstrated the superior performance of the proposed method using a lighter network structure.
Xu, Fangzhou; Zhou, Weidong; Zhen, Yilin; Yuan, Qi; Wu, Qi
2016-09-01
The feature extraction and classification of brain signals are very significant in brain-computer interface (BCI). In this study, we describe an algorithm for motor imagery (MI) classification of electrocorticogram (ECoG)-based BCI. The proposed approach employs multi-resolution fractal measures and local binary pattern (LBP) operators to form a combined feature for characterizing an ECoG epoch recording from the right hemisphere of the brain. A classifier is trained by using the gradient boosting in conjunction with ordinary least squares (OLS) method. The fractal intercept, lacunarity and LBP features are extracted to classify imagined movements of either the left small finger or the tongue. Experimental results on dataset I of BCI competition III demonstrate the superior performance of our method. The cross-validation accuracy and accuracy are 90.6% and 95%, respectively. Furthermore, the low computational burden of this method makes it a promising candidate for real-time BCI systems.
MT Ser, a binary blue subdwarf
NASA Astrophysics Data System (ADS)
Shimanskii, V. V.; Borisov, N. V.; Sakhibullin, N. A.; Sheveleva, D. V.
2008-06-01
We have classified and determined the parameters of the evolved close binary MT Ser. Our moderate-resolution spectra covering various phases of the orbital period were taken with the 6-m telescope of the Special Astrophysical Observatory. The spectra of MT Ser freed from the contribution of the surrounding nebula Abell 41 contained no emission lines due to the reflection effect. The radial velocities measured from lines of different elements showed them to be constant on a time scale corresponding to the orbital period. At the same time, we find effects of broadening for the HeII absorption lines, due to the orbital motion of two hot stars of similar types. As a result, we classify MT Ser as a system with two blue subdwarfs after the common-envelope stage. We estimate the component masses and the distance to the object from the Doppler broadening of the HeII lines. We demonstrate that the age of the ambient nebula, Abell 41, is about 35 000 years.
Chaotic particle swarm optimization with mutation for classification.
Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza
2015-01-01
In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allow us to overcome the problem of early convergence into a local minimum associated with particle swarm optimization algorithms. That is, the mutation operator sharpens the convergence and it tunes the best possible solution. Furthermore, to remove the irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using a binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of our proposed classifier, mutation-based classifier particle swarm optimization, it is tested on three classification datasets, namely Wisconsin diagnostic breast cancer, Wisconsin breast cancer and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms including k-nearest neighbor, as a conventional classifier, and particle swarm-classifier, genetic algorithm, and Imperialist competitive algorithm-classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews's correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms.
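The four evaluation measures named above have standard definitions in terms of the binary confusion matrix. A minimal sketch using those textbook formulas follows. The function name is hypothetical, and returning 0.0 for an undefined ratio is a convention chosen here, not something stated in the abstract:

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives.
    """
    def ratio(numerator, denominator):
        # Assumed convention: report 0.0 when a ratio is undefined.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)   # true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_denominator)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it informative on datasets with unequal class sizes.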
Baracco, Yanina; Rodriguez Furlán, Laura T; Campderrós, Mercedes E
2017-04-01
The aim of this work was to study the influence of the binary and ternary combinations of bovine plasma proteins (BPP), inulin (I) and κ-carrageenan (C) in the overall quality of fat-reduced sausages. The influence of these components over different properties (chemical composition, weight loss after cooking, emulsion stability, texture profile and sensory analysis of fat-reduced sausages) was studied and compared against two samples, one without fat reduction and another a fat-reduced sample without addition of texturing agents. In this sense, a full factorial experimental design of two levels with central point was used. The samples containing BPP+I and BPP+C showed a synergy in which the binary combinations presented higher values of moisture and protein content than the samples containing the individual components. The reduction of fat content increases the values of hardness and decreases the values of springiness. Samples with 5% BPP (w/w) and binary combinations of BPP+C and BPP+I had the best stability values (low total fluid loss), demonstrating a significant synergistic effect by combining BPP+C. Similar results were obtained from the study of weight loss after cooking. However, both studies showed a destabilization of the sample BPP+I+C as emulsion stability decreased and weight loss increased after cooking compared to binary combinations ( P < 0.05). Samples with a binary combination of BPP+C and BPP+I do not present a statistically significant difference in the chewiness with respect to a not-fat-reduced commercial sample ( P > 0.05). The less acceptable sample for flavor and texture was the one containing only BPP. However, when BPP combined with I or C, a major acceptability was obtained, demonstrating the synergistic effect of these binary combinations. Therefore, our studies revealed that the binary combinations of BPP with I or C are good alternatives for the development of fat-reduced sausage.
Chromospherically Active Stars. XXV. HD 144110=EV Draconis, a Double-lined Dwarf Binary
NASA Astrophysics Data System (ADS)
Fekel, Francis C.; Henry, Gregory W.; Lewis, Ceteka
2005-08-01
New spectroscopic and photometric observations of HD 144110 have been used to obtain an improved orbital element solution and determine some basic properties of the system. This chromospherically active, double-lined spectroscopic binary has an orbital period of 1.6714012 days and a circular orbit. We classify the components as G5 V and K0 V and suggest that they are slightly metal-rich. The photometric observations indicate that the rotation of HD 144110 is synchronous with the orbital period. Despite the short orbital period, no evidence of eclipses is seen in our photometry.
Classification of skin cancer images using local binary pattern and SVM classifier
NASA Astrophysics Data System (ADS)
Adjed, Faouzi; Faye, Ibrahima; Ababsa, Fakhreddine; Gardezi, Syed Jamal; Dass, Sarat Chandra
2016-11-01
In this paper, a classification method for melanoma and non-melanoma skin cancer images has been presented using the local binary patterns (LBP). The LBP computes the local texture information from the skin cancer images, which is later used to compute some statistical features that have capability to discriminate the melanoma and non-melanoma skin tissues. Support vector machine (SVM) is applied on the feature matrix for classification into two skin image classes (malignant and benign). The method achieves good classification accuracy of 76.1% with sensitivity of 75.6% and specificity of 76.7%.
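The abstract does not say which LBP variant was used. As an illustration only, the basic 3x3, eight-neighbour operator can be sketched as follows: each neighbour at least as bright as the centre pixel sets one bit of an 8-bit code, and the codes are collected into a 256-bin histogram. The function names and the neighbour ordering are assumptions made here:

```python
# Neighbour offsets, clockwise from the top-left pixel (ordering is an assumption).
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Basic 8-bit local binary pattern code for the interior pixel (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, r, c)] += 1
    return histogram
```

Statistical features derived from such a histogram would then form the rows of the feature matrix passed to the SVM.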
Malof, Jordan M.; Mazurowski, Maciej A.; Tourassi, Georgia D.
2013-01-01
Case selection is a useful approach for increasing the efficiency and performance of case-based classifiers. Multiple techniques have been designed to perform case selection. This paper empirically investigates how class imbalance in the available set of training cases can impact the performance of the resulting classifier as well as properties of the selected set. In this study, the experiments are performed using a dataset for the problem of detecting breast masses in screening mammograms. The classification problem was binary and we used a k-nearest neighbor classifier. The classifier’s performance was evaluated using the Receiver Operating Characteristic (ROC) area under the curve (AUC) measure. The experimental results indicate that although class imbalance reduces the performance of the derived classifier and the effectiveness of selection at improving overall classifier performance, case selection can still be beneficial, regardless of the level of class imbalance. PMID:21820273
Collell, Guillem; Prelec, Drazen; Patil, Kaustubh R
2018-01-31
Class imbalance presents a major hurdle in the application of classification methods. A commonly taken approach is to learn ensembles of classifiers using rebalanced data. Examples include bootstrap averaging (bagging) combined with either undersampling or oversampling of the minority class examples. However, rebalancing methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases. Furthermore, these methods often require specifying the performance measure of interest a priori, i.e., before learning. An alternative is to employ the threshold-moving technique, which applies a threshold to the continuous output of a model, offering the possibility to adapt to a performance measure a posteriori, i.e., a plug-in method. Surprisingly, little attention has been paid to this combination of a bagging ensemble and threshold-moving. In this paper, we study this combination and demonstrate its competitiveness. Contrary to the other resampling methods, we preserve the natural class distribution of the data, resulting in well-calibrated posterior probabilities. Additionally, we extend the proposed method to handle multiclass data. We validated our method on binary and multiclass benchmark data sets using both decision trees and neural networks as base classifiers. We perform analyses that provide insights into the proposed method.
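Threshold-moving, as described above, chooses the decision threshold after training so that a selected performance measure is maximized. A minimal sketch for the binary case, assuming the scores are the averaged outputs of a bagging ensemble and using F1 as the example measure. The function names and the exhaustive search over observed scores are illustrative choices, not the authors' procedure:

```python
def f1_score(labels, predictions):
    """F1 for binary labels and predictions coded as 0 and 1."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(scores, labels, metric=f1_score):
    """Pick the score threshold that maximizes `metric` on held-out data.

    Every distinct observed score is tried as a candidate threshold;
    ties are resolved in favour of the lowest threshold.
    """
    best_t, best_value = None, float("-inf")
    for t in sorted(set(scores)):
        predictions = [1 if s >= t else 0 for s in scores]
        value = metric(labels, predictions)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value
```

Because the threshold is selected after the model is fitted, the same trained ensemble can be reused for a different performance measure by passing another `metric` function.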
Neural network ensemble based CAD system for focal liver lesions from B-mode ultrasound.
Virmani, Jitendra; Kumar, Vinod; Kalra, Naveen; Khandelwal, Niranjan
2014-08-01
A neural network ensemble (NNE) based computer-aided diagnostic (CAD) system to assist radiologists in differential diagnosis between focal liver lesions (FLLs), including (1) typical and atypical cases of Cyst, hemangioma (HEM) and metastatic carcinoma (MET) lesions, (2) small and large hepatocellular carcinoma (HCC) lesions, along with (3) normal (NOR) liver tissue is proposed in the present work. Expert radiologists visualize the textural characteristics of regions inside and outside the lesions to differentiate between different FLLs; accordingly, texture features computed from inside lesion regions of interest (IROIs) and texture ratio features computed from IROIs and surrounding lesion regions of interest (SROIs) are taken as input. Principal component analysis (PCA) is used for reducing the dimensionality of the feature space before classifier design. The first step of the classification module consists of a five-class PCA-NN based primary classifier which yields probability outputs for five liver image classes. The second step of the classification module consists of ten binary PCA-NN based secondary classifiers for NOR/Cyst, NOR/HEM, NOR/HCC, NOR/MET, Cyst/HEM, Cyst/HCC, Cyst/MET, HEM/HCC, HEM/MET and HCC/MET classes. The probability outputs of the five-class PCA-NN based primary classifier are used to determine the two most probable classes for a test instance, based on which it is directed to the corresponding binary PCA-NN based secondary classifier for crisp classification between the two classes. By including the second step of the classification module, classification accuracy increases from 88.7 % to 95 %. The promising results obtained by the proposed system indicate its usefulness to assist radiologists in differential diagnosis of FLLs.
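The two-step routing described above can be sketched independently of the underlying networks. In the sketch below the primary and secondary classifiers are plain callables supplied by the caller. These callables, the function names and the dictionary keyed by class pairs are hypothetical stand-ins for the PCA-NN models, which are not reproduced here:

```python
from itertools import combinations

CLASSES = ["NOR", "Cyst", "HEM", "HCC", "MET"]

# The ten unordered class pairs, one per binary secondary classifier.
CLASS_PAIRS = [frozenset(pair) for pair in combinations(CLASSES, 2)]


def top_two(probabilities):
    """Return the two most probable class labels, most probable first."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[0], ranked[1]


def classify(features, primary, secondary):
    """Route an instance through the primary, then one secondary classifier.

    primary:   callable mapping features to a {class: probability} dict
    secondary: dict mapping a frozenset of two classes to a callable that
               returns one of those two class labels
    """
    first, second = top_two(primary(features))
    return secondary[frozenset((first, second))](features)
```

Five classes give exactly ten unordered pairs, which matches the ten secondary classifiers listed in the abstract.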
Rajendran, Senthilnathan; Jothi, Arunachalam
2018-05-16
The three-dimensional structure of a protein depends on the interactions between its amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold, some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions arise from the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at such a property level. In this line, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity less than 40% were selected for ten different subfolds from three different mainfolds (according to CATH classification) and were used for this analysis. We used the normalized values of the 49 physio-chemical, energetic and conformational properties of amino acids. We characterize the folds based on the average biophysical property values. We also observed a fold-specific correlational behavior of biophysical properties despite a very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes-NB, Support Vector Machines-SVM and Bayesian Generalized Linear Model-BGLM) which could discriminate mainfold based on the biophysical properties. We also show that among the three generated models, the BGLM classifier model was able to discriminate protein sequences coming under the all beta category with 81.43% accuracy and all alpha, alpha-beta proteins with 83.37% accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
Sankari, E Siva; Manimegalai, D
2017-12-21
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are taken, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, and ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest is analysed. Among the various decision tree classifiers, Random forest performs well in less time with good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
PG 1316+678: A young pre-cataclysmic binary with weak reflection effects
NASA Astrophysics Data System (ADS)
Shimansky, V. V.; Borisov, N. V.; Bikmaev, I. F.; Sakhibullin, N. A.; Shimanskaya, N. N.; Spiridonova, O. I.; Irtuganov, E. N.
2013-03-01
The PG 1316+678 star is classified as a pre-cataclysmic binary, as is evidenced by its photometric and spectroscopic observations. Its orbital period is determined to be P orb = 3.3803d, which coincides with the photometric period. The intensities of the emission HI and HeI lines are shown to vary synchronously with the brightness of the object (Δ m V = 0.065 m , Δ m R = 0.08 m ). These variations arise as the UV radiation from the DAO white dwarf is reflected from the surface of the cold companion. The parameters of the binary are estimated and the time of its evolution after the common-envelope phase is determined to be t ≈ 240 000 years. Thus, PG 1316+678 is a young pre-cataclysmic NN Ser variable with the smallest known photometric reflection effect.
Quantum Support Vector Machine for Big Data Classification
NASA Astrophysics Data System (ADS)
Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth
2014-09-01
Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.
Khazendar, S; Sayasneh, A; Al-Assam, H; Du, H; Kaijser, J; Ferrara, L; Timmerman, D; Jassim, S; Bourne, T
2015-01-01
Preoperative characterisation of ovarian masses into benign or malignant is of paramount importance to optimise patient management. In this study, we developed and validated a computerised model to characterise ovarian masses as benign or malignant. Transvaginal 2D B mode static ultrasound images of 187 ovarian masses with known histological diagnosis were included. Images were first pre-processed and enhanced, and Local Binary Pattern Histograms were then extracted from 2 × 2 blocks of each image. A Support Vector Machine (SVM) was trained using stratified cross validation with randomised sampling. The process was repeated 15 times and in each round 100 images were randomly selected. The SVM classified the original non-treated static images as benign or malignant masses with an average accuracy of 0.62 (95% CI: 0.59-0.65). This performance significantly improved to an average accuracy of 0.77 (95% CI: 0.75-0.79) when images were pre-processed, enhanced and treated with a Local Binary Pattern operator (mean difference 0.15: 95% 0.11-0.19, p < 0.0001, two-tailed t test). We have shown that an SVM can classify static 2D B mode ultrasound images of ovarian masses into benign and malignant categories. The accuracy improves if texture related LBP features extracted from the images are considered.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qian, S.-B.; Han, Z.-T.; Zhang, B.
1SWASP J162117.36+441254.2 was originally classified as an EW-type binary with a period of 0.20785 days. However, it was detected to have undergone a stellar outburst on 2016 June 3. Although the system was later classified as a cataclysmic variable (CV) and the event was attributed as a dwarf nova outburst, the physical reason is still unknown. This binary has been monitored photometrically since 2016 April 19, and many light curves were obtained before, during, and after the outburst. Those light and color curves observed before the outburst indicate that the system is a special CV. The white dwarf is not accreting material from the secondary and there are no accretion disks surrounding the white dwarf. By comparing the light curves obtained from 2016 April 19 to those from September 14, it was found that magnetic activity of the secondary is associated with the outburst. We show strong evidence that the L1 region on the secondary was heavily spotted before and after the outburst and thus quench the mass transfer, while the outburst is produced by a sudden mass accretion of the white dwarf. These results suggest that J162117 is a good astrophysical laboratory to study stellar magnetic activity and its influences on CV mass transfer and mass accretion.
Mikkola, Arto; Aro, Jussi; Rannikko, Sakari; Ruutu, Mirja
2009-01-01
To develop three prognostic groups for disease specific mortality based on the binary classified pretreatment variables age, haemoglobin concentration (Hb), erythrocyte sedimentation rate (ESR), alkaline phosphatase (ALP), prostate-specific antigen (PSA), plasma testosterone and estradiol level in hormonally treated patients with metastatic prostate cancer (PCa). The present study comprised 200 Finnprostate 6 study patients, but data on all variables were not known for every patient. The patients were divided into three prognostic risk groups (Rgs) using the prognostically best set of pretreatment variables. The best set was found by backward stepwise selection and the effect of every excluded variable on the binary classification cut-off points of the remaining variables was checked and corrected when needed. The best group of variables was ALP, PSA, ESR and age. All data were known in 142 patients. Patients were given one risk point each for ALP > 180 U/l (normal value 60-275 U/l), PSA > 35 microg/l, ESR > 80 mm/h and age < 60 years. Three risk groups were formed: Rg-a (0-1 risk points), Rg-b (2 risk points) and Rg-c (3-4 risk points). The risk of death from PCa increased statistically significantly with advancing prognostic group. Patients with metastatic PCa can be divided into three statistically significantly different prognostic risk groups for PCa-specific mortality by using the binary classified pretreatment variables ALP, PSA, ESR and age.
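The scoring rule stated in the abstract above (one risk point per criterion, three groups by point total) can be written out as a short sketch. The thresholds and group boundaries are taken from the abstract; the function names and argument units (ALP in U/l, PSA in microg/l, ESR in mm/h, age in years) are illustrative assumptions, and this is not a clinical tool.

```python
def risk_points(alp, psa, esr, age):
    """One point each for ALP > 180, PSA > 35, ESR > 80 and age < 60."""
    return int(alp > 180) + int(psa > 35) + int(esr > 80) + int(age < 60)

def risk_group(points):
    """Map 0-1 points to Rg-a, 2 points to Rg-b and 3-4 points to Rg-c."""
    if points <= 1:
        return "Rg-a"
    if points == 2:
        return "Rg-b"
    return "Rg-c"

# Hypothetical patient values, chosen only to exercise the rule.
example_group = risk_group(risk_points(alp=200, psa=40, esr=20, age=70))
```

In this made-up example two criteria are met (ALP and PSA), so the patient falls into the middle group.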
NASA Astrophysics Data System (ADS)
Qian, S.-B.; Han, Z.-T.; Zhang, B.; Zejda, M.; Michel, R.; Zhu, L.-Y.; Zhao, E.-G.; Liao, W.-P.; Tian, X.-M.; Wang, Z.-H.
2017-10-01
1SWASP J162117.36+441254.2 was originally classified as an EW-type binary with a period of 0.20785 days. However, it was detected to have undergone a stellar outburst on 2016 June 3. Although the system was later classified as a cataclysmic variable (CV) and the event was attributed as a dwarf nova outburst, the physical reason is still unknown. This binary has been monitored photometrically since 2016 April 19, and many light curves were obtained before, during, and after the outburst. Those light and color curves observed before the outburst indicate that the system is a special CV. The white dwarf is not accreting material from the secondary and there are no accretion disks surrounding the white dwarf. By comparing the light curves obtained from 2016 April 19 to those from September 14, it was found that magnetic activity of the secondary is associated with the outburst. We show strong evidence that the L1 region on the secondary was heavily spotted before and after the outburst and thus quench the mass transfer, while the outburst is produced by a sudden mass accretion of the white dwarf. These results suggest that J162117 is a good astrophysical laboratory to study stellar magnetic activity and its influences on CV mass transfer and mass accretion.
Orbit classification in an equal-mass non-spinning binary black hole pseudo-Newtonian system
NASA Astrophysics Data System (ADS)
Zotos, Euaggelos E.; Dubeibe, Fredy L.; González, Guillermo A.
2018-07-01
The dynamics of a test particle in a non-spinning binary black hole system of equal masses is numerically investigated. The binary system is modelled in the context of the pseudo-Newtonian circular restricted three-body problem, such that the primaries are separated by a fixed distance and move in a circular orbit around each other. In particular, the Paczyński-Wiita potential is used for describing the gravitational field of the two non-Newtonian primaries. The orbital properties of the test particle are determined through the classification of the initial conditions of the orbits, using several values of the Jacobi constant, in the Hill's regions of possible motion. The initial conditions are classified into three main categories: (i) bounded, (ii) escaping, and (iii) displaying close encounters. Using the smaller alignment index chaos indicator, we further classify bounded orbits into regular, sticky, or chaotic. To gain a complete view of the dynamics of the system, we define grids of initial conditions on different types of two-dimensional planes. The orbital structure of the configuration plane, along with the corresponding distributions of the escape and collision/close encounter times, allow us to observe the transition from the classical Newtonian to the pseudo-Newtonian regime. Our numerical results reveal a strong dependence of the properties of the considered basins with the Jacobi constant as well as with the Schwarzschild radius of the black holes.
Jha, Sunil K; Hayashi, Kenshi
2015-03-01
In the present work, a novel quartz crystal microbalance (QCM) sensor array has been developed for prompt identification of primary aldehydes in human body odor. Molecularly imprinted polymers (MIP) are prepared using the polyacrylic acid (PAA) polymer matrix and three organic acids (propenoic acid, hexanoic acid and octanoic acid) as template molecules, and utilized as the QCM surface coating layer. The performance of the MIP films is characterized by the dynamic and static responses of a 4-element QCM sensor array (three coated with MIP layers and one with pure PAA for reference) to the target aldehydes hexanal, heptanal, and nonanal in single, binary, and tertiary mixtures at distinct concentrations. The target aldehydes were selected subsequent to characterization of body odor samples with solid phase-micro extraction gas chromatography mass spectrometer (SPME-GC-MS). The hexanoic acid and octanoic acid imprinted PAA exhibit fast response, and better sensitivity, selectivity and reproducibility than the propenoic acid imprinted and non-imprinted PAA in the array. The response time and recovery time for hexanoic acid imprinted PAA are 5 s and 12 s, respectively, for typical concentrations of binary and tertiary mixtures of aldehydes using the static response. The dynamic sensor array response matrix was processed with principal component analysis (PCA) for visual identification and with a support vector machine (SVM) classifier for quantitative identification of target odors. Aldehyde odors were identified successfully in principal component (PC) space. The SVM classifier yields a maximum recognition rate of 79% for three classes of binary odors and 83% when single, binary, and tertiary odor classes are included, in 3-fold cross validation. Copyright © 2014 Elsevier B.V. All rights reserved.
Re-Examining Dissociations between Remembering and Knowing: Binary Judgments vs. Independent Ratings
ERIC Educational Resources Information Center
Brown, Aaron A.; Bodner, Glen E.
2011-01-01
When participants must classify their recognition experiences as remembering or knowing, variables often have dissociative effects on the two judgments. In contrast, when participants independently rate recollection "and" familiarity only parallel effects have been reported. To investigate this discrepancy we compared the effects of masked priming…
How Binary Skills Obscure the Transition from Non-Mastery to Mastery
ERIC Educational Resources Information Center
Karelitz, Tzur M.
2008-01-01
What is the nature of latent predictors that facilitate diagnostic classification? Rupp and Templin (this issue) suggest that these predictors should be multidimensional, categorical variables that can be combined in various ways. Diagnostic Classification Models (DCM) typically use multiple categorical predictors to classify respondents into…
Taylor, Cooper A; Miller, Bill R; Shah, Soleil S; Parish, Carol A
2017-02-01
Mutations in the amyloid precursor protein (APP) are responsible for the formation of amyloid-β peptides. These peptides play a role in Alzheimer's and other dementia-related diseases. The cargo binding domain of the kinesin-1 light chain motor protein (KLC1) may be responsible for transporting APP either directly or via interaction with C-jun N-terminal kinase-interacting protein 1 (JIP1). However, to date there has been no direct experimental or computational assessment of such binding at the atomistic level. We used molecular dynamics and free energy estimations to gauge the affinity for the binary complexes of KLC1, APP, and JIP1. We find that all binary complexes (KLC1:APP, KLC1:JIP1, and APP:JIP1) contain conformations with favorable binding free energies. For KLC1:APP the inclusion of approximate entropies reduces the favorability. This is likely due to the flexibility of the 42-residue APP protein. In all cases we analyze atomistic/residue driving forces for favorable interactions. Proteins 2017; 85:221-234. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Li, Shanshan; Yang, Dingyun; Tu, Haiyang; Deng, Hongtao; Du, Dan; Zhang, Aidong
2013-07-15
This work reports a study of protein adsorption and cell adhesion on binary self-assembled monolayers (SAMs) of alkanethiols with terminal perfluoroalkyl (PFA) and oligo(ethylene glycol) (OEG) chains in varying ratios. The surface chemistry of the SAMs was characterized by contact angle measurement, grazing angle infrared spectroscopy (GIR), and X-ray photoelectron spectroscopy, and the effect on protein adsorption was investigated by surface plasmon resonance, GIR, and immunosorbent assay. Hela cell adhesion on these surfaces was also studied by fluorescence microscopy. Results reveal that, compared to OEG, PFA tended to be a higher fraction of the composition in the SAM than in the assembly solution. More interestingly, the nearly 38% PFA SAM had a strong antifouling property whereas the 74% PFA SAM showed a high adsorption capacity for proteins and cells. The binary PFA/OEG SAMs were favorable for maintaining the fibrinogen conformation, hence its high activity. The findings may have important implications for constructing PFA-containing surfaces with distinct properties that are either highly resistant or highly favorable toward protein adsorption and cell adhesion. Copyright © 2013 Elsevier Inc. All rights reserved.
Gan, Fah Fatt; Tang, Xu; Zhu, Yexin; Lim, Puay Weng
2017-06-01
The traditional variable life-adjusted display (VLAD) is a graphical display of the difference between expected and actual cumulative deaths. The VLAD assumes binary outcomes: death within 30 days of an operation or survival beyond 30 days. Full recovery and bedridden for life, for example, are considered the same outcome. This binary classification results in a great loss of information. Although there are many grades of survival, the binary outcomes are commonly used to classify surgical outcomes. Consequently, quality monitoring procedures are developed based on binary outcomes. With a more refined set of outcomes, the sensitivities of these procedures can be expected to improve. A likelihood ratio method is used to define a penalty-reward scoring system based on three or more surgical outcomes for the new VLAD. The likelihood ratio statistic W is based on testing the odds ratio of cumulative probabilities of recovery R. Two methods of implementing the new VLAD are proposed. We accumulate the statistic W-W¯R to estimate the performance of a surgeon where W¯R is the average of the W's of a historical data set. The accumulated sum will be zero based on the historical data set. This ensures that if a new VLAD is plotted for a future surgeon of performance similar to this average performance, the plot will exhibit a horizontal trend. For illustration of the new VLAD, we consider 3-outcome surgical results: death within 30 days, partial and full recoveries. In our first illustration, we show the effect of partial recoveries on surgical results of a surgeon. In our second and third illustrations, the surgical results of two surgeons are compared using both the traditional VLAD based on binary-outcome data and the new VLAD based on 3-outcome data. A reversal in relative performance of surgeons is observed when the new VLAD is used. In our final illustration, we display the surgical results of four surgeons using the new VLAD based completely on 3-outcome data. 
Full recovery and bedridden for life are two completely different outcomes. There is a great loss of information when different grades of 'successful' operations are naively classified as survival. When surgical outcomes are classified more accurately into more than two categories, the resulting new VLAD will reveal more accurately and fairly the surgical results. © The Author 2017. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
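The traditional binary-outcome VLAD described in the abstract above, the running difference between expected and actual deaths, can be sketched in a few lines. The sketch assumes each operation comes with a pre-operative probability of death within 30 days (a risk-model output that the abstract does not specify); the function name `vlad` and the sample values are hypothetical. The likelihood-ratio statistic W of the new multi-outcome VLAD is not reproduced here.

```python
from itertools import accumulate

def vlad(expected_risks, died):
    """Running total of expected minus actual deaths, one value per operation.

    expected_risks: assumed pre-operative probabilities of death within 30 days.
    died: booleans, True if the patient died within 30 days.
    """
    return list(accumulate(p - int(d) for p, d in zip(expected_risks, died)))

# Made-up sequence of three operations; the third patient died.
curve = vlad([0.1, 0.2, 0.5], [False, False, True])
```

The curve rises by the predicted risk after each survival and drops by one minus the predicted risk after each death, so a surgeon performing as expected shows a roughly horizontal trend.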
Prediction and Identification of Krüppel-Like Transcription Factors by Machine Learning Method.
Liao, Zhijun; Wang, Xinrui; Chen, Xingyong; Zou, Quan
2017-01-01
The Krüppel-like factors (KLFs) are a family of zinc finger (ZF) motif-containing transcription factors with 18 members in the human genome; among them, KLF18 is predicted by bioinformatics. KLFs possess various physiological functions and are involved in a number of cancers and other diseases. Here we perform a binary-class classification of KLFs and non-KLFs by machine learning methods. The protein sequences of KLFs and non-KLFs were retrieved from UniProt and randomly separated into a training dataset (containing positive and negative sequences) and a test dataset (containing only negative sequences). After extracting the 188-dimensional (188D) feature vectors, we carried out classification with four classifiers (GBDT, libSVM, RF, and k-NN). On the human KLFs, we further examined the evolutionary relationships and motif distribution, and finally we analyzed the conserved amino acid residues of the three zinc fingers. The classifier models from the training dataset were well constructed; the highest specificity (Sp) was 99.83%, from a library for support vector machines (libSVM), and all the correctly classified rates were over 70% for 10-fold cross-validation on the test dataset. The 18 human KLFs can be further divided into 7 groups; the zinc finger domains were located at the carboxyl terminus, and many conserved amino acid residues, including cysteine and histidine, as well as the span and interval between them, were consistent in the three ZF domains. Two classification models for KLFs prediction have been built by novel machine learning methods. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Classification of ligand molecules in PDB with fast heuristic graph match algorithm COMPLIG.
Saito, Mihoko; Takemura, Naomi; Shirai, Tsuyoshi
2012-12-14
A fast heuristic graph-matching algorithm, COMPLIG, was devised to classify the small-molecule ligands in the Protein Data Bank (PDB), which are currently not properly classified on structure basis. By concurrently classifying proteins and ligands, we determined the most appropriate parameter for categorizing ligands to be more than 60% identity of atoms and bonds between molecules, and we classified 11,585 types of ligands into 1946 clusters. Although the large clusters were composed of nucleotides or amino acids, a significant presence of drug compounds was also observed. Application of the system to classify the natural ligand status of human proteins in the current database suggested that, at most, 37% of the experimental structures of human proteins were in complex with natural ligands. However, protein homology- and/or ligand similarity-based modeling was implied to provide models of natural interactions for an additional 28% of the total, which might be used to increase the knowledge of intrinsic protein-metabolite interactions. Copyright © 2012 Elsevier Ltd. All rights reserved.
In-vivo detection of binary PKA network interactions upon activation of endogenous GPCRs
Röck, Ruth; Bachmann, Verena; Bhang, Hyo-eun C; Malleshaiah, Mohan; Raffeiner, Philipp; Mayrhofer, Johanna E; Tschaikner, Philipp M; Bister, Klaus; Aanstad, Pia; Pomper, Martin G; Michnick, Stephen W; Stefan, Eduard
2015-01-01
Membrane receptor-sensed input signals affect and modulate intracellular protein-protein interactions (PPIs). Consequent changes occur to the compositions of protein complexes, protein localization and intermolecular binding affinities. Alterations of compartmentalized PPIs emanating from certain deregulated kinases are implicated in the manifestation of diseases such as cancer. Here we describe the application of a genetically encoded Protein-fragment Complementation Assay (PCA) based on the Renilla Luciferase (Rluc) enzyme to compare binary PPIs of the spatially and temporally controlled protein kinase A (PKA) network in diverse eukaryotic model systems. The simplicity and sensitivity of this cell-based reporter allows for real-time recordings of mutually exclusive PPIs of PKA upon activation of selected endogenous G protein-coupled receptors (GPCRs) in cancer cells, xenografts of mice, budding yeast, and zebrafish embryos. This extends the application spectrum of Rluc PCA for the quantification of PPI-based receptor-effector relationships in physiological and pathological model systems. PMID:26099953
Chaotic Particle Swarm Optimization with Mutation for Classification
Assarzadeh, Zahra; Naghsh-Nilchi, Ahmad Reza
2015-01-01
In this paper, a chaotic particle swarm optimization with mutation-based classifier particle swarm optimization is proposed to classify patterns of different classes in the feature space. The introduced mutation operators and chaotic sequences allow us to overcome the problem of early convergence into a local minimum associated with particle swarm optimization algorithms. That is, the mutation operator sharpens the convergence and tunes the best possible solution. Furthermore, to remove the irrelevant data and reduce the dimensionality of medical datasets, a feature selection approach using a binary version of the proposed particle swarm optimization is introduced. In order to demonstrate the effectiveness of our proposed classifier, mutation-based classifier particle swarm optimization, it is tested on three classification datasets, namely Wisconsin diagnostic breast cancer, Wisconsin breast cancer and heart-statlog, with different feature vector dimensions. The proposed algorithm is compared with different classifier algorithms including k-nearest neighbor, as a conventional classifier, and particle swarm-classifier, genetic algorithm, and Imperialist competitive algorithm-classifier, as more sophisticated ones. The performance of each classifier was evaluated by calculating the accuracy, sensitivity, specificity and Matthews's correlation coefficient. The experimental results show that the mutation-based classifier particle swarm optimization unequivocally performs better than all the compared algorithms. PMID:25709937
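The four evaluation measures named in the abstract above have standard definitions in terms of a binary confusion matrix, sketched below. The function name `binary_metrics` and the sample counts are illustrative assumptions; the sketch also assumes both classes occur in the data, so the sensitivity and specificity denominators are non-zero.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

# Made-up counts: a perfect classifier and a chance-level one.
perfect = binary_metrics(tp=50, fp=0, tn=50, fn=0)
chance = binary_metrics(tp=25, fp=25, tn=25, fn=25)
```

Unlike accuracy, the Matthews correlation coefficient stays near zero for a chance-level classifier even when the classes are imbalanced, which is one reason it is often reported alongside the other three.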
Câmara, P R; Dutra, S N; Takahama Júnior, A; Fontes, Kbfc; Azevedo, R S
2016-09-01
To evaluate comparatively the influence of histopathological features on epithelial dysplasia (ED) and the effectiveness in usage of WHO and binary grading systems in actinic cheilitis (AC). Cytological and architectural alterations established by WHO for ED were evaluated in 107 cases of AC. Epithelial dysplasia was graded using WHO and binary systems. The comparisons were performed using kappa, chi-square, and phi coefficient tests (P < 0.05). Most cases were classified as mild ED (44.5%) in the WHO system and as low risk for malignant transformation (64.5%) in the binary system. There was a positive correlation between WHO and binary systems (k = 0.33; P < 0.0002). Loss of basal cell polarity (P < 0.001) was associated with severity of ED grade in the WHO system. Anisonucleosis (P < 0.0001), nuclear pleomorphism (P < 0.0001), anisocytosis (P = 0.03), cell pleomorphism (P = 0.002), increased nuclear/cytoplasm ratio (P < 0.0001), increased nuclear size (P < 0.0001), increased number of mitotic figures (P = 0.0006), and dyskeratosis (P = 0.008) were associated with severity of ED grade in the binary system. It seems that usage of the binary ED grading system in AC may be more precise because many of the cytological and some of the architectural microscopic alterations correlate with increased grade of ED. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Abou Zeid, Elias; Rezazadeh Sereshkeh, Alborz; Schultz, Benjamin; Chau, Tom
2017-01-01
In recent years, the readiness potential (RP), a type of pre-movement neural activity, has been investigated for asynchronous electroencephalogram (EEG)-based brain-computer interfaces (BCIs). Since the RP is attenuated for involuntary movements, a BCI driven by RP alone could facilitate intentional control amid a plethora of unintentional movements. Previous studies have mainly attempted binary single-trial classification of RP. An RP-based BCI with three or more states would expand the options for functional control. Here, we propose a ternary BCI based on single-trial RPs. This BCI classifies amongst an idle state, a left hand and a right hand self-initiated fine movement. A pipeline of spatio-temporal filtering with per participant parameter optimization was used for feature extraction. The ternary classification was decomposed into binary classifications using a decision-directed acyclic graph (DDAG). For each class pair in the DDAG structure, an ordered diversified classifier system (ODCS-DDAG) was used to select the best among various classification algorithms or to combine the results of different classification algorithms. Using EEG data from 14 participants performing self-initiated left or right key presses, punctuated with rest periods, we compared the performance of ODCS-DDAG to a ternary classifier and four popular multiclass decomposition methods using only a single classification algorithm. ODCS-DDAG had the highest performance (0.769 Cohen's Kappa score) and was significantly better than the ternary classifier and two of the four multiclass decomposition methods. Our work supports further study of RP-based BCI for intuitive asynchronous environmental control or augmentative communication. PMID:28596725
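The decomposition of a ternary problem into binary decisions with a decision-directed acyclic graph, as mentioned in the abstract above, can be sketched as follows. This is the generic DDAG elimination scheme, not the authors' ODCS-DDAG pipeline; the function `ddag_predict`, the scalar feature and the three threshold-based pairwise classifiers are hypothetical stand-ins for trained binary classifiers.

```python
def ddag_predict(x, classes, pairwise):
    """Classify x by eliminating one class per binary decision.

    pairwise maps an ordered class pair (a, b) to a function that
    returns either a or b for the input x.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise[(a, b)](x)
        if winner == a:
            remaining.pop()       # b is ruled out
        else:
            remaining.pop(0)      # a is ruled out
    return remaining[0]

# Toy pairwise classifiers on a single scalar feature (assumed for illustration).
classes = ["idle", "left", "right"]
pairwise = {
    ("idle", "left"): lambda x: "left" if x <= -1 else "idle",
    ("idle", "right"): lambda x: "right" if x >= 1 else "idle",
    ("left", "right"): lambda x: "left" if x < 0 else "right",
}
predictions = [ddag_predict(x, classes, pairwise) for x in (-2.0, 0.0, 2.0)]
```

With three classes, every input passes through exactly two binary decisions, compared with three for one-versus-one voting.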
NASA Astrophysics Data System (ADS)
Sasaki, Kenya; Mitani, Yoshihiro; Fujita, Yusuke; Hamamoto, Yoshihiko; Sakaida, Isao
2017-02-01
In this paper, in order to classify liver cirrhosis on regions of interest (ROI) images from B-mode ultrasound images, we propose to use higher order local autocorrelation (HLAC) features. In a previous study, we tried to classify liver cirrhosis by using a Gabor filter based approach. However, our preliminary experimental results showed that the classification performance of the Gabor feature was poor. In order to classify liver cirrhosis accurately, we examined the use of HLAC features for liver cirrhosis classification. The experimental results show the effectiveness of HLAC features compared with the Gabor feature. Furthermore, by using a binary image made by an adaptive thresholding method, the classification performance of HLAC features improved.
Barth, Holger; Stiles, Bradley G
2008-01-01
Binary bacterial toxins are unique AB-type toxins, composed of two non-linked proteins that act as a binding/translocation component and an enzyme component. All known actin-ADP-ribosylating toxins from clostridia possess this binary structure. This toxin family is comprised of the prototypical Clostridium botulinum C2 toxin, Clostridium perfringens iota toxin, Clostridium difficile CDT, and Clostridium spiroforme toxin. Once in the cytosol of host cells, these toxins transfer an ADP-ribose moiety from nicotinamide adenine dinucleotide onto G-actin that then leads to depolymerization of actin filaments. In recent years much progress has been made towards understanding the cellular uptake mechanism of binary actin-ADP-ribosylating toxins, and in particular that of C2 toxin. Both components act in a precisely concerted manner to intoxicate eukaryotic cells. The binding/translocation (B-) component forms a complex with the enzyme (A-) component and mediates toxin binding to a cell-surface receptor. Following receptor-mediated endocytosis, the enzyme component escapes from acidic endosomes into the cytosol. Acidification of endosomes triggers pore formation by the binding/translocation component in endosomal membranes and the enzyme component subsequently translocates through the pore. This step requires a host cell chaperone, Hsp90. Due to their unique structure, binary toxins are naturally "tailor made" for transporting foreign proteins into the cytosol of host cells. Several highly specific and cell-permeable recombinant fusion proteins have been designed and successfully used in experimental cell research. This review will focus on the recent progress in studying binary actin ADP-ribosylating toxins as highly effective virulence factors and innovative tools for cell physiology as well as pharmacology.
Chromospherically active stars. IV - HD 178450 = V478 Lyr: An early-type BY Draconis type binary
NASA Technical Reports Server (NTRS)
Fekel, Francis C.
1988-01-01
It is shown that the variable star HD 178450 = V478 Lyr is a chromospherically active G8 V single-lined spectroscopic binary with a period of 2.130514 days. This star is characterized by strong UV emission features and a filled-in H-alpha absorption line which is variable in strength. Classified as an early-type BY Draconis system, it is similar to the BY Dra star HD 175742 = V775 Her. The unseen secondary of HD 178450 has a mass of about 0.3 solar masses and is believed to be an M2-M3 dwarf.
Observational constraints on the inter-binary stellar flare hypothesis for the gamma-ray bursts
NASA Astrophysics Data System (ADS)
Rao, A. R.; Vahia, M. N.
1994-01-01
The Gamma Ray Observatory/Burst and Transient Source Experiment (GRO/BATSE) results on the Gamma Ray Bursts (GRBs) have given an internally consistent set of observations of about 260 GRBs which have been released for analysis by the BATSE team. Using this database we investigate our earlier suggestion (Vahia and Rao, 1988) that GRBs are inter-binary stellar flares from a group of objects classified as Magnetically Active Stellar Systems (MASS) which includes flare stars, RS CVn binaries and cataclysmic variables. We show that there exists an observationally consistent parameter space for the number density, scale height and flare luminosity of MASS which explains the complete log(N) - log(P) distribution of GRBs as well as the observed isotropic distribution. We further use this model to predict anisotropy in the GRB distribution at intermediate luminosities. We make definite predictions under the stellar flare hypothesis that can be tested in the near future.
Double-lined M dwarf eclipsing binaries from Catalina Sky Survey and LAMOST
NASA Astrophysics Data System (ADS)
Lee, Chien-Hsiu; Lin, Chien-Cheng
2017-02-01
Eclipsing binaries provide a unique opportunity to determine fundamental stellar properties. In the era of wide-field cameras and all-sky imaging surveys, thousands of eclipsing binaries have been reported through light curve classification, yet their basic properties remain unexplored due to the extensive efforts needed to follow them up spectroscopically. In this paper we investigate three M2-M3 type double-lined eclipsing binaries discovered by cross-matching eclipsing binaries from the Catalina Sky Survey with spectroscopically classified M dwarfs from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope survey data release one and two. Because these three M dwarf binaries are faint, we further acquire radial velocity measurements using GMOS on the Gemini North telescope with R ~ 4000, enabling us to determine the mass and radius of individual stellar components. By jointly fitting the light and radial velocity curves of these systems, we derive the mass and radius of the primary and secondary components of these three systems, in the range between 0.28-0.42 M_⊙ and 0.29-0.67 R_⊙, respectively. Future observations with a high resolution spectrograph will help us pin down the uncertainties in their stellar parameters, and render these systems benchmarks to study M dwarfs, providing inputs to improving stellar models in the low mass regime, or establishing an empirical mass-radius relation for M dwarf stars.
NASA Astrophysics Data System (ADS)
da Silva, Flávio Altinier Maximiano; Pedrini, Helio
2015-03-01
Facial expressions are an important demonstration of humanity's humors and emotions. Algorithms capable of recognizing facial expressions and associating them with emotions were developed and employed to compare the expressions that different cultural groups use to show their emotions. Static pictures of predominantly occidental and oriental subjects from public datasets were used to train machine learning algorithms, whereas local binary patterns, histogram of oriented gradients (HOGs), and Gabor filters were employed to describe the facial expressions for six different basic emotions. The most consistent combination, formed by the association of HOG filter and support vector machines, was then used to classify the other cultural group: there was a strong drop in accuracy, meaning that the subtle differences of facial expressions of each culture affected the classifier performance. Finally, a classifier was trained with images from both occidental and oriental subjects and its accuracy was higher on multicultural data, evidencing the need of a multicultural training set to build an efficient classifier.
The evaluation of alternate methodologies for land cover classification in an urbanizing area
NASA Technical Reports Server (NTRS)
Smekofski, R. M.
1981-01-01
The usefulness of LANDSAT in classifying land cover and in identifying and classifying land use change was investigated using an urbanizing area as the study area. The question of what was the best technique for classification was the primary focus of the study. The many computer-assisted techniques available to analyze LANDSAT data were evaluated. Techniques of statistical training (polygons from CRT, unsupervised clustering, polygons from digitizer and binary masks) were tested with minimum distance to the mean, maximum likelihood and canonical analysis with minimum distance to the mean classifiers. The twelve output images were compared to photointerpreted samples, ground verified samples and a current land use data base. Results indicate that for a reconnaissance inventory, the unsupervised training with canonical analysis-minimum distance classifier is the most efficient. If more detailed ground truth and ground verification is available, the polygons from the digitizer training with the canonical analysis minimum distance is more accurate.
Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor
Alamedine, D.; Khalil, M.; Marque, C.
2013-01-01
Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The two last methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification. PMID:24454536
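As an illustration of the wrapper-style selection described above, here is a minimal sketch of sequential forward selection (SFS) in Python. The function name, the `score` callback, and the stopping rule (a fixed subset size `k`) are assumptions made for this example; the abstract does not state the classifiers' scoring criterion or when the search stops.

```python
from typing import Callable, List


def sequential_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
    k: int,
) -> List[int]:
    """Greedily grow a feature subset, adding the best-scoring feature each round.

    `score` is a hypothetical callback that evaluates a candidate subset,
    for example the cross-validated accuracy of a classifier on those features.
    """
    selected: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one that helps most.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage: the "score" is just the sum of fixed per-feature weights.
weights = [0.1, 0.9, 0.5]
chosen = sequential_forward_selection(3, lambda s: sum(weights[i] for i in s), 2)
```

Because SFS never removes a feature once it is added, it can miss subsets whose members are only useful jointly, which is one reason to compare it with population-based searches such as BPSO.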
Pärkkä, Juha; Cluitmans, Luc; Ermes, Miikka
2010-09-01
An inactive, sedentary lifestyle is a major problem in many industrialized countries today. Automatic recognition of the type of physical activity can be used to show users the distribution of their daily activities and to motivate them toward a more active lifestyle. In this study, an automatic activity-recognition system consisting of wireless motion bands and a PDA is evaluated. The system classifies raw sensor data into activity types online. It uses a decision tree classifier, which has low computational cost and low battery consumption. The classifier parameters can be personalized online by performing a short bout of an activity and by telling the system which activity is being performed. Data were collected with seven volunteers during five everyday activities: lying, sitting/standing, walking, running, and cycling. The online system can detect these activities with overall 86.6% accuracy and with 94.0% accuracy after classifier personalization.
Cho, Ming-Yuan; Hoang, Thi Thom
2017-01-01
Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
Wijaya, Sony Hartono; Afendi, Farit Mochamad; Batubara, Irmanida; Darusman, Latifah K; Altaf-Ul-Amin, Md; Kanaya, Shigehiko
2016-12-07
The binary similarity and dissimilarity measures have critical roles in the processing of data consisting of binary vectors in various fields including bioinformatics and chemometrics. These metrics express the similarity and dissimilarity values between two binary vectors in terms of the positive matches, absence mismatches or negative matches. To our knowledge, there is no published work presenting a systematic way of finding an appropriate equation to measure binary similarity that performs well for certain data type or application. A proper method to select a suitable binary similarity or dissimilarity measure is needed to obtain better classification results. In this study, we proposed a novel approach to select binary similarity and dissimilarity measures. We collected 79 binary similarity and dissimilarity equations by extensive literature search and implemented those equations as an R package called bmeasures. We applied these metrics to quantify the similarity and dissimilarity between herbal medicine formulas belonging to the Indonesian Jamu and Japanese Kampo separately. We assessed the capability of binary equations to classify herbal medicine pairs into match and mismatch efficacies based on their similarity or dissimilarity coefficients using the Receiver Operating Characteristic (ROC) curve analysis. According to the area under the ROC curve results, we found Indonesian Jamu and Japanese Kampo datasets obtained different ranking of binary similarity and dissimilarity measures. Out of all the equations, the Forbes-2 similarity and the Variant of Correlation similarity measures are recommended for studying the relationship between Jamu formulas and Kampo formulas, respectively. The selection of binary similarity and dissimilarity measures for multivariate analysis is data dependent. The proposed method can be used to find the most suitable binary similarity and dissimilarity equation wisely for a particular data. 
Our finding suggests that all four types of matching quantities in the Operational Taxonomic Unit (OTU) table are important to calculate the similarity and dissimilarity coefficients between herbal medicine formulas. Also, the binary similarity and dissimilarity measures that include the negative match quantity d achieve better capability to separate herbal medicine pairs compared to equations that exclude d.
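To make the four matching quantities concrete, the sketch below counts them for two binary vectors and computes two widely known coefficients: Jaccard, which ignores the negative match quantity d, and simple matching, which includes it. This is an independent Python illustration, not the API of the bmeasures R package, and the function names are assumptions.

```python
from typing import Sequence, Tuple


def otu_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): 1/1 matches, 1/0 and 0/1 mismatches, 0/0 matches."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard similarity a / (a + b + c); excludes the negative matches d."""
    a, b, c, _ = otu_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / (a + b + c + d); includes d."""
    a, b, c, d = otu_counts(x, y)
    return (a + d) / (a + b + c + d)


# Two hypothetical formulas encoded as ingredient presence/absence vectors.
formula_1 = [1, 1, 0, 0, 1, 0]
formula_2 = [1, 0, 0, 0, 1, 1]
print(otu_counts(formula_1, formula_2))
```

For sparse vectors, where most positions are 0 in both formulas, d dominates the simple matching coefficient, so the two measures can rank the same pairs quite differently.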
Learning Instructor Intervention from MOOC Forums: Early Results and Issues
ERIC Educational Resources Information Center
Kumar, Muthu; Kan, Min-Yen; Tan, Bernard C. Y.; Ragupathi, Kiruthika
2015-01-01
With large student enrollment, MOOC instructors face the unique challenge in deciding when to intervene in forum discussions with their limited bandwidth. We study this problem of "instructor intervention." Using a large sample of forum data culled from 61 courses, we design a binary classifier to predict whether an instructor should…
NASA Astrophysics Data System (ADS)
Adi Putra, Januar
2018-04-01
In this paper, we propose a new mammogram classification scheme to classify breast tissue as normal or abnormal. A feature matrix is generated by applying the Local Binary Pattern operator to all the detail coefficients from the 2D-DWT of the region of interest (ROI) of a mammogram. Feature selection then reduces the dimensionality of the data by discarding irrelevant features: the F-test and t-test are applied to the extracted features to select the relevant ones. The best features are used in a neural network classifier. We use the MIAS and DDSM databases. In addition to the suggested scheme, competing schemes are also simulated for comparative analysis. The proposed scheme performs better with respect to accuracy, specificity and sensitivity. In the experiments, the proposed scheme achieves a highest accuracy of 92.71%, while the lowest accuracy obtained is 77.08%.
The Construction of Support Vector Machine Classifier Using the Firefly Algorithm
Chao, Chih-Feng; Horng, Ming-Huwi
2015-01-01
The setting of parameters in support vector machines (SVMs) is very important with regard to their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). The tool does not perform feature selection, because the SVM combined with feature selection is not suitable for multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification with maximum accuracy. PMID:25802511
Filius, Anika; Scheltens, Marjan; Bosch, Hans G.; van Doorn, Pieter A.; Stam, Henk J.; Hovius, Steven E.R.; Amadio, Peter C.; Selles, Ruud W.
2015-01-01
Dynamics of structures within the carpal tunnel may alter in carpal tunnel syndrome (CTS) due to fibrotic changes and increased carpal tunnel pressure. Ultrasound can visualize these potential changes, making ultrasound potentially an accurate diagnostic tool. To study this, we imaged the carpal tunnel of 113 patients and 42 controls. CTS severity was classified according to validated clinical and nerve conduction study (NCS) classifications. Transversal and longitudinal displacement and shape (changes) were calculated for the median nerve, tendons and surrounding tissue. To predict diagnostic value binary logistic regression modeling was applied. Reduced longitudinal nerve displacement (p≤0.019), increased nerve cross-sectional area (p≤0.006) and perimeter (p≤0.007), and a trend of relatively changed tendon displacements were seen in patients. Changes were more convincing when CTS was classified as more severe. Binary logistic modeling to diagnose CTS using ultrasound showed a sensitivity of 70-71% and specificity of 80-84%. In conclusion, CTS patients have altered dynamics of structures within the carpal tunnel. PMID:25865180
A fast learning method for large scale and multi-class samples of SVM
NASA Astrophysics Data System (ADS)
Fan, Yu; Guo, Huiming
2017-06-01
A fast learning method for multi-class SVM (Support Vector Machine) classification based on a binary tree is presented to address the low learning efficiency of SVMs when processing large-scale multi-class samples. The paper adopts a bottom-up method to build the binary tree hierarchy; according to the resulting hierarchy, the sub-classifier at each node learns from the corresponding samples. During learning, several class clusters are generated by a first clustering of the training samples. Central points are extracted directly from class clusters that contain only one type of sample. For clusters that contain two types of samples, the numbers of clusters for their positive and negative samples are set according to their degree of mixture, a secondary clustering is performed, and central points are then extracted from the resulting sub-class clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by combining the extracted central points. Simulation experiments show that this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce the number of samples and effectively improve learning efficiency.
Foo, Brian; van der Schaar, Mihaela
2010-11-01
In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.
Fourier-based classification of protein secondary structures.
Shu, Jian-Jun; Yong, Kian Yan
2017-04-15
The correct prediction of protein secondary structures is one of the key issues in predicting the correct protein folded shape, which is used for determining gene function. Existing methods make use of amino acid properties as indices to classify protein secondary structures, but are faced with a significant number of misclassifications. The paper presents a technique for the classification of protein secondary structures based on protein "signal-plotting" and the use of the Fourier technique for digital signal processing. New indices are proposed to classify protein secondary structures by analyzing hydrophobicity profiles. The approach is simple and straightforward. Results show that more types of protein secondary structures can be classified by means of these newly proposed indices. Copyright © 2017 Elsevier Inc. All rights reserved.
Yang, Qing; Zhang, Jie; Li, Tianhui; Liu, Shen; Song, Ping; Nangong, Ziyan; Wang, Qinying
2017-09-01
PirAB (Photorhabdus insect-related proteins) toxin was initially found in the Photorhabdus luminescens TT01 strain and has been shown to be a binary toxin with high insecticidal activity. Based on GenBank data, this gene was also found in the Xenorhabdus nematophila genome sequence. The predicted amino acid sequences of pirA and pirB in the genome of X. nematophila showed 51% and 50% identity, respectively, with the corresponding sequences from P. luminescens. The purpose of this experiment is to identify the relevant information for this toxin gene in X. nematophila. The pirA, pirB and pirAB genes of X. nematophila HB310 were cloned and expressed in Escherichia coli BL21 (DE3) using the pET-28a vector. A PirAB-fusion protein (PirAB-F) was constructed by linking the pirA and pirB genes with the DNA sequence encoding the flexible linker (Gly)4 and then efficiently expressed in E. coli. The hemocoel and oral insecticidal activities of the recombinant proteins were analyzed against the larvae of Galleria mellonella. The results show that PirA/B alone, PirA/B mixture, co-expressed PirAB protein, and PirAB-F all had no oral insecticidal activity against the second-instar larvae of G. mellonella. Only PirA/B mixture and co-expressed PirAB protein had hemocoel insecticidal activity against G. mellonella fifth-instar larvae, with an LD50 of 2.718 μg/larva and 1.566 μg/larva, respectively. Therefore, we confirmed that PirAB protein of X. nematophila HB310 is a binary insecticidal toxin. The successful expression and purification of PirAB laid a foundation for further studies on the function, insecticidal mechanism and expression regulation of the binary toxin. Copyright © 2017 Elsevier Inc. All rights reserved.
Zahiri, Javad; Mohammad-Noori, Morteza; Ebrahimpour, Reza; Saadat, Samaneh; Bozorgmehr, Joseph H; Goldberg, Tatyana; Masoudi-Nejad, Ali
2014-12-01
Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotations. Although experimental methods have produced valuable PPI data, they also suffer from significant limitations. Computational PPI prediction methods have attracted tremendous attentions. Despite considerable efforts, PPI prediction is still in its infancy in complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, which is useful in human PPI prediction. This method uses eight different genomic and proteomic features along with four types of different classifiers. The prediction performance of this classifier selection method was found to be considerably better than methods employed hitherto. This confirms the complex nature of the PPI prediction problem and also the necessity of using biological information for classifier fusion. The LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse. The results revealed that if we divide proteome space according to the cellular localization of proteins, then the utility of some classifiers in PPI prediction can be improved. Therefore, to predict the interaction for any given protein pair, we can select the most accurate classifier with regard to the cellular localization information. Based on the results, we can say that the importance of different features for PPI prediction varies between differently localized proteins; however in general, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones and the Random Forest (RF) classifier performs best in most cases. LocFuse was developed with a user-friendly graphic interface and it is freely available for Linux, Mac OSX and MS Windows operating systems. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Lim, Jeong-Hwan; Hwang, Han-Jeong; Han, Chang-Hee; Jung, Ki-Young; Im, Chang-Hwan
2013-04-01
Objective. Some patients suffering from severe neuromuscular diseases have difficulty controlling not only their bodies but also their eyes. Since these patients have difficulty gazing at specific visual stimuli or keeping their eyes open for a long time, they are unable to use the typical steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI) systems. In this study, we introduce a new paradigm for SSVEP-based BCI, which can be potentially suitable for disabled individuals with impaired oculomotor function. Approach. The proposed electroencephalography (EEG)-based BCI system allows users to express their binary intentions without needing to open their eyes. A pair of glasses with two light emitting diodes flickering at different frequencies was used to present visual stimuli to participants with their eyes closed, and we classified the recorded EEG patterns in the online experiments conducted with five healthy participants and one patient with severe amyotrophic lateral sclerosis (ALS). Main results. Through offline experiments performed with 11 participants, we confirmed that human SSVEP could be modulated by visual selective attention to a specific light stimulus penetrating through the eyelids. Furthermore, the recorded EEG patterns could be classified with accuracy high enough for use in a practical BCI system. After customizing the parameters of the proposed SSVEP-based BCI paradigm based on the offline analysis results, binary intentions of five healthy participants were classified in real time. The average information transfer rate of our online experiments reached 10.83 bits min⁻¹. A preliminary online experiment conducted with an ALS patient showed a classification accuracy of 80%. Significance. The results of our offline and online experiments demonstrated the feasibility of our proposed SSVEP-based BCI paradigm. 
It is expected that our ‘eyes-closed’ SSVEP-based BCI system can be potentially used for communication of disabled individuals with impaired oculomotor function.
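For readers unfamiliar with information transfer rate (ITR), the sketch below computes it with the Wolpaw definition that is commonly used in BCI work. The abstract does not say which ITR definition or trial duration was used, so the formula choice, the function names, and the example numbers are assumptions for illustration and are not meant to reproduce the reported figures.

```python
from math import log2


def bits_per_selection(n_classes: int, accuracy: float) -> float:
    """Wolpaw ITR per selection:
    log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1)).
    """
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and accuracy in [0, 1]")
    bits = log2(n_classes)
    if accuracy > 0.0:  # the P*log2(P) term tends to 0 as P -> 0
        bits += accuracy * log2(accuracy)
    if accuracy < 1.0:  # the (1 - P) term tends to 0 as P -> 1
        bits += (1.0 - accuracy) * log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def itr_bits_per_minute(n_classes: int, accuracy: float,
                        seconds_per_selection: float) -> float:
    """Scale the per-selection rate by the number of selections per minute."""
    return bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection


# Hypothetical binary paradigm: 80% accuracy, one selection every 6 seconds.
print(itr_bits_per_minute(2, 0.8, 6.0))
```

For a binary paradigm the rate is capped at one bit per selection, so the ITR depends heavily on how quickly each selection can be made.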
Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.
Daberdaku, Sebastian; Ferrari, Carlo
2018-02-06
The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. 
The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.
Khazendar, S.; Sayasneh, A.; Al-Assam, H.; Du, H.; Kaijser, J.; Ferrara, L.; Timmerman, D.; Jassim, S.; Bourne, T.
2015-01-01
Introduction: Preoperative characterisation of ovarian masses into benign or malignant is of paramount importance to optimise patient management. Objectives: In this study, we developed and validated a computerised model to characterise ovarian masses as benign or malignant. Materials and methods: Transvaginal 2D B mode static ultrasound images of 187 ovarian masses with known histological diagnosis were included. Images were first pre-processed and enhanced, and Local Binary Pattern Histograms were then extracted from 2 × 2 blocks of each image. A Support Vector Machine (SVM) was trained using stratified cross validation with randomised sampling. The process was repeated 15 times and in each round 100 images were randomly selected. Results: The SVM classified the original non-treated static images as benign or malignant masses with an average accuracy of 0.62 (95% CI: 0.59-0.65). This performance significantly improved to an average accuracy of 0.77 (95% CI: 0.75-0.79) when images were pre-processed, enhanced and treated with a Local Binary Pattern operator (mean difference 0.15: 95% 0.11-0.19, p < 0.0001, two-tailed t test). Conclusion: We have shown that an SVM can classify static 2D B mode ultrasound images of ovarian masses into benign and malignant categories. The accuracy improves if texture related LBP features extracted from the images are considered. PMID:25897367
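The Local Binary Pattern (LBP) histogram used as the texture feature above can be sketched in a few lines. This is the basic 3 × 3, 8-neighbour LBP in plain Python; the neighbour ordering, the `>=` threshold convention, and the function names are assumptions, and the study's pre-processing and its per-block histograms (one per block of a 2 × 2 grid) are not reproduced here.

```python
from typing import List, Sequence

# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Sequence[Sequence[int]], r: int, c: int) -> int:
    """8-bit LBP code of pixel (r, c): bit i is 1 if neighbour i >= centre."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(_OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Toy 3x3 "image": only the top-left neighbour is brighter than the centre.
patch = [[9, 0, 0],
         [0, 5, 0],
         [0, 0, 0]]
print(lbp_code(patch, 1, 1))
```

To mimic block-wise features, one would split the image into blocks, compute `lbp_histogram` for each, and concatenate the histograms into a single vector for the SVM.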
Orbit classification in an equal-mass non-spinning binary black hole pseudo-Newtonian system
NASA Astrophysics Data System (ADS)
Zotos, Euaggelos E.; Dubeibe, F. L.; González, Guillermo A.
2018-04-01
The dynamics of a test particle in a non-spinning binary black hole system of equal masses is numerically investigated. The binary system is modeled in the context of the pseudo-Newtonian circular restricted three-body problem, such that the primaries are separated by a fixed distance and move in a circular orbit around each other. In particular, the Paczyński-Wiita potential is used for describing the gravitational field of the two non-Newtonian primaries. The orbital properties of the test particle are determined through the classification of the initial conditions of the orbits, using several values of the Jacobi constant, in the Hill's regions of possible motion. The initial conditions are classified into three main categories: (i) bounded, (ii) escaping and (iii) displaying close encounters. Using the smaller alignment index (SALI) chaos indicator, we further classify bounded orbits into regular, sticky or chaotic. To gain a complete view of the dynamics of the system, we define grids of initial conditions on different types of two-dimensional planes. The orbital structure of the configuration plane, along with the corresponding distributions of the escape and collision/close encounter times, allow us to observe the transition from the classical Newtonian to the pseudo-Newtonian regime. Our numerical results reveal a strong dependence of the properties of the considered basins with the Jacobi constant as well as with the Schwarzschild radius of the black holes.
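For reference, the Paczyński-Wiita potential named above has the following standard form for a single non-rotating mass. The two-body sum is a sketch under the assumption that each primary contributes one such term; the symbols $M$, $r_i$ and $r_S$ are notation chosen here, and the paper's units and normalization may differ.

```latex
% Paczynski-Wiita pseudo-Newtonian potential of one mass M,
% with Schwarzschild radius r_S:
\Phi_{\mathrm{PW}}(r) = -\frac{G M}{r - r_S}, \qquad r_S = \frac{2 G M}{c^{2}}

% Assumed form for two equal-mass primaries, where r_1 and r_2 are the
% distances of the test particle from each primary:
\Phi(r_1, r_2) = -\frac{G M}{r_1 - r_S} - \frac{G M}{r_2 - r_S}
```

Setting $r_S \to 0$ recovers the Newtonian potential, which matches the transition from the classical Newtonian to the pseudo-Newtonian regime that the abstract describes.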
Intelligent query by humming system based on score level fusion of multiple classifiers
NASA Astrophysics Data System (ADS)
Pyo Nam, Gi; Thu Trang Luong, Thi; Ha Nam, Hyun; Ryoung Park, Kang; Park, Sung-Joo
2011-12-01
Recently, the necessity for content-based music retrieval that can return results even if a user does not know information such as the title or singer has increased. Query-by-humming (QBH) systems have been introduced to address this need, as they allow the user to simply hum snatches of the tune to find the right song. Even though there have been many studies on QBH, few have combined multiple classifiers based on various fusion methods. Here we propose a new QBH system based on the score level fusion of multiple classifiers. This research is novel in the following three respects: three local classifiers [quantized binary (QB) code-based linear scaling (LS), pitch-based dynamic time warping (DTW), and LS] are employed; local maximum and minimum point-based LS and pitch distribution feature-based LS are used as global classifiers; and the combination of local and global classifiers based on the score level fusion by the PRODUCT rule is used to achieve enhanced matching accuracy. Experimental results with the 2006 MIREX QBSH and 2009 MIR-QBSH corpus databases show that the performance of the proposed method is better than that of single classifier and other fusion methods.
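Score-level fusion by the PRODUCT rule can be illustrated with a short sketch: each classifier assigns every candidate song a similarity score, the scores are multiplied per candidate, and the candidate with the largest product is returned. The function name, the data layout, and the assumption that scores are already normalized to [0, 1] with higher meaning more similar are choices made for this example and are not taken from the paper.

```python
from math import prod
from typing import Dict, List, Tuple


def product_rule_fusion(
    score_table: Dict[str, List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Fuse per-classifier scores by multiplication and return the best candidate.

    `score_table` maps each candidate to its list of normalized scores,
    one score per classifier (hypothetical layout).
    """
    fused = {candidate: prod(scores) for candidate, scores in score_table.items()}
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical scores from three matchers for two candidate songs.
scores = {
    "song_a": [0.9, 0.5, 0.8],
    "song_b": [0.6, 0.6, 0.6],
}
best, fused = product_rule_fusion(scores)
print(best, fused)
```

A product penalizes any candidate that one classifier scores near zero, so it rewards agreement between classifiers more strongly than a sum would.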
Low-mass Pre-He White Dwarf Stars in Kepler Eclipsing Binaries with Multi-periodic Pulsations
NASA Astrophysics Data System (ADS)
Zhang, X. B.; Fu, J. N.; Liu, N.; Luo, C. Q.; Ren, A. B.
2017-12-01
We report the discovery of two thermally bloated low-mass pre-He white dwarfs (WDs) in two eclipsing binaries, KIC 10989032 and KIC 8087799. Based on the Kepler long-cadence photometry, we determined comprehensive photometric solutions of the two binary systems. The light curve analysis reveals that KIC 10989032 is a partially eclipsed detached binary system containing a probable low-mass WD with a temperature of about 10,300 K. Having a WD with a temperature of about 13,300 K, KIC 8087799 is typical of an EL CVn system. By utilizing radial velocity measurements available for the A-type primary star of KIC 10989032, the mass and radius of the WD component are determined to be 0.24 ± 0.02 M⊙ and 0.50 ± 0.01 R⊙, respectively. The values of mass and radius of the WD in KIC 8087799 are estimated as 0.16 ± 0.02 M⊙ and 0.21 ± 0.01 R⊙, respectively, according to the effective temperature and mean density of the A-type star derived from the photometric solution. We therefore introduce KIC 10989032 and KIC 8087799 as the eleventh and twelfth dA+WD eclipsing binaries in the Kepler field. Moreover, both binaries display marked multi-periodic pulsations superimposed on binary effects. A preliminary frequency analysis is applied to the light residuals obtained by subtracting the synthetic eclipsing light curves from the observations, revealing that the light pulsations of the two systems are both due to the δ Sct-type primaries. We hence classify KIC 10989032 and KIC 8087799 as two WD+δ Sct binaries.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features
Mohammad-Noori, Morteza; Beer, Michael A.
2014-01-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408
Enhanced regulatory sequence prediction using gapped k-mer features.
Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A
2014-07-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
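To make the idea of gapped k-mer features concrete, here is a minimal counting sketch. It assumes a gapped k-mer is a window of length `l` in which `k` positions are kept and the remaining `l - k` positions are treated as wildcards (written `-`); the parameter names and this brute-force enumeration are assumptions for illustration and do not reproduce the tree data structure the abstract mentions.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=3):
    """Count gapped k-mers: length-l windows with k informative positions.

    Every window of length l contributes one pattern for each way of
    choosing which k positions to keep; the other positions become '-'.
    """
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in combinations(range(l), k):
            keep_set = set(keep)
            pattern = "".join(
                base if i in keep_set else "-" for i, base in enumerate(window)
            )
            counts[pattern] += 1
    return counts


features = gapped_kmer_counts("ACGTA", l=4, k=3)
```

Because several distinct full-length words map onto the same gapped pattern, the counts are less sparse than plain k-mer counts of the same length, which is the motivation the abstract gives for the feature set.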
A Search for Black Holes and Neutron Stars in the Kepler Field
NASA Astrophysics Data System (ADS)
Orosz, Jerome; Short, Donald; Welsh, William; Windmiller, Gur; Dabney, David
2018-01-01
Black holes and neutron stars represent the final evolutionary stages of the most massive stars. In addition to their use as probes into the evolution of massive stars, black holes and neutron stars are ideal laboratories to test General Relativity in the strong field limit. The number of neutron stars and black holes in the Milky Way is not precisely known, but there are an estimated one billion neutron stars in the galaxy based on the observed numbers of radio pulsars. The number of black holes is about 100 million, based on the behavior of the Initial Mass Function at high stellar masses. All of the known stellar-mass black holes (and a fair number of neutron stars) are in "X-ray binaries" that were discovered because of their luminous X-ray emission. The requirement to be in an X-ray-emitting binary places a strong observational bias on the discovery of stellar-mass black holes. Thus the 21 known black hole binaries represent only the very uppermost tip of the population iceberg. We have conducted an optical survey using Kepler data designed to uncover black holes and neutron stars in both "quiescent" X-ray binaries and "pre-contact" X-ray binaries. We discuss how the search was conducted, including how potentially interesting light curves were classified and how the variability types were identified. Although we did not find any convincing candidate neutron star or black hole systems, we did find a few noteworthy binary systems, including two binaries that contain low-mass stars with unusually low albedos.
An overview of the structures of protein-DNA complexes
Luscombe, Nicholas M; Austin, Susan E; Berman, Helen M; Thornton, Janet M
2000-01-01
On the basis of a structural analysis of 240 protein-DNA complexes contained in the Protein Data Bank (PDB), we have classified the DNA-binding proteins involved into eight different structural/functional groups, which are further classified into 54 structural families. Here we present this classification and review the functions, structures and binding interactions of these protein-DNA complexes. PMID:11104519
A clinical measure of DNA methylation predicts outcome in de novo acute myeloid leukemia
Luskin, Marlise R.; Gimotty, Phyllis A.; Smith, Catherine; Loren, Alison W.; Figueroa, Maria E.; Harrison, Jenna; Sun, Zhuoxin; Tallman, Martin S.; Paietta, Elisabeth M.; Litzow, Mark R.; Melnick, Ari M.; Levine, Ross L.; Fernandez, Hugo F.; Luger, Selina M.; Master, Stephen R.; Wertheim, Gerald B.W.
2016-01-01
BACKGROUND. Variable response to chemotherapy in acute myeloid leukemia (AML) represents a major treatment challenge. Clinical and genetic features incompletely predict outcome. The value of clinical epigenetic assays for risk classification has not been extensively explored. We assess the prognostic implications of a clinical assay for multilocus DNA methylation on adult patients with de novo AML. METHODS. We performed multilocus DNA methylation assessment using xMELP on samples and calculated a methylation statistic (M-score) for 166 patients from UPENN with de novo AML who received induction chemotherapy. The association of M-score with complete remission (CR) and overall survival (OS) was evaluated. The optimal M-score cut-point for identifying groups with differing survival was used to define a binary M-score classifier. This classifier was validated in an independent cohort of 383 patients from the Eastern Cooperative Oncology Group Trial 1900 (E1900; NCT00049517). RESULTS. A higher mean M-score was associated with death and failure to achieve CR. Multivariable analysis confirmed that a higher M-score was associated with death (P = 0.011) and failure to achieve CR (P = 0.034). Median survival was 26.6 months versus 10.6 months for low and high M-score groups. The ability of the M-score to perform as a classifier was confirmed in patients ≤ 60 years with intermediate cytogenetics and patients who achieved CR, as well as in the E1900 validation cohort. CONCLUSION. The M-score represents a valid binary prognostic classifier for patients with de novo AML. The xMELP assay and associated M-score can be used for prognosis and should be further investigated for clinical decision making in AML patients. PMID:27446991
Accuracy of automated classification of major depressive disorder as a function of symptom severity.
Ramasubbu, Rajamannar; Brown, Matthew R G; Cortese, Filmeno; Gaxiola, Ismael; Goodyear, Bradley; Greenshaw, Andrew J; Dursun, Serdar M; Greiner, Russell
2016-01-01
Growing evidence documents the potential of machine learning for developing brain based diagnostic methods for major depressive disorder (MDD). As symptom severity may influence brain activity, we investigated whether the severity of MDD affected the accuracies of machine learned MDD-vs-Control diagnostic classifiers. Forty-five medication-free patients with DSM-IV defined MDD and 19 healthy controls participated in the study. Based on depression severity as determined by the Hamilton Rating Scale for Depression (HRSD), MDD patients were sorted into three groups: mild to moderate depression (HRSD 14-19), severe depression (HRSD 20-23), and very severe depression (HRSD ≥ 24). We collected functional magnetic resonance imaging (fMRI) data during both resting-state and an emotional-face matching task. Patients in each of the three severity groups were compared against controls in separate analyses, using either the resting-state or task-based fMRI data. We use each of these six datasets with linear support vector machine (SVM) binary classifiers for identifying individuals as patients or controls. The resting-state fMRI data showed statistically significant classification accuracy only for the very severe depression group (accuracy 66%, p = 0.012 corrected), while mild to moderate (accuracy 58%, p = 1.0 corrected) and severe depression (accuracy 52%, p = 1.0 corrected) were only at chance. With task-based fMRI data, the automated classifier performed at chance in all three severity groups. Binary linear SVM classifiers achieved significant classification of very severe depression with resting-state fMRI, but the contribution of brain measurements may have limited potential in differentiating patients with less severe depression from healthy controls.
Hannibal, Charlotte Gerd; Vang, Russell; Junge, Jette; Kjaerbye-Thygesen, Anette; Kurman, Robert J; Kjaer, Susanne K
2012-06-01
To evaluate the prognostic significance of histologic grade on survival of ovarian serous cancer in Denmark during nearly 30 years. Using the nationwide Danish Pathology Data Bank, we evaluated 4317 women with ovarian serous carcinoma in 1978-2006. All pathology reports were scrutinized and tumors classified as either low-grade serous carcinomas (LGSC) or high-grade serous carcinomas (HGSC). Tumors in which the original pathology reports were described as well-differentiated were classified as LGSC, and those that were described as moderately or poorly differentiated were classified as HGSC. We obtained histologic slides from the pathology departments for women with a diagnosis of well-differentiated serous carcinoma during 1997-2006, which were then reviewed by expert gynecologic pathologists. Data were analyzed using Kaplan-Meier methods and Cox proportional hazards regression analysis with follow-up through June 2009. Women with HGSC had a significantly increased risk of dying (HR=1.9; 95% CI: 1.6-2.3) compared with women with LGSC while adjusting for age and stage. Expert review of 171 women originally classified as well-differentiated in 1997-2006 were interpreted as LGSC in 30% of cases, whereas 12% were interpreted as HGSC and 50% as serous borderline ovarian tumors (SBT). Compared with women with confirmed LGSC, women with SBT at review had a significantly lower risk of dying (HR=0.5; 95% CI: 0.22-0.99), and women with HGSC at review had a non-significantly increased risk of dying (HR=1.6; 95% CI: 0.7-3.4). A binary grading system is a significant predictor of survival for ovarian serous carcinoma. Copyright © 2012 Elsevier Inc. All rights reserved.
Matching Matched Filtering with Deep Networks for Gravitational-Wave Astronomy
NASA Astrophysics Data System (ADS)
Gabbard, Hunter; Williams, Michael; Hayes, Fergus; Messenger, Chris
2018-04-01
We report on the construction of a deep convolutional neural network that can reproduce the sensitivity of a matched-filtering search for binary black hole gravitational-wave signals. The standard method for the detection of well-modeled transient gravitational-wave signals is matched filtering. We use only whitened time series of measured gravitational-wave strain as an input, and we train and test on simulated binary black hole signals in synthetic Gaussian noise representative of Advanced LIGO sensitivity. We show that our network can classify signal from noise with a performance that emulates that of match filtering applied to the same data sets when considering the sensitivity defined by receiver-operator characteristics.
Matching Matched Filtering with Deep Networks for Gravitational-Wave Astronomy.
Gabbard, Hunter; Williams, Michael; Hayes, Fergus; Messenger, Chris
2018-04-06
We report on the construction of a deep convolutional neural network that can reproduce the sensitivity of a matched-filtering search for binary black hole gravitational-wave signals. The standard method for the detection of well-modeled transient gravitational-wave signals is matched filtering. We use only whitened time series of measured gravitational-wave strain as an input, and we train and test on simulated binary black hole signals in synthetic Gaussian noise representative of Advanced LIGO sensitivity. We show that our network can classify signal from noise with a performance that emulates that of match filtering applied to the same data sets when considering the sensitivity defined by receiver-operator characteristics.
ERIC Educational Resources Information Center
Birioukov, Anton
2016-01-01
Student absenteeism in secondary schools has received international academic attention for quite some time. Absenteeism has been linked to diminished academic outcomes and is one of the leading causes of high school dropout. Although absenteeism is a serious concern for educational scholars, the definitions of absences and their subtypes are…
Machine Learning Through Signature Trees. Applications to Human Speech.
ERIC Educational Resources Information Center
White, George M.
A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…
ERIC Educational Resources Information Center
Albaqshi, Amani Mohammed H.
2017-01-01
Functional Data Analysis (FDA) has attracted substantial attention for the last two decades. Within FDA, classifying curves into two or more categories is consistently of interest to scientists, but multi-class prediction within FDA is challenged in that most classification tools have been limited to binary response applications. The functional…
Chen, Yen-Kuang; Li, Kuo-Bin
2013-02-07
The type information of un-annotated membrane proteins provides an important hint for their biological functions. The experimental determination of membrane protein types, despite being more accurate and reliable, is not always feasible due to the costly laboratory procedures, thereby creating a need for the development of bioinformatics methods. This article describes a novel computational classifier for the prediction of membrane protein types using proteins' sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed such that there is no pair of sequences left in the datasets with a sequence identity higher than 40%. The performance of the classifier was evaluated by a Jackknife cross-validation and an independent testing experiment. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy in seven of the eight membrane protein types. The overall accuracy was increased from 78.3% to 88.2%. Unlike earlier approaches which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier are supported by evidence in the literature. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt. Copyright © 2012 Elsevier Ltd. All rights reserved.
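As a rough illustration of how a collection of one-versus-one binary classifiers yields a multi-class prediction, the sketch below uses plain majority voting. This is the textbook scheme, not the new voting scheme the abstract refers to, whose details are not given here; the class labels, the threshold "classifiers", and the tie-breaking rule are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise_classifiers):
    """Predict a class by majority vote over all pairwise binary classifiers.

    pairwise_classifiers maps each pair (a, b) to a callable that returns
    either a or b for input x. Ties go to the class listed first in
    `classes` (an arbitrary choice made for this sketch).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_classifiers[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])


# Toy stand-ins for trained binary SVMs on a single numeric feature.
classes = ["A", "B", "C"]
pairwise = {
    ("A", "B"): lambda x: "A" if x < 1 else "B",
    ("A", "C"): lambda x: "A" if x < 2 else "C",
    ("B", "C"): lambda x: "B" if x < 3 else "C",
}
prediction = one_vs_one_predict(2.5, classes, pairwise)
```

For eight classes, as in the membrane protein setting above, this layout would need 28 pairwise classifiers, one per unordered pair of types.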
Yang, Ming; Ge, Yan; Wu, Jiayan; Xiao, Jingfa; Yu, Jun
2011-05-20
Coevolution can be seen as the interdependency between evolutionary histories. In the context of protein evolution, functionally correlated proteins show ever-present coordinated evolutionary characters without disruption of organismal integrity. In complex systems, there are two forms of protein-protein interaction in vivo: inter-complex interaction and intra-complex interaction. In this paper, we studied the difference in coevolution characters between inter-complex interaction and intra-complex interaction using the "Mirror tree" method on the respiratory chain (RC) proteins. We divided the correlation coefficients of all pairs of RC proteins into two groups, corresponding to binary protein-protein interactions within a complex and binary protein-protein interactions between complexes, respectively. A dramatic discrepancy is detected between the coevolution characters of the two sets of protein interactions (Wilcoxon test, p-value = 4.4 × 10^-6). Our finding reveals some critical information for coevolutionary studies and assists the mechanistic investigation of protein-protein interaction. Furthermore, the results also provide some unique clues for the supramolecular organization of protein complexes in the mitochondrial inner membrane. A more detailed binding site map and genome information for nuclear encoded RC proteins will be extraordinarily valuable for further studies of mitochondrial dynamics. Copyright © 2011. Published by Elsevier Ltd.
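The correlation coefficients mentioned above can be sketched as follows, assuming the common formulation of the mirror tree approach in which two proteins are scored by the Pearson correlation between their pairwise evolutionary distance matrices over the same set of species. The matrices below are made-up toy values, and the function names are illustrative; the abstract does not specify how the distances were computed.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mirror_tree_score(dist_a, dist_b):
    """Correlate two proteins' distance matrices built over the same species."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy distance matrices for two proteins across three species.
protein_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
protein_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]  # same shape of tree, scaled
score = mirror_tree_score(protein_a, protein_b)
```

A score near 1 means the two proteins' trees mirror each other closely; the study then compares the distribution of such scores between the intra-complex and inter-complex groups.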
Jiang, Xiaoying; Wei, Rong; Zhao, Yanjun; Zhang, Tongliang
2008-05-01
Knowledge of subnuclear localization in eukaryotic cells is essential for understanding the function of the nucleus. Because of the special characteristics of the cell nucleus, developing methods and tools for predicting protein subnuclear localization has become an important research field in protein science. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a Pseudo Amino Acid (PseAA) composition based on the approximate entropy (ApEn) concept, which reflects the complexity of time series. A novel ensemble classifier is designed incorporating three AdaBoost classifiers. The base classifier algorithms in the three AdaBoost classifiers are decision stumps, fuzzy K nearest neighbors classifier, and radial basis-support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published works are used to validate the performance of the proposed approach. The results of the Jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. The promising results indicate that the proposed approach is effective and practical. It might become a useful tool in protein subnuclear localization. The software in Matlab and supplementary materials are available freely by contacting the corresponding author.
Okura, Hiromichi; Takahashi, Tsuyoshi; Mihara, Hisakazu
2012-06-01
Successful approaches to de novo protein design suggest a great potential to create novel structural folds and to understand natural rules of protein folding. For these purposes, smaller and simpler de novo proteins have been developed. Here, we constructed smaller proteins by removing the terminal sequences from stable de novo vTAJ proteins and compared stabilities between mutant and original proteins. vTAJ proteins were screened from an α3β3 binary-patterned library which was designed with polar/nonpolar periodicities of α-helix and β-sheet. vTAJ proteins have additional terminal sequences due to the method of constructing the genetically repeated library sequences. By removing parts of these sequences, we successfully obtained stable smaller de novo protein mutants with fewer amino acid alphabets than the originals. However, these mutants showed differences in ANS binding properties and in stabilities against denaturant and pH change. The terminal sequences, which were designed just as flexible linkers and not as secondary structure units, sufficiently affected these physicochemical details. This study showed implications for adjusting protein stabilities by designing N- and C-terminal sequences.
Binary stress induces an increase in indole alkaloid biosynthesis in Catharanthus roseus
Zhu, Wei; Yang, Bingxian; Komatsu, Setsuko; Lu, Xiaoping; Li, Ximin; Tian, Jingkui
2015-01-01
Catharanthus roseus is an important medicinal plant, which produces a variety of indole alkaloids of significant pharmaceutical relevance. In the present study, we aimed to investigate the potential stress-induced increase of indole alkaloid biosynthesis in C. roseus using proteomic technique. The contents of the detectable alkaloids ajmalicine, vindoline, catharanthine, and strictosidine in C. roseus were significantly increased under binary stress. Proteomic analysis revealed that the abundance of proteins related to tricarboxylic acid cycle and cell wall was largely increased; while, that of proteins related to tetrapyrrole synthesis and photosynthesis was decreased. Of note, 10-hydroxygeraniol oxidoreductase, which is involved in the biosynthesis of indole alkaloid was two-fold more abundant in treated group compared to the control. In addition, mRNA expression levels of genes involved in the indole alkaloid biosynthetic pathway indicated an up-regulation in their transcription in C. roseus under UV-B irradiation. These results suggest that binary stress might negatively affect the process of photosynthesis in C. roseus. In addition, the induction of alkaloid biosynthesis appears to be responsive to binary stress. PMID:26284098
Modeling of protein binary complexes using structural mass spectrometry data
Kamal, J.K. Amisha; Chance, Mark R.
2008-01-01
In this article, we describe a general approach to modeling the structure of binary protein complexes using structural mass spectrometry data combined with molecular docking. In the first step, hydroxyl radical mediated oxidative protein footprinting is used to identify residues that experience conformational reorganization due to binding or participate in the binding interface. In the second step, a three-dimensional atomic structure of the complex is derived by computational modeling. Homology modeling approaches are used to define the structures of the individual proteins if footprinting detects significant conformational reorganization as a function of complex formation. A three-dimensional model of the complex is constructed from these binary partners using the ClusPro program, which is composed of docking, energy filtering, and clustering steps. Footprinting data are used to incorporate constraints—positive and/or negative—in the docking step and are also used to decide the type of energy filter—electrostatics or desolvation—in the successive energy-filtering step. By using this approach, we examine the structure of a number of binary complexes of monomeric actin and compare the results to crystallographic data. Based on docking alone, a number of competing models with widely varying structures are observed, one of which is likely to agree with crystallographic data. When the docking steps are guided by footprinting data, accurate models emerge as top scoring. We demonstrate this method with the actin/gelsolin segment-1 complex. We also provide a structural model for the actin/cofilin complex using this approach which does not have a crystal or NMR structure. PMID:18042684
Mazloom, Amin R.; Dannenfelser, Ruth; Clark, Neil R.; Grigoryan, Arsen V.; Linder, Kathryn M.; Cardozo, Timothy J.; Bond, Julia C.; Boran, Aislyn D. W.; Iyengar, Ravi; Malovannaya, Anna; Lanz, Rainer B.; Ma'ayan, Avi
2011-01-01
Coregulator proteins (CoRegs) are part of multi-protein complexes that transiently assemble with transcription factors and chromatin modifiers to regulate gene expression. In this study we analyzed data from 3,290 immuno-precipitations (IP) followed by mass spectrometry (MS) applied to human cell lines aimed at identifying CoRegs complexes. Using the semi-quantitative spectral counts, we scored binary protein-protein and domain-domain associations with several equations. Unlike previous applications, our methods scored prey-prey protein-protein interactions regardless of the baits used. We also predicted domain-domain interactions underlying predicted protein-protein interactions. The quality of predicted protein-protein and domain-domain interactions was evaluated using known binary interactions from the literature, whereas one protein-protein interaction, between STRN and CTTNBP2NL, was validated experimentally; and one domain-domain interaction, between the HEAT domain of PPP2R1A and the Pkinase domain of STK25, was validated using molecular docking simulations. The scoring schemes presented here recovered known, and predicted many new, complexes, protein-protein, and domain-domain interactions. The networks that resulted from the predictions are provided as a web-based interactive application at http://maayanlab.net/HT-IP-MS-2-PPI-DDI/. PMID:22219718
The Nature of Double-peaked [O III] Active Galactic Nuclei
NASA Astrophysics Data System (ADS)
Fu, Hai; Yan, Lin; Myers, Adam D.; Stockton, Alan; Djorgovski, S. G.; Aldering, G.; Rich, Jeffrey A.
2012-01-01
Active galactic nuclei (AGNs) with double-peaked [O III] lines are suspected to be sub-kpc or kpc-scale binary AGNs. However, pure gas kinematics can produce the same double-peaked line profile in spatially integrated spectra. Here we combine integral-field spectroscopy and high-resolution imaging of 42 double-peaked [O III] AGNs from the Sloan Digital Sky Survey to investigate the constituents of the population. We find two binary AGNs where the line splitting is driven by the orbital motion of the merging nuclei. Such objects account for only ~2% of the double-peaked AGNs. Almost all (~98%) of the double-peaked AGNs were selected because of gas kinematics; and half of those show spatially resolved narrow-line regions that extend 4-20 kpc from the nuclei. Serendipitously, we find two spectrally unresolved binary AGNs where gas kinematics produced the double-peaked [O III] lines. The relatively frequent serendipitous discoveries indicate that only ~1% of binary AGNs would appear double-peaked in Sloan spectra and $2.2^{+2.5}_{-0.8}$% of all Sloan AGNs are binary AGNs. Therefore, the double-peaked sample does not offer much advantage over any other AGN samples in finding binary AGNs. The binary AGN fraction implies an elevated AGN duty cycle ($8^{+8}_{-3}$%), suggesting galaxy interactions enhance nuclear accretion. We illustrate that integral-field spectroscopy is crucial for identifying binary AGNs: several objects previously classified as "binary AGNs" with long-slit spectra are most likely single AGNs with extended narrow-line regions (ENLRs). The formation of ENLRs driven by radiation pressure is also discussed. Some of the data presented herein were obtained at the W.M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California, and the National Aeronautics and Space Administration. The Observatory was made possible by the generous financial support of the W. M. Keck Foundation.
CD44 Promotes intoxication by the clostridial iota-family toxins.
Wigelsworth, Darran J; Ruthel, Gordon; Schnell, Leonie; Herrlich, Peter; Blonder, Josip; Veenstra, Timothy D; Carman, Robert J; Wilkins, Tracy D; Van Nhieu, Guy Tran; Pauillac, Serge; Gibert, Maryse; Sauvonnet, Nathalie; Stiles, Bradley G; Popoff, Michel R; Barth, Holger
2012-01-01
Various pathogenic clostridia produce binary protein toxins associated with enteric diseases of humans and animals. Separate binding/translocation (B) components bind to a protein receptor on the cell surface, assemble with enzymatic (A) component(s), and mediate endocytosis of the toxin complex. Ultimately there is translocation of A component(s) from acidified endosomes into the cytosol, leading to destruction of the actin cytoskeleton. Our results revealed that CD44, a multifunctional surface protein of mammalian cells, facilitates intoxication by the iota family of clostridial binary toxins. Specific antibody against CD44 inhibited cytotoxicity of the prototypical Clostridium perfringens iota toxin. Versus CD44(+) melanoma cells, those lacking CD44 bound less toxin and were dose-dependently resistant to C. perfringens iota, as well as Clostridium difficile and Clostridium spiroforme iota-like, toxins. Purified CD44 specifically interacted in vitro with iota and iota-like, but not related Clostridium botulinum C2, toxins. Furthermore, CD44 knockout mice were resistant to iota toxin lethality. Collective data reveal an important role for CD44 during intoxication by a family of clostridial binary toxins.
CD44 Promotes Intoxication by the Clostridial Iota-Family Toxins
Wigelsworth, Darran J.; Ruthel, Gordon; Schnell, Leonie; Herrlich, Peter; Blonder, Josip; Veenstra, Timothy D.; Carman, Robert J.; Wilkins, Tracy D.; Van Nhieu, Guy Tran; Pauillac, Serge; Gibert, Maryse; Sauvonnet, Nathalie; Stiles, Bradley G.; Popoff, Michel R.; Barth, Holger
2012-01-01
Various pathogenic clostridia produce binary protein toxins associated with enteric diseases of humans and animals. Separate binding/translocation (B) components bind to a protein receptor on the cell surface, assemble with enzymatic (A) component(s), and mediate endocytosis of the toxin complex. Ultimately there is translocation of A component(s) from acidified endosomes into the cytosol, leading to destruction of the actin cytoskeleton. Our results revealed that CD44, a multifunctional surface protein of mammalian cells, facilitates intoxication by the iota family of clostridial binary toxins. Specific antibody against CD44 inhibited cytotoxicity of the prototypical Clostridium perfringens iota toxin. Versus CD44+ melanoma cells, those lacking CD44 bound less toxin and were dose-dependently resistant to C. perfringens iota, as well as Clostridium difficile and Clostridium spiroforme iota-like, toxins. Purified CD44 specifically interacted in vitro with iota and iota-like, but not related Clostridium botulinum C2, toxins. Furthermore, CD44 knockout mice were resistant to iota toxin lethality. Collective data reveal an important role for CD44 during intoxication by a family of clostridial binary toxins. PMID:23236484
Exploiting three kinds of interface propensities to identify protein binding sites.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2009-08-01
Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles to use the evolutionary information of the protein sequence frequency profiles and apply this building block to produce a class of propensities called order profile interface propensities. For comparisons, we revisit the usage of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensities combined with sequence profiles and accessible surface areas are inputted into SVM. When tested on four types of complexes (hetero-permanent complexes, hetero-transient complexes, homo-permanent complexes and homo-transient complexes), experimental results show that the order profile interface propensities are better than residue interface propensities and binary profile interface propensities. Therefore, order profile is a suitable profile-level building block of the protein sequences and can be widely used in many tasks of computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the protein remote homology detection.
Dynamical genetic programming in XCSF.
Preen, Richard J; Bull, Larry
2013-01-01
A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to artificial neural networks. This paper presents results from an investigation into using a temporally dynamic symbolic representation within the XCSF learning classifier system. In particular, dynamical arithmetic networks are used to represent the traditional condition-action production system rules to solve continuous-valued reinforcement learning problems and to perform symbolic regression, finding competitive performance with traditional genetic programming on a number of composite polynomial tasks. In addition, the network outputs are later repeatedly sampled at varying temporal intervals to perform multistep-ahead predictions of a financial time series.
Millennial Filipino Student Engagement Analyzer Using Facial Feature Classification
NASA Astrophysics Data System (ADS)
Manseras, R.; Eugenio, F.; Palaoag, T.
2018-03-01
Millennials are widely discussed and are a target market for many companies. In the Philippines, they comprise one third of the total population and most of them are still in school. A good education system is important for preparing this generation for better careers, and quality instruction is one of its input component indicators. In a classroom environment, teachers use facial features to gauge the affect state of the class. Emerging technologies such as Affective Computing are among today's trends for improving the delivery of quality instruction. Together with computer vision, they can be used to analyze the affect states of students. This paper proposes a system for classifying student engagement using facial features. Identifying affect state, specifically Millennial Filipino student engagement, is one of the main priorities of every educator, and this led the authors to develop a tool to assess engagement percentage. A multiple face detection framework using Face API was employed to detect as many student faces as possible in order to gauge the current engagement percentage of the whole class. A binary classifier model using a Support Vector Machine (SVM) was the primary model in the conceptual framework of this study. To obtain the most accurate model, SVM was compared with two of the most widely used binary classifiers. Results show that SVM bested the RandomForest and Naive Bayesian algorithms in most of the experiments on the different test datasets.
Local binary pattern texture-based classification of solid masses in ultrasound breast images
NASA Astrophysics Data System (ADS)
Matsumoto, Monica M. S.; Sehgal, Chandra M.; Udupa, Jayaram K.
2012-03-01
Breast cancer is one of the leading causes of cancer mortality among women. Ultrasound examination can be used to assess breast masses, complementarily to mammography. Ultrasound images reveal tissue information in its echoic patterns. Therefore, pattern recognition techniques can facilitate classification of lesions and thereby reduce the number of unnecessary biopsies. Our hypothesis was that image texture features on the boundary of a lesion and its vicinity can be used to classify masses. We have used intensity-independent and rotation-invariant texture features, known as Local Binary Patterns (LBP). The classifier selected was K-nearest neighbors. Our breast ultrasound image database consisted of 100 patient images (50 benign and 50 malignant cases). The determination of whether the mass was benign or malignant was done through biopsy and pathology assessment. The training set consisted of sixty images, randomly chosen from the database of 100 patients. The testing set consisted of forty images to be classified. The results with a multi-fold cross validation of 100 iterations produced a robust evaluation. The highest performance was observed for feature LBP with 24 symmetrically distributed neighbors over a circle of radius 3 (LBP24,3) with an accuracy rate of 81.0%. We also investigated an approach with a score of malignancy assigned to the images in the test set. This approach provided an ROC curve with Az of 0.803. The analysis of texture features over the boundary of solid masses showed promise for malignancy classification in ultrasound breast images.
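To make the texture feature concrete, here is a minimal sketch of the basic Local Binary Pattern with 8 neighbours at radius 1, written with NumPy. It is a simplification: the study above reports a rotation-invariant variant with 24 neighbours on a circle of radius 3, which this sketch does not implement. The function names `lbp_8_1` and `lbp_histogram` are hypothetical.

```python
import numpy as np


def lbp_8_1(image):
    """Basic 8-neighbour, radius-1 LBP code for every interior pixel."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.uint8) << np.uint8(bit)
    return codes


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes, usable as a feature vector."""
    codes = lbp_8_1(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Because each bit only records whether a neighbour is at least as bright as the centre, adding a constant to every pixel leaves the codes unchanged. A histogram like this could then be passed to a K-nearest-neighbours classifier.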
Pixel-based skin segmentation in psoriasis images.
George, Y; Aldeen, M; Garnavi, R
2016-08-01
In this paper, we present a detailed comparison study of skin segmentation methods for psoriasis images. Different techniques are modified and then applied to a set of psoriasis images acquired from the Royal Melbourne Hospital, Melbourne, Australia, with the aim of finding the technique best suited for application to psoriasis images. We investigate the effect of different colour transformations on skin detection performance. In this respect, explicit skin thresholding is evaluated with three different decision boundaries (CbCr, HS and rgHSV). A histogram-based Bayesian classifier is applied to extract skin probability maps (SPMs) for different colour channels. This is then followed by using different approaches to find a binary skin map (SM) image from the SPMs. The approaches used include binary decision tree (DT) and Otsu's thresholding. Finally, a set of morphological operations is applied to refine the resulting SM image. The paper provides detailed analysis and comparison of the performance of the Bayesian classifier in five different colour spaces (YCbCr, HSV, RGB, XYZ and CIELab). The results show that the histogram-based Bayesian classifier is more effective than explicit thresholding when applied to psoriasis images. It is also found that decision boundary CbCr outperforms HS and rgHSV. Another finding is that the SPMs of the Cb, Cr, H and B-CIELab colour bands yield the best SMs for psoriasis images. In this study, we used a set of 100 psoriasis images for training and testing the presented methods. True Positive (TP) and True Negative (TN) are used as statistical evaluation measures.
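The step from a skin probability map to a binary skin map can be illustrated with Otsu's thresholding, which picks the cut that maximises the between-class variance of the histogram. The sketch below uses NumPy only. The function name `otsu_threshold`, the bin count and the toy probability map are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the threshold that maximises between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weights = hist / hist.sum()
    w0 = np.cumsum(weights)                  # mass of the lower class
    w1 = 1.0 - w0                            # mass of the upper class
    cum_mean = np.cumsum(weights * centers)  # cumulative first moment
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    # Bins where one class is empty give 0/0; treat them as zero variance.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    best = int(np.argmax(between))
    return edges[best + 1]                   # upper edge of the best bin


# Toy skin probability map: low values for background, high values for skin.
spm = np.array([[0.1, 0.1, 0.9],
                [0.1, 0.9, 0.9]])
threshold = otsu_threshold(spm)
skin_map = spm >= threshold                  # binary skin map
```

Morphological clean-up of `skin_map`, as described in the abstract, would follow as a separate step.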
Improvements on ν-Twin Support Vector Machine.
Khemchandani, Reshma; Saigal, Pooja; Chandra, Suresh
2016-07-01
In this paper, we propose two novel binary classifiers termed as "Improvements on ν-Twin Support Vector Machine: Iν-TWSVM and Iν-TWSVM (Fast)" that are motivated by ν-Twin Support Vector Machine (ν-TWSVM). Similar to ν-TWSVM, Iν-TWSVM determines two nonparallel hyperplanes such that they are closer to their respective classes and are at least ρ distance away from the other class. The significant advantage of Iν-TWSVM over ν-TWSVM is that Iν-TWSVM solves one smaller-sized Quadratic Programming Problem (QPP) and one Unconstrained Minimization Problem (UMP); as compared to solving two related QPPs in ν-TWSVM. Further, Iν-TWSVM (Fast) avoids solving a smaller sized QPP and transforms it as a unimodal function, which can be solved using line search methods and similar to Iν-TWSVM, the other problem is solved as a UMP. Due to their novel formulation, the proposed classifiers are faster than ν-TWSVM and have comparable generalization ability. Iν-TWSVM also implements structural risk minimization (SRM) principle by introducing a regularization term, along with minimizing the empirical risk. The other properties of Iν-TWSVM, related to support vectors (SVs), are similar to that of ν-TWSVM. To test the efficacy of the proposed method, experiments have been conducted on a wide range of UCI and a skewed variation of NDC datasets. We have also given the application of Iν-TWSVM as a binary classifier for pixel classification of color images. Copyright © 2016 Elsevier Ltd. All rights reserved.
Thiéry, I.; Hamon, S.; Delécluse, A.; Orduz, S.
1998-01-01
The fragment containing the gene encoding the cytolytic Cyt1Ab1 protein from Bacillus thuringiensis subsp. medellin and its flanking sequences (I. Thiery, A. Delécluse, M. C. Tamayo, and S. Orduz, Appl. Environ. Microbiol. 63:468–473, 1997) was introduced into Bacillus sphaericus toxic strains 2362, 2297, and Iab872 by electroporation with the shuttle vector pMK3. Only small amounts of the protein were produced in recombinant strains 2362 and Iab872. The protein was detected in these strains only by Western blotting and immunodetection with antibody raised against Cyt1Ab1 protein. Large amounts of Cyt1Ab1 protein were produced in B. sphaericus recombinant strain 2297, and there was an additional crystal, other than that of the binary toxin, within the exosporium. The production of the Cyt1Ab1 protein in addition to the binary toxin did not increase the larvicidal activity of the B. sphaericus recombinant strain against susceptible mosquito populations of Culex pipiens or Aedes aegypti. However, it partially restored (10 to 20 times) susceptibility of the resistant mosquito populations of C. pipiens (SPHAE) and Culex quinquefasciatus (GeoR) to the binary toxin. The Cyt1Ab1 protein produced in recombinant B. thuringiensis SPL407(pcyt1Ab1) was synthesized in two types of crystal—one round and with various dense areas, surrounded by an envelope, and the other a regular cuboid crystal, very similar to that found in the B. sphaericus recombinant strain. PMID:9758818
Multicategory Composite Least Squares Classifiers
Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul
2010-01-01
Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128
Reliable binary cell-fate decisions based on oscillations
NASA Astrophysics Data System (ADS)
Pfeuty, B.; Kaneko, K.
2014-02-01
Biological systems often have to make binary decisions in highly dynamic and noisy environments, such as during cell-fate determination. These decisions can be implemented by two main bifurcation mechanisms based on the transitions from either monostability or oscillation to bistability. We compare these two mechanisms by using stochastic models with time-varying fields and by establishing asymptotic formulas for the choice probabilities. Different scaling laws for decision sensitivity with respect to noise strength and signal timescale are obtained, supporting a role for oscillatory dynamics in performing noise-robust and temporally tunable binary decision-making. This result provides a rationale for recent experimental evidence showing that oscillatory expression of proteins often precedes binary cell-fate decisions.
IGR J19294+1816: a new Be-X-ray binary revealed through infrared spectroscopy
NASA Astrophysics Data System (ADS)
Rodes-Roca, J. J.; Bernabeu, G.; Magazzù, A.; Torrejón, J. M.; Solano, E.
2018-05-01
The aim of this work is to characterize the counterpart to the INTErnational Gamma-Ray Astrophysics Laboratory high-mass X-ray binary candidate IGR J19294+1816 so as to establish its true nature. We obtained H-band spectra of the selected counterpart acquired with the Near Infrared Camera and Spectrograph instrument mounted on the Telescopio Nazionale Galileo 3.5-m telescope which represents the first infrared spectrum ever taken of this source. We complement the spectral analysis with infrared photometry from UKIDSS, 2MASS, WISE, and NEOWISE databases. We classify the mass donor as a Be star. Subsequently, we compute its distance by properly taking into account the contamination produced by the circumstellar envelope. The findings indicate that IGR J19294+1816 is a transient source with a B1Ve donor at a distance of d = 11 ± 1 kpc, and luminosities of the order of 10^{36-37} erg s^{-1}, displaying the typical behaviour of a Be-X-ray binary.
Ansara, Y Gavriel
2015-10-01
Recent Australian legislative and policy changes can benefit people of trans and/or non-binary experience (e.g. men assigned female with stereotypically 'female' bodies, women assigned male with stereotypically 'male' bodies, and people who identify as genderqueer, agender [having no gender], bi-gender [having two genders] or another gender option). These populations often experience cisgenderism, which previous research defined as 'the ideology that invalidates people's own understanding of their genders and bodies'. Some documented forms of cisgenderism include pathologising (treating people's genders and bodies as disordered) and misgendering (disregarding people's own understanding and classifications of their genders and bodies). This system of classifying people's lived experiences of gender and body invalidation is called the cisgenderism framework. Applying the cisgenderism framework in the ageing and aged care sector can enhance service providers' ability to meet the needs of older people of trans and/or non-binary experience. © 2015 AJA Inc.
Kavianpour, Hamidreza; Vasighi, Mahdi
2017-02-01
Knowledge of the cellular attributes of proteins plays an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods to better understand protein functionality and folding patterns. Computational methods and intelligent systems can play an important role in the structural classification of proteins. Most protein sequences are stored in databanks as character strings, and a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acid alphabets according to the surrounding hydrophobicity index. Many important features hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help reveal inherent features deeply hidden in protein sequences and improve the quality of protein structural class prediction.
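The pipeline described above (sequence, then binary string, then cellular automaton image) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the one-bit hydrophobic/other grouping in `HYDROPHOBIC`, the elementary rule number 90, the periodic boundary and the number of steps are not given in the abstract, and the authors' reduced alphabets may use more than one bit per residue.

```python
# Illustrative grouping only (an assumption, not the paper's reduced alphabet).
HYDROPHOBIC = set("AVLIMFWC")


def encode_binary(sequence):
    """Map each residue to 1 (in the assumed hydrophobic set) or 0."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in sequence.upper()]


def ca_step(cells, rule):
    """One update of an elementary cellular automaton, periodic boundary."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        nxt.append((rule >> index) & 1)  # look up the rule bit for this triple
    return nxt


def ca_image(sequence, rule=90, steps=8):
    """Stack the encoded sequence and its successive updates as image rows."""
    rows = [encode_binary(sequence)]
    for _ in range(steps):
        rows.append(ca_step(rows[-1], rule))
    return rows


image = ca_image("AKVDLLMKF", rule=90, steps=8)
```

Each row of `image` is one time step, so the result is a small binary picture from which texture-like features could be extracted for a classifier.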
Area under precision-recall curves for weighted and unweighted data.
Keilwagen, Jens; Grosse, Ivo; Grau, Jan
2014-01-01
Precision-recall curves are highly informative about the performance of binary classifiers, and the area under these curves is a popular scalar performance measure for comparing different classifiers. However, for many applications class labels are not provided with absolute certainty, but with some degree of confidence, often reflected by weights or soft labels assigned to data points. Computing the area under the precision-recall curve requires interpolating between adjacent supporting points, but previous interpolation schemes are not directly applicable to weighted data. Hence, even in cases where weights were available, they had to be neglected for assessing classifiers using precision-recall curves. Here, we propose an interpolation for precision-recall curves that can also be used for weighted data, and we derive conditions for classification scores yielding the maximum and minimum area under the precision-recall curve. We investigate accordances and differences of the proposed interpolation and previous ones, and we demonstrate that taking into account existing weights of test data is important for the comparison of classifiers. PMID:24651729
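As a rough illustration of how soft labels can enter a precision-recall summary, the sketch below computes a step-wise (non-interpolated) average precision in which each data point contributes its soft label to the true positives and the remainder to the false positives. This is not the interpolation proposed in the abstract above. The function name `weighted_average_precision` and the convention that a soft label in [0, 1] is the degree of being positive are assumptions.

```python
def weighted_average_precision(scores, soft_labels):
    """Step-wise area under the precision-recall curve with soft labels.

    scores:      classifier scores, higher means more likely positive.
    soft_labels: values in [0, 1]; hard labels are the special case {0, 1}.
    """
    total_pos = sum(soft_labels)
    if total_pos <= 0:
        raise ValueError("need some positive label mass")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0.0
    area = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(order):
        # Process all points that share the same score as one threshold.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += soft_labels[order[j]]
            fp += 1.0 - soft_labels[order[j]]
            j += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return area


hard = weighted_average_precision([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
soft = weighted_average_precision([0.9, 0.8, 0.2, 0.1], [1.0, 0.5, 0.5, 0.0])
```

With hard labels and a perfect ranking the value is 1. Replacing the two middle labels by 0.5 lowers it, which shows how ignoring label confidence can change a classifier comparison.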
VizieR Online Data Catalog: SLoWPoKES-II catalog (Dhital+, 2015)
NASA Astrophysics Data System (ADS)
Dhital, S.; West, A. A.; Stassun, K. G.; Schluns, K. J.; Massey, A. P.
2015-11-01
We have identified the Sloan Low-mass Wide Pairs of Kinematically Equivalent Systems (SLoWPoKES)-II catalog of 105537 wide, low-mass binaries without using proper motions. We extend the SLoWPoKES catalog (Paper I; Dhital et al. 2010, cat. J/AJ/139/2566) by identifying binary systems with angular separations of 1-20'' based entirely on SDSS photometry and astrometry. As in Paper I, we used the Catalog Archive Server query tool (CasJobs6; http://skyserver.sdss3.org/CasJobs/) to select the sample of low-mass stars from the SDSS-DR8 star table as having r-i>=0.3 and i-z>=0.2, consistent with spectral types of K5 or later. Following Paper I (Dhital et al. 2010, cat. J/AJ/139/2566) we classified candidate pairs with a probability of chance alignment P_f<=0.05 as real binaries. We note that this limit does not have any physical motivation but was chosen to minimize the number of spurious pairs. This cut results in 105537 M dwarf (dM)+MS (see Table3), 78 white dwarf (WD)+dM (see Table5), and 184 sdM+sdM (see Table6) binary systems with separations of 1-20''. Of the dM+MS binaries, 44 are very low-mass (VLM) binary candidates (see Table4), with colors redder than the median M7 dwarf for both components. This represents a significant increase over the SLoWPoKES catalog of 1342 common proper motion (CPM) binaries that we presented in Paper I (Dhital et al. 2010, cat. J/AJ/139/2566). The SLoWPoKES and SLoWPoKES-II catalogs are available on the Filtergraph portal (http://slowpokes.vanderbilt.edu/). (4 data files).
Rekaya, Romdhane; Smith, Shannon; Hay, El Hamidi; Farhat, Nourhene; Aggrey, Samuel E
2016-01-01
Errors in the binary status of some response traits are frequent in human, animal, and plant applications. These error rates tend to differ between cases and controls because diagnostic and screening tests have different sensitivity and specificity. This increases the inaccuracies of classifying individuals into correct groups, giving rise to both false-positive and false-negative cases. The analysis of these noisy binary responses due to misclassification will undoubtedly reduce the statistical power of genome-wide association studies (GWAS). A threshold model that accommodates varying diagnostic errors between cases and controls was investigated. A simulation study was carried out where several binary data sets (case-control) were generated with varying effects for the most influential single nucleotide polymorphisms (SNPs) and different diagnostic error rate for cases and controls. Each simulated data set consisted of 2000 individuals. Ignoring misclassification resulted in biased estimates of true influential SNP effects and inflated estimates for true noninfluential markers. A substantial reduction in bias and increase in accuracy ranging from 12% to 32% was observed when the misclassification procedure was invoked. In fact, the majority of influential SNPs that were not identified using the noisy data were captured using the proposed method. Additionally, truly misclassified binary records were identified with high probability using the proposed method. The superiority of the proposed method was maintained across different simulation parameters (misclassification rates and odds ratios) attesting to its robustness.
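The effect of imperfect sensitivity and specificity on a recorded binary trait follows from elementary probability, and a short sketch makes the distortion visible. This is a generic calculation, not the authors' threshold model. The function names and the example values (true case probability 0.3, sensitivity 0.9, specificity 0.8) are assumptions chosen for illustration.

```python
def observed_case_probability(p_true, sensitivity, specificity):
    """Probability that an individual is recorded as a case.

    True cases are recorded as cases with probability `sensitivity`;
    true controls are wrongly recorded as cases with probability
    1 - `specificity`.
    """
    return sensitivity * p_true + (1.0 - specificity) * (1.0 - p_true)


def prob_true_case_given_recorded_case(p_true, sensitivity, specificity):
    """Bayes' rule: chance that a recorded case is a true case."""
    recorded = observed_case_probability(p_true, sensitivity, specificity)
    return sensitivity * p_true / recorded


recorded = observed_case_probability(0.3, 0.9, 0.8)
posterior = prob_true_case_given_recorded_case(0.3, 0.9, 0.8)
```

With these illustrative numbers a true case rate of 0.30 is recorded as 0.41, and only about two thirds of recorded cases are true cases, which is the kind of noise that biases effect estimates when it is ignored.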
Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa
2017-03-01
Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm^3, there was 80% probability of a cyst. If volume was <247 mm^3 and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier renders it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
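The two branches of the decision tree that the abstract reports can be written down directly. The sketch below encodes only those two stated branches and returns `None` elsewhere, because the abstract gives no probability for small lesions without root displacement. The function name `cyst_probability` is hypothetical, and this is an illustration of the rule structure, not a clinical tool.

```python
VOLUME_CUTOFF_MM3 = 247.0  # split value reported in the abstract


def cyst_probability(volume_mm3, root_displacement):
    """Return the reported cyst probability, or None where none is stated."""
    if volume_mm3 > VOLUME_CUTOFF_MM3:
        return 0.80
    if volume_mm3 < VOLUME_CUTOFF_MM3 and root_displacement:
        return 0.60
    return None  # branch not quantified in the abstract


large_lesion = cyst_probability(300.0, root_displacement=False)
small_displaced = cyst_probability(120.0, root_displacement=True)
small_plain = cyst_probability(120.0, root_displacement=False)
```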
NASA Astrophysics Data System (ADS)
Danandeh Mehr, Ali; Nourani, Vahid; Hrnjica, Bahrudin; Molajou, Amir
2017-12-01
The effectiveness of genetic programming (GP) for solving regression problems in hydrology has been recognized in recent studies. However, its capability to solve classification problems has not been sufficiently explored so far. This study develops and applies a novel classification-forecasting model, namely Binary GP (BGP), for teleconnection studies between sea surface temperature (SST) variations and maximum monthly rainfall (MMR) events. The BGP integrates certain types of data pre-processing and post-processing methods with conventional GP engine to enhance its ability to solve both regression and classification problems simultaneously. The model was trained and tested using SST series of Black Sea, Mediterranean Sea, and Red Sea as potential predictors as well as classified MMR events at two locations in Iran as predictand. Skill of the model was measured in regard to different rainfall thresholds and SST lags and compared to that of the hybrid decision tree-association rule (DTAR) model available in the literature. The results indicated that the proposed model can identify potential teleconnection signals of surrounding seas beneficial to long-term forecasting of the occurrence of the classified MMR events.
Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin
2013-01-01
DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus only on the binary-class problem. In this paper, we address the multiclass imbalanced classification problem, as encountered in cancer DNA microarray data, by using ensemble learning. We used a one-against-all coding strategy to transform the multiclass problem into multiple binary-class problems, and applied to each of them feature subspace, an evolved version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two correction techniques, namely decision threshold adjustment or random undersampling, into each training subset to alleviate the damage caused by class imbalance. Specifically, a support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance.
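Two of the steps named above, one-against-all coding and random undersampling, can be sketched with the standard library alone. The sketch leaves out the feature subspace step, the support vector machine base classifier and the counter voting rule, none of which the abstract specifies in enough detail to reproduce. All function names and the toy labels are assumptions.

```python
import random


def one_vs_all_labels(labels, positive_class):
    """Recode multiclass labels as 1 (positive_class) versus 0 (all others)."""
    return [1 if y == positive_class else 0 for y in labels]


def random_undersample(samples, binary_labels, rng):
    """Drop random majority-class samples until both classes are equal in size."""
    pos = [i for i, y in enumerate(binary_labels) if y == 1]
    neg = [i for i, y in enumerate(binary_labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [binary_labels[i] for i in kept]


def build_binary_tasks(samples, labels, seed=0):
    """One balanced binary training set per class."""
    rng = random.Random(seed)
    tasks = {}
    for cls in sorted(set(labels)):
        binary = one_vs_all_labels(labels, cls)
        tasks[cls] = random_undersample(samples, binary, rng)
    return tasks


# Toy skewed data: 6 samples of class A, 3 of B, 1 of C.
labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
samples = list(range(len(labels)))
tasks = build_binary_tasks(samples, labels)
```

Each entry of `tasks` is a balanced binary problem on which a base classifier could be trained. Decision threshold adjustment, the other correction named in the abstract, would instead keep all samples and shift the classifier's cut-off.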
A Factor Graph Approach to Automated GO Annotation
Spetale, Flavio E.; Tapia, Elizabeth; Krsticevic, Flavia; Roda, Fernando; Bulacio, Pilar
2016-01-01
As volume of genomic data grows, computational methods become essential for providing a first glimpse onto gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum. PMID:26771463
Bulashevska, Alla; Eils, Roland
2006-06-14
The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is a need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six datasets, among them a Gram-negative bacteria dataset, data for discriminating outer membrane proteins, and an apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and, as empirical results indicate, competitive with previously reported approaches in terms of prediction accuracy. The code for the software is available upon request.
Classifying Black Hole States with Machine Learning
NASA Astrophysics Data System (ADS)
Huppenkothen, Daniela
2018-01-01
Galactic black hole binaries are known to go through different states with apparent signatures in both X-ray light curves and spectra, leading to important implications for accretion physics as well as our knowledge of General Relativity. Existing frameworks of classification are usually based on human interpretation of low-dimensional representations of the data, and generally only apply to fairly small data sets. Machine learning, in contrast, allows for rapid classification of large, high-dimensional data sets. In this talk, I will report on advances made in classification of states observed in Black Hole X-ray Binaries, focusing on the two sources GRS 1915+105 and Cygnus X-1, and show both the successes and limitations of using machine learning to derive physical constraints on these systems.
de Moraes, Fábio R.; Neshich, Izabella A. P.; Mazoni, Ivan; Yano, Inácio H.; Pereira, José G. C.; Salim, José A.; Jardine, José G.; Neshich, Goran
2014-01-01
Protein-protein interactions are involved in nearly all regulatory processes in the cell and are considered one of the most important issues in molecular biology and pharmaceutical sciences but are still not fully understood. Structural and computational biology contributed greatly to the elucidation of the mechanism of protein interactions. In this paper, we present a collection of the physicochemical and structural characteristics that distinguish interface-forming residues (IFR) from free surface residues (FSR). We formulated a linear discriminative analysis (LDA) classifier to assess whether chosen descriptors from the BlueStar STING database (http://www.cbi.cnptia.embrapa.br/SMS/) are suitable for such a task. Receiver operating characteristic (ROC) analysis indicates that the particular physicochemical and structural descriptors used for building the linear classifier perform much better than a random classifier and in fact, successfully outperform some of the previously published procedures, whose performance indicators were recently compared by other research groups. The results presented here show that the selected set of descriptors can be utilized to predict IFRs, even when homologue proteins are missing (particularly important for orphan proteins where no homologue is available for comparative analysis/indication) or, when certain conformational changes accompany interface formation. The development of amino acid type specific classifiers is shown to increase IFR classification performance. Also, we found that the addition of an amino acid conservation attribute did not improve the classification prediction. This result indicates that the increase in predictive power associated with amino acid conservation is exhausted by adequate use of an extensive list of independent physicochemical and structural parameters that, by themselves, fully describe the nano-environment at protein-protein interfaces. 
The IFR classifier developed in this study is now integrated into the BlueStar STING suite of programs. Consequently, the prediction of protein-protein interfaces for all proteins available in the PDB is possible through STING_interfaces module, accessible at the following website: (http://www.cbi.cnptia.embrapa.br/SMS/predictions/index.html). PMID:24489849
Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework
2014-01-01
Motivation: Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. Results: We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set. PMID:24646119
Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework.
Simha, Ramanuja; Shatkay, Hagit
2014-03-19
Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance. We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set.
Stiles, Bradley G
2017-01-01
Clostridium species can make a remarkable number of different protein toxins, causing many diverse diseases in humans and animals. The binary toxins of Clostridium botulinum, C. difficile, C. perfringens, and C. spiroforme are one group of enteric-acting toxins that attack the actin cytoskeleton of various cell types. These enterotoxins consist of A (enzymatic) and B (cell binding/membrane translocation) components that assemble on the targeted cell surface or in solution, forming a multimeric complex. Once translocated into the cytosol via endosomal trafficking and acidification, the A component dismantles the filamentous actin-based cytoskeleton via mono-ADP-ribosylation of globular actin. Knowledge of cell surface receptors and how these usurped, host-derived molecules facilitate intoxication can lead to novel ways of defending against these clostridial binary toxins. A molecular-based understanding of the various steps involved in toxin internalization can also unveil therapeutic intervention points that stop the intoxication process. Furthermore, using these bacterial proteins as medicinal shuttle systems into cells provides intriguing possibilities in the future. The pertinent past and state-of-the-art present, regarding clostridial binary toxins, will be evident in this chapter.
A blood-based proteomic classifier for the molecular characterization of pulmonary nodules.
Li, Xiao-jun; Hayward, Clive; Fong, Pui-Yee; Dominguez, Michel; Hunsucker, Stephen W; Lee, Lik Wee; McLean, Matthew; Law, Scott; Butler, Heather; Schirm, Michael; Gingras, Olivier; Lamontagne, Julie; Allard, Rene; Chelsky, Daniel; Price, Nathan D; Lam, Stephen; Massion, Pierre P; Pass, Harvey; Rom, William N; Vachani, Anil; Fang, Kenneth C; Hood, Leroy; Kearney, Paul
2013-10-16
Each year, millions of pulmonary nodules are discovered by computed tomography and subsequently biopsied. Because most of these nodules are benign, many patients undergo unnecessary and costly invasive procedures. We present a 13-protein blood-based classifier that differentiates malignant and benign nodules with high confidence, thereby providing a diagnostic tool to avoid invasive biopsy on benign nodules. Using a systems biology strategy, we identified 371 protein candidates and developed a multiple reaction monitoring (MRM) assay for each. The MRM assays were applied in a three-site discovery study (n = 143) on plasma samples from patients with benign and stage IA lung cancer matched for nodule size, age, gender, and clinical site, producing a 13-protein classifier. The classifier was validated on an independent set of plasma samples (n = 104), exhibiting a negative predictive value (NPV) of 90%. Validation performance on samples from a nondiscovery clinical site showed an NPV of 94%, indicating the general effectiveness of the classifier. A pathway analysis demonstrated that the classifier proteins are likely modulated by a few transcription regulators (NF2L2, AHR, MYC, and FOS) that are associated with lung cancer, lung inflammation, and oxidative stress networks. The classifier score was independent of patient nodule size, smoking history, and age, which are risk factors used for clinical management of pulmonary nodules. Thus, this molecular test provides a potential complementary tool to help physicians in lung cancer diagnosis.
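The negative predictive value reported above is one of several quantities read off a binary confusion matrix. The sketch below shows how NPV relates to sensitivity and specificity; the labels are hypothetical toy values, not the study's data, and the function name is an assumption for illustration.

```python
def confusion_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV and NPV for 0/1 labels (1 = malignant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        # NPV: of all nodules called benign, the fraction that truly are benign.
        "npv": tn / (tn + fn),
    }


# Hypothetical toy labels for eight nodules.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

Unlike sensitivity and specificity, NPV depends on the prevalence of malignancy in the tested population, so it should be interpreted alongside the cohort composition.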
Stynen, Bram; Tournu, Hélène; Tavernier, Jan
2012-01-01
Summary: The yeast two-hybrid system pioneered the field of in vivo protein-protein interaction methods and undisputedly gave rise to a palette of ingenious techniques that are constantly pushing further the limits of the original method. Sensitivity and selectivity have improved because of various technical tricks and experimental designs. Here we present an exhaustive overview of the genetic approaches available to study in vivo binary protein interactions, based on two-hybrid and protein fragment complementation assays. These methods have been engineered and employed successfully in microorganisms such as Saccharomyces cerevisiae and Escherichia coli, but also in higher eukaryotes. From single binary pairwise interactions to whole-genome interactome mapping, the self-reassembly concept has been employed widely. Innovative studies report the use of proteins such as ubiquitin, dihydrofolate reductase, and adenylate cyclase as reconstituted reporters. Protein fragment complementation assays have extended the possibilities in protein-protein interaction studies, with technologies that enable spatial and temporal analyses of protein complexes. In addition, one-hybrid and three-hybrid systems have broadened the types of interactions that can be studied and the findings that can be obtained. Applications of these technologies are discussed, together with the advantages and limitations of the available assays. PMID:22688816
Identification and characterization of neutrophil extracellular trap shapes in flow cytometry
NASA Astrophysics Data System (ADS)
Ginley, Brandon; Emmons, Tiffany; Sasankan, Prabhu; Urban, Constantin; Segal, Brahm H.; Sarder, Pinaki
2017-03-01
Neutrophil extracellular trap (NET) formation is an alternate immunologic weapon used mainly by neutrophils. Chromatin backbones fused with proteins derived from granules are shot like projectiles onto foreign invaders. It is thought that this mechanism is highly anti-microbial, aids in preventing bacterial dissemination, is used to break down structures several sizes larger than neutrophils themselves, and may have several more uses yet unknown. NETs have been implicated in a wide array of systemic host immune defenses, including sepsis, autoimmune diseases, and cancer. Existing methods used to visually quantify NETotic versus non-NETotic shapes are extremely time-consuming and subject to user bias. These limitations are obstacles to developing NETs as prognostic biomarkers and therapeutic targets. We propose an automated pipeline for quantitatively detecting neutrophil and NET shapes captured using a flow cytometry-imaging system. Our method uses contrast limited adaptive histogram equalization to improve signal intensity in dimly illuminated NETs. From the contrast improved image, fixed value thresholding is applied to convert the image to binary. Feature extraction is performed on the resulting binary image, by calculating region properties of the resulting foreground structures. Classification of the resulting features is performed using Support Vector Machine. Our method classifies NETs from neutrophils without traps at 0.97/0.96 sensitivity/specificity on n = 387 images, and is 1500 times faster than manual classification, per sample. Our method can be extended to rapidly analyze whole-slide immunofluorescence tissue images for NET classification, and has potential to streamline the quantification of NETs for patients with diseases associated with cancer and autoimmunity.
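The two middle steps of the pipeline described above, fixed-value thresholding and region-property extraction, can be sketched in plain Python. This is an illustrative sketch only: the threshold level, the 4-connectivity rule, the use of area as the sole region property, and the toy image are all assumptions, and the contrast equalization and Support Vector Machine stages are omitted.

```python
def threshold(image, level):
    """Fixed-value thresholding: pixels above `level` become foreground (1)."""
    return [[1 if px > level else 0 for px in row] for row in image]


def region_areas(binary):
    """Areas of 4-connected foreground regions, found by iterative flood fill."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                stack = [(r, c)]
                seen[r][c] = True
                area = 0
                while stack:
                    y, x = stack.pop()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                areas.append(area)
    return areas


# Hypothetical 4x4 grayscale image with two bright structures.
image = [
    [10, 200, 210, 0],
    [0, 190, 0, 0],
    [0, 0, 0, 180],
    [0, 0, 0, 170],
]
binary = threshold(image, level=128)
areas = region_areas(binary)
print(areas)
```

In practice each region would yield a feature vector (area, perimeter, eccentricity, and so on) that is passed to the classifier.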
Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.
Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi
2014-01-01
In this paper, some methods for ensemble learning of protein fold recognition based on a decision tree (DT) are compared and contrasted against each other over three datasets taken from the literature. According to previously reported studies, the features of the datasets are divided into some groups. Then, for each of these groups, three ensemble classifiers, namely, random forest, rotation forest and AdaBoost.M1 are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of types random forest, rotation forest and AdaBoost.M1. Finally, the three different classifiers achieved are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method, is the best one in comparison to previously applied methods in terms of classification accuracy.
Rough Set Based Splitting Criterion for Binary Decision Tree Classifiers
2006-09-26
Alata O. Fernandez-Maloigne C., and Ferrie J.C. (2001). Unsupervised Algorithm for the Segmentation of Three-Dimensional Magnetic Resonance Brain ... instinctual and learned responses in the brain, causing it to make decisions based on patterns in the stimuli. Using this deceptively simple process ... 2001. [2] Bohn C. (1997). An Incremental Unsupervised Learning Scheme for Function Approximation. In: Proceedings of the 1997 IEEE International
O'Brien, J R; Murphy, J M
1993-01-01
Pink pigmented bacteria were isolated from a blood bank water purification unit, a municipal town water supply (tap water), and an island (untreated) ground water source. A total of thirteen strains, including two reference strains of pink pigmented bacteria, were compared in a numerical phenotypic study using 119 binary characters. Three clusters were derived; one major cluster of eleven strains was subdivided into two sub-clusters on the basis of methanol utilization. Five strains were facultative methylotrophs and were classified as Methylobacterium mesophilicum biovar 1. The other six strains did not utilize methanol, but on the basis of high phenotypic similarity of 83.6% were classified as M. mesophilicum biovar 2. The single reference strain comprising cluster 2, Pseudomonas extorquens NCIB 9399, was assigned to the genus Methylobacterium and classified as M. extorquens. Cluster 3 was the single reference strain Rhizobium CB 376.
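A percentage similarity between strains scored on binary characters can be computed in several ways. The abstract does not say which coefficient was used, so the sketch below assumes the simple matching coefficient, a common choice in numerical taxonomy, and uses made-up character vectors far shorter than the 119 characters of the study.

```python
def simple_matching(a, b):
    """Fraction of binary characters on which two strains agree (both 0 or both 1)."""
    if len(a) != len(b):
        raise ValueError("strains must be scored on the same characters")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical scores for two strains on ten binary characters.
strain_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
strain_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
similarity = simple_matching(strain_a, strain_b)
print(f"{similarity:.1%}")
```

A matrix of such pairwise values is what a clustering step would then consume to derive the clusters and sub-clusters.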
NASA Astrophysics Data System (ADS)
Bagchi, Biman; Roy, Susmita; Ghosh, Rikhia
2014-03-01
Aqueous binary mixtures such as water-DMSO, water-urea, and water-ethanol are known to serve as denaturants of a host of proteins, although the detailed mechanism is often not known. Here we combine studies on several proteins in multiple binary mixtures to obtain a unified understanding of the phenomenon. We compare with experiments to support the simulation findings. The proteins considered include (i) chicken villin head piece (HP-36), (ii) immunoglobulin binding protein G (GB1), (iii) myoglobin and (iv) lysozyme. We find that for amphiphilic solvents like DMSO, the hydrophobic groups and the strong hydrogen bonding ability of the >S=O oxygen atom act together to facilitate the unfolding. However, hydrophilic solvents like urea, owing to the presence of more hydrophilic ends (C=O and two NH2), have a higher propensity to form hydrogen bonds with the side-chain residues and backbone of a beta-sheet than with those of an alpha helix. Such diversity among the unfolding pathways of a given protein in different chemical environments is especially characterized by the preferential solvation of a particular secondary structure.
Homology search with binary and trinary scoring matrices.
Smith, Scott F
2006-01-01
Protein homology search can be accelerated with the use of bit-parallel algorithms in conjunction with constraints on the values contained in the scoring matrices. Trinary scoring matrices (containing only the values -1, 0, and 1) allow for significant acceleration without significant reduction in the receiver operating characteristic (ROC) score of a Smith-Waterman search. Binary scoring matrices (containing the values 0 and 1) result in some reduction in ROC score, but result in even more acceleration. Binary scoring matrices and five-bit saturating scores can be used for fast prefilters to the Smith-Waterman algorithm.
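To make the constraint on the scoring matrix concrete, the sketch below runs a plain (not bit-parallel) Smith-Waterman local alignment with a trinary substitution score and an optional saturating cap. The similarity groups, the linear gap penalty of 1, and the function names are assumptions for illustration; the abstract's actual matrices and its bit-parallel implementation are not reproduced here.

```python
# Hypothetical pairs of residues treated as "similar" (score 0).
SIMILAR = {frozenset("IL"), frozenset("DE"), frozenset("KR")}


def trinary_score(a, b, similar=SIMILAR):
    """Trinary substitution score: 1 for identity, 0 for similar, -1 otherwise."""
    if a == b:
        return 1
    if frozenset((a, b)) in similar:
        return 0
    return -1


def smith_waterman(s, t, gap=1, cap=None):
    """Best local alignment score; `cap` saturates cell values (31 for five bits)."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        cur = [0]
        for j, b in enumerate(t, start=1):
            score = max(
                0,
                prev[j - 1] + trinary_score(a, b),  # match or mismatch
                prev[j] - gap,                      # gap in t
                cur[j - 1] - gap,                   # gap in s
            )
            if cap is not None:
                score = min(score, cap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


print(smith_waterman("AIK", "ALK"))
print(smith_waterman("A" * 40, "A" * 40, cap=31))
```

Because every cell value changes by at most one per step and can be capped at 31, the scores fit in a few bits, which is what makes a bit-parallel encoding of this kind attractive for a fast prefilter.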
Binary Oscillatory Crossflow Electrophoresis
NASA Technical Reports Server (NTRS)
Molloy, Richard F.; Gallagher, Christopher T.; Leighton, David T., Jr.
1996-01-01
We present preliminary results of our implementation of a novel electrophoresis separation technique: Binary Oscillatory Crossflow Electrophoresis (BOCE). The technique utilizes the interaction of two driving forces, an oscillatory electric field and an oscillatory shear flow, to create an active binary filter for the separation of charged species. Analytical and numerical studies have indicated that this technique is capable of separating proteins with electrophoretic mobilities differing by less than 10%. With an experimental device containing a separation chamber 20 cm long, 5 cm wide, and 1 mm thick, an order of magnitude increase in throughput over commercially available electrophoresis devices is theoretically possible.
Identification of DNA-Binding Proteins Using Structural, Electrostatic and Evolutionary Features
Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2009-01-01
Summary: DNA-binding proteins (DBPs) often take part in various crucial processes of the cell's life cycle. Therefore, the identification and characterization of these proteins are of great importance. We present here a random forests classifier for identifying DBPs among proteins with known three-dimensional structures. First, clusters of evolutionarily conserved regions (patches) on the protein's surface are detected using the PatchFinder algorithm; previous studies showed that these regions are typically the proteins' functionally important regions. Next, we train a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein including its dipole moment. Using 10-fold cross validation on a dataset of 138 DNA-binding proteins and 110 proteins which do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of previously published methods. Furthermore, when we tested 5 different methods on 11 new DBPs which did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA. PMID:19233205
Bayesian Redshift Classification of Emission-line Galaxies with Photometric Equivalent Widths
NASA Astrophysics Data System (ADS)
Leung, Andrew S.; Acquaviva, Viviana; Gawiser, Eric; Ciardullo, Robin; Komatsu, Eiichiro; Malz, A. I.; Zeimann, Gregory R.; Bridge, Joanna S.; Drory, Niv; Feldmeier, John J.; Finkelstein, Steven L.; Gebhardt, Karl; Gronwall, Caryl; Hagen, Alex; Hill, Gary J.; Schneider, Donald P.
2017-07-01
We present a Bayesian approach to the redshift classification of emission-line galaxies when only a single emission line is detected spectroscopically. We consider the case of surveys for high-redshift Lyα-emitting galaxies (LAEs), which have traditionally been classified via an inferred rest-frame equivalent width (EW, W_Lyα) greater than 20 Å. Our Bayesian method relies on known prior probabilities in measured emission-line luminosity functions and EW distributions for the galaxy populations, and returns the probability that an object in question is an LAE given the characteristics observed. This approach will be directly relevant for the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), which seeks to classify ~10^6 emission-line galaxies into LAEs and low-redshift [O II] emitters. For a simulated HETDEX catalog with realistic measurement noise, our Bayesian method recovers 86% of LAEs missed by the traditional W_Lyα > 20 Å cutoff over 2 < z < 3, outperforming the EW cut in both contamination and incompleteness. This is due to the method’s ability to trade off between the two types of binary classification error by adjusting the stringency of the probability requirement for classifying an observed object as an LAE. In our simulations of HETDEX, this method reduces the uncertainty in cosmological distance measurements by 14% with respect to the EW cut, equivalent to recovering 29% more cosmological information. Rather than using binary object labels, this method enables the use of classification probabilities in large-scale structure analyses. It can be applied to narrowband emission-line surveys as well as upcoming large spectroscopic surveys including Euclid and WFIRST.
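The core of such a two-class Bayesian classification is a direct application of Bayes' theorem, followed by a probability threshold that sets the trade-off between contamination and incompleteness. The sketch below is a generic illustration: the likelihood and prior values are hypothetical placeholders, whereas in the paper they derive from luminosity functions and equivalent-width distributions that are not reproduced here.

```python
def posterior_lae(like_lae, like_oii, prior_lae):
    """P(LAE | data) for two exclusive classes, LAE and [O II] emitter."""
    numerator = like_lae * prior_lae
    evidence = numerator + like_oii * (1.0 - prior_lae)
    return numerator / evidence


def classify(p_lae, threshold=0.5):
    """Label an object; a stricter threshold lowers contamination of the LAE sample."""
    return "LAE" if p_lae >= threshold else "[O II]"


# Hypothetical values: the data are 8x more likely under the LAE model,
# but LAEs are assumed to make up only 20% of detections a priori.
p = posterior_lae(like_lae=0.8, like_oii=0.1, prior_lae=0.2)
print(round(p, 3), classify(p), classify(p, threshold=0.9))
```

Keeping the posterior itself, rather than only the thresholded label, is what allows classification probabilities to be carried into later analyses.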
Classification of brain tumours using short echo time 1H MR spectra
NASA Astrophysics Data System (ADS)
Devos, A.; Lukas, L.; Suykens, J. A. K.; Vanhamme, L.; Tate, A. R.; Howe, F. A.; Majós, C.; Moreno-Torres, A.; van der Graaf, M.; Arús, C.; Van Huffel, S.
2004-09-01
The purpose was to objectively compare the application of several techniques and the use of several input features for brain tumour classification using Magnetic Resonance Spectroscopy (MRS). Short echo time 1H MRS signals from patients with glioblastomas (n = 87), meningiomas (n = 57), metastases (n = 39), and astrocytomas grade II (n = 22) were provided by six centres in the European Union funded INTERPRET project. Linear discriminant analysis, least squares support vector machines (LS-SVM) with a linear kernel and LS-SVM with radial basis function kernel were applied and evaluated over 100 stratified random splittings of the dataset into training and test sets. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of binary classifiers, while the percentage of correct classifications was used to evaluate the multiclass classifiers. The influence of several factors on the classification performance has been tested: L2- vs. water normalization, magnitude vs. real spectra and baseline correction. The effect of input feature reduction was also investigated by using only the selected frequency regions containing the most discriminatory information, and peak integrated values. Using L2-normalized complete spectra the automated binary classifiers reached a mean test AUC of more than 0.95, except for glioblastomas vs. metastases. Similar results were obtained for all classification techniques and input features except for water normalized spectra, where classification performance was lower. This indicates that data acquisition and processing can be simplified for classification purposes, excluding the need for separate water signal acquisition, baseline correction or phasing.
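The AUC used above to score binary classifiers equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the scores and labels are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs for five spectra (1 = first tumour class).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
value = auc(scores, labels)
print(round(value, 3))
```

Averaging this quantity over repeated stratified train/test splits gives a mean test AUC of the kind reported in the abstract.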
Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords
Koyabu, Shun; Phan, Thi Thanh Thuy; Ohkawa, Takenao
2015-01-01
For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. The classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword that is directly related to interaction, such as “bind” or “interact”, plays an important role in training classifiers. We call such a keyword, which affects the capability of the classifier, a dominant keyword. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that derives imbalanced classification results is tentatively assumed to be a dominant keyword initially. Then the classifiers are separately trained from the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers. The assumption is updated by the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction. PMID:26783534
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
The aim was to investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
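Of the three decision fusion methods listed, weighted majority vote is the simplest to illustrate. The sketch below assumes each classifier's weight is some measure of its reliability, such as validation accuracy; the abstract does not state how weights were derived, and the classifier names, weights, and labels here are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    totals = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += weights[name]
    return max(totals, key=totals.get)


# Hypothetical per-classifier decisions for one accelerometer window.
predictions = {"tree": "walk", "knn": "run", "svm": "run", "nn": "walk"}
# Hypothetical weights, for example accuracy on held-out validation data.
weights = {"tree": 0.70, "knn": 0.80, "svm": 0.85, "nn": 0.75}

decision = weighted_majority_vote(predictions, weights)
print(decision)
```

With equal weights this reduces to a plain majority vote, which here would be a tie; the weighting is what breaks it in favour of the more reliable classifiers.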
Classifier for gravitational-wave inspiral signals in nonideal single-detector data
NASA Astrophysics Data System (ADS)
Kapadia, S. J.; Dent, T.; Dal Canton, T.
2017-11-01
We describe a multivariate classifier for candidate events in a templated search for gravitational-wave (GW) inspiral signals from neutron-star-black-hole (NS-BH) binaries, in data from ground-based detectors where sensitivity is limited by non-Gaussian noise transients. The standard signal-to-noise ratio (SNR) and chi-squared test for inspiral searches use only properties of a single matched filter at the time of an event; instead, we propose a classifier using features derived from a bank of inspiral templates around the time of each event, and also from a search using approximate sine-Gaussian templates. The classifier thus extracts additional information from strain data to discriminate inspiral signals from noise transients. We evaluate a random forest classifier on a set of single-detector events obtained from realistic simulated advanced LIGO data, using simulated NS-BH signals added to the data. The new classifier detects a factor of 1.5-2 more signals at low false positive rates as compared to the standard "reweighted SNR" statistic, and does not require the chi-squared test to be computed. Conversely, if only the SNR and chi-squared values of single-detector events are available, random forest classification performs nearly identically to the reweighted SNR.
Gossage, Lucy; Pires, Douglas E. V.; Olivera-Nappa, Álvaro; Asenjo, Juan; Bycroft, Mark; Blundell, Tom L.; Eisen, Tim
2014-01-01
Mutations in the von Hippel–Lindau (VHL) gene are pathogenic in VHL disease, congenital polycythaemia and clear cell renal carcinoma (ccRCC). pVHL forms a ternary complex with elongin C and elongin B, critical for pVHL stability and function, which interacts with Cullin-2 and RING-box protein 1 to target hypoxia-inducible factor for polyubiquitination and proteasomal degradation. We describe a comprehensive database of missense VHL mutations linked to experimental and clinical data. We use predictions from in silico tools to link the functional effects of missense VHL mutations to phenotype. The risk of ccRCC in VHL disease is linked to the degree of destabilization resulting from missense mutations. An optimized binary classification system (symphony), which integrates predictions from five in silico methods, can predict the risk of ccRCC associated with VHL missense mutations with high sensitivity and specificity. We use symphony to generate predictions for risk of ccRCC for all possible VHL missense mutations and present these predictions, in association with clinical and experimental data, in a publicly available, searchable web server. PMID:24969085
NASA Astrophysics Data System (ADS)
Takeda, Nanami; Hoshino, Satoshi; Xie, Lixin; Chen, Shuo; Ikeuchi, Issei; Natsui, Ryuichi; Nakura, Kensuke; Yabuuchi, Naoaki
2017-11-01
A binary system of LiMoO2 - x LiF (0 ≤ x ≤ 2), Li1+xMoO2Fx, is systematically studied as a source of potential positive electrode materials for rechargeable Li batteries. Single-phase and nanosized samples in this binary system are successfully prepared by using a mechanical milling route. Crystal structures and Li storage properties of the binary system are also examined. Li2MoO2F (x = 1), which is classified as a cation-/anion-disordered rocksalt-type structure and is a thermodynamically metastable phase, delivers a large reversible capacity of over 300 mAh g^-1 in Li cells with good reversibility. Highly reversible Li storage is realized for Li2MoO2F consisting of nanosized particles based on Mo3+/Mo5+ two-electron redox as evidenced by ex-situ X-ray absorption spectroscopy coupled with ex-situ X-ray diffractometry. Moreover, the presence of the most electronegative element in the framework structure effectively increases the electrode potential of Mo redox through an inductive effect. From these results, the potential of nanosized lithium molybdenum oxyfluorides as high-capacity positive electrode materials for rechargeable Li batteries is discussed.
AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au
In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.
Xia, Jie; Hsieh, Jui-Hua; Hu, Huabin; Wu, Song; Wang, Xiang Simon
2017-06-26
Structure-based virtual screening (SBVS) has become an indispensable technique for hit identification at the early stage of drug discovery. However, the accuracy of current scoring functions is not high enough to confer success to every target and thus remains to be improved. Previously, we had developed binary pose filters (PFs) using knowledge derived from the protein-ligand interface of a single X-ray structure of a specific target. This novel approach had been validated as an effective way to improve ligand enrichment. Continuing from it, in the present work we attempted to incorporate knowledge collected from diverse protein-ligand interfaces of multiple crystal structures of the same target to build PF ensembles (PFEs). Toward this end, we first constructed a comprehensive data set to meet the requirements of ensemble modeling and validation. This set contains 10 diverse targets, 118 well-prepared X-ray structures of protein-ligand complexes, and large benchmarking actives/decoys sets. Notably, we designed a unique workflow of two-layer classifiers based on the concept of ensemble learning and applied it to the construction of PFEs for all of the targets. Through extensive benchmarking studies, we demonstrated that (1) coupling PFE with Chemgauss4 significantly improves the early enrichment of Chemgauss4 itself and (2) PFEs show greater consistency in boosting early enrichment and larger overall enrichment than our prior PFs. In addition, we analyzed the pairwise topological similarities among cognate ligands used to construct PFEs and found that it is the higher chemical diversity of the cognate ligands that leads to the improved performance of PFEs. Taken together, the results so far prove that the incorporation of knowledge from diverse protein-ligand interfaces by ensemble modeling is able to enhance the screening competence of SBVS scoring functions.
Pai, Priyadarshini P; Mondal, Sukanta
2016-10-01
Proteins interact with carbohydrates to perform various cellular interactions. Of the many carbohydrate ligands that proteins bind with, mannose constitutes an important class, playing important roles in host defense mechanisms. Accurate identification of mannose-interacting residues (MIR) may provide important clues to decipher the underlying mechanisms of protein-mannose interactions during infections. This study proposes an approach using an ensemble of base classifiers for prediction of MIR using their evolutionary information in the form of position-specific scoring matrix. The base classifiers are random forests trained on different subsets of the training data set Dset128 using 10-fold cross-validation. The optimized ensemble of base classifiers, MOWGLI, is then used to predict MIR on protein chains of the test data set Dtestset29, where it showed a promising performance with 92.0% accurate prediction. An overall improvement of 26.6% in precision was observed upon comparison with the state of the art. It is hoped that this approach, yielding enhanced predictions, could eventually be used for applications in drug design and vaccine development.
Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.
Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2009-04-10
DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
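The evaluation protocol in the abstract above (10-fold cross-validation, reported as sensitivity and specificity) can be sketched as follows. This is a minimal, classifier-agnostic illustration: the fold assignment scheme, the function names, and the toy labels are assumptions, and no random forest is trained here. Only the data set size (138 + 110 = 248 proteins) is taken from the abstract.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; fold i holds out every k-th sample starting at i."""
    for i in range(k):
        test = list(range(i, n_samples, k))
        held_out = set(test)
        train = [j for j in range(n_samples) if j not in held_out]
        yield train, test


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels coded 1 (binds DNA) and 0 (does not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# 138 DBPs + 110 non-binders, as in the abstract; the split itself is illustrative.
folds = list(k_fold_indices(138 + 110, k=10))

# Toy predictions (not from the paper) to show the two metrics.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In practice the held-out predictions from all ten folds would be pooled before computing the two rates, so that every protein is scored exactly once by a model that never saw it during training.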
CXOGBS J173620.2-293338: A candidate symbiotic X-ray binary associated with a bulge carbon star
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hynes, Robert I.; Britt, C. T.; Johnson, C. B.
2014-01-01
The Galactic Bulge Survey (GBS) is a wide but shallow X-ray survey of regions above and below the Plane in the Galactic Bulge. It was performed using the Chandra X-ray Observatory's ACIS camera. The survey is primarily designed to find and classify low luminosity X-ray binaries. The combination of the X-ray depth of the survey and the accessibility of optical and infrared counterparts makes this survey ideally suited to identification of new symbiotic X-ray binaries (SyXBs) in the Bulge. We consider the specific case of the X-ray source CXOGBS J173620.2-293338. It is coincident to within 1 arcsec with a very red star, showing a carbon star spectrum and irregular variability in the Optical Gravitational Lensing Experiment data. We classify the star as a late C-R type carbon star based on its spectral features, photometric properties, and variability characteristics, although a low-luminosity C-N type cannot be ruled out. The brightness of the star implies it is located in the Bulge, and its photometric properties are overall consistent with the Bulge carbon star population. Given the rarity of carbon stars in the Bulge, we estimate the probability of such a close chance alignment of any GBS source with a carbon star to be ≲ 10^-3, suggesting that this is likely to be a real match. If the X-ray source is indeed associated with the carbon star, then the X-ray luminosity is around 9 × 10^32 erg s^-1. Its characteristics are consistent with a low luminosity SyXB, or possibly a low accretion rate white dwarf symbiotic.
Brain amyloidosis ascertainment from cognitive, imaging, and peripheral blood protein measures
Hwang, Kristy S.; Avila, David; Elashoff, David; Kohannim, Omid; Teng, Edmond; Sokolow, Sophie; Jack, Clifford R.; Jagust, William J.; Shaw, Leslie; Trojanowski, John Q.; Weiner, Michael W.; Thompson, Paul M.
2015-01-01
Background: The goal of this study was to identify a clinical biomarker signature of brain amyloidosis in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI1) mild cognitive impairment (MCI) cohort. Methods: We developed a multimodal biomarker classifier for predicting brain amyloidosis using cognitive, imaging, and peripheral blood protein ADNI1 MCI data. We used CSF β-amyloid 1–42 (Aβ42) ≤192 pg/mL as proxy measure for Pittsburgh compound B (PiB)-PET standard uptake value ratio ≥1.5. We trained our classifier in the subcohort with CSF Aβ42 but no PiB-PET data and tested its performance in the subcohort with PiB-PET but no CSF Aβ42 data. We also examined the utility of our biomarker signature for predicting disease progression from MCI to Alzheimer dementia. Results: The CSF training classifier selected Mini-Mental State Examination, Trails B, Auditory Verbal Learning Test delayed recall, education, APOE genotype, interleukin 6 receptor, clusterin, and ApoE protein, and achieved leave-one-out accuracy of 85% (area under the curve [AUC] = 0.8). The PiB testing classifier achieved an AUC of 0.72, and when classifier self-tuning was allowed, AUC = 0.74. The 36-month disease-progression classifier achieved AUC = 0.75 and accuracy = 71%. Conclusions: Automated classifiers based on cognitive and peripheral blood protein variables can identify the presence of brain amyloidosis with a modest level of accuracy. Such methods could have implications for clinical trial design and enrollment in the near future. Classification of evidence: This study provides Class II evidence that a classification algorithm based on cognitive, imaging, and peripheral blood protein measures identifies patients with brain amyloid on PiB-PET with moderate accuracy (sensitivity 68%, specificity 78%). PMID:25609767
Brain amyloidosis ascertainment from cognitive, imaging, and peripheral blood protein measures.
Apostolova, Liana G; Hwang, Kristy S; Avila, David; Elashoff, David; Kohannim, Omid; Teng, Edmond; Sokolow, Sophie; Jack, Clifford R; Jagust, William J; Shaw, Leslie; Trojanowski, John Q; Weiner, Michael W; Thompson, Paul M
2015-02-17
The goal of this study was to identify a clinical biomarker signature of brain amyloidosis in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI1) mild cognitive impairment (MCI) cohort. We developed a multimodal biomarker classifier for predicting brain amyloidosis using cognitive, imaging, and peripheral blood protein ADNI1 MCI data. We used CSF β-amyloid 1-42 (Aβ42) ≤ 192 pg/mL as proxy measure for Pittsburgh compound B (PiB)-PET standard uptake value ratio ≥ 1.5. We trained our classifier in the subcohort with CSF Aβ42 but no PiB-PET data and tested its performance in the subcohort with PiB-PET but no CSF Aβ42 data. We also examined the utility of our biomarker signature for predicting disease progression from MCI to Alzheimer dementia. The CSF training classifier selected Mini-Mental State Examination, Trails B, Auditory Verbal Learning Test delayed recall, education, APOE genotype, interleukin 6 receptor, clusterin, and ApoE protein, and achieved leave-one-out accuracy of 85% (area under the curve [AUC] = 0.8). The PiB testing classifier achieved an AUC of 0.72, and when classifier self-tuning was allowed, AUC = 0.74. The 36-month disease-progression classifier achieved AUC = 0.75 and accuracy = 71%. Automated classifiers based on cognitive and peripheral blood protein variables can identify the presence of brain amyloidosis with a modest level of accuracy. Such methods could have implications for clinical trial design and enrollment in the near future. This study provides Class II evidence that a classification algorithm based on cognitive, imaging, and peripheral blood protein measures identifies patients with brain amyloid on PiB-PET with moderate accuracy (sensitivity 68%, specificity 78%). © 2015 American Academy of Neurology.
Anavi, Yaron; Kogan, Ilya; Gelbart, Elad; Geva, Ofer; Greenspan, Hayit
2015-08-01
In this work various approaches are investigated for X-ray image retrieval and specifically chest pathology retrieval. Given a query image taken from a data set of 443 images, the objective is to rank images according to similarity. Different features, including binary features, texture features, and deep learning (CNN) features are examined. In addition, two approaches are investigated for the retrieval task. One approach is based on the distance of image descriptors using the above features (hereon termed the "descriptor"-based approach); the second approach ("classification"-based approach) is based on a probability descriptor, generated by a pair-wise classification of each two classes (pathologies) and their decision values using an SVM classifier. Best results are achieved using deep learning features in a classification scheme.
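The "descriptor"-based approach described above amounts to ranking database images by the distance between their feature vectors and the query's feature vector. The sketch below shows that ranking step only, assuming Euclidean distance and tiny made-up two-dimensional descriptors; the abstract does not specify the distance measure, and real descriptors (binary, texture, or CNN features) would be far higher-dimensional.

```python
import math


def rank_by_descriptor_distance(query, database):
    """Return database keys ordered from most to least similar to the query descriptor."""
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return sorted(database, key=lambda name: euclidean(query, database[name]))


# Hypothetical image identifiers and descriptors, for illustration only.
database = {
    "img_a": [0.1, 0.9],
    "img_b": [0.8, 0.2],
    "img_c": [0.2, 0.7],
}
ranking = rank_by_descriptor_distance([0.15, 0.85], database)
```

The "classification"-based approach in the abstract would replace the raw feature vector with a vector of pairwise SVM decision values before the same ranking step.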
Identifying Drug-Target Interactions with Decision Templates.
Yan, Xiao-Ying; Zhang, Shao-Wu
2018-01-01
During the development process of new drugs, identification of the drug-target interactions is a primary concern. However, chemical or biological experiments are limited in coverage and incur a huge cost in both time and money. Based on drug similarity and target similarity, chemogenomic methods can predict potential drug-target interactions (DTIs) on a large scale and do not require target structures or ligand entries. It is also necessary to reflect the cases in which drugs with different structures interact with common targets and targets with dissimilar sequences interact with the same drugs. In addition, though several other similarity metrics have been developed to predict DTIs, the combination of multiple similarity metrics (especially heterogeneous similarities) is too naïve to sufficiently explore the multiple similarities. In this paper, based on Gene Ontology and pathway annotation, we introduce two novel target similarity metrics to address the above issues. More importantly, we propose a more effective strategy via decision template to integrate multiple classifiers designed with multiple similarity metrics. In the scenarios that predict existing targets for new drugs and predict approved drugs for new protein targets, the results on the DTI benchmark datasets show that our target similarity metrics are able to enhance the predictive accuracies in both scenarios. The elaborate fusion strategy of multiple classifiers also has better predictive power than the naïve combination of multiple similarity metrics. Compared with two other state-of-the-art approaches on the four popular benchmark datasets of binary drug-target interactions, our method achieves the best results in terms of AUC and AUPR for predicting available targets for new drugs (S2), and predicting approved drugs for new protein targets (S3). These results demonstrate that our method can effectively predict the drug-target interactions.
The software package is freely available at https://github.com/NwpuSY/DT_all.git for academic users. Copyright © Bentham Science Publishers.
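For readers unfamiliar with decision templates, the sketch below shows the general fusion idea as it is commonly described: each sample's outputs from several classifiers form a decision profile, the template of a class is the mean profile of its training samples, and a new sample is assigned to the class whose template is nearest. This is a generic illustration under those assumptions, not the authors' implementation; the function names, the squared Euclidean distance, and the toy scores are all hypothetical.

```python
def build_templates(profiles, labels):
    """Average the decision profiles of each class to obtain one template per class."""
    grouped = {}
    for profile, label in zip(profiles, labels):
        grouped.setdefault(label, []).append(profile)
    return {
        label: [sum(column) / len(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify_by_template(profile, templates):
    """Return the class whose template is closest (squared Euclidean) to the profile."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(sorted(templates), key=lambda label: squared_distance(profile, templates[label]))


# Each profile holds the interaction scores from two hypothetical base classifiers,
# e.g. one per similarity metric. Label 1 = interacting pair, 0 = non-interacting.
train_profiles = [[0.9, 0.8], [0.7, 0.9], [0.2, 0.1], [0.3, 0.2]]
train_labels = [1, 1, 0, 0]
templates = build_templates(train_profiles, train_labels)
predicted = classify_by_template([0.6, 0.7], templates)
```

Compared with simply averaging similarity scores, the template keeps the typical pattern of agreement and disagreement among the classifiers for each class.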
How I Learned to Stop Worrying and Love Eclipsing Binaries
NASA Astrophysics Data System (ADS)
Moe, Maxwell Cassady
Relatively massive B-type stars with closely orbiting stellar companions can evolve to produce Type Ia supernovae, X-ray binaries, millisecond pulsars, mergers of neutron stars, gamma ray bursts, and sources of gravitational waves. However, the formation mechanism, intrinsic frequency, and evolutionary processes of B-type binaries are poorly understood. As of 2012, the binary statistics of massive stars had not been measured at low metallicities, extreme mass ratios, or intermediate orbital periods. This thesis utilizes large data sets of eclipsing binaries to measure the physical properties of B-type binaries in these previously unexplored portions of the parameter space. The updated binary statistics provide invaluable insight into the formation of massive stars and binaries as well as reliable initial conditions for population synthesis studies of binary star evolution. We first compare the properties of B-type eclipsing binaries in our Milky Way Galaxy and the nearby Magellanic Cloud Galaxies. We model the eclipsing binary light curves and perform detailed Monte Carlo simulations to recover the intrinsic properties and distributions of the close binary population. We find the frequency, period distribution, and mass-ratio distribution of close B-type binaries do not significantly depend on metallicity or environment. These results indicate the formation of massive binaries is relatively insensitive to their chemical abundances or immediate surroundings. Second, we search for low-mass eclipsing companions to massive B-type stars in the Large Magellanic Cloud Galaxy. In addition to finding such extreme mass-ratio binaries, we serendipitously discover a new class of eclipsing binaries. Each system comprises a massive B-type star that is fully formed and a nascent low-mass companion that is still contracting toward its normal phase of evolution.
The large low-mass secondaries discernibly reflect much of the light they intercept from the hot B-type stars, thereby producing sinusoidal variations in perceived brightness as they orbit. These nascent eclipsing binaries are embedded in the hearts of star-forming emission nebulae, and therefore provide a unique snapshot into the formation and evolution of massive binaries and stellar nurseries. We next examine a large sample of B-type eclipsing binaries with intermediate orbital periods. To achieve such a task, we develop an automated pipeline to classify the eclipsing binaries, measure their physical properties from the observed light curves, and recover the intrinsic binary statistics by correcting for selection effects. We find the population of massive binaries at intermediate separations differs from that orbiting in close proximity. Close massive binaries favor small eccentricities and have correlated component masses, demonstrating they coevolved via competitive accretion during their formation in the circumbinary disk. Meanwhile, B-type binaries at slightly wider separations are born with large eccentricities and are weighted toward extreme mass ratios, indicating the components formed relatively independently and subsequently evolved to their current configurations via dynamical interactions. By using eclipsing binaries as accurate age indicators, we also reveal that the binary orbital eccentricities and the line-of-sight dust extinctions are anticorrelated with respect to time. These empirical relations provide robust constraints for tidal evolution in massive binaries and the evolution of the dust content in their surrounding environments. Finally, we compile observations of early-type binaries identified via spectroscopy, eclipses, long-baseline interferometry, adaptive optics, lucky imaging, high-contrast photometry, and common proper motion.
We combine the samples from the various surveys and correct for their respective selection effects to determine a comprehensive nature of the intrinsic binary statistics of massive stars. We find the probability distributions of primary mass, secondary mass, orbital period, and orbital eccentricity are all interrelated. These updated multiplicity statistics imply a greater frequency of low-mass X-ray binaries, millisecond pulsars, and Type Ia supernovae than previously predicted.
Simões, M; Pereira, M O; Vieira, M J
2007-01-01
This study investigates the phenotype of turbulent (Re = 5,200) and laminar (Re = 2,000) flow-generated Pseudomonas fluorescens biofilms. Three P. fluorescens strains, the type strain ATCC 13525 and two strains isolated from an industrial processing plant, D3-348 and D3-350, were used throughout this study. The isolated strains were used to form single and binary biofilms. The biofilm physiology (metabolic activity, cellular density, mass, extracellular polymeric substances, structural characteristics and outer membrane proteins [OMP] expression) was compared. The results indicate that, for every situation, turbulent flow-generated biofilms were more active (p < 0.05), had more mass per cm^2 (p < 0.05), a higher cellular density (p < 0.05), distinct morphology, similar matrix proteins (p > 0.1) and identical (isolated strains, single and binary biofilms) and higher (type strain) matrix polysaccharides contents (p < 0.05) than laminar flow-generated biofilms. Flow-generated biofilms formed by the type strain revealed a considerably higher cellular density and amount of matrix polysaccharides than single and binary biofilms formed by the isolated strains (p < 0.05). Similar OMP expression was detected for the several single strains and for the binary situation, not dependent on the hydrodynamic conditions. Binary biofilms revealed an equal coexistence of the isolated strains with apparent neutral interactions. In summary, the biofilms formed by the type strain represent, apparently, the worst situation in a context of control. The results obtained clearly illustrate the importance of considering strain variation and hydrodynamics in biofilm development, and complement previous studies which have focused on physical aspects of structural and density differences.
Application of succulent plant leaves for Agrobacterium infiltration-mediated protein production
USDA-ARS's Scientific Manuscript database
Infiltration of tobacco leaves with a suspension of Agrobacterium tumefaciens harboring a binary plant expression plasmid provides a convenient method for laboratory scale protein production. When expressing plant cell wall degrading enzymes in the widely used tobacco (Nicotiana benthamiana), diffic...
"When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.
Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit
2016-10-24
To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. 
One-tailed t test was used to determine if differences in F-scores were statistically significant. In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid-related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.
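The evaluation described above averages F-scores over cross-validation folds and compares classifiers by those averages. The sketch below shows how an F1 score is computed from one fold's predictions and averaged across folds, followed by the relative-gain arithmetic for the SVM versus VADER figures quoted in the abstract (0.7062 vs 0.5030). The function names and the two toy folds are illustrative assumptions; the study used 5 folds on real tweets.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1_over_folds(folds, positive):
    """Average the per-fold F1 scores; folds is a list of (y_true, y_pred) pairs."""
    return sum(f1_score(t, p, positive) for t, p in folds) / len(folds)


# Two toy folds (not real data) for a binary positive/negative sentiment task.
toy_folds = [
    (["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]),
    (["pos", "neg", "pos", "neg"], ["pos", "pos", "pos", "neg"]),
]
mean_f1 = mean_f1_over_folds(toy_folds, positive="pos")

# Relative gain of SVM over VADER, using the F-scores reported in the abstract.
relative_gain = (0.7062 - 0.5030) / 0.5030
```

The relative gain works out to roughly 0.404, which matches the abstract's statement that SVM was over 40% more accurate than VADER in that setting.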
Sharma, Mahima; Gupta, Gagan D; Kumar, Vinay
2018-02-01
The activated binary toxin (BinAB) from Lysinibacillus sphaericus binds to surface receptor protein (Cqm1) on the midgut cell membrane and kills Culex quinquefasciatus larvae on internalization. Cqm1 is attached to cells via a glycosyl-phosphatidylinositol (GPI) anchor. It has been classified as a member of glycoside hydrolase family 13 of the CAZy database. Here, we report characterization of the ordered domain (residues 23-560) of Cqm1. Gene expressing Cqm1 of BinAB susceptible mosquito was chemically synthesized and the protein was purified using E. coli expression system. Values for the Michaelis-Menten kinetics parameters towards 4-nitrophenyl α-D-glucopyranoside (α-pNPG) substrate were estimated to be 0.44 mM (Km) and 1.9 s^-1 (kcat). Thin layer chromatography experiments established Cqm1 as α-glucosidase competent to cleave α-1,4-glycosidic bonds of maltose and maltotriose with high glycosyltransferase activity to form glucose-oligomers. The observed hydrolysis and synthesis of glucose-oligomers is consistent with open and accessible active-site in the structural model. The protein also hydrolyses glycogen and sucrose. These activities suggest that Cqm1 may be involved in carbohydrate metabolism in mosquitoes. Further, toxic BinA component does not inhibit α-glucosidase activity of Cqm1, while BinB reduced the activity by nearly 50%. The surface plasmon resonance study reveals strong binding of BinB with Cqm1 (Kd, 9.8 nM). BinA interaction with Cqm1, however, is 1000-fold weaker. Notably the estimated Kd values match well with dissociation constants reported earlier with larvae brush border membrane fractions. The Cqm1 protein forms a stable dimer that is consistent with its apical localization in lipid rafts. Its melting temperature (Tm) as observed by thermofluor-shift assay is 51.5 °C and Ca2+ provides structural stability to the protein. Copyright © 2017 Elsevier Ltd. All rights reserved.
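The kinetic parameters reported above (Km = 0.44 mM, kcat = 1.9 s^-1) can be plugged into the standard Michaelis-Menten rate law, v/[E] = kcat·[S]/(Km + [S]), to see what they imply. The sketch below does only that arithmetic; the function and constant names are hypothetical, and the substrate concentrations are chosen for illustration rather than taken from the study.

```python
KM_MM = 0.44       # Michaelis constant Km in mM, from the abstract
KCAT_PER_S = 1.9   # turnover number kcat in s^-1, from the abstract


def turnover_rate(substrate_mm, km_mm=KM_MM, kcat_per_s=KCAT_PER_S):
    """Per-enzyme reaction rate v/[E] in s^-1 under the Michaelis-Menten rate law."""
    if substrate_mm < 0:
        raise ValueError("substrate concentration must be non-negative")
    return kcat_per_s * substrate_mm / (km_mm + substrate_mm)


# Catalytic efficiency kcat/Km in mM^-1 s^-1.
catalytic_efficiency = KCAT_PER_S / KM_MM

# At [S] = Km the rate is half of kcat by definition.
rate_at_km = turnover_rate(KM_MM)

# At a substrate concentration far above Km the rate approaches kcat.
rate_near_saturation = turnover_rate(100 * KM_MM)
```

With these values the catalytic efficiency is about 4.3 mM^-1 s^-1, and the rate at [S] = Km is 0.95 s^-1, half the reported kcat.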
Semi-supervised protein subcellular localization.
Xu, Qian; Hu, Derek Hao; Xue, Hong; Yu, Weichuan; Yang, Qiang
2009-01-30
Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functionalities of proteins. Accurate predictions of subcellular localizations of proteins can aid the prediction of protein function and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine approaches, have already been widely used in the prediction of protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data must be labeled in order to let the prediction system learn a classifier of good generalization ability. In real-world cases, it is laborious, expensive and time-consuming to experimentally determine the subcellular localization of a protein and prepare instances of labeled data. In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use far fewer labeled instances to construct a high quality prediction model. We construct an initial classifier using a small set of labeled examples first, and then use unlabeled instances to refine the classifier for future predictions. Experimental results show that our methods can effectively reduce the workload for labeling data using the unlabeled data. Our method is shown to enhance the state-of-the-art prediction results of SVM classifiers by more than 10%.
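The two-stage idea (fit on a few labeled examples, then refine with unlabeled ones) can be sketched as follows. The abstract does not say which semi-supervised algorithm is used, so this sketch substitutes plain self-training with a one-dimensional nearest-centroid classifier purely for illustration; all names, the `margin` confidence rule, and the toy data are assumptions.

```python
def centroids(points, labels):
    """Mean feature value per class (1-D nearest-centroid model)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Label of the centroid closest to x."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=5, margin=1.0):
    """Self-training: repeatedly pseudo-label confident unlabeled points.

    A point counts as confident when its nearest centroid is closer than
    the runner-up by at least `margin` (an assumed, illustrative rule).
    """
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        cents = centroids(xs, ys)
        confident, rest = [], []
        for x in pool:
            dists = sorted(abs(x - c) for c in cents.values())
            if len(dists) < 2 or dists[1] - dists[0] >= margin:
                confident.append(x)
            else:
                rest.append(x)
        if not confident:
            break  # nothing new to learn from
        for x in confident:
            xs.append(x)
            ys.append(predict(cents, x))
        pool = rest
    return centroids(xs, ys)


# Toy data: two labeled points, five unlabeled ones.
model = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 9.0, 8.0, 5.2])
```

The ambiguous point (5.2) is never pseudo-labeled because it sits too close to the midpoint, which is the usual safeguard against reinforcing wrong guesses.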
NASA Astrophysics Data System (ADS)
Hansen, C. J.; Jofré, P.; Koch, A.; McWilliam, A.; Sneden, C. S.
2017-02-01
Blue metal-poor (BMP) stars are main sequence stars that appear bluer and more luminous than normal turnoff stars. They were originally singled out by using B-V and U-B colour cuts. Early studies found that a larger fraction of field BMP stars were binaries compared to normal halo stars. Thus, BMP stars are ideal field blue straggler candidates for investigating internal stellar evolution processes and binary interaction. In particular, the presence or depletion in lithium in their spectra is a powerful indicator of their origin. They are either old, halo blue stragglers experiencing internal mixing processes or mass transfer (Li-depletion), or intermediate-age, single stars of possibly extragalactic origin (2.2 dex halo plateau Li). However, we note that internal mixing processes can lead to an increased level of Li. Hence, this study combines photometry and spectroscopy to unveil the origin of various BMP stars. We first show how to separate binaries from young blue stars using photometry, metallicity and lithium. Using a sample of 80 BMP stars (T > 6300 K), we find that 97% of the BMP binaries have V-Ks0 < 1.08 ± 0.03, while BMP stars that are not binaries lie above this cut in two thirds of the cases. This cut can help classify stars that lack radial velocities from follow-up observations. We then trace the origin of two BMP stars from the photometric sample by conducting a full chemical analysis using new high-resolution and high signal-to-noise spectra. Based on their radial velocities, Li, α and s- and r-process abundances we show that BPS CS22874-042 is a single star (A(Li) = 2.38 ± 0.10 dex) while with A(Li) = 2.23 ± 0.07 dex CD-48 2445 is a binary, contrary to earlier findings. Our analysis emphasises that field blue stragglers can be segregated from single metal-poor stars, using (V-Ks) colours with a fraction of single stars polluting the binary sample, but not vice versa.
These two groups can only be properly separated by using information from stellar spectra, illustrating the need for accurate and precise stellar parameters and high-resolution, high-S/N spectra in order to fully understand and classify this intriguing class of stars. Our high-resolution spectrum analysis confirms the findings from the colour cuts and shows that CS 22874-042 is single, while CD -48 2445 is most likely a binary. Moreover, the stellar abundances show that both stars formed in situ; CS 22874-042 carries traces of massive star enrichment and CD -48 2445 shows indications of AGB mass transfer mixed with gases ejected possibly from neutron star mergers. Based on UVES archive data 077.B-0507 and 090.B-0605. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile. Full Table 4 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A54
Shen, Hong-Bin; Chou, Kuo-Chen
2005-11-25
The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For in-depth understanding of the biochemical process of the nucleus, the knowledge of localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desired to develop an automated method for fast annotating the subnuclear locations for numerous newly found nuclear protein sequences so as to be able to timely utilize them for basic research and drug discovery. In view of this, a novel approach is developed for predicting the protein subnuclear location. It is featured by introducing a powerful classifier, the optimized evidence-theoretic K-nearest classifier, and using the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and jackknife cross-validation test are significantly higher than those by existing classifiers on the same working dataset. It is anticipated that the powerful approach may also become a useful high throughput vehicle to bridge the huge gap occurring in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hsieh, Tien-Hao; Lai, Shih-Ping; Belloche, Arnaud
2016-07-20
The formation mechanism of brown dwarfs (BDs) is one of the long-standing problems in star formation because the typical Jeans mass in molecular clouds is too large to form these substellar objects. To answer this question, it is crucial to study a BD in the embedded phase. IRAS 16253–2429 is classified as a very low-luminosity object (VeLLO) with an internal luminosity of <0.1 L⊙. VeLLOs are believed to be very low-mass protostars or even proto-BDs. We observed the jet/outflow driven by IRAS 16253–2429 in CO (2–1), (6–5), and (7–6) using the IRAM 30 m and Atacama Pathfinder Experiment telescopes and the Submillimeter Array (SMA) in order to study its dynamical features and physical properties. Our SMA map reveals two protostellar jets, indicating the existence of a proto-binary system as implied by the precessing jet detected in H₂ emission. We detect a wiggling pattern in the position–velocity diagrams along the jet axes, which is likely due to the binary orbital motion. Based on this information, we derive the current mass of the binary as ∼0.032 M⊙. Given the low envelope mass, IRAS 16253–2429 will form a binary that probably consists of one or two BDs. Furthermore, we found that the outflow force as well as the mass accretion rate are very low based on the multi-transition CO observations, which suggests that the final masses of the binary components are at the stellar/substellar boundary. Since IRAS 16253 is located in an isolated environment, we suggest that BDs can form through fragmentation and collapse, similar to low-mass stars.
Exploring the Power of Heterogeneous Information Sources
2011-01-01
Individual movies are classified as being of one or more of 18 genres, such as Comedy and Thriller, which can be treated as binary vectors. 2) User... genres, from different sources, in different formats, and with different types of representation. Many interesting patterns cannot be extracted from a...provide better web services or help film distributors in decision making, we need to conduct integrative analysis of all the information sources. For
Causal Video Object Segmentation From Persistence of Occlusions
2015-05-01
Precision, recall, and F-measure are reported on the ground truth annotations converted to binary masks. Note we cannot evaluate “number of...to lack of occlusions. References [1] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. TPAMI...X. Bai, J. Wang, D. Simons, and G. Sapiro. Video snapcut: robust video object cutout using localized classifiers. In ACM Transactions on Graphics
Variable Stars Observed in the Galactic Disk by AST3-1 from Dome A, Antarctica
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Lingzhi; Ma, Bin; Hu, Yi
AST3-1 is the second-generation wide-field optical photometric telescope dedicated to time-domain astronomy at Dome A, Antarctica. Here, we present the results of an i-band imaging survey from AST3-1 toward one Galactic disk field. Based on time-series photometry of 92,583 stars, 560 variable stars were detected with i magnitude ≤16.5 mag during eight days of observations; 339 of these are previously unknown variables. We tentatively classify the 560 variables as 285 eclipsing binaries (EW, EB, and EA), 27 pulsating variable stars (δ Scuti, γ Doradus, δ Cephei variable, and RR Lyrae stars), and 248 other types of variables (unclassified periodic, multiperiodic, and aperiodic variable stars). Of the eclipsing binaries, 34 show O’Connell effects. One of the aperiodic variables shows a plateau light curve and another variable shows a secondary maximum after peak brightness. We also detected a complex binary system with an RS CVn-like light-curve morphology; this object is being followed-up spectroscopically using the Gemini South telescope.
Binary space partitioning trees and their uses
NASA Technical Reports Server (NTRS)
Bell, Bradley N.
1989-01-01
Binary Space Partitioning (BSP) trees have some qualities that make them useful in solving many graphics related problems. The purpose is to describe what a BSP tree is, and how it can be used to solve the problem of hidden surface removal, and constructive solid geometry. The BSP tree is based on the idea that a plane acting as a divider subdivides space into two parts with one being on the positive side and the other on the negative. A polygonal solid is then represented as the volume defined by the collective interior half spaces of the solid's bounding surfaces. The nature of how the tree is organized lends itself well for sorting polygons relative to an arbitrary point in 3 space. The speed at which the tree can be traversed for depth sorting is fast enough to provide hidden surface removal at interactive speeds. The fact that a BSP tree actually represents a polygonal solid as a bounded volume also makes it quite useful in performing the boolean operations used in constructive solid geometry. Due to the nature of the BSP tree, polygons can be classified as they are subdivided. The ability to classify polygons as they are subdivided can enhance the simplicity of implementing constructive solid geometry.
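The depth-sorting property described above can be sketched in a few lines. This is a minimal illustration, not the report's implementation: it assumes a tree that has already been built, stores polygons only as text labels, and skips construction and polygon splitting. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BSPNode:
    normal: Vec3                       # plane normal n
    offset: float                      # plane is {p : n . p = offset}
    polygons: List[str]                # labels of polygons lying in the plane
    front: Optional["BSPNode"] = None  # subtree on the positive side
    back: Optional["BSPNode"] = None   # subtree on the negative side


def side(node: BSPNode, point: Vec3) -> float:
    """Signed plane value: >= 0 on the positive side, < 0 on the negative."""
    return sum(n * p for n, p in zip(node.normal, point)) - node.offset


def back_to_front(node: Optional[BSPNode], eye: Vec3) -> List[str]:
    """Polygon labels ordered farthest-first as seen from `eye`."""
    if node is None:
        return []
    if side(node, eye) >= 0:
        near, far = node.front, node.back
    else:
        near, far = node.back, node.front
    # Emit the far half-space, then the dividing plane, then the near one.
    return back_to_front(far, eye) + node.polygons + back_to_front(near, eye)


# Three parallel walls at x = -2, 0, 2 (toy scene, assumed for illustration).
tree = BSPNode(
    (1.0, 0.0, 0.0), 0.0, ["wall_x0"],
    front=BSPNode((1.0, 0.0, 0.0), 2.0, ["wall_x2"]),
    back=BSPNode((1.0, 0.0, 0.0), -2.0, ["wall_xm2"]),
)
order_from_right = back_to_front(tree, (5.0, 0.0, 0.0))
order_from_left = back_to_front(tree, (-5.0, 0.0, 0.0))
```

Drawing polygons in the returned order (a painter's algorithm) gives hidden surface removal without a depth buffer, and the same tree serves every viewpoint because only the traversal order changes.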
Protein classification using sequential pattern mining.
Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I
2006-01-01
Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. Then a classifier uses the extracted sequential patterns for classifying proteins of unknown structure in the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
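The gap between the 36% and 65% figures comes from how a prediction is scored: top-1 counts only the single best guess, while top-3 counts a hit if the true fold is among the three most probable folds. A minimal sketch of that metric follows; the function name and the toy labels are assumptions, not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label is among the first k predictions.

    ranked_predictions[i] lists candidate labels for sample i, most
    probable first.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("one ranking is required per true label")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy rankings over three hypothetical fold labels.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
truth = ["a", "a", "a", "b"]
```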
Porpiglia, Ermelinda; Hidalgo, Daniel; Koulnis, Miroslav; Tzafriri, Abraham R.; Socolovsky, Merav
2012-01-01
Erythropoietin (Epo)-induced Stat5 phosphorylation (p-Stat5) is essential for both basal erythropoiesis and for its acceleration during hypoxic stress. A key challenge lies in understanding how Stat5 signaling elicits distinct functions during basal and stress erythropoiesis. Here we asked whether these distinct functions might be specified by the dynamic behavior of the Stat5 signal. We used flow cytometry to analyze Stat5 phosphorylation dynamics in primary erythropoietic tissue in vivo and in vitro, identifying two signaling modalities. In later (basophilic) erythroblasts, Epo stimulation triggers a low intensity but decisive, binary (digital) p-Stat5 signal. In early erythroblasts the binary signal is superseded by a high-intensity graded (analog) p-Stat5 response. We elucidated the biological functions of binary and graded Stat5 signaling using the EpoR-HM mice, which express a “knocked-in” EpoR mutant lacking cytoplasmic phosphotyrosines. Strikingly, EpoR-HM mice are restricted to the binary signaling mode, which rescues these mice from fatal perinatal anemia by promoting binary survival decisions in erythroblasts. However, the absence of the graded p-Stat5 response in the EpoR-HM mice prevents them from accelerating red cell production in response to stress, including a failure to upregulate the transferrin receptor, which we show is a novel stress target. We found that Stat5 protein levels decline with erythroblast differentiation, governing the transition from high-intensity graded signaling in early erythroblasts to low-intensity binary signaling in later erythroblasts. Thus, using exogenous Stat5, we converted later erythroblasts into high-intensity graded signal transducers capable of eliciting a downstream stress response. Unlike the Stat5 protein, EpoR expression in erythroblasts does not limit the Stat5 signaling response, a non-Michaelian paradigm with therapeutic implications in myeloproliferative disease. 
Our findings show how the binary and graded modalities combine to generate high-fidelity Stat5 signaling over the entire basal and stress Epo range. They suggest that dynamic behavior may encode information during STAT signal transduction. PMID:22969412
Automated Classification of ROSAT Sources Using Heterogeneous Multiwavelength Source Catalogs
NASA Technical Reports Server (NTRS)
McGlynn, Thomas; Suchkov, A. A.; Winter, E. L.; Hanisch, R. J.; White, R. L.; Ochsenbein, F.; Derriere, S.; Voges, W.; Corcoran, M. F.
2004-01-01
We describe an on-line system for automated classification of X-ray sources, ClassX, and present preliminary results of classification of the three major catalogs of ROSAT sources, RASS BSC, RASS FSC, and WGACAT, into six class categories: stars, white dwarfs, X-ray binaries, galaxies, AGNs, and clusters of galaxies. ClassX is based on a machine learning technology. It represents a system of classifiers, each classifier consisting of a considerable number of oblique decision trees. These trees are built as the classifier is 'trained' to recognize various classes of objects using a training sample of sources of known object types. Each source is characterized by a preselected set of parameters, or attributes; the same set is then used as the classifier conducts classification of sources of unknown identity. The ClassX pipeline features an automatic search for X-ray source counterparts among heterogeneous data sets in on-line data archives using Virtual Observatory protocols; it retrieves from those archives all the attributes required by the selected classifier and inputs them to the classifier. The user input to ClassX is typically a file with target coordinates, optionally complemented with target IDs. The output contains the class name, attributes, and class probabilities for all classified targets. We discuss ways to characterize and assess the classifier quality and performance and present the respective validation procedures. Based on both internal and external validation, we conclude that the ClassX classifiers yield reasonable and reliable classifications for ROSAT sources and have the potential to broaden class representation significantly for rare object types.
Zhang, Wei; Chen, Jiwang; Chen, Yue; Xia, Wenshui; Xiong, Youling L; Wang, Hongxun
2016-03-15
A chitosan/whey protein isolate film incorporating sodium laurate-modified TiO2 nanoparticles was developed. The nanocomposite film was characterized by scanning electron microscopy, X-ray diffraction and differential scanning calorimetry, and its physicochemical properties were investigated, including color, tensile strength, elongation at break, water vapor permeability and water adsorption isotherm. Our results showed that the nanoparticles improved the compatibility of whey protein isolate and chitosan. Addition of nanoparticles increased the whiteness of the chitosan/whey protein isolate film, but decreased its transparency. Compared with the binary film, the tensile strength and elongation at break of the nanocomposite film were increased by 11.51% and 12.01%, respectively, and water vapor permeability was decreased by 7.60%. The equilibrium moisture of the nanocomposite film was lower than that of the binary film, and its water sorption isotherm fitted well to the Guggenheim-Anderson-deBoer model. The findings contribute to the development of novel food packaging materials. Copyright © 2015 Elsevier Ltd. All rights reserved.
ECOD: new developments in the evolutionary classification of domains
Schaeffer, R. Dustin; Liao, Yuxing; Cheng, Hua; Grishin, Nick V.
2017-01-01
Evolutionary Classification Of protein Domains (ECOD) (http://prodata.swmed.edu/ecod) comprehensively classifies protein with known spatial structures maintained by the Protein Data Bank (PDB) into evolutionary groups of protein domains. ECOD relies on a combination of automatic and manual weekly updates to achieve its high accuracy and coverage with a short update cycle. ECOD classifies the approximately 120 000 depositions of the PDB into more than 500 000 domains in ∼3400 homologous groups. We show the performance of the weekly update pipeline since the release of ECOD, describe improvements to the ECOD website and available search options, and discuss novel structures and homologous groups that have been classified in the recent updates. Finally, we discuss the future directions of ECOD and further improvements planned for the hierarchy and update process. PMID:27899594
Adiabatic Quantum Anomaly Detection and Machine Learning
NASA Astrophysics Data System (ADS)
Pudenz, Kristen; Lidar, Daniel
2012-02-01
We present methods of anomaly detection and machine learning using adiabatic quantum computing. The machine learning algorithm is a boosting approach which seeks to optimally combine somewhat accurate classification functions to create a unified classifier which is much more accurate than its components. This algorithm then becomes the first part of the larger anomaly detection algorithm. In the anomaly detection routine, we first use adiabatic quantum computing to train two classifiers which detect two sets, the overlap of which forms the anomaly class. We call this the learning phase. Then, in the testing phase, the two learned classification functions are combined to form the final Hamiltonian for an adiabatic quantum computation, the low energy states of which represent the anomalies in a binary vector space.
LBP and SIFT based facial expression recognition
NASA Astrophysics Data System (ADS)
Sumer, Omer; Gunes, Ece O.
2015-02-01
This study compares the performance of local binary patterns (LBP) and scale invariant feature transform (SIFT) with support vector machines (SVM) in automatic classification of discrete facial expressions. Facial expression recognition is a multiclass classification problem, and seven classes are classified: happiness, anger, sadness, disgust, surprise, fear and contempt. Using SIFT feature vectors and linear SVM, a mean accuracy of 93.1% is achieved on the CK+ database. On the other hand, the performance of the LBP-based classifier with linear SVM is reported on SFEW using the strictly person independent (SPI) protocol. Seven-class mean accuracy on SFEW is 59.76%. Experiments on both databases showed that LBP features can be used in a fairly descriptive way if a good localization of facial points and partitioning strategy are followed.
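For readers unfamiliar with LBP, the basic 3×3 operator can be sketched as below: each of the eight neighbours contributes one bit, set when the neighbour is at least as bright as the centre pixel, and the histogram of the resulting codes is the feature vector. This is the textbook operator only; the neighbour ordering is a convention chosen here, and the variant, radius, and region partitioning used in the study are not given in the abstract.

```python
# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The ordering is an assumed convention; any fixed ordering works.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit local binary pattern of the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Toy grey-level patches (assumed values, for illustration only).
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
patch = [[1, 2, 3], [8, 5, 4], [7, 6, 9]]
```

In a pipeline like the one described, such histograms would typically be computed per face region and concatenated before being passed to the SVM; that step is omitted here.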
Tarafder, Sumit; Toukir Ahmed, Md; Iqbal, Sumaiya; Tamjidul Hoque, Md; Sohel Rahman, M
2018-03-14
Accessible surface area (ASA) of a protein residue is an effective feature for protein structure prediction, binding region identification, fold recognition problems, etc. Improving the prediction of ASA by the application of effective feature variables is a challenging but explorable task, especially in the field of machine learning. Among the existing predictors of ASA, REGAd3p is a highly accurate ASA predictor which is based on regularized exact regression with a polynomial kernel of degree 3. In this work, we present a new predictor, RBSURFpred, which extends REGAd3p on several dimensions by incorporating 58 physicochemical, evolutionary and structural properties into 9-tuple peptides via Chou's general PseAAC, which allowed us to obtain higher accuracies in predicting both real-valued and binary ASA. We have compared RBSURFpred for both real and binary space predictions with state-of-the-art predictors, such as REGAd3p and SPIDER2. We have also carried out a rigorous analysis of the performance of RBSURFpred in terms of different amino acids and their properties, and also with biologically relevant case studies. The performance of RBSURFpred establishes it as a useful tool for the community. Copyright © 2018 Elsevier Ltd. All rights reserved.
Consistent prediction of GO protein localization.
Spetale, Flavio E; Arce, Debora; Krsticevic, Flavia; Bulacio, Pilar; Tapia, Elizabeth
2018-05-17
The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods used for the automated GO-CC annotation of proteins suffer from the inconsistency of individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein coding genes at the subcellular compartment or macromolecular complex levels. Aiming to boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC+ classifiers are fully interpretable and their predictions amenable to expert analysis. Promising results on protein annotation data from five model organisms were obtained. Additionally, successful validation results in the annotation of a challenging subset of tandem duplicated genes in the tomato non-model organism were accomplished. Overall, these results suggest that FGGA-CC+ classifiers can indeed be useful for satisfying the huge demand of GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.
Wang, Xiaolei; Li, Chaoqun; Wang, Yan; Chen, Guangju
2013-12-20
We carried out molecular dynamics simulations and free energy calculations for a series of binary and ternary models of the cisplatin, transplatin and oxaliplatin agents binding to a monomeric Atox1 protein and a dimeric Atox1 protein to investigate their interaction mechanisms. All three platinum agents could respectively combine with the monomeric Atox1 protein and the dimeric Atox1 protein to form a stable binary and ternary complex due to the covalent interaction of the platinum center with the Atox1 protein. The results suggested that the extra interaction from the oxaliplatin ligand-Atox1 protein interface increases its affinity only for the OxaliPt + Atox1 model. The binding of the oxaliplatin agent to the Atox1 protein might cause larger deformation of the protein than those of the cisplatin and transplatin agents due to the larger size of the oxaliplatin ligand. However, the extra interactions to facilitate the stabilities of the ternary CisPt + 2Atox1 and OxaliPt + 2Atox1 models come from the α1 helices and α2-β4 loops of the Atox1 protein-Atox1 protein interface due to the cis conformation of the platinum agents. The combinations of two Atox1 proteins in an asymmetric way in the three ternary models were analyzed. These investigations might provide detailed information for understanding the interaction mechanism of the platinum agents binding to the Atox1 protein in the cytoplasm.
Effect of Glycerol Water Binary Mixtures on the Structure and Dynamics of Protein Solutions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghattyvenkatakrishna, Pavan K; Carri, Gustavo A.
We have performed 20 ns of fully atomistic molecular dynamics simulations of Hen Egg-White Lysozyme in 0, 10, 20, 30 and 100% by weight of glycerol in water to better understand the microscopic physics behind the bioprotection offered by glycerol to naturally occurring biological systems. The solvent exposure of protein surface residues changes when glycerol is introduced. The dynamic behavior of the protein, as quantified by the Incoherent Intermediate Scattering Function, shows a non-monotonic dependence on glycerol content. The fluctuations of the protein residues with respect to each other were found to be similar in all water-containing solvents, but different from the pure glycerol case. The increase in the number of protein-glycerol hydrogen bonds in glycerol-water binary mixtures explains the slowing down of protein dynamics as the glycerol content increases. We also explored the dynamic behavior of the hydration layer. We show that the short-length scale dynamics of this layer are insensitive to glycerol concentration. However, the long-length scale behavior shows a significant dependence on glycerol content. We also provide insights into the behavior of bound and mobile water molecules.
Evidence for network evolution in an arabidopsis interactome map
USDA-ARS's Scientific Manuscript database
Plants have unique features that evolved in response to their environments and ecosystems. A full account of the complex cellular networks that underlie plant-specific functions is still missing. We describe a proteome-wide binary protein-protein interaction map for the interactome network of the pl...
USDA-ARS's Scientific Manuscript database
Demonstrating direct interactions between host and virus proteins during infection is a major goal and challenge for the field of virology. The majority of interactions are not binary or easily amenable to structural determination. Using infectious preparations of a polerovirus (Potato leafroll viru...
Automatic classification of protein structures using physicochemical parameters.
Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam
2014-09-01
Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both physicochemical parameters and spectrophores based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90-96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
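Feature selection by information gain, as mentioned above, ranks each feature by how much knowing its value reduces the entropy of the class label. The sketch below shows the standard computation for a discrete feature; it assumes continuous physicochemical parameters have already been binned, and the function names and toy data are hypothetical rather than taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy example: two hypothetical families and two binned features.
families = ["fam1", "fam1", "fam2", "fam2"]
informative = ["low", "low", "high", "high"]    # separates the families
uninformative = ["low", "high", "low", "high"]  # independent of the family
```

Features would then be sorted by this score and the top-ranked ones kept for each superfamily or family; the cut-off used in the study is not stated in the abstract.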
Agrobacterium-mediated transformation of lipomyces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Ziyu; Magnuson, Jon K.; Deng, Shuang
This disclosure provides Agrobacterium-mediated transformation methods for the oil-producing (oleaginous) yeast Lipomyces sp., as well as yeast produced by the method. Such methods utilize Agrobacterium sp. cells that have a T-DNA binary plasmid, wherein the T-DNA binary plasmid comprises a first nucleic acid molecule encoding a first protein and a second nucleic acid molecule encoding a selective marker that permits growth of transformed Lipomyces sp. cells in selective culture media comprising an antibiotic.
Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier
NASA Astrophysics Data System (ADS)
Wang, Leilei; Cheng, Jinyong
2018-03-01
Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new method for predicting protein secondary structure using a Bayes classifier and an autoencoder network. Our experiments cover several algorithms, including the construction of the model, the classification of parameters and so on. The data set is the typical CB513 protein data set. Accuracy is assessed by 3-fold cross-validation, from which we obtain the Q3 accuracy. The results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.
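Q3 is the usual per-residue accuracy over the three secondary structure states (helix H, strand E, coil C). A minimal sketch of the metric follows; it assumes predictions and observations are given as equal-length strings over those three letters, and the function name and example strings are illustrative only.

```python
STATES = {"H", "E", "C"}  # helix, strand, coil


def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    if not (set(predicted) <= STATES and set(observed) <= STATES):
        raise ValueError("labels must be H, E or C")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy eight-residue example with a single mismatch.
example_q3 = q3_accuracy("HHHEECCC", "HHHEEECC")
```

Under 3-fold cross-validation this score would be computed on each held-out fold and averaged; the fold assignment itself is not shown here.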
NASA Astrophysics Data System (ADS)
Nucita, A. A.; Licchelli, D.; De Paolis, F.; Ingrosso, G.; Strafella, F.; Katysheva, N.; Shugarov, S.
2018-05-01
The transient event labelled as TCP J05074264+2447555 recently discovered towards the Taurus region was quickly recognized to be an ongoing microlensing event on a source located at a distance of only 700-800 pc from Earth. Here, we show that observations with high sampling rate close to the time of maximum magnification revealed features that imply the presence of a binary lens system with very low-mass ratio components. We present a complete description of the binary lens system, which hosts an Earth-like planet with a most likely mass of 9.2 ± 6.6 M⊕. Furthermore, the source estimated location and detailed Monte Carlo simulations allowed us to classify the event as due to the closest lens system, being at a distance of ≃380 pc and mass ≃0.25 M⊙.
Gene-specific cell labeling using MiMIC transposons
Gnerer, Joshua P.; Venken, Koen J. T.; Dierick, Herman A.
2015-01-01
Binary expression systems such as GAL4/UAS, LexA/LexAop and QF/QUAS have greatly enhanced the power of Drosophila as a model organism by allowing spatio-temporal manipulation of gene function as well as cell and neural circuit function. Tissue-specific expression of these heterologous transcription factors relies on random transposon integration near enhancers or promoters that drive the binary transcription factor embedded in the transposon. Alternatively, gene-specific promoter elements are directly fused to the binary factor within the transposon followed by random or site-specific integration. However, such insertions do not consistently recapitulate endogenous expression. We used Minos-Mediated Integration Cassette (MiMIC) transposons to convert host loci into reliable gene-specific binary effectors. MiMIC transposons allow recombinase-mediated cassette exchange to modify the transposon content. We developed novel exchange cassettes to convert coding intronic MiMIC insertions into gene-specific binary factor protein-traps. In addition, we expanded the set of binary factor exchange cassettes available for non-coding intronic MiMIC insertions. We show that binary factor conversions of different insertions in the same locus have indistinguishable expression patterns, suggesting that they reliably reflect endogenous gene expression. We show the efficacy and broad applicability of these new tools by dissecting the cellular expression patterns of the Drosophila serotonin receptor gene family. PMID:25712101
Is WD 1437-008 a cataclysmic variable?
NASA Astrophysics Data System (ADS)
Shimansky, V. V.; Nurtdinova, D. N.; Borisov, N. V.; Spiridonova, O. I.
2011-10-01
Comprehensive observations of a close binary candidate WD 1437-008 are performed. The shape and amplitude of the observed brightness variations are shown to be inconsistent with the hypothesis of reflection effects, and the photometric period of the system, P_phot = 0.2775 d, is found to differ from the period of spectral variations, P_sp = 0.272060 d. As a result, WD 1437-008 has been preliminarily classified as a low-inclination cataclysmic variable.
Adaptive feature selection using v-shaped binary particle swarm optimization.
Teng, Xuyang; Dong, Hongbin; Zhou, Xiurong
2017-01-01
Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850
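The position update at the heart of a V-shaped binary particle swarm can be sketched as follows. This is a minimal illustration that assumes the commonly used V-shaped transfer function |tanh(v)| together with a bit-flip rule; the abstract does not state which V-shaped function the authors chose, and the names v_transfer and update_position are hypothetical.

```python
import math
import random


def v_transfer(velocity):
    """V-shaped transfer function: maps a velocity to a flip probability in [0, 1]."""
    return abs(math.tanh(velocity))


def update_position(position, velocities, rng):
    """Flip each feature bit with probability v_transfer(velocity); otherwise keep it."""
    new_position = []
    for bit, velocity in zip(position, velocities):
        if rng.random() < v_transfer(velocity):
            new_position.append(1 - bit)  # large |velocity|: the bit is likely to flip
        else:
            new_position.append(bit)      # small |velocity|: the bit tends to stay
    return new_position


if __name__ == "__main__":
    rng = random.Random(0)
    # Each bit marks whether a feature is included in the candidate subset.
    print(update_position([1, 0, 1, 0], [0.1, -2.0, 0.0, 3.5], rng))
```

Unlike an S-shaped (sigmoid) rule, which sets a bit to 1 or 0 directly, this rule leaves a bit unchanged when its velocity is near zero, which is one way to read the improved search ability mentioned above.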
BVRI Photometric Study of the High Mass Ratio, Detached, Pre-contact W UMa Binary GQ Cancri
NASA Astrophysics Data System (ADS)
Samec, R. G.; Olson, A.; Caton, D.; Faulkner, D. R.
2017-12-01
CCD BVRcIc light curves of GQ Cancri were observed in April 2013 using the SARA North 0.9-meter Telescope at Kitt Peak National Observatory in Arizona in remote mode. It is a high-amplitude (V 0.9 magnitude) K0±V type eclipsing binary (T1 5250 K) with a photometrically-determined mass ratio of M2 / M1 = 0.80. Its spectral color type classifies it as a pre-contact W UMa Binary (PCWB). The Wilson-Devinney Mode 2 solutions show that the system has a detached binary configuration with fill-outs of 94% and 98% for the primary and secondary component, respectively. As expected, the light curve is asymmetric due to spot activity. Three times of minimum light were calculated, for two primary eclipses and one secondary eclipse, from our present observations. In total, some 26 times of minimum light covering nearly 20 years of observation were used to determine linear and quadratic ephemerides. It is noted that the light curve solution remained in a detached state for every iteration of the computer runs. The components are very similar with a computed temperature difference of only 4 K, and the flux of the primary component accounts for 53-55% of the system's light in B, V, Rc, and Ic. A 12-degree radius high latitude white spot (faculae) was iterated on the primary component.
Pashaei, Elnaz; Pashaei, Elham; Aydin, Nizamettin
2018-04-14
In cancer classification, gene selection is an important data preprocessing technique, but it is a difficult task due to the large search space. Accordingly, the objective of this study is to develop a hybrid meta-heuristic Binary Black Hole Algorithm (BBHA) and Binary Particle Swarm Optimization (BPSO) (4-2) model that emphasizes gene selection. In this model, the BBHA is embedded in the BPSO (4-2) algorithm to make the BPSO (4-2) more effective and to facilitate the exploration and exploitation of the BPSO (4-2) algorithm to further improve the performance. This model has been associated with Random Forest Recursive Feature Elimination (RF-RFE) pre-filtering technique. The classifiers which are evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA); k-nearest neighbor and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarrays. The experimental results and statistical analysis confirm the better performance of the BPSO (4-2)-BBHA compared with the BBHA, the BPSO (4-2) and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes from the clinical datasets. Copyright © 2018 Elsevier Inc. All rights reserved.
Application of texture analysis method for mammogram density classification
NASA Astrophysics Data System (ADS)
Nithya, R.; Santhi, B.
2017-07-01
Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
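Of the texture descriptors listed in this abstract, the Local Binary Pattern is the simplest to show in code. The sketch below computes the basic 8-neighbour LBP code for the centre pixel of a 3x3 patch. The clockwise bit ordering starting at the top-left neighbour and the name lbp_code are assumptions made for illustration; the paper may use a different LBP variant.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code for the centre pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbour coordinates, clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        # A neighbour at least as bright as the centre contributes a 1 bit.
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


if __name__ == "__main__":
    patch = [[9, 1, 1],
             [1, 5, 1],
             [1, 1, 1]]
    print(lbp_code(patch))
```

A histogram of these codes over an image region would then serve as the texture feature vector passed on to feature selection and classification.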
Senseless acts as a binary switch during sensory organ precursor selection
NASA Technical Reports Server (NTRS)
Jafar-Nejad, Hamed; Acar, Melih; Nolo, Riitta; Lacin, Haluk; Pan, Hongling; Parkhurst, Susan M.; Bellen, Hugo J.
2003-01-01
During sensory organ precursor (SOP) specification, a single cell is selected from a proneural cluster of cells. Here, we present evidence that Senseless (Sens), a zinc-finger transcription factor, plays an important role in this process. We show that Sens is directly activated by proneural proteins in the presumptive SOPs and a few cells surrounding the SOP in most tissues. In the cells that express low levels of Sens, it acts in a DNA-binding-dependent manner to repress transcription of proneural genes. In the presumptive SOPs that express high levels of Sens, it acts as a transcriptional activator and synergizes with proneural proteins. We therefore propose that Sens acts as a binary switch that is fundamental to SOP selection.
Pore-forming activity of clostridial binary toxins.
Knapp, O; Benz, R; Popoff, M R
2016-03-01
Clostridial binary toxins (Clostridium perfringens Iota toxin, Clostridium difficile transferase, Clostridium spiroforme toxin, Clostridium botulinum C2 toxin), like Bacillus binary toxins, including Bacillus anthracis toxins, consist of two independent proteins, one being the binding component, which mediates the internalization into the cell of the intracellularly active component. Clostridial binary toxins induce actin cytoskeleton disorganization through mono-ADP-ribosylation of globular actin and are responsible for enteric diseases. Clostridial and Bacillus binary toxins share structurally and functionally related binding components which recognize specific cell receptors, oligomerize, form pores in the endocytic vesicle membrane, and mediate the transport of the enzymatic component into the cytosol. Binding components retain the global structure of pore-forming toxins (PFTs) from the cholesterol-dependent cytotoxin family such as perfringolysin. However, their pore-forming activity, notably that of clostridial binding components, is more related to that of the heptameric PFT family, including aerolysin and C. perfringens epsilon toxin. This review focuses upon the pore-forming activity of clostridial binary toxins compared to other related PFTs. This article is part of a Special Issue entitled: Pore-Forming Toxins edited by Mauro Dalla Serra and Franco Gambale. Copyright © 2015 Elsevier B.V. All rights reserved.
Genetic programming based ensemble system for microarray data classification.
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748
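The three combination operators named in this abstract (Min, Max, and Average) can be illustrated with a short sketch. It assumes that each base classifier returns one score per class and that the ensemble predicts the class with the highest fused score; the function name combine_votes and the toy scores are hypothetical and do not come from the GPES implementation.

```python
from statistics import mean

# The three fusion operators mentioned in the abstract.
OPERATORS = {"min": min, "max": max, "average": mean}


def combine_votes(class_scores, operator):
    """Fuse per-classifier class scores and return (predicted_class, fused_scores).

    class_scores holds one list per base classifier, with one score per class.
    """
    fuse = OPERATORS[operator]
    n_classes = len(class_scores[0])
    fused = [fuse([scores[c] for scores in class_scores]) for c in range(n_classes)]
    return fused.index(max(fused)), fused


if __name__ == "__main__":
    # Toy outputs from three hypothetical decision trees on a two-class problem.
    scores = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
    for name in OPERATORS:
        print(name, combine_votes(scores, name))
```

On these toy scores the operators can disagree (Min and Max pick class 0, Average picks class 1), which shows why the choice of operator is part of what the genetic programming search has to evolve.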
Classifying multispectral data by neural networks
NASA Technical Reports Server (NTRS)
Telfer, Brian A.; Szu, Harold H.; Kiang, Richard K.
1993-01-01
Several energy functions for synthesizing neural networks are tested on 2-D synthetic data and on Landsat-4 Thematic Mapper data. These new energy functions, designed specifically for minimizing misclassification error, in some cases yield significant improvements in classification accuracy over the standard least mean squares energy function. In addition to operating on networks with one output unit per class, a new energy function is tested for binary encoded outputs, which result in smaller network sizes. The Thematic Mapper data (four bands were used) is classified on a single pixel basis, to provide a starting benchmark against which further improvements will be measured. Improvements are underway to make use of both subpixel and superpixel (i.e. contextual or neighborhood) information in the processing. For single pixel classification, the best neural network result is 78.7 percent, compared with 71.7 percent for a classical nearest neighbor classifier. The 78.7 percent result also improves on several earlier neural network results on this data.
NASA Astrophysics Data System (ADS)
Pinar, Anthony; Masarik, Matthew; Havens, Timothy C.; Burns, Joseph; Thelen, Brian; Becker, John
2015-05-01
This paper explores the effectiveness of an anomaly detection algorithm for downward-looking ground penetrating radar (GPR) and electromagnetic inductance (EMI) data. Threat detection with GPR is challenged by high responses to non-target/clutter objects, leading to a large number of false alarms (FAs), and since the responses of target and clutter signatures are so similar, classifier design is not trivial. We suggest a method based on a Run Packing (RP) algorithm to fuse GPR and EMI data into a composite confidence map to improve detection as measured by the area-under-ROC (NAUC) metric. We examine the value of a multiple kernel learning (MKL) support vector machine (SVM) classifier using image features such as histogram of oriented gradients (HOG), local binary patterns (LBP), and local statistics. Experimental results on government furnished data show that use of our proposed fusion and classification methods improves the NAUC when compared with the results from individual sensors and a single kernel SVM classifier.
Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Background Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. 
Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes. PMID:26681483
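The accuracy, sensitivity, and specificity figures quoted for SVM-BAC follow from a standard binary confusion matrix. The sketch below shows how such figures could be computed, assuming label 1 stands for high binding affinity and label 0 for low binding affinity; the label convention, the function name, and the toy data are illustrative assumptions, not the authors' code or results.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # high affinity, predicted high
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # low affinity, predicted low
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Toy labels only; not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```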
Ghosh, Rikhia; Roy, Susmita; Bagchi, Biman
2013-12-12
We carry out a series of long atomistic molecular dynamics simulations to study the unfolding of a small protein, chicken villin headpiece (HP-36), in water-ethanol (EtOH) binary mixture. The prime objective of this work is to explore the sensitivity of protein unfolding dynamics toward increasing concentration of the cosolvent and unravel essential features of intermediates formed in search of a dynamical pathway toward unfolding. In water-ethanol binary mixtures, HP-36 is found to unfold partially, under ambient conditions, that otherwise requires temperature as high as ∼600 K to denature in pure aqueous solvent. However, an interesting course of pathway is observed to be followed in the process, guided by the formation of unique intermediates. The first step of unfolding is essentially the separation of the cluster formed by three hydrophobic (phenylalanine) residues, namely, Phe-7, Phe-11, and Phe-18, which constitute the hydrophobic core, thereby initiating melting of helix-2 of the protein. The initial steps are similar to temperature-induced unfolding as well as chemical unfolding using DMSO as cosolvent. Subsequent unfolding steps follow a unique path. As water-ethanol shows composition-dependent anomalies, so do the details of unfolding dynamics. With an increase in cosolvent concentration, different partially unfolded intermediates are found to be formed. This is reflected in a remarkable nonmonotonic composition dependence of several order parameters, including fraction of native contacts and protein-solvent interaction energy. The emergence of such partially unfolded states can be attributed to the preferential solvation of the hydrophobic residues by the ethyl groups of ethanol. We further quantify the local dynamics of unfolding by using a Marcus-type theory.
Li, Jieyue; Newberg, Justin Y; Uhlén, Mathias; Lundberg, Emma; Murphy, Robert F
2012-01-01
The Human Protein Atlas contains immunofluorescence images showing subcellular locations for thousands of proteins. These are currently annotated by visual inspection. In this paper, we describe automated approaches to analyze the images and their use to improve annotation. We began by training classifiers to recognize the annotated patterns. By ranking proteins according to the confidence of the classifier, we generated a list of proteins that were strong candidates for reexamination. In parallel, we applied hierarchical clustering to group proteins and identified proteins whose annotations were inconsistent with the remainder of the proteins in their cluster. These proteins were reexamined by the original annotators, and a significant fraction had their annotations changed. The results demonstrate that automated approaches can provide an important complement to visual annotation.
Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E
2017-02-03
SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
7TMRmine: a Web server for hierarchical mining of 7TMR proteins
Lu, Guoqing; Wang, Zhifang; Jones, Alan M; Moriyama, Etsuko N
2009-01-01
Background Seven-transmembrane region-containing receptors (7TMRs) play central roles in eukaryotic signal transduction. Due to their biomedical importance, thorough mining of 7TMRs from diverse genomes has been an active target of bioinformatics and pharmacogenomics research. The need for new and accurate 7TMR/GPCR prediction tools is paramount with the accelerated rate of acquisition of diverse sequence information. Currently available and often used protein classification methods (e.g., profile hidden Markov Models) are highly accurate for identifying their membership information among already known 7TMR subfamilies. However, these alignment-based methods are less effective for identifying remote similarities, e.g., identifying proteins from highly divergent or possibly new 7TMR families. In this regard, more sensitive (e.g., alignment-free) methods are needed to complement the existing protein classification methods. A better strategy would be to combine different classifiers, from more specific to more sensitive methods, to identify a broader spectrum of 7TMR protein candidates. Description We developed a Web server, 7TMRmine, by integrating alignment-free and alignment-based classifiers specifically trained to identify candidate 7TMR proteins as well as transmembrane (TM) prediction methods. This new tool enables researchers to easily assess the distribution of GPCR functionality in diverse genomes or individual newly-discovered proteins. 7TMRmine is easily customized and facilitates exploratory analysis of diverse genomes. Users can integrate various alignment-based, alignment-free, and TM-prediction methods in any combination and in any hierarchical order. Sixteen classifiers (including two TM-prediction methods) are available on the 7TMRmine Web server. Not only can the 7TMRmine tool be used for 7TMR mining, but also for general TM-protein analysis. Users can submit protein sequences for analysis, or explore pre-analyzed results for multiple genomes. 
The server currently includes prediction results and the summary statistics for 68 genomes. Conclusion 7TMRmine facilitates the discovery of 7TMR proteins. By combining prediction results from different classifiers in a multi-level filtering process, prioritized sets of 7TMR candidates can be obtained for further investigation. 7TMRmine can be also used as a general TM-protein classifier. Comparisons of TM and 7TMR protein distributions among 68 genomes revealed interesting differences in evolution of these protein families among major eukaryotic phyla. PMID:19538753
Prediction of plant lncRNA by ensemble machine learning classifiers.
Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian
2018-05-02
In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. 
Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
Are Some Pre-Cataclysmic Variables also Post-Cataclysmic Variables?
NASA Astrophysics Data System (ADS)
Sarna, M. J.; Marks, P. B.; Smith, R. C.
1995-10-01
We propose an evolutionary scenario in which post-common-envelope binaries (PCEBs) with secondary component masses between 0.8 M⊙ and 1.2 M⊙ start semi-detached evolution almost immediately after the common-envelope (CE) phase. These systems detach due to unstable mass transfer when the secondary develops a thick convective envelope. The duration of the detached phase is a few times 10^8 yr, depending on the efficiency of magnetic braking and gravitational radiation. We suggest that some of the systems that have been classified as PCEBs may be in this stage of evolution and hence would be more realistically classified as pre-cataclysmic variables (PreCVs). We also propose an observational test based on measurements of the carbon and oxygen isotopic ratios from the infrared CO bands.
Osgood, Ross S; Upham, Brad L; Bushel, Pierre R; Velmurugan, Kalpana; Xiong, Ka-Na; Bauer, Alison K
2017-05-01
Low molecular weight polycyclic aromatic hydrocarbons (LMW PAHs; < 206.3 g/mol) are prevalent and ubiquitous environmental contaminants, presenting a human health concern, and have not been as thoroughly studied as the high MW PAHs. LMW PAHs exert their pulmonary effects, in part, through P38-dependent and -independent mechanisms involving cell-cell communication and the production of pro-inflammatory mediators known to contribute to lung disease. Specifically, we determined the effects of two representative LMW PAHs, 1-methylanthracene (1-MeA) and fluoranthene (Flthn), individually and as a binary PAH mixture on the dysregulation of gap junctional intercellular communication (GJIC) and connexin 43 (Cx43), activation of mitogen activated protein kinases (MAPK), and induction of inflammatory mediators in a mouse non-tumorigenic alveolar type II cell line (C10). Both 1-MeA, Flthn, and the binary PAH mixture of 1-MeA and Flthn dysregulated GJIC in a dose and time-dependent manner, reduced Cx43 protein, and activated the following MAPKs: P38, ERK1/2, and JNK. Inhibition of P38 MAPK prevented PAH-induced dysregulation of GJIC, whereas inhibiting ERK and JNK did not prevent these PAHs from dysregulating GJIC indicating a P38-dependent mechanism. A toxicogenomic approach revealed significant P38-dependent and -independent pathways involved in inflammation, steroid synthesis, metabolism, and oxidative responses. Genes in these pathways were significantly altered by the binary PAH mixture when compared with 1-MeA and Flthn alone suggesting interactive effects. Exposure to the binary PAH mixture induced the production and release of cytokines and metalloproteinases from the C10 cells. Our findings with a binary mixture of PAHs suggest that combinations of LMW PAHs may elicit synergistic or additive inflammatory responses which warrant further investigation and confirmation. © The Author 2017. Published by Oxford University Press on behalf of the Society of Toxicology. 
Osgood, Ross S.; Upham, Brad L.; Bushel, Pierre R.; Velmurugan, Kalpana; Xiong, Ka-Na
2017-01-01
Abstract Low molecular weight polycyclic aromatic hydrocarbons (LMW PAHs; < 206.3 g/mol) are prevalent and ubiquitous environmental contaminants, presenting a human health concern, and have not been as thoroughly studied as the high MW PAHs. LMW PAHs exert their pulmonary effects, in part, through P38-dependent and -independent mechanisms involving cell-cell communication and the production of pro-inflammatory mediators known to contribute to lung disease. Specifically, we determined the effects of two representative LMW PAHs, 1-methylanthracene (1-MeA) and fluoranthene (Flthn), individually and as a binary PAH mixture on the dysregulation of gap junctional intercellular communication (GJIC) and connexin 43 (Cx43), activation of mitogen activated protein kinases (MAPK), and induction of inflammatory mediators in a mouse non-tumorigenic alveolar type II cell line (C10). Both 1-MeA, Flthn, and the binary PAH mixture of 1-MeA and Flthn dysregulated GJIC in a dose and time-dependent manner, reduced Cx43 protein, and activated the following MAPKs: P38, ERK1/2, and JNK. Inhibition of P38 MAPK prevented PAH-induced dysregulation of GJIC, whereas inhibiting ERK and JNK did not prevent these PAHs from dysregulating GJIC indicating a P38-dependent mechanism. A toxicogenomic approach revealed significant P38-dependent and -independent pathways involved in inflammation, steroid synthesis, metabolism, and oxidative responses. Genes in these pathways were significantly altered by the binary PAH mixture when compared with 1-MeA and Flthn alone suggesting interactive effects. Exposure to the binary PAH mixture induced the production and release of cytokines and metalloproteinases from the C10 cells. Our findings with a binary mixture of PAHs suggest that combinations of LMW PAHs may elicit synergistic or additive inflammatory responses which warrant further investigation and confirmation. PMID:28329830
Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang
2018-06-14
Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
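The weighted permutation entropy used as the fault feature above can be sketched as follows. This is a minimal illustration, assuming the common variance-weighted definition (each embedded window contributes its variance to the count of its ordinal pattern); the function name and the `order`, `delay` and `normalize` parameters are hypothetical, and the abstract does not state which settings the authors used.

```python
import math
from collections import defaultdict


def weighted_permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Variance-weighted permutation entropy of a 1-D sequence (assumed definition)."""
    span = (order - 1) * delay
    if len(signal) <= span:
        raise ValueError("signal too short for the chosen order and delay")
    weight_by_pattern = defaultdict(float)
    total_weight = 0.0
    for start in range(len(signal) - span):
        window = [signal[start + k * delay] for k in range(order)]
        # Ordinal pattern: the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        mean = sum(window) / order
        # Weight: variance of the window, so large-amplitude patterns count more.
        weight = sum((v - mean) ** 2 for v in window) / order
        weight_by_pattern[pattern] += weight
        total_weight += weight
    if total_weight == 0.0:
        return 0.0  # constant signal: no ordinal information
    entropy = 0.0
    for w in weight_by_pattern.values():
        if w > 0.0:
            p = w / total_weight
            entropy -= p * math.log(p)
    if normalize:
        entropy /= math.log(math.factorial(order))  # scale to [0, 1]
    return entropy
```

In a pipeline like the one described, such a value would be computed on the raw vibration signal for detection and on each of the first few IMFs to build the feature vector; the EEMD step itself is not shown here.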
Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco
2017-09-01
Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.
Karim, Ahmad; Salleh, Rosli; Khan, Muhammad Khurram
2016-01-01
The botnet phenomenon in smartphones is evolving with the proliferation of mobile phone technologies, after having had a significant impact on personal computers. A botnet is a network of computers, laptops, mobile devices or tablets that is remotely controlled by cybercriminals to initiate various distributed coordinated attacks, including spam emails, ad-click fraud, Bitcoin mining, Distributed Denial of Service (DDoS), dissemination of other malware, and much more. Like traditional PC-based botnets, mobile botnets have the same operational impact, except that the targets are smartphone users. Therefore, it is important to uncover this security issue before it becomes widespread. We propose SMARTbot, a novel dynamic analysis framework augmented with machine learning techniques to automatically detect botnet binaries from a malicious corpus. SMARTbot is a component-based off-device behavioral analysis framework which can generate a mobile botnet learning model using the back-propagation method of artificial neural networks. Moreover, this framework can detect mobile botnet binaries with remarkable accuracy even in the case of obfuscated program code. The results show that a classifier model based on simple logistic regression outperforms other machine learning classifiers for botnet app detection, i.e., 99.49% accuracy is achieved. Further, from manual inspection of the botnet dataset we have extracted interesting trends in those applications. As an outcome of this research, a mobile botnet dataset is devised which will become the benchmark for future studies. PMID:26978523
Collins, Mahlon A; An, Jiyan; Hood, Brian L; Conrads, Thomas P; Bowser, Robert P
2015-11-06
Analysis of the cerebrospinal fluid (CSF) proteome has proven valuable to the study of neurodegenerative disorders. To identify new protein/pathway alterations and candidate biomarkers for amyotrophic lateral sclerosis (ALS), we performed comparative proteomic profiling of CSF from sporadic ALS (sALS), healthy control (HC), and other neurological disease (OND) subjects using label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS). A total of 1712 CSF proteins were detected and relatively quantified by spectral counting. Levels of several proteins with diverse biological functions were significantly altered in sALS samples. Enrichment analysis was used to link these alterations to biological pathways, which were predominantly related to inflammation, neuronal activity, and extracellular matrix regulation. We then used our CSF proteomic profiles to create a support vector machines classifier capable of discriminating training set ALS from non-ALS (HC and OND) samples. Four classifier proteins, WD repeat-containing protein 63, amyloid-like protein 1, SPARC-like protein 1, and cell adhesion molecule 3, were identified by feature selection and externally validated. The resultant classifier distinguished ALS from non-ALS samples with 83% sensitivity and 100% specificity in an independent test set. Collectively, our results illustrate the utility of CSF proteomic profiling for identifying ALS protein/pathway alterations and candidate disease biomarkers.
Predicting permanent and transient protein-protein interfaces.
La, David; Kong, Misun; Hoffman, William; Choi, Youn Im; Kihara, Daisuke
2013-05-01
Protein-protein interactions (PPIs) are involved in diverse functions in a cell. To optimize functional roles of interactions, proteins interact with a spectrum of binding affinities. Interactions are conventionally classified into permanent and transient, where the former denotes tight binding between proteins that results in strong complexes, whereas the latter comprises relatively weak interactions that can dissociate after binding to regulate functional activity at a specific time point. Knowing the type of interaction has significant implications for understanding the nature and function of PPIs. In this study, we constructed amino acid substitution models that capture mutation patterns at permanent and transient types of protein interfaces, which were found to be different with statistical significance. Using the substitution models, we developed a novel computational method that predicts permanent and transient protein binding interfaces (PBIs) on protein surfaces. Without knowledge of the interacting partner, the method uses a single query protein structure and a multiple sequence alignment of the sequence family. Using a large dataset of permanent and transient proteins, we show that our method, BindML+, performs very well in protein interface classification. A very high area under the curve (AUC) value of 0.957 was observed when predicted protein binding sites were classified. Remarkably, near-perfect accuracy was achieved with an AUC of 0.991 when actual binding sites were classified. The developed method will also be useful for protein design of permanent and transient PBIs. Copyright © 2013 Wiley Periodicals, Inc.
Controllable Thermal Rectification Realized in Binary Phase Change Composites
Chen, Renjie; Cui, Yalong; Tian, He; Yao, Ruimin; Liu, Zhenpu; Shu, Yi; Li, Cheng; Yang, Yi; Ren, Tianling; Zhang, Gang; Zou, Ruqiang
2015-01-01
Phase transition is a natural phenomenon that occurs throughout daily life, exemplified by the melting of ice into water. Melting and solidification at a fixed temperature are accompanied by a large heat of fusion, known as latent heat. Phase change materials (PCMs) have been widely applied to store and release large amounts of energy owing to this distinctive thermal behavior. Here, with the help of nanoporous materials, we introduce a general strategy to produce binary eicosane/PEG4000-stuffed reduced graphene oxide aerogels, which have two ends with different melting points. We demonstrate that these binary PCM composites exhibit thermal rectification. Partial phase transitions within the porous networks instantaneously cause an abrupt jump (saltation) in the thermal conductivity of one end at a critical temperature, and thereby switch the thermal rectification on or off, with a coefficient of up to 1.23. This value can be raised further by adjusting the PCM loading. The uniqueness of this device lies in its behavior as a normal thermal conductor at low temperature; it exhibits rectification only when the temperature is higher than a critical value. The technology has broad applications for thermal energy control at the macroscopic scale, such as energy-efficient buildings and nanodevice thermal management. PMID:25748640
A novel probabilistic framework for event-based speech recognition
NASA Astrophysics Data System (ADS)
Juneja, Amit; Espy-Wilson, Carol
2003-10-01
One of the reasons for unsatisfactory performance of the state-of-the-art automatic speech recognition (ASR) systems is the inferior acoustic modeling of low-level acoustic-phonetic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal, but such a system for continuous speech recognition (CSR) is not known to exist. A probabilistic and statistical framework for CSR based on the idea of the representation of speech sounds by bundles of binary valued articulatory phonetic features is proposed. Multiple probabilistic sequences of linguistically motivated landmarks are obtained using binary classifiers of manner phonetic features-syllabic, sonorant and continuant-and the knowledge-based acoustic parameters (APs) that are acoustic correlates of those features. The landmarks are then used for the extraction of knowledge-based APs for source and place phonetic features and their binary classification. Probabilistic landmark sequences are constrained using manner class language models for isolated or connected word recognition. The proposed method could overcome the disadvantages encountered by the early acoustic-phonetic knowledge-based systems that led the ASR community to switch to systems highly dependent on statistical pattern analysis methods and probabilistic language or grammar models.
2013-01-01
Background: Gene expression data could likely be a momentous help in the progress of proficient cancer diagnoses and classification platforms. Lately, many researchers analyze gene expression data using diverse computational intelligence methods, for selecting a small subset of informative genes from the data for cancer classification. Many computational methods face difficulties in selecting small subsets due to the small number of samples compared to the huge number of genes (high-dimension), irrelevant genes, and noisy genes. Methods: We propose an enhanced binary particle swarm optimization to perform the selection of small subsets of informative genes which is significant for cancer classification. Particle speed, rule, and modified sigmoid function are introduced in this proposed method to increase the probability of the bits in a particle’s position to be zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results: The performance of the proposed method proved to be superior to other previous related works, including the conventional version of binary particle swarm optimization (BPSO) in terms of classification accuracy and the number of selected genes. The proposed method also requires lower computational time compared to BPSO. PMID:23617960
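For orientation, one update step of conventional BPSO, the baseline this abstract compares against, might look like the sketch below. The abstract does not give the enhanced method's particle-speed rule or modified sigmoid, so neither is reproduced; the function names and the `inertia`, `c1`, `c2` and `v_max` values are hypothetical choices for illustration only.

```python
import math
import random


def sigmoid(v):
    """Standard logistic transfer function used by conventional BPSO."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso_step(position, velocity, personal_best, global_best, rng,
              inertia=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One conventional BPSO update; each bit marks a gene as selected (1) or not (0)."""
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        # Velocity is pulled toward the particle's own best and the swarm's best.
        v = inertia * v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (gb - x)
        v = max(-v_max, min(v_max, v))  # clamp so the sigmoid does not saturate
        new_velocity.append(v)
        # The bit is set to 1 with probability sigmoid(v), otherwise 0.
        new_position.append(1 if rng.random() < sigmoid(v) else 0)
    return new_position, new_velocity
```

Because sigmoid(0) is 0.5, the conventional rule selects about half of the genes when velocities are near zero, which is one plausible reason a variant would bias bits toward zero when small gene subsets are wanted.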
Controllable Thermal Rectification Realized in Binary Phase Change Composites
NASA Astrophysics Data System (ADS)
Chen, Renjie; Cui, Yalong; Tian, He; Yao, Ruimin; Liu, Zhenpu; Shu, Yi; Li, Cheng; Yang, Yi; Ren, Tianling; Zhang, Gang; Zou, Ruqiang
2015-03-01
Phase transition is a natural phenomenon that occurs throughout daily life, exemplified by the melting of ice into water. Melting and solidification at a fixed temperature are accompanied by a large heat of fusion, known as latent heat. Phase change materials (PCMs) have been widely applied to store and release large amounts of energy owing to this distinctive thermal behavior. Here, with the help of nanoporous materials, we introduce a general strategy to produce binary eicosane/PEG4000-stuffed reduced graphene oxide aerogels, which have two ends with different melting points. We demonstrate that these binary PCM composites exhibit thermal rectification. Partial phase transitions within the porous networks instantaneously cause an abrupt jump (saltation) in the thermal conductivity of one end at a critical temperature, and thereby switch the thermal rectification on or off, with a coefficient of up to 1.23. This value can be raised further by adjusting the PCM loading. The uniqueness of this device lies in its behavior as a normal thermal conductor at low temperature; it exhibits rectification only when the temperature is higher than a critical value. The technology has broad applications for thermal energy control at the macroscopic scale, such as energy-efficient buildings and nanodevice thermal management.
Structure Defect Property Relationships in Binary Intermetallics
NASA Astrophysics Data System (ADS)
Medasani, Bharat; Ding, Hong; Chen, Wei; Persson, Kristin; Canning, Andrew; Haranczyk, Maciej; Asta, Mark
2015-03-01
Ordered intermetallics are lightweight materials with technologically useful high-temperature properties such as creep resistance. Knowledge of constitutional and thermal defects is required to understand these properties. Vacancies and antisites are the dominant defects in intermetallics, and their concentrations and formation enthalpies can be computed using first-principles density functional theory and thermodynamic formalisms such as the dilute solution method. Previously, many properties of intermetallics, such as melting temperatures and formation enthalpies, were statistically analyzed for a large number of compounds using structure maps and data mining approaches. We undertook a similar exercise to establish the dependence of the defect properties in binary intermetallics on the underlying structure and chemical composition. For more than 200 binary intermetallics comprising AB, AB2 and AB3 structures, we computed the concentrations and formation enthalpies of vacancies and antisites in a small range of stoichiometries deviating from the ideal stoichiometry. The calculated defect properties were data-mined to gain predictive capabilities for defect properties as well as to classify the intermetallics for their suitability in high-T applications. Supported by the US DOE under Contract No. DEAC02-05CH11231 under the Materials Project Center grant (Award No. EDCBEE).
Gene-specific cell labeling using MiMIC transposons.
Gnerer, Joshua P; Venken, Koen J T; Dierick, Herman A
2015-04-30
Binary expression systems such as GAL4/UAS, LexA/LexAop and QF/QUAS have greatly enhanced the power of Drosophila as a model organism by allowing spatio-temporal manipulation of gene function as well as cell and neural circuit function. Tissue-specific expression of these heterologous transcription factors relies on random transposon integration near enhancers or promoters that drive the binary transcription factor embedded in the transposon. Alternatively, gene-specific promoter elements are directly fused to the binary factor within the transposon followed by random or site-specific integration. However, such insertions do not consistently recapitulate endogenous expression. We used Minos-Mediated Integration Cassette (MiMIC) transposons to convert host loci into reliable gene-specific binary effectors. MiMIC transposons allow recombinase-mediated cassette exchange to modify the transposon content. We developed novel exchange cassettes to convert coding intronic MiMIC insertions into gene-specific binary factor protein-traps. In addition, we expanded the set of binary factor exchange cassettes available for non-coding intronic MiMIC insertions. We show that binary factor conversions of different insertions in the same locus have indistinguishable expression patterns, suggesting that they reliably reflect endogenous gene expression. We show the efficacy and broad applicability of these new tools by dissecting the cellular expression patterns of the Drosophila serotonin receptor gene family. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Compact and Hybrid Feature Description for Building Extraction
NASA Astrophysics Data System (ADS)
Li, Z.; Liu, Y.; Hu, Y.; Li, P.; Ding, Y.
2017-05-01
Building extraction in aerial orthophotos is crucial for various applications. Currently, deep learning has been shown to be successful in addressing building extraction with high accuracy and high robustness. However, quite a large number of samples is required to train a classifier when using a deep learning model. In order to realize accurate and semi-interactive labelling, the performance of feature description is crucial, as it has a significant effect on the accuracy of classification. In this paper, we bring forward a compact and hybrid feature description method to guarantee desirable classification accuracy for the corners on building roof contours. The proposed descriptor is a hybrid description of an image patch constructed from 4 sets of binary intensity tests. Experiments show that, benefiting from binary description and making full use of color channels, this descriptor is not only computationally frugal but also more accurate than SURF for building extraction.
Oriented modulation for watermarking in direct binary search halftone images.
Guo, Jing-Ming; Su, Chang-Cheng; Liu, Yun-Fu; Lee, Hua; Lee, Jiann-Der
2012-09-01
In this paper, a halftoning-based watermarking method is presented. This method enables high pixel-depth watermark embedding, while maintaining high image quality. This technique is capable of embedding watermarks with pixel depths up to 3 bits without causing prominent degradation to the image quality. To achieve high image quality, the parallel oriented high-efficient direct binary search (DBS) halftoning is selected to be integrated with the proposed orientation modulation (OM) method. The OM method utilizes different halftone texture orientations to carry different watermark data. In the decoder, the least-mean-square-trained filters are applied for feature extraction from watermarked images in the frequency domain, and the naïve Bayes classifier is used to analyze the extracted features and ultimately to decode the watermark data. Experimental results show that the DBS-based OM encoding method maintains a high degree of image quality and realizes the processing efficiency and robustness to be adapted in printing applications.
An algorithm that improves speech intelligibility in noise for normal-hearing listeners.
Kim, Gibak; Lu, Yang; Hu, Yi; Loizou, Philipos C
2009-09-01
Traditional noise-suppression algorithms have been shown to improve speech quality, but not speech intelligibility. Motivated by prior intelligibility studies of speech synthesized using the ideal binary mask, an algorithm is proposed that decomposes the input signal into time-frequency (T-F) units and makes binary decisions, based on a Bayesian classifier, as to whether each T-F unit is dominated by the target or the masker. Speech corrupted at low signal-to-noise ratio (SNR) levels (-5 and 0 dB) using different types of maskers is synthesized by this algorithm and presented to normal-hearing listeners for identification. Results indicated substantial improvements in intelligibility (over 60 percentage points in -5 dB babble) over that attained by human listeners with unprocessed stimuli. The findings from this study suggest that algorithms that can reliably estimate the SNR in each T-F unit can improve speech intelligibility.
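The ideal binary mask that motivates this algorithm can be illustrated with a short sketch. It assumes the target and masker energies of every T-F unit are known separately, which holds only in an oracle setting; the proposed algorithm instead estimates the decision with a Bayesian classifier, which is not shown. The function name and the `threshold_db` parameter are hypothetical.

```python
import math


def ideal_binary_mask(target_energy, masker_energy, threshold_db=0.0):
    """Return 1 for T-F units whose local SNR exceeds threshold_db, else 0.

    Both inputs are 2-D lists (time frames x frequency channels) of
    non-negative energies with identical shapes.
    """
    mask = []
    for target_row, masker_row in zip(target_energy, masker_energy):
        row = []
        for t, m in zip(target_row, masker_row):
            if m == 0.0:
                keep = t > 0.0   # no masker energy: keep any unit with target energy
            elif t == 0.0:
                keep = False     # no target energy: discard
            else:
                keep = 10.0 * math.log10(t / m) > threshold_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask
```

Multiplying the noisy T-F representation by such a mask and resynthesizing keeps only the target-dominated units.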
HR 7578 - A K dwarf double-lined spectroscopic binary with peculiar abundances
NASA Technical Reports Server (NTRS)
Fekel, F. C., Jr.; Beavers, W. I.
1983-01-01
The number of double-lined K and M dwarf binaries which is currently known is quite small, only a dozen or less of each type. The HR 7578 system was classified as dK5 on the Mount Wilson system and as K2 V on the MK system. A summary of radial-velocity measurements including the observatory and weight of each observation is given in a table. The star with the stronger lines has been called component A. The final orbital element solution with all observations appropriately weighted was computed with a differential corrections computer program described by Barker et al. (1967). The program had been modified for the double-lined case. Of particular interest are the very large eccentricity of the system and the large minimum masses for each component. These large minimum masses suggest that eclipses may be detectable despite the relatively long period and small radii of the stars.
Face biometrics with renewable templates
NASA Astrophysics Data System (ADS)
van der Veen, Michiel; Kevenaar, Tom; Schrijen, Geert-Jan; Akkermans, Ton H.; Zuo, Fei
2006-02-01
In recent literature, privacy protection technologies for biometric templates were proposed. Among these is the so-called helper-data system (HDS) based on reliable component selection. In this paper we integrate this approach with face biometrics such that we achieve a system in which the templates are privacy protected, and multiple templates can be derived from the same facial image for the purpose of template renewability. Extracting binary feature vectors forms an essential step in this process. Using the FERET and Caltech databases, we show that this quantization step does not significantly degrade the classification performance compared to, for example, traditional correlation-based classifiers. The binary feature vectors are integrated in the HDS leading to a privacy protected facial recognition algorithm with acceptable FAR and FRR, provided that the intra-class variation is sufficiently small. This suggests that a controlled enrollment procedure with a sufficient number of enrollment measurements is required.
Entropy coders for image compression based on binary forward classification
NASA Astrophysics Data System (ADS)
Yoo, Hoon; Jeong, Jechang
2000-12-01
Entropy coders, as a noiseless compression method, are widely used as the final compression step for images, and there have been many contributions toward increasing entropy coder performance and reducing entropy coder complexity. In this paper, we propose some entropy coders based on binary forward classification (BFC). The BFC requires a classification overhead, but there is no change between the amount of input information and the total amount of classified output information, a property that we prove in this paper. Using the proved property, we propose entropy coders that consist of the BFC followed by Golomb-Rice coders (BFC+GR) and the BFC followed by arithmetic coders (BFC+A). The proposed entropy coders introduce negligible additional complexity due to the BFC. Simulation results also show better performance than other entropy coders that have similar complexity to the proposed coders.
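As a point of reference for the BFC+GR coder, the Golomb-Rice stage on its own can be sketched as below. The abstract does not specify the BFC step, so it is not reproduced; the function names are hypothetical, and bit strings are used instead of packed bits purely for readability.

```python
def rice_encode(value, k):
    """Golomb-Rice code of a non-negative integer with parameter k (divisor 2**k)."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    bits = "1" * quotient + "0"  # unary quotient, terminated by a zero
    if k > 0:
        # Remainder as exactly k binary digits.
        bits += format(value & ((1 << k) - 1), "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Decode one codeword from the start of bits; return (value, bits consumed)."""
    quotient = 0
    pos = 0
    while bits[pos] == "1":
        quotient += 1
        pos += 1
    pos += 1  # skip the terminating zero
    remainder = int(bits[pos:pos + k], 2) if k > 0 else 0
    return (quotient << k) | remainder, pos + k
```

For example, with k = 2 the value 9 has quotient 2 and remainder 1, so the codeword is the unary part `110` followed by `01`.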
VizieR Online Data Catalog: Dwarf novae outbursts properties (Otulakowska-Hypka+, 2016)
NASA Astrophysics Data System (ADS)
Otulakowska-Hypka, M.; Olech, A.; Patterson, J.
2017-11-01
In this study, we used the following available catalogue data sources. The catalogue and atlas of CVs (https://archive.stsci.edu/prepds/cvcat/) by Downes et al. (2001PASP..113..764D, Cat. V/123) which contains 1830 objects that have been classified as a CV before 2006 February 1, when the catalogue was frozen. Catalogue of cataclysmic binaries, low-mass X-ray binaries and related objects (http://www.mpa-garching.mpg.de/RKcat/) by Ritter & Kolb (2003A&A...404..301R, Cat. B/cb). Although the reference corresponds to a catalogue which is over 10yr old, its newest edition 7.21 (2013 December 31) has been used in this study. This catalogue contains 1094 CVs. Catalogue of J. Patterson, that is the supplementary electronic material to the publication Patterson (2011) containing properties of 292 non-magnetic CVs with orbital periods smaller than 3h (http://cbastro.org/dwarfnovashort/) (1 data file).
Four New Binary Stars in the Field of CL Aurigae. II
NASA Astrophysics Data System (ADS)
Kim, Chun-Hwey; Lee, Jae Woo; Duck, Hyun Kim; Andronov, Ivan L.
2010-12-01
We report on the discovery of four new variable stars (USNO-B1.0 1234-0103195, 1235-0097170, 1236-0100293 and 1236-0100092) in the field of CL Aur. The stars are classified as eclipsing binary stars with orbital periods of 0.5137413(23) (EW type), 0.8698365(26) (EA) and 4.0055842(40) (EA with a significant orbital eccentricity), respectively. The fourth star (USNO-B1.0 1236-0100092) showed only one partial ascending branch of the light curves, although 22 nights were covered at the 61-cm telescope at the Sobaeksan Optical Astronomy Observatory (SOAO) in Korea. Fourteen minima timings for these stars are published separately. In addition to the original discovery paper (Kim et al. 2010), we discuss methodological problems and present results of mathematical modeling of the light curves using other methods, i.e. trigonometric polynomial fits and the newly developed fit "NAV" ("New Algol Variable").
NASA Astrophysics Data System (ADS)
Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin
2010-12-01
We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such an SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performance in sports video classification.
A learning framework for age rank estimation based on face images with scattering transform.
Chang, Kuang-Yu; Chen, Chu-Song
2015-03-01
This paper presents a cost-sensitive ordinal hyperplanes ranking algorithm for human age estimation based on face images. The proposed approach exploits relative-order information among the age labels for rank prediction. In our approach, the age rank is obtained by aggregating a series of binary classification results, where cost sensitivities among the labels are introduced to improve the aggregating performance. In addition, we give a theoretical analysis on designing the cost of each individual binary classifier so that the misranking cost can be bounded by the total misclassification costs. An efficient descriptor, the scattering transform, which scatters the Gabor coefficients and pools them with Gaussian smoothing in multiple layers, is evaluated for facial feature extraction. We show that this descriptor is a generalization of conventional bioinspired features and is more effective for face-based age inference. Experimental results demonstrate that our method outperforms the state-of-the-art age estimation approaches.
Espinoza-Herrera, Shirly J; Gaur, Vineet; Suo, Zucai; Carey, Paul R
2013-07-23
Y-Family DNA polymerases are known to bypass DNA lesions in vitro and in vivo. Sulfolobus solfataricus DNA polymerase (Dpo4) was chosen as a model Y-family enzyme for investigating the mechanism of DNA synthesis in single crystals. Crystals of Dpo4 in complexes with DNA (the binary complex) in the presence or absence of an incoming nucleotide were analyzed by Raman microscopy. (13)C- and (15)N-labeled d*CTP, or unlabeled dCTP, were soaked into the binary crystals with G as the templating base. In the presence of the catalytic metal ions, Mg(2+) and Mn(2+), nucleotide incorporation was detected by the disappearance of the triphosphate band of dCTP and the retention of *C modes in the crystal following soaking out of noncovalently bound C(or *C)TP. The addition of the second coded base, thymine, was observed by adding cognate dTTP to the crystal following a single d*CTP addition. Adding these two bases caused visible damage to the crystal that was possibly caused by protein and/or DNA conformational change within the crystal. When d*CTP is soaked into the Dpo4 crystal in the absence of Mn(2+) or Mg(2+), the primer extension reaction did not occur; instead, a ternary protein·template·d*CTP complex was formed. In the Raman difference spectra of both binary and ternary complexes, in addition to the modes of d(*C)CTP, features caused by ring modes from the template/primer bases being perturbed and from the DNA backbone appear, as well as features from perturbed peptide and amino acid side chain modes. These effects are more pronounced in the ternary complex than in the binary complex. Using standardized Raman intensities followed as a function of time, the C(*C)TP population in the crystal was maximal at ∼20 min. These remained unchanged in the ternary complex but declined in the binary complexes as chain incorporation occurred.
Li, Jian-Feng; Bush, Jenifer; Xiong, Yan; Li, Lei; McCormack, Matthew
2011-01-01
Protein-protein interactions (PPIs) constitute the regulatory network that coordinates diverse cellular functions. There are growing needs in plant research for creating protein interaction maps behind complex cellular processes and at a systems biology level. However, only a few approaches have been successfully used for large-scale surveys of PPIs in plants, each having advantages and disadvantages. Here we present split firefly luciferase complementation (SFLC) as a highly sensitive and noninvasive technique for in planta PPI investigation. In this assay, the separate halves of a firefly luciferase can come into close proximity and transiently restore its catalytic activity only when their fusion partners, namely the two proteins of interest, interact with each other. This assay was conferred with quantitativeness and high throughput potential when the Arabidopsis mesophyll protoplast system and a microplate luminometer were employed for protein expression and luciferase measurement, respectively. Using the SFLC assay, we could monitor the dynamics of rapamycin-induced and ascomycin-disrupted interaction between Arabidopsis FRB and human FKBP proteins in a near real-time manner. As a proof of concept for large-scale PPI survey, we further applied the SFLC assay to testing 132 binary PPIs among 8 auxin response factors (ARFs) and 12 Aux/IAA proteins from Arabidopsis. Our results demonstrated that the SFLC assay is ideal for in vivo quantitative PPI analysis in plant cells and is particularly powerful for large-scale binary PPI screens.
NASA Astrophysics Data System (ADS)
To, Cuong; Pham, Tuan D.
2010-01-01
In machine learning, pattern recognition may be the most popular task. Identification of "similar" patterns is also very important in biology because, first, it is useful for prediction of patterns associated with disease, for example cancer tissue (normal or tumor); second, similarity or dissimilarity of the kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process; and third, similar genes (proteins) share similar functions. In this paper, we present an algorithm which uses genetic programming to create a decision tree for binary classification problems. The algorithm was applied to five real biological databases. Based on the results of comparisons with well-known methods, we see that the algorithm is outstanding in most cases.
Characterization of binary string statistics for syntactic landmine detection
NASA Astrophysics Data System (ADS)
Nasif, Ahmed O.; Mark, Brian L.; Hintz, Kenneth J.
2011-06-01
Syntactic landmine detection has been proposed to detect and classify non-metallic landmines using ground penetrating radar (GPR). In this approach, the GPR return is processed to extract characteristic binary strings for landmine and clutter discrimination. In our previous work, we discussed the preprocessing methodology by which the amplitude information of the GPR A-scan signal can be effectively converted into binary strings, which identify the impedance discontinuities in the signal. In this work, we study the statistical properties of the binary string space. In particular, we develop a Markov chain model to characterize the observed bit sequence of the binary strings. The state is defined as the number of consecutive zeros between two ones in the binarized A-scans. Since the strings are highly sparse (the number of zeros is much greater than the number of ones), defining the state this way leads to fewer states than the case where each bit is defined as a state. The total number of states is further reduced by quantizing the number of consecutive zeros. In order to identify the correct order of the Markov model, the mean square difference (MSD) between the transition matrices of mine strings and non-mine strings is calculated up to order four using training data. The results show that order one or two maximizes this MSD. The specification of the transition probabilities of the chain can be used to compute the likelihood of any given string. Such a model can be used to identify characteristic landmine strings during the training phase. These developments on modeling and characterizing the string statistics can potentially be part of a real-time landmine detection algorithm that identifies landmine and clutter in an adaptive fashion.
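The state definition and likelihood computation described in this abstract can be illustrated with a minimal first-order sketch in Python. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the choice to quantize run lengths by capping them at `max_run`, and the additive smoothing constant `alpha`.

```python
import math

def zero_run_states(bits, max_run=3):
    """Map a binary string to a state sequence: the number of zeros between
    consecutive ones, capped at max_run (an assumed form of quantization)."""
    ones = [i for i, b in enumerate(bits) if b == "1"]
    return [min(j - i - 1, max_run) for i, j in zip(ones, ones[1:])]

def fit_transitions(strings, max_run=3, alpha=1.0):
    """Estimate a first-order transition matrix from training strings.
    alpha is additive smoothing so that no probability is exactly zero."""
    n = max_run + 1
    counts = [[alpha] * n for _ in range(n)]
    for s in strings:
        states = zero_run_states(s, max_run)
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]

def log_likelihood(bits, P, max_run=3):
    """Log-likelihood of the state transitions in one string under matrix P."""
    states = zero_run_states(bits, max_run)
    return sum(math.log(P[a][b]) for a, b in zip(states, states[1:]))

def mean_square_difference(P, Q):
    """Mean square difference between two transition matrices of equal size."""
    n = len(P)
    return sum((P[i][j] - Q[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

With two matrices fitted on mine and non-mine training strings, a new string could be scored by comparing its two log-likelihoods; higher-order models would need states built from tuples of consecutive run lengths, which this sketch does not cover.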
Jenkins, Martin
2016-01-01
Objective. In clinical trials of RA, it is common to assess effectiveness using end points based upon dichotomized continuous measures of disease activity, which classify patients as responders or non-responders. Although dichotomization generally loses statistical power, there are good clinical reasons to use these end points; for example, to allow for patients receiving rescue therapy to be assigned as non-responders. We adopt a statistical technique called the augmented binary method to make better use of the information provided by these continuous measures and account for how close patients were to being responders. Methods. We adapted the augmented binary method for use in RA clinical trials. We used a previously published randomized controlled trial (Oral SyK Inhibition in Rheumatoid Arthritis-1) to assess its performance in comparison to a standard method treating patients purely as responders or non-responders. The power and error rate were investigated by sampling from this study. Results. The augmented binary method reached similar conclusions to standard analysis methods but was able to estimate the difference in response rates to a higher degree of precision. Results suggested that CI widths for ACR responder end points could be reduced by at least 15%, which could equate to reducing the sample size of a study by 29% to achieve the same statistical power. For other end points, the gain was even higher. Type I error rates were not inflated. Conclusion. The augmented binary method shows considerable promise for RA trials, making more efficient use of patient data whilst still reporting outcomes in terms of recognized response end points. PMID:27338084
DOE Office of Scientific and Technical Information (OSTI.GOV)
Laycock, Silas; Cappallo, Rigel; Williams, Benjamin F.
We have monitored the Cassiopeia dwarf galaxy (IC 10) in a series of 10 Chandra ACIS-S observations to capture its variable and transient X-ray source population, which is expected to be dominated by High Mass X-ray Binaries (HMXBs). We present a sample of 21 X-ray sources that are variable between observations at the 3 σ level, from a catalog of 110 unique point sources. We find four transients (flux variability ratio greater than 10) and a further eight objects with ratios >5. The observations span the years 2003–2010 and reach a limiting luminosity of >10^35 erg s^−1, providing sensitivity to X-ray binaries in IC 10 as well as flare stars in the foreground Milky Way. The nature of the variable sources is investigated from light curves, X-ray spectra, energy quantiles, and optical counterparts. The purpose of this study is to discover the composition of the X-ray binary population in a young starburst environment. IC 10 provides a sharp contrast in stellar population age (<10 My) when compared to the Magellanic Clouds (40–200 My) where most of the known HMXBs reside. We find 10 strong HMXB candidates, 2 probable background Active Galactic Nuclei, 4 foreground flare-stars or active binaries, and 5 not yet classifiable sources. Complete classification of the sample requires optical spectroscopy for radial velocity analysis and deeper X-ray observations to obtain higher S/N spectra and search for pulsations. A catalog and supporting data set are provided.
On the Rate and on the Gravitational Wave Emission of Short and Long GRBs
NASA Astrophysics Data System (ADS)
Ruffini, R.; Rodriguez, J.; Muccino, M.; Rueda, J. A.; Aimuratov, Y.; Barres de Almeida, U.; Becerra, L.; Bianco, C. L.; Cherubini, C.; Filippi, S.; Gizzi, D.; Kovacevic, M.; Moradi, R.; Oliveira, F. G.; Pisani, G. B.; Wang, Y.
2018-05-01
On the ground of the large number of gamma-ray bursts (GRBs) detected with cosmological redshift, we classified GRBs in seven subclasses, all with binary progenitors which emit gravitational waves (GWs). Each binary is composed of combinations of carbon–oxygen cores (COcore), neutron stars (NSs), black holes (BHs), and white dwarfs (WDs). The long bursts, traditionally assumed to originate from a BH with an ultrarelativistic jetted emission, not emitting GWs, have been subclassified as (I) X-ray flashes (XRFs), (II) binary-driven hypernovae (BdHNe), and (III) BH–supernovae (BH–SNe). They are framed within the induced gravitational collapse paradigm with a progenitor COcore–NS/BH binary. The SN explosion of the COcore triggers an accretion process onto the NS/BH. If the accretion does not lead the NS to its critical mass, an XRF occurs, while when the BH is present or formed by accretion, a BdHN occurs. When the binaries are not disrupted, XRFs lead to NS–NS and BdHNe lead to NS–BH. The short bursts, originating in NS–NS, are subclassified as (IV) short gamma-ray flashes (S-GRFs) and (V) short GRBs (S-GRBs), the latter when a BH is formed. There are (VI) ultrashort GRBs (U-GRBs) and (VII) gamma-ray flashes (GRFs) formed in NS–BH and NS–WD, respectively. We use the occurrence rate and GW emission of these subclasses to assess their detectability by Advanced LIGO-Virgo, eLISA, and resonant bars. We discuss the consequences of our results in view of the announcement of the LIGO/Virgo Collaboration of the source GW 170817 as being originated by an NS–NS.
Hussain, Shaista; Basu, Arindam
2016-01-01
The development of power-efficient neuromorphic devices presents the challenge of designing spike pattern classification algorithms which can be implemented on low-precision hardware and can also achieve state-of-the-art performance. In our pursuit of meeting this challenge, we present a pattern classification model which uses a sparse connection matrix and exploits the mechanism of nonlinear dendritic processing to achieve high classification accuracy. A rate-based structural learning rule for multiclass classification is proposed which modifies a connectivity matrix of binary synaptic connections by choosing the best “k” out of “d” inputs to make connections on every dendritic branch (k << d). Because learning only modifies connectivity, the model is well suited for implementation in neuromorphic systems using address-event representation (AER). We develop an ensemble method which combines several dendritic classifiers to achieve enhanced generalization over individual classifiers. We have two major findings: (1) Our results demonstrate that an ensemble created with classifiers comprising a moderate number of dendrites performs better than both ensembles of perceptrons and of complex dendritic trees. (2) In order to determine the moderate number of dendrites required for a specific classification problem, a two-step solution is proposed. First, an adaptive approach is proposed which scales the relative size of the dendritic trees of neurons for each class. It works by progressively adding dendrites with a fixed number of synapses to the network, thereby allocating synaptic resources as per the complexity of the given problem. As a second step, theoretical capacity calculations are used to convert each neuronal dendritic tree to its optimal topology where dendrites of each class are assigned different numbers of synapses.
The performance of the model is evaluated on classification of handwritten digits from the benchmark MNIST dataset and compared with other spike classifiers. We show that our system can achieve classification accuracy within 1–2% of other reported spike-based classifiers while using far fewer synaptic resources (only 7%) than other methods. Further, an ensemble classifier created with adaptively learned sizes can attain accuracy of 96.4%, which is on par with the best reported performance of spike-based classifiers. Moreover, the proposed method achieves this by using about 20% of the synapses used by other spike algorithms. We also present results of applying our algorithm to classify the MNIST-DVS dataset collected from a real spike-based image sensor and show results comparable to the best reported ones (88.1% accuracy). For VLSI implementations, we show that the reduced synaptic memory can save up to 4X area compared to conventional crossbar topologies. Finally, we also present a biologically realistic spike-based version for calculating the correlations required by the structural learning rule and demonstrate the correspondence between the rate-based and spike-based methods of learning. PMID:27065782
Mining sequential patterns for protein fold recognition.
Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I
2008-02-01
Protein data contain discriminative patterns that can be used in many beneficial applications if they are defined correctly. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. Protein classification in terms of fold recognition plays an important role in computational protein analysis, since it can contribute to the determination of the function of a protein whose structure is unknown. Specifically, one of the most efficient SPM algorithms, cSPADE, is employed for the analysis of protein sequence. A classifier uses the extracted sequential patterns to classify proteins in the appropriate fold category. For training and evaluating the proposed method we used the protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method exhibited an overall accuracy of 25% in a classification problem with 36 candidate categories. The classification performance reaches up to 56% when the five most probable protein folds are considered.
Application of machine learning on brain cancer multiclass classification
NASA Astrophysics Data System (ADS)
Panca, V.; Rustam, Z.
2017-07-01
Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. The application of machine learning on microarray gene expression datasets mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, which is improved to solve multiclass classification and is called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While ordinary SVM finds a single optimum hyperplane, the main objective of Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
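The step of transforming a multiclass problem into several binary problems can be sketched as follows. The abstract does not say which decomposition was used, so this assumes the common one-vs-rest scheme; the function names and the example labels are hypothetical.

```python
def one_vs_rest_labels(labels):
    """Build one binary labelling per class: +1 for that class, -1 for the rest.
    (One-vs-rest is an assumed decomposition, not necessarily the paper's.)"""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}

def predict_multiclass(decision_values):
    """Combine binary classifiers by picking the class whose classifier
    returned the largest decision value for the sample."""
    return max(decision_values, key=decision_values.get)

# Hypothetical sample labels; only "normal" and "MD" are named in the abstract.
labels = ["normal", "MD", "other", "MD"]
binary_problems = one_vs_rest_labels(labels)
```

Each entry of `binary_problems` would then be handed to its own binary learner (an SVM or Twin SVM in the setting of the abstract), and `predict_multiclass` merges their outputs for a new sample.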
Koua, Dominique; Kuhn-Nentwig, Lucia
2017-01-01
Spider venoms are rich cocktails of bioactive peptides, proteins, and enzymes that have been intensively investigated over the years. In order to provide a better comprehension of that richness, we propose a three-level family classification system for spider venom components. This classification is supported by an exhaustive set of 219 new profile hidden Markov models (HMMs) able to attribute a given peptide to its precise peptide type, family, and group. The proposed classification has the advantages of being totally independent of variable spider taxonomic names and of being able to evolve easily. In addition to the new classifiers, we introduce and demonstrate the efficiency of hmmcompete, a new standalone tool that monitors HMM-based family classification and, after post-processing the result, reports the best classifier when multiple models produce significant scores towards given peptide queries. The combined use of hmmcompete and the new spider venom component-specific classifiers demonstrated 96% sensitivity to properly classify all known spider toxins from the UniProtKB database. These tools are timely regarding the important classification needs caused by the increasing number of peptides and proteins generated by transcriptomic projects. PMID:28786958
2012-01-01
Background Automated classification of histopathology involves identification of multiple classes, including benign, cancerous, and confounder categories. The confounder tissue classes can often mimic and share attributes with both the diseased and normal tissue classes, and can be particularly difficult to identify, both manually and by automated classifiers. In the case of prostate cancer, there may be several confounding tissue types present in a biopsy sample, posing as major sources of diagnostic error for pathologists. Two common multi-class approaches are one-shot classification (OSC), where all classes are identified simultaneously, and one-versus-all (OVA), where a “target” class is distinguished from all “non-target” classes. OSC is typically unable to handle discrimination of classes of varying similarity (e.g. with images of prostate atrophy and high grade cancer), while OVA forces several heterogeneous classes into a single “non-target” class. In this work, we present a cascaded (CAS) approach to classifying prostate biopsy tissue samples, where images from different classes are grouped to maximize intra-group homogeneity while maximizing inter-group heterogeneity. Results We apply the CAS approach to categorize 2000 tissue samples taken from 214 patient studies into seven classes: epithelium, stroma, atrophy, prostatic intraepithelial neoplasia (PIN), and prostate cancer Gleason grades 3, 4, and 5. A series of increasingly granular binary classifiers is used to split the different tissue classes until the images have been categorized into a single unique class. Our automatically extracted image feature set includes architectural features based on location of the nuclei within the tissue sample as well as texture features extracted on a per-pixel level. The CAS strategy yields a positive predictive value (PPV) of 0.86 in classifying the 2000 tissue images into one of 7 classes, compared with the OVA (0.77 PPV) and OSC approaches (0.76 PPV).
Conclusions Use of the CAS strategy increases the PPV for a multi-category classification system over two common alternative strategies. In classification problems such as histopathology, where multiple class groups exist with varying degrees of heterogeneity, the CAS system can intelligently assign class labels to objects by performing multiple binary classifications according to domain knowledge. PMID:23110677
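The idea of a cascade of increasingly granular binary classifiers can be sketched as a small decision structure in Python. This is only a toy illustration: the grouping order, the feature names (`nuclear_density`, `gland_disorder`, `texture_energy`) and the thresholds are hypothetical and are not taken from the paper, which trains its binary classifiers on architectural and texture features.

```python
class Node:
    """One stage of a cascade: an internal node holds a binary test,
    a leaf holds a final class label."""
    def __init__(self, test=None, if_true=None, if_false=None, label=None):
        self.test = test
        self.if_true = if_true
        self.if_false = if_false
        self.label = label

def classify(node, features):
    """Walk the cascade, applying one binary decision per stage,
    until a leaf with a single class label is reached."""
    while node.label is None:
        node = node.if_true if node.test(features) else node.if_false
    return node.label

# Toy cascade with hypothetical features and thresholds.
cascade = Node(
    test=lambda f: f["nuclear_density"] > 0.5,      # cancer-like group vs. the rest
    if_true=Node(
        test=lambda f: f["gland_disorder"] > 0.7,   # finer split inside the group
        if_true=Node(label="Gleason 4"),
        if_false=Node(label="Gleason 3"),
    ),
    if_false=Node(
        test=lambda f: f["texture_energy"] > 0.3,
        if_true=Node(label="epithelium"),
        if_false=Node(label="stroma"),
    ),
)
```

Unlike one-versus-all, each stage here only has to separate two groups that were chosen to be internally similar, which is the property the abstract credits for the higher PPV.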
Ghosh, Soumadwip; Dey, Souvik; Patel, Mahendra; Chakrabarti, Rajarshi
2017-03-15
The folding/unfolding equilibrium of proteins in aqueous medium can be altered by adding small organic molecules generally termed as co-solvents. Denaturants such as urea are instrumental in the unfolding of proteins while protecting osmolytes favour the folded ensemble. Recently, room temperature ionic liquids (ILs) have been shown to counteract the deleterious effect of urea on proteins. In this paper, using atomistic molecular dynamics we show that a ternary mixture containing a particular ammonium-based IL, triethylammonium acetate (TEAA), and urea (in 1 : 5 molar ratio) helps a small 15-residue S-peptide analogue regain most of its native structure, whereas a binary aqueous mixture containing a large amount of urea alone completely distorts it. Our simulations show that the denaturant urea directly interacts with the peptide backbone in the binary mixture while for the ternary mixture both urea as well as the IL are preferentially excluded from the peptide surface.
Field, Nicholas; Konstantinidis, Spyridon; Velayudhan, Ajoy
2017-08-11
The combination of multi-well plates and automated liquid handling is well suited to the rapid measurement of the adsorption isotherms of proteins. Here, single and binary adsorption isotherms are reported for BSA, ovalbumin and conalbumin on a strong anion exchanger over a range of pH and salt levels. The impact of the main experimental factors at play on the accuracy and precision of the adsorbed protein concentrations is quantified theoretically and experimentally. In addition to the standard measurement of liquid concentrations before and after adsorption, the amounts eluted from the wells are measured directly. This additional measurement corroborates the calculation based on liquid concentration data, and improves precision especially under conditions of weak or moderate interaction strength. The traditional measurement of multicomponent isotherms is limited by the speed of HPLC analysis; this analytical bottleneck is alleviated by careful multivariate analysis of UV spectra. Copyright © 2017. Published by Elsevier B.V.
Protein classification based on text document classification techniques.
Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith
2005-03-01
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
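Two quantities in this abstract are easy to make concrete: the n-gram counts used as features and the reduction in residual error quoted for Naive Bayes relative to SVM. The sketch below uses hypothetical function names and a made-up peptide string; the accuracy figures are the ones reported in the abstract.

```python
from collections import Counter

def ngram_counts(sequence, n):
    """Count every overlapping substring of length n in a protein sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))

def residual_error_reduction(acc_new, acc_old):
    """Percentage of the old method's error (100 - acc_old) removed by the new one."""
    return 100.0 * (acc_new - acc_old) / (100.0 - acc_old)

# Made-up peptide, for illustration only.
counts = ngram_counts("MKMKL", 2)

# Accuracies reported in the abstract (Naive Bayes vs. SVM).
level_1 = round(residual_error_reduction(93.0, 88.4), 1)
level_2 = round(residual_error_reduction(92.4, 86.3), 1)
```

The two rounded values reproduce the 39.7 and 44.5% figures stated in the abstract, which confirms that "residual error" there means the error rate remaining after the SVM baseline.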
Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins.
Kolker, Natali; Higdon, Roger; Broomall, William; Stanberry, Larissa; Welch, Dean; Lu, Wei; Haynes, Winston; Barga, Roger; Kolker, Eugene
2011-01-01
To address the monumental challenge of assigning function to millions of sequenced proteins, we completed the first-of-a-kind all-versus-all sequence alignment using BLAST for 9.9 million proteins in the UniRef100 database. Microsoft Windows Azure produced over 3 billion filtered records in 6 days using 475 eight-core virtual machines. Protein classification into functional groups was then performed using Hive and custom jars implemented on top of Apache Hadoop utilizing the MapReduce paradigm. First, using the Clusters of Orthologous Genes (COG) database, a length normalized bit score (LNBS) was determined to be the best similarity measure for classification of proteins. LNBS achieved sensitivity and specificity of 98% each. Second, out of 5.1 million bacterial proteins, about two-thirds were assigned to significantly extended COG groups, encompassing 30 times more assigned proteins. Third, the remaining proteins were classified into protein functional groups using an innovative implementation of a single-linkage algorithm on an in-house Hadoop compute cluster. This implementation significantly reduces the run time for nonindexed queries and optimizes efficient clustering on a large scale. The performance was also verified on Amazon Elastic MapReduce. This clustering assigned nearly 2 million proteins to approximately half a million different functional groups. A similar approach was applied to classify 2.8 million eukaryotic sequences, resulting in over 1 million proteins being assigned to existing KOG groups and the remainder clustered into 100,000 functional groups.
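Single-linkage clustering at a fixed similarity cutoff amounts to finding connected components of the graph whose edges are the pairs scoring at or above the cutoff. The following single-machine sketch uses a union-find structure; it is not the Hadoop implementation described in the abstract, and the function name, protein identifiers, scores and threshold are all hypothetical.

```python
def single_linkage_groups(items, scored_pairs, threshold):
    """Group items so that any two items joined by a chain of pairs with
    score >= threshold end up in the same group (single linkage at a cutoff)."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical proteins and similarity scores.
proteins = ["p1", "p2", "p3", "p4", "p5"]
pairs = [("p1", "p2", 0.9), ("p2", "p3", 0.8), ("p4", "p5", 0.3)]
groups = single_linkage_groups(proteins, pairs, threshold=0.5)
```

Here `p1` and `p3` land in the same group even though they were never compared directly, which is the chaining behaviour that distinguishes single linkage from stricter linkage rules.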
K2 Variable Catalogue: Variable stars and eclipsing binaries in K2 campaigns 1 and 0
NASA Astrophysics Data System (ADS)
Armstrong, D. J.; Kirk, J.; Lam, K. W. F.; McCormac, J.; Walker, S. R.; Brown, D. J. A.; Osborn, H. P.; Pollacco, D. L.; Spake, J.
2015-07-01
Aims: We have created a catalogue of variable stars found from a search of the publicly available K2 mission data from Campaigns 1 and 0. This catalogue provides the identifiers of 8395 variable stars, including 199 candidate eclipsing binaries with periods up to 60 d and 3871 periodic or quasi-periodic objects, with periods up to 20 d for Campaign 1 and 15 d for Campaign 0. Methods: Lightcurves are extracted and detrended from the available data. These are searched using a combination of algorithmic and human classification, leading to a classification of each object as an eclipsing binary, sinusoidal periodic, quasi-periodic, or aperiodic variable. The source of the variability is not identified, but could arise in the non-eclipsing binary cases from pulsation or stellar activity. Each object is cross-matched against variable star related guest observer proposals to the K2 mission, which specify the variable type in some cases. The detrended lightcurves are also compared to lightcurves currently publicly available. Results: The resulting catalogue gives the ID, type, period, semi-amplitude, and range of the variation seen. We also make available the detrended lightcurves for each object. The catalogue is available at http://deneb.astro.warwick.ac.uk/phrlbj/k2varcat/ and at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/579/A19
Engraving Print Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoelck, Daniel; Barbe, Joaquim
2008-04-15
A print is a mark, or drawing, made in or upon a plate, stone, woodblock or other material which is covered with ink and then pressed, usually onto paper, reproducing the image on the paper. Engraving prints are usually images composed of groups of binary lines, especially those made with relief and intaglio techniques. The drawing of the engraving print is formed by varying the number and the orientation of the lines. For this reason we propose an application based on image processing methods to classify engraving prints.
Classification of Automated Search Traffic
NASA Astrophysics Data System (ADS)
Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.
As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analyses are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.
Dong, Wei-Feng; Canil, Sarah; Lai, Raymond; Morel, Didier; Swanson, Paul E.; Izevbaye, Iyare
2018-01-01
A new automated MYC IHC classifier based on bivariate logistic regression is presented. The predictor relies on image analysis developed with the open-source ImageJ platform. From a histologic section immunostained for MYC protein, 2 dimensionless quantitative variables are extracted: (a) relative distance between nuclei positive for MYC IHC based on Euclidean minimum spanning tree graph and (b) coefficient of variation of the MYC IHC stain intensity among MYC IHC-positive nuclei. Distance between positive nuclei is suggested to correlate inversely with MYC gene rearrangement status, whereas coefficient of variation is suggested to correlate inversely with physiological regulation of MYC protein expression. The bivariate classifier was compared with 2 other MYC IHC classifiers (based on percentage of MYC IHC positive nuclei), all tested on 113 lymphomas including mostly diffuse large B-cell lymphomas with known MYC fluorescent in situ hybridization (FISH) status. The bivariate classifier strongly outperformed the “percentage of MYC IHC-positive nuclei” methods to predict MYC+ FISH status with 100% sensitivity (95% confidence interval, 94-100) associated with 80% specificity. The test is rapidly performed and might at a minimum provide primary IHC screening for MYC gene rearrangement status in diffuse large B-cell lymphomas. Furthermore, as this bivariate classifier actually predicts “permanent overexpressed MYC protein status,” it might identify nontranslocation-related chromosomal anomalies missed by FISH. PMID:27093450
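The two variables named in this abstract can be illustrated with a minimal Python sketch. It assumes nuclei are given as 2-D centroid coordinates, uses the mean edge length of the Euclidean minimum spanning tree as a stand-in for the "relative distance" (the abstract does not say how that distance is made dimensionless), and takes hypothetical coefficients b0, b1, b2 for the logistic model; none of these details come from the source.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (dimensionless)."""
    return statistics.pstdev(values) / statistics.mean(values)


def mst_mean_edge_length(points):
    """Mean edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total / (n - 1)


def logistic_score(x1, x2, b0, b1, b2):
    """Bivariate logistic regression output; b0, b1, b2 are hypothetical."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
```

In practice the coefficients would be fitted on cases with known FISH status; the values passed here are placeholders only.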
USDA-ARS's Scientific Manuscript database
MOCASSIN-prot is a software, implemented in Perl and Matlab, for constructing protein similarity networks to classify proteins. Both domain composition and quantitative sequence similarity information are utilized in constructing the directed protein similarity networks. For each reference protein i...
The Kirkwood-Buff theory of solutions and the local composition of liquid mixtures.
Shulgin, Ivan L; Ruckenstein, Eli
2006-06-29
The present paper is devoted to the local composition of liquid mixtures calculated in the framework of the Kirkwood-Buff theory of solutions. A new method is suggested to calculate the excess (or deficit) number of various molecules around a selected (central) molecule in binary and multicomponent liquid mixtures in terms of measurable macroscopic thermodynamic quantities, such as the derivatives of the chemical potentials with respect to concentrations, the isothermal compressibility, and the partial molar volumes. This method accounts for an inaccessible volume due to the presence of a central molecule and is applied to binary and ternary mixtures. For the ideal binary mixture it is shown that because of the difference in the volumes of the pure components there is an excess (or deficit) number of different molecules around a central molecule. The excess (or deficit) becomes zero when the components of the ideal binary mixture have the same volume. The new method is also applied to methanol + water and 2-propanol + water mixtures. In the case of the 2-propanol + water mixture, the new method, in contrast to the other ones, indicates that clusters dominated by 2-propanol disappear at high alcohol mole fractions, in agreement with experimental observations. Finally, it is shown that the application of the new procedure to the ternary mixture water/protein/cosolvent at infinite dilution of the protein led to almost the same results as the methods involving a reference state.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hinerman, Jennifer M.; Dignam, J. David; Mueser, Timothy C.
2012-04-05
The bacteriophage T4 gp59 helicase assembly protein (gp59) is required for loading of gp41 replicative helicase onto DNA protected by gp32 single-stranded DNA-binding protein. The gp59 protein recognizes branched DNA structures found at replication and recombination sites. Binding of gp32 protein (full-length and deletion constructs) to gp59 protein measured by isothermal titration calorimetry demonstrates that the gp32 protein C-terminal A-domain is essential for protein-protein interaction in the absence of DNA. Sedimentation velocity experiments with gp59 protein and gp32ΔB protein (an N-terminal B-domain deletion) show that these proteins are monomers but form a 1:1 complex with a dissociation constant comparable with that determined by isothermal titration calorimetry. Small angle x-ray scattering (SAXS) studies indicate that the gp59 protein is a prolate monomer, consistent with the crystal structure and hydrodynamic properties determined from sedimentation velocity experiments. SAXS experiments also demonstrate that gp32ΔB protein is a prolate monomer with an elongated A-domain protruding from the core. Moreover, fitting structures of gp59 protein and the gp32 core into the SAXS-derived molecular envelope supports a model for the gp59 protein-gp32ΔB protein complex. Our earlier work demonstrated that gp59 protein attracts full-length gp32 protein to pseudo-Y junctions. A model of the gp59 protein-DNA complex, modified to accommodate new SAXS data for the binary complex together with mutational analysis of gp59 protein, is presented in the accompanying article (Dolezal, D., Jones, C. E., Lai, X., Brister, J. R., Mueser, T. C., Nossal, N. G., and Hinton, D. M. (2012) J. Biol. Chem. 287, 18596–18607).
Geomorphic Flood Area (GFA): a QGIS tool for a cost-effective delineation of the floodplains
NASA Astrophysics Data System (ADS)
Samela, Caterina; Albano, Raffaele; Sole, Aurelia; Manfreda, Salvatore
2017-04-01
The importance of delineating flood hazard and risk areas at a global scale has been highlighted for many years. However, its complete achievement regularly encounters practical difficulties, above all the lack of data and implementation costs. In conditions of scarce data availability (e.g. ungauged basins, large-scale analyses), a fast and cost-effective floodplain delineation can be carried out using geomorphic methods (e.g., Manfreda et al., 2011; 2014). In particular, an automatic DEM-based procedure has been implemented in an open-source QGIS plugin named Geomorphic Flood Area - tool (GFA - tool). This tool performs a linear binary classification based on the recently proposed Geomorphic Flood Index (GFI), which exhibited high classification accuracy and reliability in several test sites located in Europe, United States and Africa (Manfreda et al., 2015; Samela et al., 2016, 2017; Samela, 2016). The GFA - tool is designed to make the proposed procedure, which includes a number of operations requiring good geomorphic and GIS competences, available to all users. It allows computing the GFI through terrain analysis, turning it into a binary classifier, and training it on the basis of a standard inundation map derived for a portion of the river basin (a minimum of 2% of the river basin's area is suggested) using detailed methods of analysis (e.g. flood hazard maps produced by emergency management agencies or river basin authorities). Finally, the GFA - tool allows the classification to be extended outside the calibration area to delineate the flood-prone areas across the entire river basin. The full analysis has been implemented in this plugin with a user-friendly interface that should make it easy for all users to apply the approach and produce the desired results. Keywords: flood susceptibility; data scarce environments; geomorphic flood index; linear binary classification; Digital elevation models (DEMs). References Manfreda, S., Di Leo, M., Sole, A., (2011). 
Detection of Flood Prone Areas using Digital Elevation Models, Journal of Hydrologic Engineering, 16(10), 781-790. Manfreda, S., Nardi, F., Samela, C., Grimaldi, S., Taramasso, A. C., Roth, G., & Sole, A. (2014). Investigation on the Use of Geomorphic Approaches for the Delineation of Flood Prone Areas, Journal of Hydrology, 517, 863-876. Manfreda, S., Samela, C., Gioia, A., Consoli, G., Iacobellis, V., Giuzio, L., & Sole, A. (2015). Flood-prone areas assessment using linear binary classifiers based on flood maps obtained from 1D and 2D hydraulic models. Natural Hazards, Vol. 79 (2), pp 735-754. Samela, C. (2016), 100-year flood susceptibility maps for the continental U.S. derived with a geomorphic method. University of Basilicata. Dataset. Samela, C., Manfreda, S., Paola, F. D., Giugni, M., Sole, A., & Fiorentino, M. (2016). DEM-Based Approaches for the Delineation of Flood-Prone Areas in an Ungauged Basin in Africa. Journal of Hydrologic Engineering, 21(2), 1-10. Samela, C., Troy, T.J., Manfreda, S. (2017). Geomorphic classifiers for flood-prone areas delineation for data-scarce environments, Advances in Water Resources (under review).
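The calibration step described in this abstract, turning a continuous index into a binary classifier by training against a reference inundation map, can be sketched in Python as follows. This is an illustrative sketch, not the plugin's code: it assumes that higher index values indicate more flood-prone cells and that the threshold is chosen to minimise the sum of the false positive and false negative rates; the function names are hypothetical.

```python
def calibrate_threshold(index_values, flooded):
    """Return (threshold, cost) minimising false positive rate + false negative rate.

    index_values: index value per cell in the calibration area
    flooded:      True where the reference inundation map marks the cell as flooded
    """
    positives = sum(flooded)
    negatives = len(flooded) - positives
    best_tau, best_cost = None, float("inf")
    for tau in sorted(set(index_values)):  # every observed value is a candidate
        fp = sum(1 for x, y in zip(index_values, flooded) if x >= tau and not y)
        fn = sum(1 for x, y in zip(index_values, flooded) if x < tau and y)
        cost = fp / negatives + fn / positives
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau, best_cost


def classify(index_values, tau):
    """Label cells outside the calibration area with the calibrated threshold."""
    return [x >= tau for x in index_values]
```

A typical use would call calibrate_threshold on the small calibrated portion of the basin and then classify on the index raster of the whole basin.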
Alves, Cibele C O; Franca, Adriana S; Oliveira, Leandro S
2013-01-01
Adsorption of phenolic amino acids, such as phenylalanine and tyrosine, is quite relevant for the production of protein hydrolysates used as dietary formulations for patients suffering from congenital disorders of amino acid metabolism, such as phenylketonuria. In this study, an adsorbent prepared from corn cobs was evaluated for the removal of tyrosine (Tyr) from both a single component solution and a binary aqueous solution with phenylalanine (Phe). The adsorption behavior of tyrosine was similar to that of phenylalanine in single component solutions, but with a much lower adsorption capacity (14 mg g(-1) for Tyr compared to 109 mg g(-1) for Phe). Tyr adsorption kinetics was satisfactorily described by a pseudo-second-order model, as it was for Phe. In adsorption equilibrium studies for binary mixtures, the presence of Tyr in Phe solutions favored faster adsorption of Phe, whereas the opposite behavior was observed for the presence of Phe in Tyr solutions. Such results indicate that, in binary systems, Phe will be adsorbed in preference to Tyr, and this is a welcome feature when employing the prepared adsorbent for the removal of Phe from protein hydrolysates to be used in dietary formulations for phenylketonuria treatment.
Saini, Harsh; Lal, Sunil Pranit; Naidu, Vimal Vikash; Pickering, Vincel Wince; Singh, Gurmeet; Tsunoda, Tatsuhiko; Sharma, Alok
2016-12-05
High dimensional feature space generally degrades classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Gene masking is implemented via a binary encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to discriminate between features that contribute most to the classification, thereby, allowing researchers to isolate features that may have special significance. This technique was applied on publicly available datasets whereby it substantially reduced the number of features used for classification while maintaining high accuracies. The proposed technique can be extremely useful in feature selection as it heuristically removes non-contributing features to improve the performance of classifiers.
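A binary-encoded genetic algorithm of the kind described here can be sketched in a few lines of Python. The sketch below is only an illustration under assumptions: the operators (elitism, size-2 tournament selection, one-point crossover, bit-flip mutation) and all parameter values are generic choices rather than the authors' settings, and the fitness callable is a placeholder for the accuracy a classifier would reach when trained on the unmasked features.

```python
import random


def gene_mask_search(n_features, fitness, pop_size=20, generations=30,
                     mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximises fitness(mask).

    Requires n_features >= 2. Returns (best_mask, history), where history holds
    the best fitness seen at the start of each generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        next_pop = [scored[0][:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(scored, 2), key=fitness)
            b = max(rng.sample(scored, 2), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation toggles a feature in or out of the mask.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        population = next_pop
    return max(population, key=fitness), history
```

Because the best mask is always carried over, the recorded best fitness never decreases from one generation to the next.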
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification
Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong
2016-01-01
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs). PMID:26985826
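The way a directed acyclic graph turns pairwise binary classifiers into a multi-class decision can be sketched as below. This is a generic elimination scheme written for illustration, not the authors' implementation: the binary classifiers here are toy nearest-centroid rules standing in for trained LDMs, and the class names and centroids are invented.

```python
def dag_predict(x, classes, pairwise):
    """Classify x by walking a decision DAG of pairwise binary classifiers.

    pairwise[(a, b)] is a callable returning a or b for input x, defined for
    every pair with a listed before b in classes. Each comparison removes one
    candidate, so k classes need k - 1 binary evaluations.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        if pairwise[(a, b)](x) == a:
            candidates.pop()    # b is eliminated
        else:
            candidates.pop(0)   # a is eliminated
    return candidates[0]


def make_centroid_rule(a, b, centroids):
    """Toy binary classifier: pick whichever class centroid is closer to x."""
    def rule(x):
        return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b
    return rule


# Illustrative one-dimensional setup with made-up classes and centroids.
CENTROIDS = {"note": 0.0, "rest": 5.0, "clef": 10.0}
CLASSES = list(CENTROIDS)
PAIRWISE = {(a, b): make_centroid_rule(a, b, CENTROIDS)
            for i, a in enumerate(CLASSES) for b in CLASSES[i + 1:]}
```

The design trade-off is that only k - 1 of the k(k - 1)/2 trained binary classifiers are evaluated per sample.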
Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions
Roy, Sushmita; Martinez, Diego; Platero, Harriett; Lane, Terran; Werner-Washburne, Margaret
2009-01-01
Background Computational prediction of protein interactions typically uses protein domains as classifier features because they capture conserved information of interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) for protein interaction prediction. This simple feature, which is based on normalized counts of single amino acids or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. Results AAC performed on par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. Conclusion AAC is a simple, yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new and validate existing interactions. More importantly, AAC alone performs on par with existing, but more complex, features, indicating the presence of sequence-level information that is predictive of interaction, but which is not necessarily restricted to domains. PMID:19936254
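The amino acid composition feature lends itself to a short Python sketch. It rests on assumptions the abstract does not settle: "pairs" are taken to be adjacent residues (dipeptides), residues outside the 20 standard letters are ignored, and the feature vector for a protein pair is formed by simple concatenation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def single_composition(seq):
    """20 normalised counts, one per standard amino acid."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for aa in seq:
        if aa in counts:
            counts[aa] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_composition(seq):
    """400 normalised counts of adjacent residue pairs (assumed meaning of 'pairs')."""
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[p] / total for p in PAIRS]


def interaction_features(seq_a, seq_b):
    """Feature vector for a candidate protein pair (concatenation is an assumption)."""
    return single_composition(seq_a) + single_composition(seq_b)
```

The resulting vectors can be passed to any standard classifier, which matches the abstract's observation that results depended on the features rather than on the classifier.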
Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng
2015-01-09
The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Finely-set small-scale experiments are not only very expensive but also inefficient to identify numerous interactomes despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized-recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data into an interactome weight matrix, where the feature-vectors of involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among involved proteins, for taking the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly.
Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng
2015-01-01
The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Finely-set small-scale experiments are not only very expensive but also inefficient to identify numerous interactomes despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized-recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data into an interactome weight matrix, where the feature-vectors of involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among involved proteins, for taking the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly. PMID:25572661
Neural-network classifiers for automatic real-world aerial image recognition
NASA Astrophysics Data System (ADS)
Greenberg, Shlomo; Guterman, Hugo
1996-08-01
We describe the application of the multilayer perceptron (MLP) network and a version of the adaptive resonance theory version 2-A (ART 2-A) network to the problem of automatic aerial image recognition (AAIR). The classification of aerial images, independent of their positions and orientations, is required for automatic tracking and target recognition. Invariance is achieved by the use of different invariant feature spaces in combination with supervised and unsupervised neural networks. The performance of neural-network-based classifiers in conjunction with several types of invariant AAIR global features, such as the Fourier-transform space, Zernike moments, central moments, and polar transforms, is examined. The advantages of this approach are discussed. The performance of the MLP network is compared with that of a classical correlator. The MLP neural-network correlator outperformed the binary phase-only filter (BPOF) correlator. It was found that the ART 2-A distinguished itself with its speed and its low number of required training vectors. However, only the MLP classifier was able to deal with a combination of shift and rotation geometric distortions.
Designing boosting ensemble of relational fuzzy systems.
Scherer, Rafał
2010-10-01
A method frequently used in classification systems for improving classification accuracy is to combine the outputs of several classifiers. Among various types of classifiers, fuzzy ones are tempting because they use intelligible fuzzy if-then rules. In the paper we build an AdaBoost ensemble of relational neuro-fuzzy classifiers. Relational fuzzy systems bond input and output fuzzy linguistic values by a binary relation; thus, compared with traditional fuzzy systems, fuzzy rules have additional weights, namely the elements of a fuzzy relation matrix. Thanks to this, the system can be better adjusted to data during learning. In the paper an ensemble of relational fuzzy systems is proposed. The problem is that such an ensemble contains separate rule bases which cannot be directly merged. As the systems are separate, we cannot treat fuzzy rules coming from different systems as rules from the same (single) system. In the paper, the problem is addressed by a novel design of fuzzy systems constituting the ensemble, resulting in normalization of individual rule bases during learning. The method described in the paper is tested on several known benchmarks and compared with other machine learning solutions from the literature.
Novel Approach to Classify Plants Based on Metabolite-Content Similarity.
Liu, Kang; Abdullah, Azian Azamimi; Huang, Ming; Nishioka, Takaaki; Altaf-Ul-Amin, Md; Kanaya, Shigehiko
2017-01-01
Secondary metabolites are bioactive substances with diverse chemical structures. Depending on the ecological environment within which they are living, higher plants use different combinations of secondary metabolites for adaptation (e.g., defense against attacks by herbivores or pathogenic microbes). This suggests that the similarity in metabolite content is applicable to assess phylogenic similarity of higher plants. However, such a chemical taxonomic approach has limitations of incomplete metabolomics data. We propose an approach for successfully classifying 216 plants based on their known incomplete metabolite content. Structurally similar metabolites have been clustered using the network clustering algorithm DPClus. Plants have been represented as binary vectors, implying relations with structurally similar metabolite groups, and classified using Ward's method of hierarchical clustering. Despite incomplete data, the resulting plant clusters are consistent with the known evolutional relations of plants. This finding reveals the significance of metabolite content as a taxonomic marker. We also discuss the predictive power of metabolite content in exploring nutritional and medicinal properties in plants. As a byproduct of our analysis, we could predict some currently unknown species-metabolite relations.
Novel Approach to Classify Plants Based on Metabolite-Content Similarity
Abdullah, Azian Azamimi; Huang, Ming; Nishioka, Takaaki
2017-01-01
Secondary metabolites are bioactive substances with diverse chemical structures. Depending on the ecological environment within which they are living, higher plants use different combinations of secondary metabolites for adaptation (e.g., defense against attacks by herbivores or pathogenic microbes). This suggests that the similarity in metabolite content is applicable to assess phylogenic similarity of higher plants. However, such a chemical taxonomic approach has limitations of incomplete metabolomics data. We propose an approach for successfully classifying 216 plants based on their known incomplete metabolite content. Structurally similar metabolites have been clustered using the network clustering algorithm DPClus. Plants have been represented as binary vectors, implying relations with structurally similar metabolite groups, and classified using Ward's method of hierarchical clustering. Despite incomplete data, the resulting plant clusters are consistent with the known evolutional relations of plants. This finding reveals the significance of metabolite content as a taxonomic marker. We also discuss the predictive power of metabolite content in exploring nutritional and medicinal properties in plants. As a byproduct of our analysis, we could predict some currently unknown species-metabolite relations. PMID:28164123
The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition
NASA Astrophysics Data System (ADS)
Štambuk, Nikola
The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.
Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery.
Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S; Pusey, Marc L; Aygün, Ramazan S
2014-03-01
In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of a semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimal optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high-confidence predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, the random forest classifier yields the best accuracy with supervised learning for our dataset.
Le, Nguyen-Quoc-Khanh; Ho, Quang-Thai; Ou, Yu-Yen
2018-06-13
Deep learning has been increasingly used to solve a number of problems with state-of-the-art performance in a wide variety of fields. In biology, deep learning can be applied to reduce feature extraction time and achieve high levels of performance. In our present work, we apply deep learning via two-dimensional convolutional neural networks and position-specific scoring matrices to classify Rab protein molecules, which are main regulators in membrane trafficking for transferring proteins and other macromolecules throughout the cell. The functional loss of specific Rab molecular functions has been implicated in a variety of human diseases, e.g., choroideremia, intellectual disabilities, cancer. Therefore, creating a precise model for classifying Rabs is crucial in helping biologists understand the molecular functions of Rabs and design drug targets according to such specific human disease information. We constructed a robust deep neural network for classifying Rabs that achieved an accuracy of 99%, 99.5%, 96.3%, and 97.6% for each of four specific molecular functions. Our approach demonstrates superior performance to traditional artificial neural networks. Therefore, from our proposed study, we provide both an effective tool for classifying Rab proteins and a basis for further research that can improve the performance of biological modeling using deep neural networks. Copyright © 2018 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Schudlo, Larissa C.; Chau, Tom
2015-12-01
Objective. The majority of near-infrared spectroscopy (NIRS) brain-computer interface (BCI) studies have investigated binary classification problems. Limited work has considered differentiation of more than two mental states, or multi-class differentiation of higher-level cognitive tasks using measurements outside of the anterior prefrontal cortex. Improvements in accuracies are needed to deliver effective communication with a multi-class NIRS system. We investigated the feasibility of a ternary NIRS-BCI that supports mental states corresponding to verbal fluency task (VFT) performance, Stroop task performance, and unconstrained rest using prefrontal and parietal measurements. Approach. Prefrontal and parietal NIRS signals were acquired from 11 able-bodied adults during rest and performance of the VFT or Stroop task. Classification was performed offline using bagging with a linear discriminant base classifier trained on a 10 dimensional feature set. Main results. VFT, Stroop task and rest were classified at an average accuracy of 71.7% ± 7.9%. The ternary classification system provided a statistically significant improvement in information transfer rate relative to a binary system controlled by either mental task (0.87 ± 0.35 bits/min versus 0.73 ± 0.24 bits/min). Significance. These results suggest that effective communication can be achieved with a ternary NIRS-BCI that supports VFT, Stroop task and rest via measurements from the frontal and parietal cortices. Further development of such a system is warranted. Accurate ternary classification can enhance communication rates offered by NIRS-BCIs, improving the practicality of this technology.
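Information transfer rate in BCI work is often computed with the Wolpaw definition. The abstract does not state which definition or trial duration was used, so the following is only a sketch under that assumption; the function names and the trial length passed to `bits_per_minute` are hypothetical.

```python
import math


def wolpaw_bits_per_trial(n_classes: int, accuracy: float) -> float:
    """Wolpaw information transfer per selection, in bits (assumed definition)."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_classes)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        # Errors are assumed to be spread evenly over the other classes.
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def bits_per_minute(bits_per_trial: float, trial_seconds: float) -> float:
    """Convert bits per selection into bits per minute for a given trial length."""
    return bits_per_trial * 60.0 / trial_seconds


# Ternary system at the reported mean accuracy of 71.7%.
ternary_bits = wolpaw_bits_per_trial(3, 0.717)
```

Under this definition a classifier at chance level (accuracy 1/3 for three classes) transfers zero bits, which is why a ternary system needs a sufficiently high accuracy before it beats a binary one.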
Classifying prosthetic use via accelerometry in persons with transtibial amputations.
Redfield, Morgan T; Cagle, John C; Hafner, Brian J; Sanders, Joan E
2013-01-01
Knowledge of how persons with amputation use their prostheses and how this use changes over time may facilitate effective rehabilitation practices and enhance understanding of prosthesis functionality. Perpetual monitoring and classification of prosthesis use may also increase the health and quality of life for prosthetic users. Existing monitoring and classification systems are often limited in that they require the subject to manipulate the sensor (e.g., attach, remove, or reset a sensor), record data over relatively short time periods, and/or classify a limited number of activities and body postures of interest. In this study, a commercially available three-axis accelerometer (ActiLife ActiGraph GT3X+) was used to characterize the activities and body postures of individuals with transtibial amputation. Accelerometers were mounted on prosthetic pylons of 10 persons with transtibial amputation as they performed a preset routine of actions. Accelerometer data was postprocessed using a binary decision tree to identify when the prosthesis was being worn and to classify periods of use as movement (i.e., leg motion such as walking or stair climbing), standing (i.e., standing upright with limited leg motion), or sitting (i.e., seated with limited leg motion). Classifications were compared to visual observation by study researchers. The classifier achieved a mean +/- standard deviation accuracy of 96.6% +/- 3.0%.
Classifying Prosthetic Use via Accelerometry in Persons with Trans-Tibial Amputations
Redfield, Morgan T.; Cagle, John C.; Hafner, Brian J.; Sanders, Joan E.
2014-01-01
Knowledge of how persons with amputation use their prostheses and how this use changes over time may facilitate effective rehabilitation practices and enhance understanding of prosthesis functionality. Perpetual monitoring and classification of prosthesis use may also increase the health and quality of life for prosthetic users. Existing monitoring and classification systems are often limited in that they require the subject to manipulate the sensor (e.g., attach, remove, or reset a sensor), record data over relatively short time periods, and/or classify a limited number of activities and body postures of interest. In this study, a commercially-available three-axis accelerometer (ActiLife ActiGraph GT3X+) was used to characterize the activities and body postures of individuals with trans-tibial amputation. Accelerometers were mounted on prosthetic pylons of ten persons with trans-tibial amputation as they performed a preset routine of actions. Accelerometer data was post-processed using a Binary Decision Tree to identify when the prosthesis was being worn and to classify periods of use as movement (i.e., leg motion like walking or stair climbing), standing (i.e., standing upright with limited leg motion), or sitting (i.e., seated with limited leg motion). Classifications were compared to visual observation by study researchers. The classifier achieved a mean accuracy of 96.6% (SD=3.0%). PMID:24458961
Hayat, Maqsood; Khan, Asifullah
2011-02-21
Membrane proteins are a vital class of proteins that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers also exhibit improved performance on other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
Shioi, Narumi; Ogawa, Eiki; Mizukami, Yuki; Abe, Shuhei; Hayashi, Rieko; Terada, Shigeyuki
2013-01-01
Viperidae snakes, which produce various venomous proteins, also have several anti-toxic proteins in their sera. However, the physiological function of these serum proteins has not been fully elucidated. Small serum protein (SSP)-1 is a major component of the SSPs isolated from the serum of a Japanese viper, the habu snake (Trimeresurus flavoviridis). It exists in the blood as a binary complex with habu serum factor (HSF), a snake venom metalloproteinase inhibitor. Affinity chromatography of the venom on an SSP-1-immobilized column identified HV1, an apoptosis-inducing metalloproteinase, as the target protein of SSP-1. Biacore measurements revealed that SSP-1 bound to HV1 with a dissociation constant of 8.2 × 10⁻⁸ M. However, SSP-1 did not inhibit the peptidase activity of HV1. Although HSF alone showed no inhibitory activity or binding affinity to HV1, the SSP-1–HSF binary complex bound to HV1 and formed a ternary complex that non-competitively inhibited the peptidase activity of HV1 with an inhibition constant of (5.1 ± 1.3) × 10⁻⁹ M. The SSP-1–HSF complex also effectively suppressed the apoptosis of vascular endothelial cells and caspase 3 activation induced by HV1. Thus, SSP-1 is a unique protein that non-covalently attaches to HV1 and changes its susceptibility to HSF. PMID:23100271
Shanir, P P Muhammed; Khan, Kashif Ahmad; Khan, Yusuf Uzzaman; Farooq, Omar; Adeli, Hojjat
2017-12-01
Epileptic neurological disorder of the brain is widely diagnosed using the electroencephalography (EEG) technique. EEG signals are nonstationary in nature and show abnormal neural activity during the ictal period. Seizures can be identified by analyzing and obtaining features of EEG signal that can detect these abnormal activities. The present work proposes a novel morphological feature extraction technique based on the local binary pattern (LBP) operator. LBP provides a unique decimal value to a sample point by weighing the binary outcomes after thresholding the neighboring samples with the present sample point. These LBP values assist in capturing the rising and falling edges of the EEG signal, thus providing a morphologically featured discriminating pattern for epilepsy detection. In the present work, the variability in the LBP values is measured by calculating the sum of absolute difference of the consecutive LBP values. Interquartile range is calculated over the preprocessed EEG signal to provide dispersion measure in the signal. For classification purpose, K-nearest neighbor classifier is used, and the performance is evaluated on 896.9 hours of data from CHB-MIT continuous EEG database. Mean accuracy of 99.7% and mean specificity of 99.8% is obtained with average false detection rate of 0.47/h and sensitivity of 99.2% for 136 seizures.
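The feature extraction outlined in this abstract (one-dimensional LBP codes, the sum of absolute differences of consecutive codes, and an interquartile range) can be sketched as follows. The neighborhood size, the `>=` thresholding rule, the bit ordering, and the quantile method are assumptions, since the abstract does not specify them.

```python
from statistics import quantiles


def lbp_1d(signal, half_width=4):
    """1-D local binary pattern: threshold neighbors against the center sample.

    Each neighbor that is >= the center contributes a power-of-two weight
    (assumed convention), giving one decimal code per interior sample.
    """
    samples = list(signal)
    codes = []
    for i in range(half_width, len(samples) - half_width):
        center = samples[i]
        neighbors = samples[i - half_width:i] + samples[i + 1:i + half_width + 1]
        code = 0
        for bit, value in enumerate(neighbors):
            if value >= center:
                code += 1 << bit
        codes.append(code)
    return codes


def sum_abs_diff(codes):
    """Variability of the LBP sequence: sum of |code[k+1] - code[k]|."""
    return sum(abs(b - a) for a, b in zip(codes, codes[1:]))


def interquartile_range(samples):
    """Dispersion of the signal, using the default method of statistics.quantiles."""
    q1, _, q3 = quantiles(samples, n=4)
    return q3 - q1


ramp_codes = lbp_1d([0, 1, 2, 3, 4], half_width=1)        # steadily rising edge
zigzag_codes = lbp_1d([0, 2, 0, 2, 0], half_width=1)      # rapid oscillation
```

A smooth ramp yields a constant code sequence and therefore zero variability, whereas an oscillating segment makes consecutive codes jump, which is the kind of contrast a classifier such as K-nearest neighbor could exploit.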
Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2012-05-10
RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. 
These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
Segmentation and analysis of mouse pituitary cells with graphic user interface (GUI)
NASA Astrophysics Data System (ADS)
González, Erika; Medina, Lucía.; Hautefeuille, Mathieu; Fiordelisio, Tatiana
2018-02-01
In this work we present a method to perform pituitary cell segmentation in image stacks acquired by fluorescence microscopy from pituitary slice preparations. Although many procedures have been developed for cell segmentation, they are generally based on edge detection and require high-resolution images. However, in the biological preparations that we worked on, the cells are not well defined, and experts identify their intracellular calcium activity from fluorescence intensity changes in different regions over time. These intensity changes were associated with time series over regions, and because they present a particular behavior they were used in a classification procedure in order to perform cell segmentation. Two logistic regression classifiers were implemented for the time series classification task, using as features the area under the curve and skewness in the first classifier and skewness and kurtosis in the second classifier. Once both decision boundaries had been found in the two feature spaces by training on 120 time series, they were tested over 12 image stacks through a Python graphical user interface (GUI), generating binary images where white pixels correspond to cells and black pixels to background. Results show that the area-skewness classifier reduces the time an expert dedicates to locating cells by up to 75% in some stacks, versus 92% for the kurtosis-skewness classifier, as evaluated on the number of regions the method found. Given the promising results, we expect that this method will be improved by adding more relevant features to the classifier.
Classification of deadlift biomechanics with wearable inertial measurement units.
O'Reilly, Martin A; Whelan, Darragh F; Ward, Tomas E; Delahunt, Eamonn; Caulfield, Brian M
2017-06-14
The deadlift is a compound full-body exercise that is fundamental in resistance training, rehabilitation programs and powerlifting competitions. Accurate quantification of deadlift biomechanics is important to reduce the risk of injury and ensure training and rehabilitation goals are achieved. This study sought to develop and evaluate deadlift exercise technique classification systems utilising Inertial Measurement Units (IMUs), recording at 51.2Hz, worn on the lumbar spine, both thighs and both shanks. It also sought to compare classification quality when these IMUs are worn in combination and in isolation. Two datasets of IMU deadlift data were collected. Eighty participants first completed deadlifts with acceptable technique and 5 distinct, deliberately induced deviations from acceptable form. Fifty-five members of this group also completed a fatiguing protocol (3-Repetition Maximum test) to enable the collection of natural deadlift deviations. For both datasets, universal and personalised random-forests classifiers were developed and evaluated. Personalised classifiers outperformed universal classifiers in accuracy, sensitivity and specificity in the binary classification of acceptable or aberrant technique and in the multi-label classification of specific deadlift deviations. Whilst recent research has favoured universal classifiers due to the reduced overhead in setting them up for new system users, this work demonstrates that such techniques may not be appropriate for classifying deadlift technique due to the poor accuracy achieved. However, personalised classifiers perform very well in assessing deadlift technique, even when using data derived from a single lumbar-worn IMU to detect specific naturally occurring technique mistakes. Copyright © 2017 Elsevier Ltd. All rights reserved.
The EB factory project. II. Validation with the Kepler field in preparation for K2 and TESS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parvizi, Mahmoud; Paegert, Martin; Stassun, Keivan G., E-mail: mahmoud.parvizi@vanderbilt.edu
Large repositories of high precision light curve data, such as the Kepler data set, provide the opportunity to identify astrophysically important eclipsing binary (EB) systems in large quantities. However, the rate of classical “by eye” human analysis restricts complete and efficient mining of EBs from these data using classical techniques. To prepare for mining EBs from the upcoming K2 mission as well as other current missions, we developed an automated end-to-end computational pipeline, the Eclipsing Binary Factory (EBF), that automatically identifies EBs and classifies them into morphological types. The EBF has been previously tested on ground-based light curves. To assess the performance of the EBF in the context of space-based data, we apply the EBF to the full set of light curves in the Kepler “Q3” Data Release. We compare the EBs identified from this automated approach against the human generated Kepler EB Catalog of ∼2600 EBs. When we require EB classification with ≥90% confidence, we find that the EBF correctly identifies and classifies eclipsing contact (EC), eclipsing semi-detached (ESD), and eclipsing detached (ED) systems with a false positive rate of only 4%, 4%, and 8%, while complete to 64%, 46%, and 32%, respectively. When classification confidence is relaxed, the EBF identifies and classifies ECs, ESDs, and EDs with a slightly higher false positive rate of 6%, 16%, and 8%, while much more complete to 86%, 74%, and 62%, respectively. Through our processing of the entire Kepler “Q3” data set, we also identify 68 new candidate EBs that may have been missed by the human generated Kepler EB Catalog. We discuss the EBF's potential application to light curve classification for periodic variable stars more generally for current and upcoming surveys like K2 and the Transiting Exoplanet Survey Satellite.
Farberg, Aaron S; Winkelmann, Richard R; Tucker, Natalie; White, Richard; Rigel, Darrell S
2017-09-01
BACKGROUND: Early diagnosis of melanoma is critical to survival. New technologies, such as a multi-spectral digital skin lesion analysis (MSDSLA) device [MelaFind, STRATA Skin Sciences, Horsham, Pennsylvania] may be useful to enhance clinician evaluation of concerning pigmented skin lesions. Previous studies evaluated the effect of only the binary output. OBJECTIVE: The objective of this study was to determine how decisions dermatologists make regarding pigmented lesion biopsies are impacted by providing both the underlying classifier score (CS) and associated probability risk provided by multi-spectral digital skin lesion analysis. This outcome was also compared against the improvement reported with the provision of only the binary output. METHODS: Dermatologists attending an educational conference evaluated 50 pigmented lesions (25 melanomas and 25 benign lesions). Participants were asked if they would biopsy the lesion based on clinical images, and were asked this question again after being shown multi-spectral digital skin lesion analysis data that included the probability graphs and classifier score. RESULTS: Data were analyzed from a total of 160 United States board-certified dermatologists. Biopsy sensitivity for melanoma improved from 76 percent following clinical evaluation to 92 percent after quantitative multi-spectral digital skin lesion analysis information was provided ( p <0.0001). Specificity improved from 52 percent to 79 percent ( p <0.0001). The positive predictive value increased from 61 percent to 81 percent ( p <0.01) when the quantitative data were provided. Negative predictive value also increased (68% vs. 91%, p<0.01), and overall biopsy accuracy was greater with multi-spectral digital skin lesion analysis (64% vs. 86%, p <0.001). Interrater reliability improved (intraclass correlation 0.466 before, 0.559 after). 
CONCLUSION: Incorporating the classifier score and probability data into physician evaluation of pigmented lesions led to both increased sensitivity and specificity, thereby resulting in more accurate biopsy decisions.
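The diagnostic measures reported in this abstract all derive from a 2×2 confusion matrix. The sketch below shows the standard definitions; the counts in the usage line are hypothetical values for a single imagined reader assessing 25 melanomas and 25 benign lesions, and are not data from the study.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard diagnostic measures from confusion-matrix counts.

    tp/fn: melanomas biopsied / missed; tn/fp: benign lesions spared / biopsied.
    """
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for one reader (illustration only, not study data).
metrics = diagnostic_metrics(tp=23, fn=2, tn=20, fp=5)
```

Because the published figures are averaged over many dermatologists, they need not correspond to integer counts for any single reader.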
The Eb Factory Project. Ii. Validation With the Kepler Field in Preparation for K2 and Tess
NASA Astrophysics Data System (ADS)
Parvizi, Mahmoud; Paegert, Martin; Stassun, Keivan G.
2014-12-01
Large repositories of high precision light curve data, such as the Kepler data set, provide the opportunity to identify astrophysically important eclipsing binary (EB) systems in large quantities. However, the rate of classical “by eye” human analysis restricts complete and efficient mining of EBs from these data using classical techniques. To prepare for mining EBs from the upcoming K2 mission as well as other current missions, we developed an automated end-to-end computational pipeline, the Eclipsing Binary Factory (EBF), that automatically identifies EBs and classifies them into morphological types. The EBF has been previously tested on ground-based light curves. To assess the performance of the EBF in the context of space-based data, we apply the EBF to the full set of light curves in the Kepler “Q3” Data Release. We compare the EBs identified from this automated approach against the human generated Kepler EB Catalog of ∼2600 EBs. When we require EB classification with ≥90% confidence, we find that the EBF correctly identifies and classifies eclipsing contact (EC), eclipsing semi-detached (ESD), and eclipsing detached (ED) systems with a false positive rate of only 4%, 4%, and 8%, while complete to 64%, 46%, and 32%, respectively. When classification confidence is relaxed, the EBF identifies and classifies ECs, ESDs, and EDs with a slightly higher false positive rate of 6%, 16%, and 8%, while much more complete to 86%, 74%, and 62%, respectively. Through our processing of the entire Kepler “Q3” data set, we also identify 68 new candidate EBs that may have been missed by the human generated Kepler EB Catalog. We discuss the EBF's potential application to light curve classification for periodic variable stars more generally for current and upcoming surveys like K2 and the Transiting Exoplanet Survey Satellite.
Ribosome-Inactivating and Related Proteins
Schrot, Joachim; Weng, Alexander; Melzig, Matthias F.
2015-01-01
Ribosome-inactivating proteins (RIPs) are toxins that act as N-glycosidases (EC 3.2.2.22). They are mainly produced by plants and classified as type 1 RIPs and type 2 RIPs. There are also RIPs and RIP-related proteins that cannot be grouped into the classical type 1 and type 2 RIPs because of their different sizes, structures or functions. In addition, there is still no uniform nomenclature or classification for RIPs. In this review, we give the current status of all known plant RIPs and suggest how to unify those RIPs and RIP-related proteins that cannot be classified as type 1 or type 2 RIPs. PMID:26008228
Using linear algebra for protein structural comparison and classification
2009-01-01
In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in. PMID:21637532
Using linear algebra for protein structural comparison and classification.
Gomide, Janaína; Melo-Minardi, Raquel; Dos Santos, Marcos Augusto; Neshich, Goran; Meira, Wagner; Lopes, Júlio César; Santoro, Marcelo
2009-07-01
In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2015-03-15
Proteins located in appropriate cellular compartments are of paramount importance to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer). Copyright © 2014 Elsevier Inc. All rights reserved.
New variable stars discovered in the fields of three Galactic open clusters using the VVV survey
NASA Astrophysics Data System (ADS)
Palma, T.; Minniti, D.; Dékány, I.; Clariá, J. J.; Alonso-García, J.; Gramajo, L. V.; Ramírez Alegría, S.; Bonatto, C.
2016-11-01
This project is a massive near-infrared (NIR) search for variable stars in highly reddened and obscured open cluster (OC) fields projected on regions of the Galactic bulge and disk. The search is performed using photometric NIR data in the J, H and Ks bands obtained from the Vista Variables in the Vía Láctea (VVV) Survey. In each cluster field, we performed a variability search using Stetson's variability statistics to select the variable candidates. Those candidates were then subjected to a frequency analysis using the Generalized Lomb-Scargle and the Phase Dispersion Minimization algorithms. The number of independent observations ranges between 63 and 73. The newly discovered variables in this study, 157 in total in three different known OCs, are classified based on their light curve shapes, periods, amplitudes and their location in the corresponding color-magnitude (J − Ks, Ks) and color-color (H − Ks, J − H) diagrams. We found 5 possible Cepheid stars which, based on the period-luminosity relation, are very likely type II Cepheids located behind the bulge. Among the newly discovered variables, there are eclipsing binaries, δ Scuti, as well as background RR Lyrae stars. Using the new version of the Wilson & Devinney code as well as the "Physics Of Eclipsing Binaries" (PHOEBE) code, we analyzed some of the best eclipsing binaries we discovered. Our results show that the studied systems range from detached to double-contact binaries, with low eccentricities and high inclinations of approximately 80°. Their surface temperatures range between 3500 K and 8000 K.
Root, Katharina; Wittwer, Yves; Barylyuk, Konstantin; Anders, Ulrike; Zenobi, Renato
2017-09-01
Native ESI-MS is increasingly used for quantitative analysis of biomolecular interactions. In such analyses, peak intensity ratios measured in mass spectra are treated as abundance ratios of the respective molecules in solution. While signal intensities of similar-size analytes, such as a protein and its complex with a small molecule, can be directly compared, significant distortions of the peak ratio due to unequal signal response of analytes impede the application of this approach for large oligomeric biomolecular complexes. We use a model system based on concatenated maltose binding protein units (MBPn, n = 1, 2, 3) to systematically study the behavior of protein mixtures in ESI-MS. The MBP concatamers differ from each other only by their mass while the chemical composition and other properties remain identical. We used native ESI-MS to analyze model mixtures of MBP oligomers, including equimolar mixtures of two proteins, as well as binary mixtures containing different fractions of the individual components. Pronounced deviation from a linear dependence of the signal intensity with concentration was observed for all binary mixtures investigated. While equimolar mixtures showed linear signal dependence at low concentrations, distinct ion suppression was observed above 20 μM. We systematically studied factors that are most often used in the literature to explain the origin of suppression effects. Implications of this effect for quantifying protein-protein binding affinity by native ESI-MS are discussed in general and demonstrated for an example of an anti-MBP antibody with its ligand, MBP.
2013-01-01
Background The vitamins are important cofactors in various enzymatic-reactions. In past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein, if its tertiary structure is known. Unfortunately tertiary structures of limited proteins are available. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in protein from its primary structure. Results In this study, first we compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with patterns-specificity was also observed within different vitamin-classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. It suggested that protein-binding patterns of vitamins are different from other ligands, and motivated us to develop separate predictor for vitamins and their sub-classes. The four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs) have been developed. We applied various classifiers of SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk etc., as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. 
Finally, we selected best performing SVM modules and obtained highest MCC of 0.53, 0.48, 0.61, 0.81 for VIRs, VAIRs, VBIRs, PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study have been trained and tested on non-redundant datasets and evaluated using five-fold cross-validation technique. The performances were also evaluated on the balanced and different independent datasets. Conclusions This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from evolutionary information of protein sequence. In order to provide service to the scientific community, we have developed web-server and standalone software VitaPred (http://crdd.osdd.net/raghava/vitapred/). PMID:23387468
Panwar, Bharat; Gupta, Sudheer; Raghava, Gajendra P S
2013-02-07
The vitamins are important cofactors in various enzymatic-reactions. In past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein, if its tertiary structure is known. Unfortunately tertiary structures of limited proteins are available. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in protein from its primary structure. In this study, first we compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with patterns-specificity was also observed within different vitamin-classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. It suggested that protein-binding patterns of vitamins are different from other ligands, and motivated us to develop separate predictor for vitamins and their sub-classes. The four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs) have been developed. We applied various classifiers of SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk etc., as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. 
Finally, we selected best performing SVM modules and obtained highest MCC of 0.53, 0.48, 0.61, 0.81 for VIRs, VAIRs, VBIRs, PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study have been trained and tested on non-redundant datasets and evaluated using five-fold cross-validation technique. The performances were also evaluated on the balanced and different independent datasets. This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from evolutionary information of protein sequence. In order to provide service to the scientific community, we have developed web-server and standalone software VitaPred (http://crdd.osdd.net/raghava/vitapred/).
Liu, W.; Montana, Vedrana; Parpura, Vladimir; Mohideen, U.
2010-01-01
We use an Atomic Force Microscope based single molecule measurements to evaluate the activation free energy in the interaction of SNARE proteins syntaxin 1A, SNAP25B and synaptobrevin 2 which regulate intracellular fusion of vesicles with target membranes. The dissociation rate of the binary syntaxin-synaptobrevin and the ternary syntaxin-SNAP25B-synaptobrevin complex was measured from the rupture force distribution as a function of the rate of applied force. The temperature dependence of the spontaneous dissociation rate was used to obtain the activation energy to the transition state of 19.8 ± 3.5 kcal/mol = 33 ± 6 kBT and 25.7 ± 3.0 kcal/mol = 43 ± 5 kBT for the binary and ternary complex, respectively. They are consistent with those measured previously for the ternary complex in lipid membranes and are of order expected for bilayer fusion and pore formation. The ΔG was 12.4–16.6 kcal/mol = 21–28 kBT and 13.8–18.0 kcal/mol = 23–30 kBT for the binary and ternary complex, respectively. The ternary complex was more stable by 1.4 kcal/mol = 2.3 kBT, consistent with the spontaneous dissociation rates. The higher adhesion energies and smaller molecular extensions measured with SNAP25B point to its possible unique and important physiological role in tethering/docking the vesicle in closer proximity to the plasma membrane and increasing the probability for fusion completion. PMID:20107522
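The paired units in the abstract above (kcal/mol and kBT) can be cross-checked with a one-line conversion. The sketch below is illustrative only: the abstract does not state the temperature used for the conversion, so `temperature_k=298.15` (room temperature) is an assumption, and the function name is hypothetical.

```python
# Molar gas constant R = k_B * N_A, expressed in kcal / (mol K).
GAS_CONSTANT_KCAL = 1.987204e-3


def kcal_per_mol_to_kbt(energy_kcal_per_mol, temperature_k=298.15):
    """Express a molar energy in units of k_B*T.

    temperature_k is an assumed value; the abstract does not give it.
    """
    return energy_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k)


# Activation energies quoted in the abstract for the two complexes.
for label, energy in [("binary", 19.8), ("ternary", 25.7)]:
    print(f"{label}: {kcal_per_mol_to_kbt(energy):.1f} kBT")
```

At the assumed temperature, the two activation energies come out at roughly 33 and 43 kBT, in line with the rounded figures quoted in the abstract.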
Classifying Radio Galaxies with the Convolutional Neural Network
NASA Astrophysics Data System (ADS)
Aniyan, A. K.; Thorat, K.
2017-06-01
We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff-Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA) Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ~200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI, FRII and bent-tailed radio galaxies with high accuracy (maximum precision of 95%) using well-defined samples and a "fusion classifier," which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRIs and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our approach performs comparably to manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.
A machine learning approach for classification of anatomical coverage in CT
NASA Astrophysics Data System (ADS)
Wang, Xiaoyong; Lo, Pechin; Ramakrishna, Bharath; Goldin, Johnathan; Brown, Matthew
2016-03-01
Automatic classification of anatomical coverage of medical images is critical for big data mining and as a pre-processing step to automatically trigger specific computer aided diagnosis systems. The traditional way to identify scans through DICOM headers has various limitations due to manual entry of series descriptions and non-standardized naming conventions. In this study, we present a machine learning approach where multiple binary classifiers were used to classify different anatomical coverages of CT scans. A one-vs-rest strategy was applied. For a given training set, a template scan was selected from the positive samples and all other scans were registered to it. Each registered scan was then evenly split into k × k × k non-overlapping blocks and for each block the mean intensity was computed. This resulted in a 1 × k³ feature vector for each scan. The feature vectors were then used to train an SVM-based classifier. In this feasibility study, four classifiers were built to identify anatomic coverages of brain, chest, abdomen-pelvis, and chest-abdomen-pelvis CT scans. Each classifier was trained and tested using a set of 300 scans from different subjects, composed of 150 positive samples and 150 negative samples. Area under the ROC curve (AUC) of the testing set was measured to evaluate the performance in a two-fold cross validation setting. Our results showed good classification performance with an average AUC of 0.96.
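The block-mean feature extraction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the registered scan is available as a nested list indexed `[z][y][x]` and that every dimension is divisible by k; the function name `block_mean_features` is hypothetical.

```python
def block_mean_features(volume, k):
    """Split a 3-D volume into k*k*k non-overlapping blocks and return
    the mean intensity of each block as a flat list of length k**3.

    Assumption for this sketch: each dimension is divisible by k.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if nz % k or ny % k or nx % k:
        raise ValueError("each dimension must be divisible by k in this sketch")
    bz, by, bx = nz // k, ny // k, nx // k  # block size along each axis
    features = []
    for iz in range(k):
        for iy in range(k):
            for ix in range(k):
                total = 0.0
                for z in range(iz * bz, (iz + 1) * bz):
                    for y in range(iy * by, (iy + 1) * by):
                        for x in range(ix * bx, (ix + 1) * bx):
                            total += volume[z][y][x]
                features.append(total / (bz * by * bx))
    return features
```

The resulting list plays the role of the 1 × k³ feature vector; in practice an array library would replace the explicit loops, and the vector would be passed to whichever SVM implementation is in use.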
Cuesta, D; Varela, M; Miró, P; Galdós, P; Abásolo, D; Hornero, R; Aboy, M
2007-07-01
Body temperature is a classical diagnostic tool for a number of diseases. However, it is usually employed as a plain binary classification function (febrile or not febrile), and therefore its diagnostic power has not been fully developed. In this paper, we describe how body temperature regularity can be used for diagnosis. Our proposed methodology is based on obtaining accurate long-term temperature recordings at high sampling frequencies and analyzing the temperature signal using a regularity metric (approximate entropy). In this study, we assessed our methodology using temperature registers acquired from patients with multiple organ failure admitted to an intensive care unit. Our results indicate there is a correlation between the patient's condition and the regularity of the body temperature. This finding enabled us to design a classifier for two outcomes (survival or death) and test it on a dataset including 36 subjects. The classifier achieved an accuracy of 72%.
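For readers unfamiliar with the regularity metric named above, the following is a minimal sketch of approximate entropy in its standard formulation (self-matches included). The abstract does not report the embedding dimension or tolerance used, so the defaults `m=2` and `r=0.2` are assumptions, and `r` is treated here as an absolute tolerance rather than a fraction of the signal's standard deviation.

```python
import math


def approximate_entropy(signal, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a 1-D sequence.

    m and r are assumed defaults; lower values indicate a more regular signal.
    """
    n = len(signal)

    def phi(dim):
        count = n - dim + 1
        templates = [signal[i:i + dim] for i in range(count)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r (self-match included).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)
```

A constant recording gives exactly zero, and a strictly alternating one gives a value very close to zero; irregular recordings give larger values.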
An ordinal classification approach for CTG categorization.
Georgoulas, George; Karvelis, Petros; Gavrilis, Dimitris; Stylios, Chrysostomos D; Nikolakopoulos, George
2017-07-01
Evaluation of the cardiotocogram (CTG) is a standard approach employed during pregnancy and delivery. However, its interpretation requires a high level of expertise to decide whether the recording is Normal, Suspicious or Pathological. Therefore, a number of attempts have been made over the past three decades to develop sophisticated automated systems. These systems are usually (multiclass) classification systems that assign a category to the respective CTG. However, most of these systems do not take into consideration the natural ordering of the categories associated with CTG recordings. In this work, an algorithm that explicitly takes into consideration the ordering of CTG categories, based on a binary decomposition method, is investigated. The results achieved, using the C4.5 decision tree as the base classifier, indicate that the ordinal classification approach is marginally better than the traditional multiclass classification approach, which utilizes the standard C4.5 algorithm, for several performance criteria.
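One common way to turn an ordered three-class problem into binary problems is to train one binary classifier per threshold ("is the category above Normal?", "is it above Suspicious?") and then recombine the outputs. The sketch below shows only the recombination step and is an assumption about the general scheme, not a description of the exact decomposition used in the paper; the input probabilities are hypothetical.

```python
def ordinal_class_probabilities(prob_greater):
    """Combine threshold probabilities into per-class probabilities.

    prob_greater[k] is a binary classifier's estimate of P(y > class k)
    for k = 0 .. K-2, where the K classes are in their natural order.
    """
    probs = [1.0 - prob_greater[0]]                      # lowest class
    for k in range(1, len(prob_greater)):
        probs.append(prob_greater[k - 1] - prob_greater[k])  # middle classes
    probs.append(prob_greater[-1])                       # highest class
    return probs


categories = ["Normal", "Suspicious", "Pathological"]
# Hypothetical outputs: P(y > Normal) = 0.7, P(y > Suspicious) = 0.2.
probs = ordinal_class_probabilities([0.7, 0.2])
print(categories[probs.index(max(probs))])
```

Because each binary problem respects the category order, a misclassification into an adjacent category is implicitly treated as less severe than one that skips a category, which a flat multiclass classifier does not encode.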
NASA Astrophysics Data System (ADS)
Kistenev, Yury V.; Borisov, Alexey V.; Titarenko, Maria A.; Baydik, Olga D.; Shapovalov, Alexander V.
2018-04-01
The ability to diagnose oral lichen planus (OLP) based on saliva analysis using THz time-domain spectroscopy and chemometrics is discussed. The study involved 30 patients (2 male and 28 female) with OLP. This group consisted of two subgroups with the erosive form of OLP (n = 15) and with the reticular and papular forms of OLP (n = 15). The control group consisted of six healthy volunteers (one male and five females) without inflammation in the mucous membrane in the oral cavity and without periodontitis. Principal component analysis was used to reveal informative features in the experimental data. The one-versus-one multiclass classifier using support vector machine binary classifiers was used. The two-stage classification approach using several absorption spectra scans for an individual saliva sample provided 100% accuracy of differential classification between OLP subgroups and control group.
Support vector machines-based fault diagnosis for turbo-pump rotor
NASA Astrophysics Data System (ADS)
Yuan, Sheng-Fa; Chu, Fu-Lei
2006-05-01
Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.
Sex determination in insects: a binary decision based on alternative splicing.
Salz, Helen K
2011-08-01
The gene regulatory networks that control sex determination vary between species. Despite these differences, comparative studies in insects have found that alternative splicing is reiteratively used in evolution to control expression of the key sex-determining genes. Sex determination is best understood in Drosophila where activation of the RNA binding protein-encoding gene Sex-lethal is the central female-determining event. Sex-lethal serves as a genetic switch because once activated it controls its own expression by a positive feedback splicing mechanism. Sex fate choice is also maintained by self-sustaining positive feedback splicing mechanisms in other dipteran and hymenopteran insects, although different RNA binding protein-encoding genes function as the binary switch. Studies exploring the mechanisms of sex-specific splicing have revealed the extent to which sex determination is integrated with other developmental regulatory networks. Copyright © 2011 Elsevier Ltd. All rights reserved.
Clostridium spiroforme toxin is a binary toxin which ADP-ribosylates cellular actin.
Popoff, M R; Boquet, P
1988-05-16
We have purified from Clostridium spiroforme strain 246 a heterogeneous population of proteins (Sa) ranging from 43 to 47 kilodaltons exhibiting ADP-ribosyl transferase activity, as do C. botulinum C2 toxin component I or the ia chain of C. perfringens E iota toxin. C. spiroforme Sa alone had no activity upon injection into mice or inoculation onto Vero cells. When the C. spiroforme ADP-ribosyl transferases were mixed with a trypsin-activated protein (Sb) separated from C. spiroforme bacterial supernatant, a lethal effect in mice and cytotoxicity on Vero cells were recorded. The Sa cross-reacted immunologically with either the light chain of C. perfringens E iota toxin or the ADP-ribosyl transferase from C. difficile 196 strain. No immunological relatedness was observed between Sa and C2 toxin component I. C. spiroforme toxin is thus another binary toxin close to iota.
1984-08-01
produce even the most basic binary cloud data and methodologies needed to support the evaluation programs." In view of this recognized deficiency, the...There was an exchange of information with non - DoD agencies, with presentations made by NASA and NOAA (see pp. 537, 569). A brief report by the steering...on cloud data bases and methodologies for users. To achieve these actions requires explicit support. *See classified supplementary volume. vi CONTENTS
Facial expression recognition based on improved deep belief networks
NASA Astrophysics Data System (ADS)
Wu, Yao; Qiu, Weigen
2017-08-01
In order to improve the robustness of facial expression recognition, a method of facial expression recognition based on Local Binary Pattern (LBP) combined with improved deep belief networks (DBNs) is proposed. This method uses LBP to extract the feature, and then uses the improved deep belief networks as the detector and classifier to extract the LBP feature. The combination of LBP and improved deep belief networks is realized in facial expression recognition. On the JAFFE (Japanese Female Facial Expression) database, the recognition rate improved significantly.
A Method of Character Detection and Segmentation for Highway Guide Signs
NASA Astrophysics Data System (ADS)
Xu, Jiawei; Zhang, Chongyang
2018-01-01
In this paper, a method of character detection and segmentation for highway signs in China is proposed. It consists of four steps. Firstly, the highway sign area is detected by colour and geometric features, and the possible character region is obtained by a multi-level projection strategy. Secondly, pseudo target character regions are removed using local binary pattern (LBP) features. Thirdly, a convolutional neural network (CNN) is used to classify target regions. Finally, adaptive projection strategies are used to segment character strings. Experimental results indicate that the proposed method achieves new state-of-the-art results.
FBC: a flat binary code scheme for fast Manhattan hash retrieval
NASA Astrophysics Data System (ADS)
Kong, Yan; Wu, Fuzhang; Gao, Lifa; Wu, Yanjun
2018-04-01
Hash coding is a widely used technique in approximate nearest neighbor (ANN) search, especially in document search and multimedia (such as image and video) retrieval. Based on the distance measure they use, hash methods are generally classified into two categories: Hamming hashing and Manhattan hashing. Benefitting from better neighborhood structure preservation, Manhattan hashing methods outperform earlier methods in search effectiveness. However, because it uses decimal arithmetic operations instead of bit operations, Manhattan hashing is more time-consuming, which significantly decreases overall search efficiency. To solve this problem, we present an intuitive hash scheme which uses Flat Binary Code (FBC) to encode the data points. As a result, the decimal arithmetic used in previous Manhattan hashing can be replaced by the more efficient XOR operator. The final experiments show that, with a reasonable growth in memory use, FBC achieves an average speedup of more than 80% without any loss of search accuracy compared with state-of-the-art Manhattan hashing methods.
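The abstract does not spell out how FBC encodes points, so the following is only an illustration of the general idea that a suitable binary code lets XOR and a bit count stand in for Manhattan-distance arithmetic. It uses a unary (thermometer) code, which is an assumption chosen for clarity and may differ from the actual FBC encoding; all function names are hypothetical.

```python
def unary_code(value, levels):
    """Thermometer code: `value` one-bits in a word of (levels - 1) bits."""
    if not 0 <= value < levels:
        raise ValueError("value out of range")
    return (1 << value) - 1


def encode_point(point, levels):
    """Concatenate the unary codes of all quantized coordinates."""
    width = levels - 1
    word = 0
    for value in point:
        word = (word << width) | unary_code(value, levels)
    return word


def hamming(a, b):
    """Hamming distance via XOR followed by a population count."""
    return bin(a ^ b).count("1")


def manhattan(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))


p, q = (3, 0, 2), (1, 2, 2)  # coordinates quantized to 4 levels
assert hamming(encode_point(p, 4), encode_point(q, 4)) == manhattan(p, q)
```

With this encoding, the per-dimension subtraction and absolute value collapse into a single XOR over the packed word, at the cost of using `levels - 1` bits per dimension instead of `log2(levels)`. That trade-off is consistent with the abstract's mention of increased memory use.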
Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.
2009-01-01
Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
NASA Astrophysics Data System (ADS)
Kim, C.-H.; Kreiner, J. M.; Zakrzewski, B.; Ogłoza, W.; Kim, H.-W.; Jeong, M.-J.
2018-04-01
A comprehensive catalog of 623 galactic eclipsing binary (EB) systems with eccentric orbits is presented with more than 2830 times of minima determined from the archived photometric data by various sky-survey projects and new photometric measurements. The systems are divided into two groups according to whether the individual system has a GCVS name or not. All the systems in both groups are further classified into three categories (D, A, and A+III) on the basis of their eclipse timing diagrams: 453 D systems showing just constantly displaced secondary minima, 139 A systems displaying only apsidal motion (AM), and 31 A+III systems exhibiting both AM and light-time effects. AM parameters for 170 systems (A and A+III systems) are consistently calculated and cataloged with basic information for all systems. Some important statistics for the AM parameters are discussed and compared with those derived for the eccentric EB systems in the Large and Small Magellanic Clouds.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Apellániz, J. Maíz; Sota, A.; Alfaro, E. J.
This is the third installment of the Galactic O-Star Spectroscopic Survey (GOSSS), a massive spectroscopic survey of Galactic O stars, based on new homogeneous, high signal-to-noise ratio, R ∼ 2500 digital observations selected from the Galactic O-Star Catalog. In this paper, we present 142 additional stellar systems with O stars from both hemispheres, bringing the total of O-type systems published within the project to 590. Among the new objects, there are 20 new O stars. We also identify 11 new double-lined spectroscopic binaries, 6 of which are of O+O type and 5 of O+B type, and an additional new triple-lined spectroscopic binary of O+O+B type. We also revise some of the previous GOSSS classifications, present some egregious examples of stars erroneously classified as O-type in the past, introduce the use of luminosity class IV at spectral types O4-O5.5, and adapt the classification scheme to the work of Arias et al.
NASA Astrophysics Data System (ADS)
Scholz, Ralf-Dieter; Bell, Cameron P. M.
2018-02-01
We present three new nearby L dwarf candidates, found in a continued combined color/proper motion search using WISE, 2MASS, and other survey data, where we included extended WISE sources and looked closer to the Galactic plane region. Their spectral types and distances were estimated from photometric comparisons to well-known L dwarfs with trigonometric parallaxes. The first object, 2MASS J07555430-3259589, is an extremely red L7.5p dwarf candidate at a photometric distance of about 16 pc. Its position, proper motion and distance are consistent with membership in the Carina-Near young moving group. The second one, 2MASS J07414279-0506464, is resolved in Gaia DR1 as a close binary (separation 0.3 arcsec), and we classify it as a equal-mass binary candidate consisting of two L5 dwarfs at 19 pc. Our nearest new neighbor, 2MASS J19251275+0700362, is an L7 dwarf candidate at 10 pc.
Study on the Secant Segmentation Algorithm of Rubber Tree
NASA Astrophysics Data System (ADS)
Li, Shute; Zhang, Jie; Zhang, Jian; Sun, Liang; Liu, Yongna
2018-04-01
Natural rubber is one of the most important materials in national defense and industry, and tapping panel dryness (TPD) of the rubber tree is one of the most serious diseases affecting rubber production. Although considerable progress has been made in more than 100 years of research on TPD, there are still many areas to be improved. At present, manual observation is widely used to identify TPD, but the diversity of rubber tree secant symptoms leads to inaccurate judgement of the level of TPD. In this paper, image processing technology is used to separate the secant and latex, so that interference factors are removed and exact binary images of the secant and latex are obtained. By calculating the area ratio of the corresponding binary images, the grade of TPD can be classified accurately, which also provides an objective basis for identifying the TPD level.
Diagnosis of skin cancer using image processing
NASA Astrophysics Data System (ADS)
Guerra-Rosas, Esperanza; Álvarez-Borrego, Josué; Coronel-Beltrán, Ángel
2014-10-01
In this paper, a methodology for classifying skin cancer in images of dermatologic spots, based on spectral analysis using the K-law Fourier nonlinear technique, is presented. The image is segmented and binarized to build the function that contains the area of interest. The image is divided into its respective RGB channels to obtain the spectral properties of each channel. The green channel contains more information and is therefore always chosen. This information is multiplied point by point by a binary mask, and a Fourier transform written in nonlinear form is applied to the result. Where the real part of this spectrum is positive, the spectral density takes unit values; otherwise it is zero. Finally, the ratio of the sum of the unit values of the spectral density to the sum of the values of the binary mask is calculated. This ratio is called the spectral index. When the calculated value falls within the spectral index range, three types of cancer can be detected. Values outside this range indicate benign lesions.
NASA Astrophysics Data System (ADS)
Schechter, Paul L.; Morgan, Nicholas D.; Chehade, B.; Metcalfe, N.; Shanks, T.; McDonald, Michael
2017-05-01
We have analyzed images from the VST-ATLAS survey to identify candidate gravitationally lensed quasar systems in a sample of WISE sources with W1-W2> 0.7. Results from follow-up spectroscopy with the Baade 6.5 m telescope are presented for eight systems. One of them is a quadruply lensed quasar, and two are doubly lensed systems. Two are projected superpositions of two quasars at different redshifts. In one system, two quasars, although at the same redshift, have very different emission line profiles and constitute a physical binary. In two systems, the component spectra are consistent with the lensing hypothesis, after allowing for microlensing. However, as no lensing galaxy is detected in these two systems, we classify them as lensless twins. More extensive observations are needed to establish whether they are in fact lensed quasars or physical binaries. This paper includes data gathered with the 6.5 m Magellan telescopes located at Las Campanas Observatory, Chile.
Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing
2016-01-01
Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have potential therapies for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthew’s Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651
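The performance measures quoted above all derive from a binary confusion matrix. The sketch below shows the standard definitions; the counts passed in are hypothetical (the abstract does not give the confusion matrix) and were chosen only so that sensitivity and specificity match the reported 0.95 and 0.93.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical balanced test set: 100 positives and 100 negatives.
sens, spec, acc, mcc = binary_metrics(tp=95, fp=7, tn=93, fn=5)
print(f"sens={sens:.2f} spec={spec:.2f} acc={acc:.2f} mcc={mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ in size.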
Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Li, Li-Ping; Huang, De-Shuang; Yan, Gui-Ying; Nie, Ru; Huang, Yu-An
2017-04-04
Identification of protein-protein interactions (PPIs) is of critical importance for deciphering the underlying mechanisms of almost all biological processes of cell and providing great insight into the study of human disease. Although much effort has been devoted to identifying PPIs from various organisms, existing high-throughput biological techniques are time-consuming, expensive, and have high false positive and negative results. Thus it is highly urgent to develop in silico methods to predict PPIs efficiently and accurately in this post genomic era. In this article, we report a novel computational model combining our newly developed discriminative vector machine classifier (DVM) and an improved Weber local descriptor (IWLD) for the prediction of PPIs. Two components, differential excitation and orientation, are exploited to build evolutionary features for each protein sequence. The main characteristics of the proposed method lies in introducing an effective feature descriptor IWLD which can capture highly discriminative evolutionary information from position-specific scoring matrixes (PSSM) of protein data, and employing the powerful and robust DVM classifier. When applying the proposed method to Yeast and H. pylori data sets, we obtained excellent prediction accuracies as high as 96.52% and 91.80%, respectively, which are significantly better than the previous methods. Extensive experiments were then performed for predicting cross-species PPIs and the predictive results were also pretty promising. To further validate the performance of the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on Human data set. The experimental results obtained indicate that our method is highly effective for PPIs prediction and can be taken as a supplementary tool for future proteomics research.
Nguyen, Khuyen Thi; Ho, Quynh Ngoc; Pham, Thu Ha; Phan, Tuan-Nghia; Tran, Van-Tuan
2016-12-01
Aspergillus oryzae is a safe mold widely used in the food industry. It is also considered a microbial cell factory for production of recombinant proteins and enzymes. Currently, genetic manipulation of filamentous fungi is achieved via Agrobacterium tumefaciens-mediated transformation methods, usually employing antibiotic resistance markers. These methods are hardly usable for A. oryzae due to its strong resistance to the common antifungal compounds used for fungal transformation. In this study, we have constructed two binary vectors carrying the pyrG gene from A. oryzae as a biochemical marker rather than an antibiotic resistance marker, and an expression cassette for the GFP or DsRed reporter gene under control of the constitutive gpdA promoter from Aspergillus nidulans. All components of these vectors are changeable to generate new versions for specific research purposes. The developed vectors are fully functional for heterologous expression of the GFP and DsRed fluorescent proteins in the uridine/uracil auxotrophic A. oryzae strain. Our study provides a new approach for A. oryzae transformation using pyrG as the selectable auxotrophic marker, A. tumefaciens as the DNA transfer tool and fungal spores as the transformation material. The binary vectors constructed can be used for gene expression studies in this industrially important filamentous fungus.
Wavelet images and Chou's pseudo amino acid composition for protein classification.
Nanni, Loris; Brahnam, Sheryl; Lumini, Alessandra
2012-08-01
The last decade has seen an explosion in the collection of protein data. To actualize the potential offered by this wealth of data, it is important to develop machine systems capable of classifying and extracting features from proteins. Reliable machine systems for protein classification offer many benefits, including the promise of finding novel drugs and vaccines. In developing our system, we analyze and compare several feature extraction methods used in protein classification that are based on the calculation of texture descriptors starting from a wavelet representation of the protein. We then feed these texture-based representations of the protein into an Adaboost ensemble of neural network or a support vector machine classifier. In addition, we perform experiments that combine our feature extraction methods with a standard method that is based on the Chou's pseudo amino acid composition. Using several datasets, we show that our best approach outperforms standard methods. The Matlab code of the proposed protein descriptors is available at http://bias.csr.unibo.it/nanni/wave.rar .
NASA Astrophysics Data System (ADS)
Anosova, Joanna P.
2017-06-01
On 14 September 2015, LIGO reported the first direct detection of gravitational waves and the first direct observation of a binary black hole. These observations demonstrate the existence of binary black holes in stellar systems predicted by Einstein in his general theory of relativity a century earlier. A lot of violent and complicated phenomena take place on different scales in the Universe. Many of them may be caused by multiple centers of gravitational attraction: planetary rings, accretion discs of various scales, peculiar structures of single galaxies and interacting galaxies. In this work, we show that various features of celestial objects can be understood by assuming the existence of two dominant centers of gravity in stellar systems. We study numerically the dynamical evolution of models with central super-massive binary black holes and extended shells with numerous low-mass particles inside and around the orbits of the binaries. These particles could be star clusters or gas and dust complexes. We consider several tens of thousands of initial conditions for the general three-body problem and compile them. We studied the dynamical evolution of all spherical shells together and separately. Our method permits us to study the individual trajectories of particles, their close double and triple approaches, and inspect the time-dependent structures in the models. Multiple runs of the models allow us to classify the numerous strong triple interactions of the binary components with low-mass particles; frequently, the "gravitational slingshot" effect occurs in the center of systems. Such strong interactions of bodies result in various structures with "dumb-bell" bars, close and open spirals, different types of flows, jets, etc.
These structures are often very similar to the observed structures of galaxies. We found some combinations of the initial conditions and model parameters that produce, at some time, structures similar to those found in the galaxies Arp 5, 87, 214, 240, and NGC 4027, 6946. Our figures show the results of this comparison and the past and future evolution of our models.
Network-based prediction and knowledge mining of disease genes.
Carson, Matthew B; Lu, Hui
2015-01-01
In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. 
In addition, we showed that first- and second-order neighbors in the PPI network could be used to identify likely disease associations. We analyzed the human protein interaction network and its relationship to disease and found that both the number of interactions with other proteins and the disease relationship of neighboring proteins helped to determine whether a protein had a relationship to disease. Our classifier predicted many proteins with no annotated disease association to be disease-related, which indicated that these proteins have network characteristics that are similar to disease-related proteins and may therefore have disease associations not previously identified. By performing a post-processing step after the prediction, we were able to identify evidence in literature supporting this possibility. This method could provide a useful filter for experimentalists searching for new candidate protein targets for drug repositioning and could also be extended to include other network and data types in order to refine these predictions.
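Two of the network characteristics named in the abstract, degree centrality and disease neighbor ratio, can be computed on a toy graph as follows. This is a minimal sketch: the adjacency structure, the protein names, and the definition of the ratio as "fraction of direct neighbours that are disease-annotated" are assumptions for illustration, not taken from the paper.

```python
def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}

def disease_neighbor_ratio(adjacency, disease_nodes):
    """Fraction of each node's direct neighbours that carry a disease label."""
    ratios = {}
    for node, nbrs in adjacency.items():
        if nbrs:
            ratios[node] = sum(1 for v in nbrs if v in disease_nodes) / len(nbrs)
        else:
            ratios[node] = 0.0  # isolated node: no neighbours to count
    return ratios

# Hypothetical undirected interaction network (edges A-B, A-C, A-D, B-C).
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
toy_disease = {"B", "D"}  # hypothetical disease-annotated proteins

centrality = degree_centrality(toy_network)
ratio = disease_neighbor_ratio(toy_network, toy_disease)
```

Features like these would then be passed, one row per protein, to a classifier such as the ADTree mentioned above.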
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
Integration of multiple biological features yields high confidence human protein interactome.
Karagoz, Kubra; Sevimoglu, Tuba; Arga, Kazim Yalcin
2016-08-21
The biological function of a protein is usually determined by its physical interaction with other proteins. Protein-protein interactions (PPIs) are identified through various experimental methods and are stored in curated databases. The noisiness of the existing PPI data is evident, and it is essential that more reliable data be generated. Furthermore, the selection of a set of PPIs at different confidence levels might be necessary for many studies. Although different methodologies were introduced to evaluate the confidence scores for binary interactions, a highly reliable, almost complete PPI network of Homo sapiens has not yet been proposed. The quality and coverage of the human protein interactome need to be improved for it to be used in various disciplines, especially in biomedicine. In the present work, we propose an unsupervised statistical approach to assign confidence scores to PPIs of H. sapiens. To achieve this goal, PPI data from six different databases were collected and a total of 295,288 non-redundant interactions between 15,950 proteins were acquired. The present scoring system included the context information that was assigned to PPIs derived from eight biological attributes. A high confidence network, which included 147,923 binary interactions between 13,213 proteins, had scores greater than the cutoff value of 0.80, for which sensitivity, specificity, and coverage were 94.5%, 80.9%, and 82.8%, respectively. We compared the present scoring method with others for evaluation. Reducing the noise inherent in experimental PPIs via our scoring scheme increased the accuracy significantly. As demonstrated through the assessment of process and cancer subnetworks, this study allows researchers to construct and analyze context-specific networks via valid PPI sets, and one can easily obtain subnetworks around proteins of interest at a specified confidence level. Copyright © 2016 Elsevier Ltd. All rights reserved.
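The cutoff step described above, keeping interactions whose score exceeds 0.80 and reporting sensitivity and specificity, can be sketched as follows. The scores and reference labels below are invented toy values, and the function name is hypothetical; the scoring scheme itself is not reproduced.

```python
def evaluate_cutoff(scored_pairs, cutoff=0.80):
    """Sensitivity and specificity when pairs with score > cutoff are accepted.

    scored_pairs: iterable of (confidence_score, is_true_interaction).
    """
    tp = fp = tn = fn = 0
    for score, is_true in scored_pairs:
        accepted = score > cutoff
        if accepted and is_true:
            tp += 1
        elif accepted and not is_true:
            fp += 1
        elif not accepted and is_true:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Hypothetical scored interactions with a reference (gold standard) label.
toy_pairs = [(0.95, True), (0.85, True), (0.60, True),
             (0.90, False), (0.40, False), (0.30, False), (0.10, False)]
toy_sensitivity, toy_specificity = evaluate_cutoff(toy_pairs)
```

Sweeping `cutoff` over a grid of values gives the trade-off curve from which a working threshold can be chosen.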
HMPAS: Human Membrane Protein Analysis System
2013-01-01
Background Membrane proteins perform essential roles in diverse cellular functions and are regarded as major pharmaceutical targets. The significance of membrane proteins has led to the development of dozens of resources related to membrane proteins. However, most of these resources are built for specific well-known membrane protein groups, making it difficult to find common and specific features of various membrane protein groups. Methods We collected human membrane proteins from the dispersed resources and predicted novel membrane protein candidates by using ortholog information and our membrane protein classifiers. The membrane proteins were classified according to the type of interaction with the membrane, subcellular localization, and molecular function. We also made a new feature dataset to characterize the membrane proteins in various aspects including membrane protein topology, domain, biological process, disease, and drug. Moreover, protein structure and ICD-10-CM based integrated disease and drug information was newly included. To analyze the comprehensive information of membrane proteins, we implemented analysis tools to identify novel sequence and functional features of the classified membrane protein groups and to extract features from protein sequences. Results We constructed HMPAS with 28,509 collected known membrane proteins and 8,076 newly predicted candidates. This system provides integrated information of human membrane proteins individually and in groups organized by 45 subcellular locations and 1,401 molecular functions. As a case study, we identified associations between the membrane proteins and diseases and show that membrane proteins are promising targets for diseases related to the nervous and circulatory systems. A web-based interface of this system was constructed to enable researchers not only to retrieve organized information of individual proteins but also to use the tools to analyze the membrane proteins.
Conclusions HMPAS provides comprehensive information about human membrane proteins including specific features of certain membrane protein groups. In this system, users can acquire the information of individual proteins and specified groups focused on their conserved sequence features, involved cellular processes, and diseases. HMPAS may contribute as a valuable resource for the inference of novel cellular mechanisms and pharmaceutical targets associated with the human membrane proteins. HMPAS is freely available at http://fcode.kaist.ac.kr/hmpas. PMID:24564858
2011-01-01
Background For efficient and large scale production of recombinant proteins in plants, transient expression by agroinfection has a number of advantages over stable transformation. Simple manipulation, rapid analysis and high expression efficiency are possible. In pea, Pisum sativum, a Virus Induced Gene Silencing System using the pea early browning virus has been converted into an efficient agroinfection system by converting the two RNA genomes of the virus into binary expression vectors for Agrobacterium transformation. Results By vacuum infiltration (0.08 MPa, 1 min) of germinating pea seeds with 2-3 cm roots with Agrobacteria carrying the binary vectors, expression of the gene for Green Fluorescent Protein as marker and the gene for the human acidic fibroblast growth factor (aFGF) was obtained in 80% of the infiltrated developing seedlings. Maximal production of the recombinant proteins was achieved 12-15 days after infiltration. Conclusions Compared to the leaf injection method, vacuum infiltration of germinated seeds is highly efficient, allowing large scale production of plants transiently expressing recombinant proteins. The production cycle of plants for harvesting the recombinant protein was shortened from 30 days for leaf injection to 15 days by applying vacuum infiltration. The synthesized aFGF was purified by heparin-affinity chromatography and its mitogenic activity on NIH 3T3 cells was confirmed to be similar to that of a commercial product. PMID:21548923
NASA Astrophysics Data System (ADS)
Liew, Oi Wah; Asundi, Anand K.; Chen, Jun-Wei; Chew, Yiwen; Yu, Shangjuan; Yeo, Gare H.
2001-05-01
In this paper, fiber optic spectroscopy is developed to detect and quantify recombinant green (EGFP) and red (DsRED) fluorescent proteins in vitro and in vivo. The bacterial expression vectors carrying the coding regions of EGFP and DsRED were introduced into Escherichia coli host cells and fluorescent proteins were produced following induction with IPTG. Soluble EGFP and DsRED proteins were isolated from lysed bacterial cells and serially diluted for quantitative analysis by fiber optic spectroscopy. Fluorescence at the appropriate emission wavelengths could be detected up to 64X dilution for EGFP and 40X dilution for DsRED. To determine the capability of spectroscopy detection in vivo, transgenic potato hairy roots expressing EGFP and DsRED were regenerated. This was achieved by cloning the EGFP and DsRED genes into the plant binary vector, pTMV35S, to create the recombinant vectors pGLOWGreen and pGLOWRed. These latter binary vectors were introduced into Agrobacterium rhizogenes strain A4T. Infection of potato cells with transformed agrobacteria was used to insert the fluorescent protein genes into the potato genome. Genetically modified potato cells were then regenerated into hairy roots. A panel of transformed hairy roots expressing varying levels of fluorescent proteins was selected by fluorescence microscopy. We are now assessing the capability of spectroscopic detection system for in vivo quantification of green and red fluorescence levels in transformed roots.
Effects of Nickel, Chlorpyrifos and Their Mixture on the Dictyostelium discoideum Proteome
Boatti, Lara; Robotti, Elisa; Marengo, Emilio; Viarengo, Aldo; Marsano, Francesco
2012-01-01
Mixtures of chemicals can have additive, synergistic or antagonistic interactions. We investigated the effects of exposure to nickel and the organophosphate insecticide chlorpyrifos at effect concentrations (EC) of 25% and 50%, and to their binary mixture (EC25 + EC25), on Dictyostelium discoideum amoebae based on lysosomal membrane stability (LMS). We treated D. discoideum with these compounds under controlled laboratory conditions and evaluated the changes in protein levels using a two-dimensional gel electrophoresis (2DE) proteomic approach. Nickel treatment at EC25 induced changes in 14 protein spots, 12 of which were down-regulated. Treatment with nickel at EC50 resulted in changes in 15 spots, 10 of which were down-regulated. Treatment with chlorpyrifos at EC25 induced changes in six spots, all of which were down-regulated; treatment with chlorpyrifos at EC50 induced changes in 13 spots, five of which were down-regulated. The mixture corresponding to EC25 of each compound induced changes in 19 spots, 13 of which were down-regulated. The data together reveal that a different protein expression signature exists for each treatment, and that only a few proteins are modulated in multiple different treatments. For a simple binary mixture, the proteomic response does not allow for the identification of each toxicant. The protein spots that showed significant differences were identified by mass spectrometry, which revealed modulations of proteins involved in metal detoxification, stress adaptation, the oxidative stress response and other cellular processes. PMID:23443088
A parallelized binary search tree
USDA-ARS's Scientific Manuscript database
PTTRNFNDR is an unsupervised statistical learning algorithm that detects patterns in DNA sequences, protein sequences, or any natural language texts that can be decomposed into letters of a finite alphabet. PTTRNFNDR performs complex mathematical computations and its processing time increases when i...
Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery
Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S.; Pusey, Marc L.; Aygün, Ramazan S.
2015-01-01
In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset. PMID:25914518
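The self-training wrapper described above (train on labeled data, label the unlabeled pool, keep only high-confidence predictions, retrain) can be sketched as follows. A one-dimensional nearest-centroid classifier stands in for the NB and SMO base classifiers used in the paper. The confidence formula, the 0.8 threshold, the class names and the data are all assumptions for illustration, and the sketch assumes at least two classes.

```python
def class_centroids(values, labels):
    """Mean feature value per class (a stand-in base classifier)."""
    sums, counts = {}, {}
    for x, y in zip(values, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_confidence(x, centroids):
    """Nearest centroid plus an ad hoc confidence in [0.5, 1.0]."""
    ranked = sorted((abs(x - c), y) for y, c in centroids.items())
    (d_best, label), (d_next, _) = ranked[0], ranked[1]
    total = d_best + d_next
    confidence = 1.0 if total == 0 else d_next / total
    return label, confidence

def self_train(lab_x, lab_y, unlab_x, threshold=0.8, max_rounds=10):
    """Iteratively move confidently predicted samples into the training set."""
    lab_x, lab_y, pool = list(lab_x), list(lab_y), list(unlab_x)
    for _ in range(max_rounds):
        centroids = class_centroids(lab_x, lab_y)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(x, centroids)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        for x, label in confident:
            lab_x.append(x)
            lab_y.append(label)
        pool = remaining
    return class_centroids(lab_x, lab_y), pool

# Hypothetical one-feature data: two labeled images, three unlabeled ones.
final_centroids, still_unlabeled = self_train(
    [0.0, 10.0], ["clear", "crystal"], [1.0, 9.0, 5.0])
```

The ambiguous sample at 5.0 never reaches the confidence threshold and stays unlabeled, which is the intended behavior of the selection step.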
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Chien-Hsiu, E-mail: leech@naoj.org
Eclipsing binaries offer a unique opportunity to determine basic stellar properties. With the advent of wide-field cameras and all-sky time-domain surveys, thousands of eclipsing binaries have been charted via light curve classification, yet their fundamental properties remain unexplored mainly due to the extensive efforts needed for spectroscopic follow-ups. In this paper, we present the discovery of a short-period (P = 0.313 day), double-lined M-dwarf eclipsing binary, CSSJ114804.3+255132/SDSSJ114804.35+255132.6, by cross-matching binary light curves from the Catalina Sky Survey and spectroscopically classified M dwarfs from the Sloan Digital Sky Survey. We obtain follow-up spectra using the Gemini telescope, enabling us to determine the mass, radius, and temperature of the primary and secondary component to be M_1 = 0.47 ± 0.03 (statistic) ± 0.03 (systematic) M_⊙, M_2 = 0.46 ± 0.03 (statistic) ± 0.03 (systematic) M_⊙, R_1 = 0.52 ± 0.08 (statistic) ± 0.07 (systematic) R_⊙, R_2 = 0.60 ± 0.08 (statistic) ± 0.08 (systematic) R_⊙, T_1 = 3560 ± 100 K, and T_2 = 3040 ± 100 K, respectively. The systematic error was estimated using the difference between eccentric and non-eccentric fits. Our analysis also indicates that there is definitively third-light contamination (66%) in the CSS photometry. The secondary star seems inflated, probably due to tidal locking of the close secondary companion, which is common for very short-period binary systems. Future spectroscopic observations with high resolution will narrow down the uncertainties of stellar parameters for both components, rendering this system a benchmark for studying fundamental properties of M dwarfs.
NASA Astrophysics Data System (ADS)
Fang, Xiao; Thompson, Todd A.; Hirata, Christopher M.
2018-05-01
We investigate the long-term secular dynamics and Lidov-Kozai (LK) eccentricity oscillations of quadruple systems composed of two binaries at quadrupole and octupole orders in the perturbing Hamiltonian. We show that the fraction of systems reaching high eccentricities is enhanced relative to triple systems, over a broader range of parameter space. We show that this fraction grows with time, unlike triple systems evolved at quadrupole order. This is fundamentally because with their additional degrees of freedom, quadruple systems do not have a maximal set of commuting constants of the motion, even in secular theory at quadrupole order. We discuss these results in the context of star-star and white dwarf-white dwarf (WD) binaries, with emphasis on WD-WD mergers and collisions relevant to the Type Ia supernova problem. For star-star systems, we find that more than 30 per cent of systems reach high eccentricity within a Hubble time, potentially forming triple systems via stellar mergers or close binaries. For WD-WD systems, taking into account general relativistic and tidal precession and dissipation, we show that the merger rate is enhanced in quadruple systems relative to triple systems by a factor of 3.5-10, and that the long-term evolution of quadruple systems leads to a delay-time distribution ~1/t for mergers and collisions. In gravitational wave-driven mergers of compact objects, we classify the mergers by their evolutionary patterns in phase space and identify a regime in about 8 per cent of orbital shrinking mergers, where eccentricity oscillations occur on the general relativistic precession time-scale, rather than the much longer LK time-scale. Finally, we generalize previous treatments of oscillations in the inner binary eccentricity (evection) to eccentric mutual orbits. We assess the merger rate in quadruple and triple systems and the implications for their viability as progenitors of stellar mergers and Type Ia supernovae.
Geomorphic Flood Area (GFA): a DEM-based tool for flood susceptibility mapping at large scales
NASA Astrophysics Data System (ADS)
Manfreda, S.; Samela, C.; Albano, R.; Sole, A.
2017-12-01
Flood hazard and risk mapping over large areas is a critical issue. Many researchers have recently attempted global-scale mapping but have encountered several difficulties, above all the lack of data and implementation costs. In data scarce environments, a preliminary and cost-effective floodplain delineation can be performed using geomorphic methods (e.g., Manfreda et al., 2014). We carried out several years of research on this topic, proposing a morphologic descriptor named Geomorphic Flood Index (GFI) (Samela et al., 2017) and developing a Digital Elevation Model (DEM)-based procedure able to identify flood susceptible areas. The procedure exhibited high accuracy in several test sites in Europe, United States and Africa (Manfreda et al., 2015; Samela et al., 2016, 2017) and has been recently implemented in a QGIS plugin named Geomorphic Flood Area (GFA) tool. The tool automatically computes the GFI and turns it into a linear binary classifier capable of detecting flood-prone areas. To train this classifier, an inundation map derived using hydraulic models for a small portion of the basin is required (the minimum is 2% of the river basin's area). In this way, the GFA tool extends the classification of the flood-prone areas across the entire basin. We are also defining a simplified procedure for the estimation of the river depth, which may be helpful for large-scale analyses to approximately evaluate the expected flood damages in the surrounding areas. References: Manfreda, S., Nardi, F., Samela, C., Grimaldi, S., Taramasso, A. C., Roth, G., & Sole, A. (2014). Investigation on the use of geomorphic approaches for the delineation of flood prone areas. J. Hydrol., 517, 863-876. Manfreda, S., Samela, C., Gioia, A., Consoli, G., Iacobellis, V., Giuzio, L., & Sole, A. (2016). Flood-prone areas assessment using linear binary classifiers based on flood maps obtained from 1D and 2D hydraulic models. Nat. Hazards, 79(2), 735-754.
Samela, C., Manfreda, S., Paola, F. D., Giugni, M., Sole, A., & Fiorentino, M. (2016). DEM-Based Approaches for the Delineation of Flood-Prone Areas in an Ungauged Basin in Africa. J. Hydrol. Eng., 06015010. Samela, C., Troy, T. J., & Manfreda, S. (2017a). Geomorphic classifiers for flood-prone areas delineation for data-scarce environments. Adv. Water Resour., 102, 13-28.
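Turning a scalar index into a linear binary classifier amounts to choosing one threshold from a small labeled training area and applying it everywhere else. The sketch below picks the threshold that minimizes the sum of the false positive and false negative rates. That criterion, the function names, and the index values are assumptions for illustration and are not taken from the GFA tool itself.

```python
def fit_threshold(index_values, flooded):
    """Pick the threshold tau minimizing (false positive rate + false negative rate).

    A cell is classified as flood-prone when its index value is >= tau.
    """
    n_pos = sum(1 for y in flooded if y)
    n_neg = len(flooded) - n_pos
    best_tau, best_err = None, float("inf")
    for tau in sorted(set(index_values)):
        fp = sum(1 for x, y in zip(index_values, flooded) if x >= tau and not y)
        fn = sum(1 for x, y in zip(index_values, flooded) if x < tau and y)
        err = fp / n_neg + fn / n_pos
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau, best_err

def classify(index_values, tau):
    """Apply the calibrated threshold to cells outside the training area."""
    return [x >= tau for x in index_values]

# Hypothetical index values for cells covered by a reference inundation map.
train_index = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5]
train_flooded = [False, False, False, True, True, True]
tau, error = fit_threshold(train_index, train_flooded)

# Extend the classification to hypothetical cells elsewhere in the basin.
basin_prediction = classify([-1.2, -0.4, 0.3], tau)
```

Because only one parameter is calibrated, a small training portion of the basin can be enough, which is consistent with the small-area requirement mentioned in the abstract.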
Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan
2003-12-01
The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among those known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. In bioinformatics applications, the role of appropriate features has not received adequate attention. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner as learning proceeds. At the beginning of the training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through the training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. In the next level we have another set of classifiers, which further classify the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically.
Using only half of the original features, selected by the gating neural network, gives test accuracy comparable to that obtained using all the original features. The gating mechanism also helps us to get a better insight into the folding process of proteins. For example, by tracking the evolution of different gates we can find which characteristics (features) of the data are more important for the folding process. And, of course, it also reduces the computation time.
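The gate idea described above can be made concrete with a minimal sketch in which each input feature is multiplied by a sigmoid gate. The sigmoid form, the initial value of -6, the 0.5 "open" level and the function names are assumptions for illustration; the abstract does not state the gate function the authors used, and the training of the gate parameters is not shown.

```python
import math

def sigmoid(x):
    """Logistic function mapping a gate parameter to an opening in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def apply_gates(features, gate_params):
    """Scale each input feature by how far its gate is open."""
    return [f * sigmoid(g) for f, g in zip(features, gate_params)]

def selected_features(gate_params, open_level=0.5):
    """Indices of features whose gate is more than open_level open."""
    return [i for i, g in enumerate(gate_params) if sigmoid(g) > open_level]

# At the start of training every gate is almost closed (assumed init of -6).
initial_gates = [-6.0, -6.0, -6.0]
initial_inputs = apply_gates([1.0, 2.0, 3.0], initial_gates)

# Hypothetical gate parameters after training: feature 0 stays closed,
# feature 1 is open, feature 2 is partially open.
trained_gates = [-4.0, 3.0, 0.2]
kept = selected_features(trained_gates)
```

During training the gate parameters would be updated by gradient descent together with the network weights, so that gates of useful features drift open while the others stay shut.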
Yang, Xin; Jia, Yigang; Hu, Yi; Xu, Qing; Xu, Xian
2016-01-01
Candida rugosa lipase (CRL) has been widely used as a biocatalyst for non-aqueous synthesis in biotechnological applications, but it often suffers significant loss of activity in organic solvents. Experimental results show that trehalose can actively counteract organic-solvent-induced protein denaturation, while the molecular mechanisms remain unclear. Herein, CRL was used as a model enzyme to explore the effects of trehalose on the retention of enzymatic activity upon incubation in N,N-dimethylformamide (DMF). Results showed that both the catalytic activity and the conformational changes of CRL induced by DMF were inhibited by trehalose in a dose-dependent fashion. The simulations further indicated that the CRL protein unfolded in binary DMF solution, but retained the native state in the ternary DMF/trehalose system. Trehalose as the second osmolyte added into binary DMF solution decreased DMF-CRL hydrogen bonds efficiently, whereas it increased the intermolecular hydrogen bonding between DMF and trehalose. Thus, the origin of its denaturing effects of DMF on protein is thought to be due to the preferential exclusion of trehalose as well as the intermolecular hydrogen bonding between trehalose and DMF. These findings suggest that trehalose protects the CRL protein from DMF-induced unfolding via both indirect and direct interactions. PMID:27031946
Yang, Xin; Jiang, Ling; Jia, Yigang; Hu, Yi; Xu, Qing; Xu, Xian; Huang, He
2016-01-01
Candida rugosa lipase (CRL) has been widely used as a biocatalyst for non-aqueous synthesis in biotechnological applications, but it often suffers significant loss of activity in organic solvents. Experimental results show that trehalose can actively counteract organic-solvent-induced protein denaturation, while the molecular mechanisms remain unclear. Herein, CRL was used as a model enzyme to explore the effects of trehalose on the retention of enzymatic activity upon incubation in N,N-dimethylformamide (DMF). Results showed that both the catalytic activity and the conformational changes of CRL induced by DMF were inhibited by trehalose in a dose-dependent fashion. The simulations further indicated that the CRL protein unfolded in binary DMF solution, but retained the native state in the ternary DMF/trehalose system. Trehalose as the second osmolyte added into binary DMF solution decreased DMF-CRL hydrogen bonds efficiently, whereas it increased the intermolecular hydrogen bonding between DMF and trehalose. Thus, the origin of its denaturing effects of DMF on protein is thought to be due to the preferential exclusion of trehalose as well as the intermolecular hydrogen bonding between trehalose and DMF. These findings suggest that trehalose protects the CRL protein from DMF-induced unfolding via both indirect and direct interactions.
NASA Astrophysics Data System (ADS)
Murphy, Brian W.; Darragh, Andrew; Hettinger, Paul; Hibshman, Adam; Johnson, Elliott W.; Liu, Z. J.; Pajkos, Michael A.; Stephenson, Hunter R.; Vondersaar, John R.; Conroy, Kyle E.; McCombs, Thayne A.; Reinhardt, Erik D.; Toddy, Joseph
2015-08-01
We present the results of an extensive study intended to search for and properly classify the variable stars in five galactic globular clusters. Each of the five clusters was observed hundreds to thousands of times over a time span ranging from 2 to 4 years using the SARA 0.6m located at Cerro Tololo Interamerican Observatory. The images were analyzed using the image subtract method of Alard (2000) to identify and produce light curves of all variables found in each cluster. In total we identified 373 variables with 140 of these being newly discovered increasing the number of known variables stars in these clusters by 60%. Of the total we have identified 312 RR Lyrae variables (187 RR0, 18 RR01, 99 RR1, 8 RR2), 9 SX Phe stars, 6 Cepheid variables, 11 eclipsing variables, and 35 long period variables. For IC4499 we identified 64 RR0, 18 RR01, 14 RR1, 4 RR2, 1 SX Phe, 1 eclipsing binary, and 2 long period variables. For NGC4833 we identified 10 RR0, 7 RR1, 2 RR2, 6 SX Phe, 5 eclipsing binaries, and 9 long period variables. For NGC6171 (M107) we identified 13 RR0, 7 RR1, and 1 SX Phe. For NGC6402 (M14) we identified 52 RR0, 56 RR1, 1 RR2, 1 SX Phe, 6 Cepheids, 1 eclipsing binary, and 15 long period variables. For NGC6584 we identified 48 RR0, 15 RR1, 1 RR2, 5 eclipsing binaries, and 9 long period variables. Using the RR Lyrae variables we found the mean V magnitude of the horizontal branch to be VHB = ⟨V ⟩RR = 17.63, 15.51, 15.72, 17.13, and 16.37 magnitudes for IC4499, NGC4833, NGC6171 (M107), NGC6402 (M14), and NGC6584, respectively. From our extensive data set we were able to obtain sufficient temporal and complete phase coverage of the RR Lyrae variables. This has allowed us not only to properly classify each of the RR Lyrae variables but also to use Fourier decomposition of the light curves to further analyze the properties of the variable stars and hence physical properties of each clusters. 
In this poster we will give the temperature, radius, stellar mass, metallicity, and helium abundance of the set of RR Lyrae variable stars found in each of the five globular clusters.
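The Fourier decomposition mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a phase-folded light curve sampled at evenly spaced phases over exactly one period, so that simple projection sums recover the harmonics. The function names are hypothetical, and the amplitude ratio R21 and phase difference phi21 are conventional Fourier parameters that the abstract does not name explicitly.

```python
import math


def fourier_coefficients(mags, n_harmonics):
    """Project evenly phased magnitudes onto sine/cosine harmonics.

    Assumes sample i sits at phase i/N (i = 0..N-1) over exactly one period,
    with N > 2 * n_harmonics, so the projection sums are exact.
    Returns (mean, [(A_k, phi_k), ...]) for the model
    m(phase) = mean + sum_k A_k * sin(2*pi*k*phase + phi_k).
    """
    n = len(mags)
    mean = sum(mags) / n
    terms = []
    for k in range(1, n_harmonics + 1):
        s = 2.0 / n * sum(m * math.sin(2 * math.pi * k * i / n) for i, m in enumerate(mags))
        c = 2.0 / n * sum(m * math.cos(2 * math.pi * k * i / n) for i, m in enumerate(mags))
        # s*sin(x) + c*cos(x) == A*sin(x + phi) with A = hypot(s, c), phi = atan2(c, s)
        terms.append((math.hypot(s, c), math.atan2(c, s)))
    return mean, terms


def r21_phi21(terms):
    """Amplitude ratio A2/A1 and phase difference phi2 - 2*phi1 (mod 2*pi)."""
    (a1, p1), (a2, p2) = terms[0], terms[1]
    return a2 / a1, (p2 - 2 * p1) % (2 * math.pi)
```

With unevenly sampled real data, a least-squares fit of the same model would replace the projection sums.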
Zhao, Nan; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry
2014-01-01
Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. 
The accurate and balanced performance of the SNP-IN tool makes it well suited to studying the rewiring of large-scale protein-protein interaction networks, and it can be useful for functional annotation of disease-associated SNPs. The SNP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. PMID:24784581
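The weighted f-measure reported above is commonly computed as the support-weighted mean of per-class F1 scores. The sketch below assumes that definition (the abstract does not spell it out); the function name and the example labels are illustrative only.

```python
from collections import Counter


def weighted_f_measure(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += n * f1  # weight each class by its number of true instances
    return total / len(y_true)
```

Weighting by support keeps a rare class (for example, beneficial mutations in a 3-class setting) from being ignored entirely while still reflecting the class distribution.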
Urasaki, Yasuyo; Fiscus, Ronald R; Le, Thuc T
2016-04-01
We describe an alternative approach to classifying fatty liver by profiling protein post-translational modifications (PTMs) with high-throughput capillary isoelectric focusing (cIEF) immunoassays. Four strains of mice were studied, with fatty livers induced by different causes, such as ageing, genetic mutation, acute drug usage, and high-fat diet. Nutrient-sensitive PTMs of a panel of 12 liver metabolic and signalling proteins were simultaneously evaluated with cIEF immunoassays, using nanograms of total cellular protein per assay. Changes to liver protein acetylation, phosphorylation, and O-N-acetylglucosamine glycosylation were quantified and compared between normal and diseased states. Fatty liver tissues could be distinguished from one another by distinctive protein PTM profiles. Fatty liver is currently classified by morphological assessment of lipid droplets, without identifying the underlying molecular causes. In contrast, high-throughput profiling of protein PTMs has the potential to provide molecular classification of fatty liver. Copyright © 2016 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Stability of halophilic proteins: from dipeptide attributes to discrimination classifier.
Zhang, Guangya; Huihua, Ge; Yi, Lin
2013-02-01
Investigating the molecular features responsible for protein halophilicity is of great significance for understanding the structural basis of protein halo-stability and would help to develop a practical strategy for designing halophilic proteins. In this work, we have systematically analyzed the dipeptide composition of halophilic and non-halophilic protein sequences. We observed that the halophilic proteins contained more DA, RA, AD, RR, AP, DD, PD, EA, VG and DV at the expense of LK, IL, II, IA, KK, IS, KA, GK, RK and AI. We identified some macromolecular signatures of halo-adaptation, and suggest that the dipeptide composition might contain more information than the amino acid composition. Based on the dipeptide composition, we have developed a machine learning method for classifying halophilic and non-halophilic proteins for the first time. The accuracy of our method was 100.0% on the training dataset and 93.1% in 10-fold cross-validation. We also discuss the influence of some specific dipeptides on prediction accuracy. Copyright © 2012 Elsevier B.V. All rights reserved.
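As a rough illustration of the dipeptide-composition features described above, the following sketch turns a protein sequence into a 400-dimensional frequency vector. The names are hypothetical, and skipping pairs that contain non-standard residues is an assumption rather than something the abstract specifies.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return the relative frequency of each of the 400 dipeptides."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # assumption: ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    return {dp: (c / total if total else 0.0) for dp, c in counts.items()}
```

The resulting vector can be fed to any standard classifier; dipeptides such as DA or DD would then appear as individual features.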
Zurawski, S M; Zurawski, G
1988-01-01
We have analyzed structure-function relationships of the protein hormone murine interleukin 2 by fine structural deletion mapping. A total of 130 deletion mutant proteins, together with some substitution and insertion mutant proteins, was expressed in Escherichia coli and analyzed for their ability to sustain the proliferation of a cloned murine T cell line. This analysis has permitted a functional map of the protein to be drawn and classifies five segments of the protein, which together contain 48% of the sequence, as unessential to the biological activity of the protein. A further 26% of the protein is classified as important, but not crucial, for the activity. Three regions, consisting of amino acids 32-35, 66-77 and 119-141 contain the remaining 26% of the protein and are critical to the biological activity of the protein. The functional map is discussed in the context of the possible role of the identified critical regions in the structure of the hormone and its binding to the interleukin 2 receptor complex. PMID:3261239
Multivariate detrending of fMRI signal drifts for real-time multiclass pattern classification.
Lee, Dongha; Jang, Changwon; Park, Hae-Jeong
2015-03-01
Signal drift in functional magnetic resonance imaging (fMRI) is an unavoidable artifact that limits classification performance in multi-voxel pattern analysis of fMRI. As conventional methods to reduce signal drift, global demeaning or proportional scaling disregards regional variations of drift, whereas voxel-wise univariate detrending is too sensitive to noisy fluctuations. To overcome these drawbacks, we propose a multivariate real-time detrending method for multiclass classification that involves spatial demeaning at each scan and the recursive detrending of drifts in the classifier outputs driven by a multiclass linear support vector machine. Experiments using binary and multiclass data showed that the linear trend estimation of the classifier output drift for each class (a weighted sum of drifts in the class-specific voxels) was more robust against voxel-wise artifacts that lead to inconsistent spatial patterns and the effect of online processing than voxel-wise detrending. The classification performance of the proposed method was significantly better, especially for multiclass data, than that of voxel-wise linear detrending, global demeaning, and classifier output detrending without demeaning. We concluded that the multivariate approach using classifier output detrending of fMRI signals with spatial demeaning preserves spatial patterns, is less sensitive than conventional methods to sample size, and increases classification performance, which is a useful feature for real-time fMRI classification. Copyright © 2014 Elsevier Inc. All rights reserved.
An optical study of X-ray sources in the old open clusters NGC 752 and NGC 6940
NASA Astrophysics Data System (ADS)
van den Berg, M.; Verbunt, F.
2001-08-01
We observed the optical counterparts of X-ray sources in the old open clusters NGC 752 and NGC 6940 to search for the origin of the X-rays. The photometric variability reported earlier for the blue straggler H 209 is not confirmed by our light curves, nor is an indication for variability seen in the spectra; thus its X-rays remain unexplained. The X-rays of VR 111 and VR 114 are likely not a result of magnetic activity as these stars lack strong Ca II H&K emission, while in VR 108 the level of activity could be enhanced. The short-period binary H 313 is a photometric variable; this supports the interpretation that it is a magnetically active binary. From the detection of the Li I 6707.8 Å line, we classify the giant in VR 84 as a first-ascent giant; this leaves its circular orbit unexplained. As a side-result we report the detection of Li I 6707.8 Å in the spectrum of the giant H 3 and the absence of this line in the spectrum of the giant H 11; this classifies H 3 as a first-ascent giant and H 11 as a core-helium-burning clump star, and confirms the faint extension of the red-giant clump in NGC 752. Based on observations made with the Jacobus Kapteyn Telescope and the William Herschel Telescope operated on the island of La Palma by the Isaac Newton Group in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofisica de Canarias.
Fusion of pixel and object-based features for weed mapping using unmanned aerial vehicle imagery
NASA Astrophysics Data System (ADS)
Gao, Junfeng; Liao, Wenzhi; Nuyttens, David; Lootens, Peter; Vangeyte, Jürgen; Pižurica, Aleksandra; He, Yong; Pieters, Jan G.
2018-05-01
The developments in the use of unmanned aerial vehicles (UAVs) and advanced imaging sensors provide new opportunities for ultra-high resolution (e.g., less than a 10 cm ground sampling distance (GSD)) crop field monitoring and mapping in precision agriculture applications. In this study, we developed a strategy for inter- and intra-row weed detection in early season maize fields from aerial visual imagery. More specifically, the Hough transform algorithm (HT) was applied to the orthomosaicked images for inter-row weed detection. A semi-automatic Object-Based Image Analysis (OBIA) procedure was developed with Random Forests (RF) combined with feature selection techniques to classify soil, weeds and maize. Furthermore, the two binary weed masks generated from HT and OBIA were fused to produce an accurate binary weed image. The developed RF classifier was evaluated by 5-fold cross validation, and it obtained an overall accuracy of 0.945 and a Kappa value of 0.912. Finally, the relationship between detected weeds and their ground truth densities was quantified by a fitted linear model with a coefficient of determination of 0.895 and a root mean square error of 0.026. In addition, the importance of input features was evaluated, and it was found that the ratio of vegetation length and width was the most significant feature for the classification model. Overall, our approach can yield a satisfactory weed map, and we expect that the accurate and timely weed map obtained from UAV imagery will be applicable to site-specific weed management (SSWM) in early season crop fields, reducing the spraying of non-selective herbicides and the associated costs.
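Overall accuracy and the Kappa value quoted above are both derived from a confusion matrix. The sketch below uses the standard Cohen's kappa formula; the function name and the example matrix are illustrative and are not data from the study.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0
    return observed, kappa
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the raw accuracy for the same classifier.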
Recognizing human activities using appearance metric feature and kinematics feature
NASA Astrophysics Data System (ADS)
Qian, Huimin; Zhou, Jun; Lu, Xinbiao; Wu, Xinye
2017-05-01
The problem of automatically recognizing human activities from videos through the fusion of the two most important cues, appearance metric feature and kinematics feature, is considered. A system of two-dimensional (2-D) Poisson equations is introduced to extract a more discriminative appearance metric feature. Specifically, the moving human blobs are first detected in the video by a background subtraction technique to form a binary image sequence, from which the appearance feature designated as the motion accumulation image and the kinematics feature termed as centroid instantaneous velocity are extracted. Second, 2-D discrete Poisson equations are employed to reinterpret the motion accumulation image to produce a more differentiated Poisson silhouette image, from which the appearance feature vector is created through the dimension reduction technique called bidirectional 2-D principal component analysis, considering the balance between classification accuracy and time consumption. Finally, a cascaded classifier based on the nearest neighbor classifier and two directed acyclic graph support vector machine classifiers, integrated with the fusion of the appearance feature vector and centroid instantaneous velocity vector, is applied to recognize the human activities. Experimental results on the open databases and a homemade one confirm the recognition performance of the proposed algorithm.
Building Change Detection from Bi-Temporal Dense-Matching Point Clouds and Aerial Images.
Pang, Shiyan; Hu, Xiangyun; Cai, Zhongliang; Gong, Jinqi; Zhang, Mi
2018-03-24
In this work, a novel building change detection method from bi-temporal dense-matching point clouds and aerial images is proposed to address two major problems, namely, the robust acquisition of the changed objects above ground and the automatic classification of changed objects into buildings or non-buildings. For the acquisition of changed objects above ground, the change detection problem is converted into a binary classification, in which the changed area above ground is regarded as the foreground and the other area as the background. For the gridded points of each period, the graph cuts algorithm is adopted to classify the points into foreground and background, followed by the region-growing algorithm to form candidate changed building objects. A novel structural feature that was extracted from aerial images is constructed to classify the candidate changed building objects into buildings and non-buildings. The changed building objects are further classified as "newly built", "taller", "demolished", and "lower" by combining the classification and the digital surface models of two periods. Finally, three typical areas from a large dataset are used to validate the proposed method. Numerous experiments demonstrate the effectiveness of the proposed algorithm.
A Novel Signal Modeling Approach for Classification of Seizure and Seizure-Free EEG Signals.
Gupta, Anubha; Singh, Pushpendra; Karlekar, Mandar
2018-05-01
This paper presents a new signal-modeling-based methodology for automatic seizure detection in EEG signals. The proposed method consists of three stages. First, a multirate filterbank structure is proposed that is constructed using the basis vectors of the discrete cosine transform. The proposed filterbank decomposes EEG signals into their respective brain rhythms: delta, theta, alpha, beta, and gamma. Second, these brain rhythms are statistically modeled with the class of self-similar Gaussian random processes, namely, fractional Brownian motion and fractional Gaussian noises. The statistics of these processes are modeled using a single parameter called the Hurst exponent. In the last stage, the value of the Hurst exponent and autoregressive moving average parameters are used as features to design a binary support vector machine classifier to classify pre-ictal, inter-ictal (epileptic with seizure free interval), and ictal (seizure) EEG segments. The performance of the classifier is assessed via extensive analysis on two widely used data sets and is observed to provide good accuracy on both. Thus, this paper proposes a novel signal model for EEG data that captures the attributes of these signals well and hence helps boost the classification accuracy of seizure and seizure-free epochs.
Joint Sparse Recovery With Semisupervised MUSIC
NASA Astrophysics Data System (ADS)
Wen, Zaidao; Hou, Biao; Jiao, Licheng
2017-05-01
Discrete multiple signal classification (MUSIC), with its low computational cost and mild condition requirements, has become a significant noniterative algorithm for joint sparse recovery (JSR). However, it fails on rank-defective problems caused by coherent or too few multiple measurement vectors (MMVs). In this letter, we provide a novel perspective on this problem by interpreting JSR as a binary classification problem with respect to atoms. From this viewpoint, MUSIC essentially constructs a supervised classifier based on the labeled MMVs, so its performance depends heavily on the quality and quantity of these training samples. We therefore develop a semisupervised MUSIC (SS-MUSIC) in the spirit of machine learning, in which the insufficient supervised information in the training samples can be compensated by the unlabeled atoms. Instead of constructing a classifier in a fully supervised manner, we iteratively refine a semisupervised classifier by exploiting the labeled MMVs and some reliable unlabeled atoms simultaneously. In this way, the required conditions are greatly relaxed and the number of iterations is greatly reduced. Numerical experimental results demonstrate that SS-MUSIC can achieve much better recovery performance than other extended MUSIC algorithms as well as some typical greedy algorithms for JSR in terms of iterations and recovery probability.
A Dictionary Approach to Electron Backscatter Diffraction Indexing.
Chen, Yu H; Park, Se Un; Wei, Dennis; Newstadt, Greg; Jackson, Michael A; Simmons, Jeff P; De Graef, Marc; Hero, Alfred O
2015-06-01
We propose a framework for indexing of grain and subgrain structures in electron backscatter diffraction patterns of polycrystalline materials. We discretize the domain of a dynamical forward model onto a dense grid of orientations, producing a dictionary of patterns. For each measured pattern, we identify the most similar patterns in the dictionary, and identify boundaries, detect anomalies, and index crystal orientations. The statistical distribution of these closest matches is used in an unsupervised binary decision tree (DT) classifier to identify grain boundaries and anomalous regions. The DT classifies a pattern as an anomaly if it has an abnormally low similarity to any pattern in the dictionary. It classifies a pixel as being near a grain boundary if the highly ranked patterns in the dictionary differ significantly over the pixel's neighborhood. Indexing is accomplished by computing the mean orientation of the closest matches to each pattern. The mean orientation is estimated using a maximum likelihood approach that models the orientation distribution as a mixture of Von Mises-Fisher distributions over the quaternionic three sphere. The proposed dictionary matching approach permits segmentation, anomaly detection, and indexing to be performed in a unified manner with the additional benefit of uncertainty quantification.
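The dictionary-matching step described above can be sketched as a ranked similarity search. This is only a sketch under stated assumptions: the abstract does not say which similarity measure is used, so the normalized inner product below is an assumption, and the function names and threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Normalized inner product of two flattened patterns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(pattern, dictionary, k):
    """Return the k most similar dictionary entries as (similarity, index) pairs."""
    scored = [(cosine_similarity(pattern, entry), idx) for idx, entry in enumerate(dictionary)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def is_anomaly(pattern, dictionary, min_similarity):
    """Flag a pattern whose best match is still below a (hypothetical) threshold."""
    best_score, _ = top_matches(pattern, dictionary, 1)[0]
    return best_score < min_similarity
```

The indices returned by `top_matches` would map back to dictionary orientations, whose mean gives the indexed orientation for that pattern.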
Exhaustive comparison and classification of ligand-binding surfaces in proteins
Murakami, Yoichi; Kinoshita, Kengo; Kinjo, Akira R; Nakamura, Haruki
2013-01-01
Many proteins function by interacting with other small molecules (ligands). Identification of ligand-binding sites (LBS) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed, in order to understand their relationships and to classify their structural motifs. However, a similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, due to computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand-binding protein sequences and functions. Consequently, we classified the patches into ∼2000 well-characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross-fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes. PMID:23934772
Preferential solvatation of human serum albumin in dimethylsulfoxide-H2O binary solution
NASA Astrophysics Data System (ADS)
Grigoryan, K. R.
2009-12-01
The preferential solvation of human serum albumin (HSA) in aqueous dimethylsulfoxide (DMSO) solutions was studied using densitometry. It has been shown that at low DMSO concentrations HSA undergoes preferential hydration, whereas at higher DMSO concentrations preferential binding of DMSO molecules to the protein occurs. It has been estimated that DMSO exhibits a stabilizing/destabilizing effect on HSA structure, which is explained in terms of hydration/solvation of the protein, on the one hand, and enhancement/disruption of the medium structure around the protein molecule, on the other hand.
Chen, Peng; Li, Jinyan
2010-05-17
Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein folding, and therefore advancing the annotation of protein functions. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average sequence profile of residue pairs belonging to the same contact class or non-contact class. GaCs train on multiple but different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC. Classifiers that use sequence profile centers may advance long-range contact prediction. In line with this approach, key structural features in proteins could be determined with high efficiency and accuracy.
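Two ingredients described above, averaging profiles into class centers and balancing the negative set by random sampling, can be sketched as follows. The nearest-center rule at the end is a deliberately simple stand-in and is not the genetic algorithm classifier of the paper; all function names are hypothetical.

```python
import random


def profile_center(profiles):
    """Element-wise average of equally long profile vectors (one class center)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def balanced_negatives(negatives, n_positive, seed=0):
    """Randomly undersample negatives to roughly the size of the positive set."""
    rng = random.Random(seed)
    return rng.sample(negatives, min(n_positive, len(negatives)))


def nearest_center(profile, centers):
    """Stand-in classifier: label of the closest center (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centers, key=lambda label: sq_dist(profile, centers[label]))
```

Drawing a different random negative subset for each classifier (by varying `seed`) is one way to obtain the multiple, differing training pairs that an ensemble needs.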
Specificity and non-specificity in RNA–protein interactions
Jankowsky, Eckhard; Harris, Michael E.
2016-01-01
Gene expression is regulated by complex networks of interactions between RNAs and proteins. Proteins that interact with RNA have been traditionally viewed as either specific or non-specific; specific proteins interact preferentially with defined RNA sequence or structure motifs, whereas non-specific proteins interact with RNA sites devoid of such characteristics. Recent studies indicate that the binary “specific vs. non-specific” classification is insufficient to describe the full spectrum of RNA–protein interactions. Here, we review new methods that enable quantitative measurements of protein binding to large numbers of RNA variants, and the concepts aimed at describing the resulting binding spectra: affinity distributions, comprehensive binding models and free energy landscapes. We discuss how these new methodologies and associated concepts enable work towards inclusive, quantitative models for specific and non-specific RNA–protein interactions. PMID:26285679
NASA Astrophysics Data System (ADS)
Sarma, Rahul; Paul, Sandip
2013-07-01
The ability of the osmolyte, trimethylamine-N-oxide (TMAO), to protect proteins from deleterious effect of urea, another commonly available osmolyte, is well-established. However, the molecular mechanism of this counteraction is not understood yet. To provide a molecular level understanding of how TMAO protects proteins in highly concentrated urea solution, we report here molecular dynamics simulation results of a 15-residue model peptide in two different conformations: helix and extended. For both conformations, simulations are carried out in pure water as well as in binary and ternary aqueous solutions of urea and TMAO. Analysis of solvation characteristics reveals direct interactions of urea and TMAO with peptide residues. However, the number of TMAO molecules that enter in the first solvation shell of the peptide is significantly lower than that of urea, and, unlike water and urea, TMAO shows its inability to form hydrogen bond with backbone oxygen and negatively charged sidechains. Preferential accumulation of urea near the peptide surface and preferential exclusion of TMAO from the peptide surface are observed. Inclusion of osmolytes in the peptide solvation shell leads to dehydration of the peptide in binary and ternary solutions of urea and TMAO. Solvation of peptide residues are investigated more closely by calculating the number of hydrogen bonds between the peptide and solution species. It is found that number of hydrogen bonds formed by the peptide with solution species increases in binary urea solution (relative to pure water) and this relative enhancement in hydrogen bond number reduces upon addition of TMAO. Our simulation results also suggest that, in the ternary solution, the peptide solvation layer is better mixed in terms of water and urea as compared to binary urea solution. Implications of the results for counteraction mechanism of TMAO are discussed.
2013-01-01
Background Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day. The predicted proteomes of these genomes need annotation at a faster pace. One such annotation need is an automated system that can accurately distinguish between plastid and non-plastid proteins, and further classify plastid-types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning. Results In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing the plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features (amino acid composition, dipeptide composition, pseudo amino acid composition, Nterminal-Center-Cterminal composition and protein physicochemical properties) are used to develop SVM models. Overall, the dipeptide composition-based module shows the best performance with an accuracy of 86.80% and Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with an MCC of 0.44 in phase-II. On independent test data, this model also performs better with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively.
The similarity-based PSI-BLAST module shows very low performance with about 50% prediction accuracy for distinguishing plastid vs. non-plastids and only 20% in classifying various plastid-types, indicating the need and importance of machine learning algorithms. Conclusion The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred available at http://bioinfo.okstate.edu/PLpred/ for real time identification/characterization. We believe this tool will be very useful in the functional annotation of various genomes. PMID:24266945
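For the binary phase-I task (plastid vs. non-plastid), the Matthews Correlation Coefficient reported above follows directly from the four confusion-matrix counts. The sketch below uses the standard binary MCC formula; the function name and the example counts are illustrative and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews Correlation Coefficient; 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, MCC stays informative when the two classes are unbalanced, which is one reason it is often reported alongside accuracy.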
Infrared moving small target detection based on saliency extraction and image sparse representation
NASA Astrophysics Data System (ADS)
Zhang, Xiaomin; Ren, Kan; Gao, Jin; Li, Chaowei; Gu, Guohua; Wan, Minjie
2016-10-01
Moving small target detection in infrared images is a crucial technique for infrared search and tracking systems. This paper presents a novel small target detection technique based on frequency-domain saliency extraction and image sparse representation. First, we exploit the features of the Fourier spectrum image and the magnitude spectrum of the Fourier transform to make a rough extraction of salient regions, and use threshold segmentation to separate the regions that look salient from the background, which yields a binary image. Second, a new patch-image model and an over-complete dictionary are introduced into the detection system, so that infrared small target detection is converted into an optimization problem of patch-image information reconstruction based on sparse representation. More specifically, the test image and the binary image are decomposed into image patches following certain rules. We select the potential target area according to the binary patch-image, which contains the salient region information, and then exploit the over-complete infrared small target dictionary to reconstruct the test image blocks that may contain targets. The coefficients of a target image patch are sparse. Finally, for image sequences, the Euclidean distance is used to reduce the false alarm ratio and increase the detection accuracy of moving small targets in infrared images, exploiting the correlation of target position between frames.
High-resolution spectroscopic observations of the new CEMP-s star CD -50°776
NASA Astrophysics Data System (ADS)
Roriz, M.; Pereira, C. B.; Drake, N. A.; Roig, F.; Silva, J. V. Sales
2017-11-01
Carbon enhanced metal-poor (CEMP) stars are a particular class of low-metallicity halo stars whose chemical analysis may provide important constraints on the chemical evolution of the Galaxy and on the models of mass transfer and evolution of components in binary systems. Here, we present a detailed analysis of the CEMP star CD -50°776, using high resolution optical spectroscopy. We found that CD -50°776 has a metallicity [Fe/H] = -2.31 and a carbon abundance [C/Fe] = +1.21. Analysing the s-process elements and the europium abundances, we show that this star is actually a CEMP-s star, based on the criteria set in the literature to classify these chemically peculiar objects. We also show that CD -50°776 is a lead star, since it has a ratio [Pb/Ce] = +0.97. In addition, we show that CD -50°776 exhibits radial velocity variations that may be attributed to orbital motion in a binary system. The abundance pattern of CD -50°776 is discussed and compared to other CEMP-s stars already reported in the literature to show that this star is a quite exceptional object among the CEMP stars, particularly due to its low nitrogen abundance. Explaining this pattern may require improving the nucleosynthesis models and the evolutionary models of mass transfer and binary interaction.
An Ocular Protein Triad Can Classify Four Complex Retinal Diseases
NASA Astrophysics Data System (ADS)
Kuiper, J. J. W.; Beretta, L.; Nierkens, S.; van Leeuwen, R.; Ten Dam-van Loon, N. H.; Ossewaarde-van Norel, J.; Bartels, M. C.; de Groot-Mijnes, J. D. F.; Schellekens, P.; de Boer, J. H.; Radstake, T. R. D. J.
2017-01-01
Retinal diseases are generally vision-threatening conditions that warrant appropriate clinical decision-making, which currently depends solely upon extensive clinical screening by specialized ophthalmologists. In an era in which molecular assessment has improved dramatically, we aimed to identify biomarkers in 175 ocular fluids to classify four archetypical ocular conditions affecting the retina (age-related macular degeneration, idiopathic non-infectious uveitis, primary vitreoretinal lymphoma, and rhegmatogenous retinal detachment) with one single test. Unsupervised clustering of ocular proteins revealed a classification strikingly similar to the clinical phenotypes of each disease group studied. We developed and independently validated a parsimonious model based on only three proteins, interleukin (IL)-10, IL-21, and angiotensin converting enzyme (ACE), that could correctly classify patients with an overall accuracy, sensitivity and specificity of 86.7%, 79.4% and 92.5%, respectively. Here, we provide proof-of-concept for molecular profiling as a diagnostic aid for ophthalmologists in the care of patients with retinal conditions.
Tucker, George; Loh, Po-Ru; Berger, Bonnie
2013-10-04
Comprehensive protein-protein interaction (PPI) maps are a powerful resource for uncovering the molecular basis of genetic interactions and providing mechanistic insights. Over the past decade, high-throughput experimental techniques have been developed to generate PPI maps at proteome scale, first using yeast two-hybrid approaches and more recently via affinity purification combined with mass spectrometry (AP-MS). Unfortunately, data from both protocols are prone to both high false positive and false negative rates. To address these issues, many methods have been developed to post-process raw PPI data. However, with few exceptions, these methods only analyze binary experimental data (in which each potential interaction tested is deemed either observed or unobserved), neglecting quantitative information available from AP-MS such as spectral counts. We propose a novel method for incorporating quantitative information from AP-MS data into existing PPI inference methods that analyze binary interaction data. Our approach introduces a probabilistic framework that models the statistical noise inherent in observations of co-purifications. Using a sampling-based approach, we model the uncertainty of interactions with low spectral counts by generating an ensemble of possible alternative experimental outcomes. We then apply the existing method of choice to each alternative outcome and aggregate results over the ensemble. We validate our approach on three recent AP-MS data sets and demonstrate performance comparable to or better than state-of-the-art methods. Additionally, we provide an in-depth discussion comparing the theoretical bases of existing approaches and identify common aspects that may be key to their performance. Our sampling framework extends the existing body of work on PPI analysis using binary interaction data to apply to the richer quantitative data now commonly available through AP-MS assays. 
This framework is quite general, and many enhancements are likely possible. Fruitful future directions may include investigating more sophisticated schemes for converting spectral counts to probabilities and applying the framework to direct protein complex prediction methods.
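The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the count-to-probability mapping `count_to_prob`, its `scale` parameter, the toy `identity_method`, and the bait/prey names are all hypothetical stand-ins chosen for illustration.

```python
import math
import random

def count_to_prob(count, scale=2.0):
    """Hypothetical mapping from a spectral count to an observation probability."""
    return 1.0 - math.exp(-count / scale)

def sample_outcome(counts, rng):
    """Draw one alternative binary experiment: each bait-prey pair is marked
    observed with the probability implied by its spectral count."""
    return {pair: rng.random() < count_to_prob(c) for pair, c in counts.items()}

def ensemble_scores(counts, binary_method, n_samples=200, seed=0):
    """Apply a binary-data method to each sampled outcome and average the scores."""
    rng = random.Random(seed)
    totals = {pair: 0.0 for pair in counts}
    for _ in range(n_samples):
        scores = binary_method(sample_outcome(counts, rng))
        for pair in totals:
            totals[pair] += scores[pair]
    return {pair: total / n_samples for pair, total in totals.items()}

def identity_method(outcome):
    """Toy stand-in for an existing method: score 1.0 if observed, else 0.0."""
    return {pair: 1.0 if seen else 0.0 for pair, seen in outcome.items()}

# Hypothetical spectral counts for three bait-prey pairs.
counts = {("baitA", "prey1"): 12, ("baitA", "prey2"): 1, ("baitB", "prey3"): 0}
scores = ensemble_scores(counts, identity_method)
```

In this sketch a pair with a high count is observed in almost every sampled outcome, while a pair with a count of 1 appears in only a fraction of them, so its aggregated score reflects that uncertainty.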
A radio-pulsing white dwarf binary star.
Marsh, T R; Gänsicke, B T; Hümmerich, S; Hambsch, F-J; Bernhard, K; Lloyd, C; Breedt, E; Stanway, E R; Steeghs, D T; Parsons, S G; Toloza, O; Schreiber, M R; Jonker, P G; van Roestel, J; Kupfer, T; Pala, A F; Dhillon, V S; Hardy, L K; Littlefair, S P; Aungwerojwit, A; Arjyotha, S; Koester, D; Bochinski, J J; Haswell, C A; Frank, P; Wheatley, P J
2016-09-15
White dwarfs are compact stars, similar in size to Earth but approximately 200,000 times more massive. Isolated white dwarfs emit most of their power from ultraviolet to near-infrared wavelengths, but when in close orbits with less dense stars, white dwarfs can strip material from their companions and the resulting mass transfer can generate atomic line and X-ray emission, as well as near- and mid-infrared radiation if the white dwarf is magnetic. However, even in binaries, white dwarfs are rarely detected at far-infrared or radio frequencies. Here we report the discovery of a white dwarf/cool star binary that emits from X-ray to radio wavelengths. The star, AR Scorpii (henceforth AR Sco), was classified in the early 1970s as a δ-Scuti star, a common variety of periodic variable star. Our observations reveal instead a 3.56-hour period close binary, pulsing in brightness on a period of 1.97 minutes. The pulses are so intense that AR Sco's optical flux can increase by a factor of four within 30 seconds, and they are also detectable at radio frequencies. They reflect the spin of a magnetic white dwarf, which we find to be slowing down on a 10^7-year timescale. The spin-down power is an order of magnitude larger than that seen in electromagnetic radiation, which, together with an absence of obvious signs of accretion, suggests that AR Sco is primarily spin-powered. Although the pulsations are driven by the white dwarf's spin, they mainly originate from the cool star. AR Sco's broadband spectrum is characteristic of synchrotron radiation, requiring relativistic electrons. These must either originate from near the white dwarf or be generated in situ at the M star through direct interaction with the white dwarf's magnetosphere.
Visaya, Maria Vivien; Sherwell, David; Sartorius, Benn; Cromieres, Fabien
2015-01-01
We analyse demographic longitudinal survey data of South African (SA) and Mozambican (MOZ) rural households from the Agincourt Health and Socio-Demographic Surveillance System in South Africa. In particular, we determine whether absolute poverty status (APS) is associated with selected household variables pertaining to socio-economic determination, namely household head age, household size, cumulative death, adults to minor ratio, and influx. For comparative purposes, households are classified according to household head nationality (SA or MOZ) and APS (rich or poor). The longitudinal data of each of the four subpopulations (SA rich, SA poor, MOZ rich, and MOZ poor) is a five-dimensional space defined by binary variables (questions), subjects, and time. We use the orbit method to represent binary multivariate longitudinal data (BMLD) of each household as a two-dimensional orbit and to visualise dynamics and behaviour of the population. At each time step, a point (x, y) from the orbit of a household corresponds to the observation of the household, where x is a binary sequence of responses and y is an ordering of variables. The ordering of variables is dynamically rearranged such that clusters and holes associated with least and frequently changing variables in the state space, respectively, are exposed. Analysis of orbits reveals information on change at both individual and population level, change patterns in the data, capacity of states in the state space, and density of state transitions in the orbits. Analysis of household orbits of the four subpopulations shows association between (i) households headed by older adults and rich households, (ii) large household size and poor households, and (iii) households with more minors than adults and poor households. Our results are compared to other methods of BMLD analysis. PMID:25919116
NASA Astrophysics Data System (ADS)
Mooley, K. P.; Wrobel, J. M.; Anderson, M. M.; Hallinan, G.
2018-01-01
Supermassive binary black holes (BBHs) on sub-parsec scales are prime targets for gravitational wave experiments. They also provide insights on close binary evolution and hierarchical structure formation. Sub-parsec BBHs cannot be spatially resolved but indirect methods can identify candidates. In 2015 Liu et al. reported an optical-continuum periodicity in the quasar PSO J334.2028+01.4075, with the estimated mass and rest-frame period suggesting an orbital separation of about 0.006 pc (0.7 μarcsec). The persistence of the quasar's optical periodicity has recently been disfavoured over an extended baseline. However, if a radio jet is launched from a sub-parsec BBH, the binary's properties can influence the radio structure on larger scales. Here, we use the Very Long Baseline Array (VLBA) and Karl G. Jansky Very Large Array (VLA) to study the parsec- and kiloparsec-scale emission energized by the quasar's putative BBH. We find two VLBA components separated by 3.6 mas (30 pc), tentatively identifying one as the VLBA 'core' from which the other was ejected. The VLBA components contribute to a point-like, time-variable VLA source that is straddled by lobes spanning 8 arcsec (66 kpc). We classify PSO J334.2028+01.4075 as a lobe-dominated quasar, albeit with an atypically large twist of 39° between its elongation position angles on parsec and kiloparsec scales. By analogy with 3C 207, a well-studied lobe-dominated quasar with a similarly rare twist, we speculate that PSO J334.2028+01.4075 could be ejecting jet components over an inner cone that traces a precessing jet in a BBH system.
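The angular and physical sizes quoted in this abstract can be cross-checked with a short calculation. The sketch below assumes only a single linear scale derived from the quoted lobe extent (8 arcsec for 66 kpc); it does not use a cosmological model, and the function names are illustrative.

```python
# Linear scale implied by the quoted lobe extent: 8 arcsec corresponds to 66 kpc.
# Derived from the abstract's own numbers, not from a cosmology calculation.
KPC_PER_ARCSEC = 66.0 / 8.0  # 8.25 kpc per arcsec

def arcsec_to_pc(angle_arcsec):
    """Projected size in parsecs for a given angle in arcseconds."""
    return angle_arcsec * KPC_PER_ARCSEC * 1000.0

def pc_to_microarcsec(size_pc):
    """Angle in microarcseconds subtended by a given projected size in parsecs."""
    return size_pc / (KPC_PER_ARCSEC * 1000.0) * 1e6

vlba_separation_pc = arcsec_to_pc(3.6e-3)          # 3.6 mas between VLBA components
binary_separation_uas = pc_to_microarcsec(0.006)   # 0.006 pc orbital separation
```

Under this assumption the 3.6 mas separation comes out close to 30 pc and the 0.006 pc orbit close to 0.7 microarcseconds, consistent with the values in the abstract.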
Abdollahpour, Nooshin; Soheili, Vahid; Saberi, Mohammad Reza; Chamani, Jamshidkhan
2016-12-01
Human serum albumin (HSA) is the most abundant protein in blood plasma. Albumin transports various compounds, preserves osmotic pressure, and buffers pH. A unique feature of albumin is its ability to bind drugs and other bioactive molecules. However, it is important to consider binary and ternary systems of two pharmaceuticals to estimate the effect of the first drug on the second one and on physicochemical properties. Different techniques including time-resolved, second-derivative and anisotropy fluorescence spectroscopy, resonance light scattering (RLS), critical induced aggregation concentration (C_CIAC), particle size, zeta potential and stability analysis were employed in this assessment to elucidate the binding behavior of Amlodipine and Aspirin to HSA. Moreover, isothermal titration calorimetric techniques were performed and the QSAR properties were applied to analyze the hydration energy and log P. Multiple sequence alignments were also used to predict the structure and biological characteristics of the HSA binding site. Time-resolved fluorescence spectroscopy showed that both drugs interact with HSA through a static quenching mechanism. Subsequently, second-derivative fluorescence spectroscopy presented different values of parameter H in binary and ternary systems, which suggested that tryptophan was in a more polar environment in the ternary system than in the binary system. Moreover, the polydispersity index and results from mean number measurements revealed that the presence of the second drug decreased the stability of the systems and increased the heterogeneity of the complex. It was also observed that the gradual addition of HSA led to a marked increase in the fluorescence anisotropy (r) of Amlodipine and Aspirin, which suggests that the drugs were located in a restricted environment of the protein, as confirmed by Red Edge Excitation Shift (REES) studies.
The isothermal titration calorimetric technique demonstrated that the interaction of the drugs with HSA was an enthalpically-driven process. The present experiment showed that the binding of Amlodipine and Aspirin to HSA induced a conformational change of HSA. It was also identified that the protein binding of the first drug could be affected by the second drug. Such results can be of great use for understanding the pharmacokinetic and pharmacodynamic mechanisms of drugs.
Kazemi, Fatemeh; Najafabadi, Tooraj Abbasian; Araabi, Babak Nadjar
2016-01-01
Acute myelogenous leukemia (AML) is a subtype of acute leukemia, which is characterized by the accumulation of myeloid blasts in the bone marrow. Careful microscopic examination of a stained blood smear or bone marrow aspirate is still the most significant diagnostic methodology for initial AML screening and is considered the first step toward diagnosis. It is time-consuming, and because of the elusive nature of the signs and symptoms of AML, pathologists may reach a wrong diagnosis. Therefore, the need for automation of leukemia detection has arisen. In this paper, an automatic technique for identification and detection of AML and its prevalent subtypes, i.e., M2-M5, is presented. At first, microscopic images are acquired from blood smears of patients with AML and normal cases. After image preprocessing, a color segmentation strategy is applied for segmenting white blood cells from other blood components, and then discriminative features, i.e., irregularity, nucleus-cytoplasm ratio, Hausdorff dimension, shape, color, and texture features, are extracted from the entire nucleus in the whole images containing multiple nuclei. Images are classified as cancerous or noncancerous by a binary support vector machine (SVM) classifier with a 10-fold cross-validation technique. Classifier performance is evaluated by three parameters, i.e., sensitivity, specificity, and accuracy. Cancerous images are also classified into their prevalent subtypes by a multi-SVM classifier. The results show that the proposed algorithm has achieved an acceptable performance for diagnosis of AML and its common subtypes. Therefore, it can be used as an assistant diagnostic tool for pathologists.
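The evaluation protocol named in this abstract (10-fold cross-validation scored by sensitivity, specificity, and accuracy) can be sketched in plain Python. This is only an illustration of the bookkeeping, with no SVM and no image features; the interleaved fold assignment and the confusion counts in the usage line are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k roughly equal, interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def sensitivity_specificity_accuracy(tp, fn, tn, fp):
    """Standard definitions from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

splits = list(k_fold_indices(25, k=10))
# Hypothetical counts: 8 true positives, 2 false negatives, 9 true negatives, 1 false positive.
metrics = sensitivity_specificity_accuracy(tp=8, fn=2, tn=9, fp=1)
```

Each sample lands in exactly one test fold, so every image is scored once by a classifier that never saw it during training.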
Classifying Radio Galaxies with the Convolutional Neural Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aniyan, A. K.; Thorat, K.
We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA) Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our classification is comparable to manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.
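One way a "fusion classifier" could combine binary classifications while flagging unusual sources is a majority vote over pairwise decisions. The sketch below is an assumption made for illustration; the abstract does not specify the fusion rule, and the function name and vote format are hypothetical.

```python
from collections import Counter

def fuse(votes):
    """Majority vote over pairwise binary decisions.

    `votes` holds one winning label per binary classifier, for example the
    outputs of FRI-vs-FRII, FRI-vs-Bent and FRII-vs-Bent models.
    Returns the label with at least two votes, or None when no label
    reaches a majority (a candidate for unusual morphology).
    """
    label, n_votes = Counter(votes).most_common(1)[0]
    return label if n_votes >= 2 else None

consensus = fuse(["FRI", "FRI", "FRII"])     # two classifiers agree on FRI
no_consensus = fuse(["FRI", "Bent", "FRII"]) # every classifier disagrees
```

With three pairwise classifiers, a source whose votes are all different ends up with no majority, which gives a simple hook for setting it aside for inspection.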
Network-based prediction and knowledge mining of disease genes
2015-01-01
Background In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. Methods We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Results Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. 
In addition, we showed that first- and second-order neighbors in the PPI network could be used to identify likely disease associations. Conclusions We analyzed the human protein interaction network and its relationship to disease and found that both the number of interactions with other proteins and the disease relationship of neighboring proteins helped to determine whether a protein had a relationship to disease. Our classifier predicted many proteins with no annotated disease association to be disease-related, which indicated that these proteins have network characteristics that are similar to disease-related proteins and may therefore have disease associations not previously identified. By performing a post-processing step after the prediction, we were able to identify evidence in literature supporting this possibility. This method could provide a useful filter for experimentalists searching for new candidate protein targets for drug repositioning and could also be extended to include other network and data types in order to refine these predictions. PMID:26043920
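Two of the network characteristics named in this abstract can be computed from an edge list in a few lines. This is a minimal sketch: the definition of the disease neighbor ratio used here (fraction of direct neighbors annotated as disease-associated) is an assumption, and the protein names and annotations are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_centrality(adj, node):
    """Fraction of the other nodes in the network that `node` interacts with."""
    return len(adj[node]) / (len(adj) - 1)

def disease_neighbor_ratio(adj, node, disease_set):
    """Assumed definition: share of direct neighbors that are disease-associated."""
    neighbors = adj[node]
    if not neighbors:
        return 0.0
    return sum(n in disease_set for n in neighbors) / len(neighbors)

# Hypothetical toy network and annotations.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
disease = {"P2", "P4"}
adj = build_adjacency(edges)
features = {p: (degree_centrality(adj, p), disease_neighbor_ratio(adj, p, disease))
            for p in adj}
```

Feature tuples of this kind are what a classifier such as the ADTree described above would take as input, one row per protein.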
Barth, Holger; Aktories, Klaus; Popoff, Michel R; Stiles, Bradley G
2004-09-01
Certain pathogenic species of Bacillus and Clostridium have developed unique methods for intoxicating cells that employ the classic enzymatic "A-B" paradigm for protein toxins. The binary toxins produced by B. anthracis, B. cereus, C. botulinum, C. difficile, C. perfringens, and C. spiroforme consist of components not physically associated in solution that are linked to various diseases in humans, animals, or insects. The "B" components are synthesized as precursors that are subsequently activated by serine-type proteases on the targeted cell surface and/or in solution. Following release of a 20-kDa N-terminal peptide, the activated "B" components form homoheptameric rings that subsequently dock with an "A" component(s) on the cell surface. By following an acidified endosomal route and translocation into the cytosol, "A" molecules disable a cell (and host organism) via disruption of the actin cytoskeleton, increasing intracellular levels of cyclic AMP, or inactivation of signaling pathways linked to mitogen-activated protein kinase kinases. Recently, B. anthracis has gleaned much notoriety as a biowarfare/bioterrorism agent, and of primary interest has been the edema and lethal toxins, their role in anthrax, as well as the development of efficacious vaccines and therapeutics targeting these virulence factors and ultimately B. anthracis. This review comprehensively surveys the literature and discusses the similarities, as well as distinct differences, between each Clostridium and Bacillus binary toxin in terms of their biochemistry, biology, genetics, structure, and applications in science and medicine. The information may foster future studies that aid novel vaccine and drug development, as well as a better understanding of a conserved intoxication process utilized by various gram-positive, spore-forming bacteria.
Barth, Holger; Aktories, Klaus; Popoff, Michel R.; Stiles, Bradley G.
2004-01-01
Certain pathogenic species of Bacillus and Clostridium have developed unique methods for intoxicating cells that employ the classic enzymatic “A-B” paradigm for protein toxins. The binary toxins produced by B. anthracis, B. cereus, C. botulinum, C. difficile, C. perfringens, and C. spiroforme consist of components not physically associated in solution that are linked to various diseases in humans, animals, or insects. The “B” components are synthesized as precursors that are subsequently activated by serine-type proteases on the targeted cell surface and/or in solution. Following release of a 20-kDa N-terminal peptide, the activated “B” components form homoheptameric rings that subsequently dock with an “A” component(s) on the cell surface. By following an acidified endosomal route and translocation into the cytosol, “A” molecules disable a cell (and host organism) via disruption of the actin cytoskeleton, increasing intracellular levels of cyclic AMP, or inactivation of signaling pathways linked to mitogen-activated protein kinase kinases. Recently, B. anthracis has gleaned much notoriety as a biowarfare/bioterrorism agent, and of primary interest has been the edema and lethal toxins, their role in anthrax, as well as the development of efficacious vaccines and therapeutics targeting these virulence factors and ultimately B. anthracis. This review comprehensively surveys the literature and discusses the similarities, as well as distinct differences, between each Clostridium and Bacillus binary toxin in terms of their biochemistry, biology, genetics, structure, and applications in science and medicine. The information may foster future studies that aid novel vaccine and drug development, as well as a better understanding of a conserved intoxication process utilized by various gram-positive, spore-forming bacteria. PMID:15353562
Cheng, Zhanzhan; Zhou, Shuigeng; Wang, Yang; Liu, Hui; Guan, Jihong; Chen, Yi-Ping Phoebe
2016-05-18
Prediction of compound-protein interactions (CPIs) aims to find new compound-protein pairs in which a protein is targeted by at least one compound, which is a crucial step in new drug design. Currently, a number of machine learning based methods have been developed to predict new CPIs in the literature. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine learning based approaches use unknown interactions (not validated CPIs) selected randomly as the negative examples to train classifiers for predicting new CPIs. This is not quite reasonable and unavoidably impacts the CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that learns from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the CPI/compound-protein pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, including random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), and three types of existing CPI prediction models.
Source code, datasets and related documents of PUCPI are available at: http://admis.fudan.edu.cn/projects/pucpi.html.
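The tensor-product feature construction described in this abstract can be shown on a toy example. The sketch below is not taken from the PUCPI source code; the binary vectors and the function name are hypothetical, and the biased-SVM training step is omitted.

```python
def tensor_product(compound_bits, protein_bits):
    """Flattened outer product: one feature per (substructure, domain) pair."""
    return [c * p for c in compound_bits for p in protein_bits]

compound = [1, 0, 1]  # hypothetical substructure indicators for one compound
protein = [0, 1]      # hypothetical domain indicators for one protein
pair_features = tensor_product(compound, protein)
```

A feature in the product is 1 only when the compound has the given substructure and the protein has the given domain, so the vector length is the product of the two input lengths.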
Multiscale Rotation-Invariant Convolutional Neural Networks for Lung Texture Classification.
Wang, Qiangchang; Zheng, Yuanjie; Yang, Gongping; Jin, Weidong; Chen, Xinjian; Yin, Yilong
2018-01-01
We propose a new multiscale rotation-invariant convolutional neural network (MRCNN) model for classifying various lung tissue types on high-resolution computed tomography. MRCNN employs a Gabor-local binary pattern that introduces a useful property in image analysis: invariance to image scales and rotations. In addition, we offer an approach to deal with the problems caused by the imbalanced number of samples between different classes in most existing works, accomplished by changing the overlapping size between adjacent patches. Experimental results on a public interstitial lung disease database show a superior performance of the proposed method compared with the state of the art.
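The rotation-invariance property mentioned here can be illustrated with a basic local binary pattern (LBP) on a 3x3 patch. This sketch covers only the plain LBP with a minimum-over-rotations mapping; it does not include the Gabor filtering or the CNN of the paper, and the patches are hypothetical.

```python
def lbp_code(patch):
    """8-bit local binary pattern of a 3x3 patch given as a list of 3 rows."""
    center = patch[1][1]
    # Neighbors in clockwise order, starting at the top-left pixel.
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum((1 if value >= center else 0) << i for i, value in enumerate(ring))

def rotation_invariant(code):
    """Smallest value among the 8 circular bit rotations of an 8-bit code."""
    return min(((code >> k) | (code << (8 - k))) & 0xFF for k in range(8))

bright_top = [[9, 9, 9],
              [1, 5, 1],
              [1, 1, 1]]
# The same pattern rotated by 90 degrees clockwise.
bright_right = [[1, 1, 9],
                [1, 5, 9],
                [1, 1, 9]]
code_top = lbp_code(bright_top)
code_right = lbp_code(bright_right)
```

The two raw codes differ because the bright edge sits on a different side, but both map to the same rotation-invariant value.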
He 2-104 - A symbiotic proto-planetary nebula?
NASA Technical Reports Server (NTRS)
Schwarz, Hugo E.; Aspin, Colin; Lutz, Julie H.
1989-01-01
CCD observations are presented for He 2-104, an object previously classified as both PN and symbiotic star, which show that this is in fact a protoplanetary nebula (PPN) with a dynamical age of about 800 yr. The presence of highly collimated jets, extending over 75 arcsec on the sky, combined with an energy distribution showing a hot as well as a cool component, indicates that He 2-104 is a binary PPN. Since the primary is probably a Mira with a 400-d period (as reported by Whitelock, 1988), it is proposed that the system is a symbiotic PPN.
NASA Astrophysics Data System (ADS)
Dorn-Wallenstein, Trevor Z.; Levesque, Emily
2017-11-01
Thanks to incredible advances in instrumentation, surveys like the Sloan Digital Sky Survey have been able to find and catalog billions of objects, ranging from local M dwarfs to distant quasars. Machine learning algorithms have greatly aided in the effort to classify these objects; however, there are regimes where these algorithms fail, where interesting oddities may be found. We present here an X-ray bright quasar misidentified as a red supergiant/X-ray binary, and a subsequent search of the SDSS quasar catalog for X-ray bright stars misidentified as quasars.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shetty, Nishant D.; Reddy, Manchi C.M.; Palaninathan, Satheesh K.
2010-10-11
PII constitutes a family of signal transduction proteins that act as nitrogen sensors in microorganisms and plants. Mycobacterium tuberculosis (Mtb) has a single homologue of PII whose precise role has as yet not been explored. We have solved the crystal structures of the Mtb PII protein in its apo and ATP bound forms to 1.4 and 2.4 Å resolutions, respectively. The protein forms a trimeric assembly in the crystal lattice and folds similarly to the other PII family proteins. The Mtb PII:ATP binary complex structure reveals three ATP molecules per trimer, each bound between the base of the T-loop of one subunit and the C-loop of the neighboring subunit. In contrast to the apo structure, at least one subunit of the binary complex structure contains a completely ordered T-loop indicating that ATP binding plays a role in orienting this loop region towards target proteins like the ammonium transporter, AmtB. Arg38 of the T-loop makes direct contact with the γ-phosphate of the ATP molecule replacing the Mg²⁺ position seen in the Methanococcus jannaschii GlnK1 structure. The C-loop of a neighboring subunit encloses the other side of the ATP molecule, placing the GlnK specific C-terminal 3₁₀ helix in the vicinity. Homology modeling studies with the E. coli GlnK:AmtB complex reveal that Mtb PII could form a complex similar to the complex in E. coli. The structural conservation and operon organization suggests that the Mtb PII gene encodes for a GlnK protein and might play a key role in the nitrogen regulatory pathway.
Martínez-Júlvez, Marta; Medina, Milagros; Velázquez-Campoy, Adrián
2009-01-01
Abstract The thermodynamics of the formation of binary and ternary complexes between Anabaena PCC 7119 FNR and its substrates, NADP+ and Fd, or Fld, has been studied by ITC. Despite structural dissimilarities, the main difference between Fd and Fld binding to FNR relates to hydrophobicity, reflected in different binding heat capacity and number of water molecules released from the interface. At pH 8, the formation of the binary complexes is both enthalpically and entropically driven, accompanied by the protonation of at least one ionizable group. His299 of FNR has been identified as the main residue responsible for the proton exchange observed. However, at pH 10, where no protonation occurs and intrinsic binding parameters can be obtained, the formation of the binary complexes is entropically driven, with negligible enthalpic contribution. Absence of the FMN cofactor in Fld does not significantly alter the strength of the interaction, but considerably modifies the enthalpic and entropic contributions, suggesting a different binding mode. Ternary complexes show negative cooperativity (6-fold and 11-fold reduction in binding affinity, respectively), and an increase in the enthalpic contribution (more favorable) and a decrease in the entropic contribution (less favorable), with regard to the energetics of the binary complexes. PMID:19527656
A Bioinformatics Classifier and Database for Heme-Copper Oxygen Reductases
Sousa, Filipa L.; Alves, Renato J.; Pereira-Leal, José B.; Teixeira, Miguel; Pereira, Manuela M.
2011-01-01
Background Heme-copper oxygen reductases (HCOs) are the last enzymatic complexes of most aerobic respiratory chains, reducing dioxygen to water and translocating up to four protons across the inner mitochondrial membrane (eukaryotes) or cytoplasmic membrane (prokaryotes). The number of completely sequenced genomes is expanding exponentially, and concomitantly, the number and taxonomic distribution of HCO sequences. These enzymes were initially classified into three different types, a classification that has recently been challenged. Methodology We reanalyzed the classification scheme and developed a new bioinformatics classifier for the HCO and nitric oxide reductases (NOR), which we benchmark against a manually derived gold standard sequence set. It is able to classify any given sequence of subunit I from HCO and NOR with a global recall and precision both of 99.8%. We use this tool to classify this protein family in 552 completely sequenced genomes. Conclusions We concluded that the new and broader data set supports three functional and evolutionary groups of HCOs. Homology between NORs and HCOs is shown, and the closest relationship of NORs with C-type HCOs is demonstrated. We established and made available a classification web tool and an integrated Heme-Copper Oxygen reductase and NOR protein database (www.evocell.org/hco). PMID:21559461
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.
Tuo, Youlin; An, Ning; Zhang, Ming
2018-03-01
The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. 
The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and an AUROC of 0.958. Cell cycle-associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
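The metrics quoted in this abstract (correct rate, sensitivity, specificity, positive and negative predictive value) all derive from a 2x2 confusion matrix. The sketch below is illustrative only: the abstract gives the class sizes (35 and 143) and the 94.38% accuracy, which corresponds to 168 of 178 samples called correctly, but it does not give the per-class error counts, so the split of the 10 errors used here is a hypothetical assumption.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return standard binary-classification metrics from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical split: 35 metastatic (30 found, 5 missed),
# 143 non-metastatic (138 correct, 5 false alarms). Not from the paper.
metrics = confusion_metrics(tp=30, fn=5, tn=138, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Any split with 10 total errors reproduces the reported accuracy; the other four metrics depend on how those errors fall between the two classes.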
Classification of protein quaternary structure by functional domain composition
Yu, Xiaojing; Wang, Chuan; Li, Yixue
2006-01-01
Background: The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop some computational methods to automatically classify the quaternary structure of proteins from their sequences. Results: To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11%. Conclusion: Compared with the amino acid composition method and Blast, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572
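The combination of a nearest neighbor classifier with a jackknife (leave-one-out) test can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy domain-composition vectors, the class labels, and the squared Euclidean distance are all assumptions, since the abstract does not specify the distance measure or the vector construction.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_label(query, train_vectors, train_labels):
    """Return the label of the training vector closest to the query."""
    best_index = min(
        range(len(train_vectors)),
        key=lambda i: squared_distance(query, train_vectors[i]),
    )
    return train_labels[best_index]


def jackknife_success_rate(vectors, labels):
    """Leave each sample out once, predict it from the rest, and score."""
    correct = 0
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(vectors[i], rest_vectors, rest_labels) == labels[i]:
            correct += 1
    return correct / len(vectors)


# Toy data: each position marks presence (1) or absence (0) of a domain.
toy_vectors = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
toy_labels = ["monomer", "monomer", "dimer", "dimer"]
rate = jackknife_success_rate(toy_vectors, toy_labels)
print(f"jackknife success rate: {rate:.2%}")
```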
[Study on high temperature oxidation of Ni-Cr ceramic alloys. Effects of Cr and Mo].
Mizutani, M
1990-03-01
The effects of Cr and Mo addition to Ni-Cr alloys on high temperature oxidation were investigated. The alloys were prepared with the composition of Cr ranging from 5 to 40 wt%. Also 2, 4 and 9 wt% of Mo was added to both Ni-5% Cr and Ni-20% Cr binary alloys. The alloys were heated at 800 degrees C, 900 degrees C and 1000 degrees C for 15 minutes in air, and the weight change after heat treatment was measured by electric automatic balance. The weight change during heating was measured by thermogravimetric measurement (TG). The products after heat treatment were characterized by X-ray diffraction and scanning electron microscopy (SEM). The results are summarized as follows: The Ni-Cr binary alloys were classified into three types of Cr ranging from 5 to 20 wt%, Cr 25% and Cr from 30 wt% to 40 wt% according to the weight gains with oxidation. In the case of the more than 25 wt% Cr content of the Ni-Cr binary alloys, the weight gain was extremely low and the heating temperature effects on the weight change were also small. X-ray diffraction study showed that NiO, NiCr2O4 and Cr2O3 formed on the surface of the Ni-Cr binary alloys whose composition of Cr ranged from 5 to 25 wt%, whereas NiO and NiCr2O4 rarely formed on the Ni-Cr binary alloys whose composition of Cr ranged from 30 to 40 wt%. This suggests that the formation of Cr2O3 prevents the formation of NiO on the alloy with a high Cr content. The weight gain of the Ni-Cr-Mo ternary alloys was smaller than that of the Ni-Cr binary alloys without Mo, and the temperature effects on the weight gain of the Ni-Cr-Mo ternary alloys were different for each Cr content. However, the effect of the amounts of Mo was small. NiO, NiCr2O4, Cr2O3 and MoO2 were identified by X-ray diffraction on the surface of the Ni-Cr-Mo ternary alloys. According to the SEM observation, it seems that NiO was formed at the outermost layer, both NiCr2O4 and Cr2O3 at the inside layer, and MoO2 at the innermost layer. 
The formation of both NiO and Cr2O3 on the Ni-Cr-Mo ternary alloys was restrained compared with that of the Ni-Cr binary alloys. However, the adhesion of oxides to the Ni-Cr-Mo ternary alloys was lower than that of the Ni-Cr binary alloys.
NASA Astrophysics Data System (ADS)
Schudlo, Larissa C.; Chau, Tom
2014-02-01
Objective. Near-infrared spectroscopy (NIRS) has recently gained attention as a modality for brain-computer interfaces (BCIs), which may serve as an alternative access pathway for individuals with severe motor impairments. For NIRS-BCIs to be used as a real communication pathway, reliable online operation must be achieved. Yet, only a limited number of studies have been conducted online to date. These few studies were carried out under a synchronous paradigm and did not accommodate an unconstrained resting state, precluding their practical clinical implication. Furthermore, the potentially discriminative power of spatiotemporal characteristics of activation has yet to be considered in an online NIRS system. Approach. In this study, we developed and evaluated an online system-paced NIRS-BCI which was driven by a mental arithmetic activation task and accommodated an unconstrained rest state. With a dual-wavelength, frequency domain near-infrared spectrometer, measurements were acquired over nine sites of the prefrontal cortex, while ten able-bodied participants selected letters from an on-screen scanning keyboard via intentionally controlled brain activity (using mental arithmetic). Participants were provided dynamic NIR topograms as continuous visual feedback of their brain activity as well as binary feedback of the BCI's decision (i.e. if the letter was selected or not). To classify the hemodynamic activity, temporal features extracted from the NIRS signals and spatiotemporal features extracted from the dynamic NIR topograms were used in a majority vote combination of multiple linear classifiers. Main results. An overall online classification accuracy of 77.4 ± 10.5% was achieved across all participants. The binary feedback was found to be very useful during BCI use, while not all participants found value in the continuous feedback provided. Significance. 
These results demonstrate that mental arithmetic is a potent mental task for driving an online system-paced NIRS-BCI. BCI feedback that reflects the classifier's decision has the potential to improve user performance. The proposed system can provide a framework for future online NIRS-BCI development and testing.
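A majority vote over several linear classifiers, as described in the Approach, can be sketched in a few lines. The weights, biases, and the two-element feature vector below are hypothetical; the study combined temporal and spatiotemporal features, whereas this sketch feeds one shared feature vector to every classifier for brevity.

```python
def linear_decision(weights, bias, features):
    """Return 1 if the linear score is positive, otherwise 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def majority_vote(classifiers, features):
    """Combine binary decisions; 1 wins only with a strict majority."""
    votes = [linear_decision(w, b, features) for w, b in classifiers]
    return 1 if 2 * sum(votes) > len(votes) else 0


# Hypothetical (weights, bias) pairs; an odd count avoids ties.
classifiers = [
    ([1.0, 0.0], -0.5),
    ([0.0, 1.0], -0.5),
    ([1.0, 1.0], -1.5),
]

rest_like = majority_vote(classifiers, [1.0, 0.0])   # only one classifier fires
task_like = majority_vote(classifiers, [1.0, 1.0])   # all three fire
print(rest_like, task_like)
```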
VSOP: the variable star one-shot project. I. Project presentation and first data release
NASA Astrophysics Data System (ADS)
Dall, T. H.; Foellmi, C.; Pritchard, J.; Lo Curto, G.; Allende Prieto, C.; Bruntt, H.; Amado, P. J.; Arentoft, T.; Baes, M.; Depagne, E.; Fernandez, M.; Ivanov, V.; Koesterke, L.; Monaco, L.; O'Brien, K.; Sarro, L. M.; Saviane, I.; Scharwächter, J.; Schmidtobreick, L.; Schütz, O.; Seifahrt, A.; Selman, F.; Stefanon, M.; Sterzik, M.
2007-08-01
Context: About 500 new variable stars enter the General Catalogue of Variable Stars (GCVS) every year. Most of them, however, lack spectroscopic observations, which remain critical for a correct assignment of the variability type and for the understanding of the object. Aims: The Variable Star One-shot Project (VSOP) is aimed at (1) providing the variability type and spectral type of all unstudied variable stars, (2) processing, publishing, and making the data available as automatically as possible, and (3) generating serendipitous discoveries. This first paper describes the project itself, the acquisition of the data, the dataflow, the spectroscopic analysis and the on-line availability of the fully calibrated and reduced data. We also present the results on the 221 stars observed during the first semester of the project. Methods: We used the high-resolution echelle spectrographs HARPS and FEROS in the ESO La Silla Observatory (Chile) to survey known variable stars. Once reduced by the dedicated pipelines, the radial velocities are determined from cross correlation with synthetic template spectra, and the spectral types are determined by an automatic minimum distance matching to synthetic spectra, with traditional manual spectral typing cross-checks. The variability types are determined by manually evaluating the available light curves and the spectroscopy. In the future, a new automatic classifier, currently being developed by members of the VSOP team, based on these spectroscopic data and on the photometric classifier developed for the COROT and Gaia space missions, will be used. Results: We confirm or revise spectral types of 221 variable stars from the GCVS. We identify 26 previously unknown multiple systems, among them several visual binaries with spectroscopic binary individual components. 
We present new individual results for the multiple systems V349 Vel and BC Gru, for the composite spectrum star V4385 Sgr, for the T Tauri star V1045 Sco, and for DM Boo which we re-classify as a BY Draconis variable. The complete data release can be accessed via the VSOP web site. Based on data obtained at the La Silla Observatory, European Southern Observatory, under program ID 077.D-0085.
Hensen, Ulf; Meyer, Tim; Haas, Jürgen; Rex, René; Vriend, Gert; Grubmüller, Helmut
2012-01-01
Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: 1. the dynasome consists of a continuum of proteins, rather than well separated classes. 2. For the majority of proteins we observe strong correlations between structure and dynamics. 3. Proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics. PMID:22606222
Method for protein structure alignment
Blankenbecler, Richard; Ohlsson, Mattias; Peterson, Carsten; Ringner, Markus
2005-02-22
This invention provides a method for protein structure alignment. More particularly, the present invention provides a method for identification, classification and prediction of protein structures. The present invention involves two key ingredients. First, an energy or cost function formulation of the problem simultaneously in terms of binary (Potts) assignment variables and real-valued atomic coordinates. Second, a minimization of the energy or cost function by an iterative method, where in each iteration (1) a mean field method is employed for the assignment variables and (2) exact rotation and/or translation of atomic coordinates is performed, weighted with the corresponding assignment variables.
Identification of two new HMXBs in the LMC: an ~2013 s pulsar and a probable SFXT
NASA Astrophysics Data System (ADS)
Vasilopoulos, G.; Maitra, C.; Haberl, F.; Hatzidimitriou, D.; Petropoulou, M.
2018-03-01
We report on the X-ray and optical properties of two high-mass X-ray binary systems located in the Large Magellanic Cloud (LMC). Based on the obtained optical spectra, we classify the massive companion as a supergiant star in both systems. Timing analysis of the X-ray events collected by XMM-Newton revealed the presence of coherent pulsations (spin period ~2013 s) for XMMU J053108.3-690923 and fast flaring behaviour for XMMU J053320.8-684122. The X-ray spectra of both systems can be modelled sufficiently well by an absorbed power law, yielding hard spectra and high intrinsic absorption from the environment of the systems. Due to their combined X-ray and optical properties, we classify both systems as SgXRBs: the 19th confirmed X-ray pulsar and a probable supergiant fast X-ray transient in the LMC, the second such candidate outside our Galaxy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brunton, Steven
Optical systems provide valuable information for evaluating interactions and associations between organisms and MHK energy converters and for capturing potentially rare encounters between marine organisms and MHK devices. The deluge of optical data from cabled monitoring packages makes expert review time-consuming and expensive. We propose algorithms and a processing framework to automatically extract events of interest from underwater video. The open-source software framework consists of background subtraction, filtering, feature extraction and hierarchical classification algorithms. This principle classification pipeline was validated on real-world data collected with an experimental underwater monitoring package. An event detection rate of 100% was achieved using robust principal components analysis (RPCA), Fourier feature extraction and a support vector machine (SVM) binary classifier. The detected events were then further classified into more complex classes: algae | invertebrate | vertebrate, one species | multiple species of fish, and interest rank. Greater than 80% accuracy was achieved using a combination of machine learning techniques.
Data Format Classification for Autonomous Software Defined Radios
NASA Technical Reports Server (NTRS)
Simon, Marvin; Divsalar, Dariush
2005-01-01
We present maximum-likelihood (ML) coherent and noncoherent classifiers for discriminating between NRZ and Manchester coded (biphase-L) data formats for binary phase-shift-keying (BPSK) modulation. Such classification of the data format is an essential element of so-called autonomous software defined radio (SDR) receivers (similar to so-called cognitive SDR receivers in the military application) where it is desired that the receiver perform each of its functions by extracting the appropriate knowledge from the received signal and, if possible, with as little information of the other signal parameters as possible. Small and large SNR approximations to the ML classifiers are also proposed that lead to simpler implementation with comparable performance in their respective SNR regions. Numerical performance results obtained by a combination of computer simulation and, wherever possible, theoretical analyses, are presented and comparisons are made among the various configurations based on the probability of misclassification as a performance criterion. Extensions to other modulations such as QPSK are readily accomplished using the same methods described in the paper.
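The distinction between the two formats can be illustrated with a simple half-bit product heuristic. This is not the ML classifier derived in the paper: it is a simplified sketch that assumes a noiseless or mildly noisy real baseband signal, known bit timing, and an even number of samples per bit. It relies on the fact that the two halves of an NRZ bit have the same sign, while the two halves of a Manchester (biphase-L) bit have opposite signs. The function names and the Manchester polarity convention are hypothetical.

```python
def classify_data_format(samples, samples_per_bit):
    """Guess 'NRZ' or 'Manchester' from the sign of summed half-bit products."""
    half = samples_per_bit // 2
    statistic = 0.0
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        first_half = sum(samples[start:start + half])
        second_half = sum(samples[start + half:start + samples_per_bit])
        # Same-sign halves add to the statistic; opposite-sign halves subtract.
        statistic += first_half * second_half
    return "NRZ" if statistic > 0 else "Manchester"


def modulate(bits, samples_per_bit, data_format):
    """Generate an ideal +/-1 baseband waveform for testing the heuristic."""
    half = samples_per_bit // 2
    waveform = []
    for bit in bits:
        amplitude = 1.0 if bit else -1.0
        if data_format == "NRZ":
            waveform += [amplitude] * samples_per_bit
        else:  # Manchester: mid-bit transition (polarity convention assumed)
            waveform += [amplitude] * half + [-amplitude] * (samples_per_bit - half)
    return waveform


bits = [1, 0, 1, 1, 0]
guess_nrz = classify_data_format(modulate(bits, 8, "NRZ"), 8)
guess_manchester = classify_data_format(modulate(bits, 8, "Manchester"), 8)
print(guess_nrz, guess_manchester)
```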
Extending cluster Lot Quality Assurance Sampling designs for surveillance programs
Hund, Lauren; Pagano, Marcello
2014-01-01
Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance based on the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible non-parametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. PMID:24633656
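The two ingredients named here, a binary decision rule and a sample size inflated for clustering, can be sketched as follows. This is not the non-parametric procedure developed in the paper. The decision rule uses a plain binomial model, and the inflation uses the standard survey-sampling design effect 1 + (m - 1) * rho, which is an assumption made for illustration; all function and parameter names are hypothetical.

```python
from math import ceil, comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lqas_error_rates(n, d, p_low, p_high):
    """Error rates for the rule: flag the area if more than d of n are cases.

    alpha: chance of flagging an area whose true prevalence is p_low.
    beta:  chance of not flagging an area whose true prevalence is p_high.
    """
    alpha = 1 - binomial_cdf(d, n, p_low)
    beta = binomial_cdf(d, n, p_high)
    return alpha, beta


def inflate_for_clustering(n_srs, cluster_size, icc):
    """Scale a simple-random-sample size by the design effect 1 + (m - 1) * icc."""
    design_effect = 1 + (cluster_size - 1) * icc
    return ceil(n_srs * design_effect)


alpha, beta = lqas_error_rates(n=19, d=3, p_low=0.10, p_high=0.30)
n_cluster = inflate_for_clustering(n_srs=100, cluster_size=11, icc=0.05)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, inflated n={n_cluster}")
```

With 11 units per cluster and an intracluster correlation of 0.05, the design effect is 1.5, so a simple-random-sample size of 100 becomes 150.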
NASA Astrophysics Data System (ADS)
Uzbaş, Betül; Arslan, Ahmet
2018-04-01
Gender recognition is an important step in human-computer interaction and identification processes. The human face image is one of the important sources for determining gender. In the present study, gender classification is performed automatically from facial images. In order to classify gender, we propose a combination of features extracted from the face, eye and lip regions by using a hybrid method of Local Binary Pattern and Gray-Level Co-Occurrence Matrix. The features have been extracted from automatically obtained face, eye and lip regions. All of the extracted features have been combined and given as input parameters to classification methods (Support Vector Machine, Artificial Neural Networks, Naive Bayes and k-Nearest Neighbor methods) for gender classification. The Nottingham Scan face database, which consists of the frontal face images of 100 people (50 male and 50 female), is used for this purpose. In the experimental studies, the highest success rate, 98%, was achieved by using the Support Vector Machine. The experimental results illustrate the efficacy of our proposed method.
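The Local Binary Pattern part of the feature extraction can be illustrated with the basic 3x3 operator. This is a minimal sketch of the textbook LBP, not the authors' hybrid LBP/GLCM method; the clockwise neighbour ordering and the bit weights are one common convention and are assumptions here, as is the use of a plain nested list for the grayscale image.

```python
def lbp_code(patch):
    """8-bit LBP code for the centre pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbours in clockwise order starting at the top-left corner.
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2], patch[1][2],
        patch[2][2], patch[2][1], patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:          # neighbour at least as bright as centre
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of an image."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(lbp_code(flat), lbp_code(peak))
```

The histogram of codes over a region (face, eye, or lip) is what would typically be passed on to a classifier as a texture feature vector.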
Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-01-01
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
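The classification step described above, assigning a new instance to its closest cluster in each collection, can be sketched as follows. The abstract does not say how the per-collection decisions are combined, so the simple majority vote here is an assumption, as are the toy binary sensor layout, the activity labels, and the data structure (each ensemble member is a pair of feature indices and labelled centroids).

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_activity(instance, ensemble):
    """Vote over collections; each votes with the label of its closest cluster."""
    votes = []
    for feature_indices, clusters in ensemble:
        projected = [instance[i] for i in feature_indices]
        _, label = min(clusters, key=lambda cl: squared_distance(projected, cl[0]))
        votes.append(label)
    return Counter(votes).most_common(1)[0][0]


# Hypothetical binary sensors: [door, kettle, tv].
# Each member: (feature subset, [(centroid on that subset, activity label), ...]).
ensemble = [
    ([0, 1], [([1, 1], "make_tea"), ([0, 0], "watch_tv")]),
    ([1, 2], [([1, 0], "make_tea"), ([0, 1], "watch_tv")]),
    ([0, 2], [([1, 0], "make_tea"), ([0, 1], "watch_tv")]),
]

first = predict_activity([1, 1, 0], ensemble)
second = predict_activity([0, 0, 1], ensemble)
print(first, second)
```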
Clustering-based ensemble learning for activity recognition in smart homes.
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-07-10
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
Extending cluster lot quality assurance sampling designs for surveillance programs.
Hund, Lauren; Pagano, Marcello
2014-07-20
Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance on the basis of the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible nonparametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. Copyright © 2014 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Ribó, M.; Negueruela, I.; Blay, P.; Torrejón, J. M.; Reig, P.
2006-04-01
Massive X-ray binaries are usually classified by the properties of the donor star in classical, supergiant and Be X-ray binaries, the main difference being the mass transfer mechanism between the two components. The massive X-ray binary 4U 2206+54 does not fit in any of these groups, and deserves a detailed study to understand how the transfer of matter and the accretion on to the compact object take place. To this end we study an IUE spectrum of the donor and obtain a wind terminal velocity (v_∞) of ~350 km s^-1, which is abnormally slow for its spectral type. We also analyse here more than 9 years of available RXTE/ASM data. We study the long-term X-ray variability of the source and find it to be similar to that observed in the wind-fed supergiant system Vela X-1, reinforcing the idea that 4U 2206+54 is also a wind-fed system. We find a quasi-period decreasing from ~270 to ~130 d, noticed in previous works but never studied in detail. We discuss possible scenarios for its origin and conclude that long-term quasi-periodic variations in the mass-loss rate of the primary are probably driving such variability in the measured X-ray flux. We obtain an improved orbital period of P_orb=9.5591±0.0007 d with maximum X-ray flux at MJD 51856.6±0.1. Our study of the orbital X-ray variability in the context of wind accretion suggests a moderate eccentricity around 0.15 for this binary system. Moreover, the low value of v_∞ solves the long-standing problem of the relatively high X-ray luminosity for the unevolved nature of the donor, BD +53°2790, which is probably an O9.5 V star. We note that changes in v_∞ and/or the mass-loss rate of the primary alone cannot explain the different patterns displayed by the orbital X-ray variability. We finally emphasize that 4U 2206+54, together with LS 5039, could be part of a new population of wind-fed HMXBs with main sequence donors, the natural progenitors of supergiant X-ray binaries.
Protein-Protein Interaction Assays with Effector-GFP Fusions in Nicotiana benthamiana.
Petre, Benjamin; Win, Joe; Menke, Frank L H; Kamoun, Sophien
2017-01-01
Plant parasites secrete proteins known as effectors into host tissues to manipulate host cell structures and functions. One of the major goals in effector biology is to determine the host cell compartments and the protein complexes in which effectors accumulate. Here, we describe a five-step pipeline that we routinely use in our lab to achieve this goal, which consists of (1) Golden Gate assembly of pathogen effector-green fluorescent protein (GFP) fusions into binary vectors, (2) Agrobacterium-mediated heterologous protein expression in Nicotiana benthamiana leaf cells, (3) laser-scanning confocal microscopy assay, (4) anti-GFP coimmunoprecipitation-liquid chromatography-tandem mass spectrometry (coIP/MS) assay, and (5) anti-GFP western blotting. This pipeline is suitable for rapid, cost-effective, and medium-throughput screening of pathogen effectors in planta.
Recognition of emotions using multimodal physiological signals and an ensemble deep learning model.
Yin, Zhong; Zhao, Mengyuan; Wang, Yongxiong; Yang, Jingdong; Zhang, Jianhua
2017-03-01
Using deep-learning methodologies to analyze multimodal physiological signals is becoming increasingly attractive for recognizing human emotions. However, conventional deep emotion classifiers may suffer from a lack of expertise for determining the model structure and from oversimplification in combining multimodal feature abstractions. In this study, a multiple-fusion-layer based ensemble classifier of stacked autoencoder (MESAE) is proposed for recognizing emotions, in which the deep structure is identified based on a physiological-data-driven approach. Each SAE consists of three hidden layers to filter the unwanted noise in the physiological features and derive stable feature representations. An additional deep model is used to achieve the SAE ensembles. The physiological features are split into several subsets according to different feature extraction approaches, with each subset separately encoded by an SAE. The derived SAE abstractions are combined according to the physiological modality to create six sets of encodings, which are then fed to a three-layer, adjacent-graph-based network for feature fusion. The fused features are used to recognize binary arousal or valence states. The DEAP multimodal database was employed to validate the performance of the MESAE. Compared with the best existing emotion classifier, the mean of classification rate and F-score improves by 5.26%. The superiority of the MESAE against the state-of-the-art shallow and deep emotion classifiers has been demonstrated under different sizes of the available physiological instances. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A classifier neural network for rotordynamic systems
NASA Astrophysics Data System (ADS)
Ganesan, R.; Jionghua, Jin; Sankar, T. S.
1995-07-01
A feedforward backpropagation neural network is formed to identify the stability characteristic of a high speed rotordynamic system. The principal focus resides in accounting for the instability due to the bearing clearance effects. The abnormal operating condition of 'normal-loose' Coulomb rub, which arises in units supported by hydrodynamic bearings or rolling element bearings, is analysed in detail. The multiple-parameter stability problem is formulated and converted to a set of three-parameter algebraic inequalities. These three parameters map the wider range of physical parameters of commonly-used rotordynamic systems into a narrow closed region, which is used in the supervised learning of the neural network. A binary-type state of the system is expressed through these inequalities, which are deduced from the analytical simulation of the rotor system. Both hidden-layer and functional-link networks are formed and the superiority of the functional-link network is established. Considering the real time interpretation and control of the rotordynamic system, the network reliability and the learning time are used as the evaluation criteria to assess the superiority of the functional-link network. This functional-link network is further trained using the parameter values of selected rotor systems, and the classifier network is formed. The success rate of stability status identification is obtained to assess the potential of this classifier network. It is shown that the classifier network can also be used, for control purposes, as an 'advisory' system that suggests the optimum way of parameter adjustment.
Improved semi-supervised online boosting for object tracking
NASA Astrophysics Data System (ADS)
Li, Yicui; Qi, Lin; Tan, Shukun
2016-10-01
An online semi-supervised boosting method, which treats object tracking as a classification problem, has the advantage of training a binary classifier from both labeled and unlabeled examples. Appropriate object features are selected based on real-time changes in the object. However, the online semi-supervised boosting method faces one key problem: traditional self-training, which uses the classification results to update the classifier itself, often leads to drifting or tracking failure because of the error accumulated during each update of the tracker. To overcome this disadvantage of tracking methods based on semi-supervised online boosting, this paper contributes an improved online semi-supervised boosting method in which the learning process is guided by positive (P) and negative (N) constraints, termed P-N constraints, which restrict the labeling of the unlabeled samples. First, we train the classifier by online semi-supervised boosting. Then, this classifier is used to process the next frame. Finally, the classification results are analyzed by the P-N constraints, which verify whether the labels assigned to unlabeled data by the classifier are in line with the assumptions made about positive and negative samples. The proposed algorithm can effectively improve the discriminative ability of the classifier and significantly alleviate the drifting problem in tracking applications. In the experiments, we demonstrate real-time tracking on several challenging test sequences, where our tracker outperforms other related on-line tracking methods and achieves promising tracking performance.
Quality grading of Atlantic salmon (Salmo salar) by computer vision.
Misimi, E; Erikson, U; Skavhaug, A
2008-06-01
In this study, we present a promising method of computer vision-based quality grading of whole Atlantic salmon (Salmo salar). Using computer vision, it was possible to differentiate among different quality grades of Atlantic salmon based on the external geometrical information contained in the fish images. Initially, before the image acquisition, the fish were subjectively graded and labeled into grading classes by a qualified human inspector in the processing plant. Prior to classification, the salmon images were segmented into binary images, and then feature extraction was performed on the geometrical parameters of the fish from the grading classes. The classification algorithm was a threshold-based classifier, which was designed using linear discriminant analysis. The performance of the classifier was tested by using the leave-one-out cross-validation method, and the classification results showed a good agreement between the classification done by human inspectors and by the computer vision. The computer vision-based method classified correctly 90% of the salmon from the data set as compared with the classification by human inspector. Overall, it was shown that computer vision can be used as a powerful tool to grade Atlantic salmon into quality grades in a fast and nondestructive manner by a relatively simple classifier algorithm. The low cost of implementation of today's advanced computer vision solutions makes this method feasible for industrial purposes in fish plants as it can replace manual labor, on which grading tasks still rely.
Weyhe, Martin; Eschen-Lippold, Lennart; Pecher, Pascal; Scheel, Dierk; Lee, Justin
2014-01-01
Out of the 34 members of the VQ-motif-containing protein (VQP) family, 10 are phosphorylated by the mitogen-activated protein kinases (MAPKs), MPK3 and MPK6. Most of these MPK3/6-targeted VQPs (MVQs) interacted with specific sub-groups of WRKY transcription factors in a VQ-motif-dependent manner. In some cases, the MAPK appears to phosphorylate either the MVQ or the WRKY, while in other cases, both proteins have been reported to act as MAPK substrates. We propose a network of dynamic interactions between members from the MAPK, MVQ and WRKY families - either as binary or as tripartite interactions. The compositions of the WRKY-MVQ transcriptional protein complexes may change - for instance, through MPK3/6-mediated modulation of protein stability - and therefore control defense gene transcription.
Using ensemble of classifiers for predicting HIV protease cleavage sites in proteins.
Nanni, Loris; Lumini, Alessandra
2009-03-01
The focus of this work is the use of ensembles of classifiers for predicting HIV protease cleavage sites in proteins. Because of the complex relationships in biological data, several recent works show that ensembles of learning algorithms often outperform stand-alone methods. We show that the fusion of approaches based on different encoding models can improve performance on this classification problem. In particular, four different feature encodings for peptides are described and tested. An extensive evaluation on a large dataset according to a blind testing protocol is reported, which demonstrates how different feature extraction methods and classifiers can be combined to obtain a robust and reliable system. The comparison with other stand-alone approaches quantifies the performance improvement obtained by the ensembles proposed in this work.
Idkaidek, Nasir M.
2013-01-01
The aim of this commentary is to investigate the interplay of Biopharmaceutics Classification System (BCS), Biopharmaceutics Drug Disposition Classification System (BDDCS) and Salivary Excretion Classification System (SECS). BCS first classified drugs based on permeability and solubility for the purpose of predicting oral drug absorption. Then BDDCS linked permeability with hepatic metabolism and classified drugs based on metabolism and solubility for the purpose of predicting oral drug disposition. On the other hand, SECS classified drugs based on permeability and protein binding for the purpose of predicting the salivary excretion of drugs. The role of metabolism, rather than permeability, on salivary excretion is investigated and the results are not in agreement with BDDCS. Conclusion: The proposed Salivary Excretion Classification System (SECS) can be used as a guide for drug salivary excretion based on permeability (not metabolism) and protein binding. PMID:24493977
Yin, Zhong; Zhang, Jianhua
2014-07-01
Identifying abnormal changes of mental workload (MWL) over time is crucial for preventing accidents due to cognitive overload and inattention of human operators in safety-critical human-machine systems. It is known that various neuroimaging technologies can be used to identify MWL variations. In order to classify MWL into a few discrete levels using representative MWL indicators and small-sized training samples, a novel EEG-based approach combining locally linear embedding (LLE), support vector clustering (SVC) and support vector data description (SVDD) techniques is proposed and evaluated using experimentally measured data. The MWL indicators from different cortical regions are first elicited using the LLE technique. Then, the SVC approach is used to find the clusters of these MWL indicators and thereby to detect MWL variations. It is shown that the clusters can be interpreted as binary-class MWL. Furthermore, a trained binary SVDD classifier is shown to be capable of detecting slight variations of those indicators. By combining the two schemes, an SVC-SVDD framework is proposed, in which the clear-cut (smaller) cluster is detected by SVC first and a subsequent SVDD model is then used to divide the overlapped (larger) cluster into two classes. Finally, three MWL levels (low, normal and high) can be identified automatically. The experimental data analysis results are compared with those of several existing methods. It has been demonstrated that the proposed framework can lead to acceptable computational accuracy and has the advantages of both unsupervised and supervised training strategies. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Yao, Dongren; Calhoun, Vince D; Fu, Zening; Du, Yuhui; Sui, Jing
2018-05-15
Discriminating Alzheimer's disease (AD) from its prodromal form, mild cognitive impairment (MCI), is a significant clinical problem whose solution may facilitate early diagnosis and intervention. A more challenging issue is to classify MCI subtypes, i.e., those who eventually convert to AD (cMCI) versus those who do not (MCI). To solve this difficult 4-way classification problem (AD, MCI, cMCI and healthy controls), a competition was hosted by Kaggle to invite the scientific community to apply their machine learning approaches to pre-processed sets of T1-weighted magnetic resonance imaging (MRI) data and the demographic information from the international Alzheimer's disease neuroimaging initiative (ADNI) database. This paper summarizes our competition results. We first proposed a hierarchical process that turns the 4-way classification into five binary classification problems. A new feature selection technique based on relative importance was also proposed, aiming to identify a more informative and concise subset from 426 sMRI morphometric and 3 demographic features, so that each binary classifier achieves its highest accuracy. As a result, about 2% of the original features were selected to build a new feature space, which achieves the final four-way classification with 54.38% accuracy on testing data through hierarchical grouping, higher than several alternative methods in comparison. More importantly, the selected discriminative features, such as hippocampal volume, parahippocampal surface area, and medial orbitofrontal thickness, as well as the MMSE score, are reasonable and consistent with those reported in AD/MCI deficits. In summary, the proposed method provides a new framework for multi-way classification using hierarchical grouping and precise feature selection. Copyright © 2018 Elsevier B.V. All rights reserved.
Discriminative Learning of Receptive Fields from Responses to Non-Gaussian Stimulus Ensembles
Meyer, Arne F.; Diepenbrock, Jan-Philipp; Happel, Max F. K.; Ohl, Frank W.; Anemüller, Jörn
2014-01-01
Analysis of sensory neurons' processing characteristics requires simultaneous measurement of presented stimuli and concurrent spike responses. The functional transformation from high-dimensional stimulus space to the binary space of spike and non-spike responses is commonly described with linear-nonlinear models, whose linear filter component describes the neuron's receptive field. From a machine learning perspective, this corresponds to the binary classification problem of discriminating spike-eliciting from non-spike-eliciting stimulus examples. The classification-based receptive field (CbRF) estimation method proposed here adapts a linear large-margin classifier to optimally predict experimental stimulus-response data and subsequently interprets learned classifier weights as the neuron's receptive field filter. Computational learning theory provides a theoretical framework for learning from data and guarantees optimality in the sense that the risk of erroneously assigning a spike-eliciting stimulus example to the non-spike class (and vice versa) is minimized. Efficacy of the CbRF method is validated with simulations and for auditory spectro-temporal receptive field (STRF) estimation from experimental recordings in the auditory midbrain of Mongolian gerbils. Acoustic stimulation is performed with frequency-modulated tone complexes that mimic properties of natural stimuli, specifically non-Gaussian amplitude distribution and higher-order correlations. Results demonstrate that the proposed approach successfully identifies correct underlying STRFs, even in cases where second-order methods based on the spike-triggered average (STA) do not. Applied to small data samples, the method is shown to converge on smaller amounts of experimental recordings and with lower estimation variance than the generalized linear model and recent information theoretic methods. 
Thus, CbRF estimation may prove useful for investigation of neuronal processes in response to natural stimuli and in settings where rapid adaptation is induced by experimental design. PMID:24699631
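The core idea, reading the weight vector of a linear large-margin classifier as a receptive field estimate, can be sketched on simulated data. The sketch below is not the authors' implementation: the toy neuron, its filter, the uniform (non-Gaussian) stimulus, and the training hyperparameters are all assumptions chosen for illustration, and the classifier is a plain hinge-loss model trained by stochastic subgradient descent.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def simulate_neuron(true_filter, n_samples, rng, threshold=0.2, noise_sd=0.1):
    """Toy linear-nonlinear neuron: spike (+1) if filtered stimulus plus noise exceeds a threshold."""
    data = []
    for _ in range(n_samples):
        stimulus = [rng.uniform(-1.0, 1.0) for _ in true_filter]  # uniform, hence non-Gaussian
        drive = dot(true_filter, stimulus) + rng.gauss(0.0, noise_sd)
        data.append((stimulus, 1 if drive > threshold else -1))
    return data


def train_large_margin(data, dim, rng, lr=0.05, reg=0.001, epochs=20):
    """Stochastic subgradient descent on the L2-regularized hinge loss (linear SVM objective)."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        order = data[:]  # shuffle a copy so the caller's list is untouched
        rng.shuffle(order)
        for x, y in order:
            violated = y * (dot(w, x) + b) < 1.0
            w = [wi * (1.0 - lr * reg) for wi in w]  # weight decay from the L2 term
            if violated:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


rng = random.Random(0)
true_filter = [0.2, 0.6, 1.0, -0.5, -0.2]  # hypothetical receptive field
data = simulate_neuron(true_filter, 500, rng)
weights, bias = train_large_margin(data, len(true_filter), rng)
similarity = cosine(weights, true_filter)  # direction agreement between estimate and true filter
```

Only the direction of `weights` is meaningful as a filter estimate, since the overall scale is absorbed by the spiking nonlinearity, which is why the comparison uses cosine similarity.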