Science.gov

Sample records for machine learning methods

  1. Machine learning methods in chemoinformatics

    PubMed Central

    Mitchell, John B O

    2014-01-01

    Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183 PMID:25285160
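
    As a rough illustration of the supervised setting described above (predicting a property of test molecules from labelled training molecules), the following sketch implements a k-Nearest Neighbors classifier. It assumes molecules are encoded as binary fingerprints, represented here as sets of "on" bit positions, and compared with Tanimoto similarity; the fingerprints and labels are hypothetical.

```python
# Minimal k-nearest-neighbour classifier for molecules encoded as binary
# fingerprints (sets of "on" bit positions). Assumption: Tanimoto similarity
# is used as the neighbour metric; fingerprints and labels are hypothetical.
from collections import Counter


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(train: list[tuple[set[int], str]], query: set[int], k: int = 3) -> str:
    """Majority label among the k training molecules most similar to `query`."""
    ranked = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set with known activity labels.
training = [
    ({1, 2, 3, 4}, "active"),
    ({1, 2, 3, 5}, "active"),
    ({1, 2, 4, 5}, "active"),
    ({7, 8, 9}, "inactive"),
    ({7, 8, 10}, "inactive"),
]
print(knn_predict(training, {1, 2, 3}))  # active
```

    Tanimoto similarity is a common choice for fingerprints; for real-valued descriptors a Euclidean distance would be the more usual neighbour metric.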

  2. Machine learning methods for predictive proteomics.

    PubMed

    Barla, Annalisa; Jurman, Giuseppe; Riccadonna, Samantha; Merler, Stefano; Chierici, Marco; Furlanello, Cesare

    2008-03-01

    The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data requires the same caution as microarray analysis. The risk of overfitting and of selection bias is pervasive: not only can potential features easily outnumber samples by a factor of 10^3, but information leakage during preprocessing from spectra to peaks is easy to overlook. The aim of this review is to explain how to build a general-purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet-based), classifiers (e.g. Support Vector Machine, SVM) or feature ranking methods (recursive feature elimination or I-Relief). A procedure for assessing the stability and predictive value of the resulting list of biomarkers is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies. PMID:18310105
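
    The central safeguard against selection bias, ranking features only on the training portion of each split, can be sketched as follows. This is not the DAP itself: a mean-difference ranking and a nearest-centroid classifier are assumed stand-ins for the ranking methods and classifiers named above, and the data are simulated noise.

```python
# Leakage-free evaluation sketch: features are ranked using ONLY the training
# fold, then the held-out fold is scored. Assumptions: mean-difference ranking
# and a nearest-centroid classifier stand in for RFE/I-Relief and SVM.
import random


def rank_features(X, y, n_keep):
    """Indices of the n_keep features with the largest absolute class-mean difference."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    diffs = []
    for j in range(len(X[0])):
        mean_pos = sum(x[j] for x in pos) / len(pos)
        mean_neg = sum(x[j] for x in neg) / len(neg)
        diffs.append((abs(mean_pos - mean_neg), j))
    return [j for _, j in sorted(diffs, reverse=True)[:n_keep]]


def nearest_centroid(X_train, y_train, x, feats):
    """Assign x to the class whose centroid (on the selected features) is closest."""
    dists = {}
    for label in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in feats]
        dists[label] = sum((x[j] - c) ** 2 for j, c in zip(feats, centroid))
    return min(dists, key=dists.get)


def cv_accuracy(X, y, n_folds=5, n_keep=10):
    """k-fold accuracy with feature ranking redone inside every training fold."""
    correct = 0
    for f in range(n_folds):
        test = set(range(f, len(X), n_folds))  # assumes rows are in random order
        train = [i for i in range(len(X)) if i not in test]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        feats = rank_features(X_tr, y_tr, n_keep)  # test rows never seen here
        correct += sum(nearest_centroid(X_tr, y_tr, X[i], feats) == y[i] for i in test)
    return correct / len(X)


# Pure-noise data with far more features than samples, as in MS profiling.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(40)]
y = [i % 2 for i in range(40)]
print(cv_accuracy(X, y))  # expected to lie near chance level for pure noise
```

    Ranking the features on all 40 samples before splitting would let the test rows influence which features are kept, which is the kind of leakage the protocol is designed to prevent.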

  3. Studying depression using imaging and machine learning methods

    PubMed Central

    Patel, Meenal J.; Khalaf, Alexander; Aizenstein, Howard J.

    2015-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies. PMID:26759786

  4. In silico machine learning methods in drug development.

    PubMed

    Dobchev, Dimitar A; Pillai, Girinath G; Karelson, Mati

    2014-01-01

    Machine learning (ML) computational methods for predicting compounds with pharmacological activity, specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are being increasingly applied in drug discovery and evaluation. Recently, machine learning techniques such as artificial neural networks, support vector machines and genetic programming have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic targets. These methods are particularly useful for screening compound libraries of diverse chemical structures, for handling "noisy" and high-dimensional data to complement QSAR methods, and, when the 3D structure of the receptor is unavailable, for complementing structure-based methods. A variety of studies have demonstrated the potential of machine-learning methods for predicting compounds as potential drug candidates. The present review is intended to give an overview of the strategies and current progress in using machine learning methods for drug design and the potential of the respective model development tools. We also review a number of applications of machine learning algorithms to common classes of diseases. PMID:25262800

  5. Risk prediction with machine learning and regression methods.

    PubMed

    Steyerberg, Ewout W; van der Ploeg, Tjeerd; Van Calster, Ben

    2014-07-01

    This is a discussion of issues in risk prediction based on the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler. PMID:24615859

  6. Machine Learning Methods for Articulatory Data

    ERIC Educational Resources Information Center

    Berry, Jeffrey James

    2012-01-01

    Humans make use of more than just the audio signal to perceive speech. Behavioral and neurological research has shown that a person's knowledge of how speech is produced influences what is perceived. With methods for collecting articulatory data becoming more ubiquitous, methods for extracting useful information are needed to make this data…

  7. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided that exploits any available prior knowledge about the system and overcomes constraints arising from the sparse structure of the problem. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that, within the proposed framework, machine learning algorithms can detect attacks with higher performance than attack detection algorithms that employ state vector estimation methods. PMID:25807571
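
    The online setting, in which measurements arrive one at a time and the classifier is updated as they arrive, can be illustrated with a perceptron. This is an assumed stand-in rather than one of the algorithms evaluated in the paper, and the two-dimensional measurement stream is hypothetical.

```python
# Minimal online learner: a perceptron updated one measurement vector at a
# time, labelling each as secure (-1) or attacked (+1). Assumption: the
# perceptron is illustrative only; the measurement stream is hypothetical.


class OnlinePerceptron:
    def __init__(self, n_features: int, lr: float = 1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: list[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else -1

    def update(self, x: list[float], label: int) -> bool:
        """Predict, then correct the weights if wrong. Returns True on a mistake."""
        if self.predict(x) != label:
            self.w = [wi + self.lr * label * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * label
            return True
        return False


# Hypothetical stream: attacked measurements carry a large injected offset.
stream = [([0.1, 0.0], -1), ([3.0, 2.5], 1), ([0.0, 0.2], -1), ([2.8, 3.1], 1)]
model = OnlinePerceptron(n_features=2)
for _ in range(5):  # a few passes over the toy stream
    for x, y in stream:
        model.update(x, y)
print(model.predict([3.0, 3.0]), model.predict([0.0, 0.0]))  # 1 -1
```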

  8. A survey of machine learning methods for secondary and supersecondary protein structure prediction.

    PubMed

    Ho, Hui Kian; Zhang, Lei; Ramamohanarao, Kotagiri; Martin, Shawn

    2013-01-01

    In this chapter we provide a survey of protein secondary and supersecondary structure prediction using methods from machine learning. Our focus is on machine learning methods applicable to β-hairpin and β-sheet prediction, but we also discuss methods for more general supersecondary structure prediction. We provide background on the secondary and supersecondary structures that we discuss, the features used to describe them, and the basic theory behind the machine learning methods used. We survey the machine learning methods available for secondary and supersecondary structure prediction and compare them where possible. PMID:22987348
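
    One common way to describe residues for per-residue structure classifiers is a sliding window of one-hot encoded amino acids centred on each residue. The sketch below assumes a window of 5 residues and zero padding at the sequence ends; the surveyed methods may use richer features, so this is only an illustration.

```python
# Sliding-window feature encoding for per-residue structure prediction.
# Assumptions: window of 5 residues, one-hot over the 20 standard amino
# acids, and all-zero vectors for positions beyond either sequence end.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue: str) -> list[int]:
    vec = [0] * len(AMINO_ACIDS)
    if residue in INDEX:
        vec[INDEX[residue]] = 1
    return vec


def window_features(seq: str, window: int = 5) -> list[list[int]]:
    """One feature vector per residue: concatenated one-hot codes of its window."""
    half = window // 2
    features = []
    for centre in range(len(seq)):
        row = []
        for pos in range(centre - half, centre + half + 1):
            row += one_hot(seq[pos]) if 0 <= pos < len(seq) else [0] * len(AMINO_ACIDS)
        features.append(row)
    return features


feats = window_features("MKTAYIAK")
print(len(feats), len(feats[0]))  # 8 100
```

    Each row can then be passed to any per-residue classifier (for example a neural network or SVM) labelled with the structure state of the central residue.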

  9. Introduction to machine learning.

    PubMed

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers to make successful predictions using past experience, has developed impressively in recent years, helped by the rapid increase in the storage capacity and processing power of computers. As in many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods. PMID:24272434
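
    Performance evaluation is often based on k-fold cross-validation, in which each sample is held out for testing exactly once. A minimal splitter, assuming a shuffled split with a fixed seed:

```python
# Minimal k-fold cross-validation splitter. Assumption: indices are shuffled
# once with a fixed seed, then dealt round-robin into k folds.
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_indices, test_indices); every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))  # 8 2 on each of the five lines
```

    The model is fitted on each training list and scored on the matching test list; averaging the k scores gives the cross-validated estimate.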

  10. A Review for Detecting Gene-Gene Interactions Using Machine Learning Methods in Genetic Epidemiology

    PubMed Central

    Koo, Ching Lee; Liew, Mei Jing; Mohamad, Mohd Saberi

    2013-01-01

    Recently, the greatest statistical and computational challenge in genetic epidemiology has been to identify and characterize genes that interact with other genes and with environmental factors to affect complex multifactorial diseases. These gene-gene interactions are also known as epistasis, a phenomenon that traditional statistical methods cannot resolve because of the high dimensionality of the data and the occurrence of multiple polymorphisms. Several machine learning methods have therefore been used to identify susceptibility genes in common, multifactorial diseases, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs). This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Lastly, the paper discusses the strengths and weaknesses of each machine learning method in detecting gene-gene interactions in complex human disease. PMID:24228248
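
    Why single-locus tests can miss epistasis can be seen in a toy two-SNP penetrance model. The model below is hypothetical: each SNP is coded as a binary carrier indicator, the four genotype combinations are assumed equally frequent, and risk depends only on whether the two indicators differ (an XOR pattern).

```python
# Toy pure-epistasis model. Assumptions: binary carrier coding per SNP,
# equal genotype frequencies, and hypothetical penetrance values.
from itertools import product

# P(disease | carrier status at SNP A, carrier status at SNP B)
penetrance = {(a, b): (0.8 if a != b else 0.2) for a, b in product((0, 1), repeat=2)}


def marginal_risk(snp: int, allele: int) -> float:
    """Average disease risk for one carrier status at one SNP, ignoring the other SNP."""
    cells = [p for g, p in penetrance.items() if g[snp] == allele]
    return sum(cells) / len(cells)


print([marginal_risk(0, a) for a in (0, 1)])  # [0.5, 0.5]: no single-SNP signal
print(penetrance[(0, 1)], penetrance[(1, 1)])  # 0.8 0.2: strong joint effect
```

    Each SNP on its own carries no marginal signal, yet the pair fully determines risk; models that consider feature combinations, such as the NNs, SVMs and RFs discussed above, can in principle capture such patterns.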

  11. A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology.

    PubMed

    Koo, Ching Lee; Liew, Mei Jing; Mohamad, Mohd Saberi; Salleh, Abdul Hakim Mohamed

    2013-01-01

  12. A novel virtual viewpoint merging method based on machine learning

    NASA Astrophysics Data System (ADS)

    Zheng, Di; Peng, Zongju; Wang, Hui; Jiang, Gangyi; Chen, Fen

    2014-11-01

    In multi-view video systems, multiview video plus depth is the main data format for 3D scene representation. Continuous virtual views can be generated using the depth-image-based rendering (DIBR) technique. The DIBR process includes geometric mapping, hole filling and merging. Fixed weights, inversely proportional to the distance between the virtual and real cameras, are conventionally used to merge the virtual views. However, these weights may not be optimal in terms of virtual view quality. In this paper, a novel virtual view merging algorithm is proposed, in which a machine learning method is used to establish an optimal weight model. The model takes color, depth, color gradient and sequence parameters into consideration. First, we render the same virtual view from the left and right views, and select the training samples by using a threshold. Then, the features of the samples are extracted and the optimal merging weights are calculated as training labels. Finally, a support vector classifier (SVC) is adopted to establish the model, which is used to guide virtual view rendering. Experimental results show that the proposed method can improve the quality of virtual views for most sequences, and it is especially effective when the distance between the virtual and real cameras is large. Compared with the original virtual view synthesis method, the proposed method obtains a gain of more than 0.1 dB for some sequences.
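
    The conventional distance-based merging rule can be written as w_L = d_R / (d_L + d_R) and w_R = d_L / (d_L + d_R), which is equivalent to normalising 1/d_L and 1/d_R. The sketch below applies it to a single pixel; the distances and pixel values are hypothetical, and the proposed method replaces these fixed weights with weights predicted by the learned model.

```python
# Conventional inverse-distance merging of two warped views at one pixel.
# Assumption: distances and pixel values are hypothetical.


def merge_weights(d_left: float, d_right: float) -> tuple[float, float]:
    """Weights proportional to 1/d, normalised to sum to 1."""
    w_left, w_right = 1.0 / d_left, 1.0 / d_right
    total = w_left + w_right
    return w_left / total, w_right / total


def merge_pixel(p_left: float, p_right: float, d_left: float, d_right: float) -> float:
    """Blend the left- and right-warped pixel values with the distance weights."""
    w_l, w_r = merge_weights(d_left, d_right)
    return w_l * p_left + w_r * p_right


# Virtual camera one unit from the left camera and three units from the right.
print([round(w, 3) for w in merge_weights(1.0, 3.0)])  # [0.75, 0.25]
print(round(merge_pixel(100, 60, 1.0, 3.0), 6))        # 90.0
```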

  13. Detecting abbreviations in discharge summaries using machine learning methods.

    PubMed

    Wu, Yonghui; Rosenbloom, S Trent; Denny, Joshua C; Miller, Randolph A; Mani, Subramani; Giuse, Dario A; Xu, Hua

    2011-01-01

    Recognition and identification of abbreviations is an important, challenging task in clinical natural language processing (NLP). A comprehensive lexical resource composed of all common, useful clinical abbreviations would have great applicability. The authors present a corpus-based method to create a lexical resource of clinical abbreviations using machine-learning (ML) methods, and test its ability to automatically detect abbreviations in hospital discharge summaries. Domain experts manually annotated abbreviations in seventy discharge summaries, which were randomly split into a training set (40 documents) and a test set (30 documents). We implemented and evaluated several ML algorithms using the training set and a list of pre-defined features. The subsequent evaluation on the test set showed that the Random Forest classifier had the highest F-measure among the individual classifiers, 94.8% (precision 98.8%, recall 91.2%). When a voting scheme was used to combine the output of various ML classifiers, the system achieved its best F-measure, 95.7%. PMID:22195219
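
    The reported F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R); plugging in P = 0.988 and R = 0.912 reproduces 94.8%. The voting scheme is not specified here, so a simple majority vote over classifier outputs is assumed for illustration.

```python
# F-measure check and a simple majority vote. Assumption: majority voting
# is an illustrative stand-in for the paper's unspecified voting scheme.
from collections import Counter


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions: list[str]) -> str:
    """Most common label among classifier outputs (ties: first label seen)."""
    return Counter(predictions).most_common(1)[0][0]


print(round(f_measure(0.988, 0.912), 3))                 # 0.948
print(majority_vote(["abbrev", "not_abbrev", "abbrev"]))  # abbrev
```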

  14. Machine learning methods for quantitative analysis of Raman spectroscopy data

    NASA Astrophysics Data System (ADS)

    Madden, Michael G.; Ryder, Alan G.

    2003-03-01

    The automated identification and quantification of illicit materials using Raman spectroscopy is of significant importance for law enforcement agencies. This paper explores the use of Machine Learning (ML) methods in comparison with standard statistical regression techniques for developing automated identification methods. In this work, the ML task is broken into two sub-tasks, data reduction and prediction. In well-conditioned data, the number of samples should be much larger than the number of attributes per sample, to limit the degrees of freedom in predictive models. For spectroscopic data, the opposite is normally true. Predictive models based on such data have a high number of degrees of freedom, which increases the risk of models over-fitting to the sample data and having poor predictive power. This work describes an approach to data reduction based on Genetic Algorithms. For the prediction sub-task, the objective is to estimate the concentration of a component in a mixture, based on its Raman spectrum and the known concentrations of previously seen mixtures. Here, Neural Networks and k-Nearest Neighbours are used for prediction. Preliminary results are presented for the problem of estimating the concentration of cocaine in solid mixtures, and compared with previously published results in which statistical analysis of the same dataset was performed. Finally, this paper demonstrates how more accurate results may be achieved by using an ensemble of prediction techniques.
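
    The prediction step can be sketched as k-Nearest Neighbours regression: the concentration for a new spectrum is the mean concentration of the k most similar training spectra. The spectra below are hypothetical two-value feature vectors (as might remain after data reduction), and the ensemble, an average over k-NN models with different k, is an assumed simple form rather than the combination used in the paper.

```python
# k-NN regression for concentration estimation plus an averaging ensemble.
# Assumptions: Euclidean distance, hypothetical reduced spectra, and an
# ensemble that simply averages k-NN models with different k.
import math


def knn_regress(train, query, k):
    """Mean target value of the k training spectra closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(target for _, target in nearest) / k


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Average the predictions of several k-NN regressors."""
    return sum(knn_regress(train, query, k) for k in ks) / len(ks)


# (reduced spectrum, known concentration in %) pairs
train = [([0.0, 0.0], 0.0), ([1.0, 1.0], 10.0), ([2.0, 2.0], 20.0),
         ([3.0, 3.0], 30.0), ([4.0, 4.0], 40.0)]
print(knn_regress(train, [2.6, 2.6], 1))               # 30.0
print(round(ensemble_predict(train, [2.6, 2.6]), 2))   # 26.67
```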

  15. Detecting Abbreviations in Discharge Summaries using Machine Learning Methods

    PubMed Central

    Wu, Yonghui; Rosenbloom, S. Trent; Denny, Joshua C.; Miller, Randolph A.; Mani, Subramani; Giuse, Dario A.; Xu, Hua

    2011-01-01

  16. Survey of Machine Learning Methods for Database Security

    NASA Astrophysics Data System (ADS)

    Kamra, Ashish; Ber, Elisa

    Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.

  17. Paradigms for machine learning

    NASA Technical Reports Server (NTRS)

    Schlimmer, Jeffrey C.; Langley, Pat

    1991-01-01

    Five paradigms for machine learning are described: connectionist (neural network) methods, genetic algorithms and classifier systems, empirical methods for inducing rules and decision trees, analytic learning methods, and case-based approaches. Some dimensions along which these paradigms vary in their approach to learning are considered, and the basic methods used within each framework are reviewed, together with open research issues. It is argued that the similarities among the paradigms are more important than their differences, and that future work should attempt to bridge the existing boundaries. Finally, some recent developments in the field of machine learning are discussed, and their impact on both research and applications is examined.

  18. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for predicting unknown genotypes in both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ general-purpose machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute un-typed SNPs in parent-offspring trios. The results show that imputation in parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods, while TotalBoost performed slightly worse than the others. Running times differed between methods: ELM was always the fastest algorithm, and RBF required long imputation times as the sample size increased. The tested methods can be an alternative for imputing un-typed SNPs when the rate of missing data is low. However, it is recommended that other machine learning methods also be used for imputation. PMID:27049046

  19. Floor-Fractured Craters through Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Thorey, C.

    2015-12-01

    Floor-fractured craters are impact craters that have undergone post-impact deformation. They are characterized by shallow floors with a plate-like or convex appearance, wide floor moats, and radial, concentric, and polygonal floor fractures. While the origin of these deformations has long been debated, it is now generally accepted that they result from the emplacement of shallow magmatic intrusions below the crater floor. These craters thus constitute an efficient tool for probing the importance of intrusive magmatism from the lunar surface. The most recent catalog of lunar floor-fractured craters references about 200 of them, mainly located around the lunar maria. Herein, we will discuss the possibility of using machine learning algorithms to detect new floor-fractured craters on the Moon among the 60,000 craters referenced in the most recent catalogs. In particular, we will use the gravity field provided by the Gravity Recovery and Interior Laboratory (GRAIL) mission and the topographic dataset obtained from the Lunar Orbiter Laser Altimeter (LOLA) instrument to design a set of representative features for each crater. We will then discuss the possibility of designing a binary supervised classifier, based on these features, to discriminate between the presence and absence of a crater-centered intrusion below a given crater. First predictions from different classifiers, in terms of their accuracy and uncertainty, will be presented.
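
    A binary supervised classifier of the kind discussed above could, for example, be a logistic regression that outputs a probability of intrusion for each crater, which also gives a simple measure of uncertainty. In the sketch below the choice of logistic regression, the two features and all values are hypothetical stand-ins for the GRAIL- and LOLA-derived features.

```python
# Logistic regression trained by batch gradient descent on per-crater
# feature vectors. Assumptions: model choice, features and values are
# hypothetical stand-ins for gravity (GRAIL) and topography (LOLA) features.
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by minimising the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def prob_intrusion(w, b, x):
    """Estimated probability that a crater hosts an intrusion."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical standardised features: [gravity anomaly, floor shallowness].
X = [[1.2, 0.9], [0.8, 1.1], [1.0, 0.7], [-0.9, -1.0], [-1.1, -0.6], [-0.7, -1.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print(prob_intrusion(w, b, [1.0, 1.0]) > 0.5, prob_intrusion(w, b, [-1.0, -1.0]) < 0.5)  # True True
```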

  20. Predicting Coronal Mass Ejections Using Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Bobra, M. G.; Ilonidis, S.

    2016-04-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections (CMEs). Usually, solar active regions that produce large flares will also produce a CME, but this is not always true. Despite advances in numerical modeling, it is still unclear which circumstances will produce a CME. Therefore, it is worthwhile to empirically determine which features distinguish flares associated with CMEs from flares that are not. At this time, no extensive study has used physically meaningful features of active regions to distinguish between these two populations. We attempt to do so here, using features derived from (1) photospheric vector magnetic field data taken by the Solar Dynamics Observatory’s Helioseismic and Magnetic Imager instrument and (2) X-ray flux data from the Geostationary Operational Environmental Satellite’s X-ray Flux instrument. We build a catalog of active regions that produced either both a flare and a CME (the positive class) or only a flare (the negative class). We then use machine-learning algorithms to (1) determine which features distinguish these two populations, and (2) forecast whether an active region that produces an M- or X-class flare will also produce a CME. We compute the True Skill Statistic, a forecast verification metric, and find a relatively high value of ∼0.8 ± 0.2. We conclude that a combination of six parameters, all intensive in nature, captures most of the relevant information contained in the photospheric magnetic field.
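
    The True Skill Statistic is the hit rate minus the false-alarm rate, TSS = TP/(TP + FN) - FP/(FP + TN), and ranges from -1 to 1. A minimal computation from a 2x2 contingency table, using hypothetical counts:

```python
# True Skill Statistic from a 2x2 contingency table (hypothetical counts).


def true_skill_statistic(tp: int, fn: int, fp: int, tn: int) -> float:
    """Hit rate minus false-alarm rate."""
    return tp / (tp + fn) - fp / (fp + tn)


print(round(true_skill_statistic(tp=9, fn=1, fp=5, tn=45), 3))  # 0.8
```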

  1. Recent progresses in the exploration of machine learning methods as in-silico ADME prediction tools.

    PubMed

    Tao, L; Zhang, P; Qin, C; Chen, S Y; Zhang, C; Chen, Z; Zhu, F; Yang, S Y; Wei, Y Q; Chen, Y Z

    2015-06-23

    In-silico methods have been explored as potential tools for assessing ADME and ADME regulatory properties, particularly in early drug discovery stages. Machine learning methods, with their ability to classify diverse structures and complex mechanisms, are well suited for predicting ADME and ADME regulatory properties. Recent efforts have been directed at broadening application scopes and improving predictive performance, with a particular focus on the coverage of ADME properties and the exploration of more diversified training data, appropriate molecular features, and consensus modeling. Moreover, several online machine learning ADME prediction servers have emerged. Here we review this progress and discuss the performance, application prospects and challenges of exploring machine learning methods as useful tools in predicting ADME and ADME regulatory properties. PMID:26037068

  2. Web Mining: Machine Learning for Web Applications.

    ERIC Educational Resources Information Center

    Chen, Hsinchun; Chau, Michael

    2004-01-01

    Presents an overview of machine learning research and reviews methods used for evaluating machine learning systems. Ways that machine-learning algorithms were used in traditional information retrieval systems in the "pre-Web" era are described, and the field of Web mining and how machine learning has been used in different Web mining applications…

  3. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines

    PubMed Central

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-01-01

    The impact-echo (IE) method is a popular non-destructive testing (NDT) technique widely used for measuring the thickness of plate-like structures and for detecting certain defects inside concrete elements or structures. However, the IE method is not effective for full condition assessment (i.e., defect detection, defect diagnosis, defect sizing and location), because the simple frequency spectrum analysis involved in the existing IE method is not sufficient to capture the IE signal patterns associated with different conditions. In this paper, we attempt to enhance the IE technique and enable it for full condition assessment of concrete elements by introducing advanced machine learning techniques for performing comprehensive analysis and pattern recognition of IE signals. Specifically, we use wavelet decomposition to extract signatures or features from the raw IE signals and apply the extreme learning machine, a recently developed machine learning technique, as the classification model for full condition assessment. To validate the capabilities of the proposed method, we build a number of specimens with various types, sizes, and locations of defects and perform IE testing on these specimens in a lab environment. Based on analysis of the collected IE signals using the proposed machine learning based IE method, we demonstrate that the proposed method is effective in performing full condition assessment of concrete elements or structures. PMID:27023563
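
    An extreme learning machine is a single-hidden-layer network whose hidden weights are drawn at random and left fixed, so that only the output weights are fitted, in closed form by least squares. The NumPy sketch below assumes tanh hidden units, 50 hidden nodes and one-hot targets; the two-class feature vectors are simulated stand-ins for wavelet features of IE signals, not data from the paper.

```python
# Minimal extreme learning machine (ELM) classifier. Assumptions: tanh
# activation, 50 random hidden units, one-hot targets, simulated features.
import numpy as np


class ELM:
    def __init__(self, n_hidden: int = 50, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        self.classes = np.unique(y)
        T = (y[:, None] == self.classes[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random, never trained
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # least-squares output weights
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes[np.argmax(self._hidden(X) @ self.beta, axis=1)]


# Simulated two-class data standing in for wavelet features of IE signals.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 4)), rng.normal(2, 0.3, (20, 4))])
y = np.array(["sound"] * 20 + ["defect"] * 20)
model = ELM().fit(X, y)
print((model.predict(X) == y).mean())  # training accuracy
```

    Because training reduces to one pseudo-inverse, fitting is fast; in practice the number of hidden units would be tuned on held-out data.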

  4. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines.

    PubMed

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-01-01

  5. Machine Methods for Acquiring, Learning, and Applying Knowledge.

    ERIC Educational Resources Information Center

    Hayes-Roth, Frederick; And Others

    A research plan for identifying and acting upon constraints that impede the development of knowledge-based intelligent systems is described. The two primary problems identified are knowledge programming, the task of which is to create an intelligent system that does what an expert says it should, and learning, the problem requiring the criticizing…

  6. Can Machine Learning Methods Predict Extubation Outcome in Premature Infants as well as Clinicians?

    PubMed Central

    Mueller, Martina; Almeida, Jonas S.; Stanislaus, Romesh; Wagner, Carol L.

    2014-01-01

    Rationale: Though treatment of prematurely born infants breathing with the assistance of a mechanical ventilator has advanced considerably in the past decades, predicting extubation outcome at a given point in time remains challenging. Numerous studies have been conducted to identify predictors of extubation outcome; however, the rate of infants failing extubation attempts has not declined. Objective: To develop a decision-support tool for the prediction of extubation outcome in premature infants using a set of machine learning algorithms. Methods: A dataset assembled from 486 premature infants on mechanical ventilation was used to develop predictive models using machine learning algorithms such as artificial neural networks (ANN), support vector machine (SVM), naïve Bayesian classifier (NBC), boosted decision trees (BDT), and multivariable logistic regression (MLR). Performance of all models was evaluated using the area under the curve (AUC). Results: For some of the models (ANN, MLR and NBC), results were satisfactory (AUC: 0.63–0.76); however, two algorithms (SVM and BDT) performed poorly, with AUCs of ~0.5. Conclusion: Clinicians' predictions still outperform machine learning, owing to the complexity of the data and to contextual information that may not be captured in the clinical data used as input for developing the machine learning algorithms. Including preprocessing steps in future studies may improve the performance of prediction models. PMID:25419493
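
    AUC can be computed without tracing the ROC curve, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half). The scores below are hypothetical; the second call shows why an AUC near 0.5, as reported for SVM and BDT, means no discrimination.

```python
# AUC via the pairwise (Mann-Whitney) formulation. Scores are hypothetical.


def auc(scores: list[float], labels: list[int]) -> float:
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5, no better than chance
```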

  7. Solar Flare Predictions Using Time Series of SDO/HMI Observations and Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Ilonidis, Stathis; Bobra, Monica; Couvidat, Sebastien

    2015-08-01

    Solar active regions are dynamic systems that can rapidly evolve in time and produce flare eruptions. The temporal evolution of an active region can provide important information about its potential to produce major flares. In this study, we build a flare forecasting model using supervised machine learning methods and time series of SDO/HMI data for all the flaring regions with magnitude M1.0 or higher that have been observed with HMI and several thousand non-flaring regions. We define and compute hundreds of features that characterize the temporal evolution of physical properties related to the size, non-potentiality, and complexity of the active region, as well as its flaring history, for several days before the flare eruption. Using these features, we implement and test the performance of several machine learning algorithms, including support vector machines, neural networks, decision trees, discriminant analysis, and others. We also apply feature selection algorithms that aim to discard features with low predictive power and improve the performance of the machine learning methods. Our results show that support vector machines provide the best forecasts for the next 24 hours, achieving a True Skill Statistic of 0.923, an accuracy of 0.985, and a Heidke skill score of 0.861, which improve the scores obtained by Bobra and Couvidat (2015). The results of this study contribute to the development of a more reliable and fully automated data-driven flare forecasting system.
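
    The scores quoted above are computed from a 2×2 contingency table of forecasts versus observed flares. The sketch below uses the standard definitions; the Heidke skill score shown is the common form that measures improvement over a random forecast with the same marginal totals. The counts and the function name are hypothetical and are not taken from the study.

```python
def flare_skill_scores(tp, fn, fp, tn):
    """Forecast verification scores from a 2x2 contingency table (positive = flaring)."""
    tss = tp / (tp + fn) - fp / (fp + tn)  # hit rate minus false-alarm rate
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # Heidke skill score: fractional improvement over a random forecast with the same marginals.
    hss = 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return tss, accuracy, hss


# Hypothetical counts: 50 flaring and 950 non-flaring regions.
tss, accuracy, hss = flare_skill_scores(tp=40, fn=10, fp=5, tn=945)
print(round(tss, 3), round(accuracy, 3), round(hss, 3))  # 0.795 0.985 0.834
```

    With far more non-flaring than flaring regions, accuracy stays high even when a fifth of the flares are missed, which is why the True Skill Statistic, being insensitive to class imbalance, is reported alongside it.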

  8. e-Learning Application for Machine Maintenance Process using Iterative Method in XYZ Company

    NASA Astrophysics Data System (ADS)

    Nurunisa, Suaidah; Kurniawati, Amelia; Pramuditya Soesanto, Rayinda; Yunan Kurnia Septo Hediyanto, Umar

    2016-02-01

    XYZ Company manufactures aircraft parts, and one of the machines categorized as a key facility in the company is the Millac 5H6P. As key facilities, such machines must be kept working well and in peak condition, so periodic maintenance is needed. Data gathering showed that maintenance staff lack the competency to maintain machine types they have not been assigned by their supervisor, indicating that knowledge is unevenly distributed among the maintenance staff. The purpose of this research is to create a knowledge-based e-learning application as a realization of the externalization process in transferring machine-maintenance knowledge. The application's features are tailored to maintenance using an e-learning framework for the maintenance process, and its content supports multimedia for learning purposes. Quality function deployment (QFD) is used in this research to understand user needs. The application is built using Moodle, with an iterative software development cycle and UML diagrams. The result of this research is an e-learning application that serves as a knowledge-sharing medium for the company's maintenance staff. Testing showed that the application makes it easy for maintenance staff to understand the required competencies.

  9. A Distributed Learning Method for ℓ1-Regularized Kernel Machine over Wireless Sensor Networks.

    PubMed

    Ji, Xinrong; Hou, Cuiqin; Hou, Yibin; Gao, Fang; Wang, Shulong

    2016-01-01

    In wireless sensor networks, centralized learning methods incur very high communication costs and energy consumption, because scattered training examples must be transmitted from the various sensor nodes to a central fusion center where a classifier or regression machine is trained. To reduce the communication cost, a distributed learning method for a kernel machine that incorporates ℓ1-norm regularization (ℓ1-regularized) is investigated, and a novel distributed learning algorithm for the ℓ1-regularized kernel minimum mean squared error (KMSE) machine is proposed. The proposed algorithm relies on in-network processing and a collaboration scheme that transmits the sparse model only between single-hop neighboring nodes. This paper evaluates the proposed algorithm in terms of prediction accuracy, model sparsity rate, communication cost, and number of iterations on synthetic and real datasets. The simulation results show that the proposed algorithm achieves approximately the same prediction accuracy as the batch learning method. Moreover, it is significantly superior in model sparsity rate and communication cost, and it converges in fewer iterations. Finally, an experiment on a wireless sensor network (WSN) test platform further demonstrates the algorithm's advantage in communication cost. PMID:27376298
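
    To see why an ℓ1 penalty produces a sparse kernel model, here is a centralized (batch) sketch. It is not the distributed algorithm proposed in the paper. It minimizes ½‖y − Kα‖² + λ‖α‖₁ over the kernel expansion coefficients α using iterative soft-thresholding (ISTA). The RBF kernel, its width γ, the value of λ, and the toy data are all assumptions chosen for illustration. Only the nonzero coefficients and their support points define the fitted model, which is what makes a sparse model cheap to exchange.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=10.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def l1_kernel_mse(K, y, lam, n_iter=500):
    """ISTA for min_alpha 0.5 * ||y - K alpha||^2 + lam * ||alpha||_1."""
    step = 1.0 / np.linalg.norm(K, 2) ** 2  # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(K.shape[1])
    for _ in range(n_iter):
        grad = K.T @ (K @ alpha - y)
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha


# Hypothetical 1-D regression data standing in for sensor readings.
X = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
K = rbf_kernel(X, X)
alpha = l1_kernel_mse(K, y, lam=0.05)
print("nonzero coefficients:", np.count_nonzero(alpha), "of", alpha.size)
print("training MSE:", np.mean((K @ alpha - y) ** 2))
```

    Raising λ drives more coefficients exactly to zero, trading accuracy for a smaller model. A very large λ zeroes them all.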

  10. A Distributed Learning Method for ℓ1-Regularized Kernel Machine over Wireless Sensor Networks

    PubMed Central

    Ji, Xinrong; Hou, Cuiqin; Hou, Yibin; Gao, Fang; Wang, Shulong

    2016-01-01

  11. Comparisons of likelihood and machine learning methods of individual classification

    USGS Publications Warehouse

    Guinand, B.; Topchy, A.; Page, K.S.; Burnham-Curtis, M. K.; Punch, W.F.; Scribner, K.T.

    2002-01-01

    “Assignment tests” are designed to determine the population membership of individuals. One particular application, based on a likelihood estimate (LE), was introduced by Paetkau et al. (1995; see also Vásquez-Domínguez et al. 2001) to assign an individual to its population of origin on the basis of its multilocus genotype and the expected frequency of this genotype in each potential source population. The LE approach can be implemented statistically in a Bayesian framework as a convenient way to evaluate hypotheses of plausible genealogical relationships (e.g., that an individual has an ancestor in another population) (Dawson and Belkhir 2001; Pritchard et al. 2000; Rannala and Mountain 1997). Other studies have evaluated the confidence of the assignment (Almudevar 2000) and the characteristics of genotypic data (e.g., degree of population divergence, number of loci, number of individuals, number of alleles) that lead to more successful population assignment (Bernatchez and Duchesne 2000; Cornuet et al. 1999; Haig et al. 1997; Shriver et al. 1997; Smouse and Chevillon 1998). The main statistical and conceptual differences between the methods used in assignment tests are given in, for example, Cornuet et al. (1999) and Rosenberg et al. (2001). However…
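
    The likelihood idea behind such assignment tests can be sketched as follows. The sketch assumes Hardy–Weinberg and linkage equilibrium. Each population's genotype probability is the product over loci of p² (homozygote) or 2pq (heterozygote), and the individual is assigned to the population with the highest likelihood. This is a simplified illustration, not the exact procedure of Paetkau et al. or of this study. The `floor` value for unseen alleles, the function names, and the allele frequencies are hypothetical; published methods handle zero frequencies and Bayesian priors in their own ways.

```python
import math


def genotype_log_likelihood(genotype, allele_freqs, floor=1e-3):
    """Log-probability of a multilocus genotype in one population.

    Assumes Hardy-Weinberg and linkage equilibrium: a homozygote a/a has
    frequency p_a**2, a heterozygote a/b has 2*p_a*p_b, and loci multiply.
    `floor` is a hypothetical frequency substituted for alleles never seen
    in that population's sample, so that the log is defined.
    """
    ll = 0.0
    for (a, b), freqs in zip(genotype, allele_freqs):
        pa, pb = freqs.get(a, floor), freqs.get(b, floor)
        ll += math.log(pa * pa if a == b else 2 * pa * pb)
    return ll


def assign(genotype, populations):
    """Assign the individual to the population with the highest genotype likelihood."""
    scores = {name: genotype_log_likelihood(genotype, freqs)
              for name, freqs in populations.items()}
    return max(scores, key=scores.get), scores


# Hypothetical allele frequencies at two loci for two candidate source populations.
populations = {
    "A": [{1: 0.9, 2: 0.1}, {5: 0.5, 6: 0.5}],
    "B": [{1: 0.1, 2: 0.9}, {5: 0.5, 6: 0.5}],
}
best, scores = assign([(1, 1), (5, 6)], populations)
print(best)  # A
```

    Working with log-likelihoods avoids numerical underflow when many loci are multiplied together.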

  12. Detection of Periodic Leg Movements by Machine Learning Methods Using Polysomnographic Parameters Other Than Leg Electromyography

    PubMed Central

    Umut, İlhan; Çentik, Güven

    2016-01-01

    The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables that must be connected. It also increases the risk of problems during recording and increases the storage volume. The aim of this study is to detect periodic leg movement (PLM) in sleep using channels other than leg electromyography (EMG), by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with a PLM disorder diagnosis were examined retrospectively. New software was developed for the analysis of PSG records; it uses machine learning algorithms, statistical methods, and DSP methods. To classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of the classification results showed that the K-nearest neighbour algorithm had a higher average classification rate (91.87%) and a lower average classification error (RMSE = 0.2850), while the multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error (RMSE = 0.3705). The results showed that PLM can be classified with high accuracy (91.87%) without a leg EMG recording. PMID:27213008
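
    K-nearest neighbour, the best-performing classifier above, labels a new sample by majority vote among the k closest training samples. The sketch below shows that rule with Euclidean distance. The two-dimensional feature vectors and labels are hypothetical placeholders, since the abstract does not list the features actually derived from the non-EMG channels.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors per epoch; label 1 = PLM present, 0 = absent.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (0.9, 0.8), (1.0, 0.9), (0.8, 1.0)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.15, 0.15)))  # 0
print(knn_predict(train_X, train_y, (0.85, 0.85)))  # 1
```

    An odd k avoids tied votes in a two-class problem. In practice, features are usually scaled first so that no single signal dominates the distance.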

  13. Detection of Periodic Leg Movements by Machine Learning Methods Using Polysomnographic Parameters Other Than Leg Electromyography.

    PubMed

    Umut, İlhan; Çentik, Güven

    2016-01-01

  14. Machine Learning and Radiology

    PubMed Central

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focus on six categories of applications in radiology: medical image segmentation; registration; computer-aided detection and diagnosis; brain function or activity analysis and neurological disease diagnosis from fMR images; content-based image retrieval systems for CT or MRI images; and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. In the long run, technology development in machine learning and in radiology will benefit each other. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the clinical radiology setting, including advantages and potential barriers. PMID:22465077

  15. Machine Learning methods in fitting first-principles total energies for substitutionally disordered solid

    NASA Astrophysics Data System (ADS)

    Gao, Qin; Yao, Sanxi; Widom, Michael

    2015-03-01

    Density functional theory (DFT) provides an accurate, first-principles description of solid structures and total energies. However, calculating structures with hundreds of atoms in the unit cell is highly time-consuming, and calculating structures with thousands of atoms is practically impossible. We apply and adapt machine learning algorithms, including compressive sensing, support vector regression, and artificial neural networks, to fit the DFT total energies of substitutionally disordered boron carbide. The nonparametric kernel method is also included in our models. Our fitted total-energy model reproduces the DFT energies with a prediction error of around 1 meV/atom. The assumptions of these machine learning models and applications of the fitted total energies will also be discussed. Financial support from McWilliams Fellowship and the ONR-MURI under the Grant No. N00014-11-1-0678 is gratefully acknowledged.

  16. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    PubMed

    Choi, Ickwon; Chung, Amy W; Suscovich, Todd J; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J; Francis, Donald; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Alter, Galit; Ackerman, Margaret E; Bailey-Kellogg, Chris

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates. PMID:25874406

  17. Model-based machine learning

    PubMed Central

    Bishop, Christopher M.

    2013-01-01

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications. PMID:23277612

  18. Model-based machine learning.

    PubMed

    Bishop, Christopher M

    2013-02-13

  19. An iterative learning control method with application for CNC machine tools

    SciTech Connect

    Kim, D.I.; Kim, S.

    1996-01-01

    A proportional, integral, and derivative (PID) type iterative learning controller is proposed for precise tracking control of industrial robots and computer numerical controller (CNC) machine tools performing repetitive tasks. Convergence of the output error under the proposed learning controller is guaranteed under a certain condition, even when the system parameters are not known exactly and unknown external disturbances exist. As the proposed learning controller is repeatedly applied to the industrial robot or CNC machine tool performing a path-dependent repetitive task, the distance between the desired path and the actual tracked or machined path, which is one of the most significant factors in evaluating control performance, is progressively reduced. The experimental results demonstrate that the proposed learning controller can improve machining accuracy when the CNC machine tool performs repetitive machining tasks.
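
    The core idea of iterative learning control is that, after each repetition of the same task, the input is corrected using the tracking error recorded on that repetition. The sketch below keeps only the proportional term, u_{k+1}(t) = u_k(t) + L·e_k(t+1). It omits the integral and derivative terms of the PID-type law proposed in the paper. The first-order plant y(t+1) = a·y(t) + b·u(t), the values a = 0.3, b = 1, L = 0.8, and the sinusoidal reference path are hypothetical. For this particular plant, |1 − L·b| < 1 together with the small a makes the worst-case error shrink on every trial. Convergence conditions for real machines depend on their dynamics.

```python
import math


def simulate_plant(u, a=0.3, b=1.0):
    """Hypothetical first-order plant y[t+1] = a*y[t] + b*u[t], starting from y[0] = 0.

    Returns [y[1], ..., y[N]], so element t is the first output affected by u[t].
    """
    y, outputs = 0.0, []
    for ut in u:
        y = a * y + b * ut
        outputs.append(y)
    return outputs


def p_type_ilc(y_desired, gain=0.8, trials=20):
    """P-type ILC: after each trial, u[t] += gain * e[t], with e the tracking error of that trial."""
    u = [0.0] * len(y_desired)
    max_errors = []
    for _ in range(trials):
        y = simulate_plant(u)                          # run one repetition of the task
        e = [yd - yt for yd, yt in zip(y_desired, y)]  # tracking error along the path
        max_errors.append(max(abs(v) for v in e))
        u = [ut + gain * et for ut, et in zip(u, e)]   # learn from this trial's error
    return u, max_errors


y_desired = [math.sin(2 * math.pi * t / 50) for t in range(50)]
u, max_errors = p_type_ilc(y_desired)
print(max_errors[0], max_errors[-1])  # the worst-case error shrinks trial by trial
```

    No model of the plant is used inside `p_type_ilc` itself; the plant is only simulated to produce each trial's error. This mirrors the point that learning control can converge without exact knowledge of the system parameters.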

  20. Applications of Machine Learning in Information Retrieval.

    ERIC Educational Resources Information Center

    Cunningham, Sally Jo; Witten, Ian H.; Littin, James

    1999-01-01

    Introduces the basic ideas that underpin applications of machine learning to information retrieval. Describes applications of machine learning to text categorization. Considers how machine learning can be applied to the query-formulation process. Examines methods of document filtering, where the user specifies a query that is to be applied to an…

  21. Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications.

    PubMed

    Kruppa, Jochen; Liu, Yufeng; Diener, Hans-Christian; Holste, Theresa; Weimar, Christian; König, Inke R; Ziegler, Andreas

    2014-07-01

    Machine learning methods are applied to three different large datasets, all dealing with probability estimation problems for dichotomous or multicategory data. Specifically, we investigate k-nearest neighbors, bagged nearest neighbors, random forests for probability estimation trees, and support vector machines with the kernels of Bessel, linear, Laplacian, and radial basis type. Comparisons are made with logistic regression. The dataset from the German Stroke Study Collaboration with dichotomous and three-category outcome variables allows, in particular, for temporal and external validation. The other two datasets are freely available from the UCI learning repository and provide dichotomous outcome variables. One of them, the Cleveland Clinic Foundation Heart Disease dataset, uses data from one clinic for training and from three clinics for external validation, while the other, the thyroid disease dataset, allows for temporal validation by separating data into training and test data by date of recruitment into study. For dichotomous outcome variables, we use receiver operating characteristics, areas under the curve values with bootstrapped 95% confidence intervals, and Hosmer-Lemeshow-type figures as comparison criteria. For dichotomous and multicategory outcomes, we calculated bootstrap Brier scores with 95% confidence intervals and also compared them through bootstrapping. In a supplement, we provide R code for performing the analyses and for random forest analyses in Random Jungle, version 2.1.0. The learning machines show promising performance over all constructed models. They are simple to apply and serve as an alternative approach to logistic or multinomial logistic regression analysis. PMID:24989843
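
    The Brier score measures the squared distance between predicted class probabilities and the observed outcome, where 0 is perfect. The sketch below computes the multicategory form and a percentile bootstrap confidence interval by resampling cases with replacement. For two classes, this sum-over-categories form equals twice the familiar binary score (p − y)². Conventions differ, and the abstract does not say which one the study uses. The percentile method, the function names, and the data are assumptions; this is not the R code from the paper's supplement.

```python
import random


def brier_score(probs, labels):
    """Multicategory Brier score: mean over cases of sum_c (p_c - [label == c])**2."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pc - (1.0 if c == y else 0.0)) ** 2 for c, pc in enumerate(p))
    return total / len(labels)


def bootstrap_ci(probs, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for the Brier score."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        stats.append(brier_score([probs[i] for i in idx], [labels[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical three-category predictions (rows sum to 1) and observed categories.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
labels = [0, 1, 2, 1]
print(brier_score(probs, labels))
print(bootstrap_ci(probs, labels))
```

    With only four cases the interval is very wide. The example shows the mechanics, not a realistic sample size.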

  22. [Quantitative retrieval of chlorophyll a concentration in Taihu Lake using machine learning methods].

    PubMed

    Zhang, Yu-Chao; Qian, Xin; Qian, Yu; Liu, Jian-Ping; Kong, Fan-Xiang

    2009-05-15

    We evaluated the performance of two machine learning methods, artificial neural network (ANN) and support vector machine (SVM), for estimating chlorophyll a in Taihu Lake from remote sensing data. First, a theoretical analysis was carried out based on the basic theory and learning target of the two methods. Then two empirical algorithms were developed to relate MODIS reflectance to in situ chlorophyll a concentrations. The performance of ANN and SVM is compared in terms of validation, stability and robustness assessment, and the chlorophyll a distribution of Taihu Lake retrieved by the two algorithms. For the validation data, the root mean square error (RMSE) and average relative error (ARE) of the SVM retrieval model are only 5.85 and 26.5%, whereas those of the ANN model are 13.04 and 46.8%. The stability and robustness assessment suggests that SVM performs better than ANN. The retrieval results show that the chlorophyll a distributions of the whole lake from the two algorithms are similar; however, the chlorophyll a concentration in the eastern and central regions of Taihu Lake is distorted by the ANN model because of limitations such as learning-target setting and over-learning in network construction. PMID:19558096
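
    The two validation metrics above can be written out as follows. The abstract does not define ARE, so the sketch assumes the common definition: the mean absolute error relative to the observed value, expressed as a percentage. RMSE carries the units of the concentration, which the abstract does not state. The observed and retrieved values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def average_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed, as a percentage (observed must be nonzero)."""
    return 100.0 * sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


obs = [10.0, 20.0]   # hypothetical in situ chlorophyll a concentrations
pred = [12.0, 18.0]  # hypothetical retrieved values
print(rmse(obs, pred))                    # 2.0
print(average_relative_error(obs, pred))  # approximately 15.0
```

    The two metrics complement each other. RMSE is dominated by large absolute misses at high concentrations, while ARE weights errors at low concentrations more heavily.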

  23. Briefing in application of machine learning methods in ion channel prediction.

    PubMed

    Lin, Hao; Chen, Wei

    2015-01-01

    In cells, ion channels are one of the most important classes of membrane proteins, allowing inorganic ions to move across the membrane. A wide range of biological processes involve, and are regulated by, the opening and closing of ion channels. Ion channels can be classified into numerous classes, and different types of ion channels exhibit different functions. Thus, the correct identification of ion channels and their types using computational methods will provide in-depth insights into their function in various biological processes. In this review, we briefly introduce and discuss recent progress in ion channel prediction using machine learning methods. PMID:25961077

  24. Briefing in Application of Machine Learning Methods in Ion Channel Prediction

    PubMed Central

    2015-01-01

  25. Acceleration of ensemble machine learning methods using many-core devices

    NASA Astrophysics Data System (ADS)

    Tamerus, A.; Washbrook, A.; Wyeth, D.

    2015-12-01

    We present a case study of the acceleration of ensemble machine learning methods using many-core devices, carried out in collaboration with Toshiba Medical Visualisation Systems Europe (TMVSE). Adopting GPUs to execute a key algorithm in the classification of medical image data was shown to significantly reduce overall processing time. Using a representative dataset and pre-trained decision trees as input, we demonstrate how the decision forest classification method can be mapped onto the GPU data processing model. A GPU-based version of the decision forest method achieved a speed-up of over 138 times relative to a single-threaded CPU implementation, with further improvements possible. The same GPU-based software was then applied directly to a suitably formed dataset to benefit supervised learning techniques used in High Energy Physics (HEP), with similar performance improvements.
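
    The abstract does not describe the GPU kernel itself. The NumPy sketch below illustrates only a general idea that makes decision forests suit data-parallel hardware: each pre-trained tree is stored as flat arrays, and all samples advance through the tree in lockstep, one level per step, instead of being traversed one at a time. It runs on the CPU, it is not TMVSE's implementation, and the array layout and the one-split trees are hypothetical.

```python
import numpy as np


def predict_tree(X, feature, threshold, left, right, leaf_value, max_depth):
    """Route all samples through one tree at once; feature[i] < 0 marks node i as a leaf."""
    rows = np.arange(X.shape[0])
    node = np.zeros(X.shape[0], dtype=int)  # every sample starts at the root (node 0)
    for _ in range(max_depth):
        f = feature[node]
        internal = f >= 0
        # Leaves have f = -1; use column 0 as a dummy index and mask the result with `internal`.
        go_left = X[rows, np.where(internal, f, 0)] <= threshold[node]
        child = np.where(go_left, left[node], right[node])
        node = np.where(internal, child, node)  # samples already at a leaf stay there
    return leaf_value[node]


def predict_forest(X, trees, max_depth):
    """Majority vote over trees whose leaves hold 0/1 class labels (ties go to class 1)."""
    votes = np.stack([predict_tree(X, *tree, max_depth) for tree in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)


def make_stump(t):
    """Hypothetical one-split tree: class 0 if x[0] <= t, else class 1."""
    return (np.array([0, -1, -1]), np.array([t, 0.0, 0.0]),
            np.array([1, -1, -1]), np.array([2, -1, -1]), np.array([0, 0, 1]))


X = np.array([[0.45], [0.55], [0.1], [0.9]])
forest = [make_stump(0.4), make_stump(0.5), make_stump(0.6)]
print(predict_forest(X, forest, max_depth=1))  # [0 1 0 1]
```

    Because every sample executes the same sequence of array operations, the per-level work maps onto one thread per sample, which is the property GPU ports of tree ensembles typically exploit.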

  26. Similarity-based machine learning methods for predicting drug-target interactions: a brief review.

    PubMed

    Ding, Hao; Takigawa, Ichigaku; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2014-09-01

    Computationally predicting drug-target interactions is useful for selecting possible drug (or target) candidates for further biochemical verification. We focus on machine learning-based approaches, particularly similarity-based methods that use drug and target similarities, which capture relationships among drugs and among targets, respectively. These two similarities represent two emerging concepts, the chemical space and the genomic space. Typically, the methods combine these two types of similarities to generate models for predicting new drug-target interactions. This process is also closely related to much work in pharmacogenomics and chemical biology that attempts to understand the relationships between the chemical and genomic spaces. This background makes the similarity-based approaches attractive and promising. This article reviews the similarity-based machine learning methods for predicting drug-target interactions, which are state-of-the-art and have aroused great interest in bioinformatics. We describe each of these methods briefly and empirically compare them under a uniform experimental setting to explore their advantages and limitations. PMID:23933754
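
    One simple instance of the similarity-based idea is a weighted-profile score. A drug inherits the known targets of other drugs, weighted by how similar those drugs are to it. The sketch uses only the drug (chemical-space) similarity, whereas the reviewed methods typically combine it with target (genomic-space) similarity. Both matrices below are hypothetical.

```python
import numpy as np


def weighted_profile_scores(drug_sim, interactions):
    """Score drug-target pairs by similarity-weighted votes of other drugs' known interactions.

    drug_sim: (n_drugs, n_drugs) drug-drug similarity matrix.
    interactions: (n_drugs, n_targets) binary matrix of known interactions.
    """
    S = np.array(drug_sim, dtype=float)
    np.fill_diagonal(S, 0.0)               # a drug does not vote for itself
    weights = S.sum(axis=1, keepdims=True)
    weights[weights == 0] = 1.0            # avoid division by zero for isolated drugs
    return (S @ interactions) / weights


drug_sim = np.array([[1.0, 0.9, 0.1],
                     [0.9, 1.0, 0.1],
                     [0.1, 0.1, 1.0]])
interactions = np.array([[1, 0],
                         [0, 0],
                         [0, 1]])
scores = weighted_profile_scores(drug_sim, interactions)
print(scores[1])  # drug 1 has no known targets; target 0 scores highest (about 0.9)
```

    Excluding self-similarity matters when scoring pairs that are already known, for example during leave-one-out evaluation. Otherwise a drug's own interactions would leak into its score.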

  27. Machine learning methods for credibility assessment of interviewees based on posturographic data.

    PubMed

    Saripalle, Sashi K; Vemulapalli, Spandana; King, Gregory W; Burgoon, Judee K; Derakhshani, Reza

    2015-01-01

    This paper discusses the advantages of using posturographic signals from force plates for non-invasive credibility assessment. The contributions of our work are twofold: first, the proposed method is highly efficient and non-invasive; second, the feasibility of creating an autonomous credibility assessment system using machine-learning algorithms is studied. This study employs an interview paradigm in which subjects respond with truthful and deceptive intent while their center of pressure (COP) signal is recorded. Classification models using sets of COP features for deceptive responses are derived, and a best accuracy of 93.5% for the test interval is reported. PMID:26737832

  28. An Evaluation of Machine Learning Methods to Detect Malicious SCADA Communications

    SciTech Connect

    Beaver, Justin M; Borges, Raymond Charles; Buckner, Mark A

    2013-01-01

    Critical infrastructure Supervisory Control and Data Acquisition (SCADA) systems were designed to operate on closed, proprietary networks where a malicious insider posed the greatest threat potential. The centralization of control and the movement towards open systems and standards has improved the efficiency of industrial control, but has also exposed legacy SCADA systems to security threats that they were not designed to mitigate. This work explores the viability of machine learning methods in detecting the new threat scenarios of command and data injection. Similar to network intrusion detection systems in the cyber security domain, the command and control communications in a critical infrastructure setting are monitored, and vetted against examples of benign and malicious command traffic, in order to identify potential attack events. Multiple learning methods are evaluated using a dataset of Remote Terminal Unit communications, which included both normal operations and instances of command and data injection attack scenarios.

  29. Drug name recognition in biomedical texts: a machine-learning-based method.

    PubMed

    He, Linna; Yang, Zhihao; Lin, Hongfei; Li, Yanpeng

    2014-05-01

    Currently, there is an urgent need to develop technology for automatically extracting drug information from biomedical texts, and drug name recognition is an essential prerequisite for extracting drug information. This article presents a machine-learning-based approach to recognizing drug names in biomedical texts. In this approach, a drug name dictionary is first constructed from the external resources DrugBank and PubMed. Then a semi-supervised learning method, feature coupling generalization, is used to filter this dictionary. Finally, dictionary look-up and the conditional random field method are combined to recognize drug names. Experimental results show that our approach achieves an F-score of 92.54% on the test set of DDIExtraction2011. PMID:24140287

  30. Machine Learning in Medicine.

    PubMed

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  11. A machine learning method for the prediction of receptor activation in the simulation of synapses.

    PubMed

    Montes, Jesus; Gomez, Elena; Merchán-Pérez, Angel; Defelipe, Javier; Peña, Jose-Maria

    2013-01-01

    Chemical synaptic transmission involves the release of a neurotransmitter that diffuses in the extracellular space and interacts with specific receptors located on the postsynaptic membrane. Computer simulation approaches provide fundamental tools for exploring various aspects of the synaptic transmission under different conditions. In particular, Monte Carlo methods can track the stochastic movements of neurotransmitter molecules and their interactions with other discrete molecules, the receptors. However, these methods are computationally expensive, even when used with simplified models, preventing their use in large-scale and multi-scale simulations of complex neuronal systems that may involve large numbers of synaptic connections. We have developed a machine-learning based method that can accurately predict relevant aspects of the behavior of synapses, such as the percentage of open synaptic receptors as a function of time since the release of the neurotransmitter, with considerably lower computational cost compared with the conventional Monte Carlo alternative. The method is designed to learn patterns and general principles from a corpus of previously generated Monte Carlo simulations of synapses covering a wide range of structural and functional characteristics. These patterns are later used as a predictive model of the behavior of synapses under different conditions without the need for additional computationally expensive Monte Carlo simulations. This is performed in five stages: data sampling, fold creation, machine learning, validation and curve fitting. The resulting procedure is accurate, automatic, and it is general enough to predict synapse behavior under experimental conditions that are different from those it was trained on. Since our method efficiently reproduces the results that can be obtained with Monte Carlo simulations at a considerably lower computational cost, it is suitable for the simulation of large numbers of synapses and it is

  12. A Machine Learning Method for the Prediction of Receptor Activation in the Simulation of Synapses

    PubMed Central

    Montes, Jesus; Gomez, Elena; Merchán-Pérez, Angel; DeFelipe, Javier; Peña, Jose-Maria

    2013-01-01

    Chemical synaptic transmission involves the release of a neurotransmitter that diffuses in the extracellular space and interacts with specific receptors located on the postsynaptic membrane. Computer simulation approaches provide fundamental tools for exploring various aspects of the synaptic transmission under different conditions. In particular, Monte Carlo methods can track the stochastic movements of neurotransmitter molecules and their interactions with other discrete molecules, the receptors. However, these methods are computationally expensive, even when used with simplified models, preventing their use in large-scale and multi-scale simulations of complex neuronal systems that may involve large numbers of synaptic connections. We have developed a machine-learning based method that can accurately predict relevant aspects of the behavior of synapses, such as the percentage of open synaptic receptors as a function of time since the release of the neurotransmitter, with considerably lower computational cost compared with the conventional Monte Carlo alternative. The method is designed to learn patterns and general principles from a corpus of previously generated Monte Carlo simulations of synapses covering a wide range of structural and functional characteristics. These patterns are later used as a predictive model of the behavior of synapses under different conditions without the need for additional computationally expensive Monte Carlo simulations. This is performed in five stages: data sampling, fold creation, machine learning, validation and curve fitting. The resulting procedure is accurate, automatic, and it is general enough to predict synapse behavior under experimental conditions that are different from those it was trained on. Since our method efficiently reproduces the results that can be obtained with Monte Carlo simulations at a considerably lower computational cost, it is suitable for the simulation of large numbers of synapses and it is

  13. Gaussian processes for machine learning.

    PubMed

    Seeger, Matthias

    2004-04-01

    Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countable or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes at a fairly elementary level, with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines, in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, they allow very powerful statistical methods to be implemented, which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback, heavy computational scaling, has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided. PMID:15112367
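
    To make the Bayesian treatment and the "valid estimates of uncertainties" concrete, here is a minimal sketch of exact GP regression with a zero prior mean, written in plain Python so it runs without dependencies. The squared-exponential kernel, its hyperparameters and the noise level are assumptions chosen for illustration. The Cholesky factorization of the n x n kernel matrix costs O(n^3), which is the heavy computational scaling that sparse approximations aim to reduce.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub_transpose(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at each test input."""
    n = len(x_train)
    K = [[rbf_kernel(x_train[i], x_train[j], **kernel_args) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub_transpose(L, forward_sub(L, y_train))  # K^{-1} y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf_kernel(xt, xs, **kernel_args) for xt in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(L, k_star)  # L^{-1} k_*
        variances.append(rbf_kernel(xs, xs, **kernel_args) - sum(t * t for t in v))
    return means, variances
```

    At a training input the posterior mean reproduces the observed value up to the small noise term and the variance is close to zero; far from the data the mean falls back to the prior mean of zero and the variance returns to the prior variance.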

  14. Estimating the complexity of 3D structural models using machine learning methods

    NASA Astrophysics Data System (ADS)

    Mejía-Herrera, Pablo; Kakurina, Maria; Royer, Jean-Jacques

    2016-04-01

    Quantifying the complexity of 3D geological structural models can play a major role in natural resources exploration surveys, in predicting environmental hazards or in forecasting fossil resources. This paper proposes a structural complexity index which can be used to help define the degree of effort necessary to build a 3D model for a given degree of confidence, and also to identify locations where additional effort is required to meet a given acceptable risk of uncertainty. In this work, it is considered that the structural complexity index can be estimated using machine learning methods on raw geo-data. More precisely, the metrics for measuring complexity can be approximated by the degree of difficulty of predicting the distribution of geological objects from partial information on the actual structural distribution of materials. The proposed methodology is tested on a set of 3D synthetic structural models for which the degree of effort during their building is assessed using various parameters (such as number of faults, number of parts in a surface object, number of borders, ...), the rank of geological elements contained in each model, and, finally, their level of deformation (folding and faulting). The results show how the estimated complexity of a 3D model can be approximated by the quantity of partial data necessary to simulate the actual 3D model at a given precision using machine learning algorithms.
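
    One way to read "the quantity of partial data necessary" as an algorithm: sample a growing fraction of a model's cells, predict the unsampled cells from the samples, and report the smallest fraction that reaches a target accuracy. The sketch below is a toy, hypothetical interpretation, not the authors' index: the model is a 2D grid of geological unit labels rather than a 3D volume, sampling is uniform random, the candidate fractions and the 0.95 target are arbitrary, and a 1-nearest-neighbour predictor stands in for the machine learning methods used in the paper.

```python
import random


def nn_accuracy(grid, fraction, rng):
    """Share of unsampled cells whose unit label is recovered by 1-NN
    from a random sample covering `fraction` of the cells."""
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    n_sample = max(1, int(round(fraction * len(cells))))
    sampled = rng.sample(cells, n_sample)
    sampled_set = set(sampled)
    hidden = [cell for cell in cells if cell not in sampled_set]
    if not hidden:
        return 1.0  # every cell observed: nothing left to predict
    correct = 0
    for r, c in hidden:
        # nearest sampled cell by squared Euclidean distance on the grid
        nr, nc = min(sampled, key=lambda s: (s[0] - r) ** 2 + (s[1] - c) ** 2)
        correct += grid[nr][nc] == grid[r][c]
    return correct / len(hidden)


def complexity_index(grid, target=0.95,
                     fractions=(0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0), seed=0):
    """Hypothetical index: smallest sampled fraction whose 1-NN reconstruction
    reaches `target` accuracy (higher values mean a more complex model)."""
    rng = random.Random(seed)
    for f in fractions:
        if nn_accuracy(grid, f, rng) >= target:
            return f
    return 1.0
```

    A single-unit model is reconstructed from the smallest sample, while models with many thin or faulted units need larger fractions before nearest-neighbour prediction becomes reliable.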

  15. Peak Detection Method Evaluation for Ion Mobility Spectrometry by Using Machine Learning Approaches

    PubMed Central

    Hauschild, Anne-Christin; Kopczynski, Dominik; D’Addario, Marianna; Baumbach, Jörg Ingo; Rahmann, Sven; Baumbach, Jan

    2013-01-01

    Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established, inexpensive, non-invasive bioanalytical technology for detecting volatile organic compounds (VOCs) with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, several steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, healthy or not, for instance. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that classification accuracy is almost equally good for all methods: the manually created gold standard as well as the four automatic peak-finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using the PME as peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications. PMID:24957992
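
    Of the four detectors, local maxima search is the simplest to state. A minimal sketch, assuming an MCC/IMS measurement stored as a 2D intensity matrix (rows as retention time, columns as inverse reduced ion mobility) and a hypothetical intensity threshold; the evaluated tools may add smoothing, noise estimation or peak merging that this version omits.

```python
def local_maxima_peaks(matrix, threshold):
    """Return (row, col, intensity) for every cell above `threshold` that is
    strictly greater than all of its up-to-8 neighbours."""
    rows, cols = len(matrix), len(matrix[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = matrix[r][c]
            if value <= threshold:
                continue  # ignore background and low-intensity noise
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks
```

    Requiring a strict maximum means a flat plateau of equal intensities yields no peak, one of the cases where the more elaborate detectors behave differently.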

  16. Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image

    NASA Astrophysics Data System (ADS)

    Pirotti, F.; Sunar, F.; Piragnolo, M.

    2016-06-01

    Thanks mainly to ESA and USGS, a large body of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for the testing and validation subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the random
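
    The "kfold" validation with ten folds can be sketched as follows. The split shuffles pixel indices once and deals them into k folds; each fold is held out in turn while the model is trained on the rest. The 1-nearest-neighbour classifier (the k-nearest neighbour method with k = 1) and the two-band pixel vectors used in testing are placeholders, and the paper's actual fold assignment is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over k folds (assumes len(X) >= k).
    `fit` and `predict` wrap any classifier."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        scores.append(sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return sum(scores) / len(scores)


def fit_1nn(X_train, y_train):
    """A 1-NN 'model' is simply the stored training pixels and labels."""
    return list(zip(X_train, y_train))


def predict_1nn(model, x):
    """Label of the training pixel with the smallest squared spectral distance."""
    return min(model, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))[1]
```

    Because every pixel is held out exactly once, the k-fold score sits between the optimistic "train" score and an evaluation on fully independent control pixels.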

  17. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances.

    PubMed

    Abut, Fatih; Akay, Mehmet Fatih

    2015-01-01

    Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume per minute during intense exercise. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating a person's disease risk. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite its high accuracy, the practical limitations of direct measurement, such as the need for expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, many studies have been conducted in recent years to predict the VO2max of various target populations, ranging from soccer athletes, nonexpert swimmers and cross-country skiers to healthy-fit adults, teenagers and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview of the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in the related literature in terms of two well-known metrics, namely, the multiple correlation coefficient (R) and the standard error of estimate (SEE). The survey results reveal that, among the regression methods used to develop prediction models, support vector machine in general shows better performance than other methods, whereas multiple linear regression exhibits the worst performance. PMID:26346869
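
    The two comparison metrics can be computed directly from measured and predicted values. The abstract does not give formulas, so this sketch assumes the common definitions: R as the Pearson correlation between measured and predicted VO2max (which equals the multiple correlation coefficient for a least-squares linear model with an intercept), and SEE = sqrt(SSE / (n - p - 1)) for a model with p predictor variables.

```python
import math


def multiple_r(y_true, y_pred):
    """R computed as the Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def standard_error_of_estimate(y_true, y_pred, n_predictors):
    """SEE = sqrt(SSE / (n - p - 1)) for a model with p predictor variables."""
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sse / (len(y_true) - n_predictors - 1))
```

    A higher R and a lower SEE both indicate a better model, which is why the survey ranks methods on the pair rather than on either metric alone.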

  18. Comparison of machine learning methods for data infilling in hydrological forecasting

    NASA Astrophysics Data System (ADS)

    Chacon Hurtado, Juan Carlos; Alfonso, Leonardo; Solomatine, Dimitri

    2014-05-01

    The continuous measurement of hydrological variables requires sensors that must be deployed in the field, increasing the risk of failure due to natural or anthropogenic conditions inherent to their deployment. The failure of these sensors interrupts the data stream, which in operational hydrological systems may lead to unsatisfactory performance of forecasting models, or to biases caused by missing information in simulation models. To mitigate this, various techniques can be used to fill the missing values, ranging from simple regression techniques to more complex machine learning methods. This research explores the performance of the latter, considering particular properties of the measurements, the length of the missing data series and particular properties of the missing variable. The study is carried out in two European catchments which differ in geographical conditions, mechanisms and monitoring frequency.
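
    At the simple end of that range, a gap in one gauge can be filled by regressing it on a correlated neighbouring gauge. This sketch assumes such a neighbour series exists and is aligned in time, marks missing values as None, and fits an ordinary least-squares line; the machine learning methods compared in the study would replace that line with a more flexible model.

```python
def fill_gaps_by_regression(target, neighbour):
    """Fill None entries in `target` from a least-squares line fitted on the
    time steps where both series are observed."""
    pairs = [(x, y) for x, y in zip(neighbour, target) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    filled = []
    for x, y in zip(neighbour, target):
        if y is None and x is not None:
            filled.append(intercept + slope * x)
        else:
            # keep observations; gaps with no neighbour data stay None
            filled.append(y)
    return filled
```

    The gap length matters here: a single linear relation fitted once may be adequate for short gaps but can drift over long outages, which is one motivation for the more complex methods the study evaluates.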

  19. Classifying Force Spectroscopy of DNA Pulling Measurements Using Supervised and Unsupervised Machine Learning Methods.

    PubMed

    Karatay, Durmus U; Zhang, Jie; Harrison, Jeffrey S; Ginger, David S

    2016-04-25

    Dynamic force spectroscopy (DFS) measurements on biomolecules typically require classifying thousands of repeated force spectra prior to data analysis. Here, we study classification of atomic force microscope-based DFS measurements using machine-learning algorithms in order to automate selection of successful force curves. Notably, we collect a data set that has a testable positive signal using photoswitch-modified DNA before and after illumination with UV (365 nm) light. We generate a feature set consisting of six properties of force-distance curves to train supervised models and use principal component analysis (PCA) for an unsupervised model. For supervised classification, we train random forest models for binary and multiclass classification of force-distance curves. Random forest models predict successful pulls with an accuracy of 94% and classify them into five classes with an accuracy of 90%. The unsupervised method using Gaussian mixture models (GMM) reaches an accuracy of approximately 80% for binary classification. PMID:27010122
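
    The unsupervised branch relies on Gaussian mixture models. A minimal sketch of a two-component mixture fitted by expectation-maximisation on one-dimensional values, such as first principal-component scores of the force-curve features. That projection, the initialisation at the data extremes and the iteration count are assumptions. Which component corresponds to successful pulls still has to be decided from labelled curves or inspection.

```python
import math


def fit_gmm_1d(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation.
    Returns (weights, means, variances)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]  # start the two components at the extremes of the data
    variances = [max((hi - lo) ** 2 / 4, 1e-6)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for x in values:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against both densities underflowing
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk, 1e-6)
    return weights, means, variances


def assign_component(x, weights, means, variances):
    """Index of the component with the highest weighted log density at x."""
    scores = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
              for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))
```

    Unlike the supervised random forest, the mixture never sees labels, which is consistent with the lower accuracy reported for the unsupervised method.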

  20. Classification of P-glycoprotein-interacting compounds using machine learning methods

    PubMed Central

    Prachayasittikul, Veda; Worachartcheewan, Apilak; Shoombuatong, Watshara; Prachayasittikul, Virapong; Nantasenamat, Chanin

    2015-01-01

    P-glycoprotein (Pgp) is a drug transporter that plays important roles in multidrug resistance and drug pharmacokinetics. The inhibition of Pgp has become a notable strategy for combating multidrug-resistant cancers and improving therapeutic outcomes. However, the polyspecific nature of Pgp, together with inconsistent results in experimental assays, renders the determination of endpoints for Pgp-interacting compounds a great challenge. In this study, the classification of a large set of 2,477 Pgp-interacting compounds (i.e., 1341 inhibitors, 913 non-inhibitors, 197 substrates and 26 non-substrates) was performed using several machine learning methods (i.e., decision tree induction, artificial neural network modelling and support vector machine) as a function of their physicochemical properties. The models provided good predictive performance, producing MCC values in the range of 0.739-1 for internal cross-validation and 0.665-1 for external validation. The study provided simple and interpretable models for important properties that influence the activity of Pgp-interacting compounds, which are potentially beneficial for screening and rational design of Pgp inhibitors that are of clinical importance. PMID:26862321
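
    The MCC values quoted above summarise all four cells of a binary confusion matrix, which keeps the score informative when class sizes are unbalanced (compare 197 substrates with 26 non-substrates). A minimal sketch of the standard formula. Returning 0 when a marginal count is zero is a common convention assumed here rather than taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


perfect = matthews_corrcoef(50, 50, 0, 0)  # 1.0: every compound classified correctly
```

    MCC ranges from -1 to 1, so the reported upper bound of 1 corresponds to error-free classification on that split.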

  1. Time and spectral analysis methods with machine learning for the authentication of digital audio recordings.

    PubMed

    Korycki, Rafal

    2013-07-10

    This paper addresses the problem of tampering detection and discusses new methods that can be used for authenticity analysis of digital audio recordings. Nowadays, the only method for digital audio files commonly approved by forensic experts is the ENF (electric network frequency) criterion. It consists of analysing fluctuations of the mains frequency induced in the electronic circuits of recording devices. Therefore, its effectiveness is strictly dependent on the presence of the mains signal in the recording, which is a rare occurrence. This article presents the existing methods of time and spectral analysis along with modifications proposed by the author, involving spectral analysis of the residual signal enhanced by machine learning algorithms. The effectiveness of the tampering detection methods described in this paper is tested on a predefined music database. The results are compared graphically using ROC-like curves. Furthermore, time-frequency plots are presented and enhanced by the reassignment method for visual inspection of modified recordings. This solution enables analysis of minimal changes in background sounds, which may indicate tampering. PMID:23481673
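
    The ENF criterion depends on tracking the mains frequency (nominally 50 Hz in Europe, 60 Hz in North America) frame by frame. The sketch below is a deliberately brute-force illustration, not the author's residual-signal method: it scans the discrete-time Fourier transform of each one-second frame over a narrow band around the nominal frequency and keeps the strongest component. Frame length, band width and step size are assumptions; practical ENF analysis usually adds filtering and finer estimators.

```python
import math


def dominant_frequency(samples, sample_rate, f_low, f_high, step=0.01):
    """Frequency in [f_low, f_high] with the largest DTFT magnitude,
    found by scanning candidate frequencies `step` Hz apart."""
    best_f, best_power = f_low, -1.0
    n_steps = int(round((f_high - f_low) / step))
    for i in range(n_steps + 1):
        f = f_low + i * step
        w = 2 * math.pi * f / sample_rate  # angular frequency per sample
        re = sum(s * math.cos(w * n) for n, s in enumerate(samples))
        im = sum(s * math.sin(w * n) for n, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f


def enf_track(signal, sample_rate, frame_seconds=1.0, nominal=50.0, band=0.5):
    """Per-frame estimate of the mains frequency around its nominal value."""
    frame = int(frame_seconds * sample_rate)
    return [
        dominant_frequency(signal[start:start + frame], sample_rate,
                           nominal - band, nominal + band)
        for start in range(0, len(signal) - frame + 1, frame)
    ]
```

    An edit that removes or splices audio would show up as a discontinuity in such a frequency track, which is also why the method fails when no mains hum was recorded.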

  2. Comparison of Machine Learning methods for incipient motion in gravel bed rivers

    NASA Astrophysics Data System (ADS)

    Valyrakis, Manousos

    2013-04-01

    Soil erosion and sediment transport in natural gravel bed streams are important processes which affect both the morphology and the ecology of the Earth's surface. For gravel bed rivers at near-incipient flow conditions, particle entrainment dynamics are highly intermittent. This contribution reviews the use of modern Machine Learning (ML) methods implemented for short-term prediction of entrainment instances of individual grains exposed in fully developed near-boundary turbulent flows. Results obtained by network architectures of variable complexity based on two different ML methods, namely the Artificial Neural Network (ANN) and the Adaptive Neuro-Fuzzy Inference System (ANFIS), are compared in terms of different error and performance indices, computational efficiency and complexity, as well as predictive accuracy and forecast ability. Different model architectures are trained and tested with experimental time series obtained from mobile particle flume experiments. The experimental setup consists of a Laser Doppler Velocimeter (LDV) and a laser optics system, which synchronously acquire data for the instantaneous flow and the particle response, respectively. The first is used to record the flow velocity components directly upstream of the test particle, while the latter tracks the particle's displacements. The lengthy experimental data sets (millions of data points) are split into training and validation subsets used to perform the corresponding learning and testing of the models. It is demonstrated that the ANFIS hybrid model, which is based on neural learning and fuzzy inference principles, better predicts the critical flow conditions above which sediment transport is initiated. In addition, it is illustrated that empirical knowledge can be extracted, validating the theoretical assumption that particle ejections occur due to energetic turbulent flow events. Such a tool may find application in management and regulation of stream flows downstream of dams for stream

  3. Machine learning plus optical flow: a simple and sensitive method to detect cardioactive drugs

    NASA Astrophysics Data System (ADS)

    Lee, Eugene K.; Kurokawa, Yosuke K.; Tu, Robin; George, Steven C.; Khine, Michelle

    2015-07-01

    Current preclinical screening methods do not adequately detect cardiotoxicity. Using human induced pluripotent stem cell-derived cardiomyocytes (iPS-CMs), more physiologically relevant preclinical or patient-specific screening to detect potential cardiotoxic effects of drug candidates may be possible. However, one of the persistent challenges for developing a high-throughput drug screening platform using iPS-CMs is the need to develop a simple and reliable method to measure key electrophysiological and contractile parameters. To address this need, we have developed a platform that combines machine learning with brightfield optical flow as a simple and robust tool that can automate the detection of cardiomyocyte drug effects. Using three cardioactive drugs of different mechanisms, including those with primarily electrophysiological effects, we demonstrate the general applicability of this screening method to detect subtle changes in cardiomyocyte contraction. Requiring only brightfield images of cardiomyocyte contractions, we detect changes in cardiomyocyte contraction comparable, and even superior, to fluorescence readouts. This automated method serves as a widely applicable screening tool to characterize the effects of drugs on cardiomyocyte function.

  4. Machine learning plus optical flow: a simple and sensitive method to detect cardioactive drugs.

    PubMed

    Lee, Eugene K; Kurokawa, Yosuke K; Tu, Robin; George, Steven C; Khine, Michelle

    2015-01-01

    Current preclinical screening methods do not adequately detect cardiotoxicity. Using human induced pluripotent stem cell-derived cardiomyocytes (iPS-CMs), more physiologically relevant preclinical or patient-specific screening to detect potential cardiotoxic effects of drug candidates may be possible. However, one of the persistent challenges for developing a high-throughput drug screening platform using iPS-CMs is the need to develop a simple and reliable method to measure key electrophysiological and contractile parameters. To address this need, we have developed a platform that combines machine learning with brightfield optical flow as a simple and robust tool that can automate the detection of cardiomyocyte drug effects. Using three cardioactive drugs of different mechanisms, including those with primarily electrophysiological effects, we demonstrate the general applicability of this screening method to detect subtle changes in cardiomyocyte contraction. Requiring only brightfield images of cardiomyocyte contractions, we detect changes in cardiomyocyte contraction comparable, and even superior, to fluorescence readouts. This automated method serves as a widely applicable screening tool to characterize the effects of drugs on cardiomyocyte function. PMID:26139150

  5. Machine learning plus optical flow: a simple and sensitive method to detect cardioactive drugs

    PubMed Central

    Lee, Eugene K.; Kurokawa, Yosuke K.; Tu, Robin; George, Steven C.; Khine, Michelle

    2015-01-01

    Current preclinical screening methods do not adequately detect cardiotoxicity. Using human induced pluripotent stem cell-derived cardiomyocytes (iPS-CMs), more physiologically relevant preclinical or patient-specific screening to detect potential cardiotoxic effects of drug candidates may be possible. However, one of the persistent challenges for developing a high-throughput drug screening platform using iPS-CMs is the need to develop a simple and reliable method to measure key electrophysiological and contractile parameters. To address this need, we have developed a platform that combines machine learning with brightfield optical flow as a simple and robust tool that can automate the detection of cardiomyocyte drug effects. Using three cardioactive drugs of different mechanisms, including those with primarily electrophysiological effects, we demonstrate the general applicability of this screening method to detect subtle changes in cardiomyocyte contraction. Requiring only brightfield images of cardiomyocyte contractions, we detect changes in cardiomyocyte contraction comparable, and even superior, to fluorescence readouts. This automated method serves as a widely applicable screening tool to characterize the effects of drugs on cardiomyocyte function. PMID:26139150
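
    Optical flow estimates how image content moves between consecutive frames. The abstract does not name the flow algorithm, so the sketch below uses block matching, one of the simplest local motion estimators, as a stand-in. It finds the displacement of a small block of pixels that minimises the sum of absolute differences (SAD) in the next frame. Block size and search radius are assumptions. Applying it over a grid of blocks and frames would yield a motion trace from which contraction features could be extracted.

```python
def block_match(prev, curr, top, left, size=4, search=3):
    """Displacement (dy, dx) of the size x size block at (top, left) in `prev`
    that minimises the sum of absolute differences (SAD) in `curr`."""
    rows, cols = len(curr), len(curr[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r0, c0 = top + dy, left + dx
            if r0 < 0 or c0 < 0 or r0 + size > rows or c0 + size > cols:
                continue  # candidate block would fall outside the frame
            sad = sum(
                abs(prev[top + i][left + j] - curr[r0 + i][c0 + j])
                for i in range(size)
                for j in range(size)
            )
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best
```

    Block matching only recovers whole-pixel shifts; dense optical-flow methods estimate sub-pixel motion, which matters for the subtle contraction changes described above.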

  6. Assessment of machine learning reliability methods for quantifying the applicability domain of QSAR regression models.

    PubMed

    Toplak, Marko; Močnik, Rok; Polajnar, Matija; Bosnić, Zoran; Carlsson, Lars; Hasselgren, Catrin; Demšar, Janez; Boyer, Scott; Zupan, Blaž; Stålring, Jonna

    2014-02-24

    The vastness of chemical space and the relatively small coverage by experimental data recording molecular properties require us to identify subspaces, or domains, for which we can confidently apply QSAR models. Within these domains, QSAR predictions are reliable, and subsequent investigations of such compounds would find that the predictions closely match the experimental values. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence through estimation of the prediction error at the point of interest. Our study includes 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by observing their correlation with prediction error. We show that these new alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of reliability scoring methods is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that, at the cost of increased computational complexity, these dependencies can be leveraged by integrating scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open-source add-on package ( https://bitbucket.org/biolab/orange-reliability ) to the Orange data mining suite. PMID:24490838
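
    For contrast with the alternative techniques, the similarity-based baseline can be sketched in a few lines: score each test compound by its mean distance to the k nearest training compounds, then measure how well that score correlates with the absolute prediction error. The choice of k = 3, Euclidean distance over descriptor vectors and Pearson correlation are assumptions; the ten reliability methods in the study are not reproduced here.

```python
import math


def knn_distance_score(x, train_X, k=3):
    """Mean Euclidean distance from x to its k nearest training compounds;
    larger values suggest a less reliable prediction."""
    dists = sorted(math.dist(x, t) for t in train_X)
    return sum(dists[:k]) / min(k, len(dists))


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def score_quality(test_X, abs_errors, train_X, k=3):
    """Correlation between reliability scores and absolute prediction errors."""
    scores = [knn_distance_score(x, train_X, k) for x in test_X]
    return pearson(scores, abs_errors)
```

    A good reliability score correlates strongly and positively with the error; the study's point is that error-estimating methods can beat this distance-only baseline on that criterion.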

  7. A Study of Applications of Machine Learning Based Classification Methods for Virtual Screening of Lead Molecules.

    PubMed

    Vyas, Renu; Bapat, Sanket; Jain, Esha; Tambe, Sanjeev S; Karthikeyan, Muthukumarasamy; Kulkarni, Bhaskar D

    2015-01-01

    The ligand-based virtual screening of combinatorial libraries employs a number of statistical modeling and machine learning methods. A comprehensive analysis of the application of these methods for the diversity-oriented virtual screening of biological targets/drug classes is presented here. A number of classification models have been built for virtual screening using three types of inputs, namely structure-based descriptors, molecular fingerprints and therapeutic category. The activity and affinity descriptors of a set of inhibitors of four target classes (DHFR, COX, LOX and NMDA) have been used to train a total of six classifiers, viz. Artificial Neural Network (ANN), k-nearest neighbor (k-NN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT) and Random Forest (RF). Among these classifiers, the ANN was found to be the best, with an AUC of 0.9 irrespective of the target. New molecular fingerprints based on pharmacophore, toxicophore and chemophore (PTC) features were used to build the ANN models for each dataset. An accuracy of 87.27% was obtained using 296 chemophoric binary fingerprints for the COX-LOX inhibitors, compared with 67.82% for pharmacophoric and 70.64% for toxicophoric fingerprints. The methodology was validated on the classical Ames mutagenicity dataset of 4337 molecules. To evaluate it further, the selectivity and promiscuity of molecules from five drug classes, viz. anti-anginal, anti-convulsant, anti-depressant, anti-arrhythmic and anti-diabetic, were studied. The PTC fingerprints computed for each category were able to capture the drug-class-specific features using the k-NN classifier. These models can be useful for selecting optimal molecules for drug design. PMID:26138573
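
    The drug-class analysis uses a k-NN classifier on binary fingerprints. The abstract does not state the similarity measure or k, so this sketch assumes the Tanimoto coefficient, which is customary for binary fingerprints, and k = 3. Fingerprints are represented as sets of "on" bit positions; the bits and labels used in testing are hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as
    sets of 'on' bit positions."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_classify(query, training, k=3):
    """Majority label among the k training fingerprints most similar to `query`.
    `training` is a list of (fingerprint_set, label) pairs."""
    neighbours = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

    Representing fingerprints as sets keeps the similarity computation proportional to the number of set bits, which suits sparse binary fingerprints.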

  8. Comparison of Machine Learning Methods for the Purpose Of Human Fall Detection

    NASA Astrophysics Data System (ADS)

    Strémy, Maximilián; Peterková, Andrea

    2014-12-01

    According to several studies, the European population has been aging rapidly in recent years. It is therefore important to ensure that the aging population is able to live independently without the support of the working-age population. According to these studies, falls are the most dangerous and frequent accidents in the everyday life of the aging population. In our paper, we present a system that detects human falls visually, i.e. without any wearable equipment. For this purpose, we used a Kinect sensor, which provides the position of the human body in Cartesian coordinates. The human body can be captured directly because the Kinect sensor has a depth camera as well as an infrared camera. The first step in our research was to detect postures and classify the fall accident. We experimented with and compared selected machine learning methods, including Naive Bayes, decision trees and SVM, in terms of their performance in recognizing human postures (standing, sitting and lying). The highest classification accuracy, over 93.3%, was achieved by the decision tree method.
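
    Naive Bayes is one of the compared methods. Below is a minimal Gaussian naive Bayes sketch over two hypothetical skeleton-derived features, head height and hip height above the floor in metres. The feature choice and the training values used in testing are invented for illustration and are not the authors' features or data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Per-class prior, feature means and feature variances for Gaussian naive Bayes."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, features):
    """Label with the highest log posterior, treating features as independent."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(features, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))
```

    A "lying" prediction following a run of "standing" frames is the kind of transition a fall detector built on such posture labels would flag.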

  9. On Plant Detection of Intact Tomato Fruits Using Image Analysis and Machine Learning Methods

    PubMed Central

    Yamamoto, Kyosuke; Guo, Wei; Yoshioka, Yosuke; Ninomiya, Seishi

    2014-01-01

    Fully automated yield estimation of intact fruits prior to harvesting provides various benefits to farmers. Until now, several studies have been conducted to estimate fruit yield using image-processing technologies. However, most of these techniques require thresholds for features such as color, shape and size. In addition, their performance strongly depends on the thresholds used, although optimal thresholds tend to vary with images. Furthermore, most of these techniques have attempted to detect only mature and immature fruits, although the number of young fruits is more important for the prediction of long-term fluctuations in yield. In this study, we aimed to develop a method to accurately detect individual intact tomato fruits including mature, immature and young fruits on a plant using a conventional RGB digital camera in conjunction with machine learning approaches. The developed method did not require an adjustment of threshold values for fruit detection from each image because image segmentation was conducted based on classification models generated in accordance with the color, shape, texture and size of the images. The results of fruit detection in the test images showed that the developed method achieved a recall of 0.80, while the precision was 0.88. The recall values of mature, immature and young fruits were 1.00, 0.80 and 0.78, respectively. PMID:25010694

  10. On plant detection of intact tomato fruits using image analysis and machine learning methods.

    PubMed

    Yamamoto, Kyosuke; Guo, Wei; Yoshioka, Yosuke; Ninomiya, Seishi

    2014-01-01

  11. Unsupervised nonlinear dimensionality reduction machine learning methods applied to multiparametric MRI in cerebral ischemia: preliminary results

    NASA Astrophysics Data System (ADS)

    Parekh, Vishwa S.; Jacobs, Jeremy R.; Jacobs, Michael A.

    2014-03-01

    The evaluation and treatment of acute cerebral ischemia requires a technique that can determine the total area of tissue at risk for infarction using diagnostic magnetic resonance imaging (MRI) sequences. Typical MRI data sets consist of T1- and T2-weighted imaging (T1WI, T2WI) along with advanced MRI parameters from diffusion-weighted imaging (DWI) and perfusion-weighted imaging (PWI). Each of these parameters has a distinct radiological-pathological meaning. For example, DWI interrogates the movement of water in the tissue and PWI gives an estimate of the blood flow; both are critical measures during the evolution of stroke. In order to integrate these data and estimate the tissue that is at risk or damaged, we have developed advanced machine learning methods based on unsupervised nonlinear dimensionality reduction (NLDR) techniques. NLDR methods are a class of algorithms that use mathematically defined manifolds for statistical sampling of multidimensional classes to generate a discrimination rule of guaranteed statistical accuracy. They can generate a two- or three-dimensional map that represents the prominent structures of the data and provides an embedded image of meaningful low-dimensional structures hidden in their high-dimensional observations. In this manuscript, we develop NLDR methods for high-dimensional MRI data sets of preclinical animals and clinical patients with stroke. On analyzing the performance of these methods, we observed a high degree of similarity between the multiparametric embedded images from the NLDR methods and the ADC map and perfusion map. We also observed that the embedded scattergram of abnormal (infarcted or at-risk) tissue can be visualized and provides a mechanism for automatic methods to delineate potential stroke volumes and early tissue at risk.

  12. A Multi-Label Learning Based Kernel Automatic Recommendation Method for Support Vector Machine

    PubMed Central

    Zhang, Xueying; Song, Qinbao

    2015-01-01

    Choosing an appropriate kernel is critical when classifying a new problem with a Support Vector Machine. So far, more attention has been paid to constructing new kernels and choosing suitable parameter values for a specific kernel function than to kernel selection. Furthermore, most current kernel selection methods seek the single kernel with the highest classification accuracy via cross-validation. These methods are time-consuming and ignore differences in the number of support vectors and the CPU time of SVMs with different kernels. Considering the tradeoff between classification success ratio and CPU time, multiple kernel functions may perform equally well on the same classification problem. To select those appropriate kernel functions automatically for a given data set, we propose a multi-label learning based kernel recommendation method built on data characteristics. For each data set, a meta-knowledge database is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. The kernel recommendation model is then constructed on the generated meta-knowledge database with a multi-label classification method. Finally, the recommendation model recommends appropriate kernel functions for a new data set according to that data set's characteristics. Extensive experiments demonstrate that, compared with existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance. The experiments covered 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods. PMID:25893896
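
    For reference, several of the kernels named above are sketched below in the forms commonly given in the SVM literature. Only seven of the eleven kernels are shown. Parameterizations differ between sources, and the parameter names and default values here (`gamma`, `coef0`, `degree`, `sigma`, `c`) are illustrative assumptions, not the settings used in the study.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear(x, y):
    return dot(x, y)


def polynomial(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * sq_dist(x, y))


def sigmoid(x, y, gamma=1.0, coef0=0.0):
    # Not positive semi-definite for all parameter choices.
    return math.tanh(gamma * dot(x, y) + coef0)


def laplace(x, y, sigma=1.0):
    return math.exp(-math.sqrt(sq_dist(x, y)) / sigma)


def multiquadric(x, y, c=1.0):
    # Also not positive semi-definite in general.
    return math.sqrt(sq_dist(x, y) + c ** 2)


def rational_quadratic(x, y, c=1.0):
    d2 = sq_dist(x, y)
    return 1.0 - d2 / (d2 + c)


def gram_matrix(kernel, points):
    """Kernel matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


print(gram_matrix(rbf, [[0, 0], [1, 0]]))
```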

  13. Machine learning algorithms for predicting protein folding rates and stability of mutant proteins: comparison with statistical methods.

    PubMed

    Gromiha, M Michael; Huang, Liang-Tsung

    2011-09-01

    Machine learning algorithms have a wide range of applications in bioinformatics and computational biology. These include prediction of protein secondary structures, solvent accessibility, binding site residues in protein complexes, protein folding rates and stability of mutant proteins, as well as discrimination of proteins based on their structure and function. In this work, we focus on two aspects of prediction: (i) protein folding rates and (ii) stability of proteins upon mutation. We briefly introduce the concepts of protein folding rates and stability along with available databases, features for prediction methods and measures of prediction performance. Subsequently, the development of structure-based parameters and their relationship with protein folding rates will be outlined. The structure-based parameters are helpful for understanding the physical basis of protein folding and stability. Further, the basic principles of major machine learning techniques will be described and their applications for predicting protein folding rates and stability of mutant proteins will be illustrated. The machine learning techniques could achieve the highest accuracy in predicting protein folding rates and stability. In essence, statistical methods and machine learning algorithms complement each other for understanding and predicting protein folding rates and the stability of protein mutants. The available online resources on protein folding rates and stability will be listed. PMID:21787301

  14. Prediction of core cancer genes using a hybrid of feature selection and machine learning methods.

    PubMed

    Liu, Y X; Zhang, N N; He, Y; Lun, L J

    2015-01-01

    Machine learning techniques are of great importance in the analysis of microarray expression data, and provide a systematic and promising way to predict core cancer genes. In this study, a hybrid strategy based on machine learning techniques was introduced to select a small set of informative genes, with the aim of improving classification accuracy. First, feature filtering algorithms were applied to select a set of top-ranked genes; then hierarchical clustering and the collapsing of dense clusters were used to select core cancer genes. An empirical study shows that our approach can select relatively few core cancer genes while making high-accuracy predictions. The biological significance of these genes was evaluated using systems biology analysis. Extensive functional pathway and network analyses confirmed findings of previous studies and can bring new insights into common cancer mechanisms. PMID:26345818
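
    The two-stage strategy, filter ranking followed by clustering and collapsing, can be sketched as below. Several choices here are assumptions made for illustration:
    - The filter score is a simple standardized difference of class means.
    - The clustering is single-linkage with correlation distance 1 − |r|, cut at a fixed threshold and computed as connected components with union-find.
    - Each cluster is collapsed to its best-ranked gene.
    The paper's actual filters and collapsing rule may differ, and the expression values are toy data.

```python
import math
import statistics


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def filter_score(values, labels):
    """Absolute difference of class means scaled by the overall standard deviation."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.pstdev(values) or 1e-12
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread


def select_core_genes(expr, labels, top_k, max_distance):
    """expr maps gene name -> expression values (one per sample)."""
    ranked = sorted(expr, key=lambda g: filter_score(expr[g], labels), reverse=True)[:top_k]
    parent = {g: g for g in ranked}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    # Single-linkage clusters cut at max_distance are the connected components
    # of the graph joining genes whose distance 1 - |r| is small enough.
    for i, a in enumerate(ranked):
        for b in ranked[i + 1:]:
            if 1 - abs(pearson(expr[a], expr[b])) <= max_distance:
                parent[find(a)] = find(b)
    # Collapse each cluster to its highest-ranked member.
    core, seen = [], set()
    for g in ranked:
        root = find(g)
        if root not in seen:
            seen.add(root)
            core.append(g)
    return core


labels = [0, 0, 0, 1, 1, 1]
expr = {"A": [1, 1, 2, 5, 6, 5], "B": [3, 3, 5, 11, 13, 11],  # B = 2*A + 1
        "C": [5, 1, 3, 2, 6, 4], "D": [0, 1, 0, 3, 3, 4]}
core = select_core_genes(expr, labels, top_k=3, max_distance=0.05)
print(core)
```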

  15. Identifying relatively high-risk group of coronary artery calcification based on progression rate: statistical and machine learning methods.

    PubMed

    Kim, Ha-Young; Yoo, Sanghyun; Lee, Jihyun; Kam, Hye Jin; Woo, Kyoung-Gu; Choi, Yoon-Ho; Sung, Jidong; Kang, Mira

    2012-01-01

    Coronary artery calcification (CAC) score is an important predictor of coronary artery disease (CAD), which is the primary cause of death in advanced countries. Early prediction of a high risk of CAC based on progression rate enables people to prevent CAD from developing into severe symptoms and diseases. In this study, we developed various classifiers to identify patients at high risk of CAC using statistical and machine learning methods, and compared their performance. For the statistical approaches, a linear regression based classifier and a logistic regression model were developed. For the machine learning approaches, we proposed three kinds of ensemble-based classifiers (best, top-k, and voting method) to deal with the imbalanced distribution of our data set. The ensemble voting method outperformed all other methods, including the regression methods, with an AUC of 0.781. PMID:23366360
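
    A hard majority-voting ensemble, the general idea behind a voting method, fits in a few lines. The three base classifiers here are hypothetical threshold rules on two made-up risk features. In practice they would be trained models; the abstract does not specify the paper's base learners or its exact voting scheme.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by most base classifiers."""
    votes = Counter(clf(sample) for clf in classifiers)
    # most_common() orders equal counts by first appearance, so ties go to the
    # earliest classifier's label; an odd number of voters avoids binary ties.
    return votes.most_common(1)[0][0]


# Hypothetical base classifiers on (baseline_score, age); 1 = high risk.
base_classifiers = [
    lambda s: int(s[0] > 100),
    lambda s: int(s[1] > 60),
    lambda s: int(s[0] + s[1] > 150),
]
print(majority_vote(base_classifiers, (120, 50)))  # prints 1
```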

  16. Estimating Corn Yield in the United States with MODIS EVI and Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Kuwata, K.; Shibasaki, R.

    2016-06-01

    Satellite remote sensing is commonly used to monitor crop yield in wide areas. Because many parameters are necessary for crop yield estimation, modelling the relationships between parameters and crop yield is generally complicated. Several methodologies using machine learning have been proposed to solve this issue, but the accuracy of county-level estimation remains to be improved. In addition, estimating county-level crop yield across an entire country has not yet been achieved. In this study, we applied a deep neural network (DNN) to estimate corn yield. We evaluated the estimation accuracy of the DNN model by comparing it with other models trained by different machine learning algorithms. We also prepared two time-series datasets differing in duration and confirmed the feature extraction performance of models by inputting each dataset. As a result, the DNN estimated county-level corn yield for the entire area of the United States with a determination coefficient (R2) of 0.780 and a root mean square error (RMSE) of 18.2 bushels/acre. In addition, our results showed that estimation models that were trained by a neural network extracted features from the input data better than an existing machine learning algorithm.
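
    The two accuracy measures reported above can be computed as follows. This sketch uses the definition R² = 1 − SS_res/SS_tot. Some studies instead report the squared Pearson correlation between estimates and observations, and the abstract does not say which form was used. The numbers in the usage line are toy values, not county yields.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


obs, pred = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
print(rmse(obs, pred), r_squared(obs, pred))  # 0.5 0.8
```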

  17. Daily streamflow forecasting by machine learning methods with weather and climate inputs

    NASA Astrophysics Data System (ADS)

    Rasouli, Kabir; Hsieh, William W.; Cannon, Alex J.

    2012-01-01

    Weather forecast data generated by the NOAA Global Forecasting System (GFS) model, climate indices, and local meteo-hydrologic observations were used to forecast daily streamflows for a small watershed in British Columbia, Canada, at lead times of 1-7 days. Three machine learning methods - Bayesian neural network (BNN), support vector regression (SVR) and Gaussian process (GP) - were used and compared with multiple linear regression (MLR). The nonlinear models generally outperformed MLR, and BNN tended to slightly outperform the other nonlinear models. Among various combinations of predictors, local observations plus the GFS output were generally best at shorter lead times, while local observations plus climate indices were best at longer lead times. The climate indices selected include the sea surface temperature in the Niño 3.4 region, the Pacific-North American teleconnection (PNA), the Arctic Oscillation (AO) and the North Atlantic Oscillation (NAO). In the binary forecasts for extreme (high) streamflow events, the best predictors to use were the local observations plus GFS output. Interestingly, climate indices contribute to daily streamflow forecast scores during longer lead times of 5-7 days, but not to forecast scores for extreme streamflow events for all lead times studied (1-7 days).

  18. Integrating Symbolic and Statistical Methods for Testing Intelligent Systems Applications to Machine Learning and Computer Vision

    SciTech Connect

    Jha, Sumit Kumar; Pullum, Laura L; Ramanathan, Arvind

    2016-01-01

    Embedded intelligent systems ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing to. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.
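
    The statistical side of this idea, checking whether tiny input perturbations change a detector's output, can be sketched as a Monte Carlo test. This is a simplified, purely statistical stand-in that omits the paper's symbolic decision procedures. The toy "detector" (a mean-brightness threshold), the perturbation size, and the use of a one-sided Hoeffding bound on the failure rate are all illustrative assumptions.

```python
import math
import random


def robustness_check(detector, frame, epsilon, trials=1000, delta=0.05, seed=0):
    """Estimate how often bounded pixel noise flips the detector's decision.

    Returns the observed flip rate and an upper confidence bound that holds
    with probability at least 1 - delta (one-sided Hoeffding inequality).
    """
    rng = random.Random(seed)
    reference = detector(frame)
    flips = 0
    for _ in range(trials):
        # Perturb each 8-bit pixel by at most +/- epsilon, clipped to [0, 255].
        perturbed = [min(255, max(0, p + rng.randint(-epsilon, epsilon))) for p in frame]
        if detector(perturbed) != reference:
            flips += 1
    rate = flips / trials
    bound = rate + math.sqrt(math.log(1 / delta) / (2 * trials))
    return rate, min(1.0, bound)


def toy_detector(frame):
    """Hypothetical detector: 'human present' if mean brightness >= 128."""
    return sum(frame) / len(frame) >= 128


robust_rate, robust_bound = robustness_check(toy_detector, [200] * 100, epsilon=2)
fragile_rate, _ = robustness_check(toy_detector, [128] * 100, epsilon=2)
print(robust_rate, round(robust_bound, 3), fragile_rate)
```

    The frame far from the decision boundary never flips. The frame sitting on the boundary flips roughly half the time, which is the kind of fragility a test like this is meant to expose.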

  19. Stacked Extreme Learning Machines.

    PubMed

    Zhou, Hongming; Huang, Guang-Bin; Lin, Zhiping; Wang, Han; Soh, Yeng Chai

    2015-09-01

    Extreme learning machine (ELM) has recently attracted many researchers' interest due to its very fast learning speed, good generalization ability, and ease of implementation. It provides a unified solution that can be used directly to solve regression, binary, and multiclass classification problems. In this paper, we propose stacked ELMs (S-ELMs), which are specially designed for solving large and complex data problems. S-ELMs divide a single large ELM network into multiple small ELMs that are serially connected. S-ELMs can approximate a very large ELM network with a small memory requirement. To further improve the testing accuracy on big data problems, the ELM autoencoder can be applied during each iteration of the S-ELMs algorithm. The simulation results show that S-ELMs, even with random hidden nodes, can achieve testing accuracy similar to that of a support vector machine (SVM) while having low memory requirements. With the help of the ELM autoencoder, S-ELMs can achieve much better testing accuracy than SVM and slightly better accuracy than a deep belief network (DBN), with much faster training speed. PMID:25361517
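
    The core ELM idea is that the hidden-layer weights are random and never trained, and only the output weights are solved by (regularized) least squares. The sketch below shows this for single-output regression. It is a plain single ELM, not the stacked S-ELMs or the ELM autoencoder. The sigmoid activation, the ridge term, the weight ranges, and the toy target y = x² are illustrative assumptions.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class ELM:
    def __init__(self, n_inputs, n_hidden, ridge=1e-4, seed=0):
        rng = random.Random(seed)
        # Input weights and biases are drawn once at random and never trained.
        self.w = [[rng.uniform(-4, 4) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-4, 4) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        """Sigmoid activations of the random hidden layer."""
        return [1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        k = len(self.b)
        # Only the output weights are learned: (H^T H + ridge * I) beta = H^T y.
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve(hth, hty)
        return self

    def predict(self, x):
        return sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))


xs = [[-1 + 2 * i / 39] for i in range(40)]
ys = [x[0] ** 2 for x in xs]
model = ELM(n_inputs=1, n_hidden=25).fit(xs, ys)
rmse = math.sqrt(sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
print(f"training RMSE: {rmse:.4f}")
```

    Training reduces to a single linear solve, which is where ELM's speed comes from. S-ELMs address the memory cost of that solve when the hidden layer becomes very large.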

  20. Machine learning for medical images analysis.

    PubMed

    Criminisi, A

    2016-10-01

    This article discusses the application of machine learning to the analysis of medical images. Specifically: (i) we show how a special type of learning model can be thought of as an automatically optimized, hierarchically structured, rule-based algorithm, and (ii) we discuss how the issue of collecting large labelled datasets applies to conventional algorithms as well as to machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods. PMID:27374127

  1. On the Use of Machine Learning Methods for Characterization of Contaminant Source Zone Architecture

    NASA Astrophysics Data System (ADS)

    Zhang, H.; Mendoza-Sanchez, I.; Christ, J.; Miller, E. L.; Abriola, L. M.

    2011-12-01

    Recent research has identified the importance of DNAPL mass distribution in the evolution of down-gradient contaminant plumes and the control of source zone remediation effectiveness. Advances in the management of sites containing DNAPL source zones, however, are currently limited by the difficulty associated with characterizing subsurface DNAPL source zone 'architecture'. Specifically, knowledge of the ganglia to pool ratio (GTP) has been demonstrated to be useful in the assessment and prediction of system behavior. In this paper, we present an approach to the estimation of a quantity related to GTP, the pool fraction (PF), defined as the percentage of the source zone volume occupied by pools, based on observations of plume concentrations. Here we discuss the development and initial validation of an approach for PF estimation based on machine learning methods. The algorithm is constructed so that, when given new concentration data, it predicts the PF of the associated source zone. An ideal solution would use the concentration signals to estimate a single value for PF. Unfortunately, this problem is not well-posed given the data at our disposal. Thus, we relax the regression approach to one of classification. We quantize pool fraction (i.e., the interval between zero and one) into a number of intervals and employ machine learning methods to use the concentration data to determine the interval containing the PF for a given set of data. This approach is predicated on the assumption that quantities (i.e., features) derived from the concentration data of evolving plumes with similar source zone PFs will in fact be similar to one another. Thus, within the training process we must determine a suitable collection of features and build methods for evaluating and optimizing similarity in feature space that result in high accuracy in terms of predicting the correct PF interval. Moreover, the number and boundaries of these intervals must also be
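
    Recasting the regression as classification amounts to mapping each continuous pool fraction to the index of the interval that contains it. The sketch below assumes four equal-width intervals purely for illustration. The number and placement of the boundaries are design choices in their own right.

```python
import bisect


def pf_to_class(pool_fraction, edges):
    """Return the index of the interval that contains pool_fraction.

    edges are the sorted interior boundaries; with edges [e1, ..., ek] the
    classes are [0, e1), [e1, e2), ..., [ek, 1].
    """
    if not 0.0 <= pool_fraction <= 1.0:
        raise ValueError("pool fraction must lie in [0, 1]")
    return bisect.bisect_right(edges, pool_fraction)


def class_to_interval(index, edges):
    """Inverse lookup: the (lower, upper) bounds of a class index."""
    bounds = [0.0] + list(edges) + [1.0]
    return bounds[index], bounds[index + 1]


EDGES = [0.25, 0.5, 0.75]  # hypothetical: four equal-width PF classes
labels = [pf_to_class(pf, EDGES) for pf in (0.1, 0.3, 0.5, 1.0)]
print(labels)  # [0, 1, 2, 3]
```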

  2. A MACHINE-LEARNING METHOD TO INFER FUNDAMENTAL STELLAR PARAMETERS FROM PHOTOMETRIC LIGHT CURVES

    SciTech Connect

    Miller, A. A.; Bloom, J. S.; Richards, J. W.; Starr, D. L.; Lee, Y. S.; Butler, N. R.; Tokarz, S.; Smith, N.; Eisner, J. A.

    2015-01-10

    A fundamental challenge for wide-field imaging surveys is obtaining follow-up spectroscopic observations: there are >10⁹ photometrically cataloged sources, yet modern spectroscopic surveys are limited to ∼few × 10⁶ targets. As we approach the Large Synoptic Survey Telescope era, new algorithmic solutions are required to cope with the data deluge. Here we report the development of a machine-learning framework capable of inferring fundamental stellar parameters (T_eff, log g, and [Fe/H]) using photometric-brightness variations and color alone. A training set is constructed from a systematic spectroscopic survey of variables with Hectospec/Multi-Mirror Telescope. In sum, the training set includes ∼9000 spectra, for which stellar parameters are measured using the SEGUE Stellar Parameters Pipeline (SSPP). We employed the random forest algorithm to perform a non-parametric regression that predicts T_eff, log g, and [Fe/H] from photometric time-domain observations. Our final optimized model produces a cross-validated rms error (RMSE) of 165 K, 0.39 dex, and 0.33 dex for T_eff, log g, and [Fe/H], respectively. Examining the subset of sources for which the SSPP measurements are most reliable, the RMSE reduces to 125 K, 0.37 dex, and 0.27 dex, respectively, comparable to what is achievable via low-resolution spectroscopy. For variable stars this represents a ≈12%-20% improvement in RMSE relative to models trained with single-epoch photometric colors. As an application of our method, we estimate stellar parameters for ∼54,000 known variables. We argue that this method may convert photometric time-domain surveys into pseudo-spectrographic engines, enabling the construction of extremely detailed maps of the Milky Way, its structure, and history.

  3. A Machine-learning Method to Infer Fundamental Stellar Parameters from Photometric Light Curves

    NASA Astrophysics Data System (ADS)

    Miller, A. A.; Bloom, J. S.; Richards, J. W.; Lee, Y. S.; Starr, D. L.; Butler, N. R.; Tokarz, S.; Smith, N.; Eisner, J. A.

    2015-01-01

    A fundamental challenge for wide-field imaging surveys is obtaining follow-up spectroscopic observations: there are >10⁹ photometrically cataloged sources, yet modern spectroscopic surveys are limited to ~few × 10⁶ targets. As we approach the Large Synoptic Survey Telescope era, new algorithmic solutions are required to cope with the data deluge. Here we report the development of a machine-learning framework capable of inferring fundamental stellar parameters (T_eff, log g, and [Fe/H]) using photometric-brightness variations and color alone. A training set is constructed from a systematic spectroscopic survey of variables with Hectospec/Multi-Mirror Telescope. In sum, the training set includes ~9000 spectra, for which stellar parameters are measured using the SEGUE Stellar Parameters Pipeline (SSPP). We employed the random forest algorithm to perform a non-parametric regression that predicts T_eff, log g, and [Fe/H] from photometric time-domain observations. Our final optimized model produces a cross-validated rms error (RMSE) of 165 K, 0.39 dex, and 0.33 dex for T_eff, log g, and [Fe/H], respectively. Examining the subset of sources for which the SSPP measurements are most reliable, the RMSE reduces to 125 K, 0.37 dex, and 0.27 dex, respectively, comparable to what is achievable via low-resolution spectroscopy. For variable stars this represents a ≈12%-20% improvement in RMSE relative to models trained with single-epoch photometric colors. As an application of our method, we estimate stellar parameters for ~54,000 known variables. We argue that this method may convert photometric time-domain surveys into pseudo-spectrographic engines, enabling the construction of extremely detailed maps of the Milky Way, its structure, and history.

  4. Classification of lung cancer using ensemble-based feature selection and machine learning methods.

    PubMed

    Cai, Zhihua; Xu, Dong; Zhang, Qing; Zhang, Jiexia; Ngai, Sai-Ming; Shao, Jianlin

    2015-03-01

    Lung cancer is one of the leading causes of death worldwide. There are three major types of lung cancer: non-small cell lung cancer (NSCLC), small cell lung cancer (SCLC) and carcinoid. NSCLC is further classified into lung adenocarcinoma (LADC), squamous cell lung cancer (SQCLC) and large cell lung cancer. Many previous studies have shown that DNA methylation markers are emerging as potential lung cancer-specific biomarkers. However, whether a single set of DNA methylation markers can simultaneously distinguish these three types of lung cancer remains unclear. In the present study, ROC (Receiver Operating Characteristic) analysis, RFs (Random Forests) and mRMR (Maximum Relevance and Minimum Redundancy) were used to capture unbiased, informative and compact molecular signatures, followed by machine learning methods to classify LADC, SQCLC and SCLC. As a result, a panel of 16 DNA methylation markers showed high classification power. It achieved accuracies of 86.54% and 84.6% and recalls of 84.37% and 85.5% in the leave-one-out cross-validation (LOOCV) and independent data set test experiments, respectively. In addition, when combined with the incremental feature selection (IFS) strategy, ensemble-based feature selection methods outperformed individual ones in terms of how informative and compact the selected features were. Taken together, the results suggest the effectiveness of the ensemble-based feature selection approach and the possible existence of a common panel of DNA methylation markers among these three types of lung cancer tissue, which would facilitate clinical diagnosis and treatment. PMID:25512221
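
    The incremental feature selection (IFS) strategy mentioned above is commonly described as adding features one at a time in ranked order and keeping the prefix that scores best. A minimal sketch follows. The ranking (e.g. from mRMR) and the `evaluate` callable, which would train and cross-validate a classifier on a feature subset, are hypothetical placeholders, and the toy scores are made up.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Evaluate every prefix of the ranking; return the best prefix, its score and the curve.

    A strict '>' keeps the shortest prefix among equally scoring ones,
    which favours compact marker panels.
    """
    best_subset, best_score, curve = [], float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        curve.append(score)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score, curve


# Toy stand-in for cross-validated accuracy as a function of subset size.
toy_accuracy = {1: 0.60, 2: 0.78, 3: 0.85, 4: 0.85, 5: 0.83}
ranked = ["m1", "m2", "m3", "m4", "m5"]
subset, score, curve = incremental_feature_selection(ranked, lambda s: toy_accuracy[len(s)])
print(subset, score)  # ['m1', 'm2', 'm3'] 0.85
```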

  5. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of
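
    One standard way to act on asymmetric error costs, given a classifier that outputs a probability p of the positive class, is to move the decision threshold. Predicting "positive" has expected cost (1 − p)·C_FP and predicting "negative" has expected cost p·C_FN, so the positive label is the cheaper choice exactly when p > C_FP / (C_FP + C_FN). The sketch below applies this rule as a general illustration, not as the thesis's procedure, and the cost values are hypothetical.

```python
def cost_threshold(cost_false_positive, cost_false_negative):
    """Probability above which predicting the positive class has lower expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def cost_sensitive_label(prob_positive, cost_false_positive, cost_false_negative):
    """True if predicting 'positive' is the cheaper decision in expectation."""
    return prob_positive > cost_threshold(cost_false_positive, cost_false_negative)


# Hypothetical costs: missing a positive case is 10 times worse than a false alarm.
print(round(cost_threshold(1, 10), 4))   # 0.0909
print(cost_sensitive_label(0.2, 1, 10))  # True
```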

  6. Forecasting Urban Water Demand via Machine Learning Methods Coupled with a Bootstrap Rank-Ordered Conditional Mutual Information Input Variable Selection Method

    NASA Astrophysics Data System (ADS)

    Adamowski, J. F.; Quilty, J.; Khalil, B.; Rathinasamy, M.

    2014-12-01

    This paper explores forecasting short-term urban water demand (UWD) (using only historical records) through a variety of machine learning techniques coupled with a novel input variable selection (IVS) procedure. The proposed IVS technique, termed bootstrap rank-ordered conditional mutual information for real-valued signals (brCMIr), is multivariate, nonlinear, nonparametric, and probabilistic. The brCMIr method was tested in a case study using water demand time series for two urban water supply system pressure zones in Ottawa, Canada. It was used to select the most important historical records for each machine learning technique, which then generated forecasts of average and peak UWD for the respective pressure zones at lead times of 1, 3, and 7 days ahead. All lead time forecasts are computed using Artificial Neural Networks (ANN) as the base model, and are compared with Least Squares Support Vector Regression (LSSVR), as well as a novel machine learning method for UWD forecasting: the Extreme Learning Machine (ELM). Results from one-way analysis of variance (ANOVA) and Tukey Honestly Significant Difference (HSD) tests indicate that the LSSVR and ELM models are the best machine learning techniques to pair with brCMIr. However, ELM has significant computational advantages over LSSVR (and ANN) and provides a new and promising technique to explore in UWD forecasting.

  7. Multipolar electrostatics based on the Kriging machine learning method: an application to serine.

    PubMed

    Yuan, Yongna; Mills, Matthew J L; Popelier, Paul L A

    2014-04-01

    A multipolar, polarizable electrostatic method for future use in a novel force field is described. Quantum Chemical Topology (QCT) is used to partition the electron density of a chemical system into atoms, then the machine learning method Kriging is used to build models that relate the multipole moments of the atoms to the positions of their surrounding nuclei. The pilot system serine is used to study both the influence of the level of theory and the set of data generator methods used. The latter consists of: (i) sampling of protein structures deposited in the Protein Data Bank (PDB), or (ii) normal mode distortion along either (a) Cartesian coordinates, or (b) redundant internal coordinates. Wavefunctions for the sampled geometries were obtained at the HF/6-31G(d,p), B3LYP/apc-1, and MP2/cc-pVDZ levels of theory, prior to calculation of the atomic multipole moments by volume integration. The average absolute error (over an independent test set of conformations) in the total atom-atom electrostatic interaction energy of serine, using Kriging models built with the three data generator methods is 11.3 kJ mol⁻¹ (PDB), 8.2 kJ mol⁻¹ (Cartesian distortion), and 10.1 kJ mol⁻¹ (redundant internal distortion) at the HF/6-31G(d,p) level. At the B3LYP/apc-1 level, the respective errors are 7.7 kJ mol⁻¹, 6.7 kJ mol⁻¹, and 4.9 kJ mol⁻¹, while at the MP2/cc-pVDZ level they are 6.5 kJ mol⁻¹, 5.3 kJ mol⁻¹, and 4.0 kJ mol⁻¹. The ranges of geometries generated by the redundant internal coordinate distortion and by extraction from the PDB are much wider than the range generated by Cartesian distortion. The atomic multipole moment and electrostatic interaction energy predictions for the B3LYP/apc-1 and MP2/cc-pVDZ levels are similar, and both are better than the corresponding predictions at the HF/6-31G(d,p) level. PMID:24633774
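
    To show the interpolation mechanics of Kriging in the simplest setting, the sketch below fits a one-dimensional model with a Gaussian correlation function and predicts with the posterior mean. It simplifies the method in several ways:
    - It uses a single input instead of nuclear coordinates.
    - The correlation parameter `theta` is fixed and hypothetical, whereas Kriging models are usually tuned, for example by maximizing the likelihood.
    - The sample mean serves as the constant trend.
    - A tiny "nugget" is added for numerical stability.
    The data are toy values.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gaussian_corr(x1, x2, theta):
    return math.exp(-theta * (x1 - x2) ** 2)


def fit_kriging(xs, ys, theta=1.0, nugget=1e-10):
    """Return a predictor mu + r(x)^T R^{-1} (y - mu) with mu = mean(ys)."""
    mu = sum(ys) / len(ys)
    corr = [[gaussian_corr(a, b, theta) + (nugget if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    weights = solve(corr, [y - mu for y in ys])

    def predict(x):
        return mu + sum(w * gaussian_corr(x, xi, theta) for w, xi in zip(weights, xs))
    return predict


xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 0.8, 0.9, 0.1]
predict = fit_kriging(xs, ys)
print(predict(1.5))
```

    The model reproduces the training values at the sampled points and falls back to the constant trend far from the data, where all correlations vanish.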

  8. Computer-Aided Diagnosis for Breast Ultrasound Using Computerized BI-RADS Features and Machine Learning Methods.

    PubMed

    Shan, Juan; Alam, S Kaisar; Garra, Brian; Zhang, Yingtao; Ahmed, Tahira

    2016-04-01

    This work identifies effective computable features from the Breast Imaging Reporting and Data System (BI-RADS) to develop a computer-aided diagnosis (CAD) system for breast ultrasound. Computerized features corresponding to ultrasound BI-RADS categories were designed and tested using a database of 283 pathology-proven benign and malignant lesions. Features were selected based on classification performance using a "bottom-up" approach for different machine learning methods, including decision tree, artificial neural network, random forest and support vector machine. Using 10-fold cross-validation on the database of 283 cases, the highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.84, from a support vector machine with 77.7% overall accuracy; the highest overall accuracy, 78.5%, was from a random forest with an AUC of 0.83. Lesion margin and orientation were optimal features common to all of the different machine learning methods. These features can be used in CAD systems to help distinguish benign from worrisome lesions. PMID:26806441
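
    The abstract reports 10-fold cross-validation but does not state how the folds were formed. A common choice for imbalanced benign/malignant data is stratified splitting, which keeps class proportions similar in every fold; it is assumed here. A minimal sketch follows, using hypothetical labels.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Return k lists of sample indices with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)  # continue the round-robin so fold sizes stay balanced
    return folds


labels = [0] * 20 + [1] * 10  # hypothetical: 20 benign, 10 malignant
folds = stratified_k_fold(labels, k=10)
print([len(f) for f in folds])  # every fold has 3 samples
```

    Each fold then serves once as the test set while the remaining nine folds are used for training.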

  9. Machine learning applications in genetics and genomics.

    PubMed

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets. PMID:25948244

  10. A Machine Learning Method for Power Prediction on the Mobile Devices.

    PubMed

    Chen, Da-Ren; Chen, You-Shyang; Chen, Lin-Chih; Hsu, Ming-Yang; Chiang, Kai-Feng

    2015-10-01

    Energy profiling and estimation have been popular areas of research in multicore mobile architectures. While short sequences of system calls have been recognized by machine learning as pattern descriptions for anomaly detection, the power consumption of running processes with respect to system-call patterns has not been well studied. In this paper, we propose a fuzzy neural network (FNN) for training and analyzing process execution behaviour with respect to series of system calls, their parameters and their power consumption. On the basis of the patterns of a series of system calls, we develop a power estimation daemon (PED) to analyze and predict the energy consumption of the running process. In the initial stage, PED categorizes sequences of system calls into functional groups and predicts their energy consumption using the FNN. In the operational stage, PED identifies the predefined sequences of system calls invoked by running processes and estimates their energy consumption. PMID:26306877

  11. Machine Learning in Systems Biology

    PubMed Central

    d'Alché-Buc, Florence; Wehenkel, Louis

    2008-01-01

    This supplement contains extended versions of a selected subset of papers presented at the workshop MLSB 2007, Machine Learning in Systems Biology, Evry, France, from September 24 to 25, 2007. PMID:19091048

  12. Machine learning in systems biology.

    PubMed

    d'Alché-Buc, Florence; Wehenkel, Louis

    2008-01-01

    This supplement contains extended versions of a selected subset of papers presented at the workshop MLSB 2007, Machine Learning in Systems Biology, Evry, France, from September 24 to 25, 2007. PMID:19091048

  13. Machine learning-based method for personalized and cost-effective detection of Alzheimer's disease.

    PubMed

    Escudero, Javier; Ifeachor, Emmanuel; Zajicek, John P; Green, Colin; Shearer, James; Pearson, Stephen

    2013-01-01

    Diagnosis of Alzheimer's disease (AD) is often difficult, especially early in the disease process at the stage of mild cognitive impairment (MCI). Yet, it is at this stage that treatment is most likely to be effective, so there would be great advantages in improving the diagnosis process. We describe and test a machine learning approach for personalized and cost-effective diagnosis of AD. It uses locally weighted learning to tailor a classifier model to each patient and computes the sequence of biomarkers most informative or cost-effective to diagnose patients. Using ADNI data, we classified AD versus controls and MCI patients who progressed to AD within a year, against those who did not. The approach performed similarly to considering all data at once, while significantly reducing the number (and cost) of the biomarkers needed to achieve a confident diagnosis for each patient. Thus, it may contribute to a personalized and effective detection of AD, and may prove useful in clinical settings. PMID:22893371
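    The abstract names locally weighted learning but not its weighting scheme. A minimal sketch, assuming a Gaussian kernel over Euclidean distance and a weighted class vote (both assumptions, as are the toy biomarker values), shows what tailoring a classifier to each patient can mean: the estimate for each new patient is built from the training cases nearest to that patient. The cost-aware choice of which biomarker to measure next is not modelled here.

```python
import math

def locally_weighted_probability(train_X, train_y, query, bandwidth=1.0):
    """Estimate P(class 1) for one query patient from nearby training patients.

    Every training case gets a Gaussian weight exp(-d^2 / (2 * bandwidth^2)),
    where d is its Euclidean distance to the query, so the estimate is rebuilt
    around each new patient rather than fitted once for everyone."""
    num = 0.0
    den = 0.0
    for x, y in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        num += w * y
        den += w
    # If every weight underflows to zero, fall back to an uninformative 0.5.
    return num / den if den > 0.0 else 0.5

if __name__ == "__main__":
    # Hypothetical two-biomarker training set: 0 = stable, 1 = converted to AD.
    train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    train_y = [0, 0, 1, 1]
    print(locally_weighted_probability(train_X, train_y, (0.0, 0.5)))
    print(locally_weighted_probability(train_X, train_y, (5.0, 5.5)))
```

    The bandwidth controls how local the model is: a small value trusts only very similar patients, while a large one approaches a single global vote.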

  14. Machine learning methods for the classification of extreme rainfall and hail events

    NASA Astrophysics Data System (ADS)

    Teschl, Reinhard; Süsser-Rechberger, Barbara; Paulitsch, Helmut

    2015-04-01

    In this study, an analysis of a meteorological data set with machine learning tools is presented. The aim was to identify characteristic patterns in different sources of remote sensing data that are associated with hazards like extreme rainfall and hail. The data set originates from a project that was started in 2007 with the goal of documenting and mitigating hail events in the province of Styria, Austria. It consists of three-dimensional weather radar data from a C-band Doppler radar, cloud top temperature information from infrared channels of a weather satellite, and the height of the 0° C isotherm from the forecast of the national weather service. The 3D radar dataset has a spatial resolution of 1 km x 1 km x 1 km, up to a height of 16 km above mean sea level, and a temporal resolution of 5 minutes. The infrared satellite image resolution is about 3 km x 3 km, and the images are updated every 30 minutes. The study area covers approximately 16,000 square kilometers. Various criteria for the occurrence of hail (and its discrimination from heavy rain) have been found and documented in the literature. When we applied these criteria to our data and compared the results with damage reports from an insurance company, we found that the criteria needed adaptation. Here we use supervised learning paradigms to find tailored relationships for the study area, validated on a subset of the data that was not involved in the training process.

  15. Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

    NASA Astrophysics Data System (ADS)

    Beggrow, Elizabeth P.; Ha, Minsu; Nehm, Ross H.; Pearl, Dennis; Boone, William J.

    2013-07-01

    The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices—such as explanation, argumentation, and communication—in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students' written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than the computer-scored explanation measures (r = 0.63). Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students' normative scientific and naive ideas as accurately as human-scored explanations, and (3) more validly detect understanding
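    Rasch analysis is only named in the abstract. As a hedged sketch of the fit statistic it reports, the code below computes one person's outfit mean square under the dichotomous Rasch model, taking ability and item difficulties as already estimated (the estimation step, and any polytomous scoring the study may have used, are not shown). Values near 1 indicate responses consistent with the model; the abstract treats values above 1.3 as poor fit. The abilities, difficulties and responses are hypothetical.

```python
import math

def rasch_probability(theta, b):
    """P(correct response) under the dichotomous Rasch model, with person
    ability theta and item difficulty b both on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def person_outfit(theta, difficulties, responses):
    """Unweighted (outfit) mean square for one person: the average squared
    standardized residual (x - P)^2 / (P * (1 - P)) over the items."""
    total = 0.0
    for b, x in zip(difficulties, responses):
        p = rasch_probability(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(difficulties)

if __name__ == "__main__":
    # Hypothetical person of average ability facing an easy and a hard item.
    print(person_outfit(0.0, [-2.0, 2.0], [1, 0]))  # expected pattern: well below 1
    print(person_outfit(0.0, [-2.0, 2.0], [0, 1]))  # surprising pattern: far above 1.3
```

    Failing the easy item while passing the hard one produces large standardized residuals, which is exactly the kind of misfit the abstract reports for the multiple-choice measures.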

  16. A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli

    PubMed Central

    2014-01-01

    Background Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both the biopharmaceutical and research arenas in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low levels of production and frequent failure for opaque reasons. The problem is accentuated because there is no generic solution available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under given experimental conditions, including temperature, expression host, etc., protein solubility is a feature ultimately determined by the protein's sequence. Numerous machine-learning-based methods have been proposed to predict the solubility of a protein from its amino acid sequence alone. In spite of 20 years of research on the matter, no comprehensive review of the published methods is available. Results This paper presents an extensive review of the existing models to predict protein solubility in the Escherichia coli recombinant protein overexpression system. The models are investigated and compared regarding the datasets used, features, feature selection methods, machine learning techniques and accuracy of prediction. A discussion of the models is provided at the end. Conclusions This study aims to investigate extensively the machine learning based methods to predict recombinant protein solubility, so as to offer both a general and a detailed understanding for researchers in the field. Some of the models present acceptable prediction performance and convenient user interfaces. These models can be considered valuable tools to predict recombinant protein overexpression results before performing real laboratory experiments, thus saving

  17. An introduction to quantum machine learning

    NASA Astrophysics Data System (ADS)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers have investigated whether quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to translating stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  18. Machine Shop. Student Learning Guide.

    ERIC Educational Resources Information Center

    Palm Beach County Board of Public Instruction, West Palm Beach, FL.

    This student learning guide contains eight modules for completing a course in machine shop. It is designed especially for use in Palm Beach County, Florida. Each module covers one task, and consists of a purpose, performance objective, enabling objectives, learning activities and resources, information sheets, student self-check with answer key,…

  19. Missing-Data Estimation for Daily Rainfall in Everglades Florida Using Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Lima, C.; Lall, U.; Landot, T.; Pathak, C.

    2008-05-01

    In the present study we derive a novel model to fill in gaps in daily rainfall data from 43 rainfall stations in South Florida. The filling-in process consists of two stages: prediction of rainfall occurrence and prediction of rainfall amounts. In the first stage we identify the stations with available daily rainfall data and assign 1 to wet states and -1 to dry states. A support vector machine (SVM) is then applied to derive an optimal spatial boundary that defines the spatial pattern of wet and dry states. The station with missing data is classified, based on its spatial location, into the wet or dry class. In the second stage we use historical data from the available stations as predictors for the rainfall amounts at the gauge with missing data. We evaluate three different models to predict rainfall amounts: linear regression, local regression (locfit) and Support Vector Machines. We compare these models with common methods used in the literature, namely ordinary Kriging and nearest neighbor methods. The results show that the methodology proposed here yields accurate estimates of daily rainfall.

  20. Machine Learning Methods for the Sampling of Chemical Space From First Principles

    NASA Astrophysics Data System (ADS)

    von Lilienfeld, Anatole

    2015-03-01

    Computational brute-force high-throughput screening of compounds is beyond any capacity for all but the most restricted systems due to the combinatorial nature of chemical space, i.e. all the compositional, constitutional, and conformational isomers. Efficient computational materials design algorithms must therefore make good trade-offs between the accuracy of the applied model and computational speed. Overall, rapid convergence in terms of number of compounds visited is highly desirable. In this talk, I will describe recent contributions in this field based on statistical approaches that can serve as inexpensive surrogate models to reduce the computational load of quantum mechanical calculations. Such surrogate machine learning (ML) models infer quantum mechanical observables of novel materials, rather than solving approximate variants of Schroedinger's equation. We developed accurate ML models for the rapid prediction of atomization energies and enthalpies, cohesive energies, and electronic properties that conventionally can only be predicted using quantum mechanics. All our ML models have been trained using large databases containing properties of thousands of chemical compounds and materials. I will exemplify our approach for the prediction of properties from scratch for out-of-sample compounds. These predictions reach quantum chemical accuracy and are essentially instantaneous, i.e. at a computational cost reduced by several orders of magnitude.

  1. Application of Geostatistical Methods and Machine Learning for spatio-temporal Earthquake Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Schaefer, A. M.; Daniell, J. E.; Wenzel, F.

    2014-12-01

    Earthquake clustering is an increasingly important part of general earthquake research, especially for seismic hazard assessment and for earthquake forecasting and prediction. Foreshocks, aftershocks, mainshocks and secondary mainshocks are identified and distinguished using a point-based spatio-temporal clustering algorithm originating from the field of classic machine learning. This can be further applied for declustering, to separate background seismicity from triggered seismicity. The results are interpreted and processed to assemble 3D (x, y, t) earthquake clustering maps based on seismicity records smoothed in space and time. In addition, multi-dimensional Gaussian functions are used to capture clustering parameters for spatial distribution and dominant orientations. Clusters are further processed using methodologies originating from geostatistics, which have mostly been applied and developed in mining projects during the last decades. A 2.5D variogram analysis is applied to identify spatio-temporal homogeneity in terms of earthquake density and energy output. The results are then interpolated using Kriging to provide an accurate mapping solution for clustering features. As a case study, seismic data from New Zealand and the United States are used, covering events since the 1950s, from which an earthquake cluster catalogue is assembled for most of the major events, including a detailed analysis of the Landers and Christchurch sequences.

  2. Comparative analysis of expert and machine-learning methods for classification of body cavity effusions in companion animals.

    PubMed

    Hotz, Christine S; Templeton, Steven J; Christopher, Mary M

    2005-03-01

    A rule-based expert system using the CLIPS programming language was created to classify body cavity effusions as transudates, modified transudates, exudates, chylous, and hemorrhagic effusions. The diagnostic accuracy of the rule-based system was compared with that of 2 machine-learning methods: Rosetta, a rough sets algorithm, and RIPPER, a rule-induction method. Results of 508 body cavity fluid analyses (canine, feline, equine) obtained from the University of California-Davis Veterinary Medical Teaching Hospital computerized patient database were used to test CLIPS and to train and test RIPPER and Rosetta. The CLIPS system, using 17 rules, achieved an accuracy of 93.5% compared with pathologist consensus diagnoses. Rosetta accurately classified 91% of effusions by using 5,479 rules. RIPPER achieved the greatest accuracy (95.5%) using only 10 rules. When the original rules of the CLIPS application were replaced with those of RIPPER, the accuracy rates were identical. These results suggest that both rule-based expert systems and machine-learning methods hold promise for the preliminary classification of body fluids in the clinical laboratory. PMID:15825497

  3. Data Processing And Machine Learning Methods For Multi-Modal Operator State Classification Systems

    NASA Technical Reports Server (NTRS)

    Hearn, Tristan A.

    2015-01-01

    This document is intended as an introduction to a set of common signal processing and learning methods that may be used in the software portion of a functional crew state monitoring system. It includes overviews of both the theory of the methods involved and examples of their implementation. Practical considerations are discussed for implementing modular, flexible, and scalable processing and classification software for a multi-modal, multi-channel monitoring system. Example source code is also given for all of the discussed processing and classification methods.

  4. Game-powered machine learning

    PubMed Central

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-01-01

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786

  5. Game-powered machine learning.

    PubMed

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786

  6. Recent Advances in Predictive (Machine) Learning

    SciTech Connect

    Friedman, J

    2004-01-24

    Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In predictive (machine) learning, the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning originated many years ago at the dawn of the computer age. Recently, two new techniques have emerged that have revitalized the field: support vector machines and boosted decision trees. This paper provides an introduction to these two new methods, tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.
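    To make boosted decision trees concrete, here is a minimal sketch of discrete AdaBoost, one classic boosting algorithm (the paper may emphasize other variants, such as gradient boosting), using one-split trees ("stumps") on a single numeric feature with labels +1/-1. Each round fits a stump to reweighted data, and misclassified points gain weight for the next round. The data are a toy pattern that no single stump can separate.

```python
import math

def stump_output(x, threshold, sign):
    """A one-split decision tree: +sign at or above the threshold, -sign below."""
    return sign if x >= threshold else -sign

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) of the best stump on one feature."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_output(x, t, sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds=10):
    """Discrete AdaBoost with stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, t, sign))
        # Misclassified points gain weight, correct ones lose it; then renormalize.
        weights = [w * math.exp(-alpha * y * stump_output(x, t, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps (ties go to +1)."""
    score = sum(alpha * stump_output(x, t, sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [1, 1, -1, -1, 1, 1]  # no single stump can separate this pattern
    model = adaboost(xs, ys, rounds=3)
    print([predict(model, x) for x in xs])
```

    On this toy pattern, three rounds are enough for the weighted vote of stumps to reproduce the labels, even though each individual stump misclassifies at least two points.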

  7. Machine learning: Trends, perspectives, and prospects.

    PubMed

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. PMID:26185243

  8. Machine learning in sedimentation modelling.

    PubMed

    Bhattacharya, B; Solomatine, D P

    2006-03-01

    The paper presents machine learning (ML) models that predict sedimentation in the harbour basin of the Port of Rotterdam. The important factors affecting the sedimentation process, such as waves, wind, tides, surge and river discharge, are studied; the corresponding time series data are analysed, missing values are estimated and the most important variables behind the process are chosen as the inputs. Two ML methods are used: an MLP ANN and the M5 model tree. The latter is a collection of piece-wise linear regression models, each being an expert for a particular region of the input space. The models are trained on data collected during 1992-1998 and tested on data from 1999-2000. The predictive accuracy of the models is found to be adequate for potential use in operational decision making. PMID:16530383
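    The core idea of a model tree, piecewise-linear regression models each responsible for one region of the input space, can be sketched on a single input. The sketch tries every split point, fits an ordinary least-squares line on each side, and keeps the split with the smallest squared error. Real M5 grows deeper trees with a standard-deviation-reduction criterion and then prunes and smooths them; none of that is reproduced here, and the data are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # all x equal: fall back to a constant model
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def sse(xs, ys, line):
    """Sum of squared residuals of a fitted line."""
    a, b = line
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def fit_model_tree_stump(xs, ys, min_leaf=2):
    """One-split model tree: returns (threshold, left_line, right_line)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        left, right = pairs[:i], pairs[i:]
        lx, ly = zip(*left)
        rx, ry = zip(*right)
        left_line, right_line = fit_line(lx, ly), fit_line(rx, ry)
        err = sse(lx, ly, left_line) + sse(rx, ry, right_line)
        if best is None or err < best[0]:
            threshold = (left[-1][0] + right[0][0]) / 2
            best = (err, threshold, left_line, right_line)
    return best[1:]

def predict(tree, x):
    """Route x to its region, then apply that region's linear model."""
    threshold, left_line, right_line = tree
    a, b = left_line if x <= threshold else right_line
    return a + b * x

if __name__ == "__main__":
    xs = list(range(10))
    ys = [x if x < 5 else 20 - 2 * x for x in xs]  # synthetic regime change at x = 5
    tree = fit_model_tree_stump(xs, ys)
    print(tree[0], predict(tree, 2), predict(tree, 7))
```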

  9. Machine learning in motion control

    NASA Technical Reports Server (NTRS)

    Su, Renjeng; Kermiche, Noureddine

    1989-01-01

    The existing methodologies for robot programming originate primarily from robotic applications to manufacturing, where uncertainties of the robots and their task environment may be minimized by repeated off-line modeling and identification. In space applications of robots, however, a higher degree of automation is required for robot programming because of the desire to minimize human intervention. We discuss a new paradigm of robot programming based on the concept of machine learning. The goal is to let robots practice tasks by themselves, with the operational data used to automatically improve their motion performance. The underlying mathematical problem is solving the dynamical inverse problem by iterative methods. One of the key questions is how to ensure the convergence of the iterative process. Only a few small steps have been taken toward this important approach to robot programming. We give a representative result on the convergence problem.
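    The abstract does not name its iterative scheme. One well-known instance of letting a robot practice a task is P-type iterative learning control: the input for the next trial is the previous input plus a gain times the tracking error it produced. The sketch below applies this to a hypothetical first-order discrete plant y[t+1] = a*y[t] + b*u[t]. For this plant, the classic convergence condition on the learning gain gamma is |1 - gamma*b| < 1, which speaks to the convergence question the authors raise. The plant, gains and reference trajectory are illustrative assumptions.

```python
def simulate(u, a=0.5, b=1.0):
    """Hypothetical plant y[t+1] = a*y[t] + b*u[t], starting from y[0] = 0.
    Returns the outputs y[1..N] produced by the input sequence u[0..N-1]."""
    y = 0.0
    out = []
    for ut in u:
        y = a * y + b * ut
        out.append(y)
    return out

def iterative_learning(y_desired, trials=40, gamma=0.8, a=0.5, b=1.0):
    """P-type iterative learning control: after each practice trial, correct
    every input sample by gamma times the tracking error it caused one step later.
    With these defaults |1 - gamma*b| = 0.2 < 1, so the error converges."""
    u = [0.0] * len(y_desired)
    history = []  # worst-case tracking error observed in each trial
    for _ in range(trials):
        y = simulate(u, a, b)
        e = [yd - yk for yd, yk in zip(y_desired, y)]
        history.append(max(abs(v) for v in e))
        u = [ut + gamma * et for ut, et in zip(u, e)]
    return u, history

if __name__ == "__main__":
    target = [1.0] * 10  # hypothetical step reference over 10 samples
    _, errors = iterative_learning(target)
    print(errors[0], errors[-1])
```

    Choosing a gain that violates the condition, for example gamma = 2.5 with b = 1, makes the same loop diverge from trial to trial.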

  10. Blind steganalysis method for JPEG steganography combined with the semisupervised learning and soft margin support vector machine

    NASA Astrophysics Data System (ADS)

    Dong, Yu; Zhang, Tao; Xi, Ling

    2015-01-01

    Stego images embedded by unknown steganographic algorithms may currently go undetected by steganalysis detectors based on binary classifiers. However, it is difficult to obtain high detection accuracy with universal steganalysis based on a one-class classifier. To solve this problem, a blind detection method for JPEG steganography was proposed from the perspective of information theory. The proposed method combines semisupervised learning and a soft margin support vector machine with a steganalysis detector based on a one-class classifier, using the information in the test data to improve detection performance. Reliable blind detection of JPEG steganography was achieved using only cover images for training. The experimental results show that the proposed method can improve the detection accuracy of a steganalysis detector based on a one-class classifier and is robust under different source mismatch conditions.

  11. Prediction of Backbreak in Open-Pit Blasting Operations Using the Machine Learning Method

    NASA Astrophysics Data System (ADS)

    Khandelwal, Manoj; Monjezi, M.

    2013-03-01

    Backbreak is an undesirable phenomenon in blasting operations. It can cause instability of mine walls, falling of machinery, improper fragmentation, reduced drilling efficiency, etc. The existence of various influential parameters and their unknown relationships are the main reasons for the inaccuracy of empirical models. Presently, the application of new approaches such as artificial intelligence is highly recommended. In this paper, an attempt has been made to predict backbreak in blasting operations at the Soungun iron mine, Iran, incorporating rock properties and blast design parameters, using the support vector machine (SVM) method. To investigate the suitability of this approach, the predictions by SVM have been compared with multivariate regression analysis (MVRA). The coefficient of determination (CoD) and the mean absolute error (MAE) were taken as performance measures. The CoD between measured and predicted backbreak was 0.987 for SVM and 0.89 for MVRA, whereas the MAE was 0.29 for SVM and 1.07 for MVRA.
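    The two performance measures can be computed as below. This sketch assumes CoD means 1 - SS_res/SS_tot; some studies instead report the squared Pearson correlation between measured and predicted values, which gives the same number only for least-squares fits with an intercept. The numbers are toy values, not the Soungun data.

```python
def coefficient_of_determination(measured, predicted):
    """CoD = 1 - SS_res / SS_tot, with SS_tot taken about the mean of the measurements."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0]   # hypothetical measured backbreak
    predicted = [2.0, 2.0, 3.0, 5.0]  # hypothetical model predictions
    print(coefficient_of_determination(measured, predicted))
    print(mean_absolute_error(measured, predicted))
```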

  12. Remotely controlling of mobile robots using gesture captured by the Kinect and recognized by machine learning method

    NASA Astrophysics Data System (ADS)

    Hsu, Roy CHaoming; Jian, Jhih-Wei; Lin, Chih-Chuan; Lai, Chien-Hung; Liu, Cheng-Ting

    2013-01-01

    The main purpose of this paper is to use a machine learning method together with the Kinect and its body-sensing technology to design a simple, convenient, yet effective robot remote control system. In this study, a Kinect sensor is used to capture the human body skeleton with depth information, and a gesture training and identification method is designed using a back-propagation neural network to remotely command a mobile robot to perform certain actions via Bluetooth. The experimental results show that the designed mobile robot remote control system achieves, on average, more than 96% accurate identification of 7 types of gestures and can effectively control a real e-puck robot with the designed commands.

  13. Machine learning methods for empirical streamflow simulation: a comparison of model accuracy, interpretability, and uncertainty in seasonal watersheds

    NASA Astrophysics Data System (ADS)

    Shortridge, Julie E.; Guikema, Seth D.; Zaitchik, Benjamin F.

    2016-07-01

    In the past decade, machine learning methods for empirical rainfall-runoff modeling have seen extensive development and been proposed as a useful complement to physical hydrologic models, particularly in basins where data to support process-based models are limited. However, the majority of research has focused on a small number of methods, such as artificial neural networks, despite the development of multiple other approaches for non-parametric regression in recent years. Furthermore, this work has often evaluated model performance based on predictive accuracy alone, while not considering broader objectives, such as model interpretability and uncertainty, that are important if such methods are to be used for planning and management decisions. In this paper, we use multiple regression and machine learning approaches (including generalized additive models, multivariate adaptive regression splines, artificial neural networks, random forests, and M5 cubist models) to simulate monthly streamflow in five highly seasonal rivers in the highlands of Ethiopia and compare their performance in terms of predictive accuracy, error structure and bias, model interpretability, and uncertainty when faced with extreme climate conditions. While the relative predictive performance of models differed across basins, data-driven approaches were able to achieve reduced errors when compared to physical models developed for the region. Methods such as random forests and generalized additive models may have advantages in terms of visualization and interpretation of model structure, which can be useful in providing insights into physical watershed function. However, the uncertainty associated with model predictions under extreme climate conditions should be carefully evaluated, since certain models (especially generalized additive models and multivariate adaptive regression splines) become highly variable when faced with high temperatures.

  14. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    PubMed

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, incorporating the great spatial heterogeneity characteristic of urban environments seems to be essential for accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which makes LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used for feature selection or data reduction: two feature-selection techniques, (i) correlation-based feature-subset selection (CFS) and (ii) wrapper for feature-subset selection (WFS), and one data-reduction technique, (iii) principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with SMO or GPR as the regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). PMID:25461071
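    The target descriptor LAeq is the energy average, not the arithmetic average, of the A-weighted sound-pressure level over the measurement period. For N equally spaced samples L_i in dB(A), LAeq = 10·log10((1/N)·Σ 10^(L_i/10)). The sketch below assumes equally spaced, already A-weighted samples; the sample values are hypothetical.

```python
import math

def laeq(levels_db):
    """Energy-equivalent level of equally spaced A-weighted samples in dB(A):
    convert each level to relative energy, average, convert back to decibels."""
    if not levels_db:
        raise ValueError("need at least one sample")
    mean_energy = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)

if __name__ == "__main__":
    # Half the period at 60 dB(A) and half at 70 dB(A) gives about 67.4 dB(A),
    # well above the arithmetic mean of 65 dB(A): loud intervals dominate.
    print(round(laeq([60.0, 70.0]), 1))
```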

  15. New Learning Method of a Lecture of ‘Machine Fabrication’ by Self-study with Investigation and Presentation Incorporated

    NASA Astrophysics Data System (ADS)

    Kasuga, Yukio

    A new teaching method was developed for learning ‘machine fabrication’ in an undergraduate course. It consists of a few lectures, formation of student groups, selection by each group of an industrial product to investigate, investigation using library books and the internet, compilation of data on the products' characteristics, materials and processing methods, a presentation, discussion and revision, and a second presentation. The method is derived from an approach used in Finnish primary-school education, which is believed to have lifted Finland to the top ranks of the OECD's PISA tests. After the new way of learning was introduced, students reported fresh impressions of the course, especially of self-study, investigation, collaborative work and presentation. After four years of implementation, some improvements have been made, including less use of the internet and deciding in advance which products and fabrication methods should be investigated. With these changes, students' lecture assessments show further encouraging results.

  16. A machine learning method for extracting symbolic knowledge from recurrent neural networks.

    PubMed

    Vahed, A; Omlin, C W

    2004-01-01

    Neural networks do not readily provide an explanation of the knowledge stored in their weights as part of their information processing. Until recently, neural networks were considered to be black boxes, with the knowledge stored in their weights not readily accessible. Since then, research has resulted in a number of algorithms for extracting knowledge in symbolic form from trained neural networks. This article addresses the extraction of knowledge in symbolic form from recurrent neural networks trained to behave like deterministic finite-state automata (DFAs). To date, methods used to extract knowledge from such networks have relied on the hypothesis that networks' states tend to cluster and that clusters of network states correspond to DFA states. The computational complexity of such a cluster analysis has led to heuristics that either limit the number of clusters that may form during training or limit the exploration of the space of hidden recurrent state neurons. These limitations, while necessary, may lead to decreased fidelity, in which the extracted knowledge may not model the true behavior of a trained network, perhaps not even for the training set. The method proposed here uses a polynomial time, symbolic learning algorithm to infer DFAs solely from the observation of a trained network's input-output behavior. Thus, this method has the potential to increase the fidelity of the extracted knowledge. PMID:15006023
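
    The abstract does not name its polynomial-time DFA learner. A common first step in state-merging DFA inference (for example, RPNI) is to build a prefix-tree acceptor from labelled input-output observations; the sketch below shows only that step, with a hypothetical parity oracle standing in for queries to a trained recurrent network.

```python
from itertools import product


def build_pta(samples):
    """Build a prefix-tree acceptor from (word, accepted) pairs.

    State 0 is the root (empty prefix). Returns (transitions, labels), where
    transitions[(state, symbol)] -> state and labels[state] -> bool for every
    state reached by a labelled sample.
    """
    transitions, labels = {}, {}
    next_state = 1
    for word, accepted in samples:
        state = 0
        for sym in word:
            if (state, sym) not in transitions:
                transitions[(state, sym)] = next_state
                next_state += 1
            state = transitions[(state, sym)]
        labels[state] = accepted
    return transitions, labels


def run(transitions, labels, word):
    """Verdict of the tree for word, or None if the word was never observed."""
    state = 0
    for sym in word:
        state = transitions.get((state, sym))
        if state is None:
            return None
    return labels.get(state)


# Hypothetical stand-in for querying a trained network: accept words with an even number of 1s.
def network_oracle(word):
    return word.count("1") % 2 == 0


words = ["".join(p) for n in range(4) for p in product("01", repeat=n)]
transitions, labels = build_pta((w, network_oracle(w)) for w in words)
```

    The tree only memorises what was observed; a state-merging algorithm would then try to fold compatible states together, and for this oracle the target is the two-state parity automaton.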

  17. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  18. Entanglement-Based Machine Learning on a Quantum Computer

    NASA Astrophysics Data System (ADS)

    Cai, X.-D.; Wu, D.; Su, Z.-E.; Chen, M.-C.; Wang, X.-L.; Li, Li; Liu, N.-L.; Lu, C.-Y.; Pan, J.-W.

    2015-03-01

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance and is ubiquitous in fields such as computer science, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv:1307.0411] were proposed that could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.

  19. Entanglement-based machine learning on a quantum computer.

    PubMed

    Cai, X-D; Wu, D; Su, Z-E; Chen, M-C; Wang, X-L; Li, Li; Liu, N-L; Lu, C-Y; Pan, J-W

    2015-03-20

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance and is ubiquitous in fields such as computer science, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv:1307.0411] were proposed that could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning. PMID:25839250
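
    As we read the cited proposal by Lloyd, Mohseni and Rebentrost, the quantity the quantum routine estimates is the distance from a new vector to the centroid of each cluster. The purely classical computation it stands in for is nearest-centroid assignment, sketched below; this is not the photonic protocol, and the four-dimensional clusters are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(u, clusters):
    """Assign u to the cluster (name -> list of vectors) with the nearest centroid."""
    return min(clusters, key=lambda name: distance(u, centroid(clusters[name])))


# Hypothetical four-dimensional training clusters.
clusters = {
    "A": [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]],
    "B": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.8, 0.2]],
}
print(classify([0.9, 0.1, 0.1, 0.0], clusters))  # A
```

    Classically, each distance costs time proportional to the vector dimension and the cluster size; the proposed speedup concerns estimating these distances on quantum-encoded vectors.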

  20. Vitrification: Machines learn to recognize glasses

    NASA Astrophysics Data System (ADS)

    Ceriotti, Michele; Vitelli, Vincenzo

    2016-05-01

    The dynamics of a viscous liquid undergo a dramatic slowdown when it is cooled to form a solid glass. Recognizing the structural changes across such a transition remains a major challenge. Machine-learning methods, similar to those Facebook uses to recognize groups of friends, have now been applied to this problem.

  1. Machine learning phases of matter

    NASA Astrophysics Data System (ADS)

    Carrasquilla, Juan; Stoudenmire, Miles; Melko, Roger

    We show how the technology that allows automatic teller machines to read handwritten digits on cheques can be used to encode and recognize phases of matter and phase transitions in many-body systems. In particular, we analyze the (quasi-)order-disorder transitions in the classical Ising and XY models. Furthermore, we successfully use machine learning to study classical Z2 gauge theories, which have important technological applications in the coming wave of quantum information technologies and whose phase transitions have no conventional order parameter.
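
    The abstract describes training networks on spin configurations. The sketch below covers only the data-generation side, under stated assumptions: a 2D Ising model with J = 1 and periodic boundaries, single-spin Metropolis updates, and phase labels taken from Onsager's exact critical temperature. Lattice size, sweep counts and seeds are arbitrary choices, and the classifier itself is omitted.

```python
import math
import random


def metropolis_ising(L, T, sweeps, seed=0):
    """Sample an L x L Ising configuration (J = 1, periodic boundaries) with Metropolis updates.

    Starts from the all-up ordered state and returns a list of rows of +1/-1 spins.
    """
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        neighbours = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        d_e = 2 * spins[i][j] * neighbours  # energy change if spin (i, j) flips
        if d_e <= 0 or rng.random() < math.exp(-d_e / T):
            spins[i][j] = -spins[i][j]
    return spins


def magnetization(spins):
    """Absolute magnetization per spin."""
    n = len(spins) * len(spins[0])
    return abs(sum(map(sum, spins))) / n


T_C = 2 / math.log(1 + math.sqrt(2))  # Onsager's critical temperature (J = k_B = 1)

# Labelled configurations for a phase classifier: 1 = ordered (T < T_C), 0 = disordered.
dataset = [(metropolis_ising(8, T, sweeps=50, seed=s), int(T < T_C))
           for s, T in enumerate([1.0, 1.5, 3.5, 5.0])]
```

    Each configuration would be flattened into a vector of L*L spins, much as a handwritten digit is flattened into pixels, before being fed to the network.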

  2. A Comparison of Methods for Classifying Clinical Samples Based on Proteomics Data: A Case Study for Statistical and Machine Learning Approaches

    PubMed Central

    Sampson, Dayle L.; Parker, Tony J.; Upton, Zee; Hurst, Cameron P.

    2011-01-01

    The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so-called “omics” disciplines of the biological sciences. Such variability is uncovered by implementation of multivariable data mining techniques, which fall into two primary categories: machine learning strategies and statistically based approaches. Proteomic studies typically produce hundreds or thousands of variables, p, for each of n observations, depending on the analytical platform or method employed to generate the data. Many classification methods break down when n≪p and, as such, require pre-treatment to reduce the dimensionality prior to classification. Recently, machine learning techniques have gained popularity in the field for their ability to successfully classify unknown samples. One limitation of such methods is the lack of a functional model allowing meaningful interpretation of results in terms of the features used for classification. This problem might be solved with a statistical model-based approach, in which the importance of each individual protein is explicit and the proteins are combined into a readily interpretable classification rule without relying on a black-box approach. Here we apply the statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA), followed by both statistical and machine learning classification methods, and compare them to a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems. PMID:21969867
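
    The dimension-reduction step can be illustrated with PCA. This minimal sketch finds the first principal component by power iteration on the sample covariance matrix and returns each observation's score on it, which could then feed a classifier; the data are hypothetical. With p in the thousands one would instead use an SVD routine such as numpy.linalg.svd rather than forming the p x p covariance matrix as done here.

```python
import math
import random


def first_principal_component(X, iters=200, seed=0):
    """Leading principal component of X (list of rows) and the per-row scores on it."""
    n, p = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, means)] for row in X]  # centre each variable
    # p x p sample covariance matrix
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):  # power iteration converges to the top eigenvector
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[a] * v[a] for a in range(p)) for r in Xc]
    return v, scores


# Hypothetical two-variable data lying close to the line y = x.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.05], [4.0, 3.95], [-1.0, -1.0]]
v, scores = first_principal_component(X)
```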

  3. Learning Extended Finite State Machines

    NASA Technical Reports Server (NTRS)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSMs), combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions that the symbolic constraints provided by a tree query must satisfy to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  4. Learning Machine Learning: A Case Study

    ERIC Educational Resources Information Center

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  5. Solar Flare Forecasting Using Time Series of SDO/HMI Vector Magnetic Field Data and Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Ilonidis, Stathis; Bobra, Monica G.; Couvidat, Sebastien

    2015-04-01

    This project is motivated by the need to understand the physical mechanisms that generate solar flares and to assess whether reliable data-driven flare forecasts are possible. We build a flare forecasting model that takes into account the temporal evolution of the active regions and provides improved forecasts for the next 24 hours. We use SDO/HMI vector magnetic field data for all flaring regions of magnitude M1.0 or higher that have been observed with HMI, as well as for several thousand non-flaring regions. Each region is characterized by hundreds of features, including physical properties such as the current helicity and the Lorentz force, as well as parameters that describe the temporal evolution of these properties over a two-day interval, starting 3 days and ending 1 day before the flare eruption. All of these features were used to train a Support Vector Machine (SVM), a supervised machine learning method used in classification problems. The results show that the SVM algorithm can achieve a True Skill Statistic of 0.91, an accuracy of 0.985, and a Heidke skill score of 0.861, improving on the results of Bobra and Couvidat (2015).
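
    The three reported scores all derive from the 2x2 table of forecasts versus outcomes. A minimal sketch, assuming the standard definitions of the True Skill Statistic and the common form of the Heidke skill score; the counts are hypothetical and not taken from the paper.

```python
def skill_scores(tp, fn, fp, tn):
    """Return (accuracy, TSS, HSS) for a 2x2 forecast contingency table.

    tp: flaring regions forecast to flare, fn: flaring regions missed,
    fp: false alarms, tn: non-flaring regions correctly forecast.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # True Skill Statistic: hit rate minus false-alarm rate.
    tss = tp / (tp + fn) - fp / (fp + tn)
    # Heidke skill score: improvement over a random forecast with the same marginals.
    hss = 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return accuracy, tss, hss


# Hypothetical counts, not the paper's results.
acc, tss, hss = skill_scores(tp=90, fn=10, fp=50, tn=850)
print(round(acc, 3), round(tss, 3), round(hss, 3))  # 0.94 0.844 0.717
```

    Because non-flaring regions vastly outnumber flaring ones, accuracy alone can look high for a forecaster that rarely predicts flares; TSS does not depend on the class ratio, which is why it is often preferred for this problem.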

  6. Patient-centered yes/no prognosis using learning machines

    PubMed Central

    König, I.R.; Malley, J.D.; Pajevic, S.; Weimar, C.; Diener, H-C.

    2009-01-01

    In the last 15 years several machine learning approaches have been developed for classification and regression. In an intuitive manner we introduce the main ideas of classification and regression trees, support vector machines, bagging, boosting and random forests. We discuss differences in the use of machine learning in the biomedical community and the computer sciences. We propose methods for comparing machines on a sound statistical basis. Data from the German Stroke Study Collaboration is used for illustration. We compare the results from learning machines to those obtained by a published logistic regression and discuss similarities and differences. PMID:19216340

  7. The Higgs Machine Learning Challenge

    NASA Astrophysics Data System (ADS)

    Adam-Bourdarios, C.; Cowan, G.; Germain-Renaud, C.; Guyon, I.; Kégl, B.; Rousseau, D.

    2015-12-01

    The Higgs Machine Learning Challenge was an open data analysis competition that took place between May and September 2014. Samples of simulated data from the ATLAS Experiment at the LHC corresponding to signal events with Higgs bosons decaying to τ⁺τ⁻ together with background events were made available to the public through the website of the data science organization Kaggle (kaggle.com). Participants attempted to identify the search region in a space of 30 kinematic variables that would maximize the expected discovery significance of the signal process. One of the primary goals of the Challenge was to promote communication of new ideas between the Machine Learning (ML) and HEP communities. In this regard it was a resounding success, with almost 2,000 participants from HEP, ML and other areas. The process of understanding and integrating the new ideas, particularly from ML into HEP, is currently underway.
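
    The abstract does not state how "expected discovery significance" was scored. The formula below, an approximate median significance (AMS) with a regularisation term b_reg = 10, is assumed to follow the Challenge's published scoring definition and should be checked against that documentation. The sketch evaluates it and picks the classifier-score cut that maximises it on hypothetical weighted events.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance of a selection region.

    s, b: expected (weighted) signal and background counts in the region;
    b_reg: regularisation term that penalises regions with very little background.
    """
    radicand = 2 * ((s + b + b_reg) * math.log1p(s / (b + b_reg)) - s)
    return math.sqrt(max(0.0, radicand))  # guard against tiny negative rounding error


def best_threshold(events, thresholds):
    """Return the score cut whose region (score >= cut) maximises the AMS.

    events: list of (score, weight, is_signal) tuples from some classifier.
    """
    def ams_at(cut):
        s = sum(w for x, w, sig in events if x >= cut and sig)
        b = sum(w for x, w, sig in events if x >= cut and not sig)
        return ams(s, b)
    return max(thresholds, key=ams_at)


# Hypothetical classifier scores and event weights.
events = [(0.9, 5.0, True), (0.8, 5.0, True), (0.85, 1.0, False), (0.3, 50.0, False)]
print(best_threshold(events, [0.0, 0.5, 0.95]))  # 0.5
```

    For s much smaller than b and b_reg = 0, the AMS reduces to roughly s/sqrt(b), the familiar significance estimate.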

  8. Applying Sparse Machine Learning Methods to Twitter: Analysis of the 2012 Change in Pap Smear Guidelines. A Sequential Mixed-Methods Study

    PubMed Central

    Godbehere, Andrew; Le, Gem; El Ghaoui, Laurent; Sarkar, Urmimala

    2016-01-01

    Background It is difficult to synthesize the vast amount of textual data available from social media websites. Capturing real-world discussions via social media could provide insights into individuals’ opinions and the decision-making process. Objective We conducted a sequential mixed methods study to determine the utility of sparse machine learning techniques in summarizing Twitter dialogues. We chose a narrowly defined topic for this approach: cervical cancer discussions over a 6-month time period surrounding a change in Pap smear screening guidelines. Methods We applied statistical methodologies, known as sparse machine learning algorithms, to summarize Twitter messages about cervical cancer before and after the 2012 change in Pap smear screening guidelines by the US Preventive Services Task Force (USPSTF). All messages containing the search terms “cervical cancer,” “Pap smear,” and “Pap test” were analyzed during: (1) January 1–March 13, 2012, and (2) March 14–June 30, 2012. Topic modeling was used to discern the most common topics from each time period and to determine the singular value criterion for each topic. The results were then qualitatively coded from the top 10 relevant topics to determine the efficiency of the clustering method in grouping distinct ideas, and how the discussion differed before vs. after the change in guidelines. Results This machine learning method was effective in grouping the relevant discussion topics about cervical cancer during the respective time periods (~20% overall irrelevant content in both time periods). Qualitative analysis determined that a significant portion of the top discussion topics in the second time period directly reflected the USPSTF guideline change (eg, “New Screening Guidelines for Cervical Cancer”), and many topics in both time periods were addressing basic screening promotion and education (eg, “It is Cervical Cancer Awareness Month! Click the link to see where you can receive a free or low

  9. A machine learning method for identifying morphological patterns in reflectance confocal microscopy mosaics of melanocytic skin lesions in-vivo

    NASA Astrophysics Data System (ADS)

    Kose, Kivanc; Alessi-Fox, Christi; Gill, Melissa; Dy, Jennifer G.; Brooks, Dana H.; Rajadhyaksha, Milind

    2016-02-01

    We present a machine learning algorithm that can imitate the clinicians' qualitative and visual process of analyzing reflectance confocal microscopy (RCM) mosaics at the dermal-epidermal junction (DEJ) of skin. We divide the mosaics into localized areas of processing and capture the textural appearance of each area using dense Speeded Up Robust Features (SURF). Using these features, we train a support vector machine (SVM) classifier that can distinguish between meshwork, ring, clod, aspecific and background patterns in benign conditions and melanomas. Preliminary results on 20 RCM mosaics labeled by expert readers show classification with 55–81% sensitivity and 81–89% specificity in distinguishing these patterns.

  10. Accurate prediction of polarised high order electrostatic interactions for hydrogen bonded complexes using the machine learning method kriging.

    PubMed

    Hughes, Timothy J; Kandathil, Shaun M; Popelier, Paul L A

    2015-02-01

    As intermolecular interactions such as the hydrogen bond are electrostatic in origin, rigorous treatment of this term within force field methodologies should be mandatory. We present a method capable of accurately reproducing such interactions for seven van der Waals complexes. It uses atomic multipole moments up to the hexadecupole moment, mapped to the positions of the nuclear coordinates by the machine learning method kriging. Models were built at three levels of theory: HF/6-31G**, B3LYP/aug-cc-pVDZ and M06-2X/aug-cc-pVDZ. The quality of the kriging models was measured by their ability to predict the electrostatic interaction energy between atoms in external test examples for which the true energies are known. At all levels of theory, >90% of test cases for small van der Waals complexes were predicted within 1 kJ mol⁻¹, decreasing to 60-70% of test cases for larger base pair complexes. Models built on moments obtained at B3LYP and M06-2X level generally outperformed those at HF level. For all systems the individual interactions were predicted with a mean unsigned error of less than 1 kJ mol⁻¹. PMID:24274986

  11. ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction

    PubMed Central

    2013-01-01

    Background Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case–control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification but each has limitations. We provide an alternative technique to address population stratification. Results We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual’s continental and sub-continental ancestry. To predict an individual’s continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control’s λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of
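
    The abstract reports genomic control's inflation factor λ falling from 1.22 to 1.11. As conventionally defined (this is not part of ETHNOPRED itself), λ is the median of the association chi-square statistics divided by the median of the one-degree-of-freedom chi-square distribution, about 0.455. A minimal sketch with hypothetical statistics:

```python
import statistics

# Median of the chi-square distribution with one degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median statistic over the expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(chi2_stats):
    """Deflate statistics by lambda; conventionally applied only when lambda > 1."""
    lam = genomic_control_lambda(chi2_stats)
    if lam <= 1.0:
        return list(chi2_stats)
    return [x / lam for x in chi2_stats]


# Hypothetical association statistics whose median is inflated by a factor of 1.22.
stats = [0.1, CHI2_1DF_MEDIAN * 1.22, 3.0]
print(round(genomic_control_lambda(stats), 2))  # 1.22
```

    A λ closer to 1 after ancestry-based correction, as reported above, indicates that less of the genome-wide inflation is left to be explained by population stratification.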

  12. Scaling up: Distributed machine learning with cooperation

    SciTech Connect

    Provost, F.J.; Hennessy, D.N.

    1996-12-31

    Machine-learning methods are becoming increasingly popular for automated data analysis. However, standard methods do not scale up to massive scientific and business data sets without expensive hardware. This paper investigates a practical alternative for scaling up: the use of distributed processing to take advantage of the often dormant PCs and workstations available on local networks. Each workstation runs a common rule-learning program on a subset of the data. We first show that for commonly used rule-evaluation criteria, a simple form of cooperation can guarantee that a rule will look good to the set of cooperating learners if and only if it would look good to a single learner operating with the entire data set. We then show how such a system can further capitalize on different perspectives by sharing learned knowledge for significant reduction in search effort. We demonstrate the power of the method by learning from a massive data set taken from the domain of cellular fraud detection. Finally, we provide an overview of other methods for scaling up machine learning.
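
    The abstract does not name the rule-evaluation criteria. Assuming precision above a threshold as one such criterion, the sketch below shows why cooperation works: each learner computes coverage counts on its own subset, and summing those counts reproduces exactly what a single learner would compute on the whole data set. Because pooled precision is a weighted average of the local precisions, a rule that is good on the pooled data must also look good to at least one learner. Data, rule and threshold are hypothetical.

```python
def local_counts(rule, partition):
    """Counts one learner computes on its own subset: (positives covered, examples covered)."""
    covered = [label for x, label in partition if rule(x)]
    return sum(covered), len(covered)


def precision(pos, total):
    return pos / total if total else 0.0


def evaluate_cooperatively(rule, partitions, threshold):
    """Return (looks good to some learner, looks good on the pooled counts)."""
    counts = [local_counts(rule, part) for part in partitions]
    somewhere = any(precision(p, t) >= threshold for p, t in counts)
    pos = sum(p for p, _ in counts)
    total = sum(t for _, t in counts)
    return somewhere, precision(pos, total) >= threshold


# Hypothetical data split across two workstations: (feature value, 1 = fraud, 0 = legitimate).
partitions = [[(7, 1), (8, 0), (2, 0)], [(6, 1), (9, 1), (1, 1)]]


def rule(x):
    return x > 5


print(evaluate_cooperatively(rule, partitions, threshold=0.7))  # (True, True)
```

    Only small count tuples travel between workstations, not the data, which is what makes the scheme practical on a local network.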

  13. Food category consumption and obesity prevalence across countries: an application of Machine Learning method to big data analysis

    NASA Astrophysics Data System (ADS)

    Dunstan, Jocelyn; Fallah-Fini, Saeideh; Nau, Claudia; Glass, Thomas; Global Obesity Prevention Center Team

    The application of sophisticated mathematical and numerical tools in public health has been shown to be useful for predicting the outcomes of public interventions and for studying, for example, the main causes of obesity without experimenting on the population. In this project we aim to understand which kinds of food consumed in different countries over time best explain the rate of obesity in those countries. Machine Learning is particularly useful here because, instead of formulating a hypothesis and testing it against the data, we learn from the data which groups of food best describe the prevalence of obesity.

  14. Introducing Machine Learning Concepts with WEKA.

    PubMed

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information. PMID:27008023

  15. Topics in Machine Learning for Astronomers

    NASA Astrophysics Data System (ADS)

    Cisewski, Jessi

    2016-01-01

    As astronomical datasets continue to increase in size and complexity, innovative statistical and machine learning tools are required to address the scientific questions of interest in a computationally efficient manner. I will introduce some tools that astronomers can employ for such problems with a focus on clustering and classification techniques. I will introduce standard methods, but also get into more recent developments that may be of use to the astronomical community.

  16. Using machine learning methods for predicting inhospital mortality in patients undergoing open repair of abdominal aortic aneurysm.

    PubMed

    Monsalve-Torra, Ana; Ruiz-Fernandez, Daniel; Marin-Alonso, Oscar; Soriano-Payá, Antonio; Camacho-Mackenzie, Jaime; Carreño-Jaimes, Marisol

    2016-08-01

    An abdominal aortic aneurysm is an abnormal dilatation of the aorta at the abdominal level. The disease carries a high rate of mortality and complications, decreasing quality of life and increasing the cost of treatment. Estimating the mortality risk of patients undergoing surgery is complex because of the many variables involved. Clinical decision support systems based on machine learning could help medical staff improve surgical outcomes and better understand the disease. In this work, the authors present a system for predicting in-hospital mortality in patients undergoing open repair of abdominal aortic aneurysm. Different methods, such as the multilayer perceptron, radial basis function networks and Bayesian networks, are used. Results are measured in terms of the accuracy, sensitivity and specificity of the classifiers, with accuracy above 95%. A system based on the algorithms tested could help medical staff plan care better, reducing undesirable surgical outcomes and the cost of post-surgical treatment. PMID:27395372

  17. Machine Learning Methods for Binary and Multiclass Classification of Melanoma Thickness From Dermoscopic Images.

    PubMed

    Saez, Aurora; Sanchez-Monedero, Javier; Gutierrez, Pedro Antonio; Hervas-Martinez, Cesar

    2016-04-01

    Thickness of the melanoma is the most important factor associated with survival in patients with melanoma. It is most commonly reported as a measurement of depth given in millimeters (mm) and computed by means of pathological examination after a biopsy of the suspected lesion. In order to avoid the use of an invasive method in the estimation of the thickness of melanoma before surgery, we propose a computational image analysis system from dermoscopic images. The proposed feature extraction is based on the clinical findings that correlate certain characteristics present in dermoscopic images and tumor depth. Two supervised classification schemes are proposed: a binary classification in which melanomas are classified into thin or thick, and a three-class scheme (thin, intermediate, and thick). The performance of several nominal classification methods, including a recent interpretable method combining logistic regression with artificial neural networks (Logistic regression using Initial variables and Product Units, LIPU), is compared. For the three-class problem, a set of ordinal classification methods (considering ordering relation between the three classes) is included. For the binary case, LIPU outperforms all the other methods with an accuracy of 77.6%, while, for the second scheme, although LIPU reports the highest overall accuracy, the ordinal classification methods achieve a better balance between the performances of all classes. PMID:26672031

  18. An analysis of feature relevance in the classification of astronomical transients with machine learning methods

    NASA Astrophysics Data System (ADS)

    D'Isanto, A.; Cavuoti, S.; Brescia, M.; Donalek, C.; Longo, G.; Riccio, G.; Djorgovski, S. G.

    2016-04-01

    The exploitation of present and future synoptic (multiband and multi-epoch) surveys requires extensive use of automatic methods for data processing and data interpretation. In this work, using data extracted from the Catalina Real Time Transient Survey (CRTS), we investigate the classification performance of some well-tested methods: Random Forest, MultiLayer Perceptron with Quasi Newton Algorithm and K-Nearest Neighbours, paying special attention to the feature selection phase. To this end, several classification experiments were performed, namely identification of cataclysmic variables, separation between galactic and extragalactic objects, and identification of supernovae.
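
    Of the three methods, K-Nearest Neighbours is the simplest to state: a new object takes the majority label of its k closest training examples in feature space. A minimal sketch (requires Python 3.8+ for math.dist); the two features and the class labels are hypothetical placeholders, not CRTS quantities.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training examples nearest to query (Euclidean distance).

    train: list of (feature_vector, label) pairs. Features should be on comparable
    scales, since every feature contributes equally to the distance.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature light-curve summaries.
train = [((0.0, 0.0), "CV"), ((0.1, 0.2), "CV"), ((0.2, 0.1), "CV"),
         ((5.0, 5.0), "SN"), ((5.1, 4.9), "SN")]
print(knn_predict(train, (0.15, 0.1)))  # CV
```

    Since every feature enters the distance directly, irrelevant or badly scaled features degrade K-Nearest Neighbours quickly, one reason the feature selection phase matters.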

  19. Harnessing the power of big data: infusing the scientific method with machine learning to transform ecology

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Most efforts to harness the power of big data for ecology and environmental sciences focus on data and metadata sharing, standardization, and accuracy. However, many scientists have not accepted the data deluge as an integral part of their research because the current scientific method is not scalab...

  20. Machine Learning Methods for the Understanding and Prediction of Climate Systems: Tropical Pacific Ocean Thermocline and ENSO events

    NASA Astrophysics Data System (ADS)

    Lima, C. H.; Lall, U.

    2012-12-01

    In this work we explore recently developed methods from the machine learning community for dimensionality reduction and model selection of very large datasets. We apply the nonlinear maximum variance unfolding (MVU) method to find a low-dimensional space for the thermocline of the Tropical Pacific Ocean, as indicated by the ocean depth of the 20°C isotherm from the NOAA/NCEP GODAS dataset. The leading modes are then used as covariates in an ENSO forecast model based on LASSO regression, where parameters are shrunk in order to find the best subset of predictors. A comparison with Principal Component Analysis (PCA) reveals that MVU is able to reduce the thermocline data from 21009 dimensions to three main components that collectively explain 77% of the system variance, whereas the first three PCs account for only 47% of the variance. The series of the first three leading MVU and PCA based modes and their associated spatial patterns also show different features, including an enhanced monotonic upward trend in the first MVU mode that is hardly detected in the corresponding first PCA mode. Correlation analysis between the MVU components and the NINO3 index shows that each of the modes has peak correlations at different lag times, with statistically significant correlation coefficients at lags of up to two years. After combining the three MVU components across several lag times, a forecast model for NINO3 based on LASSO regression was built and tested using ten-fold cross-validation. Based on metrics such as the RMSE and correlation scores, the results show appreciable skill for lead times beyond ten months, particularly for the December NINO3, which is responsible for several floods and droughts across the globe. Figures: spatial signature of the first MVU mode; first MVU series featuring the upward trend.
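
    The evaluation protocol, ten-fold cross-validation scored by RMSE, can be sketched independently of the LASSO model, which is not shown. The abstract does not say how folds were formed; contiguous, unshuffled folds are an assumption here, chosen because shuffling time-ordered data would mix neighbouring years across training and test sets.

```python
import math


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds of range(n)."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

    In use, a model would be fitted on each fold's training indices, used to predict the held-out indices, and the RMSE of those out-of-fold predictions reported.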

  1. Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method.

    PubMed

    Jiang, Min; Denny, Josh C; Tang, Buzhou; Cao, Hongxin; Xu, Hua

    2012-01-01

    Semantic lexicons that link words and phrases to specific semantic types such as diseases are valuable assets for clinical natural language processing (NLP) systems. Although terminological terms with predefined semantic types can be generated easily from existing knowledge bases such as the Unified Medical Language System (UMLS), they are often limited and do not have good coverage for narrative clinical text. In this study, we developed a method for building semantic lexicons from a clinical corpus. It extracts candidate semantic terms using a conditional random field (CRF) classifier and then selects terms using the C-Value algorithm. We applied the method to a corpus containing 10 years of discharge summaries from Vanderbilt University Hospital (VUH) and extracted 44,957 new terms for three semantic groups: Problem, Treatment, and Test. A manual analysis of 200 randomly selected terms not found in the UMLS demonstrated that 59% of them were meaningful new clinical concepts and 25% were lexical variants of existing concepts in the UMLS. Furthermore, we compared the effectiveness of corpus-derived and UMLS-derived semantic lexicons in the concept extraction task of the 2010 i2b2 clinical NLP challenge. Our results showed that the classifier with corpus-derived semantic lexicons as features achieved a better performance (F-score 82.52%) than that with UMLS-derived semantic lexicons as features (F-score 82.04%). We conclude that such corpus-based methods are effective for generating semantic lexicons, which may improve named entity recognition tasks and may aid in augmenting synonymy within existing terminologies. PMID:23304311
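
    The C-Value step ranks candidate terms by frequency while discounting occurrences that are really part of longer candidates. This is a sketch of the C-value as usually defined (Frantzi and Ananiadou), not the authors' implementation; the candidate terms and counts are hypothetical.

```python
import math


def c_values(freq):
    """C-value of each candidate term.

    freq: dict mapping a candidate term (tuple of words) to its corpus frequency.
    A term nested inside longer candidates has the mean frequency of those
    longer candidates subtracted from its own. Single-word terms score 0 because
    log2(1) = 0; some variants use log2(len + 1) to avoid this.
    """
    def contains(longer, shorter):
        m = len(shorter)
        return len(longer) > m and any(
            longer[i:i + m] == shorter for i in range(len(longer) - m + 1))

    scores = {}
    for term, f in freq.items():
        nesting = [other for other in freq if contains(other, term)]
        if nesting:
            f -= sum(freq[other] for other in nesting) / len(nesting)
        scores[term] = math.log2(len(term)) * f
    return scores


freq = {("blood", "pressure"): 10,
        ("high", "blood", "pressure"): 4,
        ("low", "blood", "pressure"): 2}
print(c_values(freq)[("blood", "pressure")])  # 7.0
```

    Terms whose C-value exceeds some cut-off would be kept for the lexicon; the cut-off itself is not given in the abstract.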

  2. Applying GIS and Machine Learning Methods to Twitter Data for Multiscale Surveillance of Influenza

    PubMed Central

    Aslam, Anoshe; Nagel, Anna; Gawron, Jean-Mark

    2016-01-01

    Traditional methods for monitoring influenza are haphazard and lack fine-grained details regarding the spatial and temporal dynamics of outbreaks. Twitter gives researchers and public health officials an opportunity to examine the spread of influenza in real time and at multiple geographical scales. In this paper, we introduce an improved framework for monitoring influenza outbreaks using the social media platform Twitter. Relying upon techniques from geographic information science (GIS) and data mining, we collected, filtered, and analyzed Twitter messages for the thirty most populous cities in the United States during the 2013–2014 flu season. The results of this procedure are compared with national, regional, and local flu outbreak reports, revealing a statistically significant correlation between the two data sources. The main contribution of this paper is a comprehensive data mining process that improves on previous attempts to accurately identify tweets related to influenza. Additionally, geographic information systems allow us to target, filter, and normalize Twitter messages. PMID:27455108

  3. Applying GIS and Machine Learning Methods to Twitter Data for Multiscale Surveillance of Influenza.

    PubMed

    Allen, Chris; Tsou, Ming-Hsiang; Aslam, Anoshe; Nagel, Anna; Gawron, Jean-Mark

    2016-01-01

    Traditional methods for monitoring influenza are haphazard and lack fine-grained details regarding the spatial and temporal dynamics of outbreaks. Twitter gives researchers and public health officials an opportunity to examine the spread of influenza in real time and at multiple geographical scales. In this paper, we introduce an improved framework for monitoring influenza outbreaks using the social media platform Twitter. Relying upon techniques from geographic information science (GIS) and data mining, we collected, filtered, and analyzed Twitter messages for the thirty most populous cities in the United States during the 2013-2014 flu season. The results of this procedure are compared with national, regional, and local flu outbreak reports, revealing a statistically significant correlation between the two data sources. The main contribution of this paper is a comprehensive data mining process that improves on previous attempts to accurately identify tweets related to influenza. Additionally, geographic information systems allow us to target, filter, and normalize Twitter messages. PMID:27455108
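    The validation step above correlates a Twitter-derived signal with official flu reports. A minimal sketch, assuming weekly tweet counts are normalized per 100,000 residents (a hypothetical choice) and compared with a reported influenza-like-illness series, computes Pearson's r. The tweet filtering and the significance test are not shown, and the numbers in the usage section are invented for illustration, not data from the paper.

```python
import math


def per_capita(counts, population, per=100_000):
    """Convert raw weekly tweet counts into a rate per `per` residents."""
    return [c * per / population for c in counts]


def pearson_r(x, y):
    """Pearson correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical weekly values for one city.
    tweet_rate = per_capita([40, 65, 120, 180, 150, 90], population=1_300_000)
    ili_percent = [1.2, 1.8, 3.1, 4.4, 3.9, 2.5]
    print(round(pearson_r(tweet_rate, ili_percent), 3))
```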

  4. Rapid estimation of compost enzymatic activity by spectral analysis method combined with machine learning.

    PubMed

    Chakraborty, Somsubhra; Das, Bhabani S; Ali, Md Nasim; Li, Bin; Sarathjith, M C; Majumdar, K; Ray, D P

    2014-03-01

    The aim of this study was to investigate the feasibility of using visible near-infrared (VisNIR) diffuse reflectance spectroscopy (DRS) as an easy, inexpensive, and rapid method to predict compost enzymatic activity, which is traditionally measured by the fluorescein diacetate hydrolysis (FDA-HR) assay. Compost samples representative of five different compost facilities were scanned by DRS, and the raw reflectance spectra were preprocessed using seven spectral transformations for predicting compost FDA-HR with six multivariate algorithms. Although principal component analysis for all spectral pretreatments satisfactorily identified the clusters by compost type, it could not separate different FDA contents. Furthermore, the artificial neural network multilayer perceptron (residual prediction deviation=3.2, validation r(2)=0.91 and RMSE=13.38 μg g(-1) h(-1)) outperformed the other multivariate models in capturing the highly non-linear relationships between compost enzymatic activity and VisNIR reflectance spectra after Savitzky-Golay first derivative pretreatment. This work demonstrates the efficiency of VisNIR DRS for predicting compost enzymatic as well as microbial activity. PMID:24398221
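    The best model used a Savitzky-Golay first-derivative pretreatment. The abstract does not give the window length or polynomial order, so this sketch assumes a 5-point window with a quadratic fit, and the spectrum values are hypothetical. For general windows, SciPy's `scipy.signal.savgol_filter(x, window_length, polyorder, deriv=1, delta=h)` performs the same operation (with its own edge handling).

```python
def savgol_first_derivative(y, h=1.0):
    """5-point Savitzky-Golay first derivative from a local quadratic fit.

    For a quadratic fitted to samples at offsets k = -2..2, the slope at the
    center is sum(k * y_k) / sum(k^2) = sum(k * y_k) / 10, divided by the
    sample spacing h. The two points at each edge are dropped, not padded.
    """
    offsets = range(-2, 3)
    norm = 10.0 * h
    return [
        sum(k * y[i + k] for k in offsets) / norm
        for i in range(2, len(y) - 2)
    ]


if __name__ == "__main__":
    # Hypothetical reflectance values sampled every 2 nm.
    spectrum = [0.20, 0.22, 0.25, 0.29, 0.34, 0.40, 0.47, 0.55]
    print(savgol_first_derivative(spectrum, h=2.0))
```

    A quadratic fit reproduces linear and quadratic signals exactly, so the derivative of a straight line comes out as its constant slope; higher-order fits (cubic and above) use different coefficients.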

  5. Using Machine learning method to estimate Air Temperature from MODIS over Berlin

    NASA Astrophysics Data System (ADS)

    Marzban, F.; Preusker, R.; Sodoudi, S.; Taheri, H.; Allahbakhshi, M.

    2015-12-01

    Land Surface Temperature (LST) is defined as the temperature of the interface between the Earth's surface and its atmosphere; it is therefore a critical variable for understanding land-atmosphere interactions and a key parameter, involved in energy fluxes, in meteorological and hydrological studies. Air temperature (Tair) is one of the most important input variables in spatially distributed hydrological and ecological models. The estimation of near-surface air temperature is useful for a wide range of applications. Some applications, such as traffic or energy management, require Tair data at high spatial and temporal resolution at two meters above the ground (T2m), sometimes in near-real-time. Thus, a parameterization based on boundary-layer physical principles was developed that determines the air temperature from remote sensing data (MODIS). Tair is commonly obtained from synoptic measurements at weather stations. However, deriving near-surface air temperature from satellite-derived LST is far from straightforward. T2m is not driven directly by the sun but indirectly by LST, so T2m can be parameterized from LST and other variables such as albedo, NDVI, and water vapor. Most previous studies have focused on estimating T2m with simple and advanced statistical approaches, Temperature-Vegetation index methods, and energy-balance approaches, but the main objective of this research is to explore the relationships between T2m and LST in Berlin using artificial intelligence methods, with the aim of studying key variables that allow us to establish suitable techniques for obtaining Tair from satellite products and ground data. Second, we attempted to identify an individual mix of attributes that reveals a particular pattern, to better understand the variation of T2m during day and night over different areas of Berlin. For this purpose, a three-layer feedforward neural network trained with the LMA (Levenberg–Marquardt) algorithm is considered

  6. Discriminative clustering via extreme learning machine.

    PubMed

    Huang, Gao; Liu, Tianchi; Yang, Yan; Lin, Zhiping; Song, Shiji; Wu, Cheng

    2015-10-01

    Discriminative clustering is an unsupervised learning framework that introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithm. In this paper, we propose three discriminative clustering approaches based on the Extreme Learning Machine (ELM). The first algorithm iteratively trains a weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA), but one adopts alternating optimization while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented and yield competitive clustering accuracy on real-world datasets compared with state-of-the-art clustering methods. PMID:26143036
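    All three clustering methods above build on the Extreme Learning Machine. For readers unfamiliar with it, the sketch below shows a plain ELM (not the weighted W-ELM or the clustering loops from the paper): hidden-layer weights are drawn at random and left fixed, and only the output weights are solved in closed form by ridge regression, beta = H^T (H H^T + ridge*I)^-1 t. The sigmoid activation, uniform [-1, 1] initialization, ridge constant, class name and toy data are assumptions made for illustration.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b (A square, non-singular) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class ELM:
    """Single-hidden-layer ELM: random fixed hidden layer, ridge-solved output weights."""

    def __init__(self, n_hidden=20, ridge=1e-3, seed=0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid activations of the fixed random hidden layer for one sample.
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
                for ws, b in zip(self.W, self.bias)]

    def fit(self, X, t):
        d = len(X[0])
        self.W = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(self.n_hidden)]
        self.bias = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        H = [self._hidden(x) for x in X]
        n = len(H)
        # Dual ridge form: solve (H H^T + ridge*I) alpha = t, then beta = H^T alpha.
        G = [[sum(a * b for a, b in zip(H[i], H[j])) + (self.ridge if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        alpha = solve_linear(G, t)
        self.beta = [sum(alpha[i] * H[i][k] for i in range(n)) for k in range(self.n_hidden)]
        return self

    def predict(self, X):
        return [sum(b * h for b, h in zip(self.beta, self._hidden(x))) for x in X]


if __name__ == "__main__":
    # Toy 1-D two-class problem with targets -1 / +1 (hypothetical data).
    X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
    t = [-1.0] * 4 + [1.0] * 4
    model = ELM().fit(X, t)
    print([round(p, 2) for p in model.predict(X)])
```

    Because no iterative training touches the hidden layer, fitting reduces to one linear solve, which is what makes ELM cheap enough to retrain repeatedly inside a clustering loop.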

  7. Defect classification using machine learning

    NASA Astrophysics Data System (ADS)

    Carr, Adra; Kegelmeyer, L.; Liao, Z. M.; Abdulla, G.; Cross, D.; Kegelmeyer, W. P.; Ravizza, F.; Carr, C. W.

    2008-10-01

    Laser-induced damage growth on the surface of fused silica optics has been extensively studied and has been found to depend on a number of factors, including fluence and the surface on which the damage site resides. It has been demonstrated that damage sites as small as a few tens of microns can be detected and tracked on optics installed on a fusion-class laser; however, determining in situ the surface of an optic on which a damage site resides can be a significant challenge. In this work, we demonstrate that a machine-learning algorithm can successfully predict the surface location of the damage site using an expanded set of characteristics for each damage site, some of which are not historically associated with growth rate.

  8. Defect Classification Using Machine Learning

    SciTech Connect

    Carr, A; Kegelmeyer, L; Liao, Z M; Abdulla, G; Cross, D; Kegelmeyer, W P; Raviza, F; Carr, C W

    2008-10-24

    Laser-induced damage growth on the surface of fused silica optics has been extensively studied and has been found to depend on a number of factors, including fluence and the surface on which the damage site resides. It has been demonstrated that damage sites as small as a few tens of microns can be detected and tracked on optics installed on a fusion-class laser; however, determining in situ the surface of an optic on which a damage site resides can be a significant challenge. In this work, we demonstrate that a machine-learning algorithm can successfully predict the surface location of the damage site using an expanded set of characteristics for each damage site, some of which are not historically associated with growth rate.

  9. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    PubMed Central

    Subbulakshmi, C. V.; Deepa, S. N.

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself through a training process, because complete expert knowledge for determining classifier parameters is impracticable. This paper proposes a hybrid methodology based on the machine learning paradigm. This paradigm integrates the successful exploration mechanism, called the self-regulated learning capability, of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN) that has proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, which further improves the network's generalization performance. The proposed method is evaluated on five benchmark datasets from the UCI Machine Learning Repository for medical dataset classification. Simulation results show that the proposed approach achieves good generalization performance compared with other classifiers. PMID:26491713

  10. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier.

    PubMed

    Subbulakshmi, C V; Deepa, S N

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself through a training process, because complete expert knowledge for determining classifier parameters is impracticable. This paper proposes a hybrid methodology based on the machine learning paradigm. This paradigm integrates the successful exploration mechanism, called the self-regulated learning capability, of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN) that has proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, which further improves the network's generalization performance. The proposed method is evaluated on five benchmark datasets from the UCI Machine Learning Repository for medical dataset classification. Simulation results show that the proposed approach achieves good generalization performance compared with other classifiers. PMID:26491713
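    The PSO step above searches for ELM parameters. A minimal global-best PSO is sketched below under stated assumptions: the inertia weight w=0.72 and acceleration constants c1=c2=1.49 are commonly used values rather than the paper's settings, and in the paper's setting `f` would be a hypothetical function that trains an ELM with the candidate parameters and returns its validation error. Here a simple quadratic stands in for that objective.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=30, n_iter=100,
                 w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best particle swarm optimization of f over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the personal best + pull toward the global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clip to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Stand-in objective with its minimum at (1, -2).
    best, val = pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                             dim=2, bounds=(-5.0, 5.0))
    print(best, val)
```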

  11. Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries

    PubMed Central

    Xu, Yan; Hong, Kai; Tsujii, Junichi

    2012-01-01

    Objective: A system that translates narrative text in the medical domain into a structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification. Design: The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts, using a dosage-unit dictionary to dynamically switch between two models based on Conditional Random Fields (CRF), (4) classifying assertions by voting among five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features. Measurements: Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate results. Results: The performance is competitive with state-of-the-art systems, with micro-averaged F-measures of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification. Conclusions: The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching models, one of which is specially designed for telegraphic sentences, significantly improved extraction of the treatment concept. In assertion classification, a set of features derived from a rule-based classifier proved effective for classes such as conditional and possible, which would suffer from data scarcity in conventional machine-learning methods. In relation identification, we use a two-stage architecture, the second stage of which applies pairwise classifiers to possible candidate classes. This architecture significantly improves performance. PMID:22586067
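    Because the abstract reports both macro- and micro-averaged scores, and singles out rare assertion classes such as conditional and possible, it may help to see how the two averages differ. In this sketch (class names and counts are hypothetical), micro-averaging pools true positives, false positives and false negatives across classes, while macro-averaging takes the unweighted mean of per-class F1, so a rare, poorly handled class lowers the macro score far more.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro_f1(counts):
    """counts maps class label -> (tp, fp, fn). Returns (micro_f1, macro_f1)."""
    # Micro-averaging pools the counts, so frequent classes dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = prf(tp, fp, fn)[2]
    # Macro-averaging gives every class equal weight, however rare.
    macro = sum(prf(*c)[2] for c in counts.values()) / len(counts)
    return micro, macro


if __name__ == "__main__":
    # Hypothetical counts: a frequent class handled well, a rare one handled poorly.
    counts = {"present": (90, 10, 10), "possible": (1, 4, 4)}
    print(micro_macro_f1(counts))
```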

  12. Adaptive Learning Systems: Beyond Teaching Machines

    ERIC Educational Resources Information Center

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since the 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn and what instructors should do to help students during their learning process. We have adaptive learning technologies that can create much more student-oriented learning environments. The purpose of this article is to present these changes and their…

  13. Probabilistic machine learning and artificial intelligence.

    PubMed

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery. PMID:26017444

  14. Probabilistic machine learning and artificial intelligence

    NASA Astrophysics Data System (ADS)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.
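    As a textbook illustration of representing and updating uncertainty (not an example taken from the Review), the sketch below performs conjugate Bayesian updating of a Beta prior on the success probability of a Bernoulli process. The posterior after observing data is again a Beta distribution, whose mean and variance summarize the updated belief and the remaining uncertainty; the observations are hypothetical.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Posterior Beta(alpha', beta') after observing a list of 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Posterior mean of the success probability."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Posterior variance: shrinks as more data are observed."""
    return alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))


if __name__ == "__main__":
    # Uniform prior Beta(1, 1), then ten hypothetical observations.
    a, b = beta_bernoulli_update(1, 1, [1, 1, 1, 0, 1, 1, 0, 1, 1, 0])
    print(a, b, beta_mean(a, b), beta_variance(a, b))
```

    Updating in two batches gives the same posterior as updating once with all the data, which is one concrete sense in which the probabilistic framework "learns from experience" incrementally.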

  15. Optimizing transition states via kernel-based machine learning.

    PubMed

    Pozun, Zachary D; Hansen, Katja; Sheppard, Daniel; Rupp, Matthias; Müller, Klaus-Robert; Henkelman, Graeme

    2012-05-01

    We present a method for optimizing transition state theory dividing surfaces with support vector machines. The resulting dividing surfaces require no a priori information or intuition about reaction mechanisms. To generate optimal dividing surfaces, we apply a cycle of machine-learning and refinement of the surface by molecular dynamics sampling. We demonstrate that the machine-learned surfaces contain the relevant low-energy saddle points. The mechanisms of reactions may be extracted from the machine-learned surfaces in order to identify unexpected chemically relevant processes. Furthermore, we show that the machine-learned surfaces significantly increase the transmission coefficient for an adatom exchange involving many coupled degrees of freedom on a (100) surface when compared to a distance-based dividing surface. PMID:22583204
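    In the method above, an SVM is trained to separate configurations from different basins, and its decision boundary serves as the dividing surface. The paper uses kernel SVMs; for brevity this sketch swaps in a linear SVM trained with the Pegasos stochastic sub-gradient algorithm, and the two-dimensional "configurations" are hypothetical. The refinement loop with molecular dynamics sampling is not shown.

```python
import random


def pegasos_train(X, y, lam=0.01, n_iter=2000, seed=0):
    """Linear SVM via Pegasos stochastic sub-gradient descent on the hinge loss.

    A constant 1.0 feature is appended so the bias is learned (and, as a
    simplification, regularized) along with the weights. Labels must be +1/-1.
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
        # Shrink (regularizer gradient), then add the hinge sub-gradient if violated.
        w = [(1 - eta * lam) * wj for wj in w]
        if margin < 1:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def decision(w, x):
    """SVM score; its zero level set plays the role of the dividing surface."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


if __name__ == "__main__":
    # Hypothetical samples from a "reactant" (+1) and a "product" (-1) basin.
    X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    w = pegasos_train(X, y)
    print(w, [decision(w, x) > 0 for x in X])
```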

  16. Machine vision systems using machine learning for industrial product inspection

    NASA Astrophysics Data System (ADS)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages, Learning Inspection Features (LIF) and On-Line Inspection (OLI). The LIF stage is designed to learn visual inspection features from design data and/or from inspection products. During the OLI stage, the inspection system uses the knowledge learnt by the LIF component to inspect the visual features of products. In this paper we present two machine vision inspection systems developed under the SMV architecture for two different types of products, Printed Circuit Boards (PCB) and Vacuum Fluorescent Display (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its display patterns. In the PCB inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies, and the PCB inspection system is in the process of being deployed in a manufacturing plant.

  17. Modeling quantum physics with machine learning

    NASA Astrophysics Data System (ADS)

    Lopez-Bezanilla, Alejandro; Arsenault, Louis-Francois; Millis, Andrew; Littlewood, Peter; von Lilienfeld, Anatole

    2014-03-01

    Machine Learning (ML) is a systematic way of inferring new results from sparse information. It directly allows for the resolution of computationally expensive sets of equations by making sense of accumulated knowledge, and it is therefore an attractive method for providing computationally inexpensive 'solvers' for some of the important systems of condensed matter physics. In this talk, a non-linear regression statistical model is introduced to demonstrate the utility of ML methods in solving quantum-physics-related problems, and it is applied to the calculation of electronic transport in 1D channels. DOE contract number DE-AC02-06CH11357.

  18. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the application of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  19. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Canadian Astronomy Data Centre

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
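    The abstract mentions finding outliers in 2MASS with nearest-neighbor methods. One common k-NN outlier score, assumed here and not necessarily the one used by Skytree Server, is the distance from each object to its k-th nearest neighbor. The brute-force sketch below is O(n^2) and therefore only suitable for small inputs; scaling to hundreds of millions of objects requires tree-based or distributed algorithms that it does not attempt. The points are hypothetical.

```python
import math


def knn_outlier_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbor (higher = more isolated)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    # A tight hypothetical cluster plus one isolated object.
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    scores = knn_outlier_scores(points, k=3)
    print(max(range(len(points)), key=lambda i: scores[i]))
```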

  20. Photonic Neurocomputers And Learning Machines

    NASA Astrophysics Data System (ADS)

    Farhat, Nabil H.

    1990-05-01

    The study of complex multidimensional nonlinear dynamical systems and the modeling and emulation of cognitive brain-like processing of sensory information (neural network research), including the study of chaos and its role in such systems, would benefit immensely from the development of a new generation of programmable analog computers capable of carrying out collective, nonlinear and iterative computations at very high speed. The massive interconnectivity and nonlinearity needed in such analog computing structures indicate that a mix of optics and electronics, mediated by a judicious choice of device physics, offers benefits for realizing networks with the following desirable properties: (a) large-scale nets, i.e. nets with a high number of decision-making elements (neurons); (b) modifiable structure, i.e. the ability to partition the net into any desired number of layers of prescribed size (number of neurons per layer) with any prescribed pattern of communication between them (e.g. feed-forward or feedback (recurrent)); (c) programmable and/or adaptive connectivity weights between the neurons for self-organization and learning; (d) support for both synchronous and asynchronous update rules; (e) high-speed update, i.e. neurons with lisec response time to enable rapid iteration and convergence; (f) usability in the study and evaluation of a variety of adaptive learning algorithms; and (g) usability in the rapid solution, by fast simulated annealing, of complex optimization problems of the kind encountered in adaptive learning, pattern recognition, and image processing. The aim of this paper is to describe recent efforts and progress made towards achieving these desirable attributes in analog photonic (optoelectronic and/or electron optical) hardware that utilizes primarily incoherent light. A specific example, a hardware implementation of a stochastic Boltzmann learning machine, is used as a vehicle for identifying generic issues and clarifying research and development areas for further
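    Property (g) mentions fast simulated annealing, and the closing example is a stochastic Boltzmann learning machine; both rest on the Metropolis (Boltzmann) acceptance rule. The software sketch below (not the photonic hardware described in the paper) minimizes a one-dimensional energy with that rule and a geometric cooling schedule; the step size, starting temperature, cooling rate and energy function are illustrative assumptions.

```python
import math
import random


def simulated_annealing(energy, x0, step=0.5, t0=2.0, cooling=0.995,
                        n_iter=5000, seed=0):
    """Minimize `energy` with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        e_cand = energy(cand)
        # Always accept downhill moves; accept uphill ones with the
        # Boltzmann probability exp(-dE / T).
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # lower the temperature a little every iteration
    return best_x, best_e


if __name__ == "__main__":
    # Hypothetical energy with its minimum at x = 3.
    print(simulated_annealing(lambda v: (v - 3.0) ** 2, x0=-5.0))
```

    At high temperature the walker accepts many uphill moves and explores; as the temperature falls, the rule becomes nearly greedy, which is the trade-off fast annealing schedules try to speed up.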

  1. Applying Machine Learning to Star Cluster Classification

    NASA Astrophysics Data System (ADS)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time-consuming; it therefore creates a bottleneck and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) by applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data are HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start with basic machine learning methods such as decision trees. We then evaluate the performance of more advanced techniques, focusing on convolutional neural networks and other deep learning methods. We analyze the results and suggest several directions for further improvement.

  2. Galaxy morphology - An unsupervised machine learning approach

    NASA Astrophysics Data System (ADS)

    Schutter, A.; Shamir, L.

    2015-09-01

    Structural properties carry valuable information about the formation and evolution of galaxies and are important for understanding the past, present, and future universe. Here we use unsupervised machine learning methodology to analyze a network of similarities between galaxy morphological types and automatically deduce a morphological sequence of galaxies. Application of the method to the EFIGI catalog shows that the morphological scheme produced by the algorithm is largely in agreement with the De Vaucouleurs system, demonstrating the ability of computer vision and machine learning methods to automatically profile galaxy morphological sequences. The unsupervised analysis method is based on comprehensive computer vision techniques that compute the visual similarities between the different morphological types. Rather than relying on human cognition, the proposed system deduces the similarities between sets of galaxy images in an automatic manner, and is therefore not limited by the number of galaxies being analyzed. The source code of the method is publicly available, and the protocol of the experiment is included in the paper so that the experiment can be replicated, and the method can be used to analyze user-defined datasets of galaxy images.

  3. Multistrategy machine-learning vision system

    NASA Astrophysics Data System (ADS)

    Roberts, Barry A.

    1993-04-01

    Advances in the field of machine learning technology have yielded learning techniques with solid theoretical foundations that are applicable to the problems being encountered by object recognition systems. At Honeywell an object recognition system that works with high-level, symbolic, object features is under development. This system, named object recognition accomplished through combined learning expertise (ORACLE), employs both an inductive learning technique (i.e., conceptual clustering, CC) and a deductive technique (i.e., explanation-based learning, EBL) that are combined in a synergistic manner. This paper provides an overview of the ORACLE system, describes the machine learning mechanisms (EBL and CC) that it employs, and provides example results of system operation. The paper emphasizes the beneficial effect of integrating machine learning into object recognition systems.

  4. Machine learning in soil classification.

    PubMed

    Bhattacharya, B; Solomatine, D P

    2006-03-01

    In a number of engineering problems, e.g. in geotechnics and petroleum engineering, intervals of measured series data (signals) must be assigned a class while maintaining the constraint of contiguity, and standard classification methods may be inadequate. Classification in this case requires the involvement of an expert who observes the magnitude and trends of the signals in addition to any a priori information that might be available. In this paper, an approach for automating this classification procedure is presented. First, a segmentation algorithm is developed and applied to segment the measured signals. Second, the salient features of these segments are extracted using the boundary energy method. Third, classifiers employing Decision Trees, ANN and Support Vector Machines are built from the measured data and extracted features to assign classes to the segments. The methodology was tested by classifying sub-surface soil using measured data from Cone Penetration Testing, and satisfactory results were obtained. PMID:16530382

  5. Machine Learning and Cosmological Simulations

    NASA Astrophysics Data System (ADS)

    Kamdar, Harshil; Turk, Matthew; Brunner, Robert

    2016-01-01

    We explore the application of machine learning (ML) to the problem of galaxy formation and evolution in a hierarchical universe. Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively evaluating the extent of the influence of dark matter halo properties on small-scale structure formation. For our analyses, we use both semi-analytical models (Millennium simulation) and N-body + hydrodynamical simulations (Illustris simulation). The ML algorithms are trained on important dark matter halo properties (inputs) and galaxy properties (outputs). The trained models are able to robustly predict the gas mass, stellar mass, black hole mass, star formation rate, g−r color, and stellar metallicity. Moreover, the ML-simulated galaxies obey fundamental observational constraints, implying that the population of ML-predicted galaxies is physically and statistically robust. Next, ML algorithms are trained on an N-body + hydrodynamical simulation and applied to an N-body-only simulation (Dark Sky simulation, Illustris Dark), populating this new simulation with galaxies. We can examine how structure formation changes with different cosmological parameters and are able to mimic a full-blown hydrodynamical simulation in a computation time that is orders of magnitude smaller. We find that the set of ML-simulated galaxies in Dark Sky obeys the same observational constraints, further solidifying ML's place as an intriguing and promising technique in future galaxy formation studies and rapid mock galaxy catalog creation.

  6. Memristor models for machine learning.

    PubMed

    Carbajal, Juan Pablo; Dambre, Joni; Hermans, Michiel; Schrauwen, Benjamin

    2015-03-01

    In the quest for alternatives to traditional complementary metal-oxide-semiconductor (CMOS) technology, it has been suggested that digital computing efficiency and power can be improved by matching the precision to the application. Many applications do not need the high precision that is used today. In particular, large gains in area and power efficiency could be achieved by dedicated analog realizations of approximate computing engines. In this work we explore the use of memristor networks for analog approximate computation, based on a machine learning framework called reservoir computing. Most experimental investigations of the dynamics of memristors focus on their nonvolatile behavior. Hence, the volatility that is present in the developed technologies is usually unwanted and is not included in simulation models. In contrast, in reservoir computing, volatility is not only desirable but necessary. Therefore, in this work, we propose two different ways to incorporate it into memristor simulation models. The first is an extension of Strukov's model, and the second is an equivalent Wiener model approximation. We analyze and compare the dynamical properties of these models and discuss their implications for the memory and the nonlinear processing capacity of memristor networks. Our results indicate that device variability, increasingly causing problems in traditional computer design, is an asset in the context of reservoir computing. We conclude that although both models could lead to useful memristor-based reservoir computing systems, their computational performance will differ. Therefore, experimental modeling research is required for the development of accurate volatile memristor models. PMID:25602769

  7. Extreme Learning Machine for Multilayer Perceptron.

    PubMed

    Tang, Jiexiong; Deng, Chenwei; Huang, Guang-Bin

    2016-04-01

    Extreme learning machine (ELM) is an emerging learning algorithm for generalized single-hidden-layer feedforward neural networks, in which the hidden node parameters are randomly generated and the output weights are analytically computed. However, due to its shallow architecture, feature learning using ELM may not be effective for natural signals (e.g., images/videos), even with a large number of hidden nodes. To address this issue, in this paper, a new ELM-based hierarchical learning framework is proposed for multilayer perceptron. The proposed architecture is divided into two main components, self-taught feature extraction followed by supervised feature classification, which are bridged by randomly initialized hidden weights. The novelties of this paper are as follows: 1) unsupervised multilayer encoding is conducted for feature extraction, and an ELM-based sparse autoencoder is developed via an l1 constraint, which achieves more compact and meaningful feature representations than the original ELM; 2) by exploiting the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to better generalization with faster learning speed; and 3) unlike the greedy layerwise training of deep learning (DL), the hidden layers of the proposed framework are trained in a forward manner. Once the previous layer is established, the weights of the current layer are fixed without fine-tuning. Therefore, it has much better learning efficiency than DL. Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods. Furthermore, multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme. PMID:25966483
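The basic ELM step described above (random hidden-node parameters, output weights computed analytically) can be sketched in a few lines. This is a minimal sketch of a plain single-hidden-layer ELM, not the hierarchical multilayer framework proposed in the paper; the tanh activation, the use of NumPy's pseudoinverse for the least-squares solve, and the names `elm_fit` and `elm_predict` are illustrative assumptions.

```python
import numpy as np

def elm_fit(X, T, n_hidden, seed=0):
    """Fit a basic single-hidden-layer ELM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Hidden-node parameters are drawn at random and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    # Assumption: tanh activation for the hidden layer.
    H = np.tanh(X @ W + b)
    # Output weights solved analytically: least-squares fit of H @ beta = T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(model, X):
    """Apply the fixed random hidden layer, then the learned output weights."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Usage: XOR with one-hot targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.eye(2)[[0, 1, 1, 0]]
model = elm_fit(X, T, n_hidden=20)
pred = elm_predict(model, X).argmax(axis=1)
```

Because only `beta` is solved for, training reduces to a single linear least-squares problem, which is where the fast learning speed claimed for ELM comes from.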

  8. Classification of collective behavior: a comparison of tracking and machine learning methods to study the effect of ambient light on fish shoaling.

    PubMed

    Butail, Sachit; Salerno, Philip; Bollt, Erik M; Porfiri, Maurizio

    2015-12-01

    Traditional approaches for the analysis of collective behavior entail digitizing the position of each individual, followed by evaluation of pertinent group observables, such as cohesion and polarization. Machine learning may enable considerable advancements in this area by affording the classification of these observables directly from images. While such methods have been successfully implemented in the classification of individual behavior, their potential in the study of collective behavior is largely untested. In this paper, we compare three methods for the analysis of collective behavior: simple tracking (ST) without resolving occlusions, machine learning with real data (MLR), and machine learning with synthetic data (MLS). These methods are evaluated on videos recorded from an experiment studying the effect of ambient light on the shoaling tendency of Giant danios. In particular, we compute average nearest-neighbor distance (ANND) and polarization using the three methods and compare the values with manually verified ground-truth data. To further assess possible dependence on sampling rate for computing ANND, the comparison is also performed at a low frame rate. Results show that while ST is the most accurate at the higher frame rate for both ANND and polarization, at the low frame rate there is no significant difference in ANND accuracy between the three methods. In terms of computational speed, MLR and MLS take significantly less time to process an image, with MLS better addressing constraints related to generation of training data. Finally, all methods are able to successfully detect a significant difference in ANND as the ambient light intensity is varied, irrespective of the direction of intensity change. PMID:25294042
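The two group observables compared above can be computed from tracked positions and headings. A minimal sketch, assuming 2-D positions and headings in radians; the polarization definition used here (length of the mean unit heading vector) is a common convention and an assumption, since the abstract does not define it. The function names are hypothetical.

```python
import math

def annd(positions):
    """Average nearest-neighbor distance over a list of (x, y) positions."""
    total = 0.0
    for i, (xi, yi) in enumerate(positions):
        # Distance from individual i to its closest other individual.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(positions)
            if j != i
        )
        total += nearest
    return total / len(positions)

def polarization(headings):
    """Length of the mean unit heading vector (assumed definition).

    Returns 1.0 when all headings are aligned and values near 0.0 when
    headings cancel out. Headings are angles in radians.
    """
    cx = sum(math.cos(h) for h in headings) / len(headings)
    cy = sum(math.sin(h) for h in headings) / len(headings)
    return math.hypot(cx, cy)
```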

  9. Alternating minimization and Boltzmann machine learning.

    PubMed

    Byrne, W

    1992-01-01

    Training a Boltzmann machine with hidden units is appropriately treated in information geometry using the information divergence and the technique of alternating minimization. The resulting algorithm is shown to be closely related to gradient descent Boltzmann machine learning rules, and the close relationship of both to the EM algorithm is described. An iterative proportional fitting procedure for training machines without hidden units is described and incorporated into the alternating minimization algorithm. PMID:18276461
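Iterative proportional fitting repeatedly rescales a model so that its marginals match target marginals. The sketch below shows only the textbook two-way contingency-table version, as an illustration of the principle; it is not Byrne's Boltzmann-machine training algorithm, and `ipf` is a hypothetical name. Row and column targets are assumed to have the same total.

```python
def ipf(table, row_targets, col_targets, n_iter=100):
    """Iterative proportional fitting of a 2-D table to target marginals."""
    t = [row[:] for row in table]  # work on a copy of the input table
    for _ in range(n_iter):
        # Rescale each row so its sum matches the row target.
        for i, row in enumerate(t):
            s = sum(row)
            t[i] = [v * row_targets[i] / s for v in row]
        # Rescale each column so its sum matches the column target.
        for j in range(len(col_targets)):
            s = sum(row[j] for row in t)
            for row in t:
                row[j] *= col_targets[j] / s
    return t
```

Each half-step solves its own constraint exactly, and alternating between them plays the role of the alternating projections discussed in the abstract.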

  10. Predicting fecal sources in waters with diverse pollution loads using general and molecular host-specific indicators and applying machine learning methods.

    PubMed

    Casanovas-Massana, Arnau; Gómez-Doñate, Marta; Sánchez, David; Belanche-Muñoz, Lluís A; Muniesa, Maite; Blanch, Anicet R

    2015-03-15

    In this study we use machine learning software (Ichnaea) to generate predictive models for water samples with different concentrations of fecal contamination (point source, moderate and low). We applied several microbial source tracking (MST) methods (host-specific Bacteroides phages, mitochondrial DNA genetic markers, Bifidobacterium adolescentis and Bifidobacterium dentium markers, and bifidobacterial host-specific qPCR) and general indicators (Escherichia coli, enterococci and somatic coliphages) to evaluate the source of contamination in the samples. The results were supplied to the Ichnaea software, which evaluated the performance of each method in the different scenarios and determined the source of the contamination. Almost all MST methods in this study correctly determined the origin of fecal contamination in point-source and moderate-concentration samples. When the fecal pollution was more diluted (below 3 log10 CFU E. coli/100 ml), some of these indicators (bifidobacterial host-specific qPCR, some mitochondrial markers or the B. dentium marker) were not suitable because their concentrations fell below the detection limit. Using the data from point-source samples, Ichnaea produced models for waters with low levels of fecal pollution. These models included the MST methods selected on the basis of their performance, which were used to determine the source of pollution in this area. Regardless of the methods selected, which may vary depending on the scenario, inductive machine learning methods are a promising tool in MST studies and may represent a leap forward in solving MST cases. PMID:25585145

  11. On machine learning classification of otoneurological data.

    PubMed

    Juhola, Martti

    2008-01-01

    A dataset including cases of six otoneurological diseases was analysed using machine learning methods to investigate the classification problem of these diseases and to compare the effectiveness of different methods on these data. Linear discriminant analysis was the best method, followed by multilayer perceptron neural networks, provided that the data were input to the network as principal components. Nearest neighbour searching, k-means clustering and Kohonen neural networks achieved almost as good results as these two, and decision trees slightly worse. Thus, these methods fared well, but the Naïve Bayes rule could not be used because some data matrices were singular. Cases of the six otoneurological diseases studied can be reliably distinguished. PMID:18487733

  12. Development of E-Learning Materials for Machining Safety Education

    NASA Astrophysics Data System (ADS)

    Nakazawa, Tsuyoshi; Mita, Sumiyoshi; Matsubara, Masaaki; Takashima, Takeo; Tanaka, Koichi; Izawa, Satoru; Kawamura, Takashi

    We developed two types of e-learning materials for Manufacturing Practice safety education: movie learning materials and hazard-detection learning materials. With the movie learning materials, which use video and sound, students can learn how to operate machines safely; these materials make preparation for and review of Manufacturing Practice more effective and help students operate machines safely in practice. With the hazard-detection learning materials, students can apply knowledge learned in lectures to the detection of hazards and practise hazard detection during machine operation. In particular, the hazard-detection learning materials raise students' safety consciousness and improve their comprehension of both lecture content and the operations performed during Manufacturing Practice.

  13. Machine Learning for Biomedical Literature Triage

    PubMed Central

    Almeida, Hayda; Meurs, Marie-Jean; Kosseim, Leila; Butler, Greg; Tsang, Adrian

    2014-01-01

    This paper presents a machine learning system for supporting the first task of the biological literature manual curation process, called triage. We compare the performance of various classification models, by experimenting with dataset sampling factors and a set of features, as well as three different machine learning algorithms (Naive Bayes, Support Vector Machine and Logistic Model Trees). The results show that the most fitting model to handle the imbalanced datasets of the triage classification task is obtained by using domain relevant features, an under-sampling technique, and the Logistic Model Trees algorithm. PMID:25551575
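Random under-sampling, one common way to balance an imbalanced training set, discards majority-class examples at random. The abstract does not specify the exact under-sampling procedure, so this is a minimal sketch assuming plain random under-sampling down to the size of the smallest class; `random_undersample` is a hypothetical name.

```python
import random
from collections import defaultdict

def random_undersample(samples, labels, seed=0):
    """Randomly drop examples so every class is as small as the smallest one.

    Assumption: plain random under-sampling; the paper's exact procedure
    is not given in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep n_min examples per class, chosen without replacement.
        for x in rng.sample(xs, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Under-sampling is applied to the training data only; the evaluation set should keep its natural class balance so that reported performance reflects the real triage setting.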

  14. Photometric Supernova Classification with Machine Learning

    NASA Astrophysics Data System (ADS)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of the type of every discovered supernova will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the Receiver Operating Characteristic curve (AUC) as a metric, we find that the SALT2 fits and the wavelet approach, with the BDT algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
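The AUC can be computed without drawing the ROC curve, as the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one (the Mann-Whitney formulation). A minimal O(P·N) sketch with a hypothetical `roc_auc` helper; labels are assumed to be 1 for the positive class and 0 otherwise, and the toy scores are illustrative, not data from the paper.

```python
def roc_auc(scores, labels):
    """AUC as P(positive outscores negative), with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # → 0.75
```

An AUC of 1.0 means every positive outscores every negative, matching the "1 represents perfect classification" reading above.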

  15. Coupling machine learning methods with wavelet transforms and the bootstrap and boosting ensemble approaches for drought prediction

    NASA Astrophysics Data System (ADS)

    Belayneh, A.; Adamowski, J.; Khalil, B.; Quilty, J.

    2016-05-01

    This study explored the ability of coupled machine learning models and ensemble techniques to predict drought conditions in the Awash River Basin of Ethiopia. The potential of wavelet transforms coupled with the bootstrap and boosting ensemble techniques to develop reliable artificial neural network (ANN) and support vector regression (SVR) models for drought prediction was explored. Wavelet analysis was used as a pre-processing tool and was shown to improve drought predictions. The Standardized Precipitation Index (SPI), a meteorological drought index, was forecast with the aforementioned models at three time scales (SPI 3, SPI 12 and SPI 24), representing short- and long-term drought conditions. The performances of all models were compared using RMSE, MAE, and R². The prediction results indicated that the use of the boosting ensemble technique consistently improved the correlation between observed and predicted SPIs. In addition, the use of wavelet analysis improved the prediction results of all models. Overall, the wavelet boosting ANN (WBS-ANN) and wavelet boosting SVR (WBS-SVR) models provided better prediction results than the other model types evaluated.
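The three comparison metrics can be written out directly. A minimal sketch with hypothetical function names; note the assumption that R² is the coefficient of determination, 1 − SS_res/SS_tot, since some hydrological studies instead report the squared Pearson correlation between observed and predicted values.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination (assumed definition of R²)."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

RMSE penalises large forecast errors more heavily than MAE, so reporting both shows whether a model's errors are dominated by a few extreme drought events.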

  16. A method for the evaluation of image quality according to the recognition effectiveness of objects in the optical remote sensing image using machine learning algorithm.

    PubMed

    Yuan, Tao; Zheng, Xinqi; Hu, Xuan; Zhou, Wei; Wang, Wei

    2014-01-01

    Objective and effective image quality assessment (IQA) is directly related to the application of optical remote sensing images (ORSI). In this study, a new IQA method of standardizing the target object recognition rate (ORR) is presented to reflect quality. First, several quality degradation treatments with high-resolution ORSIs are implemented to model the ORSIs obtained in different imaging conditions; then, a machine learning algorithm is adopted for recognition experiments on a chosen target object to obtain ORRs; finally, a comparison with commonly used IQA indicators was performed to reveal their applicability and limitations. The results showed that the ORR of the original ORSI was calculated to be up to 81.95%, whereas the ORR ratios of the quality-degraded images to the original images were 65.52%, 64.58%, 71.21%, and 73.11%. The results show that these data can more accurately reflect the advantages and disadvantages of different images in object identification and information extraction when compared with conventional digital image assessment indexes. By recognizing the difference in image quality from the application effect perspective, using a machine learning algorithm to extract regional gray scale features of typical objects in the image for analysis, and quantitatively assessing quality of ORSI according to the difference, this method provides a new approach for objective ORSI assessment. PMID:24489739

  17. A Machine Learning Based Framework for Adaptive Mobile Learning

    NASA Astrophysics Data System (ADS)

    Al-Hmouz, Ahmed; Shen, Jun; Yan, Jun

    Advances in wireless technology and handheld devices have created significant interest in mobile learning (m-learning) in recent years. Students nowadays are able to learn anywhere and at any time. Mobile learning environments must also cater for different user preferences and various devices with limited capability, where not all of the information is relevant and critical to each learning environment. To address this issue, this paper presents a framework that depicts the process of adapting learning content to satisfy individual learner characteristics by taking into consideration his/her learning style. We use a machine learning based algorithm for acquiring, representing, storing, reasoning about and updating each learner's acquired profile.

  18. Machine Learning for Biological Trajectory Classification Applications

    NASA Technical Reports Server (NTRS)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms are compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well compared to humans, suggesting a robust tool for trajectory classification in biological applications.
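A standard signal-detection-theory measure for comparing classifiers with human observers is the sensitivity index d′ = z(H) − z(F), where z is the inverse standard normal CDF, H the hit rate and F the false-alarm rate. The abstract does not say which measures were used, so this is an assumed, illustrative sketch using only the Python standard library; `d_prime` is a hypothetical name, and rates of exactly 0 or 1 must be corrected before use because z is undefined there.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(F) from signal-detection theory.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)
```

A d′ of 0 means the observer (human or algorithm) cannot separate the two trajectory classes at all; larger values mean better separation, independent of the observer's response bias.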

  19. Extreme Learning Machines for spatial environmental data

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2015-12-01

    The use of machine learning algorithms has increased in a wide variety of domains (from finance to biocomputing and astronomy), and nowadays has a significant impact on the geoscience community. In most real cases geoscience data modelling problems are multivariate, high dimensional, variable at several spatial scales, and are generated by non-linear processes. For such complex data, the spatial prediction of continuous (or categorical) variables is a challenging task. The aim of this paper is to investigate the potential of the recently developed Extreme Learning Machine (ELM) for environmental data analysis, modelling and spatial prediction purposes. An important contribution of this study deals with an application of a generic self-consistent methodology for environmental data driven modelling based on Extreme Learning Machine. Both real and simulated data are used to demonstrate applicability of ELM at different stages of the study to understand and justify the results.

  20. Introduction to machine learning for brain imaging.

    PubMed

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in recent years become a workhorse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever-increasing measurement precision and for detecting minuscule signals against an overwhelming noise floor. They provide the means to decode and characterize task-relevant brain states and to distinguish them from non-informative brain signals. While this machinery has undoubtedly helped to gain novel biological insights, it also carries the risk of unintentional misuse. Ideally, machine learning techniques should be usable by any non-expert; unfortunately, they typically are not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretations. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. PMID:21172442
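One standard guard against the overfitting pitfalls mentioned above is k-fold cross-validation, in which every sample is scored only by a model that never saw it during training. A minimal sketch of the index bookkeeping with a hypothetical `k_fold_indices` helper; it assumes independent samples, and grouped splits (e.g. by subject) are needed when samples are not independent, which this sketch does not handle.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every index appears in exactly one test fold and is never in the
    training set of that same fold.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Any preprocessing fitted to the data (feature selection, normalisation, hyperparameter tuning) must be repeated inside each training split; doing it once on the full dataset leaks test information and inflates the estimated accuracy.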

  1. Learning in brains and machines.

    PubMed

    Poggio, T; Shelton, C R

    2000-01-01

    The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial. In this paper we sketch some of our work over the last ten years in the area of supervised learning, focusing on three interlinked directions of research: theory, engineering applications (that is, making intelligent software) and neuroscience (that is, understanding the brain's mechanisms of learning). PMID:11198239

  2. Machine Learning for Dynamical Mean Field Theory

    NASA Astrophysics Data System (ADS)

    Arsenault, Louis-Francois; Lopez-Bezanilla, Alejandro; von Lilienfeld, O. Anatole; Littlewood, P. B.; Millis, Andy

    2014-03-01

    Machine Learning (ML), an approach that infers new results from accumulated knowledge, is in use for a variety of tasks ranging from face and voice recognition to internet searching and has recently been gaining increasing importance in chemistry and physics. In this talk, we investigate the possibility of using ML to solve the equations of dynamical mean field theory, which otherwise require the (numerically very expensive) solution of a quantum impurity model. Our ML scheme is built on the relation between two functions: the hybridization function, describing the bare (local) electronic structure of a material, and the self-energy, describing the many-body physics. We discuss the parameterization of the two functions for the exact diagonalization solver and present examples, beginning with the Anderson Impurity model with a fixed bath density of states, demonstrating the advantages and the pitfalls of the method. DOE contract DE-AC02-06CH11357.

  3. A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories.

    PubMed

    Hasan, Mehedi; Kotov, Alexander; Idalski Carcone, April; Dong, Ming; Naar, Sylvie; Brogan Hartlieb, Kathryn

    2016-08-01

    This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-the-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN), in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found that, when the number of classes is large, the performance of CNN and CRF is inferior to that of SVM. When only lexical features were used, SVM annotated the interview transcripts with the highest classification accuracy among all classifiers: 70.8%, 61% and 53.7% for the codebooks consisting of 17, 20 and 41 codes, respectively. Using contextual features, semantic features, or their combination in addition to lexical ones improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with the 17-class codebook to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy. PMID:27185608

  4. Machine learning in cell biology - teaching computers to recognize phenotypes.

    PubMed

    Sommer, Christoph; Gerlich, Daniel W

    2013-12-15

    Recent advances in microscope automation provide new opportunities for high-throughput cell biology, such as image-based screening. Highly complex image analysis tasks often make the implementation of static, predefined processing rules cumbersome. Machine-learning methods, instead, seek to use intrinsic data structure, as well as the expert annotations of biologists, to infer models that can be used to solve versatile data analysis tasks. Here, we explain how machine-learning methods work and what needs to be considered for their successful application in cell biology. We outline how microscopy images can be converted into a data representation suitable for machine learning, and then introduce various state-of-the-art machine-learning algorithms, highlighting recent applications in image-based screening. Our Commentary aims to provide the biologist with a guide to the application of machine learning to microscopy assays and we therefore include extensive discussion on how to optimize the experimental workflow as well as the data analysis pipeline. PMID:24259662

  5. 3D Visualization of Machine Learning Algorithms with Astronomical Data

    NASA Astrophysics Data System (ADS)

    Kent, Brian R.

    2016-01-01

    We present innovative machine learning (ML) methods using unsupervised clustering with minimum spanning trees (MSTs) to study 3D astronomical catalogs. Using Python code to build trees based on galaxy catalogs, we can render the results with the visualization suite Blender to produce interactive 360-degree panoramic videos. The catalogs and their ML results can be explored in a 3D space using mobile devices, tablets or desktop browsers. We compare the statistics of the MST results to a number of machine learning methods relating to optimization and efficiency.
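One common way to cluster with a minimum spanning tree is to build the MST over all points and then cut edges longer than a threshold; the remaining connected pieces are the clusters (equivalent to single-linkage clustering). The abstract does not say how its clusters are defined, so the edge-length cut, the `max_edge` parameter and the name `mst_clusters` are assumptions. This sketch runs Kruskal's algorithm over all O(n²) point pairs, which is only practical for small catalogs.

```python
import math
from itertools import combinations

def mst_clusters(points, max_edge):
    """Cluster 3-D points by building an MST (Kruskal) and cutting long edges.

    Returns (labels, tree): one root index per point as its cluster label,
    and the MST as a list of (length, i, j) edges.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Kruskal: scan all pairwise edges from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so it belongs to the MST
            parent[ri] = rj
            tree.append((w, i, j))

    # Keep only MST edges no longer than max_edge; the pieces are the clusters.
    parent[:] = range(n)
    for w, i, j in tree:
        if w <= max_edge:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)], tree
```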

  6. Distributed fuzzy learning using the MULTISOFT machine.

    PubMed

    Russo, M

    2001-01-01

    Describes PARGEFREX, a distributed approach to genetic-neuro-fuzzy learning that has been implemented on the MULTISOFT machine, a low-cost machine made of personal computers built at the University of Messina. The performance of the serial version is greatly enhanced by the simple parallelization scheme described in the paper. For a fixed learning dataset, the average time needed to reach a predefined learning error shows a very high superlinear speedup: if the number of personal computers increases n-fold, the mean learning time falls to less than 1/n of its original value. PMID:18249882
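Superlinear speedup means that the speedup S = T₁/Tₙ exceeds the number of machines n, or equivalently Tₙ < T₁/n. A minimal sketch of the check; the timings in the usage line are hypothetical, not results from the paper, and `speedup` is a hypothetical name.

```python
def speedup(t_serial, t_parallel, n_workers):
    """Return (speedup, is_superlinear) for a run on n_workers machines."""
    s = t_serial / t_parallel
    return s, s > n_workers

# Hypothetical timings: 100 s serially, 20 s on 4 PCs.
print(speedup(100.0, 20.0, 4))  # → (5.0, True)
```

For a stochastic search such as genetic-neuro-fuzzy learning, superlinear behaviour is plausible because several machines exploring in parallel can reach the target error in fewer total evaluations than a single run.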

  7. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology

    PubMed Central

    Ju, Ying

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machines with basic sequence features (n-grams), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were used to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics. PMID:27478823
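An n-gram composition feature records how often each length-n substring occurs in a protein sequence. A minimal sketch, assuming relative frequencies over overlapping n-grams; the toy sequence and the name `ngram_composition` are hypothetical, and in practice the dictionary would be mapped onto a fixed-length vector (for example all 400 dipeptides when n = 2) before being passed to a classifier such as an SVM.

```python
from collections import Counter

def ngram_composition(seq, n=2):
    """Relative frequency of each overlapping length-n substring of seq."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}

# Toy sequence: overlapping dipeptides are AC, CA, AC.
features = ngram_composition("ACAC", 2)
```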

  8. Machine Learning Toolkit for Extreme Scale

    SciTech Connect

    2014-03-31

    Support Vector Machines (SVMs) are a popular machine learning technique that has been applied to a wide range of domains, such as science, finance, and social networks, for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, including commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed for improved speed and memory usage, including adaptive and aggressive elimination of samples for faster convergence, and a sparse format representation of data samples. Several heuristics, ranging from the earliest possible to lazy elimination of non-contributing samples, are considered in MaTEx. For cases where an early sample elimination might result in a false positive, low-overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.

  9. Machine Learning Toolkit for Extreme Scale

    Energy Science and Technology Software Center (ESTSC)

    2014-03-31

    Support Vector Machines (SVMs) are a popular machine learning technique that has been applied to a wide range of domains, such as science, finance, and social networks, for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, including commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed for improved speed and memory usage, including adaptive and aggressive elimination of samples for faster convergence, and a sparse format representation of data samples. Several heuristics, ranging from the earliest possible to lazy elimination of non-contributing samples, are considered in MaTEx. For cases where an early sample elimination might result in a false positive, low-overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.

  10. Hypervelocity cutting machine and method

    DOEpatents

    Powell, J.R.; Reich, M.

    1996-11-12

    A method and machine are provided for cutting a workpiece such as concrete. A gun barrel is provided for repetitively loading projectiles therein and is supplied with a pressurized propellant from a storage tank. A thermal storage tank is disposed between the propellant storage tank and the gun barrel for repetitively receiving and heating propellant charges which are released in the gun barrel for repetitively firing projectiles therefrom toward the workpiece. In a preferred embodiment, hypervelocity of the projectiles is obtained for cutting the concrete workpiece by fracturing thereof. 10 figs.

  11. Hypervelocity cutting machine and method

    DOEpatents

    Powell, James R.; Reich, Morris

    1996-11-12

    A method and machine 14 are provided for cutting a workpiece 12 such as concrete. A gun barrel 16 is provided for repetitively loading projectiles 22 therein and is supplied with a pressurized propellant from a storage tank 28. A thermal storage tank 32,32A is disposed between the propellant storage tank 28 and the gun barrel 16 for repetitively receiving and heating propellant charges which are released in the gun barrel 16 for repetitively firing projectiles 22 therefrom toward the workpiece 12. In a preferred embodiment, hypervelocity of the projectiles 22 is obtained for cutting the concrete workpiece 12 by fracturing thereof.

  12. Movement error rate for evaluation of machine learning methods for sEMG-based hand movement classification.

    PubMed

    Gijsberts, Arjan; Atzori, Manfredo; Castellini, Claudio; Muller, Henning; Caputo, Barbara

    2014-07-01

    There has been increasing interest in applying learning algorithms to improve the dexterity of myoelectric prostheses. In this work, we present a large-scale benchmark evaluation on the second iteration of the publicly released NinaPro database, which contains surface electromyography data for 6 DOF force activations as well as for 40 discrete hand movements. The evaluation involves a modern kernel method and compares the performance of three feature representations and three kernel functions. Both the force regression and movement classification problems can be learned successfully when using a nonlinear kernel function, while the exp-χ² kernel outperforms the more popular radial basis function kernel in all cases. Furthermore, combining surface electromyography and accelerometry in a multimodal classifier results in significant increases in accuracy compared with using either modality individually. Since window-based classification accuracy should not be considered in isolation when estimating prosthetic controllability, we also provide results in terms of classification mistakes and prediction delay. To this end, we propose the movement error rate as an alternative to the standard window-based accuracy. This error rate is insensitive to prediction delays and therefore allows us to quantify mistakes and delays as independent performance characteristics. This type of analysis confirms that including accelerometry is superior, as it results in fewer mistakes while also reducing prediction delay. PMID:24760932
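
    For reference, the two kernels compared above can be written down directly. This sketch uses one common form of the exponential χ² kernel, k(x, z) = exp(-γ Σᵢ (xᵢ - zᵢ)² / (xᵢ + zᵢ)), which assumes non-negative features; some authors include a factor of 1/2 or normalize γ by the mean χ² distance, and the paper's exact parameterization is not stated in this abstract. `gram_matrix` is a hypothetical helper that builds the kernel matrix an SVM or kernel ridge regressor would consume.

```python
import numpy as np


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))


def exp_chi2_kernel(x, z, gamma=1.0, eps=1e-12):
    """Exponential chi-squared kernel for non-negative feature vectors:
    exp(-gamma * sum_i (x_i - z_i)^2 / (x_i + z_i)).
    `eps` guards against 0/0 when both entries are zero."""
    chi2 = np.sum((x - z) ** 2 / (x + z + eps))
    return float(np.exp(-gamma * chi2))


def gram_matrix(kernel, A, B, **params):
    """Kernel matrix K[i, j] = kernel(A[i], B[j])."""
    return np.array([[kernel(a, b, **params) for b in B] for a in A])
```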

  13. Using Simple Machines to Leverage Learning

    ERIC Educational Resources Information Center

    Dotger, Sharon

    2008-01-01

    What would your students say if you told them they could lift you off the ground using a block and a board? Using a simple machine, they'll find out they can, and they'll learn about work, energy, and motion in the process! In addition, this integrated lesson gives students the opportunity to investigate variables while practicing measurement…

  14. Machine learning for real time remote detection

    NASA Astrophysics Data System (ADS)

    Labbé, Benjamin; Fournier, Jérôme; Henaff, Gilles; Bascle, Bénédicte; Canu, Stéphane

    2010-10-01

    Infrared systems are key to providing military forces with enhanced capabilities such as automatic threat control and protection against air, naval and ground attacks. Key requirements for such a system to produce operational benefits are real-time processing and high efficiency in terms of detection and false alarm rate. These are serious issues, since the system must deal with a large number of objects and categories to be recognized (small vehicles, armored vehicles, planes, buildings, etc.). Statistical learning algorithms are promising candidates to meet these requirements when combined with selected discriminant features and a real-time implementation. This paper proposes a new decision architecture that benefits from recent advances in machine learning by using an effective method for level set estimation. While building the decision function, the proposed approach performs variable selection based on a discriminative criterion. Moreover, the use of level sets makes it possible to reject unknown or ambiguous objects, thus preserving the false alarm rate. Experimental evidence on real-world infrared images demonstrates the validity of our approach.

  15. Application of advanced machine learning methods on resting-state fMRI network for identification of mild cognitive impairment and Alzheimer's disease.

    PubMed

    Khazaee, Ali; Ebrahimzadeh, Ata; Babajani-Feremi, Abbas

    2016-09-01

    The study of brain networks by resting-state functional magnetic resonance imaging (rs-fMRI) is a promising method for distinguishing patients with dementia from healthy controls (HC). Using graph theory, different aspects of the brain network can be efficiently characterized by calculating measures of integration and segregation. In this study, we combined a graph-theoretical approach with advanced machine learning methods to study the brain network in 89 patients with mild cognitive impairment (MCI), 34 patients with Alzheimer's disease (AD), and 45 age-matched HC. The rs-fMRI connectivity matrix was constructed using a brain parcellation based on 264 putative functional areas. Using the optimal features extracted from the graph measures, we were able to classify the three groups (i.e., HC, MCI, and AD) with an accuracy of 88.4%. We also investigated the performance of our proposed method for binary classification of one group (e.g., MCI) against the two other groups (e.g., HC and AD). The classification accuracies for identifying HC from AD and MCI, AD from HC and MCI, and MCI from HC and AD were 87.3%, 97.5%, and 72.0%, respectively. In addition, results based on the 264-region parcellation were compared with those based on the automated anatomical labeling (AAL) atlas, which consists of 90 regions. The accuracy of three-group classification using AAL dropped to 83.2%. Our results show that combining graph measures with machine learning, on the basis of rs-fMRI connectivity analysis, may assist in the diagnosis of AD and MCI. PMID:26363784
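
    The graph-theoretic feature extraction described above can be sketched as follows. The study's exact graph measures, thresholding, and feature selection are not given in this abstract, so everything here is an assumption: the connectivity matrix is binarized by keeping the strongest edges at a fixed density (hypothetical `density` parameter), and three standard measures are computed, namely node degree, the local clustering coefficient (a segregation measure), and global efficiency (an integration measure). The concatenated vector is what one would pass to a classifier.

```python
from collections import deque

import numpy as np


def binarize(conn, density=0.1):
    """Keep the strongest |r| edges so that roughly `density` of all possible edges survive."""
    c = np.abs(np.asarray(conn, dtype=float))
    np.fill_diagonal(c, 0.0)
    iu = np.triu_indices(c.shape[0], k=1)
    k = max(1, int(round(density * iu[0].size)))
    threshold = np.sort(c[iu])[::-1][k - 1]
    adj = (c >= threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return adj


def clustering(adj):
    """Local clustering coefficient: closed triangles / possible triangles, per node."""
    deg = adj.sum(axis=1)
    triangles = np.diag(adj @ adj @ adj) / 2.0
    possible = deg * (deg - 1) / 2.0
    return np.divide(triangles, possible, out=np.zeros(len(deg)), where=possible > 0)


def global_efficiency(adj):
    """Mean inverse shortest-path length over all ordered node pairs (BFS, unweighted)."""
    n = adj.shape[0]
    total = 0.0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                v = int(v)
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != s)
    return total / (n * (n - 1))


def graph_features(conn, density=0.1):
    """Degree, clustering per node, plus global efficiency, as one feature vector."""
    adj = binarize(conn, density)
    return np.concatenate([adj.sum(axis=1), clustering(adj), [global_efficiency(adj)]])
```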

  16. Energy landscapes for a machine learning application to series data

    NASA Astrophysics Data System (ADS)

    Ballard, Andrew J.; Stevenson, Jacob D.; Das, Ritankar; Wales, David J.

    2016-03-01

    Methods developed to explore and characterise potential energy landscapes are applied to the corresponding landscapes obtained from optimisation of a cost function in machine learning. We consider neural network predictions for the outcome of local geometry optimisation in a triatomic cluster, where four distinct local minima exist. The accuracy of the predictions is compared for fits using data from single and multiple points in the series of atomic configurations resulting from local geometry optimisation and for alternative neural networks. The machine learning solution landscapes are visualised using disconnectivity graphs, and signatures in the effective heat capacity are analysed in terms of distributions of local minima and their properties.

  17. Recognition of printed Arabic text using machine learning

    NASA Astrophysics Data System (ADS)

    Amin, Adnan

    1998-04-01

    Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress, either on-line or off-line, has been achieved towards the automatic recognition of Arabic characters. This is due to a lack of adequate support in terms of funding and other resources, such as Arabic text databases and dictionaries, and, of course, to the cursive nature of Arabic writing. The main theme of this paper is the automatic recognition of printed Arabic text using the C4.5 machine learning algorithm. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors, each including a label that identifies the class to which the example belongs. The output of the algorithm is a set of rules that classifies unseen examples by generalizing from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and its symbolic representation is easy to understand. The technique can be divided into three major steps. The first step is preprocessing, in which the original image, scanned at 300 dpi, is transformed into a binary image and its connected components are formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of complementary characters. Finally, C4.5 is used for character classification by generating a decision tree.
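
    C4.5 chooses the attribute to split on by the gain ratio: the information gain of a split divided by its split information. The sketch below computes it for discrete attributes only; C4.5 additionally handles continuous attributes via thresholds, missing values, and pruning, none of which are shown. An attribute such as a count of subwords per word would be passed as `values`, with the character classes as `labels`.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for a discrete attribute: information gain
    divided by the split information (entropy of the attribute values)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

    For instance, `gain_ratio([1, 1, 2, 2], ['a', 'a', 'b', 'b'])` returns 1.0: the attribute separates the classes perfectly, and both the gain and the split information equal one bit.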

  18. Acceleration of saddle-point searches with machine learning.

    PubMed

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically identifies regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community. PMID:27544086
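
    The verify-and-retrain loop described above can be shown on a one-dimensional toy problem. This is an illustration under loud assumptions, not the paper's method: the "ab initio" calculator is an analytic double-well function `true_energy` with minima at x = ±1, the machine-learning surrogate is a simple polynomial fit rather than a neural network, and along a 1D path the saddle point reduces to the energy maximum between the two minima. Each surrogate prediction is checked with one expensive call; a disagreement marks a poorly sampled region, and that point is added to the training data.

```python
import numpy as np


def true_energy(x):
    """Stand-in for an expensive ab initio call: 1D double well, minima at x = -1 and x = +1."""
    return (x ** 2 - 1.0) ** 2


def ml_accelerated_barrier_search(x_init, tol=1e-6, max_calls=20):
    """Surrogate-driven search for the barrier top (the 1D analogue of a saddle point).

    Returns (x_saddle, energy at x_saddle, number of expensive calls used).
    """
    xs = [float(x) for x in x_init]
    ys = [true_energy(x) for x in xs]
    grid = np.linspace(-1.0, 1.0, 2001)  # search only between the two minima
    while True:
        deg = min(len(xs) - 1, 4)                  # hypothetical surrogate: polynomial fit
        coeffs = np.polyfit(xs, ys, deg)
        x_star = float(grid[np.argmax(np.polyval(coeffs, grid))])
        e_pred = float(np.polyval(coeffs, x_star))
        e_true = true_energy(x_star)               # verify the prediction with one expensive call
        xs.append(x_star)                          # the verified point becomes training data
        ys.append(e_true)
        if abs(e_pred - e_true) < tol or len(xs) >= max_calls:
            return x_star, e_true, len(xs)
```

    Starting from four samples such as `[-1.2, -0.9, 0.5, 1.1]`, the loop should find the barrier near x = 0 after only a couple of verification calls, since a quartic surrogate reproduces this toy potential exactly once five points are available.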

  19. Combining data mining and machine learning for effective user profiling

    SciTech Connect

    Fawcett, T.; Provost, F.

    1996-12-31

    This paper describes the automatic design of methods for detecting fraudulent behavior. Much of the design is accomplished using a series of machine learning methods. In particular, we combine data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior. Specifically, we use a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls. These indicators are used to create profilers, which then serve as features to a system that combines evidence from multiple profilers to generate high-confidence alarms. Experiments indicate that this automatic approach performs nearly as well as the best hand-tuned methods for detecting fraud.
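
    A minimal sketch of the indicator, profiler, and alarm pipeline follows. The rule `night_roaming`, the "maximum daily count in the account's history" baseline, and the weighted-sum combiner are all hypothetical stand-ins chosen for illustration; the paper's actual rule learner, profiler templates, and evidence-combining method are not reproduced here.

```python
def night_roaming(call):
    """Hypothetical indicator rule of the kind a rule learner might uncover."""
    return call["hour"] < 6 and call["cell"] != call["home_cell"]


def make_threshold_profiler(rule, history_days):
    """Profiler for one account: how far today's count of rule-matching calls
    exceeds the largest daily count seen in the account's (fraud-free) history."""
    baseline = max(sum(rule(c) for c in day) for day in history_days)

    def profile(today_calls):
        return max(0, sum(rule(c) for c in today_calls) - baseline)

    return profile


def raise_alarm(profilers, weights, today_calls, threshold):
    """Combine profiler outputs with a weighted sum; alarm when it reaches `threshold`."""
    score = sum(w * p(today_calls) for p, w in zip(profilers, weights))
    return score >= threshold
```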

  20. Research on knowledge representation, machine learning, and knowledge acquisition

    NASA Technical Reports Server (NTRS)

    Buchanan, Bruce G.

    1987-01-01

    Research in knowledge representation, machine learning, and knowledge acquisition performed at the Knowledge Systems Laboratory is summarized. The major goal of the research was to develop flexible, effective methods for representing the qualitative knowledge necessary for solving large problems that require symbolic reasoning as well as numerical computation. The research focused on integrating different representation methods to describe different kinds of knowledge more effectively than any one method can alone. In particular, emphasis was placed on representing and using spatial information about three-dimensional objects and constraints on the arrangement of these objects in space. Another major theme was the development of robust machine learning programs that can be integrated with a variety of intelligent systems. To achieve this goal, learning methods were designed, implemented, and tested within several different problem-solving environments.

  1. Machine learning techniques for fault isolation and sensor placement

    NASA Technical Reports Server (NTRS)

    Carnes, James R.; Fisher, Douglas H.

    1993-01-01

    Fault isolation and sensor placement are vital for monitoring and diagnosis. A sensor conveys information about a system's state that guides troubleshooting if problems arise. We are using machine learning methods to uncover behavioral patterns over snapshots of system simulations that will aid fault isolation and sensor placement, with an eye towards minimality, fault coverage, and noise tolerance.

  2. Machine Learning of Maritime Fog Forecast Rules.

    NASA Astrophysics Data System (ADS)

    Tag, Paul M.; Peak, James E.

    1996-05-01

    In recent years, the field of artificial intelligence has contributed significantly to the science of meteorology, most notably in the now familiar form of expert systems. Expert systems have focused on rules or heuristics by encoding in computer code the reasoning process of a weather forecaster predicting, for example, thunderstorms or fog. In addition to the years of effort that go into developing such a knowledge base, extracting that knowledge and experience from experts is itself a time-consuming task. In this paper, the induction of rules directly from meteorological data, a process called machine learning, is explored. A commercial machine learning program called C4.5 is applied to a meteorological problem, forecasting maritime fog, for which a reliable expert system has previously been developed. Two datasets are used: 1) weather ship observations originally used for testing and evaluating the expert system, and 2) buoy measurements taken off the coast of California. For both datasets, the rules produced by C4.5 are reasonable and make physical sense, demonstrating that an objective induction approach can reveal physical processes directly from data. For the ship database, the machine-generated rules are not as accurate as those from the expert system but are still significantly better than persistence forecasts. For the buoy data, the forecast accuracies are very high, but only slightly superior to persistence. The results indicate that machine learning is a viable tool for developing meteorological expertise, but only when applied to reliable data with sufficient cases of known outcome. When such databases are available, machine learning can provide useful insight that might otherwise take considerable human analysis to produce.

  3. Many-body physics via machine learning

    NASA Astrophysics Data System (ADS)

    Arsenault, Louis-Francois; von Lilienfeld, O. Anatole; Millis, Andrew J.

    We demonstrate a method for using machine learning (ML) to solve the equations of many-body physics, which are functional equations linking a bare to an interacting Green's function (or self-energy), offering transferable predictive power for physical quantities in both the forward and the reverse engineering problems of materials. Functions are represented by coefficients in an orthogonal polynomial expansion, and kernel ridge regression is used. The method is demonstrated using, as an example, a database built from Dynamical Mean Field Theory (DMFT) calculations on the three-dimensional Hubbard model. We discuss the extension to a database for real materials. We also discuss some new areas of investigation concerning high-throughput predictions for real materials, offering a perspective on how our scheme is general enough for other problems involving the inversion of integral equations from integrated knowledge, such as the analytic continuation of the Green's function and the reconstruction of lattice structures from X-ray spectra. Office of Science of the U.S. Department of Energy under SubContract DOE No. 3F-3138 and FG-ER04169.
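
    The representation-plus-regression step can be sketched with NumPy. Assumptions: Legendre polynomials serve as the orthogonal basis (the abstract does not name one), a Gaussian kernel is used, and the "bare" inputs f(x) = a·x and "interacting" outputs g(x) = a·x² are invented toy stand-ins for DMFT data. Each function becomes a coefficient vector, and kernel ridge regression learns the map between coefficient vectors; predicted coefficients are turned back into a function with `legval`.

```python
import numpy as np
from numpy.polynomial import legendre as leg

GRID = np.linspace(-1.0, 1.0, 64)


def to_coeffs(f, deg=6):
    """Represent a function on [-1, 1] by its least-squares Legendre coefficients."""
    return leg.legfit(GRID, f(GRID), deg)


def gaussian_kernel(A, B, gamma):
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


class KernelRidge:
    """Kernel ridge regression: alpha = (K + lam * I)^-1 Y, prediction = K(x, X) @ alpha."""

    def __init__(self, gamma=10.0, lam=1e-8):
        self.gamma, self.lam = gamma, lam

    def fit(self, X, Y):
        self.X = X
        K = gaussian_kernel(X, X, self.gamma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), Y)
        return self

    def predict(self, Xq):
        return gaussian_kernel(Xq, self.X, self.gamma) @ self.alpha


# Toy training set (hypothetical): input coefficients -> output coefficients.
a_train = np.linspace(0.5, 2.0, 7)
X = np.array([to_coeffs(lambda x, a=a: a * x) for a in a_train])
Y = np.array([to_coeffs(lambda x, a=a: a * x ** 2) for a in a_train])
model = KernelRidge().fit(X, Y)

# Predict the output function for an unseen input and evaluate it on the grid.
c_new = model.predict(to_coeffs(lambda x: 1.1 * x)[None, :])[0]
g_new = leg.legval(GRID, c_new)
```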

  4. Outsmarting neural networks: an alternative paradigm for machine learning

    SciTech Connect

    Protopopescu, V.; Rao, N.S.V.

    1996-10-01

    We address three problems in machine learning, namely (i) function learning, (ii) regression estimation, and (iii) sensor fusion, in the Probably Approximately Correct (PAC) framework. We show that, under certain conditions, the three problems above can be reduced to regression estimation. The latter is usually tackled with artificial neural networks (ANNs), which satisfy the PAC criteria but have high computational complexity. We propose several computationally efficient PAC alternatives to ANNs for solving the regression estimation problem, and thereby also provide efficient PAC solutions to the function learning and sensor fusion problems. The approach is based on cross-fertilizing concepts and methods from statistical estimation, nonlinear algorithms, and the theory of computational complexity, and is designed as part of a new, coherent paradigm for machine learning.

  5. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution

    PubMed Central

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE) but apply this population-level effect to future individuals. Given recent developments in machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE estimation using randomized and observational analysis methods against ITE estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples, and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis. PMID:26958271
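
    The contrast between the ATE and the ITE is easy to see on synthetic data. In this sketch, which is not the authors' model or data, the (hypothetical) true effect of treatment is 2x - 1, so it is harmful for small x and beneficial for large x while averaging to zero. The ATE is estimated as a difference in means, and the ITE with a "T-learner": one k-nearest-neighbour regressor per treatment arm, a simple consistent nonparametric choice, with the ITE taken as the difference of the two predictions.

```python
import numpy as np


def knn_regress(x_train, y_train, x_query, k=3):
    """One-dimensional k-nearest-neighbour regression (mean of the k closest targets)."""
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    dist = np.abs(x_query[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return y_train[nearest].mean(axis=1)


def ate_difference_in_means(y, t):
    return y[t == 1].mean() - y[t == 0].mean()


def ite_t_learner(x, y, t, x_query, k=3):
    """ITE(x) = mu1(x) - mu0(x), each arm modelled separately by kNN regression."""
    mu1 = knn_regress(x[t == 1], y[t == 1], x_query, k)
    mu0 = knn_regress(x[t == 0], y[t == 0], x_query, k)
    return mu1 - mu0


# Synthetic, noise-free example: treatment alternates along x.
x = np.linspace(0.0, 1.0, 101)
t = np.arange(101) % 2
y = x + t * (2 * x - 1)      # baseline outcome x plus a heterogeneous treatment effect

ate = ate_difference_in_means(y, t)           # close to 0
ite = ite_t_learner(x, y, t, [0.1, 0.9])      # close to [-0.8, 0.8]
```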

  6. Committee of machine learning predictors of hydrological models uncertainty

    NASA Astrophysics Data System (ADS)

    Kayastha, Nagendra; Solomatine, Dimitri

    2014-05-01

    In machine-learning-based prediction of uncertainty, the results of various sampling schemes, namely Monte Carlo sampling (MCS), generalized likelihood uncertainty estimation (GLUE), Markov chain Monte Carlo (MCMC), the shuffled complex evolution Metropolis algorithm (SCEMUA), differential evolution adaptive Metropolis (DREAM), particle swarm optimization (PSO) and adaptive cluster covering (ACCO) [1], are used to build predictive models. These models predict the uncertainty (quantiles of the pdf) of the deterministic output of a hydrological model [2]. The inputs to these models are specially identified representative variables (precipitation and flows from past events). The trained machine learning models are then employed to predict the model output uncertainty specific to new input data. For each sampling scheme, three machine learning methods, namely artificial neural networks, model trees, and locally weighted regression, are applied to predict output uncertainties. The problem is that different sampling algorithms produce different data sets, which are used to train different machine learning models, yielding 21 predictive uncertainty models in total. There is no clear evidence as to which model is best, since there is no basis for comparison. One solution is to form a committee of all models and use a dynamic averaging scheme to generate the final output [3]. This approach is applied to estimate the uncertainty of streamflow simulations from the conceptual hydrological model HBV in the Nzoia catchment in Kenya. [1] N. Kayastha, D. L. Shrestha and D. P. Solomatine. Experiments with several methods of parameter uncertainty estimation in hydrological modeling. Proc. 9th Intern. Conf. on Hydroinformatics, Tianjin, China, September 2010. [2] D. L. Shrestha, N. Kayastha, D. P. Solomatine and R. Price. Encapsulation of parametric uncertainty statistics by various predictive machine learning models: MLUE method, Journal of Hydroinformatics, in press
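
    The committee idea can be sketched as follows. The abstract does not spell out its dynamic averaging scheme (it cites [3]), so the weighting used here is an assumption: at each time step, every member model is weighted by the inverse of its mean absolute error over a short trailing window of already observed values, so the committee leans on whichever models have recently been most accurate.

```python
import numpy as np


def dynamic_committee(preds, observed, window=5, eps=1e-9):
    """Combine member predictions with weights inversely proportional to recent error.

    preds: (T, M) array, column m holding model m's prediction at each time step.
    observed: (T,) array of observed values (only values before step t are used at t).
    """
    T, M = preds.shape
    combined = np.empty(T)
    for t in range(T):
        if t == 0:
            w = np.full(M, 1.0 / M)                 # no track record yet: plain mean
        else:
            lo = max(0, t - window)
            recent_err = np.abs(preds[lo:t] - observed[lo:t, None]).mean(axis=0)
            w = 1.0 / (recent_err + eps)
            w /= w.sum()
        combined[t] = preds[t] @ w
    return combined
```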

  7. Paradigms for Realizing Machine Learning Algorithms.

    PubMed

    Agneeswaran, Vijay Srinivas; Tonpay, Pranay; Tiwary, Jayati

    2013-12-01

    The article explains three generations of machine learning algorithm realizations, all of which try to operate on big data. The first-generation tools are SAS, SPSS, etc., while second-generation realizations include Mahout and RapidMiner (which work over Hadoop), and third-generation paradigms include Spark and GraphLab, among others. The essence of the article is that, for a number of machine learning algorithms, it is important to look beyond Hadoop's MapReduce paradigm in order to make them work on big data. A number of promising contenders have emerged in the third generation that can be exploited to realize deep analytics on big data. PMID:27447253

  8. Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

    PubMed

    Howard, Rebecca; Rattray, Magnus; Prosperi, Mattia; Custovic, Adnan

    2015-07-01

    Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which is caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled 'asthma endotypes'. The discovery of different asthma subtypes has moved from subjective approaches, in which putative phenotypes are assigned by experts, to data-driven ones that incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique, latent class analysis, and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies. PMID:26143394
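
    Latent class analysis with binary indicators (for example, presence or absence of wheeze at several ages) is a mixture of independent Bernoulli distributions, usually fitted by expectation-maximization. The sketch below shows that EM loop under simplifying assumptions: a fixed number of classes, a deterministic farthest-point initialization chosen for reproducibility, and no model selection over the number of classes.

```python
import numpy as np


def latent_class_em(X, n_classes=2, n_iter=200, tol=1e-8):
    """EM for a latent class model with binary indicators.

    X: (n, j) array of 0/1 responses. Returns class prevalences pi (K,),
    item-response probabilities p (K, j), and posterior memberships (n, K).
    """
    X = np.asarray(X, dtype=float)
    # Farthest-point initialization of class profiles (deterministic; a hypothetical choice).
    seeds = [0]
    while len(seeds) < n_classes:
        dist = np.min([np.abs(X - X[s]).sum(axis=1) for s in seeds], axis=0)
        seeds.append(int(np.argmax(dist)))
    p = 0.25 + 0.5 * X[seeds]
    pi = np.full(n_classes, 1.0 / n_classes)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log P(x_i, class k), items independent given the class.
        log_joint = X @ np.log(p).T + (1.0 - X) @ np.log(1.0 - p).T + np.log(pi)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        # M-step: update prevalences and item probabilities (clipped away from 0 and 1).
        nk = resp.sum(axis=0)
        pi = nk / len(X)
        p = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1.0 - 1e-6)
        ll = float(log_norm.sum())
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, p, resp
```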

  9. AstroML: Python-powered Machine Learning for Astronomy

    NASA Astrophysics Data System (ADS)

    Vander Plas, Jake; Connolly, A. J.; Ivezic, Z.

    2014-01-01

    As astronomical data sets grow in size and complexity, automated machine learning and data mining methods are becoming an increasingly fundamental component of research in the field. The astroML project (http://astroML.org) provides a common repository for practical examples of the data mining and machine learning tools used and developed by astronomical researchers, written in Python. The astroML module contains a host of general-purpose data analysis and machine learning routines, loaders for openly-available astronomical datasets, and fast implementations of specific computational methods often used in astronomy and astrophysics. The associated website features hundreds of examples of these routines being used for analysis of real astronomical datasets, while the associated textbook provides a curriculum resource for graduate-level courses focusing on practical statistics, machine learning, and data mining approaches within Astronomical research. This poster will highlight several of the more powerful and unique examples of analysis performed with astroML, all of which can be reproduced in their entirety on any computer with the proper packages installed.

  10. Machine Learning and Geometric Technique for SLAM

    NASA Astrophysics Data System (ADS)

    Bernal-Marin, Miguel; Bayro-Corrochano, Eduardo

    This paper describes a new approach for building 3D geometric maps using a laser rangefinder, a stereo camera system, and a mathematical framework, Conformal Geometric Algebra. Using known visual landmarks in the map helps achieve good localization of the robot. A machine learning technique is used to recognize objects in the environment. These landmarks are found using the Viola-Jones algorithm and are represented by their positions in the 3D virtual map.

  11. Prototype-based models in machine learning.

    PubMed

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional, complex datasets. We discuss basic schemes of competitive vector quantization as well as the so-called neural gas approach and Kohonen's topology-preserving self-organizing map. Supervised learning in prototype systems is exemplified in terms of learning vector quantization. Most frequently, the familiar Euclidean distance serves as a dissimilarity measure. We present extensions of the framework to nonstandard measures and give an introduction to the use of adaptive distances in relevance learning. PMID:26800334
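
    Learning vector quantization, mentioned above as the supervised prototype scheme, can be illustrated with the basic LVQ1 update using the squared Euclidean distance: the prototype closest to a training sample moves towards it if their class labels agree and away from it otherwise. The learning rate, epoch count, and initial prototypes are arbitrary choices for illustration; adaptive-distance variants (relevance learning) would replace the fixed Euclidean metric.

```python
import numpy as np


def train_lvq1(X, y, prototypes, proto_labels, lr=0.05, epochs=30, seed=0):
    """LVQ1 with squared Euclidean distance: the winning (closest) prototype moves
    towards a sample of its own class and away from a sample of another class."""
    rng = np.random.default_rng(seed)
    W = np.array(prototypes, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            winner = int(np.argmin(((W - X[i]) ** 2).sum(axis=1)))
            sign = 1.0 if proto_labels[winner] == y[i] else -1.0
            W[winner] += sign * lr * (X[i] - W[winner])
    return W


def predict_lvq(X, W, proto_labels):
    """Nearest-prototype classification."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    return np.asarray(proto_labels)[np.argmin(d, axis=1)]
```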

  12. Dimension Reduction With Extreme Learning Machine.

    PubMed

    Kasun, Liyanaarachchi Lekamalage Chamara; Yang, Yan; Huang, Guang-Bin; Zhang, Zhengyou

    2016-08-01

    Data may often contain noise or irrelevant information, which negatively affects the generalization capability of machine learning algorithms. The objective of dimension reduction algorithms, such as principal component analysis (PCA), non-negative matrix factorization (NMF), random projection (RP), and auto-encoders (AE), is to reduce the noise or irrelevant information in the data. The features of PCA (eigenvectors) and linear AE are not able to represent data as parts (e.g., a nose in a face image). On the other hand, NMF and non-linear AE are hampered by slow learning speed, and RP only represents a subspace of the original data. This paper introduces a dimension reduction framework that to some extent represents data as parts, has a fast learning speed, and learns the between-class scatter subspace. To this end, this paper investigates a linear and non-linear dimension reduction framework referred to as extreme learning machine AE (ELM-AE) and sparse ELM-AE (SELM-AE). In contrast to tied-weight AE, the hidden neurons in ELM-AE and SELM-AE need not be tuned, and their parameters (e.g., input weights in additive neurons) are initialized using orthogonal and sparse random weights, respectively. Experimental results on the USPS handwritten digit recognition data set, CIFAR-10 object recognition, and the NORB object recognition data set show the efficacy of linear and non-linear ELM-AE and SELM-AE in terms of discriminative capability, sparsity, training time, and normalized mean square error. PMID:27214902
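
    A compact sketch of the ELM-AE idea follows, based on our reading of the abstract: hidden-layer parameters are random and never tuned (orthonormal input weights and a unit-norm bias here), only the output weights β are solved in closed form so that the network reconstructs its input, and the reduced representation is taken as X βᵀ. The ridge parameter `C`, the tanh activation, and the restriction to fewer hidden units than input dimensions are assumptions, and the sparse SELM-AE variant is not shown.

```python
import numpy as np


def elm_ae_reduce(X, n_hidden, C=1e3, activation=np.tanh, seed=0):
    """ELM autoencoder dimension reduction (sketch). Returns (X @ beta.T, beta)."""
    rng = np.random.default_rng(seed)
    _, d = X.shape
    if n_hidden > d:
        raise ValueError("this sketch only covers compression (n_hidden <= d)")
    W, _ = np.linalg.qr(rng.normal(size=(d, n_hidden)))  # (d, n_hidden), orthonormal columns
    b = rng.normal(size=n_hidden)
    b /= np.linalg.norm(b)                                  # unit-norm random bias
    H = activation(X @ W + b)                               # hidden activations, never tuned
    # Regularized least squares for the output weights: H @ beta ~= X.
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)  # (n_hidden, d)
    return X @ beta.T, beta
```

    Passing `activation=lambda a: a` gives the linear variant.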

  13. Machine learning optimization of cross docking accuracy.

    PubMed

    Bjerrum, Esben J

    2016-06-01

    Performance of small-molecule automated docking programs has conceptually been divided into docking, scoring, ranking, and screening power, which focus on the crystal pose prediction, affinity prediction, ligand ranking and database screening capabilities of the docking program, respectively. Benchmarks show that different docking programs can excel in individual benchmarks, which suggests that the scoring function employed by the programs can be optimized for a particular task. Here the scoring function of Smina is re-optimized towards enhancing the docking power using a supervised machine learning approach and a manually curated database of ligands and cross-docking receptor pairs. The optimization method does not need associated binding data for the receptor-ligand examples used in the data set and works with small training sets. The re-optimization of the weights for the scoring function results in a similar docking performance with regard to docking power on a cross-docking test set. A ligand-decoy-based benchmark indicates better discrimination between poses with high and low RMSD. The reported parameters for Smina are compatible with AutoDock Vina and represent ready-to-use alternative parameters for researchers who aim at pose prediction rather than affinity prediction. PMID:27179709

  14. Machine learning: how to get more out of HEP data and the Higgs Boson Machine Learning Challenge

    NASA Astrophysics Data System (ADS)

    Wolter, Marcin

    2015-09-01

    Multivariate techniques using machine learning algorithms have become an integral part of many High Energy Physics (HEP) data analyses. The article shows the gain in the physics reach of experiments resulting from the adoption of machine learning techniques. The rapid development of machine learning in recent years is a challenge for the HEP community. The open competition for machine learning experts, the "Higgs Boson Machine Learning Challenge", shows that modern techniques developed outside HEP can significantly improve the analysis of data from HEP experiments and the sensitivity of searches for new particles and processes.

  15. Relative optical navigation around small bodies via Extreme Learning Machine

    NASA Astrophysics Data System (ADS)

    Law, Andrew M.

    To perform close-proximity operations in a low-gravity environment, relative and absolute positions are vital information for maneuvering. Navigation is therefore integral to space travel. Extreme Learning Machine (ELM) is presented as an optical navigation method around small celestial bodies. Optical navigation uses visual observation instruments such as a camera to acquire useful data and determine the spacecraft position. The only input data required for operation are a single image strip and a nadir image. ELM is a machine learning method for Single Layer Feed-forward Networks (SLFNs), a type of neural network (NN). The algorithm rests on the premise that input weights and biases can be randomly assigned, so no back-propagation is required. The learned model consists of the output layer weights, which are used to calculate a prediction. Extreme Learning Machine Optical Navigation (ELM OpNav) combines optical images with the ELM algorithm to train the machine to navigate around a target body. In this thesis, the asteroid Vesta is the designated celestial body. The trained ELMs estimate the position of the spacecraft during operation with a single data set. The results show that the approach is promising and potentially suitable for on-board navigation.
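
    The ELM training rule described above (random, frozen input weights and biases, with output weights learned without back-propagation) fits in a few lines. This is a generic sketch, not the thesis code: the tanh activation, the number of hidden units, and the least-squares solve via the Moore-Penrose pseudoinverse are assumptions, and a toy 1D regression target stands in for the image-based position estimates.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer feed-forward network trained as an extreme learning machine."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, T):
        # Input weights and biases: drawn once at random and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

    For example, fitting `ELMRegressor(50).fit(X, np.sin(2 * X))` on 200 points in [-1, 1] should give a small training error.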

  16. Testing and Validating Machine Learning Classifiers by Metamorphic Testing

    PubMed Central

    Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2011-01-01

    Machine learning algorithms have provided core functionality to many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because often there is no “test oracle” to verify the correctness of the computed outputs. To help address software quality, in this paper we present a technique for testing the implementations of the machine learning classification algorithms that support such applications. Our approach is based on “metamorphic testing”, a technique that has been shown to be effective in alleviating the oracle problem. We also present a case study on a real-world machine learning application framework and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective at killing mutants, and that observing the expected cross-validation result alone is not sufficient to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program. PMID:21532969
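
    A metamorphic relation replaces the missing oracle with a property that must hold between two runs. The sketch below checks two such relations on a plain k-nearest-neighbour classifier: reversing the feature order, and shuffling the training samples. These relations, the k-NN implementation and the `first_feature_only` mutant are illustrative assumptions, not necessarily the exact relations or programs studied in the paper.

```python
import math
import random

def knn_predict(train_X, train_y, x, k=3):
    """k-nearest-neighbour classifier: squared Euclidean distance, majority vote."""
    # math.fsum is exactly rounded, so the distance does not depend on feature order.
    dists = sorted(
        (math.fsum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    # Most votes wins; vote ties go to the smallest label, deterministically.
    return min(votes, key=lambda lab: (-votes[lab], lab))

def mr_feature_permutation(predict, train_X, train_y, test_X):
    """MR 1: reversing the feature order in both training and test data must not
    change the predictions of a distance-based classifier."""
    def flip(row):
        return list(reversed(row))
    before = [predict(train_X, train_y, x) for x in test_X]
    after = [predict([flip(r) for r in train_X], train_y, flip(x)) for x in test_X]
    return before == after

def mr_training_order(predict, train_X, train_y, test_X, seed=0):
    """MR 2: shuffling the order of the training samples must not change predictions."""
    order = list(range(len(train_X)))
    random.Random(seed).shuffle(order)
    before = [predict(train_X, train_y, x) for x in test_X]
    after = [predict([train_X[i] for i in order], [train_y[i] for i in order], x)
             for x in test_X]
    return before == after

def first_feature_only(train_X, train_y, x, k=1):
    """Hypothetical faulty 'mutant': silently ignores every feature but the first."""
    return knn_predict([r[:1] for r in train_X], train_y, x[:1], k)

train_X = [[0.0, 0.0], [5.0, 10.0]]
train_y = ["a", "b"]
test_X = [[4.0, 1.0]]
one_nn = lambda X, y, x: knn_predict(X, y, x, k=1)
print(mr_feature_permutation(one_nn, train_X, train_y, test_X))              # True
print(mr_feature_permutation(first_feature_only, train_X, train_y, test_X))  # False
```

    The mutant gives plausible-looking labels on its own; only the relation between the two runs exposes it, which is the point of metamorphic testing.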

  17. Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods.

    PubMed

    Zang, Qingda; Rotroff, Daniel M; Judson, Richard S

    2013-12-23

    There are thousands of environmental chemicals subject to regulatory decisions regarding their endocrine-disrupting potential. The ToxCast and Tox21 programs have tested ∼8200 chemicals in a broad screening panel of in vitro high-throughput screening (HTS) assays for estrogen receptor (ER) agonist and antagonist activity. The present work uses this large data set to develop in silico quantitative structure-activity relationship (QSAR) models using machine learning (ML) methods and a novel approach to manage the imbalanced data distribution. Training compounds from the ToxCast project were categorized into active or inactive (binding or nonbinding) classes based on a composite ER Interaction Score derived from a collection of 13 ER in vitro assays. A total of 1537 chemicals from ToxCast were used to derive and optimize the binary classification models, while 5073 additional chemicals from the Tox21 project, evaluated in 2 of the 13 in vitro assays, were used to externally validate the model performance. To handle the imbalanced distribution of active and inactive chemicals, we developed a cluster-selection strategy to minimize information loss and increase predictive performance and compared this strategy to three currently popular techniques: cost-sensitive learning, oversampling of the minority class, and undersampling of the majority class. QSAR classification models were built to relate the molecular structures of chemicals to their ER activities using linear discriminant analysis (LDA), classification and regression trees (CART), and support vector machines (SVM) with 51 molecular descriptors from QikProp and 4328 bits of structural fingerprints as explanatory variables. A random forest (RF) feature selection method was employed to extract the structural features most relevant to the ER activity. The best model was obtained using SVM in combination with a subset of descriptors identified from a large set via the RF algorithm, which recognized the active and
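
    The abstract does not describe the cluster-selection strategy in enough detail to reproduce it, so the sketch below covers only the three baseline techniques it was compared against, in their simplest random forms for a two-class problem. Function names and the toy labels are hypothetical; real pipelines would typically resample only the training split.

```python
import random
from collections import Counter

def random_undersample(X, y, majority, seed=0):
    """Undersampling: randomly drop majority-class samples until classes are equal."""
    rng = random.Random(seed)
    maj = [i for i, lab in enumerate(y) if lab == majority]
    rest = [i for i, lab in enumerate(y) if lab != majority]
    keep = sorted(rng.sample(maj, len(rest)) + rest)
    return [X[i] for i in keep], [y[i] for i in keep]

def random_oversample(X, y, minority, seed=0):
    """Oversampling: duplicate random minority-class samples until classes are equal."""
    rng = random.Random(seed)
    mino = [i for i, lab in enumerate(y) if lab == minority]
    n_other = len(y) - len(mino)
    idx = list(range(len(y))) + [rng.choice(mino) for _ in range(n_other - len(mino))]
    return [X[i] for i in idx], [y[i] for i in idx]

def cost_sensitive_weights(y):
    """Cost-sensitive learning: per-class weights inversely proportional to frequency."""
    counts = Counter(y)
    return {lab: len(y) / (len(counts) * c) for lab, c in counts.items()}

# Toy data: 8 inactive and 2 active "chemicals" (indices stand in for descriptors).
X = list(range(10))
y = ["inactive"] * 8 + ["active"] * 2
print(Counter(random_undersample(X, y, "inactive")[1]))
print(Counter(random_oversample(X, y, "active")[1]))
print(cost_sensitive_weights(y))
```

    Undersampling discards information from the majority class, which is the loss the paper's cluster-selection strategy is said to minimize; oversampling and class weighting keep every sample but change how much each one counts.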

  18. Stacking for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-08-01

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show that all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organizing maps (SOMs) and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, varying the number of layers and the number of base learners per layer. Finally, we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9 per cent and 21 per cent on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost), the improvement shrinks but remains positive, between 0.4 per cent and 2.5 per cent for the explored metrics, and comes at almost no additional computational cost.

  19. Stacking for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-08-01

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show that all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organising maps (SOMs) and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, varying the number of layers and the number of base learners per layer. Finally, we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9% and 21% on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost), the improvement shrinks but remains positive, between 0.4% and 2.5% for the explored metrics, and comes at almost no additional computational cost.
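
    The stacking loop described above, in which the previous round's redshift estimate becomes an extra input to the same algorithm, can be sketched in a few lines of NumPy. As assumptions for brevity, a k-nearest-neighbour regressor replaces the SOMs, decision trees and AdaBoost used in the papers, and the stacked feature for training rows is computed out-of-fold so that no row sees its own target; the papers' exact protocol may differ.

```python
import numpy as np

def knn_regress(X_train, y_train, X_query, k=5):
    """Base learner: mean target value of the k nearest training points."""
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def out_of_fold(X, y, k=5, n_folds=5, seed=0):
    """Estimates for the training rows, each made by a model that never saw that row."""
    rng = np.random.default_rng(seed)
    pred = np.empty(len(y))
    for idx in np.array_split(rng.permutation(len(y)), n_folds):
        mask = np.ones(len(y), dtype=bool)
        mask[idx] = False
        pred[idx] = knn_regress(X[mask], y[mask], X[idx], k)
    return pred

def stacked_predict(X, y, X_test, layers=2, k=5):
    """Each extra layer appends the previous layer's estimate as a new input column."""
    X_tr, X_te = X, X_test
    for _ in range(layers - 1):
        train_est = out_of_fold(X_tr, y, k)       # stacked feature for training rows
        test_est = knn_regress(X_tr, y, X_te, k)  # same feature for test rows
        X_tr = np.column_stack([X_tr, train_est])
        X_te = np.column_stack([X_te, test_est])
    return knn_regress(X_tr, y, X_te, k)

# Toy usage: 5 synthetic "colours" and a noisy target standing in for redshift.
rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 5))
z = X @ np.array([0.5, 0.3, 0.1, 0.05, 0.05]) + rng.normal(scale=0.02, size=300)
for layers in (1, 2, 3):
    err = stacked_predict(X[:200], z[:200], X[200:], layers=layers) - z[200:]
    print(layers, np.sqrt(np.mean(err ** 2)))
```

    Whether the extra layers help on this toy data depends on the data and on k; the sketch shows the mechanics, not the improvements reported above.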

  20. Protein secondary structure prediction using logic-based machine learning.

    PubMed

    Muggleton, S; King, R D; Sternberg, M J

    1992-10-01

    Many attempts have been made to solve the problem of predicting protein secondary structure from the primary sequence, but the best performance results are still disappointing. In this paper, the use of a machine learning algorithm that allows relational descriptions is shown to lead to improved performance. The Inductive Logic Programming computer program Golem was applied to learning secondary structure prediction rules for alpha/alpha domain type proteins. The input to the program consisted of 12 non-homologous proteins (1612 residues) of known structure, together with background knowledge describing the chemical and physical properties of the residues. Golem learned a small set of rules that predict which residues are part of the alpha-helices, based on their positional relationships and chemical and physical properties. The rules were tested on four independent non-homologous proteins (416 residues), giving an accuracy of 81% (+/- 2%). This is an improvement, on identical data, over the previously reported result of 73% by King and Sternberg (1990, J. Mol. Biol., 216, 441-457) using the machine learning program PROMIS, and of 72% using the standard Garnier-Osguthorpe-Robson method. The best previously reported result in the literature for the alpha/alpha domain type is 76%, achieved using a neural net approach. Machine learning also has an advantage over neural network and statistical methods in that it produces more understandable results. PMID:1480619
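
    The quoted uncertainty of about 2 percentage points is consistent with a binomial standard error for an accuracy of 81% measured on 416 test residues. The check below assumes that this is how the interval was obtained and treats residues as independent trials; neighbouring residues in a helix are correlated, so the true uncertainty is likely somewhat larger.

```python
import math

def binomial_standard_error(accuracy, n):
    """Standard error of an accuracy measured on n test cases assumed independent."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)

se = binomial_standard_error(0.81, 416)   # 81% on 416 held-out residues
print(f"standard error = {se:.3f}")       # standard error = 0.019
```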

  1. Finding new perovskite halides via machine learning

    DOE PAGESBeta

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-26

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach toward rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning, henceforth referred to as ML) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br, or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 185 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor, and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. As a result, the trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  2. Finding New Perovskite Halides via Machine Learning

    NASA Astrophysics Data System (ADS)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.
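
    The tolerance factor and octahedral factor singled out in the two abstracts above are simple functions of the ionic radii. The sketch below uses the standard definitions, the Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) (r_B + r_X)) and the octahedral factor mu = r_B / r_X. The CsPbI3 radii are assumed, illustrative Shannon values, not data from the paper's training set, and the SVM classifier itself is not shown.

```python
import math

def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))

def octahedral_factor(r_b, r_x):
    """Octahedral factor mu = r_B / r_X (size of B cation relative to X anion)."""
    return r_b / r_x

# Assumed Shannon ionic radii in angstroms (illustrative, not from the paper):
# Cs+ (12-coordinate) 1.88, Pb2+ (6-coordinate) 1.19, I- 2.20.
t = tolerance_factor(1.88, 1.19, 2.20)
mu = octahedral_factor(1.19, 2.20)
print(f"CsPbI3: t = {t:.3f}, mu = {mu:.3f}")  # CsPbI3: t = 0.851, mu = 0.541
```

    Both quantities are purely geometric, which fits the abstracts' conclusion that steric and packing effects govern formability; in a classifier they would be two columns of the feature vector for each ABX3 composition.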

  3. In Silico Calculation of Infinite Dilution Activity Coefficients of Molecular Solutes in Ionic Liquids: Critical Review of Current Methods and New Models Based on Three Machine Learning Algorithms.

    PubMed

    Paduszyński, Kamil

    2016-08-22

    The aim of the paper is to address all the disadvantages of currently available models for calculating infinite dilution activity coefficients (γ(∞)) of molecular solutes in ionic liquids (ILs), a relevant property for many applications of ILs, particularly in separations. Three new models are proposed, each based on a distinct machine learning algorithm: stepwise multiple linear regression (SWMLR), feed-forward artificial neural network (FFANN), and least-squares support vector machine (LSSVM). The models were established on the most comprehensive γ(∞) data bank reported so far (>34 000 data points for 188 ILs and 128 solutes). Following a previously published paper [J. Chem. Inf. Model 2014, 54, 1311-1324], the ILs were treated in terms of group contributions, whereas the Abraham solvation parameters were used to quantify the impact of solute structure. Temperature is also included in the input data of the models, so they can be used to obtain temperature-dependent data and thus related thermodynamic functions. Both internal and external validation techniques were applied to assess the statistical significance and explanatory power of the final correlations. A comparative study of the overall performance of the SWMLR/FFANN/LSSVM approaches is presented in terms of root-mean-square error and average absolute relative deviation between calculated and experimental γ(∞), evaluated for different families of ILs and solutes, as well as between calculated and experimental infinite dilution selectivity for separating benzene from n-hexane and thiophene from n-heptane. LSSVM is shown to have the lowest training and generalization errors. It is finally demonstrated that the established models exhibit improved accuracy compared to the state-of-the-art model, namely, the temperature-dependent group contribution linear solvation energy relationship, published in 2011 [J. Chem
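
    The two error measures used in the comparison can be written down directly. This sketch assumes AARD is the mean of |calc - exp| / |exp| expressed in percent, a common definition, and applies both measures to raw γ(∞) values; the paper may instead compute errors on a logarithmic scale. The numbers are made up for illustration.

```python
import math

def rmse(calc, exp):
    """Root-mean-square error between calculated and experimental values."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(exp))

def aard_percent(calc, exp):
    """Average absolute relative deviation in percent: (100/N) * sum |c - e| / |e|."""
    return 100.0 * sum(abs(c - e) / abs(e) for c, e in zip(calc, exp)) / len(exp)

# Hypothetical gamma-infinity values, not data from the paper.
experimental = [2.0, 4.0, 10.0]
calculated = [2.2, 3.6, 10.5]
print(f"RMSE = {rmse(calculated, experimental):.3f}, "
      f"AARD = {aard_percent(calculated, experimental):.3f}%")
```

    RMSE is dominated by the largest absolute errors (here the 10.0 value), whereas AARD weights each point by its relative error, so reporting both gives a fuller picture across solutes whose γ(∞) spans orders of magnitude.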

  4. Data Mining and Machine Learning in Astronomy

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, with the potential to fully exploit the exponentially increasing amount of available data and promising great scientific advances. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications across a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science; and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, as long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be a very powerful tool, and not a questionable black box.

  5. Prediction of drug-induced nephrotoxicity and injury mechanisms with human induced pluripotent stem cell-derived cells and machine learning methods

    PubMed Central

    Kandasamy, Karthikeyan; Chuah, Jacqueline Kai Chin; Su, Ran; Huang, Peng; Eng, Kim Guan; Xiong, Sijing; Li, Yao; Chia, Chun Siang; Loo, Lit-Hsin; Zink, Daniele

    2015-01-01

    The renal proximal tubule is a main target for drug-induced toxicity. The prediction of proximal tubular toxicity during drug development remains difficult. No in vitro methods based on induced pluripotent stem cell-derived renal cells had been developed so far. Here, we developed a rapid 1-step protocol for the differentiation of human induced pluripotent stem cells (hiPSC) into proximal tubular-like cells. These proximal tubular-like cells had a purity of >90% after 8 days of differentiation and could be applied directly to compound screening. The nephrotoxicity prediction performance of the cells was determined by evaluating their responses to 30 compounds. The results were evaluated automatically using a machine learning algorithm called random forest. In this way, proximal tubular toxicity in humans could be predicted with 99.8% training accuracy and 87.0% test accuracy. Further, we studied the underlying mechanisms of injury and drug-induced cellular pathways in these hiPSC-derived renal cells, and the results were in agreement with human and animal data. Our methods will enable the development of personalized or disease-specific hiPSC-based renal in vitro models for compound screening and nephrotoxicity prediction. PMID:26212763

  6. Machine learning strategies for systems with invariance properties

    NASA Astrophysics Data System (ADS)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.
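
    The two strategies compared above can be contrasted on a toy problem with 2-D rotation invariance. In this NumPy sketch, |x|^2 plays the role of the invariant input basis (method 1) and randomly rotated copies of the raw inputs play the role of the transformed training data (method 2). The target function, the rotation group and the number of copies are assumptions for illustration, not the turbulence or elasticity settings of the paper.

```python
import numpy as np

def rotate(points, theta):
    """Rotate 2-D row vectors counter-clockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return points @ np.array([[c, -s], [s, c]]).T

def invariant_features(X):
    """Method 1: map raw 2-D inputs onto a rotation-invariant basis (here |x|^2)."""
    return (X ** 2).sum(axis=1, keepdims=True)

def augment_with_rotations(X, y, n_copies=8, seed=0):
    """Method 2: add randomly rotated copies of the raw inputs; labels are reused
    because the target is assumed to be rotation invariant."""
    rng = np.random.default_rng(seed)
    copies = [rotate(X, th) for th in rng.uniform(0.0, 2.0 * np.pi, n_copies)]
    return np.vstack([X] + copies), np.tile(y, n_copies + 1)

# Toy target that depends only on the distance from the origin.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.exp(-(X ** 2).sum(axis=1))
F = invariant_features(X)
X_aug, y_aug = augment_with_rotations(X, y)
print(F.shape, X_aug.shape)  # (100, 1) (900, 2)
```

    Method 1 keeps the training set at its original size and guarantees invariance by construction, whereas method 2 multiplies the training set by n_copies + 1 and only approximates invariance, which is consistent with the lower training cost the paper reports for embedding the invariance in the inputs.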

  7. Machine learning: An artificial intelligence approach. Vol. II

    SciTech Connect

    Michalski, R.S.; Carbonell, J.G.; Mitchell, T.M.

    1986-01-01

    This book reflects the expansion of machine learning research by presenting recent advances in the field. The book provides an account of current research directions. Major topics covered include the following: learning concepts and rules from examples; cognitive aspects of learning; learning by analogy; learning by observation and discovery; and an exploration of general aspects of learning.

  8. A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs.

    PubMed

    Kao, Hui-Ju; Huang, Chien-Hsun; Bretaña, Neil Arvin; Lu, Cheng-Tsung; Huang, Kai-Yao; Weng, Shun-Long; Lee, Tzong-Yi

    2015-01-01

    Protein O-GlcNAcylation, involving the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid in understanding how O-GlcNAc contributes to diverse cellular processes. Because of the increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layer model for each identified OGT substrate motif. A Support Vector Machine (SVM) was then used to generate a second-layer model learned from the output values of the first-layer profile HMMs. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation; it has been implemented as a web-based system, OGTSite, which is now freely available at http://csb.cse.yzu.edu.tw/OGTSite/. PMID:26680539
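
    Sensitivity, specificity and accuracy all come from the same confusion-matrix counts. The counts below are hypothetical, chosen only to show the arithmetic; they are not the paper's cross-validation results.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true O-GlcNAc sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 100 positive and 100 negative sites.
sens, spec, acc = binary_metrics(tp=85, fn=15, tn=84, fp=16)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

    Accuracy is the class-size-weighted mean of sensitivity and specificity, so it always lies between them, as the reported 84.7% lies between 84.1% and 85.4%.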

  9. Machine learning strategies for systems with invariance properties

    SciTech Connect

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    2016-01-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.

  10. Machine learning strategies for systems with invariance properties

    DOE PAGESBeta

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    2016-05-06

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.

  11. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    ERIC Educational Resources Information Center

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  12. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment

    PubMed Central

    2011-01-01

    Background Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds, and the size of proprietary as well as public data sets is growing rapidly. Hence, there is a demand for computationally efficient machine learning algorithms that are easily available to researchers without extensive machine learning knowledge. Because they uphold the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source, state-of-the-art, high performance machine learning platform, interfacing multiple customized machine learning algorithms for both graphical programming and scripting, to be used for large-scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models by providing the full workflow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated workflow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient, data-set-specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge, as flexible applications can be created not only at a scripting level but also in a graphical programming environment. Conclusions AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of
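
    AZOrange automates model hyper-parameter selection. The plain-Python sketch below shows the generic idea behind such a step, a cross-validated grid search. The names `fit_predict`, `cv_error`, `select_hyperparameters` and the toy k-NN model are hypothetical and are not AZOrange's API; a single synthetic input stands in for QSAR descriptors.

```python
import random

def k_fold_indices(n, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def cv_error(fit_predict, X, y, params, n_folds=5):
    """Cross-validated mean squared error of one hyper-parameter setting."""
    sq_errors = []
    for test_idx in k_fold_indices(len(y), n_folds):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test_idx], **params)
        sq_errors += [(p - y[i]) ** 2 for p, i in zip(preds, test_idx)]
    return sum(sq_errors) / len(sq_errors)

def select_hyperparameters(fit_predict, X, y, grid):
    """Exhaustive grid search: keep the setting with the lowest CV error."""
    return min(grid, key=lambda params: cv_error(fit_predict, X, y, params))

def knn_fit_predict(train_X, train_y, test_X, k=1):
    """Toy model with one hyper-parameter k: 1-D k-nearest-neighbour regression."""
    preds = []
    for x in test_X:
        nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))[:k]
        preds.append(sum(train_y[i] for i in nearest) / len(nearest))
    return preds

# Toy usage with a noisy 1-D "descriptor"; the data are synthetic.
rng = random.Random(42)
X = [i / 10 for i in range(60)]
y = [x * x + rng.gauss(0, 0.5) for x in X]
best = select_hyperparameters(knn_fit_predict, X, y, [{"k": 1}, {"k": 3}, {"k": 7}])
print("selected:", best)
```

    Passing the model as a function keeps the search loop independent of the learning algorithm, which mirrors the abstract's point that the selection process is generalized across several interfaced algorithms.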

  13. Normal tissue complication probability (NTCP) modelling using spatial dose metrics and machine learning methods for severe acute oral mucositis resulting from head and neck radiotherapy

    PubMed Central

    Dean, Jamie A; Wong, Kee H; Welsh, Liam C; Jones, Ann-Britt; Schick, Ulrike; Newbold, Kate L; Bhide, Shreerang A; Harrington, Kevin J; Nutting, Christopher M; Gulliford, Sarah L

    2016-01-01

    Background and Purpose Severe acute mucositis commonly results from head and neck (chemo)radiotherapy. A predictive model of mucositis could guide clinical decision-making and inform treatment planning. We aimed to generate such a model using spatial dose metrics and machine learning. Material and Methods Predictive models of severe acute mucositis were generated using radiotherapy dose (dose-volume and spatial dose metrics) and clinical data. Penalised logistic regression, support vector classification and random forest classification (RFC) models were generated and compared. Internal validation was performed (with 100-iteration cross-validation), using multiple metrics, including area under the receiver operating characteristic curve (AUC) and calibration slope, to assess performance. Associations between covariates and severe mucositis were explored using the models. Results The dose-volume-based models (standard) performed as well as those incorporating spatial information. Discrimination was similar between models, but the RFCstandard had the best calibration. The mean AUC and calibration slope for this model were 0.71 (s.d.=0.09) and 3.9 (s.d.=2.2), respectively. The volumes of oral cavity receiving intermediate and high doses were associated with severe mucositis. Conclusions The RFCstandard model performance is modest-to-good, but should be improved, and requires external validation. Reducing the volumes of oral cavity receiving intermediate and high doses may reduce mucositis incidence. PMID:27240717
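
    The AUC reported above (mean 0.71) measures discrimination. A minimal way to compute it is the Mann-Whitney formulation sketched below. The scores and outcomes are invented for illustration. Calibration slope, which requires fitting a logistic model to the predictions, is not shown.

```python
def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive case (label 1) scores higher than a randomly chosen negative
    case (label 0), counting ties as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks of severe mucositis and observed outcomes.
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

    An AUC of 0.5 means no discrimination and 1.0 perfect ranking; it says nothing about whether predicted probabilities match observed rates, which is why the study reports calibration slope separately.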

  14. Recognition of explosives fingerprints on objects for courier services using machine learning methods and laser-induced breakdown spectroscopy.

    PubMed

    Moros, J; Serrano, J; Gallego, F J; Macías, J; Laserna, J J

    2013-06-15

    In recent years, laser-induced breakdown spectroscopy (LIBS) has been considered one of the most capable techniques for trace detection of explosives. However, despite the high sensitivity it exhibits in this application, LIBS suffers from limited selectivity because of difficulties in assigning the molecular origin of the observed spectral emissions. This makes the recognition of fingerprints a persistent challenge. In the present manuscript, six explosives (chloratite, ammonal, DNT, TNT, RDX and PETN) have been sorted against a broad list of harmless potential interferents (butter, fuel oil, hand cream, olive oil, …), all of them in the form of fingerprints deposited on the surfaces of objects for courier services. When LIBS information is processed through a multi-stage architecture algorithm built from a suitable combination of 3 learning classifiers, an unknown fingerprint may be assigned to a particular class. Neural network classifiers trained with the Levenberg-Marquardt rule made their decisions within 3D scatter plots projected onto the subspace of the most useful features extracted from the LIBS spectra. Experimental results demonstrate that the presented algorithm sorts fingerprints according to their hazardous character, even though their spectral information is virtually identical in appearance, with false-negative and false-positive rates not exceeding 10%. These achievements represent a step forward in the technology readiness level of LIBS for this complex application related to defense, homeland security and force protection. PMID:23618183

  15. Machine learning of user profiles: Representational issues

    SciTech Connect

    Bloedorn, E.; Mani, I.; MacMillan, T.R.

    1996-12-31

    As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. This work focuses on the importance of a suitable generalization hierarchy and representation for learning profiles that are predictively accurate and comprehensible. In our experiments, we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories that could be drawn from a thesaurus. Our experiments were conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser). They demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem.
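
    The abstract mentions features "based on weighted term vectors" but does not give the weighting scheme. The sketch below assumes TF-IDF weighting and cosine similarity. This is a common choice for content-based profiling, but it is not necessarily what the IDD News Browser used. Whitespace tokenisation is also a simplifying assumption.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: term count * log(N / number of docs containing the term)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

    A hypothetical user profile could then be the vector of an article the user liked, or an average of several such vectors. New articles would be ranked by their cosine similarity to that profile.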

  16. Estimation of octanol/water partition coefficient and aqueous solubility of environmental chemicals using molecular fingerprints and machine learning methods

    EPA Science Inventory

    Octanol/water partition coefficient (logP) and aqueous solubility (logS) are two important parameters in pharmacology and toxicology studies, and experimental measurements are usually time-consuming and expensive. In the present research, novel methods are presented for the estim...

  17. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

    2015-05-01

    The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other type of LIBS data lies in calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples, each analyzed with 50 laser pulses at each of five locations, with the spectra averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models, using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR. For Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that using dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
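
    The sketch below makes the comparison procedure concrete by computing the two quantities involved. The first is PRESS for one model's held-out predictions. The second is the Wilcoxon signed-rank statistic W for paired values from two models. The abstract does not say what the pairing unit was, so pairing per-sample squared residuals is an assumption. Only the statistic is computed here. In practice, `scipy.stats.wilcoxon` returns a comparable statistic together with a p-value.

```python
def press(y_true, y_pred):
    """Predicted residual sum of squares for held-out predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def wilcoxon_signed_rank_w(a, b):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples a, b.

    Zero differences are dropped; tied |differences| receive their average rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of equal absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)
```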

  18. Mining the Kepler Data using Machine Learning

    NASA Astrophysics Data System (ADS)

    Walkowicz, Lucianne; Howe, A. R.; Nayar, R.; Turner, E. L.; Scargle, J.; Meadows, V.; Zee, A.

    2014-01-01

    Kepler's high cadence and incredible precision have provided an unprecedented view into stars and their planetary companions, revealing both expected and novel phenomena and systems. Because there are so many Kepler lightcurves, novel phenomena in particular have often been discovered serendipitously while searching for known forms of variability (for example, the doubly pulsating elliptical binary KOI-54, originally identified by the transiting planet search pipeline). In this talk, we discuss progress on mining the Kepler data through both supervised and unsupervised machine learning. The aim is both to systematically search the Kepler lightcurves for rare or anomalous variability and to create a variability catalog for community use. Mining the dataset in this way also allows for a quantitative identification of anomalous variability, and so may also be used as a signal-agnostic form of optical SETI. As the Kepler data are exceptionally rich, they provide an interesting counterpoint to machine learning efforts typically performed on sparser and/or noisier survey data, and will inform similar characterization carried out on future survey datasets.

  19. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). Prior work on the Support Vector Machine (SVM) and Least Square SVM (LS-SVM) identifies the support vectors or weight vectors iteratively. In contrast, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established based on a rigorous proof of universal learning involving reduced kernel-based SLFNs. In particular, we prove that RKELM can approximate any nonlinear function accurately under the condition of support vector sufficiency. We report experimental results on a wide variety of real-world applications with both small and large instance sizes, covering binary classification, multi-class problems and regression. These results show that RKELM can achieve generalization performance competitive with SVM/LS-SVM at only a fraction of the computational effort. PMID:26829605
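
    The sketch below illustrates the reduced-kernel idea for regression. It uses a Gaussian kernel and a fixed random subset of training samples as centres. The output weights come from a single ridge-regularised least-squares solve. The kernel, the regularisation form and all parameter names (`gamma`, `reg`, `n_centers`) are illustrative choices, not necessarily the paper's exact formulation. The point is that no iterative search for support vectors is performed.

```python
import math
import random


def gaussian_kernel(x, z, gamma):
    """exp(-gamma * ||x - z||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_rkelm(X, y, n_centers, gamma=1.0, reg=1e-3, seed=0):
    """Pick a random subset of samples as centres, then solve one
    ridge-regularised least-squares problem for the output weights."""
    rng = random.Random(seed)
    centers = rng.sample(X, n_centers)  # no iterative support-vector search
    K = [[gaussian_kernel(x, c, gamma) for c in centers] for x in X]
    m = len(centers)
    # Normal equations: (K^T K + reg * I) beta = K^T y
    A = [[sum(row[p] * row[q] for row in K) + (reg if p == q else 0.0)
          for q in range(m)] for p in range(m)]
    rhs = [sum(row[p] * t for row, t in zip(K, y)) for p in range(m)]
    return centers, solve(A, rhs), gamma


def predict_rkelm(model, x):
    """Weighted sum of kernel evaluations against the selected centres."""
    centers, beta, gamma = model
    return sum(w * gaussian_kernel(x, c, gamma) for w, c in zip(beta, centers))
```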

  20. Measure Transformer Semantics for Bayesian Machine Learning

    NASA Astrophysics Data System (ADS)

    Borgström, Johannes; Gordon, Andrew D.; Greenberg, Michael; Margetson, James; van Gael, Jurgen

    The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
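
    The core idea is that a prior measure is pushed through sampling steps and then restricted by observations. The sketch below illustrates this for the purely discrete case. The helper names `bind`, `observe` and `normalise` are hypothetical, not the paper's combinators, and weights are exact `Fraction` values. The sketch deliberately omits what the paper adds: continuous and hybrid measures, and observations of zero-probability events. Conditioning on such an event here would divide by zero in `normalise`.

```python
from fractions import Fraction


def bind(measure, kernel):
    """Push a discrete measure {outcome: weight} through a sampling step kernel(x) -> measure."""
    out = {}
    for x, w in measure.items():
        for y, v in kernel(x).items():
            out[y] = out.get(y, 0) + w * v
    return out


def observe(measure, predicate):
    """Condition on an event: keep only outcomes satisfying the predicate (unnormalised)."""
    return {x: w for x, w in measure.items() if predicate(x)}


def normalise(measure):
    """Turn an unnormalised measure into a probability distribution."""
    total = sum(measure.values())
    return {x: w / total for x, w in measure.items()}


# Prior: two independent fair coins (True = heads); observation: not both tails.
coin = {True: Fraction(1, 2), False: Fraction(1, 2)}
pairs = bind(coin, lambda a: {(a, b): p for b, p in coin.items()})
posterior = normalise(observe(pairs, lambda ab: ab[0] or ab[1]))
p_first_heads = sum(w for (first, _), w in posterior.items() if first)  # 2/3
```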

  1. Predicting Methylphenidate Response in ADHD Using Machine Learning Approaches

    PubMed Central

    Kim, Jae-Won; Sharma, Vinod

    2015-01-01

    Background: There are no objective, biological markers that can robustly predict methylphenidate response in attention deficit hyperactivity disorder. This study aimed to examine whether applying machine learning approaches to pretreatment demographic, clinical questionnaire, environmental, neuropsychological, neuroimaging, and genetic information can predict therapeutic response following methylphenidate administration. Methods: The present study included 83 attention deficit hyperactivity disorder youth. At baseline, parents completed the ADHD Rating Scale-IV and Disruptive Behavior Disorder rating scale, and participants undertook the continuous performance test, Stroop color word test, and resting-state functional MRI scans. The dopamine transporter gene, dopamine D4 receptor gene, alpha-2A adrenergic receptor gene (ADRA2A) and norepinephrine transporter gene polymorphisms, and blood lead and urine cotinine levels were also measured. The participants were enrolled in an 8-week, open-label trial of methylphenidate. Four different machine learning algorithms were used for data analysis. Results: Support vector machine classification accuracy was 84.6% (area under receiver operating characteristic curve 0.84) for predicting methylphenidate response. The age, weight, ADRA2A MspI and DraI polymorphisms, lead level, Stroop color word test performance, and oppositional symptoms of Disruptive Behavior Disorder rating scale were identified as the most differentiating subset of features. Conclusions: Our results provide preliminary support to the translational development of support vector machine as an informative method that can assist in predicting treatment response in attention deficit hyperactivity disorder, though further work is required to provide enhanced levels of classification performance. PMID:25964505

  2. TEMPTING system: a hybrid method of rule and machine learning for temporal relation extraction in patient discharge summaries.

    PubMed

    Chang, Yung-Chun; Dai, Hong-Jie; Wu, Johnny Chi-Yang; Chen, Jian-Ming; Tsai, Richard Tzong-Han; Hsu, Wen-Lian

    2013-12-01

    Patient discharge summaries provide detailed medical information about individuals who have been hospitalized. To make a precise and legitimate assessment of the abundant data, a proper time layout of the sequence of relevant events should be compiled and used to drive a patient-specific timeline, which could further assist medical personnel in making clinical decisions. The process of identifying the chronological order of entities is called temporal relation extraction. In this paper, we propose a hybrid method to identify appropriate temporal links between a pair of entities. The method combines two approaches: one is rule-based and the other is based on the maximum entropy model. We develop an integration algorithm to fuse the results of the two approaches. All rules and the integration algorithm are formally stated so that one can easily reproduce the system and results. To optimize the system's configuration, we used the 2012 i2b2 challenge TLINK track dataset and applied threefold cross validation to the training set. Then, we evaluated its performance on the training and test datasets. The experiment results show that the proposed TEMPTING (TEMPoral relaTion extractING) system (ranked seventh) achieved an F-score of 0.563, which was at least 30% better than that of the baseline system, which randomly selects TLINK candidates from all pairs and assigns the TLINK types. The TEMPTING system using the hybrid method also outperformed the stage-based TEMPTING system. Its F-scores were 3.51% and 0.97% better than those of the stage-based system on the training set and test set, respectively. PMID:24060600
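
    The abstract states that the integration algorithm is formally specified in the paper, but it does not reproduce that algorithm. The sketch below therefore assumes a simple fusion policy for illustration only. Whenever a rule fires for an entity pair, the rule-based type wins; otherwise the maximum-entropy prediction is kept. The result is scored with plain F1 over (entity pair, TLINK type) items. The official i2b2 evaluation may score TLINKs differently, for example by taking temporal closure into account.

```python
def fuse(rule_links, ml_links):
    """Hypothetical fusion: {(entity1, entity2): tlink_type} dicts, rule output takes precedence."""
    fused = dict(ml_links)   # start from the statistical (maximum-entropy) predictions
    fused.update(rule_links) # overwrite wherever a rule fired
    return fused


def f_score(predicted, gold):
    """F1 over (entity pair, TLINK type) items."""
    pred = set(predicted.items())
    ref = set(gold.items())
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```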

  3. A duct mapping method using least squares support vector machines

    NASA Astrophysics Data System (ADS)

    Douvenot, RéMi; Fabbro, Vincent; Gerstoft, Peter; Bourlier, Christophe; Saillard, Joseph

    2008-12-01

    This paper introduces a "refractivity from clutter" (RFC) approach with an inversion method based on a pregenerated database. The RFC method exploits the information contained in the radar sea clutter return to estimate the refractive index profile. Earlier efforts relied on algorithms that give good accuracy at a high computational cost. The present method is instead based on a learning machine algorithm, in order to obtain a real-time system. This paper shows the feasibility of an RFC technique based on the least squares support vector machine inversion method by comparing it with a genetic algorithm on simulated, noise-free data at 1 and 5 GHz. These data are simulated in the presence of ideal trilinear surface-based ducts. The learning machine is based on a pregenerated database computed using Latin hypercube sampling to improve the efficiency of the learning. The results show that little accuracy is lost compared with a genetic algorithm approach. The computational time of a genetic algorithm is very high, whereas the learning machine approach runs in real time. The advantage of a real-time RFC system is that it could work on several azimuths in near real time.
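
    The pregenerated training database relies on Latin hypercube sampling of the duct parameters. The sketch below shows that sampling step with hypothetical parameter bounds. Each parameter range is cut into as many equal strata as there are samples. Each stratum is then used exactly once per parameter, so even a small database covers every range evenly. The LS-SVM inversion itself is not shown.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube sample: n_samples points inside the box given by bounds.

    bounds: list of (low, high) pairs, one per parameter (hypothetical duct parameters).
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across parameters
        # one uniform draw inside each stratum
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [list(point) for point in zip(*columns)]
```

    For example, `latin_hypercube(500, [(0.0, 1.0)] * 3, seed=0)` would give 500 points over three normalised, hypothetical duct parameters. Each point would then be simulated to produce the matching clutter return for the database.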

  4. Online Sequential Extreme Learning Machine With Kernels.

    PubMed

    Scardapane, Simone; Comminiello, Danilo; Scarpiniti, Michele; Uncini, Aurelio

    2015-09-01

    The extreme learning machine (ELM) was recently proposed as a unifying framework for different families of learning algorithms. The classical ELM model consists of a linear combination of a fixed number of nonlinear expansions of the input vector. Learning in ELM is hence equivalent to finding the optimal weights that minimize the error on a dataset. The update works in batch mode, either with explicit feature mappings or with implicit mappings defined by kernels. An online version has been proposed for the former, but no work has been done up to this point for the latter. Whether an efficient learning algorithm for online kernel-based ELM exists therefore remains an open problem. By explicating some connections between nonlinear adaptive filtering and ELM theory, in this brief, we present an algorithm for this task. In particular, we propose a straightforward extension to the ELM framework of the well-known kernel recursive least-squares algorithm, which belongs to the kernel adaptive filtering (KAF) family. We call the resulting algorithm the kernel online sequential ELM (KOS-ELM). Moreover, we consider two different criteria used in the KAF field to obtain sparse filters and extend them to our context. We show that KOS-ELM, combined with these sparsification criteria, can result in a highly efficient algorithm, both in terms of generalization error and training time. Empirical evaluations demonstrate interesting results on some benchmarking datasets. PMID:25561597

  5. Introduction to machine learning: k-nearest neighbors

    PubMed Central

    2016-01-01

    Machine learning techniques have been widely used in many scientific fields, but their use in the medical literature is limited, partly because of technical difficulties. k-nearest neighbors (kNN) is a simple method of machine learning. The article introduces some basic ideas underlying the kNN algorithm and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After the outcome is predicted with the kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the most widely used statistic for summarizing the performance of the kNN algorithm. Factors such as the value of k, the distance calculation and the choice of appropriate predictors all have a significant impact on model performance. PMID:27386492

  6. Introduction to machine learning: k-nearest neighbors.

    PubMed

    Zhang, Zhongheng

    2016-06-01

    Machine learning techniques have been widely used in many scientific fields, but their use in the medical literature is limited, partly because of technical difficulties. k-nearest neighbors (kNN) is a simple method of machine learning. The article introduces some basic ideas underlying the kNN algorithm and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After the outcome is predicted with the kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the most widely used statistic for summarizing the performance of the kNN algorithm. Factors such as the value of k, the distance calculation and the choice of appropriate predictors all have a significant impact on model performance. PMID:27386492
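
    The article performs kNN modelling with R's `knn()` function. For readers working in other environments, the sketch below shows the same idea in Python: a majority vote among the k nearest training points under Euclidean distance, plus an accuracy calculation. The data are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up two-feature example with two well-separated groups.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
preds = [knn_predict(train_X, train_y, x) for x in ([0.5, 0.5], [5.5, 5.5])]
```

    The distance calculation drives the result, so predictors on very different scales are usually standardised first. In two-class problems, an odd `k` avoids tied votes.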

  7. Machine Shop I. Learning Activity Packets (LAPs). Section D--Power Saws and Drilling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This document contains two learning activity packets (LAPs) for the "power saws and drilling machines" instructional area of a Machine Shop I course. The two LAPs cover the following topics: power saws and drill press. Each LAP contains a cover sheet that describes its purpose, an introduction, and the tasks included in the LAP; learning steps…

  8. Learning Activity Packets for Milling Machines. Unit I--Introduction to Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to identify parts and attachments of vertical and horizontal milling machines, identify work-holding devices, state safety rules, and…

  9. Machine learning classification of SDSS transient survey images

    NASA Astrophysics Data System (ADS)

    du Buisson, L.; Sivanandam, N.; Bassett, Bruce A.; Smith, M.

    2015-12-01

    We show that multiple machine learning algorithms can match human performance in classifying transient imaging data from the Sloan Digital Sky Survey (SDSS) supernova survey into real objects and artefacts. This is a first step in any transient science pipeline and is currently still done by humans, but future surveys such as the Large Synoptic Survey Telescope (LSST) will necessitate fully machine-enabled solutions. Using features trained from eigenimage analysis (principal component analysis, PCA) of single-epoch g, r and i difference images, we can reach a completeness (recall) of 96 per cent, while only incorrectly classifying at most 18 per cent of artefacts as real objects, corresponding to a precision (purity) of 84 per cent. In general, random forests performed best, followed by the k-nearest neighbour and the SkyNet artificial neural net algorithms, compared to other methods such as naive Bayes and kernel support vector machine. Our results show that PCA-based machine learning can match human success levels and can naturally be extended by including multiple epochs of data, transient colours and host galaxy information which should allow for significant further improvements, especially at low signal-to-noise.
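
    Precision (purity) depends on the class mix as well as on the two rates quoted above. With equal numbers of real objects and artefacts, a 96 per cent recall and an 18 per cent artefact false-positive rate give 0.96 / (0.96 + 0.18) ≈ 0.84, which matches the quoted purity. This equal-count assumption is inferred from the numbers, not stated in the abstract. The sketch below carries out that arithmetic.

```python
def precision_from_rates(recall, false_positive_rate, n_real, n_artefact):
    """Purity implied by a completeness (recall) and an artefact false-positive rate."""
    true_pos = recall * n_real
    false_pos = false_positive_rate * n_artefact
    return true_pos / (true_pos + false_pos)


balanced = precision_from_rates(0.96, 0.18, n_real=1000, n_artefact=1000)        # about 0.842
artefact_heavy = precision_from_rates(0.96, 0.18, n_real=1000, n_artefact=3000)  # about 0.64
```

    As the second call shows, the same two rates give a purity of only 0.64 if artefacts outnumber real objects three to one.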

  10. Multivariate Mapping of Environmental Data Using Extreme Learning Machines

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2014-05-01

    In most real cases, environmental data are multivariate, highly variable at several spatio-temporal scales, and generated by nonlinear and complex phenomena. Mapping, that is, spatial prediction of such data, is a challenging problem. Machine learning algorithms, being universal nonlinear tools, have demonstrated their efficiency in modelling environmental spatial and space-time data (Kanevski et al. 2009). Recently, a new machine learning approach, the Extreme Learning Machine (ELM), has gained great popularity. ELM is a fast and powerful machine learning algorithm. Developed by G.-B. Huang et al. (2006), it follows the structure of a multilayer perceptron (MLP) with a single hidden layer, i.e. a single-hidden-layer feedforward neural network (SLFN). The learning step of classical artificial neural networks, such as the MLP, optimizes the weights and biases using a gradient-based learning algorithm (e.g. the back-propagation algorithm). Unlike this optimization phase, which can fall into local minima, ELM randomly generates the weights between the input layer and the hidden layer, as well as the biases of the hidden layer. After this initialization, it optimizes only the weight vector between the hidden layer and the output layer, in a single least-squares step. The main advantage of this algorithm is the speed of the learning step. In theory, as the number of hidden nodes grows, the algorithm can learn any set of training data with zero error. To avoid overfitting, cross-validation or "true validation" (randomly splitting the data into training, validation and testing subsets) is recommended for finding an optimal number of neurons. With its universal approximation property and solid theoretical basis, ELM is a good machine learning algorithm that can push the field forward. 
The present research deals with an extension of ELM to multivariate output modelling and with the application of ELM to a real-data case study: pollution of the sediments in
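
    The sketch below implements the ELM training step described above for several output variables at once. It uses random input weights and biases, a sigmoid hidden layer, and output weights from one least-squares solve. Two details are assumptions rather than part of the description. The first is the weight range (`scale`). The second is a small ridge term `reg`, added for numerical stability in place of an exact pseudoinverse. Inputs are assumed to be scaled to a modest range such as [0, 1].

```python
import math
import random


def _hidden(x, W, b):
    """Sigmoid activations of the randomly weighted hidden layer."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bias)))
            for w, bias in zip(W, b)]


def _solve(A, B):
    """Solve A X = B for a matrix B (one column per output) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def train_elm(X, T, n_hidden, reg=1e-6, scale=4.0, seed=0):
    """X: list of input vectors; T: list of target vectors (several outputs allowed)."""
    rng = random.Random(seed)
    d, k = len(X[0]), len(T[0])
    # Input weights and hidden biases are drawn at random and never trained.
    W = [[rng.uniform(-scale, scale) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
    H = [_hidden(x, W, b) for x in X]
    # Single least-squares step for the output weights: (H^T H + reg*I) beta = H^T T
    HtH = [[sum(h[p] * h[q] for h in H) + (reg if p == q else 0.0)
            for q in range(n_hidden)] for p in range(n_hidden)]
    HtT = [[sum(h[p] * t[o] for h, t in zip(H, T)) for o in range(k)]
           for p in range(n_hidden)]
    return W, b, _solve(HtH, HtT)


def predict_elm(model, x):
    """One prediction per output variable."""
    W, b, beta = model
    h = _hidden(x, W, b)
    return [sum(hp * beta_p[o] for hp, beta_p in zip(h, beta)) for o in range(len(beta[0]))]
```

    As the text notes, `n_hidden` should be chosen by cross-validation or a held-out validation split. With enough hidden nodes, the model will fit the training data almost exactly.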

  11. Trends in extreme learning machines: a review.

    PubMed

    Huang, Gao; Huang, Guang-Bin; Song, Shiji; You, Keyou

    2015-01-01

    Extreme learning machine (ELM) has recently gained increasing interest from various research fields. In this review, we aim to report the current state of the theoretical research and practical advances on this subject. We first give an overview of ELM from the theoretical perspective, including the interpolation theory, universal approximation capability, and generalization ability. Then we focus on the various improvements made to ELM that further improve its stability, sparsity and accuracy under general or specific conditions. Apart from classification and regression, ELM has recently been extended to clustering, feature selection, representational learning and many other learning tasks. These newly emerging algorithms greatly expand the applications of ELM. From the implementation perspective, hardware implementation and parallel computation techniques have substantially sped up the training of ELM, making it feasible for big data processing and real-time reasoning. Due to its remarkable efficiency, simplicity, and impressive generalization performance, ELM has been applied in a variety of domains, such as biomedical engineering, computer vision, system identification, and control and robotics. In this review, we try to provide a comprehensive view of these advances in ELM together with its future perspectives. PMID:25462632

  12. Learning by Design: Good Video Games as Learning Machines

    ERIC Educational Resources Information Center

    Gee, James Paul

    2005-01-01

    This article asks how good video and computer game designers manage to get new players to learn long, complex and difficult games. The short answer is that designers of good games have hit on excellent methods for getting people to learn and to enjoy learning. The longer answer is more complex. Integral to this answer are the good principles of…

  13. Machine learning and genome annotation: a match meant to be?

    PubMed Central

    2013-01-01

    By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE. PMID:23731483

  14. Large-Scale Machine Learning for Classification and Search

    ERIC Educational Resources Information Center

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  15. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

    ERIC Educational Resources Information Center

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2015-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead…

  16. Machine Learning for Flood Prediction in Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.

    2015-12-01

    With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct, at significant financial cost. In addition, desktop modeling software and limited local server storage can constrain the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable on this platform. Our project pushes these limits by testing how well two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, predict flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created using the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of deploying unbalanced training data sets based on rare extreme events.

  17. Anomaly detection for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, Ben; Rau, Markus Michael; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen

    2015-10-01

    We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million `clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 `anomalous' galaxies with spectroscopic redshift measurements that are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and the preprocessed `anomaly-removed' sample. Redshift statistics are measured on a clean validation sample generated without any preprocessing. For each of the machine learning routines explored, training on the anomaly-removed sample rather than the contaminated sample improves all measured statistics by up to 80 per cent. We further describe a method to estimate the contamination fraction of a base data sample.
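
    The Elliptical Envelope flags points that lie far from the bulk of the data in Mahalanobis distance. The sketch below illustrates that idea on two hypothetical photometric features, with one important simplification. It uses the classical sample mean and covariance. scikit-learn's `EllipticEnvelope` instead fits a robust (Minimum Covariance Determinant) estimate, so the anomalies themselves distort the fitted ellipse less. The `contamination` parameter here mirrors the assumed fraction of anomalous examples.

```python
def mahalanobis_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean,
    using the classical (non-robust) sample covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def flag_anomalies(points, contamination):
    """Flag (roughly) the given fraction of points with the largest distances."""
    d = mahalanobis_2d(points)
    n_flag = round(contamination * len(points))
    cutoff = sorted(d, reverse=True)[n_flag - 1] if n_flag else float("inf")
    return [di >= cutoff for di in d]
```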

  18. Refining fuzzy logic controllers with machine learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1994-01-01

    In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning which is a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential of being applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.

  19. Machine learning research 1989-90

    NASA Technical Reports Server (NTRS)

    Porter, Bruce W.; Souther, Arthur

    1990-01-01

    Multifunctional knowledge bases offer a significant advance in artificial intelligence because they can support numerous expert tasks within a domain. As a result they amortize the costs of building a knowledge base over multiple expert systems and they reduce the brittleness of each system. Due to the inevitable size and complexity of multifunctional knowledge bases, their construction and maintenance require knowledge engineering and acquisition tools that can automatically identify interactions between new and existing knowledge. Furthermore, their use requires software for accessing those portions of the knowledge base that coherently answer questions. Considerable progress was made in developing software for building and accessing multifunctional knowledge bases. A language was developed for representing knowledge, along with software tools for editing and displaying knowledge, a machine learning program for integrating new information into existing knowledge, and a question answering system for accessing the knowledge base.

  20. Machine Learning Approaches: From Theory to Application in Schizophrenia

    PubMed Central

    Veronese, Elisa; Castellani, Umberto; Peruzzo, Denis; Bellani, Marcella; Brambilla, Paolo

    2013-01-01

    In recent years, machine learning approaches have been successfully applied to the analysis of neuroimaging data to aid disease diagnosis. In this paper, we provide an overview of recent support vector machine-based methods developed and applied in psychiatric neuroimaging for the investigation of schizophrenia. In particular, we focus on the algorithms implemented by our group, which have been applied to classify subjects affected by schizophrenia and healthy controls. We compare their accuracy with that reported in other recently published studies. First, we describe the basic terminology used in pattern recognition and machine learning. Then we separately summarize and explain each study, highlighting the main features that characterize each method. Finally, by comparing the results obtained with the different techniques described, we draw conclusions about the extent to which automatic classification approaches can be considered a useful tool for understanding the biological underpinnings of schizophrenia. We then conclude by discussing the main implications of applying these methods in clinical practice. PMID:24489603

  1. Tracking medical genetic literature through machine learning.

    PubMed

    Bornstein, Aaron T; McLoughlin, Matthew H; Aguilar, Jesus; Wong, Wendy S W; Solomon, Benjamin D

    2016-08-01

    There has been remarkable progress in identifying the causes of genetic conditions as well as in understanding how changes in specific genes cause disease. Though difficult (and often superficial) to parse, an interesting tension involves the emphasis on basic research aimed at dissecting normal and abnormal biology versus more clearly clinical and therapeutic investigations. To examine one facet of this question and to better understand progress in Mendelian-related research, we developed an algorithm that classifies medical literature into three categories (Basic, Clinical, and Management) and conducted a retrospective analysis. We built a supervised machine learning classification model using the Azure Machine Learning (ML) Platform and analyzed the literature (1970-2014) from NCBI's Entrez Gene2Pubmed Database (http://www.ncbi.nlm.nih.gov/gene) using genes from the NHGRI's Clinical Genomics Database (http://research.nhgri.nih.gov/CGD/). We applied our model to 376,738 articles: 288,639 (76.6%) were classified as Basic, 54,178 (14.4%) as Clinical, and 24,569 (6.5%) as Management. The average classification accuracy was 92.2%. The rate of Clinical publication was significantly higher than that of Basic or Management publication. The rate of publication of each article type differed significantly across key eras: the Human Genome Project (HGP) planning phase (1984-1990); HGP launch (1990) to publication (2001); HGP completion to the advent of the "Next Generation" era (2009); and the era following 2009. In conclusion, in addition to the findings regarding the pace and focus of genetic progress, our algorithm produced a database that can be used in a variety of contexts, including automating the identification of management-related literature. PMID:27268407

  2. Geological applications of machine learning on hyperspectral remote sensing data

    NASA Astrophysics Data System (ADS)

    Tse, C. H.; Li, Yi-liang; Lam, Edmund Y.

    2015-02-01

    The CRISM imaging spectrometer orbiting Mars has been producing a vast amount of data in the visible to infrared wavelengths in the form of hyperspectral data cubes. Compared with data obtained from previous remote sensing techniques, these data offer an unprecedented level of spectral resolution in addition to an ever-increasing level of spatial information. A major challenge posed by these data is the burden of processing and interpreting the datasets and extracting the relevant information from them. This research approaches the challenge by exploring machine learning methods, especially unsupervised learning, to achieve cluster density estimation and classification, with the ultimate goal of devising an efficient means of identifying minerals. A set of software tools has been written in Python to access and experiment with CRISM hyperspectral cubes selected from two specific Mars locations. A machine learning pipeline is proposed, and unsupervised learning methods were applied to pre-processed datasets. The resulting data clusters are compared with the published ASTER spectral library and browse data products from the Planetary Data System (PDS). The results demonstrate that this approach is capable of processing the huge amount of hyperspectral data and can potentially guide scientists toward more detailed studies.

  3. Machine Learning of Protein Interactions in Fungal Secretory Pathways.

    PubMed

    Kludas, Jana; Arvas, Mikko; Castillo, Sandra; Pakula, Tiina; Oja, Merja; Brouard, Céline; Jäntti, Jussi; Penttilä, Merja; Rousu, Juho

    2016-01-01

    In this paper we apply machine learning methods to predict protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, Input-Output Kernel Regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to a uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways in fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities. PMID:27441920
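    The pairwise-kernel idea mentioned above can be sketched in a few lines of Python. This is an illustrative assumption, not the authors' implementation. It uses the common tensor-product pairwise kernel, K((a, b), (c, d)) = k(a, c)k(b, d) + k(a, d)k(b, c), built on top of a uniform (unweighted) average of per-feature-set base kernels. That uniform average is the baseline the abstract reports MKL improving on. The feature-set names `seq` and `expr`, the RBF and linear base kernels, and the toy protein vectors are all hypothetical.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two numeric feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear(x, y):
    """Plain dot-product kernel."""
    return sum(a * b for a, b in zip(x, y))

def feature_kernel(base, feature_set):
    """Apply a base kernel to one named feature set of two proteins."""
    return lambda p, q: base(p[feature_set], q[feature_set])

def uniform_combination(kernels):
    """Unweighted average of base kernels (the baseline MKL is compared against)."""
    return lambda p, q: sum(k(p, q) for k in kernels) / len(kernels)

def pairwise_kernel(k):
    """Symmetric tensor-product kernel between protein pairs (a, b) and (c, d)."""
    def kp(pair1, pair2):
        (a, b), (c, d) = pair1, pair2
        return k(a, c) * k(b, d) + k(a, d) * k(b, c)
    return kp

# Hypothetical proteins, each described by two feature sets.
p1 = {"seq": [1.0, 0.0], "expr": [0.2]}
p2 = {"seq": [0.0, 1.0], "expr": [0.4]}
p3 = {"seq": [1.0, 1.0], "expr": [0.1]}

k_protein = uniform_combination([feature_kernel(rbf, "seq"), feature_kernel(linear, "expr")])
k_pair = pairwise_kernel(k_protein)
print(k_pair((p1, p2), (p3, p1)))
```

    The symmetrized form makes the kernel independent of the order in which the two proteins of a pair are listed. This order-independence is the usual motivation for this construction in interaction prediction.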

  4. Machine Learning of Protein Interactions in Fungal Secretory Pathways

    PubMed Central

    Kludas, Jana; Arvas, Mikko; Castillo, Sandra; Pakula, Tiina; Oja, Merja; Brouard, Céline; Jäntti, Jussi; Penttilä, Merja

    2016-01-01

    In this paper we apply machine learning methods to predict protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, Input-Output Kernel Regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to a uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways in fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities. PMID:27441920

  5. Position Paper: Applying Machine Learning to Software Analysis to Achieve Trusted, Repeatable Scientific Computing

    SciTech Connect

    Prowell, Stacy J; Symons, Christopher T

    2015-01-01

    Producing trusted results from high-performance codes is essential for policy and has significant economic impact. We propose combining rigorous analytical methods with machine learning techniques to achieve the goal of repeatable, trustworthy scientific computing.

  6. Software Development and Testing for Machine Learning Studies

    NASA Astrophysics Data System (ADS)

    Makino, Takaki; Aihara, Kazuyuki

    It is not easy to test software used in machine learning studies with statistical frameworks. In particular, the randomness in algorithms such as Monte Carlo methods complicates the testing process. Combined with the underestimation of the importance of software testing in academic fields, this means that many software programs are being used without appropriate validation, causing problems. In this article, we discuss the importance of writing test code for software used in research and present a practical approach to testing, focusing on programs that use Monte Carlo methods.
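    Below is a minimal sketch of the kind of test the abstract argues for. It assumes Python's standard `random` module. The pi estimator and the 4-standard-error tolerance are illustrative choices, not the authors' procedure. The idea is to fix the random seed for reproducibility, then check a Monte Carlo estimate against a known value. The check uses a tolerance derived from the estimator's standard error rather than demanding exact equality.

```python
import math
import random

def estimate_pi(n_samples, rng):
    """Monte Carlo estimate of pi from the fraction of points inside the unit quarter circle."""
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

def check_estimate_pi(n_samples=100_000, seed=12345, z=4.0):
    """Fail only if the estimate is more than z standard errors from the true value."""
    rng = random.Random(seed)  # fixed seed: the test gives the same result on every run
    estimate = estimate_pi(n_samples, rng)
    p = math.pi / 4  # probability that a uniform point lands inside the quarter circle
    std_err = 4.0 * math.sqrt(p * (1 - p) / n_samples)
    assert abs(estimate - math.pi) < z * std_err, (estimate, std_err)
    return estimate

print(check_estimate_pi())
```

    The statistical tolerance keeps the test meaningful if the seed or sample size is changed. The fixed seed keeps any single run reproducible when a failure needs to be debugged.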

  7. Quantum learning and universal quantum matching machine

    NASA Astrophysics Data System (ADS)

    Sasaki, Masahide; Carlini, Alberto

    2002-08-01

    Suppose that three kinds of quantum systems are given in some unknown states |f>⊗N, |g1>⊗K, and |g2>⊗K, and we want to decide which template state |g1> or |g2>, each representing the feature of the pattern class C1 or C2, respectively, is closest to the input feature state |f>. This is an extension of the pattern matching problem into the quantum domain. Assuming that these states are known a priori to belong to a certain parametric family of pure qubit systems, we derive two kinds of matching strategies. The first one is a semiclassical strategy that is obtained by the natural extension of conventional matching strategies and consists of a two-stage procedure: identification (estimation) of the unknown template states to design the classifier (learning process to train the classifier) and classification of the input system into the appropriate pattern class based on the estimated results. The other is a fully quantum strategy without any intermediate measurement, which we might call the universal quantum matching machine. We present the Bayes optimal solutions for both strategies in the case of K=1, showing that there certainly exists a fully quantum matching procedure that is strictly superior to the straightforward semiclassical extension of the conventional matching strategy based on the learning process.

  8. Estimation of alpine skier posture using machine learning techniques.

    PubMed

    Nemec, Bojan; Petrič, Tadej; Babič, Jan; Supej, Matej

    2014-01-01

    High-precision Global Navigation Satellite System (GNSS) measurements are becoming increasingly popular in alpine skiing owing to their relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements, defined by the antenna, which is typically placed behind the skier's neck. A key issue is how to estimate other, more relevant parameters of the skier's body, such as the center of mass (COM) and ski trajectories. Previously, these parameters were estimated by modeling the skier's body with an inverted-pendulum model that oversimplified the skier's body. In this study, we propose two machine learning methods that overcome this shortcoming and estimate COM and ski trajectories based on a more faithful approximation of the skier's body with nine degrees of freedom. The first method uses the well-established approach of artificial neural networks, while the second is based on a state-of-the-art statistical generalization method. Both methods were evaluated using reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. The proposed methods outperform the commonly used inverted-pendulum methods and demonstrate the applicability of machine learning techniques to biomechanical measurements in alpine skiing. PMID:25313492

  9. Estimation of Alpine Skier Posture Using Machine Learning Techniques

    PubMed Central

    Nemec, Bojan; Petrič, Tadej; Babič, Jan; Supej, Matej

    2014-01-01

    High-precision Global Navigation Satellite System (GNSS) measurements are becoming increasingly popular in alpine skiing owing to their relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements, defined by the antenna, which is typically placed behind the skier's neck. A key issue is how to estimate other, more relevant parameters of the skier's body, such as the center of mass (COM) and ski trajectories. Previously, these parameters were estimated by modeling the skier's body with an inverted-pendulum model that oversimplified the skier's body. In this study, we propose two machine learning methods that overcome this shortcoming and estimate COM and ski trajectories based on a more faithful approximation of the skier's body with nine degrees of freedom. The first method uses the well-established approach of artificial neural networks, while the second is based on a state-of-the-art statistical generalization method. Both methods were evaluated using reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. The proposed methods outperform the commonly used inverted-pendulum methods and demonstrate the applicability of machine learning techniques to biomechanical measurements in alpine skiing. PMID:25313492

  10. Machine-z: rapid machine-learned redshift indicator for Swift gamma-ray bursts

    NASA Astrophysics Data System (ADS)

    Ukwatta, T. N.; Woźniak, P. R.; Gehrels, N.

    2016-06-01

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce 'machine-z', a redshift prediction algorithm and a 'high-z' classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With 40 per cent false positive rate the classifier can achieve ~100 per cent recall. The most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.
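    The recall and false positive rate quoted above depend on the score threshold. Lowering the threshold for declaring a burst 'high-z' raises recall at the cost of more false positives. The sketch below shows that computation. It assumes binary labels (1 = high-redshift) and hypothetical classifier scores, and it is not the machine-z code.

```python
def recall_and_fpr(scores, labels, threshold):
    """Recall (true positive rate) and false positive rate at a score threshold."""
    tp = fp = fn = tn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr

# Hypothetical classifier scores and true labels (1 = high-redshift burst).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for t in (0.5, 0.35):
    print(t, recall_and_fpr(scores, labels, t))
```

    In this toy example, lowering the threshold from 0.5 to 0.35 recovers the missed positive without adding a false positive. On real data, the two rates usually move together, which is the trade-off the abstract reports.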

  11. Automatic pathology classification using a single feature machine learning support-vector machines

    NASA Astrophysics Data System (ADS)

    Yepes-Calderon, Fernando; Pedregosa, Fabian; Thirion, Bertrand; Wang, Yalin; Lepore, Natasha

    2014-03-01

    Magnetic Resonance Imaging (MRI) has been gaining popularity in the clinic in recent years as a safe in-vivo imaging technique. As a result, large troves of data are being gathered and stored daily that may be used as clinical training sets in hospitals. While numerous machine learning (ML) algorithms have been implemented for Alzheimer's disease classification, their outputs are usually difficult to interpret in the clinical setting. Here, we propose a simple method of rapid diagnostic classification for the clinic using Support Vector Machines (SVM) and easy-to-obtain geometrical measurements that, together with a cortical and sub-cortical brain parcellation, create a robust framework capable of automatic diagnosis with high accuracy. On a large imaging dataset of over 800 subjects taken from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, classification-success indexes of up to 99.2% are reached with a single measurement.

  12. Knowledge discovery via machine learning for neurodegenerative disease researchers.

    PubMed

    Ozyurt, I Burak; Brown, Gregory G

    2009-01-01

    The ever-increasing size of the biomedical literature makes more precise information retrieval and tapping into implicit knowledge in scientific literature a necessity. In this chapter, first, three new variants of the expectation-maximization (EM) method for semisupervised document classification (Machine Learning 39:103-134, 2000) are introduced to refine biomedical literature meta-searches. The retrieval performance of a multi-mixture per class EM variant with Agglomerative Information Bottleneck clustering (Slonim and Tishby (1999) Agglomerative information bottleneck. In Proceedings of NIPS-12) using Davies-Bouldin cluster validity index (IEEE Transactions on Pattern Analysis and Machine Intelligence 1:224-227, 1979) rivaled the state-of-the-art transductive support vector machines (TSVM) (Joachims (1999) Transductive inference for text classification using support vector machines. In Proceedings of the International Conference on Machine Learning (ICML)). Moreover, the multi-mixture per class EM variant refined search results more quickly, with more than one order of magnitude improvement in execution time compared with TSVM. A second tool, CRFNER, uses conditional random fields (Lafferty et al. (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML-2001) to recognize 15 types of named entities from schizophrenia abstracts, outperforming ABNER (Settles (2004) Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of COLING 2004 International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA)) in biological named entity recognition and reaching an F1 score of 82.5% on the second set of named entities. PMID:19623491

  13. A new machine learning classifier for high dimensional healthcare data.

    PubMed

    Padman, Rema; Bai, Xue; Airoldi, Edoardo M

    2007-01-01

    Data sets with many discrete variables and relatively few cases arise in health care, commerce, information security, and many other domains. Learning effective and efficient prediction models from such data sets is a challenging task. In this paper, we propose a new approach that combines metaheuristic search and Bayesian Networks to learn a graphical Markov Blanket-based classifier from data. The Tabu Search enhanced Markov Blanket (TS/MB) procedure is based on the use of restricted neighborhoods in a general Bayesian Network constrained by the Markov condition, called Markov Blanket Neighborhoods. Computational results from two real-world healthcare data sets indicate that the TS/MB procedure converges quickly and is able to find a parsimonious model with substantially fewer predictor variables than the full data set. Furthermore, it has comparable or better prediction performance than several other machine learning methods, and provides insight into possible causal relations among the variables. PMID:17911800

  14. Mapping of Estimations and Prediction Intervals Using Extreme Learning Machines

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2015-04-01

    Due to the large amount and complexity of data available nowadays in environmental sciences, we need more robust methodology for analyzing and understanding the phenomena under study. One particular but very important aspect of this understanding is the reliability of the generated prediction models. From data collection to the prediction map, several sources of error can occur and affect the final result. These sources are mainly identified as uncertainty in the data (data noise) and uncertainty in the model. Their combination leads to the so-called prediction interval. Quantifying these two categories of uncertainty allows a finer understanding of the phenomena under study and a better assessment of prediction accuracy. The present research deals with a methodology combining a machine learning algorithm (ELM, Extreme Learning Machine) with a bootstrap-based procedure. Developed by G.-B. Huang et al. (2006), ELM is an artificial neural network that follows the structure of a multilayer perceptron (MLP) with a single hidden layer. Compared to a classical MLP, ELM can learn faster without loss of accuracy and needs only one hyperparameter to be fitted (the number of nodes in the hidden layer). The key steps of the proposed method are as follows: draw a variety of subsets from the original data by bootstrapping; train and validate ELM models on these subsets; and compute the residuals. The same procedure is then performed a second time using only the squared training residuals. Finally, combining the two modeling levels yields the mean prediction map, the model uncertainty variance, and the data noise variance. The proposed approach is illustrated using geospatial data. References: Efron B., and Tibshirani R. 1986, Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy, Statistical Science, vol. 1: 54-75. Huang G.-B., Zhu Q.-Y., and Siew C.-K. 2006
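    The ELM-plus-bootstrap idea can be illustrated with a pure-Python sketch under several stated assumptions. The hidden units use tanh with Gaussian random input weights. The output weights are obtained by least squares through the normal equations. A small ridge term (`ridge`, our addition, not part of the abstract) keeps that linear solve numerically stable. Only the first modeling level is shown: the ensemble mean and the model-uncertainty variance across bootstrap models. The second level, fitted to squared residuals to estimate the data-noise variance, is omitted. All function names are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def hidden(x, W, b):
    """Hidden-layer activations: tanh of a random affine map of the input vector x."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + bj) for w, bj in zip(W, b)]

def train_elm(X, y, n_hidden, rng, ridge=1e-3):
    """Input weights are random and never trained; only output weights are fitted."""
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    H = [hidden(x, W, b) for x in X]
    # Normal equations (H^T H + ridge * I) beta = H^T y for the output weights beta.
    HtH = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    Hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    return W, b, solve(HtH, Hty)

def predict_elm(model, x):
    W, b, beta = model
    return sum(bk * hk for bk, hk in zip(beta, hidden(x, W, b)))

def bootstrap_elm(X, y, n_hidden=15, n_models=20, seed=0):
    """Train one ELM per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(train_elm([X[i] for i in idx], [y[i] for i in idx], n_hidden, rng))
    return models

def mean_and_model_variance(models, x):
    """Ensemble mean prediction and spread across bootstrap models at input x."""
    preds = [predict_elm(m, x) for m in models]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / (len(preds) - 1)
    return mean, var
```

    Because the input weights are drawn once and never trained, fitting each model reduces to one small linear solve. This is what makes training many bootstrap replicates cheap.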

  15. Geological Mapping Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Harvey, A. S.; Fotopoulos, G.

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping, in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared to assess their performance in correctly identifying geological rock types in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. The percentage of correct classifications was used as the performance indicator. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to a ground validation map are visualized. Low performance may result from poor spectral imaging of bare rock, which can be covered by vegetation or water. The distribution of calibration clusters and the MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases the required computational effort and time. At the performance levels achieved in this study, the technique is useful for identifying regions of interest and general rock type trends. In particular, phase I geological site investigations will benefit from this approach, which can guide the selection of sites for advanced surveys.
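    For readers unfamiliar with the compared classifiers, the sketch below shows a k-nearest-neighbour rule scored by percent of correct classifications. The two-dimensional features and the rock-type labels are hypothetical placeholders, not the Sudbury data. Euclidean distance with a majority vote is an assumed, though standard, choice.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def percent_correct(train_X, train_y, test_X, test_y, k=3):
    """Performance indicator: percentage of test samples whose label is predicted correctly."""
    correct = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return 100.0 * correct / len(test_y)

# Hypothetical two-feature samples with hypothetical rock-type labels.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["granite"] * 3 + ["basalt"] * 3
test_X = [(0.5, 0.5), (5.5, 5.5)]
test_y = ["granite", "basalt"]
print(percent_correct(train_X, train_y, test_X, test_y))  # 100.0
```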

  16. DREAM: diabetic retinopathy analysis using machine learning.

    PubMed

    Roychowdhury, Sohini; Koozekanani, Dara D; Parhi, Keshab K

    2014-09-01

    This paper presents a computer-aided screening system (DREAM) that analyzes fundus images with varying illumination and fields of view and generates a severity grade for diabetic retinopathy (DR) using machine learning. Classifiers such as the Gaussian Mixture Model (GMM), k-nearest neighbor (kNN), support vector machine (SVM), and AdaBoost are analyzed for separating retinopathy lesions from nonlesions. GMM and kNN classifiers are found to be the best classifiers for bright and red lesion classification, respectively. A main contribution of this paper is the reduction in the number of features used for lesion classification through feature ranking with AdaBoost, in which the top 30 of 78 features are selected. A novel two-step hierarchical classification approach is proposed in which the nonlesions or false positives are rejected in the first step. In the second step, the bright lesions are classified as hard exudates or cotton wool spots, and the red lesions as hemorrhages or microaneurysms. This lesion classification problem involves imbalanced datasets, and SVM or combination classifiers derived from SVM using the Dempster-Shafer theory are found to incur more classification error than the GMM and kNN classifiers because of the data imbalance. The DR severity grading system is tested on 1200 images from the publicly available MESSIDOR dataset. For classifying images as with or without DR, the DREAM system achieves 100% sensitivity, 53.16% specificity, and 0.904 AUC, compared with the best reported 96% sensitivity, 51% specificity, and 0.875 AUC. The feature reduction further reduces the average computation time for DR severity grading per image from 59.54 to 3.46 s. PMID:25192577

  17. A cross-validation scheme for machine learning algorithms in shotgun proteomics

    PubMed Central

    2012-01-01

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting. PMID:23176259
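    Two ingredients of the discussion above can be sketched in Python under simplifying assumptions. The first is the standard target-decoy error estimate: decoy matches above a score threshold divided by target matches above it, assuming equally sized target and decoy databases. The second is a k-fold split in which each spectrum is scored only by a model trained on the other folds. The scores are made up, and the function names are hypothetical rather than the paper's scheme.

```python
import random

def estimated_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate at a score threshold (equal-size databases assumed)."""
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    return n_decoys / n_targets if n_targets else 0.0

def cross_validation_splits(n_items, k=3, seed=0):
    """Yield (train, test) index lists; every item lands in exactly one test fold."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test

# Made-up peptide-spectrum match scores.
targets = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
decoys = [7.0, 4.0, 3.0, 2.0, 1.0, 0.0]
print(estimated_fdr(targets, decoys, 6.0))  # 1 decoy / 5 targets -> 0.2
```

    Keeping each test fold out of the training data is what allows the error estimate to reveal overfitting. A model scored on spectra it was trained on would tend to look optimistic.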

  18. Weka: A Machine Learning Workbench for Data Mining

    NASA Astrophysics Data System (ADS)

    Frank, Eibe; Hall, Mark; Holmes, Geoffrey; Kirkby, Richard; Pfahringer, Bernhard; Witten, Ian H.; Trigg, Len

    The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, convenient interactive graphical user interfaces are provided for data exploration, for setting up large-scale experiments on distributed computing platforms, and for designing configurations for streamed data processing. These interfaces constitute an advanced environment for experimental data mining. The system is written in Java and distributed under the terms of the GNU General Public License.

  19. Ingot slicing machine and method

    NASA Technical Reports Server (NTRS)

    Kuo, Y. S. (Inventor)

    1984-01-01

    An improved method for simultaneously slicing one or a multiplicity of silicon boules into silicon wafers is described. A plurality of vertical stacks of horizontal circular saw blades are arranged in juxtaposed coaxial alignment. Each blade has a cutting diameter slightly greater than that of the blade immediately above it, and the blades are rotated simultaneously.

  20. Predicting increased blood pressure using machine learning.

    PubMed

    Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Gomes, Cristiano Mauro Assis; Soares, Telma de Jesus; Dos Reis, Luciana Araujo; Santos, Joselito

    2014-01-01

    The present study investigates the prediction of increased blood pressure from body mass index (BMI), waist circumference (WC), hip circumference (HC), and waist-hip ratio (WHR) using a machine learning technique called a classification tree. Data were collected from 400 college students (56.3% women) aged 16 to 63 years. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The results show that for women, BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42), the lowest misclassification (.19), and the highest pseudo-R² (.43). This model presented a sensitivity of 80.86% and a specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men, BMI, WC, HC, and WHR showed the best prediction, with the lowest deviance (57.25), the lowest misclassification (.16), and the highest pseudo-R² (.46). This model had a sensitivity of 72% and a specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the results from the classification tree analysis were compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power. PMID:24669313
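    A classification tree is built by repeatedly choosing the split that best separates the classes. The sketch below shows a single split (a depth-1 tree, or "stump") on one predictor such as BMI. It chooses the threshold that minimizes the misclassification rate. The study's trees used deviance and several predictors, so this is a simplified illustration with hypothetical data, not a reconstruction of its models.

```python
def best_split(values, labels):
    """Best single split x <= t vs x > t for 0/1 labels, predicting the majority
    class on each side. Returns (threshold, misclassification_rate)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, 1.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Errors on each side = size of the minority class on that side.
        errors = min(sum(left), len(left) - sum(left)) + min(sum(right), len(right) - sum(right))
        if errors / n < best[1]:
            best = (t, errors / n)
    return best

# Hypothetical BMI values and labels (1 = increased blood pressure).
print(best_split([20, 22, 24, 30, 32, 35], [0, 0, 0, 1, 1, 1]))  # (27.0, 0.0)
```

    A perfect split on training data, as in this toy example, says nothing about generalization. The gap between training and test sensitivity reported above is a reminder of this.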

  1. Kernel-based machine learning techniques for infrasound signal classification

    NASA Astrophysics Data System (ADS)

    Tuma, Matthias; Igel, Christian; Mialle, Pierrick

    2014-05-01

    Infrasound monitoring is one of four remote sensing technologies continuously employed by the CTBTO Preparatory Commission. The CTBTO's infrasound network is designed to monitor the Earth for potential evidence of atmospheric or shallow underground nuclear explosions. Upon completion, it will comprise 60 infrasound array stations distributed around the globe, of which 47 were certified in January 2014. Three stages can be identified in CTBTO infrasound data processing: automated processing at the level of single array stations, automated processing at the level of the overall global network, and interactive review by human analysts. At station level, the cross correlation-based PMCC algorithm is used for initial detection of coherent wavefronts. It produces estimates for trace velocity and azimuth of incoming wavefronts, as well as other descriptive features characterizing a signal. Detected arrivals are then categorized into potentially treaty-relevant versus noise-type signals by a rule-based expert system. This corresponds to a binary classification task at the level of station processing. In addition, incoming signals may be grouped according to their travel path in the atmosphere. The present work investigates automatic classification of infrasound arrivals by kernel-based pattern recognition methods. It aims to explore the potential of state-of-the-art machine learning methods vis-à-vis the current rule-based and task-tailored expert system. To this end, we first address the compilation of a representative, labeled reference benchmark dataset as a prerequisite for both classifier training and evaluation. Data representation is based on features extracted by the CTBTO's PMCC algorithm. As classifiers, we employ support vector machines (SVMs) in a supervised learning setting. Different SVM kernel functions are used and adapted through different hyperparameter optimization routines. The resulting performance is compared to several baseline classifiers.

  2. Teaching an Old Log New Tricks with Machine Learning.

    PubMed

    Schnell, Krista; Puri, Colin; Mahler, Paul; Dukatz, Carl

    2014-03-01

    To most people, the log file would not be considered an exciting area in technology today. However, these relatively benign, slowly growing data sources can drive large business transformations when combined with modern-day analytics. Accenture Technology Labs has built a new framework that helps to expand existing vendor solutions to create new methods of gaining insights from these benevolent information springs. This framework provides a systematic and effective machine-learning mechanism to understand, analyze, and visualize heterogeneous log files. These techniques enable an automated approach to analyzing log content in real time, learning relevant behaviors, and creating actionable insights applicable in traditionally reactive situations. Using this approach, companies can now tap into a wealth of knowledge residing in log file data that is currently being collected but underutilized because of its overwhelming variety and volume. By using log files as an important data input into the larger enterprise data supply chain, businesses have the opportunity to enhance their current operational log management solution and generate entirely new business insights, no longer limited to the realm of reactive IT management, but extending from proactive product improvement to defense from attacks. As we will discuss, this solution has immediate relevance in the telecommunications and security industries. However, the most forward-looking companies can take it even further. How? By thinking beyond the log file and applying the same machine-learning framework to other log file use cases (including logistics, social media, and consumer behavior) and any other transactional data source. PMID:27447306

  3. Effective feature selection for image steganalysis using extreme learning machine

    NASA Astrophysics Data System (ADS)

    Feng, Guorui; Zhang, Haiyan; Zhang, Xinpeng

    2014-11-01

    Image steganography delivers secret data through slight modifications of the cover. To detect these data, steganalysis designs features that capture the discrepancy between cover and steganographic images. The pressing problem is therefore how to design an effective classification architecture for given feature vectors extracted from the images. We propose an approach that automatically selects effective features based on well-known JPEG steganographic methods. This approach, referred to as extreme learning machine revisited feature selection (ELM-RFS), tunes input weights according to the importance of input features. The idea is derived from cross-validation learning and one-dimensional (1-D) search. While updating input weights, we seek the energy-decreasing direction using leave-one-out (LOO) selection. Furthermore, we optimize the 1-D energy function instead of directly discarding the least significant feature. Because the recent Liu features achieve considerably lower detection errors than earlier JPEG steganalysis features, the experiments use them; the results demonstrate that, on Liu features, the new approach yields lower classification error than other classifiers such as SVM, the Kodovsky ensemble classifier, direct ELM-LOO learning, kernel ELM, and conventional ELM. Furthermore, ELM-RFS achieves performance similar to a deep Boltzmann machine while using less training time.
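For readers unfamiliar with extreme learning machines, the sketch below shows only the basic ELM that ELM-RFS builds on: hidden-layer weights are drawn at random and left fixed, and only the output weights are fitted, here by ridge-regularized least squares. It does not implement the authors' leave-one-out feature selection; the ridge value, weight ranges and the hand-written solver are assumptions made to keep the example dependency-free.

```python
import math
import random
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class ELM:
    """Single-hidden-layer ELM: random fixed input weights, least-squares output weights."""

    def __init__(self, n_hidden: int = 20, ridge: float = 1e-4, seed: int = 0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = random.Random(seed)

    def _hidden(self, x: List[float]) -> List[float]:
        # Sigmoid activations of the random hidden units.
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs: List[List[float]], ys: List[float]) -> "ELM":
        d = len(xs[0])
        # Input weights and biases are drawn once and never trained.
        self.w = [[self.rng.uniform(-2, 2) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-2, 2) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]
        size = self.n_hidden
        # Ridge normal equations: (H^T H + ridge * I) beta = H^T y.
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(size)] for i in range(size)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(size)]
        self.beta = solve(hth, hty)
        return self

    def predict(self, x: List[float]) -> float:
        return sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))


# Toy regression: learn y = x on 21 points in [-1, 1].
xs = [[i / 10] for i in range(-10, 11)]
ys = [x[0] for x in xs]
model = ELM().fit(xs, ys)
mse = sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Because only the linear output layer is solved, training reduces to one small linear system, which is the speed advantage ELM-based methods rely on.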

  4. Machine learning approach for objective inpainting quality assessment

    NASA Astrophysics Data System (ADS)

    Frantc, V. A.; Voronin, V. V.; Marchuk, V. I.; Sherstobitov, A. I.; Agaian, S.; Egiazarian, K.

    2014-05-01

    This paper focuses on a machine learning approach for objective inpainting quality assessment. Inpainting has received a lot of attention in recent years, and quality assessment is an important task for evaluating different image reconstruction approaches. Quantitative metrics for successful image inpainting currently do not exist; researchers instead rely on qualitative human comparisons to evaluate their methodologies and techniques. We present an approach for objective inpainting quality assessment based on natural image statistics and machine learning techniques. Our method is based on the observation that when images are properly normalized or transferred to a transform domain, local descriptors can be modeled by parametric distributions. The shapes of these distributions differ between non-inpainted and inpainted images. The approach yields a feature vector strongly correlated with subjective image perception by the human visual system. Next, we use support vector regression trained on human-assessed images to predict the perceived quality of inpainted images. We demonstrate that our predicted quality value correlates repeatably with qualitative opinion in a human observer study.
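The final claim is about agreement between predicted and human quality scores. One common way to quantify such agreement (an assumption here; the abstract does not name its correlation measure) is Spearman's rank correlation, sketched below with average ranks for ties and hypothetical scores.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical predicted quality values and human opinion scores for four images.
predicted = [0.2, 0.5, 0.9, 0.4]
human = [1, 3, 5, 2]
rho = spearman(predicted, human)
```

Rank correlation only asks whether the predictor orders images the way observers do, which suits opinion scores whose scale is arbitrary.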

  5. Machine Learning Based Road Detection from High Resolution Imagery

    NASA Astrophysics Data System (ADS)

    Lv, Ye; Wang, Guofeng; Hu, Xiangyun

    2016-06-01

    At present, remote sensing technology is the best means of obtaining information about the Earth's surface, and it is very useful for geo-information updating and related applications. Extracting roads from remote sensing images is one of the biggest demands of rapid urban development; therefore, it has become a hot topic. Roads in high-resolution images are more complex and their patterns vary a lot, which creates obstacles for road extraction. In this paper, a machine learning based strategy is presented. The overall strategy uses geometric, radiometric, topological and texture features. High-resolution remote sensing images cover a large area of landscape, so extracting roads from them is slow. Therefore, road ROIs are first detected using Hough line detection and a buffering method to narrow down the detection area. As roads in high-resolution images are normally ribbon-shaped, mean-shift and watershed segmentation methods are used to extract road segments. Then, the Real AdaBoost supervised machine learning algorithm is used to pick out segments that contain road patterns. Finally, geometric shape analysis and morphology methods are used to prune and restore the whole road area and to detect the road centerlines.
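The ROI step relies on Hough line detection. A textbook Hough transform (not necessarily the variant the authors used) lets every edge pixel vote for all lines $\rho = x\cos\theta + y\sin\theta$ passing through it; peaks in the $(\theta, \rho)$ accumulator are candidate road lines. The 1-degree angle step and integer rounding of $\rho$ are illustrative choices.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def hough_lines(points: Iterable[Tuple[int, int]], theta_step_deg: int = 1) -> Counter:
    """Accumulate votes in (theta in degrees, rounded rho) space."""
    acc: Counter = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            t = math.radians(theta_deg)
            rho = round(x * math.cos(t) + y * math.sin(t))
            acc[(theta_deg, rho)] += 1
    return acc


# Hypothetical edge pixels along the horizontal line y = 5.
edge_pixels = [(x, 5) for x in range(50)]
(theta, rho), votes = hough_lines(edge_pixels).most_common(1)[0]
```

The strongest cell is $\theta = 90°$, $\rho = 5$ with one vote per pixel; buffering around such peaks would give the narrowed search area the abstract describes.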

  6. Machine Learning Approaches to Rare Events Sampling and Estimation

    NASA Astrophysics Data System (ADS)

    Elsheikh, A. H.

    2014-12-01

    Given the severe impacts of rare events, we try to quantitatively answer the following two questions: How can we estimate the probability of a rare event? And what are the factors affecting these probabilities? We utilize machine learning classification methods to define the failure boundary (in the stochastic space) corresponding to a specific threshold of a rare event. The training samples for the classification algorithm are obtained using multilevel splitting and Monte Carlo (MC) simulations. Once the classifier is trained, a full MC simulation can be performed efficiently using the classifier as a reduced order model replacing the full physics simulator. We apply the proposed method to a standard benchmark for CO2 leakage through an abandoned well. In this idealized test case, CO2 is injected into a deep aquifer, spreads within the aquifer and, upon reaching an abandoned well, rises to a shallower aquifer. In the current study, we evaluate the probability of leakage of a predefined amount of the injected CO2 given a heavy-tailed distribution of the leaky well permeability. We show that machine learning based approaches significantly outperform direct MC and multilevel splitting methods in terms of efficiency and precision. The proposed algorithm's efficiency and reliability enabled us to perform a sensitivity analysis of the probability of CO2 leakage to different modeling assumptions, including different prior distributions.
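The core idea, replacing the expensive simulator with a trained classifier inside Monte Carlo, can be illustrated on a one-dimensional toy problem. Everything here is a hypothetical stand-in: the "simulator" is a threshold on a standard-normal input, the classifier is a learned threshold, and training points are drawn uniformly rather than by multilevel splitting.

```python
import math
import random
from typing import List, Tuple

THRESHOLD = 2.5  # hypothetical failure threshold on a standard-normal input


def expensive_simulator(x: float) -> bool:
    """Hypothetical stand-in for the full physics simulator: True means 'failure'."""
    return x > THRESHOLD


def train_threshold_classifier(samples: List[Tuple[float, bool]]) -> float:
    """1-D classifier: boundary halfway between the largest safe and smallest failing input."""
    safe = [x for x, failed in samples if not failed]
    fail = [x for x, failed in samples if failed]
    return (max(safe) + min(fail)) / 2


def surrogate_mc(boundary: float, n: int, rng: random.Random) -> float:
    """Plain Monte Carlo that queries the cheap classifier instead of the simulator."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > boundary)
    return hits / n


rng = random.Random(42)
# Only these 2,000 training points are sent to the "simulator".
train_x = [rng.uniform(-5.0, 5.0) for _ in range(2000)]
boundary = train_threshold_classifier([(x, expensive_simulator(x)) for x in train_x])
p_hat = surrogate_mc(boundary, 200_000, rng)
p_true = 0.5 * math.erfc(THRESHOLD / math.sqrt(2))  # exact P(X > 2.5) for X ~ N(0, 1)
```

The 200,000 Monte Carlo draws never call the simulator, which is where the savings would come from when each simulator run is costly.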

  7. Learning Machine, Vietnamese Based Human-Computer Interface.

    ERIC Educational Resources Information Center

    Northwest Regional Educational Lab., Portland, OR.

    The sixth session of IT@EDU98 consisted of seven papers on the topic of the learning machine--Vietnamese based human-computer interface, and was chaired by Phan Viet Hoang (Informatics College, Singapore). "Knowledge Based Approach for English Vietnamese Machine Translation" (Hoang Kiem, Dinh Dien) presents the knowledge base approach, which…

  8. Learn about Physical Science: Simple Machines. [CD-ROM].

    ERIC Educational Resources Information Center

    2000

    This CD-ROM, designed for students in grades K-2, explores the world of simple machines. It allows students to delve into the mechanical world and learn the ways in which simple machines make work easier. Animated demonstrations are provided of the lever, pulley, wheel, screw, wedge, and inclined plane. Activities include practical matching and…

  9. Machine learning challenges in Mars rover traverse science

    NASA Technical Reports Server (NTRS)

    Castano, R.; Judd, M.; Anderson, R. C.; Estlin, T.

    2003-01-01

    The successful implementation of machine learning in autonomous rover traverse science requires addressing challenges that range from the analytical technical realm, to the fuzzy, philosophical domain of entrenched belief systems within scientists and mission managers.

  10. A Machine Learning System for Recognizing Subclasses (Demo)

    SciTech Connect

    Vatsavai, Raju

    2012-01-01

    Thematic information extraction from remote sensing images is a complex task. In this demonstration, we present *Miner machine learning system. In particular, we demonstrate an advanced subclass recognition algorithm that is specifically designed to extract finer classes from aggregate classes.

  11. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and promises

    PubMed Central

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2014-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead to misinformed conclusions. To illustrate this concern, the current paper critically evaluates and attempts to reproduce results from two studies (Wall et al., 2012a; Wall et al., 2012b) that claim to drastically reduce time to diagnose autism using machine learning. Our failure to generate comparable findings to those reported by Wall and colleagues using larger and more balanced data underscores several conceptual and methodological problems associated with these studies. We conclude with proposed best-practices when using machine learning in autism research, and highlight some especially promising areas for collaborative work at the intersection of computational and behavioral science. PMID:25294649

  12. Shedding Light on Synergistic Chemical Genetic Connections with Machine Learning.

    PubMed

    Ekins, Sean; Siqueira-Neto, Jair Lage

    2015-12-23

    Machine learning can be used to predict compounds acting synergistically, and this could greatly expand the universe of available potential treatments for diseases that are currently hidden in the dark chemical matter. PMID:27136350

  13. Method for machining steel with diamond tools

    DOEpatents

    Casstevens, J.M.

    1984-01-01

    The present invention is directed to a method for machining optical quality finishes and contour accuracies of workpieces of carbon-containing metals such as steel with diamond tooling. The wear rate of the diamond tooling is significantly reduced by saturating the atmosphere at the interface of the workpiece and the diamond tool with a gaseous hydrocarbon during the machining operation. The presence of the gaseous hydrocarbon effectively eliminates the deterioration of the diamond tool by inhibiting or preventing the conversion of the diamond carbon to graphite carbon at the point of contact between the cutting tool and the workpiece.

  14. Method for machining steel with diamond tools

    DOEpatents

    Casstevens, John M.

    1986-01-01

    The present invention is directed to a method for machining optical quality finishes and contour accuracies of workpieces of carbon-containing metals such as steel with diamond tooling. The wear rate of the diamond tooling is significantly reduced by saturating the atmosphere at the interface of the workpiece and the diamond tool with a gaseous hydrocarbon during the machining operation. The presence of the gaseous hydrocarbon effectively eliminates the deterioration of the diamond tool by inhibiting or preventing the conversion of the diamond carbon to graphite carbon at the point of contact between the cutting tool and the workpiece.

  15. Machine learning on Parkinson's disease? Let's translate into clinical practice.

    PubMed

    Cerasa, Antonio

    2016-06-15

    Machine learning techniques represent the third generation of clinical neuroimaging studies, in which the principal interest is not to describe the anatomical changes of a neurological disorder but to evaluate whether a multivariate approach can use these abnormalities to correctly classify previously unseen clinical cohorts. In the next few years, machine learning will revolutionize the clinical practice of Parkinson's disease, but enthusiasm should be tempered until some important barriers are removed. PMID:26743974

  16. Machine learning for the New York City power grid.

    PubMed

    Rudin, Cynthia; Waltz, David; Anderson, Roger N; Boulanger, Albert; Salleb-Aouissi, Ansaf; Chow, Maggie; Dutta, Haimonti; Gross, Philip N; Huang, Bert; Ierome, Steve; Isaac, Delfina F; Kressner, Arthur; Passonneau, Rebecca J; Radeva, Axinia; Wu, Leon

    2012-02-01

    Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint, terminator, and transformer rankings, 3) feeder Mean Time Between Failure (MTBF) estimates, and 4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time, incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City’s electrical grid. PMID:21576741
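MTBF itself has a simple empirical definition: the mean gap between consecutive failures. The sketch below computes it from hypothetical failure dates and ranks feeders from shortest to longest MTBF; it is only the bookkeeping, not the statistical models the authors use to produce their estimates.

```python
from typing import Dict, List


def mtbf(failure_days: List[float]) -> float:
    """Mean gap between consecutive failures; infinite when fewer than two are recorded."""
    days = sorted(failure_days)
    if len(days) < 2:
        return float("inf")
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    return sum(gaps) / len(gaps)


def rank_by_mtbf(history: Dict[str, List[float]]) -> List[str]:
    """Most failure-prone first: sort feeders by ascending MTBF."""
    return sorted(history, key=lambda name: mtbf(history[name]))


# Hypothetical failure dates (days since the start of records) for three feeders.
history = {"F1": [0, 100], "F2": [0, 10, 30], "F3": [5]}
ranking = rank_by_mtbf(history)
```

Feeders with a single recorded failure get an infinite empirical MTBF and fall to the bottom, one reason a model-based estimate is preferable for sparse histories.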

  17. Protocol for secure quantum machine learning at a distant place

    NASA Astrophysics Data System (ADS)

    Bang, Jeongho; Lee, Seung-Woo; Jeong, Hyunseok

    2015-10-01

    The application of machine learning to quantum information processing has recently attracted keen interest, particularly for the optimization of control parameters in quantum tasks without any pre-programmed knowledge. By adapting the machine learning technique, we present a novel protocol in which an arbitrarily initialized device at a learner's location is taught by a provider located at a distant place. The protocol is designed such that any external learner who attempts to participate in or disrupt the learning process can be prohibited or noticed. We numerically demonstrate that our protocol works faithfully for single-qubit operation devices. A trade-off between the inaccuracy and the learning time is also analyzed.

  18. Skull-Stripping with Machine Learning Deformable Organisms

    PubMed Central

    Prasad, Gautam; Joshi, Anand A.; Feng, Albert; Toga, Arthur W.; Thompson, Paul M.; Terzopoulos, Demetri

    2014-01-01

    Background Segmentation methods for medical images may not generalize well to new data sets or new tasks, hampering their utility. We attempt to remedy these issues using deformable organisms to create an easily customizable segmentation plan. We validate our framework by creating a plan to locate the brain in 3D magnetic resonance images of the head (skull-stripping). New Method Our method borrows ideas from artificial life to govern a set of deformable models. We use control processes such as sensing, proactive planning, reactive behavior, and knowledge representation to segment an image. The image may have landmarks and features specific to that dataset; these may be easily incorporated into the plan. In addition, we use a machine learning method to make our segmentation more accurate. Results Our method had the least Hausdorff distance error, but included slightly fewer brain voxels (false negatives). It also had the lowest false positive error and performed on par with skull-stripping-specific methods on other metrics. Comparison with Existing Method(s) We tested our method on 838 T1-weighted images, evaluating results using distance and overlap error metrics based on expert gold standard segmentations. We evaluated the results before and after the learning step to quantify its benefit; we also compared our results to three other widely used methods: BSE, BET, and the Hybrid Watershed algorithm. Conclusions Our framework captures diverse categories of information needed for brain segmentation and will provide a foundation for tackling a wealth of segmentation problems. PMID:25124851

  19. Overlay improvements using a real time machine learning algorithm

    NASA Astrophysics Data System (ADS)

    Schmitt-Weaver, Emil; Kubis, Michael; Henke, Wolfgang; Slotboom, Daan; Hoogenboom, Tom; Mulkens, Jan; Coogans, Martyn; ten Berge, Peter; Verkleij, Dick; van de Mast, Frank

    2014-04-01

    While semiconductor manufacturing is moving towards the 14 nm node using immersion lithography, overlay requirements are tightening to below 5 nm. In addition to improvements in the immersion scanner platform, enhancements in overlay optimization and process control are needed to achieve such low overlay numbers. Whereas conventional overlay control methods address wafer and lot variation autonomously with wafer pre-exposure alignment metrology and post-exposure overlay metrology, we see a need to reduce these variations by correlating more of the TWINSCAN system's sensor data directly with the post-exposure YieldStar metrology in time. In this paper we present the results of a study on applying a real-time control algorithm based on machine learning technology. Machine learning methods use context and TWINSCAN system sensor data paired with post-exposure YieldStar metrology to recognize generic behavior and train the control system to anticipate this generic behavior. Specifically for this study, the data concern immersion scanner context, sensor data, and on-wafer measured overlay data. By linking the scanner data and the wafer data, we are able to establish a real-time relationship. The result is an inline controller that accounts for small changes in scanner hardware performance over time while picking up subtle lot-to-lot and wafer-to-wafer deviations introduced by wafer processing.

  20. Machine Learning Assessments of Soil Drying

    NASA Astrophysics Data System (ADS)

    Coopersmith, E. J.; Minsker, B. S.; Wenzel, C.; Gilmore, B. J.

    2011-12-01

    Agricultural activities require the use of heavy equipment and vehicles on unpaved farmlands. When soil conditions are wet, equipment can cause substantial damage, leaving deep ruts. In extreme cases, implements can sink and become mired, causing considerable delays and expense to extricate the equipment. Farm managers, who are often located remotely, cannot assess sites before allocating equipment, which makes it difficult to evaluate conditions across countless sites reliably and frequently. For example, farmers often trace serpentine paths of over one hundred miles each day to assess the overall status of various tracts of land spanning thirty, forty, or fifty miles in each direction. One means of assessing the moisture content of a field lies in the strategic positioning of remotely monitored in situ sensors. Unfortunately, landowners are often reluctant to place sensors across their properties due to the significant monetary cost and complexity. This work aims to overcome these limitations by modeling the process of wetting and drying statistically, remotely assessing field readiness using only publicly accessible information. Such data include Nexrad radar and state climate network sensors, as well as Twitter-based reports of field conditions for validation. Three algorithms (classification trees, k-nearest neighbors, and boosted perceptrons) are deployed to deliver statistical field readiness assessments of an agricultural site located in Urbana, IL. Two of the three algorithms performed with 92-94% accuracy, with the majority of misclassifications falling within the calculated margins of error. This demonstrates the feasibility of using a machine learning framework with only public data, knowledge of system memory from previous conditions, and statistical tools to assess "readiness" without the need for real-time, on-site physical observation.
Future efforts will produce a workflow assimilating Nexrad, climate network
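Of the three algorithms listed for field readiness, k-nearest neighbors is the simplest to show. The sketch assumes hypothetical features (rainfall over the previous three days in mm, and days since the last rain) and hand-made labels; in practice the features would be scaled, since Euclidean distance is sensitive to units.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training points closest (Euclidean) to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (rain_mm_last_3_days, days_since_rain) -> field state.
train = [
    ((25.0, 0.0), "wet"), ((18.0, 1.0), "wet"), ((30.0, 1.0), "wet"),
    ((2.0, 6.0), "ready"), ((0.0, 9.0), "ready"), ((4.0, 5.0), "ready"),
]
after_storm = knn_predict(train, (20.0, 1.0))
dry_spell = knn_predict(train, (1.0, 7.0))
```

An odd `k` avoids ties in a two-class vote; the value 3 is an illustrative choice, not the study's setting.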

  1. Of Genes and Machines: Application of a Combination of Machine Learning Tools to Astronomy Data Sets

    NASA Astrophysics Data System (ADS)

    Heinis, S.; Kumar, S.; Gezari, S.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Waters, C.

    2016-04-01

    We apply a combination of genetic algorithm (GA) and support vector machine (SVM) machine learning algorithms to solve two important problems faced by the astronomical community: star–galaxy separation and photometric redshift estimation of galaxies in survey catalogs. We use the GA to select the relevant features in the first step, followed by optimization of SVM parameters in the second step to obtain an optimal set of parameters to classify or regress, in the process of which we avoid overfitting. We apply our method to star–galaxy separation in Pan-STARRS1 data. We show that our method correctly classifies 98% of objects down to $i_{\mathrm{P1}} = 24.5$, with a completeness (or true positive rate) of 99% for galaxies and 88% for stars. By combining colors with morphology, our star–galaxy separation method yields better results than the new SExtractor classifier spread_model, in particular at the faint end ($i_{\mathrm{P1}} > 22$). We also use our method to derive photometric redshifts for galaxies in the COSMOS bright multiwavelength data set down to an error in $(1+z)$ of $\sigma = 0.013$, which compares well with estimates from spectral energy distribution fitting on the same data ($\sigma = 0.007$) while making a significantly smaller number of assumptions.
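The first step, genetic-algorithm feature selection, can be sketched with bit masks over features. In the abstract the fitness would come from SVM performance; here `toy_fitness` and the `USEFUL` feature set are hypothetical, and tournament selection, uniform crossover, bit-flip mutation and one elite survivor are common GA choices rather than the authors' stated settings.

```python
import random
from typing import Callable, List

Mask = List[int]


def genetic_feature_selection(n_features: int, fitness: Callable[[Mask], float],
                              pop_size: int = 30, generations: int = 50,
                              mutation_rate: float = 0.05, seed: int = 0) -> Mask:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament() -> Mask:
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)


# Hypothetical fitness: reward three informative features, lightly penalize the rest.
USEFUL = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[i] for i in USEFUL)
    extras = sum(g for i, g in enumerate(mask) if i not in USEFUL)
    return hits - 0.1 * extras


best = genetic_feature_selection(10, toy_fitness)
```

In a real pipeline `toy_fitness` would be replaced by a cross-validated SVM score on the selected columns, which is also where overfitting control enters.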

  3. Learning Activity Packets for Milling Machines. Unit III--Vertical Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to set up and operate a vertical mill. Tasks addressed in the LAP include mounting and removing cutters and cutter holders for vertical…

  4. Learning Activity Packets for Milling Machines. Unit II--Horizontal Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to set up and operate a horizontal mill. Tasks addressed in the LAP include mounting style "A" or "B" arbors and adjusting arbor…

  5. Machine-z: Rapid machine-learned redshift indicator for Swift gamma-ray bursts

    DOE PAGES Beta

    Ukwatta, T. N.; Wozniak, P. R.; Gehrels, N.

    2016-06-01

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce ‘machine-z’, a redshift prediction algorithm and a ‘high-z’ classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With 40 per cent false positive rate the classifier can achieve ~100 per cent recall. As a result, the most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.
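The quoted operating points (80 per cent recall at a 20 per cent false positive rate, about 100 per cent recall at 40 per cent) come from moving a decision threshold on classifier scores. The sketch below computes both quantities for hypothetical scores and labels, showing how a lower threshold trades false positives for recall.

```python
from typing import List, Tuple


def recall_and_fpr(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Recall (true positive rate) and false positive rate when scores >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical classifier scores; label 1 marks a true high-redshift burst.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
strict = recall_and_fpr(scores, labels, 0.5)
loose = recall_and_fpr(scores, labels, 0.25)
```

Here the stricter threshold catches 3 of 4 positives with 2 of 6 negatives flagged, while the looser one catches all positives at the cost of 3 false alarms.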

  6. Classification of ROTSE Variable Stars using Machine Learning

    NASA Astrophysics Data System (ADS)

    Wozniak, P. R.; Akerlof, C.; Amrose, S.; Brumby, S.; Casperson, D.; Gisler, G.; Kehoe, R.; Lee, B.; Marshall, S.; McGowan, K. E.; McKay, T.; Perkins, S.; Priedhorsky, W.; Rykoff, E.; Smith, D. A.; Theiler, J.; Vestrand, W. T.; Wren, J.; ROTSE Collaboration

    2001-12-01

    We evaluate several Machine Learning algorithms as potential tools for automated classification of variable stars. Using the ROTSE sample of ~1800 variables from a pilot study of 5% of the whole sky, we compare the effectiveness of a supervised technique (Support Vector Machines, SVM) versus unsupervised methods (K-means and Autoclass). There are 8 types of variables in the sample: RR Lyr AB, RR Lyr C, Delta Scuti, Cepheids, detached eclipsing binaries, contact binaries, Miras and LPVs. Preliminary results suggest a very high (~95%) efficiency of SVM in isolating a few best defined classes against the rest of the sample, and good accuracy (~70-75%) for all classes considered simultaneously. This includes some degeneracies, irreducible with the information at hand. Supervised methods naturally outperform unsupervised methods in terms of final error rate, but unsupervised methods offer many advantages for large sets of unlabeled data. Therefore, both types of methods should be considered as promising tools for mining vast variability surveys. We project that there are more than 30,000 periodic variables in the ROTSE-I data base covering the entire local sky between V=10 and 15.5 mag. This sample size is already stretching the time capabilities of human analysts.
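For the unsupervised side, the standard Lloyd iteration for K-means alternates between assigning each object to its nearest center and moving each center to the mean of its objects. This is a generic sketch with caller-supplied initial centers and hypothetical 2-D feature points, not the ROTSE pipeline.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], init: List[Point], max_iter: int = 100) -> List[Point]:
    centers = [tuple(c) for c in init]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return centers


# Hypothetical light-curve features (e.g. scaled period and amplitude) for two groups.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
centers = kmeans(points, init=[(0.0, 0.0), (10.0, 10.0)])
```

Unlike the SVM, K-means never sees class labels, so its clusters must be matched to variable-star types afterwards, which is one source of its higher final error rate.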

  7. The use of machine learning and nonlinear statistical tools for ADME prediction.

    PubMed

    Sakiyama, Yojiro

    2009-02-01

    Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning and nonlinear statistical tools have been widely applied to predict routine ADME endpoints. To achieve accurate and reliable predictions, it is a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D visualisation. We applied six new machine learning methods to four different data sets. The methods include Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machines displayed greater prediction accuracy than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future. PMID:19239395

  8. Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

    PubMed Central

    2011-01-01

    Background Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning. Results In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well performing individual methods and synergies between different methods. Conclusion Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures. PMID:21982277
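Double (nested) cross-validation keeps parameter tuning inside each outer training split so that the outer test fold never influences model selection. A minimal sketch, assuming a hypothetical callback `fit_and_score(param, train_idx, test_idx)` that returns a score where higher is better; the study repeats the procedure several times, which the sketch omits.

```python
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def kfold(indices: Sequence[int], k: int) -> List[Split]:
    """Round-robin k-fold split: fold i holds indices[i], indices[i + k], ..."""
    folds = [list(indices[i::k]) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


def double_cv(n: int, params: Sequence, fit_and_score: Callable[..., float],
              outer_k: int = 5, inner_k: int = 3) -> float:
    """Outer folds estimate performance; inner folds, built only from the outer
    training part, choose the parameter."""
    outer_scores = []
    for outer_train, outer_test in kfold(list(range(n)), outer_k):
        def inner_score(p) -> float:
            splits = kfold(outer_train, inner_k)
            return sum(fit_and_score(p, tr, te) for tr, te in splits) / len(splits)

        best = max(params, key=inner_score)
        # The outer test fold is used once, after the parameter is fixed.
        outer_scores.append(fit_and_score(best, outer_train, outer_test))
    return sum(outer_scores) / len(outer_scores)
```

Gene selection and normalization parameters would be tuned the same way, inside the inner loop, to avoid optimistic error estimates.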

  9. Calibration transfer via an extreme learning machine auto-encoder.

    PubMed

    Chen, Wo-Ruo; Bin, Jun; Lu, Hong-Mei; Zhang, Zhi-Min; Liang, Yi-Zeng

    2016-03-21

    In order to solve the spectra standardization problem in near-infrared (NIR) spectroscopy, a Transfer via Extreme learning machine Auto-encoder Method (TEAM) has been proposed in this study. A comparative study among TEAM, piecewise direct standardization (PDS), generalized least squares (GLS) and calibration transfer methods based on canonical correlation analysis (CCA) was conducted, and the performances of these algorithms were benchmarked with three spectral datasets: corn, tobacco and pharmaceutical tablet spectra. The results show that TEAM is a stable method and can significantly reduce prediction errors compared with PDS, GLS and CCA. TEAM can also achieve the best RMSEPs in most cases with a small number of calibration sets. TEAM is implemented in Python and available as an open-source package at https://github.com/zmzhang/TEAM. PMID:26846329

  10. GeneRIF indexing: sentence selection based on machine learning

    PubMed Central

    2013-01-01

    Background A Gene Reference Into Function (GeneRIF) describes a novel function of a gene. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the aim of our work is to provide methods that support the creation of GeneRIF entries. Creating a GeneRIF entry involves identifying the genes mentioned in MEDLINE® citations and the sentences describing a novel function. Results We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine whether a sentence should be selected for GeneRIF indexing. Features are derived either from the sentences themselves or from mechanisms that augment the information they provide, for example assigning a discourse label using a previously trained model. We show that machine learning approaches with specific feature combinations achieve results close to those of one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to that used in related work, which considers the location of the sentence, its discourse label and the functional terminology it contains. Conclusions The current performance is at a level similar to human annotation, which shows that machine learning can be used to automate sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species. PMID:23725347
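
    A multinomial Naïve Bayes classifier with Laplace smoothing, sketched here in plain Python, shows how such sentence features can be combined. The feature tokens are hypothetical stand-ins for the location, discourse and terminology features described above, not the authors' feature set. For example, the sketch uses a sentence-location token such as `LOC=last`.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing.

    docs: list of token lists (words plus hypothetical feature tokens such as
    "LOC=last"); labels: 1 = select sentence for GeneRIF, 0 = do not select.
    """
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"log_prior": {}, "log_lik": {}, "log_unseen": {}}
    for c, n_c in class_counts.items():
        model["log_prior"][c] = math.log(n_c / len(labels))
        # One extra smoothed slot is reserved for tokens never seen in training.
        denom = sum(token_counts[c].values()) + alpha * (len(vocab) + 1)
        model["log_lik"][c] = {t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab}
        model["log_unseen"][c] = math.log(alpha / denom)
    return model


def predict_nb(model, tokens):
    """Return the class with the highest posterior log-probability."""
    def log_posterior(c):
        lik = model["log_lik"][c]
        return model["log_prior"][c] + sum(lik.get(t, model["log_unseen"][c]) for t in tokens)
    return max(model["log_prior"], key=log_posterior)
```

    Encoding non-lexical features as extra tokens keeps the model unchanged while letting sentence position or discourse labels shift the decision.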

  11. Machine learning of molecular electronic properties in chemical compound space

    NASA Astrophysics Data System (ADS)

    Montavon, Grégoire; Rupp, Matthias; Gobre, Vivekanand; Vazquez-Mayagoitia, Alvaro; Hansen, Katja; Tkatchenko, Alexandre; Müller, Klaus-Robert; Anatole von Lilienfeld, O.

    2013-09-01

    The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies. The machine learning model is based on a deep multi-task artificial neural network that exploits the underlying correlations between various molecular properties. The input is the same as for ab initio methods, i.e. the nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a ‘quantum machine’ is similar to, and sometimes better than, that of modern quantum-chemical methods, at negligible computational cost.
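
    The abstract states only that the input is the nuclear charges and Cartesian coordinates. One widely used way to turn these into a fixed-size descriptor is the Coulomb matrix. Whether this exact encoding, or a particular sorting of it, matches the model in the paper is an assumption here. The sketch uses NumPy and atomic units (bohr).

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix from nuclear charges Z (n,) and coordinates R (n, 3) in bohr.

    Off-diagonal: Z_i * Z_j / |R_i - R_j|; diagonal: 0.5 * Z_i**2.4.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        M = np.outer(Z, Z) / dist  # diagonal is inf here; overwritten below
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def eigen_spectrum(M):
    """Sorted eigenvalues: a descriptor invariant to the order in which atoms are listed."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]
```

    For molecules of different sizes, the matrices would typically be zero-padded to a common size before being fed to a network.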

  12. Forecasting daily streamflow using online sequential extreme learning machines

    NASA Astrophysics Data System (ADS)

    Lima, Aranildo R.; Cannon, Alex J.; Hsieh, William W.

    2016-06-01

    While nonlinear machine learning methods have been widely used in environmental forecasting, in situations where new data arrive continually the need for frequent model updates can become cumbersome and computationally costly. To alleviate this problem, we use an online sequential learning algorithm for single-hidden-layer feedforward neural networks, the online sequential extreme learning machine (OSELM), which is updated automatically and inexpensively as new data arrive (after which the new data can be discarded). OSELM was applied to forecast daily streamflow at two small watersheds in British Columbia, Canada, at lead times of 1-3 days. Predictors were weather forecast data generated by the NOAA Global Ensemble Forecasting System (GEFS) and local hydro-meteorological observations. OSELM forecasts were tested with daily, monthly or yearly model updates. More frequent updating gave smaller forecast errors, including errors for data above the 90th percentile. Larger datasets used in the initial training of OSELM helped to find better parameters (number of hidden nodes) for the model, yielding better predictions. With online sequential multiple linear regression (OSMLR) as the benchmark, we concluded that OSELM is an attractive approach, as it easily outperformed OSMLR in forecast accuracy.
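
    OSELM keeps the random hidden layer fixed and updates the output weights by recursive least squares. Each new chunk of data is therefore absorbed through a small matrix update and can then be discarded. The following is a minimal NumPy sketch of the standard OS-ELM recursion. The tanh activation and the chunk sizes are assumptions, not the study's settings. After all updates, its predictions should match a batch least-squares fit on all of the data.

```python
import numpy as np


class OSELM:
    """Online sequential ELM: random fixed hidden layer, recursively updated output weights."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))  # never trained
        self.b = rng.normal(size=n_hidden)

    def hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def init_fit(self, X0, T0):
        """Initial batch; needs at least n_hidden rows so H0^T H0 is invertible."""
        H0 = self.hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0
        return self

    def update(self, X, T):
        """Absorb a new chunk (e.g. one day of data); the chunk can be discarded afterwards."""
        H = self.hidden(X)
        S = np.eye(len(X)) + H @ self.P @ H.T
        # P <- P - P H^T (I + H P H^T)^-1 H P
        self.P = self.P - self.P @ H.T @ np.linalg.solve(S, H @ self.P)
        # beta <- beta + P H^T (T - H beta)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)
        return self

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

    The update cost depends on the chunk size and the number of hidden nodes, not on how much data has been seen. This is why daily updating remains cheap.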

  13. Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal; Fromenteau, Sebastien; Poczos, Barnabas; Schneider, Jeff

    2016-01-01

    Galaxy clusters are a rich source of information for examining fundamental astrophysical processes and cosmological parameters; however, employing clusters as cosmological probes requires accurate mass measurements derived from cluster observables. We study dynamical mass measurements of galaxy clusters contaminated by interlopers and show that a modern machine learning (ML) algorithm can predict masses more than a factor of two better than a standard scaling relation approach. We create a mock catalog from the publicly available MultiDark N-body simulation MDPL1, where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power-law scaling relation to infer cluster mass from the galaxy line-of-sight (LOS) velocity dispersion. The presence of interlopers in the catalog produces a wide, flat fractional mass error distribution, with width = 2.13. We employ the Support Distribution Machine (SDM) class of algorithms, which learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (width = 0.67). Remarkably, SDM applied to contaminated clusters recovers masses better than even a scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
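
    The baseline described above is a power-law scaling relation, M = A σ^α, which is fitted by linear regression in log space and judged by the fractional mass error (M_pred - M_true) / M_true. The exact definition of the quoted "width" is not given in the abstract, so this NumPy sketch stops at the per-cluster errors. The synthetic values with α = 3 are chosen purely for illustration.

```python
import numpy as np


def fit_power_law(sigma, mass):
    """Fit M = A * sigma**alpha by least squares in log10 space; returns (A, alpha)."""
    alpha, log10_a = np.polyfit(np.log10(sigma), np.log10(mass), deg=1)
    return 10.0 ** log10_a, alpha


def predict_mass(sigma, A, alpha):
    return A * np.asarray(sigma, dtype=float) ** alpha


def fractional_error(m_pred, m_true):
    """Fractional mass error (M_pred - M_true) / M_true for each cluster."""
    m_true = np.asarray(m_true, dtype=float)
    return (np.asarray(m_pred, dtype=float) - m_true) / m_true
```

    Interlopers inflate the measured σ, and because mass scales steeply with σ, even modest contamination spreads the fractional error distribution. This is the failure mode that the SDM approach targets.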

  14. Multi-Step Protocol for Automatic Evaluation of Docking Results Based on Machine Learning Methods--A Case Study of Serotonin Receptors 5-HT(6) and 5-HT(7).

    PubMed

    Smusz, Sabina; Mordalski, Stefan; Witek, Jagna; Rataj, Krzysztof; Kafel, Rafał; Bojarski, Andrzej J

    2015-04-27

    Molecular docking, despite its undeniable usefulness in computer-aided drug design protocols and the increasing sophistication of tools used to predict ligand-protein interaction energies, still poses the problem of effectively analyzing its results. In this study, a novel protocol for the automatic evaluation of numerous docking results is presented, combining Structural Interaction Fingerprints and Spectrophores descriptors, machine-learning techniques, and multi-step results analysis. The approach takes into consideration the performance of a particular learning algorithm (five machine learning methods were applied), the performance of the docking algorithm itself, the variety of conformations returned from the docking experiment, and the receptor structure (homology models were constructed on five different templates). Evaluation using compounds active toward 5-HT6 and 5-HT7 receptors, as well as additional analysis carried out for beta-2 adrenergic receptor ligands, showed that the methodology is a viable tool for supporting virtual screening protocols, enabling proper discrimination between active and inactive compounds. PMID:25806997

  15. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents a robust inference system that improves the accuracy of Parkinson's disease (PD) diagnosis, in order to prevent delayed diagnosis and misdiagnosis of patients. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods applied to PD diagnosis include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by the Tabu search algorithm as the classifier and Haar wavelets as the projection filter, is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All experiments were evaluated at the 95% and 99% confidence levels, and the results were established with corrected t-tests. This work shows a high degree of advancement in the software reliability and quality of the computer-aided diagnosis system and experimentally shows the best results with supportive statistical inference.

  16. Image quality assessment with manifold and machine learning

    NASA Astrophysics Data System (ADS)

    Charrier, Christophe; Lebrun, Gilles; Lezoray, Olivier

    2009-01-01

    A crucial step in image compression is the evaluation of its performance, and more precisely the way the final quality of the compressed image is measured. In this paper, a machine learning expert that outputs a final class number is designed. The quality measure is based on a learned classification process designed to match the judgments of human observers. Instead of computing a final score, our method classifies quality using the quality scale recommended by the ITU (UIT). This scale contains 5 ranks ordered from 1 (the worst quality) to 5 (the best quality). This was done by constructing a vector containing many visual attributes; the final feature vector contains more than 40 attributes. Unfortunately, no study of the interactions between the visual attributes used has been done. A feature selection algorithm could be useful, but the selection depends heavily on the classifier used afterwards. Therefore, we prefer to perform dimensionality reduction instead of feature selection. Manifold learning methods are used to provide a new low-dimensional representation of the initial high-dimensional feature space. The classification process is performed on this new low-dimensional representation of the images. The results are compared with those obtained without dimensionality reduction to judge the efficiency of the method.

  17. Sustainable cooling method for machining titanium alloy

    NASA Astrophysics Data System (ADS)

    Boswell, B.; Islam, M. N.

    2016-02-01

    Hard-to-machine materials such as titanium alloy Ti-6Al-4V Grade 5 are notorious for generating high temperatures and adverse reactions between the workpiece and the tool tip materials. These conditions all contribute to increased wear, reducing tool life. Titanium alloy, for example, always requires coolant during machining. However, traditional flood cooling needs to be replaced because of environmental issues, and an alternative cooling method with minimal environmental impact must be found. For truly sustainable cooling of the tool, it is necessary to account for all energy used in the cooling process, including the energy involved in producing the coolant. Previous research has established that efficient cooling of the tool interface improves tool life and cutting action. The objective of this research is to determine the most appropriate sustainable cooling method that can also reduce the rate of wear at the tool interface.

  18. Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems.

    PubMed

    Kuusisto, Finn; Dutra, Inês; Elezaby, Mai; Mendonça, Eneida A; Shavlik, Jude; Burnside, Elizabeth S

    2015-01-01

    While the use of machine learning methods in clinical decision support has great potential for improving patient care, acquiring standardized, complete, and sufficient training data presents a major challenge for methods relying exclusively on machine learning techniques. Domain experts possess knowledge that can address these challenges and guide model development. We present Advice-Based-Learning (ABLe), a framework for incorporating expert clinical knowledge into machine learning models, and show results for an example task: estimating the probability of malignancy following a non-definitive breast core needle biopsy. By applying ABLe to this task, we demonstrate a statistically significant improvement in specificity (24.0% with p=0.004) without missing a single malignancy. PMID:26306246

  19. Combining satellite imagery and machine learning to predict poverty.

    PubMed

    Jean, Neal; Burke, Marshall; Xie, Michael; Davis, W Matthew; Lobell, David B; Ermon, Stefano

    2016-08-19

    Reliable data on economic livelihoods remain scarce in the developing world, hampering efforts to study these outcomes and to design policies that improve them. Here we demonstrate an accurate, inexpensive, and scalable method for estimating consumption expenditure and asset wealth from high-resolution satellite imagery. Using survey and satellite data from five African countries (Nigeria, Tanzania, Uganda, Malawi, and Rwanda), we show how a convolutional neural network can be trained to identify image features that can explain up to 75% of the variation in local-level economic outcomes. Our method, which requires only publicly available data, could transform efforts to track and target poverty in developing countries. It also demonstrates how powerful machine learning techniques can be applied in a setting with limited training data, suggesting broad potential application across many scientific domains. PMID:27540167

  20. Liver vessel segmentation based on extreme learning machine.

    PubMed

    Zeng, Ye Zhan; Zhao, Yu Qian; Liao, Miao; Zou, Bei Ji; Wang, Xiao Fang; Wang, Wei

    2016-05-01

    Liver-vessel segmentation plays an important role in vessel structure analysis for liver surgical planning. This paper presents a liver-vessel segmentation method based on extreme learning machine (ELM). Firstly, an anisotropic filter is used to remove noise from the original computed tomography (CT) images while preserving vessel boundaries. Then, based on knowledge of prior shapes and geometrical structures, three classical vessel filters (the Sato, Frangi and offset medialness filters), together with the strain energy filter, are used to extract vessel structure features. Finally, the ELM is applied to segment liver vessels from background voxels. Experimental results show that the proposed method can effectively segment liver vessels from abdominal CT images, and achieves good accuracy, sensitivity and specificity. PMID:27132031

  1. Hematocrit estimation using online sequential extreme learning machine.

    PubMed

    Huynh, Hieu Trung; Won, Yonggwan; Kim, Jinsul

    2015-01-01

    Hematocrit, the volume percentage of red blood cells in whole blood, is measured by a blood test. It is one of the important indicators for clinical decision making and the most influential factor in glucose measurement using handheld devices. In this paper, a method for hematocrit estimation based on the transduced current curve and a neural network is presented. The salient points of this method are that (1) the neural network is trained by the online sequential extreme learning machine (OS-ELM), so the devices can still be trained with new samples while in use, and (2) extended features are used to reduce the number of current points, which saves device battery power and speeds up the measurement process. PMID:26405979

  2. Bicriteria single machine scheduling with setup times and learning effects

    NASA Astrophysics Data System (ADS)

    Soroush, H. M.

    2012-11-01

    We study a bicriteria single machine scheduling problem with job-dependent and past-sequence-dependent (psd) setup times and job-dependent learning effects. The goal is to find the optimal sequence that minimizes a linear combination of a pair of performance criteria drawn from the makespan, the total completion time, and the total absolute differences in completion times. We show that special cases of the resulting three problems are polynomially solvable. However, the general cases cannot be solved in polynomial time; thus, branch-and-bound (B&B) methods are proposed to derive optimal sequences. Computational results demonstrate that the B&B methods solve relatively large problem instances in reasonable amounts of time.
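
    The abstract does not give the exact model, so the sketch below assumes forms that are common in this literature:

    - The actual processing time of job j in position r is p_j * r**a_j, where a_j <= 0 is a job-dependent learning index.
    - The psd setup time of the job in position r is b times the total actual processing time of the jobs already sequenced.

    The sketch evaluates the three criteria for a given sequence. In place of the paper's branch-and-bound, it enumerates all sequences, which is only practical for small n. All names are hypothetical.

```python
from itertools import combinations, permutations


def completion_times(seq, p, a, b):
    """Completion times under the assumed learning-effect and psd-setup model."""
    times, t, done = [], 0.0, 0.0
    for r, j in enumerate(seq, start=1):
        actual = p[j] * r ** a[j]  # job-dependent, position-based learning effect
        setup = b * done           # psd setup: proportional to processing already done
        t += setup + actual
        done += actual
        times.append(t)
    return times


CRITERIA = {
    "makespan": lambda C: C[-1],
    "total_completion": lambda C: sum(C),
    "tadc": lambda C: sum(abs(x - y) for x, y in combinations(C, 2)),
}


def objective(seq, p, a, b, pair, delta):
    """delta * f1 + (1 - delta) * f2 for the chosen pair of criteria."""
    C = completion_times(seq, p, a, b)
    f1, f2 = (CRITERIA[name](C) for name in pair)
    return delta * f1 + (1 - delta) * f2


def best_sequence(p, a, b, pair, delta):
    """Exhaustive search; a stand-in for branch-and-bound on tiny instances."""
    return min(permutations(range(len(p))), key=lambda s: objective(s, p, a, b, pair, delta))
```

    With no learning effect and no setups (a_j = 0, b = 0) and all weight on total completion time, the search recovers the shortest-processing-time order. This gives a quick sanity check on the evaluator.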

  3. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction

    PubMed Central

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems, and for robots in particular. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children’s social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this, a simple turn-based interaction based on language games is used. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a “mental model” of the robot, tailoring the tutoring to the robot’s performance rather than simply teaching at random. In addition, the social learning shows a clear gender effect, with female participants being responsive to the robot’s bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  4. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    PubMed

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems, and for robots in particular. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this, a simple turn-based interaction based on language games is used. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance rather than simply teaching at random. In addition, the social learning shows a clear gender effect, with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  5. Visual Tracking Based on Extreme Learning Machine and Sparse Representation

    PubMed Central

    Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

    2015-01-01

    Most existing sparse representation-based visual trackers suffer from high computational cost and poor robustness. To address these issues, a novel tracking method is presented that combines sparse representation with an emerging learning technique, the extreme learning machine (ELM). Specifically, visual tracking is divided into two consecutive processes. First, ELM is used to find the optimal separating hyperplane between the target observations and background ones. The trained ELM classification function can thus efficiently remove most of the candidate samples related to background content, reducing the total computational cost of the subsequent sparse representation. Second, to further combine ELM and sparse representation, the resulting confidence values (i.e., probabilities of being the target) of samples on the ELM classification function are used to construct a new manifold learning constraint term in the sparse representation framework, which tends to achieve more robust results. Moreover, the accelerated proximal gradient method is used to derive the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix-form solution allows the candidate samples to be calculated in parallel, leading to higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359

  6. Visual tracking based on extreme learning machine and sparse representation.

    PubMed

    Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

    2015-01-01

    Most existing sparse representation-based visual trackers suffer from high computational cost and poor robustness. To address these issues, a novel tracking method is presented that combines sparse representation with an emerging learning technique, the extreme learning machine (ELM). Specifically, visual tracking is divided into two consecutive processes. First, ELM is used to find the optimal separating hyperplane between the target observations and background ones. The trained ELM classification function can thus efficiently remove most of the candidate samples related to background content, reducing the total computational cost of the subsequent sparse representation. Second, to further combine ELM and sparse representation, the resulting confidence values (i.e., probabilities of being the target) of samples on the ELM classification function are used to construct a new manifold learning constraint term in the sparse representation framework, which tends to achieve more robust results. Moreover, the accelerated proximal gradient method is used to derive the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix-form solution allows the candidate samples to be calculated in parallel, leading to higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359

  7. Learning Processes in Man, Machine and Society

    ERIC Educational Resources Information Center

    Malita, Mircea

    1977-01-01

    The learning mechanism in man remains to be deciphered. This article examines the learning process with respect to association and cybernetics. It recommends that research focus on the transdisciplinary processes of learning, which could become the next key concept in the science of man. (Author/MA)

  8. Building Artificial Vision Systems with Machine Learning

    SciTech Connect

    LeCun, Yann

    2011-02-23

    Three questions pose the next challenge for Artificial Intelligence (AI), robotics, and neuroscience. How do we learn perception (e.g. vision)? How do we learn representations of the perceptual world? How do we learn visual categories from just a few examples?

  9. Data Triage of Astronomical Transients: A Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Rebbapragada, U.

    This talk presents real-time machine learning systems for triage of big data streams generated by photometric and image-differencing pipelines. Our first system is a transient event detection system in development for the Palomar Transient Factory (PTF), a fully-automated synoptic sky survey that has demonstrated real-time discovery of optical transient events. The system is tasked with discriminating between real astronomical objects and bogus objects, which are usually artifacts of the image differencing pipeline. We performed a machine learning forensics investigation on PTF’s initial system that led to training data improvements that decreased both false positive and negative rates. The second machine learning system is a real-time classification engine of transients and variables in development for the Australian Square Kilometre Array Pathfinder (ASKAP), an upcoming wide-field radio survey with unprecedented ability to investigate the radio transient sky. The goal of our system is to classify light curves into known classes with as few observations as possible in order to trigger follow-up on costlier assets. We discuss the violation of standard machine learning assumptions incurred by this task, and propose the use of ensemble and hierarchical machine learning classifiers that make predictions most robustly.

  10. Advances in Machine Learning and Data Mining for Astronomy

    NASA Astrophysics Data System (ADS)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  11. Machine learning for many-body physics: The case of the Anderson impurity model

    NASA Astrophysics Data System (ADS)

    Arsenault, Louis-François; Lopez-Bezanilla, Alejandro; von Lilienfeld, O. Anatole; Millis, Andrew J.

    2014-10-01

    Machine learning methods are applied to finding the Green's function of the Anderson impurity model, a basic model system of quantum many-body condensed-matter physics. Different methods of parametrizing the Green's function are investigated; a representation in terms of Legendre polynomials is found to be superior due to its limited number of coefficients and its applicability to state-of-the-art methods of solution. The dependence of the errors on the size of the training set is determined. The results indicate that a machine learning approach to dynamical mean-field theory may be feasible.
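
    The Legendre representation maps imaginary time τ in [0, β] onto x = 2τ/β - 1 and expands G(τ) in Legendre polynomials P_l(x). A few coefficients then describe the whole curve, which is what makes them convenient learning targets. This sketch uses NumPy's `legendre.legfit` and `legendre.legval` with the plain Legendre basis. Normalization conventions in the literature (for example, extra √(2l+1)/β prefactors) differ and are not taken from the paper. The test function, a non-interacting single level at energy ε with G(τ) = -e^(-ετ) / (1 + e^(-βε)), is an assumption for illustration and not the Anderson impurity model itself.

```python
import numpy as np
from numpy.polynomial import legendre


def tau_to_x(tau, beta):
    """Map imaginary time tau in [0, beta] onto the Legendre interval [-1, 1]."""
    return 2.0 * np.asarray(tau, dtype=float) / beta - 1.0


def legendre_coefficients(tau, g, beta, n_coeff):
    """Least-squares Legendre coefficients c_0..c_{n_coeff-1} of G(tau) (plain numpy basis)."""
    return legendre.legfit(tau_to_x(tau, beta), g, deg=n_coeff - 1)


def reconstruct(coeffs, tau, beta):
    """Evaluate G(tau) = sum_l c_l P_l(x(tau))."""
    return legendre.legval(tau_to_x(tau, beta), coeffs)
```

    A smooth G(τ) is captured to high accuracy by about 20 coefficients here. A model therefore only needs to predict a short vector rather than hundreds of τ samples.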

  12. Ventricular fibrillation and tachycardia classification using a machine learning approach.

    PubMed

    Li, Qiao; Rajagopalan, Cadathur; Clifford, Gari D

    2014-06-01

    Correct detection and classification of ventricular fibrillation (VF) and rapid ventricular tachycardia (VT) is of pivotal importance for automatic external defibrillators and patient monitoring. In this paper, a VF/VT classification algorithm using a machine learning method, a support vector machine, is proposed. A total of 14 metrics were extracted from a specific window length of the electrocardiogram (ECG). A genetic algorithm was then used to select the optimal variable combinations. Three annotated public domain ECG databases (the American Heart Association Database, the Creighton University Ventricular Tachyarrhythmia Database, and the MIT-BIH Malignant Ventricular Arrhythmia Database) were used as training, test, and validation datasets. Different window sizes, varying from 1 to 10 s, were tested. An accuracy (Ac) of 98.1%, sensitivity (Se) of 98.4%, and specificity (Sp) of 98.0% were obtained on the in-sample training data with a 5 s window and two selected metrics. On the out-of-sample validation data, an Ac of 96.3% ± 3.4%, Se of 96.2% ± 2.7%, and Sp of 96.2% ± 4.6% were obtained by fivefold cross-validation. The results surpass those of currently reported methods. PMID:23899591
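
    The reported Ac, Se and Sp follow from confusion-matrix counts, and the "mean ± SD" figures summarize those values across the five folds. The following is a minimal plain-Python sketch. Treating VF/VT as the positive class and using the sample standard deviation are assumptions, since the abstract specifies neither.

```python
import statistics


def confusion_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy for a binary task (positive = VF/VT, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "Ac": (tp + tn) / len(pairs),
    }


def fold_summary(values):
    """Mean and sample standard deviation across cross-validation folds."""
    return statistics.mean(values), statistics.stdev(values)
```

    Se and Sp are computed per fold and then summarized. The spread therefore reflects fold-to-fold variation, not a confidence interval.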

  13. Study of on-machine error identification and compensation methods for micro machine tools

    NASA Astrophysics Data System (ADS)

    Wang, Shih-Ming; Yu, Han-Jen; Lee, Chun-Yi; Chiu, Hung-Sheng

    2016-08-01

    Micro machining plays an important role in the manufacture of miniature products made of various materials with complex 3D shapes and tight machining tolerances. To further improve the accuracy of a micro machining process without increasing the manufacturing cost of a micro machine tool, an effective machining error measurement method and a software-based compensation method are essential. To avoid introducing additional errors caused by re-installing the workpiece, measurement and compensation should be conducted on-machine. In addition, because the contour of a miniature workpiece produced by micro machining is very small, the measurement method should be non-contact. By integrating an image reconstruction method, camera pixel correction, coordinate transformation, an error identification algorithm, and a trajectory auto-correction method, this study developed a vision-based error measurement and compensation method that inspects micro machining errors on-machine and automatically generates an error-corrected numerical control (NC) program for error compensation. Using the Canny edge detection algorithm and camera pixel calibration, the edges of the contour of a machined workpiece were identified and used to reconstruct the actual contour of the workpiece. The actual contour was then mapped to the theoretical contour to identify the actual cutting points and compute the machining errors. Using a moving matching window and calculating the similarity between the actual and theoretical contours, the errors between the actual and theoretical cutting points were calculated and used to correct the NC program. With the error-corrected NC program, the accuracy of a micro machining process can be effectively improved. To prove the feasibility and effectiveness of the proposed methods, micro-milling experiments on a micro machine tool were conducted, and the results

  14. Integrating data sources to improve hydraulic head predictions : a hierarchical machine learning approach.

    SciTech Connect

    Michael, W. J.; Minsker, B. S.; Tcheng, D.; Valocchi, A. J.; Quinn, J. J.; Environmental Assessment; Univ. of Illinois

    2005-03-26

    This study investigates how machine learning methods can be used to improve hydraulic head predictions by integrating different types of data, including data from numerical models, in a hierarchical approach. A suite of four machine learning methods (decision trees, instance-based weighting, inverse distance weighting, and neural networks) is tested in several hierarchical configurations with different types of data from the 317/319 area at Argonne National Laboratory-East. The best machine learning model had a mean predicted head error 50% smaller than an existing MODFLOW numerical flow model, and a standard deviation of predicted head error 67% lower than the MODFLOW model, computed across all sampled locations used for calibrating the MODFLOW model. These predictions were obtained using decision trees trained with all historical quarterly data; the hourly head measurements were not as useful for prediction, most likely because of their poor spatial coverage. The results show promise for using hierarchical machine learning approaches to improve predictions and to identify the most essential types of data to guide future sampling efforts. Decision trees were also combined with an existing MODFLOW model to test their capabilities for updating numerical models to improve predictions as new data are collected. The combined model had a mean error 50% lower than the MODFLOW model alone. These results demonstrate that hierarchical machine learning approaches can be used to improve predictive performance of existing numerical models in areas with good data coverage. Further research is needed to compare this approach with methods such as Kalman filtering.
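    Inverse distance weighting, one of the four methods in the suite, estimates the head at an unsampled location as a weighted average of observed heads, with weights proportional to 1/d^p. The sketch below assumes planar well coordinates and the common default p = 2; it is a generic IDW illustration, not the study's hierarchical configuration.

```python
import math
from typing import Sequence, Tuple


def idw(x: float, y: float, samples: Sequence[Tuple[float, float, float]],
        power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, head) samples."""
    num = den = 0.0
    for sx, sy, head in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return head  # exactly at an observation well: return the observed head
        w = 1.0 / d ** power
        num += w * head
        den += w
    return num / den
```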

  15. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage method for labeling large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging, and image-guided surgery. In all cases, an automatic and interactive method to label or tag the different structures contained in the input data is imperative. Several approaches to labeling or segmenting biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised so that non-expert users can easily analyze biomedical datasets. However, they still have practical problems, such as slow learning and testing speeds. In addition, recent technological developments have led to the widespread availability of multi-core CPUs and GPUs, as well as new software languages such as NVIDIA's CUDA and OpenCL, which allow parallel programming paradigms to be applied on conventional personal computers. The Adaboost classifier is one of the most widely applied labeling methods in the machine learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers, each of which is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied to the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier and use it to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and
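    The testing stage described above can be sketched as follows, assuming each weak classifier is a decision stump (feature index, threshold, polarity, weight), which matches "a simple decision function estimated on a single feature value". This is plain sequential Python, not the authors' OpenCL representation; it only marks where the independent per-classifier evaluations could be spread across GPU work-items. Class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    feature: int      # index of the single feature this weak classifier inspects
    threshold: float
    polarity: int     # +1: predict +1 when value >= threshold; -1: the reverse
    alpha: float      # weight learned during the Adaboost training stage

    def predict(self, x: Sequence[float]) -> int:
        return self.polarity if x[self.feature] >= self.threshold else -self.polarity


def adaboost_predict(stumps: List[Stump], samples: List[Sequence[float]]) -> List[int]:
    """Label samples with sign(sum_t alpha_t * h_t(x))."""
    scores = [0.0] * len(samples)
    # Each weak classifier is evaluated independently on every sample; these
    # evaluations are what a GPU kernel could run in parallel before summing.
    for s in stumps:
        for i, x in enumerate(samples):
            scores[i] += s.alpha * s.predict(x)
    return [1 if sc >= 0 else -1 for sc in scores]
```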

  16. The cerebellum: a neuronal learning machine?

    NASA Technical Reports Server (NTRS)

    Raymond, J. L.; Lisberger, S. G.; Mauk, M. D.

    1996-01-01

    Comparison of two seemingly quite different behaviors yields a surprisingly consistent picture of the role of the cerebellum in motor learning. Behavioral and physiological data about classical conditioning of the eyelid response and motor learning in the vestibulo-ocular reflex suggest that (i) plasticity is distributed between the cerebellar cortex and the deep cerebellar nuclei; (ii) the cerebellar cortex plays a special role in learning the timing of movement; and (iii) the cerebellar cortex guides learning in the deep nuclei, which may allow learning to be transferred from the cortex to the deep nuclei. Because many of the similarities in the data from the two systems typify general features of cerebellar organization, the cerebellar mechanisms of learning in these two systems may represent principles that apply to many motor systems.

  17. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    PubMed

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs, and reducing it can lower the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural networks, Gaussian processes, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data from the US stock market using the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, such as the I-star model, in four error measures. Although these models have difficulty separating the permanent and temporary costs directly, nonparametric machine learning models can be good alternatives for reducing transaction costs because they considerably improve prediction performance. PMID:26926235
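    Of the nonparametric models listed, a Gaussian process is the simplest to write down: the predictive mean at a new input is k_*^T (K + σ_n^2 I)^{-1} y. The following minimal sketch assumes a squared-exponential kernel with unit signal variance, fixed (untuned) length-scale and noise values, and generic numeric input vectors; it does not reproduce the paper's three input variables or its hyperparameter fitting.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf(a: Vector, b: Vector, length: float) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * (math.dist(a, b) / length) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L: List[List[float]], b: List[float]) -> List[float]:
    n = len(L)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict_mean(X_train: List[Vector], y_train: List[float], X_test: List[Vector],
                    length: float = 1.0, noise: float = 1e-2) -> List[float]:
    """Posterior mean k_*^T (K + noise * I)^{-1} y of a zero-mean GP."""
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = cho_solve(cholesky(K), list(y_train))
    return [sum(rbf(x, xt, length) * a for xt, a in zip(X_train, alpha)) for x in X_test]
```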

  18. Predicting Market Impact Costs Using Nonparametric Machine Learning Models

    PubMed Central

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs, and reducing it can lower the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural networks, Gaussian processes, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data from the US stock market using the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, such as the I-star model, in four error measures. Although these models have difficulty separating the permanent and temporary costs directly, nonparametric machine learning models can be good alternatives for reducing transaction costs because they considerably improve prediction performance. PMID:26926235

  19. Machine learning for Big Data analytics in plants.

    PubMed

    Ma, Chuang; Zhang, Hao Helen; Wang, Xiangfeng

    2014-12-01

    Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community needs not only to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences. PMID:25223304

  20. A global prediction of seafloor sediment porosity using machine learning

    NASA Astrophysics Data System (ADS)

    Martin, Kylara M.; Wood, Warren T.; Becker, Joseph J.

    2015-12-01

    Porosity (void ratio) is a critical parameter in models of acoustic propagation, bearing strength, and many other seafloor phenomena. However, as with many seafloor properties, direct measurements are expensive and sparse. We show here how porosity everywhere at the seafloor can be estimated using a machine learning technique (specifically, Random Forests). Such techniques use sparsely acquired direct samples and dense grids of other parameters to produce a statistically optimal estimate where direct measurements are lacking. Our porosity estimate is both qualitatively more consistent with geologic principles than the results produced by interpolation and quantitatively more accurate than results produced by interpolation or regression methods. We present here a seafloor porosity estimate on a 5-arc-minute, pixel-registered grid, produced using widely available, densely sampled grids of other seafloor properties. These techniques represent the only practical means of estimating seafloor properties in inaccessible regions of the seafloor (e.g., the Arctic).
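    A Random Forest regressor averages many regression trees, each grown on a bootstrap resample of the training data and choosing split features from a random subset at every node. The sketch below is a small pure-Python version under stated assumptions (variance-reduction splits, a depth limit, one third of the features per split); in the porosity setting, each row of X would hold the dense predictor grids at a sampled location and y the measured porosity. It is not the authors' configuration, and a production workflow would more likely use a library implementation such as scikit-learn's RandomForestRegressor.

```python
import random
import statistics
from typing import List, Sequence, Union

Node = Union[float, tuple]  # leaf value, or (feature, threshold, left, right)


def _grow(X, y, depth, max_depth, min_leaf, n_feat, rng) -> Node:
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_feat):  # random feature subset at this node
        for thr in sorted({x[f] for x in X})[:-1]:
            left = [v for x, v in zip(X, y) if x[f] <= thr]
            right = [v for x, v in zip(X, y) if x[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Sum of squared errors around each child mean (variance reduction).
            sse = statistics.pvariance(left) * len(left) + statistics.pvariance(right) * len(right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:
        return statistics.fmean(y)
    _, f, thr = best
    left_idx = [i for i, x in enumerate(X) if x[f] <= thr]
    right_idx = [i for i, x in enumerate(X) if x[f] > thr]
    args = (depth + 1, max_depth, min_leaf, n_feat, rng)
    return (f, thr,
            _grow([X[i] for i in left_idx], [y[i] for i in left_idx], *args),
            _grow([X[i] for i in right_idx], [y[i] for i in right_idx], *args))


def _predict_tree(node: Node, x: Sequence[float]) -> float:
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


class SimpleRandomForest:
    def __init__(self, n_trees: int = 20, max_depth: int = 6, min_leaf: int = 2, seed: int = 0):
        self.n_trees, self.max_depth, self.min_leaf, self.seed = n_trees, max_depth, min_leaf, seed
        self.trees: List[Node] = []

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "SimpleRandomForest":
        rng = random.Random(self.seed)
        n, d = len(X), len(X[0])
        n_feat = max(1, d // 3)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    0, self.max_depth, self.min_leaf, n_feat, rng))
        return self

    def predict(self, x: Sequence[float]) -> float:
        return statistics.fmean([_predict_tree(t, x) for t in self.trees])
```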

  1. Extreme Learning Machine for the Predictions of Length of Day

    NASA Astrophysics Data System (ADS)

    Yu, Lei; Zhao, Danning; Cai, Hongbing

    2015-03-01

    This work presents short- and medium-term predictions of length of day (LOD) up to 500 days ahead by means of an extreme learning machine (ELM). The EOP C04 time series with daily values from the International Earth Rotation and Reference Systems Service (IERS) serves as the data basis. The influences of the solid Earth and ocean tides and seasonal atmospheric variations are removed from the C04 series. The residuals are used to train the ELM. The results of the prediction are compared with those from other prediction methods. The accuracy of the prediction is equal to or even better than that of other approaches. The most striking advantages of employing ELM instead of other algorithms are its noticeably reduced complexity and high computational efficiency.
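    An ELM fixes randomly drawn input weights and biases for a single hidden layer and learns only the output weights, by (regularized) least squares; that single linear solve is the source of its low training cost. The sketch below assumes tanh activations, uniform random weights, and a small ridge term, and leaves the construction of input vectors (for example, lagged LOD residuals) to the caller; these choices are assumptions, not the paper's settings.

```python
import math
import random
from typing import List, Sequence


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


class ELM:
    def __init__(self, n_inputs: int, n_hidden: int = 20, ridge: float = 1e-6, seed: int = 0):
        rng = random.Random(seed)
        # Input weights and biases are drawn once at random and never trained.
        self.W = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = [0.0] * n_hidden

    def _hidden(self, x: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
                for row, bi in zip(self.W, self.b)]

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "ELM":
        H = [self._hidden(x) for x in X]
        m = len(self.b)
        # Output weights: beta = (H^T H + ridge * I)^{-1} H^T y
        HtH = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(m)] for i in range(m)]
        Hty = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(m)]
        self.beta = _solve(HtH, Hty)
        return self

    def predict(self, x: Sequence[float]) -> float:
        return sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))
```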

  2. RECONCILE: a machine-learning coreference resolution system

    Energy Science and Technology Software Center (ESTSC)

    2007-12-10

    RECONCILE is a noun phrase coreference resolution system: it identifies noun phrases in a text document and determines which subsets refer to each real-world entity referenced in the text. The heart of the system is a combination of supervised and unsupervised machine learning systems. It uses a machine learning algorithm (chosen from an extensive suite, including Weka) for training noun phrase coreference classifier models and implements a variety of clustering algorithms to coordinate the pairwise classifications. A number of features have been implemented, including all of the features employed in Ng & Cardie [2002].
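    One simple way to coordinate pairwise coreference decisions is to take the transitive closure of all mention pairs the classifier labels coreferent, which a union-find structure computes directly. This is a generic sketch, assuming the classifier outputs a probability per mention pair and a 0.5 threshold; it is not RECONCILE's API, and RECONCILE's own clustering algorithms may differ.

```python
from typing import Dict, List, Tuple


def cluster_mentions(n_mentions: int, pairwise: Dict[Tuple[int, int], float],
                     threshold: float = 0.5) -> List[List[int]]:
    """Group mentions into entities via transitive closure of positive pair decisions."""
    parent = list(range(n_mentions))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for (i, j), prob in pairwise.items():
        if prob >= threshold:  # classifier judges the pair coreferent
            parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for m in range(n_mentions):
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())
```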

  3. Numerical analysis method for linear induction machines.

    NASA Technical Reports Server (NTRS)

    Elliott, D. G.

    1972-01-01

    A numerical analysis method has been developed for linear induction machines such as liquid metal MHD pumps and generators and linear motors. Arbitrary phase currents or voltages can be specified and the moving conductor can have arbitrary velocity and conductivity variations from point to point. The moving conductor is divided into a mesh and coefficients are calculated for the voltage induced at each mesh point by unit current at every other mesh point. Combining the coefficients with the mesh resistances yields a set of simultaneous equations which are solved for the unknown currents.
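    The final step, solving the simultaneous equations for the unknown mesh currents, can be sketched with complex (phasor) Gaussian elimination. The sketch assumes equations of the form R_i I_i - Σ_j C_ij I_j = V_i, where C_ij is the voltage induced at mesh point i by unit current at point j; that sign convention and this exact assembly are assumptions rather than the paper's formulation.

```python
from typing import List


def solve_mesh_currents(C: List[List[complex]], R: List[float],
                        V: List[complex]) -> List[complex]:
    """Solve (diag(R) - C) I = V for the complex mesh currents I."""
    n = len(R)
    # Augmented matrix [Z | V] with Z = diag(R) - C.
    M = [[(R[i] if i == j else 0.0) - C[i][j] for j in range(n)] + [V[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    currents = [0j] * n
    for r in reversed(range(n)):
        currents[r] = (M[r][n] - sum(M[r][k] * currents[k] for k in range(r + 1, n))) / M[r][r]
    return currents
```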

  4. Detecting falls with wearable sensors using machine learning techniques.

    PubMed

    Özdemir, Ahmet Turan; Barshan, Billur

    2014-01-01

    Falls are a serious public health problem and possibly life threatening for people in fall risk groups. We develop an automated fall detection system with wearable motion sensor units fitted to the subjects' body at six different positions. Each unit comprises three tri-axial devices (accelerometer, gyroscope, and magnetometer/compass). Fourteen volunteers perform a standardized set of movements including 20 voluntary falls and 16 activities of daily living (ADLs), resulting in a large dataset with 2520 trials. To reduce the computational complexity of training and testing the classifiers, we focus on the raw data for each sensor in a 4 s time window around the point of peak total acceleration of the waist sensor, and then perform feature extraction and reduction. Most earlier studies on fall detection employ rule-based approaches that rely on simple thresholding of the sensor outputs. We successfully distinguish falls from ADLs using six machine learning techniques (classifiers): the k-nearest neighbor (k-NN) classifier, least squares method (LSM), support vector machines (SVM), Bayesian decision making (BDM), dynamic time warping (DTW), and artificial neural networks (ANNs). We compare the performance and the computational complexity of the classifiers and achieve the best results with the k-NN classifier and LSM, with sensitivity, specificity, and accuracy all above 99%. These classifiers also have acceptable computational requirements for training and testing. Our approach would be applicable in real-world scenarios where data records of indeterminate length, containing multiple activities in sequence, are recorded. PMID:24945676
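    The windowing and classification steps can be sketched for a single tri-axial accelerometer, assuming a known sampling rate and a few simple summary features (mean, standard deviation, maximum, minimum) in place of the paper's feature extraction and reduction pipeline over six sensor units. Function names are hypothetical, and the k-NN classifier here uses plain Euclidean distance with majority voting.

```python
import math
import statistics
from collections import Counter
from typing import List, Sequence, Tuple


def peak_window(acc: Sequence[Tuple[float, float, float]], fs: float,
                seconds: float = 4.0) -> List[float]:
    """Total-acceleration magnitudes in a window centred on the peak sample."""
    mag = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in acc]
    peak = max(range(len(mag)), key=mag.__getitem__)
    half = int(seconds * fs / 2)
    start = max(0, peak - half)
    return mag[start:start + 2 * half]  # shorter if the recording ends early


def features(window: List[float]) -> List[float]:
    return [statistics.fmean(window), statistics.pstdev(window), max(window), min(window)]


def knn_predict(train: List[Tuple[List[float], str]], x: List[float], k: int = 3) -> str:
    """Majority label among the k training feature vectors closest to x."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```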

  5. Structure classification of AB solids via machine learning

    NASA Astrophysics Data System (ADS)

    Gubernatis, J. E.; Pilania, G.; Lookman, T.

    2015-03-01

    We explored the use of machine learning methods, specifically support vector machines and various forms of cross-validation, for the task of classifying the crystal structures of the octet AB solids. We partitioned a set of 75 solids into rocksalt and non-rocksalt structures and thus performed a binary classification task. We found that using the standard indices (rσ, rπ), suggested by St. John and Bloch several decades ago, enabled an average classification success of 92%. Our main new result is that using just rσ and the excess Born effective charge ΔZA of the A atom, computed by DFT, enabled an average success of 98%, prompting us to propose (rσ, ΔZA) as a replacement for the St. John-Bloch pair. In general, we found that adding one or two other features to the St. John-Bloch pair decreases the average success rate unless those features include the excess Born effective charge. Supported by the Department of Energy.
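    The classification and cross-validation procedure can be sketched with a linear SVM trained by Pegasos-style stochastic sub-gradient descent and scored by k-fold cross-validation. This is an illustrative stand-in, assuming a linear kernel, a fixed regularization constant, and two numeric descriptors per compound (such as (rσ, rπ)); the study's SVM kernel, tuning, and cross-validation variants are not reproduced here.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic sub-gradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # constant feature acts as a (regularised) bias
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # hinge loss active: step towards this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1


def cross_val_accuracy(X, y, k: int = 5, seed: int = 0) -> float:
    """Mean held-out accuracy over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(predict(w, X[i]) == y[i] for i in test) / len(test))
    return sum(scores) / len(scores)
```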

  6. Enhancement of plant metabolite fingerprinting by machine learning.

    PubMed

    Scott, Ian M; Vermeer, Cornelia P; Liakata, Maria; Corol, Delia I; Ward, Jane L; Lin, Wanchang; Johnson, Helen E; Whitehead, Lynne; Kular, Baldeep; Baker, John M; Walsh, Sean; Dave, Anuja; Larson, Tony R; Graham, Ian A; Wang, Trevor L; King, Ross D; Draper, John; Beale, Michael H

    2010-08-01

    Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by (1)H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, (1)H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted
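    The analysis of variance F value used as a reference for the RF importance scores can be computed per feature as the between-group mean square divided by the within-group mean square. The sketch below ranks fingerprint features by one-way ANOVA F across phenotype groups; the labels and helper names are illustrative, and it assumes every feature varies within at least one group (otherwise the within-group term is zero).

```python
from statistics import fmean
from typing import List, Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for a single feature measured in several groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = fmean([v for g in groups for v in g])
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(samples: List[Sequence[float]], labels: List[str]) -> List[int]:
    """Feature indices sorted from most to least discriminating by F value."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        groups = [[s[j] for s, lab in zip(samples, labels) if lab == c] for c in classes]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```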

  7. RNABindRPlus: a predictor that combines machine learning and sequence homology-based methods to improve the reliability of predicted RNA-binding residues in proteins.

    PubMed

    Walia, Rasna R; Xue, Li C; Wilkins, Katherine; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

    2014-01-01

    Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins, and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods, achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence
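    The Matthews correlation coefficient (MCC) used to compare the predictors is (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), computed over residue-level predictions. A minimal sketch, assuming binary per-residue labels (1 = RNA-binding) and the common convention of returning 0 when the denominator is zero:

```python
import math
from typing import Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc_from_labels(true: Sequence[int], pred: Sequence[int]) -> float:
    """MCC for per-residue labels (1 = RNA-binding, 0 = non-binding)."""
    pairs = list(zip(true, pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return mcc(tp, tn, fp, fn)
```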

  8. Combining Human and Machine Learning for Morphological Analysis of Galaxy Images

    NASA Astrophysics Data System (ADS)

    Kuminski, Evan; George, Joe; Wallin, John; Shamir, Lior

    2014-10-01

    The increasing importance of digital sky surveys collecting many millions of galaxy images has reinforced the need for robust methods that can perform morphological analysis of large galaxy image databases. Citizen science initiatives such as Galaxy Zoo showed that large data sets of galaxy images can be analyzed effectively by nonscientist volunteers, but since databases generated by robotic telescopes grow much faster than the processing power of any group of citizen scientists, it is clear that computer analysis is required. Here, we propose to use citizen science data for training machine learning systems, and show experimental results demonstrating that machine learning systems can be trained with citizen science data. Our findings show that the performance of machine learning depends on the quality of the data, which can be improved by using samples that have a high degree of agreement between the citizen scientists. The source code of the method is publicly available.
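    Selecting high-agreement citizen-science labels for training can be sketched as a simple filter over per-galaxy votes. The thresholds (at least 5 votes, at least 80% agreement) and the label names are assumptions for illustration, not values from the paper.

```python
from collections import Counter
from typing import Dict, List


def consensus_labels(votes: Dict[str, List[str]], min_votes: int = 5,
                     min_agreement: float = 0.8) -> Dict[str, str]:
    """Keep galaxies whose majority label has at least `min_agreement` support."""
    selected = {}
    for galaxy_id, labels in votes.items():
        if len(labels) < min_votes:
            continue  # too few classifications to judge agreement
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            selected[galaxy_id] = label
    return selected
```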

  9. Learning Activity Packets for Grinding Machines. Unit I--Grinding Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) is one of three that accompany the curriculum guide on grinding machines. It outlines the study activities and performance tasks for the first unit of this curriculum guide. Its purpose is to aid the student in attaining a working knowledge of this area of training and in achieving a skilled or moderately…

  10. A Numerical Comparison of Rule Ensemble Methods and Support Vector Machines

    SciTech Connect

    Meza, Juan C.; Woods, Mark

    2009-12-18

    Machine or statistical learning is a growing field that encompasses many scientific problems including estimating parameters from data, identifying risk factors in health studies, image recognition, and finding clusters within datasets, to name just a few examples. Statistical learning can be described as 'learning from data', with the goal of making a prediction of some outcome of interest. This prediction is usually made on the basis of a computer model that is built using data where the outcomes and a set of features have been previously matched. The computer model is called a learner, hence the name machine learning. In this paper, we present two such algorithms, a support vector machine method and a rule ensemble method. We compared their predictive power on three Type Ia supernova data sets provided by the Nearby Supernova Factory and found that while both methods give accuracies of approximately 95%, the rule ensemble method gives much lower false negative rates.
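    That two classifiers can share an accuracy of about 95% yet differ in false negative rate follows directly from the confusion-matrix definitions. The counts in the test below are invented toy numbers, not the paper's results.

```python
from typing import Dict


def confusion_summary(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy and false negative rate from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Fraction of actual positives (objects of the class of interest) that were missed.
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,
    }
```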

  11. Machine learning applications in cancer prognosis and prediction.

    PubMed

    Kourou, Konstantina; Exarchos, Themis P; Exarchos, Konstantinos P; Karamouzis, Michalis V; Fotiadis, Dimitrios I

    2015-01-01

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams from the biomedical and bioinformatics fields to study the application of machine learning (ML) methods. These techniques have therefore been used to model the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features from complex datasets reveals their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs), have been widely applied in cancer research to develop predictive models, resulting in effective and accurate decision making. Even though it is evident that ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed before these methods can be considered in everyday clinical practice. In this work, we present a review of recent ML approaches employed in modeling cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend in the application of ML methods in cancer research, we present here the most recent publications that employ these techniques to model cancer risk or patient outcomes. PMID:25750696

  12. Machine learning applications in cancer prognosis and prediction

    PubMed Central

    Kourou, Konstantina; Exarchos, Themis P.; Exarchos, Konstantinos P.; Karamouzis, Michalis V.; Fotiadis, Dimitrios I.

    2014-01-01

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams from the biomedical and bioinformatics fields to study the application of machine learning (ML) methods. These techniques have therefore been used to model the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features from complex datasets reveals their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs), have been widely applied in cancer research to develop predictive models, resulting in effective and accurate decision making. Even though it is evident that ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed before these methods can be considered in everyday clinical practice. In this work, we present a review of recent ML approaches employed in modeling cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend in the application of ML methods in cancer research, we present here the most recent publications that employ these techniques to model cancer risk or patient outcomes. PMID:25750696

  13. Drought Forecasting Based on Machine Learning of Remote Sensing and Long-Range Forecast Data

    NASA Astrophysics Data System (ADS)

    Rhee, J.; Im, J.; Park, S.

    2016-06-01

    The reduction of drought impacts may be achieved through sustainable drought management and proactive measures against drought disaster. Accurate and timely provision of drought information is essential. In this study, drought forecasting models were developed to provide high-resolution drought information based on drought indicators for ungauged areas. The developed models predict two drought indices: the 6-month Standardized Precipitation Index (SPI6) and the 6-month Standardized Precipitation Evapotranspiration Index (SPEI6). A multiquadric spline interpolation method and three machine learning models (Decision Tree, Random Forest, and Extremely Randomized Trees) were tested to enhance the provision of drought initial conditions based on remote sensing data, since initial conditions are one of the most important factors for drought forecasting. The machine learning-based methods performed better than the interpolation method for both classification and regression, and the methods using climatology data outperformed those using long-range forecasts. Overall, the model combining climatological data with a machine learning method performed best.
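    Multiquadric interpolation fits weights w so that s(x) = Σ_i w_i sqrt(||x − x_i||² + c²) reproduces the observed index values at the stations exactly, then evaluates s at ungauged locations. The sketch assumes planar coordinates and a shape parameter c = 1; both are illustrative choices, and the study's exact spline formulation may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def multiquadric(r: float, c: float) -> float:
    return math.sqrt(r * r + c * c)


def fit_multiquadric(points: Sequence[Point], values: Sequence[float], c: float = 1.0) -> List[float]:
    """Weights w solving Phi w = values, with Phi_ij = phi(||x_i - x_j||)."""
    n = len(points)
    A = [[multiquadric(math.dist(points[i], points[j]), c) for j in range(n)] for i in range(n)]
    return _solve(A, list(values))


def eval_multiquadric(points: Sequence[Point], weights: Sequence[float], x: Point,
                      c: float = 1.0) -> float:
    return sum(w * multiquadric(math.dist(p, x), c) for p, w in zip(points, weights))
```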

  14. Reduction of false positives by machine learning for computer-aided detection of colonic polyps

    NASA Astrophysics Data System (ADS)

    Zhao, Xin; Wang, Su; Zhu, Hongbin; Liang, Zhengrong

    2009-02-01

    With the development of computer-aided detection of polyps (CADpolyp), various features have been extracted to detect the initial polyp candidates (IPCs). In this paper, three approaches were used to reduce the number of false positives (FPs): multiple linear regression (MLR) and two modified machine learning methods, i.e., a neural network (NN) and a support vector machine (SVM), each chosen for its own characteristics and specific learning purposes. Compared to MLR, the two modified machine learning methods are much more sophisticated and better adapted to the data provided. To achieve optimal sensitivity and specificity, raw features were pre-processed by principal component analysis (PCA) in the hope of removing second-order statistical correlation prior to any learning. The gain from PCA was evidenced on 26 collected patient studies, which included 32 colonic polyps confirmed by both optical colonoscopy (OC) and virtual colonoscopy (VC). The learning and testing results showed that, at 100% detection sensitivity, the two modified machine learning methods reduce the number of FPs by 48.9% (or 7.2 FPs per patient) and 45.3% (or 7.7 FPs per patient), respectively, compared with the traditional MLR method. Because more features than necessary are generally stacked into the input vectors of machine learning algorithms, this paper also considers dimensionality reduction toward a more compact feature combination, i.e., how to determine the number of dimensions to retain after the PCA linear transform. In addition, we proposed a new PCA-scaled data pre-processing method to help reduce the FPs significantly. Finally, fROC (free-response receiver operating characteristic) curves corresponding to the three FP-reduction approaches were acquired, and a comparative analysis was conducted.
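    Choosing how many principal components to retain is often done with an explained-variance threshold. The sketch below computes leading eigenpairs of the feature covariance matrix by power iteration with deflation and returns the smallest k reaching a target fraction of the total variance; the 95% threshold is an assumption, not the criterion used in the paper, and the PCA-scaled pre-processing is not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple


def covariance(X: List[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix of the rows of X."""
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_components(C: List[List[float]], k: int, iters: int = 500,
                   seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining variance is zero
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Deflate: remove the found component before searching for the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def n_components_for(C: List[List[float]], target: float = 0.95) -> int:
    """Smallest k whose leading eigenvalues explain at least `target` of the total variance."""
    total = sum(C[i][i] for i in range(len(C)))
    explained = 0.0
    for k, (lam, _) in enumerate(top_components(C, len(C)), start=1):
        explained += lam
        if explained >= target * total:
            return k
    return len(C)
```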

  15. Machine Learning Techniques in Optimal Design

    NASA Technical Reports Server (NTRS)

    Cerbone, Giuseppe

    1992-01-01

    to the problem, is then obtained by solving each of the sub-problems in the set in parallel and selecting the one with the minimum cost. In addition to speeding up the optimization process, our use of learning methods also relieves the expert from the burden of identifying rules that exactly pinpoint optimal candidate sub-problems. In real engineering tasks, it is usually too costly for engineers to derive such rules. Therefore, this paper also takes a further step towards solving the knowledge acquisition bottleneck [Feigenbaum, 1977], which has somewhat impaired the construction of rule-based expert systems.
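    The step of solving each candidate sub-problem in parallel and keeping the cheapest can be sketched with Python's concurrent.futures. The solve function, which must return a (cost, solution) pair, is a hypothetical placeholder; the learned selection of candidate sub-problems is outside this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")  # sub-problem description
R = TypeVar("R")  # solution type


def best_subproblem(subproblems: Iterable[S], solve: Callable[[S], Tuple[float, R]],
                    workers: int = 4) -> Tuple[float, R]:
    """Solve candidate sub-problems concurrently and return the cheapest (cost, solution)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(solve, subproblems))
    return min(results, key=lambda r: r[0])
```

Threads suit solvers that release the GIL or wait on external programs; for pure-Python, CPU-bound solvers, ProcessPoolExecutor is the usual substitute.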

  16. Method and apparatus for monitoring machine performance

    DOEpatents

    Smith, Stephen F.; Castleberry, Kimberly N.

    1996-01-01

    Machine operating conditions can be monitored by analyzing, in either the time or frequency domain, the spectral components of the motor current. Changes in the electric background noise, induced by mechanical variations in the machine, are correlated to changes in the operating parameters of the machine.
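    Frequency-domain monitoring starts from the amplitude of each spectral component of the sampled motor current. The sketch uses a naive discrete Fourier transform for clarity (numpy.fft.rfft would be the practical choice); the 50 Hz supply frequency and the 45 Hz sideband in the test signal are illustrative assumptions, not values from the patent.

```python
import cmath
import math
from typing import List, Sequence


def amplitude_spectrum(x: Sequence[float]) -> List[float]:
    """Single-sided amplitude of DFT bins k = 0 .. N//2 (naive O(N^2) transform).

    Bin k corresponds to frequency k * fs / N for sampling rate fs.
    """
    n = len(x)
    amps = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # DC and Nyquist bins are not mirrored, so they are not doubled.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        amps.append(scale * abs(s) / n)
    return amps
```

Comparing such spectra over time, for example tracking the amplitude of sidebands around the supply frequency, is one way to turn the spectral components into an indicator of changing operating parameters.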

  17. One Method of Teaching an Office Machines Class

    ERIC Educational Resources Information Center

    Holmquist, Donna

    1978-01-01

    In the office machines class at the University of Nebraska at Omaha students learn how to operate a new machine each week according to a rotation plan illustrated and described in the article. Because of the rotation system, classwork is mostly individualized. Course objectives, instructional procedures, and a breakdown of duration and grade value…

  18. Machine Translation-Assisted Language Learning: Writing for Beginners

    ERIC Educational Resources Information Center

    Garcia, Ignacio; Pena, Maria Isabel

    2011-01-01

    The few studies that deal with machine translation (MT) as a language learning tool focus on its use by advanced learners, never by beginners. Yet freely available MT engines (e.g., Google Translate) and MT-related web initiatives (e.g., Gabble-on.com) position themselves to cater precisely to the needs of learners with a limited command of a…

  19. PDT: Photometric DeTrending Algorithm Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Kim, Dae-Won

    2016-05-01

    PDT removes systematic trends in light curves. It finds clusters of light curves that are highly correlated using machine learning, constructs one master trend per cluster and detrends an individual light curve using the constructed master trends by minimizing residuals while constraining coefficients to be positive.
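The final PDT step, fitting master trends to a light curve with non-negative coefficients, is a non-negative least-squares problem. A minimal pure-Python sketch, assuming the master trends have already been built (the clustering step is not shown), solves it by projected coordinate descent; this solver is a stand-in and not necessarily the one PDT uses, and detrend_nonnegative is a hypothetical name.

```python
def detrend_nonnegative(flux, trends, n_sweeps=500):
    """Fit flux ~ sum_j c_j * trends[j] subject to c_j >= 0 by projected
    coordinate descent on the squared residual. Returns (coefficients,
    detrended flux), where the detrended flux is the remaining residual."""
    coef = [0.0] * len(trends)
    resid = [float(v) for v in flux]            # flux minus current trend model
    norms = [sum(v * v for v in t) for t in trends]
    for _ in range(n_sweeps):
        for j, t in enumerate(trends):
            if norms[j] == 0.0:
                continue                        # an all-zero trend cannot help
            # Exact minimiser along coordinate j, then clipped at zero.
            step = sum(r * v for r, v in zip(resid, t)) / norms[j]
            new = max(0.0, coef[j] + step)
            delta = new - coef[j]
            if delta:
                resid = [r - delta * v for r, v in zip(resid, t)]
                coef[j] = new
    return coef, resid
```

Clipping each one-dimensional update at zero is exact for a convex quadratic, so every step keeps the constraint satisfied while never increasing the residual.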

  20. Machine learning of fault characteristics from rocket engine simulation data

    NASA Technical Reports Server (NTRS)

    Ke, Min; Ali, Moonis

    1990-01-01

    Transformation of data into knowledge through conceptual induction has been the focus of our research described in this paper. We have developed a Machine Learning System (MLS) to analyze rocket engine simulation data. MLS provides its users with fault analysis, fault characteristics, conceptual descriptions of faults, and the relationships between attributes and sensors. All of these results are critically important in identifying faults.

  1. Acquiring Software Design Schemas: A Machine Learning Perspective

    NASA Technical Reports Server (NTRS)

    Harandi, Mehdi T.; Lee, Hing-Yan

    1991-01-01

    In this paper, we describe an approach based on machine learning that acquires software design schemas from design cases of existing applications. An overview of the technique, design representation, and acquisition system is presented. The paper also addresses issues associated with generalizing common features such as biases. The generalization process is illustrated using an example.

  2. A 128-Channel Extreme Learning Machine-Based Neural Decoder for Brain Machine Interfaces.

    PubMed

    Chen, Yi; Yao, Enyi; Basu, Arindam

    2016-06-01

    Currently, state-of-the-art motor intention decoding algorithms in brain-machine interfaces are mostly implemented on a PC and consume a significant amount of power. This paper presents a machine learning coprocessor in 0.35-μm CMOS for motor intention decoding in brain-machine interfaces. Using the Extreme Learning Machine algorithm and low-power analog processing, it achieves an energy efficiency of 3.45 pJ/MAC at a classification rate of 50 Hz. Learning in a second stage, with correspondingly digitally stored coefficients, is used to increase the robustness of the core analog processor. The chip is verified with neural data recorded in a monkey finger-movement experiment, achieving a decoding accuracy of 99.3% for movement type. The same coprocessor is also used to decode the time of movement from asynchronous neural spikes. With time-delayed feature dimension enhancement, the classification accuracy can be increased by 5% with a limited number of input channels. Further, a sparsity-promoting training scheme reduces the number of programmable weights by ≈2×. PMID:26672048

  3. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations.

    PubMed

    Torkzaban, Bahareh; Kayvanjoo, Amir Hossein; Ardalan, Arman; Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is increasingly becoming a bottleneck to making effective use of large biological datasets. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been used less frequently with empirical population genetics data. In this study, we developed a new combined approach to analyzing microsatellite marker data from our previous studies of olive populations using machine learning algorithms. Herein, 267 olive accessions of various origins, including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties, were investigated using a finely selected panel of 11 microsatellite markers. We organized the data into two experiments, '4-targeted' and '16-targeted'. A strategy of assaying different machine-based analyses (i.e., data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles representing the population and geography of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our method for clustering olive accessions according to their regions of origin. A notable highlight of this study was the discovery of the best combination of markers for differentiating populations via machine learning models, which can be exploited to distinguish among other biological populations. PMID:26599001

  4. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations

    PubMed Central

    Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is increasingly becoming a bottleneck to making effective use of large biological datasets. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been used less frequently with empirical population genetics data. In this study, we developed a new combined approach to analyzing microsatellite marker data from our previous studies of olive populations using machine learning algorithms. Herein, 267 olive accessions of various origins, including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties, were investigated using a finely selected panel of 11 microsatellite markers. We organized the data into two experiments, ‘4-targeted’ and ‘16-targeted’. A strategy of assaying different machine-based analyses (i.e., data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles representing the population and geography of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our method for clustering olive accessions according to their regions of origin. A notable highlight of this study was the discovery of the best combination of markers for differentiating populations via machine learning models, which can be exploited to distinguish among other biological populations. PMID:26599001

  5. Extreme learning machine and adaptive sparse representation for image classification.

    PubMed

    Cao, Jiuwen; Zhang, Kai; Luo, Minxia; Yin, Chun; Lai, Xiaoping

    2016-09-01

    Recent research has shown the speed advantage of the extreme learning machine (ELM) and the accuracy advantage of sparse representation classification (SRC) in image classification. These two methods, however, have their respective drawbacks: in general, ELM is known to be less robust to noise, while SRC is known to be time-consuming. Consequently, ELM and SRC complement each other in computational complexity and classification accuracy. To unify this mutual complementarity and thus further enhance classification performance, we propose in this paper an efficient hybrid classifier that exploits the advantages of both ELM and SRC. More precisely, the proposed classifier consists of two stages: first, an ELM network is trained by supervised learning. Second, a discriminative criterion on the reliability of the obtained ELM output is adopted to decide whether the query image can be correctly classified. If the output is reliable, the classification is performed by ELM; otherwise, the query image is fed to SRC. Meanwhile, in the SRC stage, a sub-dictionary adapted to the query image, rather than the entire dictionary, is extracted via the ELM output, which reduces the computational burden of SRC. Extensive experiments on handwritten digit classification, landmark recognition, and face recognition demonstrate that the proposed hybrid classifier outperforms ELM and SRC in classification accuracy with outstanding computational efficiency. PMID:27389571
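The two-stage routing can be sketched as below. The abstract does not specify its reliability criterion, so this sketch assumes the gap between the two largest ELM output scores, and it passes the top-scoring classes to the fallback as a stand-in for extracting an adaptive sub-dictionary. hybrid_classify, the 0.2 threshold, and the fallback signature are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def hybrid_classify(elm_scores: Sequence[float],
                    fallback: Callable[[List[int]], int],
                    margin_threshold: float = 0.2,
                    n_candidates: int = 3) -> Tuple[int, str]:
    """Return (predicted class, which stage decided). Needs >= 2 classes."""
    # Classes ranked by ELM output score, best first.
    order = sorted(range(len(elm_scores)), key=lambda k: elm_scores[k], reverse=True)
    margin = elm_scores[order[0]] - elm_scores[order[1]]
    if margin >= margin_threshold:
        return order[0], "ELM"          # reliable: accept the fast ELM answer
    # Unreliable: hand the query to the slower classifier, restricted to the
    # classes the ELM ranked highest (a reduced, query-adaptive candidate set).
    return fallback(order[:n_candidates]), "SRC"
```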

  6. Machine Learning for Power System Disturbance and Cyber-attack Discrimination

    SciTech Connect

    Borges, Raymond Charles; Beaver, Justin M; Buckner, Mark A; Morris, Thomas; Adhikari, Uttam; Pan, Shengyi

    2014-01-01

    Power system disturbances are inherently complex and can be attributed to a wide range of sources, including both natural and man-made events. Currently, power system operators are relied on heavily to decide the causes of experienced disturbances and the appropriate course of action in response. In the case of cyber-attacks against a power system, human judgment is less certain, since there is an overt attempt to disguise the attack and deceive the operators about the true state of the system. To support the human decision maker, we explore the viability of machine learning as a means of discriminating between types of power system disturbances, focusing specifically on detecting cyber-attacks, where deception is a core tenet of the event. We evaluate various machine learning methods as disturbance discriminators and discuss the practical implications of deploying machine learning systems as an enhancement to existing power system architectures.

  7. Efficiently Ranking Hypotheses in Machine Learning

    NASA Technical Reports Server (NTRS)

    Chien, Steve

    1997-01-01

    This paper considers the problem of learning the ranking of a set of alternatives based upon incomplete information (e.g. a limited number of observations). At each decision cycle, the system can output a complete ordering on the hypotheses or decide to gather additional information (e.g. observation) at some cost.
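The decision cycle described above, either committing to a complete ordering or paying to observe more, can be illustrated with a simple confidence-interval stopping rule. This is a generic sketch under assumed conditions (observations bounded in [0, 1], one unit of cost per observation, Hoeffding intervals), not the method of the paper; rank_hypotheses is a hypothetical name. Because the union bound covers the hypotheses but not the repeated checks, delta is indicative rather than a rigorous guarantee.

```python
import math

def rank_hypotheses(observe, k, delta=0.05, budget=10_000, warmup=2):
    """observe(i) returns one noisy observation in [0, 1] of hypothesis i
    (each call costs one unit). Returns (ranking best-first, total cost)."""
    counts, sums = [0] * k, [0.0] * k

    def draw(i):
        sums[i] += observe(i)
        counts[i] += 1

    for i in range(k):                       # a few observations of everything
        for _ in range(warmup):
            draw(i)
    cost = k * warmup
    while True:
        means = [sums[i] / counts[i] for i in range(k)]
        # Hoeffding radius with a union bound over the k hypotheses only.
        radius = [math.sqrt(math.log(2 * k / delta) / (2 * counts[i]))
                  for i in range(k)]
        order = sorted(range(k), key=lambda i: means[i], reverse=True)
        unresolved = [(a, b) for a, b in zip(order, order[1:])
                      if means[a] - radius[a] < means[b] + radius[b]]
        if not unresolved or cost >= budget:
            return order, cost               # commit to a complete ordering
        a, b = unresolved[0]                 # otherwise gather more information
        draw(a if counts[a] <= counts[b] else b)
        cost += 1
```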

  8. Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification.

    PubMed

    Mirza, Bilal; Lin, Zhiping

    2016-08-01

    In this paper, a meta-cognitive online sequential extreme learning machine (MOS-ELM) is proposed for class imbalance and concept drift learning. In MOS-ELM, meta-cognition is used to self-regulate the learning by selecting suitable learning strategies for class imbalance and concept drift problems. MOS-ELM is the first sequential learning method to alleviate the imbalance problem for both binary class and multi-class data streams with concept drift. In MOS-ELM, a new adaptive window approach is proposed for concept drift learning. A single output update equation is also proposed which unifies various application specific OS-ELM methods. The performance of MOS-ELM is evaluated under different conditions and compared with methods each specific to some of the conditions. On most of the datasets in comparison, MOS-ELM outperforms the competing methods. PMID:27187873
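MOS-ELM builds on the online sequential ELM (OS-ELM), whose output weights are updated chunk by chunk with a recursive least-squares step instead of being retrained from scratch. The sketch below, assuming NumPy, a tanh hidden layer, and a small ridge term in the initial batch, shows only that base OS-ELM update; the meta-cognitive strategy selection, adaptive window, and class-imbalance handling of MOS-ELM are not modelled, and the class and method names are hypothetical.

```python
import numpy as np

class OSELM:
    """Base online sequential ELM: fixed random hidden layer, output weights
    updated chunk by chunk with recursive least squares."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))  # random, never trained
        self.b = rng.normal(size=n_hidden)              # random, never trained
        self.P = None      # inverse of the regularised Gram matrix H^T H
        self.beta = None   # output weights

    def hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def init_batch(self, X0, T0, reg=1e-3):
        H = self.hidden(X0)
        self.P = np.linalg.inv(H.T @ H + reg * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def update(self, X, T):
        H = self.hidden(X)
        # P <- P - P H^T (I + H P H^T)^-1 H P   (Woodbury identity)
        S = np.eye(H.shape[0]) + H @ self.P @ H.T
        K = self.P @ H.T @ np.linalg.inv(S)
        self.P = self.P - K @ H @ self.P
        # beta <- beta + P H^T (T - H beta)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

In exact arithmetic, the weights after each update equal the batch ridge-regression solution on all chunks seen so far, so past data need not be stored.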

  9. Effects of Plasma Transfusion on Perioperative Bleeding Complications: A Machine Learning Approach

    PubMed Central

    Ngufor, Che; Murphree, Dennis; Upadhyaya, Sudhindra; Madde, Nageswar; Kor, Daryl; Pathak, Jyotishman

    2016-01-01

    Perioperative bleeding (PB) is associated with increased patient morbidity and mortality and results in substantial health care resource utilization. To assess bleeding risk, a routine practice in most centers is to use indicators such as elevated values of the International Normalized Ratio (INR). For patients with elevated INR, the routine therapy option is plasma transfusion. However, the predictive accuracy of INR and the value of plasma transfusion remain unclear. Accurate methods are therefore needed to identify patients at increased risk of bleeding early. The goal of this work is to apply advanced machine learning methods to study the relationship between preoperative plasma transfusion (PPT) and PB in patients with elevated INR undergoing noncardiac surgery. The problem is cast in the framework of causal inference, in which robust, meaningful measures quantifying the effect of PPT on PB are estimated. Results show that machine learning and standard statistical methods generally agree that PPT negatively impacts PB and other important patient outcomes. However, the machine learning methods show significant results, and machine learning boosting methods are found to make fewer errors in predicting PB. PMID:26262146

  10. Active extreme learning machines for quad-polarimetric SAR imagery classification

    NASA Astrophysics Data System (ADS)

    Samat, Alim; Gamba, Paolo; Du, Peijun; Luo, Jieqiong

    2015-03-01

    Supervised classification of quad-polarimetric SAR images is often constrained by the availability of reliable training samples. Active learning (AL) provides a unique capability for selecting samples with high representation quality and low redundancy. The most important part of AL is the criterion for selecting the most informative candidates (pixels) by ranking. In this paper, class supports based on the posterior probability function are approximated by ensemble learning and majority voting. This approximation is statistically meaningful when a large enough classifier ensemble is exploited. In this work, we propose to use extreme learning machines and apply AL to quad-polarimetric SAR image classification. Extreme learning machines are ideal because of their fast operation, straightforward solution, and strong generalization. As inputs to the so-called active extreme learning machines, both polarimetric and spatial features (morphological profiles) are considered. To validate the proposed method, its results and performance are compared with random sampling and state-of-the-art AL methods, such as margin sampling, normalized entropy query-by-bagging, and multiclass level uncertainty. Experimental results for four quad-polarimetric SAR images collected by RADARSAT-2, AirSAR, and EMISAR indicate that the proposed method achieves promising results in different scenarios. Moreover, the proposed method is faster than existing techniques in both the learning and the classification phases.
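As a hedged illustration of ranking candidates by ensemble-approximated class supports, the pure-Python sketch below turns each candidate pixel's ensemble votes into class supports (vote fractions) and selects the candidates whose two largest supports are closest. That margin criterion is an assumption for illustration, since the abstract does not give its exact ranking rule, and the function names are hypothetical.

```python
from collections import Counter

def vote_supports(votes, n_classes):
    """Fraction of ensemble members voting for each class (one candidate)."""
    counts = Counter(votes)
    return [counts.get(c, 0) / len(votes) for c in range(n_classes)]

def select_most_informative(ensemble_votes, n_classes, batch_size):
    """Rank unlabelled candidates by the gap between their two largest class
    supports and return the indices of the smallest gaps (most ambiguous)."""
    margins = []
    for idx, votes in enumerate(ensemble_votes):
        s = sorted(vote_supports(votes, n_classes), reverse=True)
        margins.append((s[0] - s[1], idx))
    margins.sort()                      # ties broken by candidate index
    return [idx for _, idx in margins[:batch_size]]
```

The selected pixels would be labelled and added to the training set before the ensemble is retrained.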

  11. Machine learning bandgaps of double perovskites

    PubMed Central

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-01

    The ability to make rapid and accurate predictions of bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance. PMID:26783247

  12. Machine learning bandgaps of double perovskites

    NASA Astrophysics Data System (ADS)

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-01

    The ability to make rapid and accurate predictions of bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance.

  13. Machine learning bandgaps of double perovskites

    NASA Astrophysics Data System (ADS)

    Pilania, Ghanshyam; Mannodi-Kanakkithodi, Arun; Uberuaga, Blas; Ramprasad, Rampi; Gubernatis, James; Lookman, Turab

    The ability to make rapid and accurate predictions of bandgaps for double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps for double perovskites. After evaluating a set of nearly 1.2 million features, we identify several elemental features of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science (on a dataset of more than 1300 double perovskite bandgaps) and further analyzed to rationalize their prediction performance. Supported by the Los Alamos National Laboratory LDRD program and the U.S. Department of Energy, Office of Science, Basic Energy Sciences.

  14. Applying machine learning to electronic form filling

    NASA Astrophysics Data System (ADS)

    Hermens, Leonard A.; Schlimmer, Jeffrey C.

    1993-03-01

    Forms of all types are used in businesses and government agencies and most of them are filled in by hand. Yet much time and effort has been expended to automate form-filling by programming specific systems on computers. The high cost of programmers and other resources prohibits many organizations from benefitting from efficient office automation. A learning apprentice can be used for such repetitious form-filling tasks. In this paper, we establish the need for learning apprentices, describe a framework for such a system, explain the difficulties of form-filling, and present empirical results of a form-filling system used in our department from September 1991 to April 1992. The form-filling apprentice saves up to 84% in keystroke effort and correctly predicts nearly 90% of the values on the form.

  15. Machine learning bandgaps of double perovskites

    DOE PAGESBeta

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-19

    The ability to make rapid and accurate predictions of bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. As a result, the developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance.

  16. A machine learning approach for detecting cell phone usage

    NASA Astrophysics Data System (ADS)

    Xu, Beilei; Loce, Robert P.

    2015-03-01

    Cell phone usage while driving is common, but widely considered dangerous because it distracts the driver. Because of the high number of accidents related to cell phone usage while driving, several states have enacted regulations that prohibit drivers from using cell phones while driving. However, to enforce the regulation, current practice requires dispatching law enforcement officers to the roadside to visually examine incoming cars, or having human operators manually examine image/video records to identify violators. Both of these practices are expensive, difficult, and ultimately ineffective. Therefore, there is a need for a semi-automatic or automatic solution to detect driver cell phone usage. In this paper, we propose a machine-learning-based method for detecting driver cell phone usage using a camera system directed at the vehicle's front windshield. The developed method consists of two stages: first, the frontal windshield region is localized using the deformable part model (DPM); next, a Fisher vector (FV) representation is used to classify the driver's side of the windshield into cell phone usage violation and non-violation classes. The proposed method achieved about 95% accuracy on a data set of more than 100 images of drivers in a variety of challenging poses, with or without cell phones.

  17. Protein sequence classification with improved extreme learning machine algorithms.

    PubMed

    Cao, Jiuwen; Xiong, Lianglin

    2014-01-01

    Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent, efficient extreme learning machine (ELM) and its variants are used as the training algorithms. The optimally pruned ELM (OP-ELM) is employed for protein sequence classification for the first time in this paper. To further enhance performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensemble members. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
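A minimal sketch of the ensemble-based SLFN scheme with the basic ELM, assuming NumPy, a sigmoid activation, and integer class labels: every member has the same number of hidden nodes and the same activation, its random input weights stay fixed, its output weights come from the Moore-Penrose pseudoinverse, and the ensemble label is the majority vote. OP-ELM pruning and protein-sequence feature extraction are not shown; the function names are hypothetical.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, y, n_classes, n_hidden, rng):
    """Basic ELM: random input weights and biases, output weights by
    least squares via the pseudoinverse of the hidden-layer matrix."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _sigmoid(X @ W + b)
    T = np.eye(n_classes)[y]              # one-hot targets
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.argmax(_sigmoid(X @ W + b) @ beta, axis=1)

def ensemble_predict(models, X, n_classes):
    """Majority vote over ensemble members (ties go to the lower class index)."""
    n = X.shape[0]
    counts = np.zeros((n_classes, n), dtype=int)
    for m in models:
        counts[predict_elm(m, X), np.arange(n)] += 1
    return np.argmax(counts, axis=0)
```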

  18. Health Informatics via Machine Learning for the Clinical Management of Patients

    PubMed Central

    Niehaus, K. E.; Charlton, P.; Colopy, G. W.

    2015-01-01

    Summary. Objectives: To review how health informatics systems based on machine learning methods have impacted the clinical management of patients by affecting clinical practice. Methods: We reviewed literature from 2010-2015 in databases such as PubMed, IEEE Xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify the leading examples of health informatics that have advanced the methodology of machine learning. While further examples could be added for individual methods, we have chosen some of the most representative and informative exemplars in each case. Results: Our survey highlights that, while much research is taking place in this high-profile field, examples that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. Conclusions: Health informatics systems based on machine learning are in their infancy, and the translation of such systems into clinical management has yet to be performed at scale. PMID:26293849

  19. Use of Machine Learning to Identify Children with Autism and Their Motor Abnormalities

    ERIC Educational Resources Information Center

    Crippa, Alessandro; Salvatore, Christian; Perego, Paolo; Forti, Sara; Nobile, Maria; Molteni, Massimo; Castiglioni, Isabella

    2015-01-01

    In the present work, we have undertaken a proof-of-concept study to determine whether a simple upper-limb movement could be useful to accurately classify low-functioning children with autism spectrum disorder (ASD) aged 2-4. To answer this question, we developed a supervised machine-learning method to correctly discriminate 15 preschool children…

  20. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

    Architectures of planetary systems are observable snapshots in time that can indicate the formation and dynamical evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, then Keplerian motion around a host star of one solar mass is described by P^2 = a^3, where P is the orbital period in years and a is the semi-major axis in astronomical units (AU). Keplerian motion holds on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems as a function of stellar age to determine whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods, such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of closed-form probability distributions are learned in order to detect the outliers. Linear models use techniques based on regression analysis to detect outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we will use unsupervised learning algorithms with only the proximity-based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for an outlier is that the ratio of planetary mass to stellar mass is less than 0.001. In this work, we present our statistical analysis of the outliers thus detected.
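A hedged sketch of the proximity-based idea, in pure Python: each planet is reduced to a Kepler's-law residual, which is zero when P^2 = a^3/M_* (P in years, a in AU, stellar mass M_* in solar masses; this reduces to P^2 = a^3 for a one-solar-mass star), and the distance to the k-th nearest neighbour serves as an outlier score. The residual feature, the choice of k, the availability of a, and the function names are assumptions for illustration. In practice, features such as stellar age and the residual would need comparable scaling before distances are computed.

```python
import math

def kepler_residual(period_yr, a_au, m_star_msun):
    """log10(P^2 * M_* / a^3): zero for exact Keplerian motion when the
    planet's mass is negligible compared with the star's."""
    return (2 * math.log10(period_yr) + math.log10(m_star_msun)
            - 3 * math.log10(a_au))

def knn_outlier_scores(points, k=3):
    """Distance from each point to its k-th nearest neighbour
    (larger means more isolated, i.e. more outlying)."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(d[k - 1])
    return scores

def passes_mass_ratio_check(m_planet, m_star, limit=1e-3):
    # Validation criterion quoted in the abstract: planet/star mass ratio < 0.001.
    return m_planet / m_star < limit
```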

  1. Revisiting Warfarin Dosing Using Machine Learning Techniques

    PubMed Central

    Sharabiani, Ashkan; Bress, Adam; Douzali, Elnaz; Darabi, Houshang

    2015-01-01

    Determining the appropriate dosage of warfarin is an important yet challenging task. Several prediction models have been proposed to estimate a therapeutic dose for patients. The models are either clinical models, which contain clinical and demographic variables, or pharmacogenetic models, which additionally contain genetic variables. In this paper, a new methodology for warfarin dosing is proposed. The patients are first classified into two classes: the first contains patients who require doses of >30 mg/wk, and the second contains patients who require doses of ≤30 mg/wk. This phase is performed using relevance vector machines. In the second phase, the optimal dose for each patient is predicted by two clinical regression models, each customized for one class of patients. The prediction error of the model was 11.6 in terms of root mean squared error (RMSE) and 8.4 in terms of mean absolute error (MAE). In terms of RMSE, this was 15% and 5% lower than the IWPC and Gage models (the most widely used models in practice), respectively. In addition, the proposed model was compared with a fixed-dose approach of 35 mg/wk and with the model proposed by Sharabiani et al., and it outperformed both in terms of MAE and RMSE. PMID:26146514
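The two-phase scheme and the two error metrics can be sketched in pure Python as below. The classifier and the two class-specific regressors are passed in as plain callables, hypothetical stand-ins for the relevance vector machine and the clinical regression models, and the helper names are illustrative.

```python
import math

def two_stage_dose(patient, needs_high_dose, high_dose_model, low_dose_model):
    """Phase 1: a classifier decides whether the patient needs > 30 mg/wk.
    Phase 2: the regression model fitted to that class predicts the dose."""
    model = high_dose_model if needs_high_dose(patient) else low_dose_model
    return model(patient)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(new_error, reference_error):
    """Fractional error reduction; 0.15 means 15% lower than the reference."""
    return (reference_error - new_error) / reference_error
```

Because RMSE squares each error before averaging, it penalizes large dosing mistakes more heavily than MAE, which is why both are reported.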

  2. Method for laser machining explosives and ordnance

    DOEpatents

    Muenchausen, Ross E.; Rivera, Thomas; Sanchez, John A.

    2003-05-06

    Method for laser machining explosives and related articles. A laser beam is directed at a surface portion of a mass of high explosive to melt and/or vaporize the surface portion while directing a flow of gas at the melted and/or vaporized surface portion. The gas flow sends the melted and/or vaporized explosive away from the charge of explosive that remains. The method also involves splitting the casing of a munition having an encased explosive. The method includes rotating a munition while directing a laser beam to a surface portion of the casing of an article of ordnance. While the beam melts and/or vaporizes the surface portion, a flow of gas directed at the melted and/or vaporized surface portion sends it away from the remaining portion of ordnance. After cutting through the casing, the beam then melts and/or vaporizes portions of the encased explosive and the gas stream sends the melted/vaporized explosive away from the ordnance. The beam is continued until it splits the article, after which the encased explosive, now accessible, can be removed safely for recycle or disposal.

  3. Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning

    PubMed Central

    Cohen, Kevin Bretonnel; Glass, Benjamin; Greiner, Hansel M.; Holland-Bouley, Katherine; Standridge, Shannon; Arya, Ravindra; Faist, Robert; Morita, Diego; Mangano, Francesco; Connolly, Brian; Glauser, Tracy; Pestian, John

    2016-01-01

    Objective: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data comprise free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient’s status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral. PMID:27257386
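For reference, the F-measure reported above is, for beta = 1, the harmonic mean of precision and recall, F = 2PR/(P + R). A minimal pure-Python sketch, assuming binary labels with a designated positive class; f_measure is a hypothetical helper name.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score of predictions against gold-standard labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0                      # no true positives: precision/recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```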

  4. Application of Learning Machines and Combinatorial Algorithms in Water Resources Management and Hydrologic Sciences

    SciTech Connect

    Khalil, Abedalrazq F.; Kaheil, Yasir H.; Gill, Kashif; Mckee, Mac

    2010-01-01

    Contemporary water resources engineering and management rely increasingly on pattern recognition techniques that can capitalize on the unrelenting accumulation of data made possible by modern information technology and remote sensing methods. In response to the growing information needs of modern water systems, advanced computational models and tools have been devised to identify and extract relevant information from the mass of data that is now available. This chapter presents innovative applications from computational learning science within the fields of hydrology, hydrogeology, hydroclimatology, and water management. The success of machine learning is evident from the growing number of studies applying Artificial Neural Networks (ANN), Support Vector Machines (SVM), Relevance Vector Machines (RVM), and Locally Weighted Projection Regression (LWPR) to various issues in hydrologic sciences. The applications discussed in the chapter employ these machine learning techniques for intelligent modeling of reservoir operations, temporal downscaling of precipitation, spatial downscaling of soil moisture and evapotranspiration, comparison of various techniques for groundwater quality modeling, and forecasting of chaotic time series behavior. Combinatorial algorithms are developed to capture the intrinsic complexities of the modeled phenomena and to overcome disparate scales; for example, learning machines have been coupled with geostatistical techniques, non-homogeneous hidden Markov models, wavelets, and evolutionary computing techniques. This chapter does not intend to be exhaustive; it reviews the progress made over the past decade in the use of learning machines in applied hydrologic sciences and presents a summary of future needs and challenges for further advancement of these methods.

  5. Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines.

    PubMed

    Neftci, Emre O; Pedroni, Bruno U; Joshi, Siddharth; Al-Shedivat, Maruan; Cauwenberghs, Gert

    2016-01-01

    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines (S2Ms), a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. S2Ms perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate and fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based S2Ms outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware. PMID:27445650
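
    The core idea above, that stochasticity can come from a random mask over the connections rather than from the neuron itself, can be illustrated with a toy unit. This is a minimal sketch under stated assumptions: a single binary threshold unit whose synapses each transmit independently with a fixed probability. It is not the authors' S2M model and includes no learning rule; the function names and parameters are hypothetical.

```python
import random
from typing import Sequence


def stochastic_synapse_unit(
    inputs: Sequence[int],
    weights: Sequence[float],
    bias: float,
    p_transmit: float,
    rng: random.Random,
) -> int:
    """Binary threshold unit with unreliable synapses.

    Each active input reaches the unit only if its synapse transmits, which
    happens independently with probability p_transmit (a Bernoulli mask over
    the connections, in the spirit of DropConnect). Given the mask, the unit
    is a deterministic threshold unit, so all randomness comes from synapses.
    """
    drive = bias
    for x, w in zip(inputs, weights):
        if x and rng.random() < p_transmit:
            drive += w
    return 1 if drive > 0 else 0


def firing_rate(inputs, weights, bias, p_transmit, trials=20000, seed=0):
    """Monte Carlo estimate of the unit's firing probability."""
    rng = random.Random(seed)
    fired = sum(
        stochastic_synapse_unit(inputs, weights, bias, p_transmit, rng)
        for _ in range(trials)
    )
    return fired / trials


# Two active inputs; the unit fires only if both synapses transmit,
# so with p_transmit = 0.5 the estimate should be close to 0.25.
print(firing_rate([1, 1], [0.5, 0.5], bias=-0.6, p_transmit=0.5))
```

    Repeated evaluation of such a unit yields samples from a distribution over its output, which is the sense in which synaptic noise can serve as a sampling mechanism.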

  6. Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines

    PubMed Central

    Neftci, Emre O.; Pedroni, Bruno U.; Joshi, Siddharth; Al-Shedivat, Maruan; Cauwenberghs, Gert

    2016-01-01

    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines (S2Ms), a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. S2Ms perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate and fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based S2Ms outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware. PMID:27445650

  7. Automated mapping of building facades by machine learning

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-08-01

    Facades of buildings contain various types of objects which have to be recorded for information systems. The article describes a solution for this task focussing on automated classification by means of machine learning techniques. Stereo pairs of oblique images are used to derive 3D point clouds of buildings. The planes of the buildings are automatically detected. The derived planes are supplemented with a regular grid of points for which the colour values are found in the images. For each grid point of the façade, additional attributes are derived from image and object data. This "intelligent" point cloud is analysed by a decision tree, which is derived from a small training set. The derived decision tree is then used to classify the complete point cloud. Each point of the regular façade grid is assigned a class, and a façade plan is mapped with a colour palette representing the different objects. Some image processing methods are applied to improve the appearance of the interpreted façade plot and to extract additional information. The proposed method is tested on the facades of a church. Accuracy measures were derived from 140 independent, randomly selected checkpoints. When four classes ("window", "stone work", "painted wall", and "vegetation") are used, the overall accuracy is estimated at 80 % (95 % confidence interval: 71 %-88 %). The user accuracy of class "stonework" was estimated at 90 % (95 % CI: 80 %-97 %). The proposed methodology has a high potential for automation and fast processing.

  8. Phase discontinuity predictions using a machine-learning trained kernel.

    PubMed

    Sawaf, Firas; Groves, Roger M

    2014-08-20

    Phase unwrapping is one of the key steps of interferogram analysis, and its accuracy relies primarily on the correct identification of phase discontinuities. This can be especially challenging for inherently noisy phase fields, such as those produced through shearography and other speckle-based interferometry techniques. We showed in a recent work how a relatively small 10×10 pixel kernel was trained, through machine learning methods, to predict the locations of phase discontinuities within noisy wrapped phase maps. We describe here how this kernel can be applied in a sliding-window fashion, such that each pixel undergoes 100 phase-discontinuity examinations, one for each of its possible positions relative to its neighbors within the kernel's extent. We explore how the resulting predictions can be accumulated and aggregated through a voting system, and demonstrate that this method is more reliable than processing the image as conventional 10×10 non-overlapping tiles. When used in this way, our 10×10 pixel kernel is large enough for effective processing of full-field interferograms. This avoids the substantially greater computational resources that would otherwise be needed to train a significantly larger kernel. PMID:25321117
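
    The sliding-window voting scheme described above can be sketched as follows. This is a minimal sketch, assuming the trained kernel is available as a hypothetical `predict` callable that maps one k×k patch to a k×k boolean mask of predicted discontinuities, and assuming a simple majority rule for aggregation; the abstract does not specify the exact voting rule.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def sliding_window_votes(
    phase: Grid,
    predict: Callable[[Grid], Mask],
    k: int = 10,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Run a k x k discontinuity predictor at every window position.

    Returns (votes, tests): for each pixel, how many windows flagged it as a
    discontinuity and how many windows examined it. Pixels at least k - 1
    pixels from every border are examined k * k times (100 for k = 10).
    """
    rows, cols = len(phase), len(phase[0])
    votes = [[0] * cols for _ in range(rows)]
    tests = [[0] * cols for _ in range(rows)]
    for r0 in range(rows - k + 1):
        for c0 in range(cols - k + 1):
            patch = [row[c0:c0 + k] for row in phase[r0:r0 + k]]
            mask = predict(patch)
            for i in range(k):
                for j in range(k):
                    tests[r0 + i][c0 + j] += 1
                    votes[r0 + i][c0 + j] += int(mask[i][j])
    return votes, tests


def majority_vote(votes, tests, threshold=0.5) -> Mask:
    """Flag a pixel when more than `threshold` of its examinations flagged it."""
    return [
        [t > 0 and v / t > threshold for v, t in zip(vrow, trow)]
        for vrow, trow in zip(votes, tests)
    ]
```

    For example, with a 30×30 map and a toy stand-in predictor such as `lambda p: [[v > 0 for v in row] for row in p]`, pixel (15, 15) is examined 100 times while the corner pixel (0, 0) is examined only once, which is why border pixels need separate handling in practice.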

  9. Machine learning of parameters for accurate semiempirical quantum chemical calculations

    DOE PAGESBeta

    Dral, Pavlo O.; von Lilienfeld, O. Anatole; Thiel, Walter

    2015-04-14

    We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C7H10O2, for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules.

  10. Machine learning of parameters for accurate semiempirical quantum chemical calculations

    SciTech Connect

    Dral, Pavlo O.; von Lilienfeld, O. Anatole; Thiel, Walter

    2015-04-14

    We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C7H10O2, for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules.

  11. Distinguishing meanders of the Kuroshio using machine learning

    NASA Astrophysics Data System (ADS)

    Plotkin, David A.; Weare, Jonathan; Abbot, Dorian S.

    2014-10-01

    The Kuroshio south of Japan is often described as being bimodal, with abrupt transitions between a straight path state that stays near the coast (small meander) and a meandering state that deviates from the coast (large meander). Despite evidence of the existence of two or more states of the Kuroshio, previous data-driven studies have shown only high variability of the current; they have not, however, demonstrated bimodality in the sense of two states of relatively high probability separated by a region of relatively low probability. We use singular value decomposition (SVD), a standard time series analysis method for characterizing variability, and diffusion maps and spectral clustering (DMSC), a machine learning algorithm that seeks multimodality, to investigate Kuroshio reanalysis output. By applying these methods to a time series of velocity fields, we find that (1) the Kuroshio is bimodal, with high inflow and low path variability in the small meander and low inflow and high path variability in the large meander, (2) the state of the system correlates highly with the location of the recirculation gyre south of Japan, and (3) the meanders are better characterized by path variability than by mean path. Because these results are consistent with satellite sea surface height data, they are not an artifact of the model used for reanalysis. Further, our results provide evidence for a previously proposed transition mechanism based on the strengthening, migration, and weakening of the recirculation gyre south of Japan and can therefore help direct future modeling studies.

  12. Improved Automated Seismic Event Extraction Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Mackey, L.; Kleiner, A.; Jordan, M. I.

    2009-12-01

    Like many organizations engaged in seismic monitoring, the Preparatory Commission for the Comprehensive Test Ban Treaty Organization collects and processes seismic data from a large network of sensors. This data is continuously transmitted to a central data center, and bulletins of seismic events are automatically extracted. However, as for many such automated systems at present, the inaccuracy of this extraction necessitates substantial human analyst review effort. A significant opportunity for improvement thus lies in the fact that these systems currently fail to fully utilize the valuable repository of historical data provided by prior analyst reviews. In this work, we present the results of the application of machine learning approaches to several fundamental sub-tasks in seismic event extraction. These methods share as a common theme the use of historical analyst-reviewed bulletins as ground truth from which they extract relevant patterns to accomplish the desired goals. For instance, we demonstrate the effectiveness of classification and ranking methods for the identification of false events -- that is, those which will be invalidated and discarded by analysts -- in automated bulletins. We also show gains in the accuracy of seismic phase identification via the use of classification techniques to automatically assign seismic phase labels to station detections. Furthermore, we examine the potential of historical association data to inform the direct association of new signal detections with their corresponding seismic events. Empirical results are based upon parametric historical seismic detection and event data received from the Preparatory Commission for the Comprehensive Test Ban Treaty Organization.

  13. A comparative analysis of support vector machines and extreme learning machines.

    PubMed

    Liu, Xueyi; Gao, Chuanhou; Li, Ping

    2012-09-01

    The theory of extreme learning machines (ELMs) has recently become increasingly popular. As a new learning algorithm for single-hidden-layer feed-forward neural networks, an ELM offers the advantages of low computational cost, good generalization ability, and ease of implementation. Hence the comparison and model selection between ELMs and other kinds of state-of-the-art machine learning approaches has become significant and has attracted many research efforts. This paper performs a comparative analysis of basic ELMs and support vector machines (SVMs) from two viewpoints that differ from previous works: one is the Vapnik-Chervonenkis (VC) dimension, and the other is their performance under different training sample sizes. It is shown that the VC dimension of an ELM is equal to the number of hidden nodes of the ELM with probability one. Additionally, their generalization ability and computational complexity are examined as the training sample size changes. ELMs have weaker generalization ability than SVMs for small samples but can generalize as well as SVMs for large samples. Notably, ELMs are much faster computationally, especially for large-scale problems. The results obtained can provide insight into the essential relationship between the two methods, and can also serve as complementary knowledge for their past experimental and theoretical comparisons. PMID:22572469

  14. Neural cell image segmentation method based on support vector machine

    NASA Astrophysics Data System (ADS)

    Niu, Shiwei; Ren, Kan

    2015-10-01

    In the analysis of neural cell images acquired by optical microscopy, accurate and rapid segmentation is the foundation of a nerve cell detection system. In this paper, a modified image segmentation method based on a Support Vector Machine (SVM) is proposed to reduce the adverse impact of low contrast between objects and background and of interference from adherent and clustered cells. First, morphological filtering and Otsu's method are applied to preprocess the images and roughly extract the neural cells. Second, Stellate Vector, Circularity, and Histogram of Oriented Gradients (HOG) features are computed to train the SVM model. Finally, an incremental-learning SVM classifier is used to classify the preprocessed images, and the initial recognition areas identified by the SVM classifier are added to the library as positive samples for training the SVM model. Experimental results show that the proposed algorithm achieves much better segmentation results than classic segmentation algorithms.
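
    The Otsu step used for rough preprocessing picks the gray level that best separates pixels into two classes. A minimal pure-Python sketch of the standard histogram-based method follows; it covers only thresholding (not the morphological filtering or SVM stages), and assumes integer gray levels in the range 0 to `levels - 1`.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the gray level t that maximizes the between-class variance when
    pixels are split into a background class (<= t) and a foreground class (> t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # summed intensity of those pixels
    best_t, best_score = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Two dark populations (20, 30) and one bright population (220)
print(otsu_threshold([20] * 30 + [30] * 30 + [220] * 40))  # prints 30
```

    A single global threshold of this kind struggles with the low-contrast and clustered-cell cases mentioned above, which is the gap the SVM stage is meant to close.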

  15. Optimizing extreme learning machine for hyperspectral image classification

    NASA Astrophysics Data System (ADS)

    Li, Jiaojiao; Du, Qian; Li, Wei; Li, Yunsong

    2015-01-01

    Extreme learning machine (ELM) is of great interest to the machine learning community due to its extremely simple training step. Its performance sensitivity to the number of hidden neurons is studied in the context of hyperspectral remote sensing image classification. An empirical linear relationship between the number of training samples and the number of hidden neurons is proposed. Such a relationship can be easily estimated with two small training sets and extended to large training sets to greatly reduce computational cost. The kernel version of ELM (KELM) is also implemented with the radial basis function kernel, and the linear relationship still holds. The experimental results demonstrate that when the number of hidden neurons is appropriate, the performance of ELM may be slightly lower than that of the linear SVM, but the performance of KELM can be comparable to that of the kernel version of SVM (KSVM). The computational cost of ELM and KELM is much lower than that of the linear SVM and KSVM, respectively.
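
    The extrapolation idea above, estimating a linear rule from two small training sets and applying it to a larger one, can be sketched in a few lines. This assumes the best hidden-neuron count for each small set has already been found (for example, by a validation sweep); the function name and the tuning numbers in the usage line are hypothetical.

```python
from typing import Callable


def fit_hidden_neuron_rule(n1: int, h1: int, n2: int, h2: int) -> Callable[[int], int]:
    """Fit h = slope * n + intercept through two (training-set size,
    best hidden-neuron count) pairs and return an extrapolating function."""
    slope = (h2 - h1) / (n2 - n1)
    intercept = h1 - slope * n1
    # Round to a whole number of neurons and keep at least one.
    return lambda n: max(1, round(intercept + slope * n))


# Hypothetical tuning results on two small training sets:
# 100 samples -> 30 hidden neurons, 200 samples -> 50 hidden neurons.
rule = fit_hidden_neuron_rule(100, 30, 200, 50)
print(rule(1000))  # prints 210
```

    The saving comes from running the expensive hidden-neuron search only on the two small sets; the large set is then trained once at the extrapolated size.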

  16. Closure modeling using field inversion and machine learning

    NASA Astrophysics Data System (ADS)

    Duraisamy, Karthik

    2015-11-01

    Recent increases in computational power and measurement resolution have made extreme-scale simulations and data sets available. In this work, a modeling paradigm that seeks to comprehensively harness large-scale data is introduced, with the aim of improving closure models. Full-field inversion (in contrast to parameter estimation) is used to obtain corrective, spatially distributed functional terms, offering a route to directly address model-form errors. Once the inference has been performed over a number of problems that are representative of the deficient physics in the closure model, machine learning techniques are used to reconstruct the model corrections in terms of variables that appear in the closure model. These machine-learned functional forms are then used to augment the closure model in predictive computations. The approach is demonstrated to be able to successfully reconstruct functional corrections and yield predictions with quantified uncertainties in a range of turbulent flows.

  17. Robust Extreme Learning Machine With its Application to Indoor Positioning.

    PubMed

    Lu, Xiaoxuan; Zou, Han; Zhou, Hongming; Xie, Lihua; Huang, Guang-Bin

    2016-01-01

    The increasing demand for location-based services has spurred the rapid development of indoor positioning systems (IPSs), also referred to as indoor localization systems. However, the performance of IPSs suffers from noisy measurements. In this paper, two kinds of robust extreme learning machines (RELMs), corresponding to the close-to-mean constraint and the small-residual constraint, are proposed to address the issue of noisy measurements in IPSs. Depending on whether the feature mapping in the extreme learning machine is explicit, we provide random-hidden-nodes and kernelized formulations of RELMs, respectively, by second-order cone programming. Furthermore, the computation of the covariance in feature space is discussed. Simulations and real-world indoor localization experiments are extensively carried out, and the results demonstrate that the proposed algorithms can not only improve the accuracy and repeatability, but also reduce the deviation and worst-case error of IPSs compared with other baseline algorithms. PMID:26684258

  18. Stochastic Local Interaction (SLI) model: Bridging machine learning and geostatistics

    NASA Astrophysics Data System (ADS)

    Hristopulos, Dionissios T.

    2015-12-01

    Machine learning and geostatistics are powerful mathematical frameworks for modeling spatial data. Both approaches, however, suffer from poor scaling of the required computational resources for large data applications. We present the Stochastic Local Interaction (SLI) model, which employs a local representation to improve computational efficiency. SLI combines geostatistics and machine learning with ideas from statistical physics and computational geometry. It is based on a joint probability density function defined by an energy functional which involves local interactions implemented by means of kernel functions with adaptive local kernel bandwidths. SLI is expressed in terms of an explicit, typically sparse, precision (inverse covariance) matrix. This representation leads to a semi-analytical expression for interpolation (prediction), which is valid in any number of dimensions and avoids the computationally costly covariance matrix inversion.
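
    The benefit of working with a sparse precision (inverse covariance) matrix is that a prediction at one site needs only the nonzero entries of that site's row. The sketch below shows the generic Gaussian Markov random field identity for the simplest case where every neighbour of the target site is observed, assuming a constant mean; it is not the SLI energy functional itself, and the function name and example values are hypothetical.

```python
from typing import Dict


def conditional_mean(
    q_row: Dict[int, float], p: int, values: Dict[int, float], mean: float = 0.0
) -> float:
    """Predict x_p from its neighbours using row p of a sparse precision matrix Q.

    For a Gaussian field with constant mean m and precision Q:
        E[x_p | rest] = m - (1 / Q_pp) * sum_{j != p} Q_pj * (x_j - m)
    Only the nonzero entries of row p are visited, so no covariance
    matrix is ever formed or inverted.
    """
    q_pp = q_row[p]
    s = sum(q * (values[j] - mean) for j, q in q_row.items() if j != p)
    return mean - s / q_pp


# Site 2 interacts with sites 0 and 1 only (a sparse row of Q).
print(conditional_mean({0: -1.0, 1: -1.0, 2: 2.0}, p=2, values={0: 1.0, 1: 3.0}))  # prints 2.0
```

    The cost of each prediction scales with the number of local interactions rather than with the total number of data points, which is the kind of scaling the abstract targets.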

  19. Protein function in precision medicine: deep understanding with machine learning.

    PubMed

    Rost, Burkhard; Radivojac, Predrag; Bromberg, Yana

    2016-08-01

    Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both. PMID:27423136

  20. Explanatory approach for evaluation of machine learning-induced knowledge.

    PubMed

    Zorman, Milan; Verlic, M

    2009-01-01

    Progress in biomedical research has resulted in an explosive growth of data. Use of the world wide web for sharing data has opened up possibilities for exhaustive data mining analysis. Symbolic machine learning approaches used in data mining, especially ensemble approaches, produce large sets of patterns that need to be evaluated. Manual evaluation of all patterns by a human expert is almost impossible. We propose a new approach to the evaluation of machine learning-induced knowledge by introducing a pre-evaluation step. Pre-evaluation is the automatic evaluation of patterns obtained from the data mining phase, using text mining techniques and sentiment analysis. It is used as a filter for patterns according to the support found in online resources, such as publicly-available repositories of scientific papers and reports related to the problem. The domain expert can then more easily distinguish between patterns or rules that are potential candidates for new knowledge. PMID:19930862

  1. Prototype Vector Machine for Large Scale Semi-Supervised Learning

    SciTech Connect

    Zhang, Kai; Kwok, James T.; Parvin, Bahram

    2009-04-29

    Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.

  2. Application of machine learning to structural molecular biology.

    PubMed

    Sternberg, M J; King, R D; Lewis, R A; Muggleton, S

    1994-06-29

    A technique of machine learning, inductive logic programming implemented in the program GOLEM, has been applied to three problems in structural molecular biology. These problems are: the prediction of protein secondary structure; the identification of rules governing the arrangement of beta-sheet strands in the tertiary folding of proteins; and the modelling of a quantitative structure activity relationship (QSAR) of a series of drugs. For secondary structure prediction and the QSAR, GOLEM yielded predictions comparable with contemporary approaches including neural networks. Rules for beta-strand arrangement are derived, and it is planned to contrast their accuracy with that of rules obtained by human inspection. In all three studies GOLEM discovered rules that provided insight into the stereochemistry of the system. We conclude that machine learning used together with human intervention will provide a powerful tool to discover patterns in biological sequences and structures. PMID:7800706

  3. Probability and Statistics in Astronomical Machine Learning and Data Mining

    NASA Astrophysics Data System (ADS)

    Scargle, Jeffrey

    2012-03-01

    Statistical issues peculiar to astronomy have implications for machine learning and data mining. It should be obvious that statistics lies at the heart of machine learning and data mining. Further it should be no surprise that the passive observational nature of astronomy, the concomitant lack of sampling control, and the uniqueness of its realm (the whole universe!) lead to some special statistical issues and problems. As described in the Introduction to this volume, data analysis technology is largely keeping up with major advances in astrophysics and cosmology, even driving many of them. And I realize that there are many scientists with good statistical knowledge and instincts, especially in the modern era I like to call the Age of Digital Astronomy. Nevertheless, old impediments still lurk, and the aim of this chapter is to elucidate some of them. Many experiences with smart people doing not-so-smart things (cf. the anecdotes collected in the Appendix here) have convinced me that the cautions given here need to be emphasized. Consider these four points: 1. Data analysis often involves searches of many cases, for example, outcomes of a repeated experiment, for a feature of the data. 2. The feature comprising the goal of such searches may not be defined unambiguously until the search is carried out, or perhaps vaguely even then. 3. The human visual system is very good at recognizing patterns in noisy contexts. 4. People are much easier to convince of something they want to believe, or already believe, as opposed to unpleasant or surprising facts. One can argue that all four are good things during the initial, exploratory phases of most data analysis. They represent the curiosity and creativity of the scientific process, especially during the exploration of data collections from new observational programs such as all-sky surveys in wavelengths not accessed before or sets of images of a planetary surface not yet explored. 
On the other hand, confirmatory scientific

  4. Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality

    PubMed Central

    2016-01-01

    Background: One of the leading causes of death in the United States (US) is suicide, and new methods of assessment are needed to track its risk in real time. Objective: Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population. Methods: Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk. Results: Our findings show that people who are at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which accurately identify the clinically significant suicidal rate in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%). Conclusions: Machine learning algorithms are efficient in differentiating people who are at a suicidal risk from those who are not. Evidence for suicidality can be measured in nonclinical populations using social media data. PMID:27185366
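
    The five figures reported above all derive from a single 2×2 confusion matrix. A minimal sketch of the standard definitions follows; the counts in the usage line are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # at risk, flagged
    fp: int  # not at risk, flagged
    tn: int  # not at risk, not flagged
    fn: int  # at risk, missed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Hypothetical counts for illustration only (not the study's data).
c = Confusion(tp=8, fp=5, tn=85, fn=2)
print(c.sensitivity(), c.accuracy())  # prints 0.8 0.93
```

    Accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, so when at-risk participants are a small minority, accuracy mostly tracks specificity. This is how a 92% accuracy can coexist with a 53% sensitivity.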

  5. Smarter Instruments, Smarter Archives: Machine Learning for Tactical Science

    NASA Astrophysics Data System (ADS)

    Thompson, D. R.; Kiran, R.; Allwood, A.; Altinok, A.; Estlin, T.; Flannery, D.

    2014-12-01

    The Earth and Planetary Sciences have shown growing interest in machine learning, visualization and cyberinfrastructure for interpreting ever-increasing volumes of instrument data. Such tools are commonly used to analyze archival datasets, but they can also play a valuable real-time role during missions. Here we discuss ways that machine learning can benefit tactical science decisions during Earth and Planetary Exploration. Machine learning's potential begins at the instrument itself. Smart instruments endowed with pattern recognition can immediately recognize science features of interest. This allows robotic explorers to optimize their limited communications bandwidth, triaging science products and prioritizing the most relevant data. Smart instruments can also target their data collection on the fly, using principles of experimental design to reduce redundancy and generally improve sampling efficiency for time-limited operations. Moreover, smart instruments can respond immediately to transient or unexpected phenomena, such as cometary plumes, terrestrial floods, or volcanism. We show recent examples of smart instruments from 2014 tests, including aircraft and spacecraft remote sensing instruments that recognize cloud contamination, field tests of a "smart camera" for robotic surface geology, and adaptive data collection by X-ray fluorescence spectrometers. Machine learning can also assist human operators when tactical decisions are required. Terrestrial scenarios include airborne remote sensing, where the decision to re-fly a transect must be made immediately. Planetary scenarios include deep space encounters or planetary surface exploration, where the number of command cycles is limited and operators make rapid daily decisions about where to collect measurements next. Visualization and modeling can reveal trends, clusters, and outliers in new data. This can help operators recognize instrument artifacts or spot anomalies in real time

  6. Machine-learning-assisted materials discovery using failed experiments

    NASA Astrophysics Data System (ADS)

    Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.; Falk, Casey; Wenny, Malia B.; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A.; Schrier, Joshua; Norquist, Alexander J.

    2016-05-01

    Inorganic–organic hybrid materials such as organically templated metal oxides, metal–organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure–property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully

  7. Machine-learning-assisted materials discovery using failed experiments.

    PubMed

    Raccuglia, Paul; Elbert, Katherine C; Adler, Philip D F; Falk, Casey; Wenny, Malia B; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A; Schrier, Joshua; Norquist, Alexander J

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on 'dark' reactions--failed or unsuccessful hydrothermal syntheses--collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions

  8. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W.

    1992-01-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.
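
The rule-to-network step of KBANN can be sketched as follows. The weight/bias recipe used here (weight ω on each positive antecedent, −ω on each negated antecedent, and a bias that puts the threshold at (P − ½)ω for P positive antecedents) follows the usual description of KBANN's conjunctive-rule mapping. The value ω = 4, the "TATA" consensus and the one-hot input names are illustrative assumptions, and the subsequent backpropagation refinement on training examples is not shown.

```python
import math

OMEGA = 4.0  # assumed link weight for rule antecedents


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conjunctive_unit(positive, negated, omega=OMEGA):
    """Map 'consequent :- positive..., not negated...' onto one sigmoid unit.

    The bias sets the threshold at (P - 1/2) * omega, so the unit is active
    (output > 0.5) only if every positive antecedent is true and every
    negated antecedent is false.
    """
    weights = {name: omega for name in positive}
    weights.update({name: -omega for name in negated})
    bias = -(len(positive) - 0.5) * omega
    return weights, bias


def activate(weights, bias, inputs):
    net = bias + sum(w * inputs.get(name, 0.0) for name, w in weights.items())
    return sigmoid(net)


def encode(seq: str) -> dict:
    """One-hot features: 'p{i}={base}' is 1.0 when seq[i] == base (hypothetical naming)."""
    return {f"p{i}={base}": 1.0 for i, base in enumerate(seq)}


# Hypothetical consensus "TATA" starting at position 0 becomes one conjunctive rule
consensus = "TATA"
W, B = conjunctive_unit([f"p{i}={b}" for i, b in enumerate(consensus)], negated=[])

match = activate(W, B, encode("TATAGC"))     # all four antecedents true
mismatch = activate(W, B, encode("TACAGC"))  # one antecedent false
```

With all four antecedents true the net input is −14 + 16 = 2 (output ≈ 0.88); with one missing it is −2 (output ≈ 0.12), so the initial network already encodes the rule before any training adjusts the weights.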

  9. A new machine learning algorithm for removal of salt and pepper noise

    NASA Astrophysics Data System (ADS)

    Wang, Yi; Adhami, Reza; Fu, Jian

    2015-07-01

    Supervised machine learning algorithms have been extensively studied and applied to different fields of image processing in past decades. This paper proposes a new machine learning algorithm, called margin setting (MS), for restoring images that are corrupted by salt and pepper impulse noise. Margin setting generates a decision surface to classify noise pixels and non-noise pixels. After the noise pixels are detected, a modified ranked order mean (ROM) filter is used to replace the corrupted pixels for image reconstruction. The margin setting algorithm is tested with grayscale and color images at different noise densities. The experimental results are compared with those of the support vector machine (SVM) and the standard median filter (SMF). The results show that margin setting outperforms these methods, with higher Peak Signal-to-Noise Ratio (PSNR), lower mean square error (MSE), higher image enhancement factor (IEF) and higher Structural Similarity Index (SSIM).
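
The detect-then-replace pipeline can be sketched as below. This is not margin setting: as a swapped-in, much simpler detector it flags pixels sitting at the extreme intensities 0 or 255 (a common heuristic for fixed-value impulse noise). The replacement step, the mean of the uncorrupted neighbours in a 3×3 window, is an assumed simplification of a ranked-order-mean filter. PSNR uses the standard 10·log10(peak²/MSE) definition.

```python
import math


def detect_impulses(img, low=0, high=255):
    """Flag extreme-valued pixels as noise (stand-in for the MS classifier)."""
    return [[p == low or p == high for p in row] for row in img]


def restore(img, noisy_mask, radius=1):
    """Replace each flagged pixel with the mean of unflagged pixels in its window."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not noisy_mask[y][x]:
                continue
            clean = [img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))
                     if not noisy_mask[j][i]]
            if clean:  # leave the pixel as-is if its whole window is corrupted
                out[y][x] = round(sum(clean) / len(clean))
    return out


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = sum((a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb))
    mse /= len(ref) * len(ref[0])
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


clean_img = [[100, 102, 104], [101, 103, 105], [102, 104, 106]]
noisy_img = [[100, 255, 104], [101, 0, 105], [102, 104, 106]]
restored = restore(noisy_img, detect_impulses(noisy_img))
```

Only flagged pixels are touched, which is the point of detecting noise before filtering: a plain median filter would also alter the uncorrupted pixels.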

  10. A software framework for building biomedical machine learning classifiers through grid computing resources.

    PubMed

    Ramos-Pollán, Raúl; Guevara-López, Miguel Angel; Oliveira, Eugénio

    2012-08-01

    This paper describes the BiomedTK software framework, created to perform massive explorations of machine learning classifier configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the complete classifier construction process and enables explorations of large parameter sweeps for training third-party classifiers such as artificial neural networks and support vector machines, offering the capability to harness the vast amount of computing power provided by Grid infrastructures. In addition, it includes classifiers modified by the authors for ROC optimization and functionality to build ensemble classifiers and manipulate datasets (import/export, extract and transform data, etc.). BiomedTK was experimentally validated by training thousands of classifier configurations on representative biomedical UCI datasets, reaching, in little time, classification levels comparable to those reported in the existing literature. The comprehensive method presented here represents an improvement to biomedical data analysis in both methodology and potential reach of machine learning based experimentation. PMID:21479625
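
The ROC-driven selection step of such a parameter sweep can be illustrated as below. This is not BiomedTK's API, which the abstract does not describe; the configuration names and scores are hypothetical. Each configuration is ranked by its area under the ROC curve, computed in the rank (Mann–Whitney) form: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical sweep: two classifier configurations scored on the same validation labels
labels = [1, 1, 0, 1, 0, 0]
configs = {
    "svm_C1": [0.9, 0.8, 0.3, 0.6, 0.4, 0.2],
    "ann_h5": [0.7, 0.4, 0.5, 0.9, 0.6, 0.1],
}
aucs = {name: roc_auc(scores, labels) for name, scores in configs.items()}
best = max(aucs, key=aucs.get)
```

The pairwise loop is O(n_pos · n_neg), which is fine for a sketch; a sort-based rank computation would scale better across thousands of configurations.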

  11. Parsimonious extreme learning machine using recursive orthogonal least squares.

    PubMed

    Wang, Ning; Er, Meng Joo; Han, Min

    2014-10-01

    Novel constructive and destructive parsimonious extreme learning machines (CP- and DP-ELM) are proposed in this paper. By virtue of the proposed ELMs, parsimonious structure and excellent generalization of multiinput-multioutput single hidden-layer feedforward networks (SLFNs) are obtained. The proposed ELMs are developed by innovative decomposition of the recursive orthogonal least squares procedure into sequential partial orthogonalization (SPO). The salient features of the proposed approaches are as follows: 1) initial hidden nodes are randomly generated by the ELM methodology and recursively orthogonalized into an upper triangular matrix with a dramatic reduction in matrix size; 2) the constructive SPO in the CP-ELM focuses on the partial matrix with the subcolumn of the selected regressor including nonzeros as the first column, while the destructive SPO in the DP-ELM operates on the partial matrix including elements determined by the removed regressor; 3) termination criteria for CP- and DP-ELM are simplified by the additional residual error reduction method; and 4) the output weights of the SLFN need not be solved in the model selection procedure and are derived from the final upper triangular equation by backward substitution. Both single- and multi-output real-world regression data sets are used to verify the effectiveness and superiority of the CP- and DP-ELM in terms of parsimonious architecture and generalization accuracy. Innovative applications to nonlinear time-series modeling demonstrate superior identification results. PMID:25291736
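
The sketch below covers only the plain ELM core that both variants build on. It uses random, untrained sigmoid hidden nodes, an orthogonal (QR) decomposition of the hidden-layer output matrix H by modified Gram–Schmidt, and output weights obtained from the upper-triangular system R·β = Qᵀy by back substitution. The constructive and destructive node selection (SPO) of CP- and DP-ELM is not reproduced. The weight ranges, the six hidden nodes and the sin(3x) target are arbitrary assumptions.

```python
import math
import random


def hidden_layer(X, W, b):
    """Sigmoid outputs of the random hidden nodes for every input row (N x L matrix H)."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(W[j], row)) + b[j])))
             for j in range(len(W))] for row in X]


def qr_mgs(H):
    """Modified Gram-Schmidt: H = Q R, Q kept as orthonormal columns, R upper triangular."""
    m = len(H[0])
    cols = [[row[j] for row in H] for j in range(m)]
    Q, R = [], [[0.0] * m for _ in range(m)]
    for j in range(m):
        v = cols[j][:]
        for k in range(j):
            R[k][j] = sum(q * vi for q, vi in zip(Q[k], v))
            v = [vi - R[k][j] * q for vi, q in zip(v, Q[k])]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        Q.append([vi / R[j][j] for vi in v])
    return Q, R


def back_substitute(R, z):
    """Solve the upper-triangular system R beta = z."""
    m = len(z)
    beta = [0.0] * m
    for i in range(m - 1, -1, -1):
        beta[i] = (z[i] - sum(R[i][k] * beta[k] for k in range(i + 1, m))) / R[i][i]
    return beta


def elm_fit(X, y, n_hidden=6, seed=0):
    rng = random.Random(seed)
    W = [[rng.uniform(-5, 5) for _ in X[0]] for _ in range(n_hidden)]  # never trained
    b = [rng.uniform(-5, 5) for _ in range(n_hidden)]
    Q, R = qr_mgs(hidden_layer(X, W, b))
    z = [sum(q * yi for q, yi in zip(Q[j], y)) for j in range(n_hidden)]  # Q^T y
    return W, b, back_substitute(R, z)


def elm_predict(X, W, b, beta):
    return [sum(h * bj for h, bj in zip(row, beta)) for row in hidden_layer(X, W, b)]


X = [[i / 20] for i in range(21)]
y = [math.sin(3 * row[0]) for row in X]
W, b, beta = elm_fit(X, y)
fitted = elm_predict(X, W, b, beta)
```

Because only β is solved for, training reduces to one linear least-squares problem. Smooth sigmoid columns can be nearly collinear, so keeping the hidden layer small (the parsimony the abstract targets) also helps numerical conditioning.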

  12. Characterization of decohering quantum systems: Machine learning approach

    NASA Astrophysics Data System (ADS)

    Stenberg, Markku P. V.; Köhn, Oliver; Wilhelm, Frank K.

    2016-01-01

    Adaptive data collection and analysis, where data are fed back to update the measurement settings, can greatly increase the speed, precision, and reliability of the characterization of quantum systems. However, decoherence tends to make adaptive characterization difficult. As an example, we consider two coupled discrete quantum systems. When one of the systems can be controlled and measured, the standard method to characterize the other, with an unknown frequency ω_r, is swap spectroscopy. Here, adaptive measurements can provide estimates whose error decreases exponentially in the number of measurement shots, rather than as a power law as in conventional swap spectroscopy. However, when the decoherence time is so short that an excitation oscillating between the two systems can undergo fewer than a few tens of vacuum Rabi oscillations, this approach can suffer a severe limit on accuracy unless carefully designed. We adopt machine learning techniques to search for efficient policies for the characterization of decohering quantum systems. We find, for instance, that when the system undergoes more than 2 Rabi oscillations during its relaxation time T_1, O(10^3) measurement shots are sufficient to reduce the squared error of the Bayesian initial prior of the unknown frequency ω_r by a factor of O(10^4) or larger. We also develop policies optimized for extreme initial parameter uncertainty and for the presence of imperfections in the readout.

  13. Drug repositioning: a machine-learning approach through data integration.

    PubMed

    Napolitano, Francesco; Zhao, Yan; Moreira, Vânia M; Tagliaferri, Roberto; Kere, Juha; D'Amato, Mauro; Greco, Dario

    2013-01-01

    Existing computational methods for drug repositioning either rely only on the gene expression response of cell lines after treatment, or on drug-to-disease relationships, merging several information levels. However, the noisy nature of the gene expression and the scarcity of genomic data for many diseases are important limitations to such approaches. Here we focused on a drug-centered approach by predicting the therapeutic class of FDA-approved compounds, not considering data concerning the diseases. We propose a novel computational approach to predict drug repositioning based on state-of-the-art machine-learning algorithms. We have integrated multiple layers of information: i) the distances between drugs based on how similar their chemical structures are, ii) how close their targets are within the protein-protein interaction network, and iii) how correlated their gene expression patterns are after treatment. Our classifier reaches high accuracy levels (78%), allowing us to re-interpret the top misclassifications as re-classifications, after rigorous statistical evaluation. Efficient drug repurposing has the potential to significantly impact the whole field of drug development. The results presented here can significantly accelerate the translation into the clinics of known compounds for novel therapeutic uses. PMID:23800010

  14. Drug repositioning: a machine-learning approach through data integration

    PubMed Central

    2013-01-01

    Existing computational methods for drug repositioning either rely only on the gene expression response of cell lines after treatment, or on drug-to-disease relationships, merging several information levels. However, the noisy nature of the gene expression and the scarcity of genomic data for many diseases are important limitations to such approaches. Here we focused on a drug-centered approach by predicting the therapeutic class of FDA-approved compounds, not considering data concerning the diseases. We propose a novel computational approach to predict drug repositioning based on state-of-the-art machine-learning algorithms. We have integrated multiple layers of information: i) the distances between drugs based on how similar their chemical structures are, ii) how close their targets are within the protein-protein interaction network, and iii) how correlated their gene expression patterns are after treatment. Our classifier reaches high accuracy levels (78%), allowing us to re-interpret the top misclassifications as re-classifications, after rigorous statistical evaluation. Efficient drug repurposing has the potential to significantly impact the whole field of drug development. The results presented here can significantly accelerate the translation into the clinics of known compounds for novel therapeutic uses. PMID:23800010

  15. Neural Network Machine Learning and Dimension Reduction for Data Visualization

    NASA Technical Reports Server (NTRS)

    Liles, Charles A.

    2014-01-01

    Neural network machine learning in computer science is a continuously developing field of study. Although neural network models have been developed that can accurately predict a numeric value or nominal classification, a general-purpose method for constructing neural network architecture has yet to be developed. Computer scientists are often forced to rely on a trial-and-error process of developing and improving accurate neural network models. In many cases, models are constructed from a large number of input parameters. Determining which input parameters have the greatest impact on the model's prediction is often difficult, especially when the number of input variables is very high. This challenge is often labeled the "curse of dimensionality" in scientific fields. However, techniques exist for reducing the dimensionality of problems to just two dimensions. Once a problem's dimensions have been mapped to two dimensions, it can be easily plotted and understood by humans. The ability to visualize a multi-dimensional dataset can provide a means of identifying which input variables have the greatest effect on determining a nominal or numeric output. Identifying these variables can provide a better means of training neural network models; models can be more easily and quickly trained using only the input variables that appear to affect the outcome variable. The purpose of this project is to explore varying means of training neural networks and to use dimensionality reduction for visualizing and understanding complex datasets.

  16. Automatic Quality Inspection of Percussion Cap Mass Production by Means of 3D Machine Vision and Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Tellaeche, A.; Arana, R.; Ibarguren, A.; Martínez-Otzeta, J. M.

    Exhaustive quality control is becoming very important in today's globalized market. One example where quality control becomes critical is the mass production of percussion caps, which must be fabricated with minimal tolerance deviation. This paper outlines a machine vision system that uses a 3D camera to inspect the entire production of percussion caps. The system faces multiple problems, such as metallic reflections from the percussion caps, high-speed movement of the system, and mechanical errors and irregularities in percussion cap placement. Because of these problems, the task cannot be solved with traditional image processing methods; hence, machine learning algorithms have been tested to provide a feasible classification of the possible errors present in the percussion caps.

  17. Robust airway extraction based on machine learning and minimum spanning tree

    NASA Astrophysics Data System (ADS)

    Inoue, Tsutomu; Kitamura, Yoshiro; Li, Yuanzhong; Ito, Wataru

    2013-02-01

    Recent advances in MDCT have improved the quality of 3D images. Virtual Bronchoscopy has been used before and during bronchoscopic examination for biopsy. However, Virtual Bronchoscopy has become widely used only for the examination of proximal airway diseases, because conventional airway extraction methods often fail to extract peripheral airways with low image contrast. In this paper, we propose a machine learning based method that markedly improves extraction robustness. The method consists of 4 steps. In the first step, we use Hessian analysis to detect as many airway candidates as possible. In the second step, false positives are effectively reduced by introducing a machine learning method. In the third step, an airway tree is constructed from the airway candidates using a minimum spanning tree algorithm. In the fourth step, we extract airway regions using Graph cuts. Experimental results evaluated with a standardized evaluation framework show that our method can extract peripheral airways very well.
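
The third step, linking the surviving airway candidates into a tree, can be sketched with Prim's minimum spanning tree algorithm. The abstract does not say how edges are weighted. This sketch assumes plain Euclidean distance between candidate centres and roots the tree at candidate 0 (imagined as a seed in the trachea). The coordinates are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph with Euclidean edge weights.

    Returns (parent, child) index pairs; O(n^2), adequate for a sketch.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge joining each node to the tree
    parent = [-1] * n
    best[0] = 0.0          # root the tree at candidate 0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


# Hypothetical airway candidate centres (x, y, z) in millimetres
candidates = [(0, 0, 0), (0, 0, 10), (5, 0, 18), (-5, 0, 18), (8, 0, 25)]
tree = minimum_spanning_tree(candidates)
```

A spanning tree connects every candidate without cycles, which matches the branching topology of an airway. A real system could also fold the classifier's confidence into the edge cost so that doubtful candidates are attached last.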

  18. Experiments with encapsulation of Monte Carlo simulation results in machine learning models

    NASA Astrophysics Data System (ADS)

    Lal Shrestha, Durga; Kayastha, Nagendra; Solomatine, Dimitri

    2010-05-01

    Uncertainty analysis techniques based on Monte Carlo (MC) simulation have been applied successfully in the hydrological sciences over the last decades. They allow quantification of the model output uncertainty resulting from uncertain model parameters, input data or model structure. They are very flexible, conceptually simple and straightforward, but become impractical in real-time applications for complex models, when there is little time to perform the uncertainty analysis because of the large number of model runs required. A number of new methods have been developed to improve the efficiency of Monte Carlo methods, but these methods still require a considerable number of model runs in both offline and operational mode to produce reliable and meaningful uncertainty estimates. This paper presents experiments with machine learning techniques used to encapsulate the results of MC runs. A version of the MC simulation method, the generalised likelihood uncertainty estimation (GLUE) method, is first used to assess the parameter uncertainty of the conceptual rainfall-runoff model HBV. Then three machine learning methods, namely artificial neural networks, M5 model trees and locally weighted regression, are trained to encapsulate the uncertainty estimated by the GLUE method using the historical input data. The trained machine learning models are then employed to predict the uncertainty of the model output for new input data. This method has been applied to two contrasting catchments: the Brue catchment (United Kingdom) and the Bagamati catchment (Nepal). The experimental results demonstrate that the machine learning methods are reasonably accurate in approximating the uncertainty estimated by GLUE. The main advantage of the proposed method is its efficiency in reproducing the MC-based simulation results; it can thus be an effective tool for assessing the uncertainty of flood forecasting in real time.
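
The GLUE step that the machine learning models are trained to encapsulate can be sketched as follows. Every modelling choice here is an assumption standing in for the paper's setup. A one-parameter linear toy model replaces HBV. Parameters come from a uniform prior. The Nash–Sutcliffe efficiency serves as the informal likelihood, 0.5 is the behavioural threshold, and the 5–95% likelihood-weighted quantiles form the uncertainty bounds. The rainfall and runoff numbers are invented.

```python
import random


def toy_model(a, rain):
    """Hypothetical one-parameter stand-in for a rainfall-runoff model (NOT HBV)."""
    return [a * r for r in rain]


def nse(sim, obs):
    """Nash-Sutcliffe efficiency, used here as the informal GLUE likelihood."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= q * total:
            return v
    return pairs[-1][0]


def glue_bounds(rain, obs, new_rain, n_samples=2000, threshold=0.5, seed=1):
    rng = random.Random(seed)
    behavioural = []
    for _ in range(n_samples):
        a = rng.uniform(0.0, 1.0)              # Monte Carlo draw from a uniform prior
        score = nse(toy_model(a, rain), obs)
        if score > threshold:                  # keep only "behavioural" parameter sets
            behavioural.append((a, score))
    if not behavioural:
        raise ValueError("no behavioural parameter sets; lower the threshold")
    weights = [score for _, score in behavioural]
    bounds = []
    for r in new_rain:
        preds = [toy_model(a, [r])[0] for a, _ in behavioural]
        bounds.append((weighted_quantile(preds, weights, 0.05),
                       weighted_quantile(preds, weights, 0.95)))
    return bounds


rain = [0, 5, 12, 3, 8, 0, 15]
obs = [0.1, 2.1, 4.6, 1.3, 3.3, 0.0, 6.1]
bounds = glue_bounds(rain, obs, new_rain=[10.0, 4.0])
```

The encapsulation described in the abstract would then fit a regressor (an ANN, M5 model tree or locally weighted regression) from input data to bounds like these, so that new bounds can be predicted without rerunning the Monte Carlo loop. That step is not shown.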

  19. Method of fabricating a micro machine

    DOEpatents

    Stalford, Harold L

    2014-11-11

    A micro machine may be in or less than the micrometer domain. The micro machine may include a micro actuator and a micro shaft coupled to the micro actuator. The micro shaft is operable to be driven by the micro actuator. A tool is coupled to the micro shaft and is operable to perform work in response to at least motion of the micro shaft.

  20. Some Principles of Learning and Learning with the Aid of Machines.

    ERIC Educational Resources Information Center

    Dolyatovskii, V. A.; Sotnikov, E. M.

    A translated Soviet document describes some theories of learning, and the practical problems of developing a teaching machine--as taught in an Industrial Electronics course (in the automation and telemechanics curriculum). The point is stressed that the growing number of students at institutions of higher learning in the Soviet Union, up forty…

  1. Dynamic probability estimator for machine learning.

    PubMed

    Starzyk, Janusz A; Wang, Feng

    2004-03-01

    An efficient algorithm for dynamic estimation of probabilities without division, on an unlimited number of input data, is presented. The method estimates probabilities of the sampled data from the raw sample count, while keeping the total count value constant. The accuracy of the estimate depends on the counter size rather than on the total number of data points. The estimator follows variations of the incoming data probability within a fixed window size, without explicit implementation of the windowing technique. The total design area is very small, and all probabilities are estimated concurrently. The dynamic probability estimator was implemented using a Xilinx programmable gate array. The performance of this implementation is evaluated in terms of area efficiency and execution time. This method is suitable for the highly integrated design of artificial neural networks, where a large number of dynamic probability estimators can work concurrently. PMID:15384523
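
The abstract does not spell out the hardware update rule, so the following is only a software illustration of two properties it states: the total count stays constant, and the estimate tracks drifting probabilities without an explicit window. It uses a random-replacement scheme chosen for this sketch, not the authors' circuit. Each new sample increments its own counter and decrements a counter picked in proportion to the current counts.

```python
import random


class ConstantTotalEstimator:
    """Integer counters whose sum stays fixed at `total` (illustrative scheme).

    Old samples fade out through random replacement instead of a sliding
    window. Probabilities are count / total; with a power-of-two total this
    readout is a bit shift rather than a general division.
    """

    def __init__(self, n_symbols: int, total: int = 1024, seed: int = 0):
        if total % n_symbols:
            raise ValueError("total must be divisible by n_symbols for a uniform start")
        self.total = total
        self.counts = [total // n_symbols] * n_symbols
        self.rng = random.Random(seed)

    def update(self, symbol: int) -> None:
        # Pick one stored "sample" uniformly, i.e. a counter with prob. count/total
        r = self.rng.randrange(self.total)
        victim = 0
        while r >= self.counts[victim]:
            r -= self.counts[victim]
            victim += 1
        self.counts[victim] -= 1
        self.counts[symbol] += 1

    def probability(self, symbol: int) -> float:
        return self.counts[symbol] / self.total


est = ConstantTotalEstimator(n_symbols=2, total=1024, seed=0)
source = random.Random(42)
for _ in range(10000):
    est.update(0 if source.random() < 0.7 else 1)  # stationary source, p(0) = 0.7
```

For a stationary source with probability p, this update is a birth-death chain whose stationary distribution is Binomial(T, p). The ratio count/T therefore has mean p and standard deviation sqrt(p(1 − p)/T). In line with the abstract, accuracy is set by the counter size T rather than by the number of samples seen.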

  2. A Multianalyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping

    PubMed Central

    Yan, Wang; Jiajin, Le; Yun, Zhang

    2014-01-01

    The main challenge facing marine heterogeneous data integration is accurate schema mapping between heterogeneous data sources. To improve schema mapping efficiency and obtain more accurate learning results, this paper proposes a heterogeneous data schema mapping method based on a multianalyzer machine learning model. The multianalyzer analyzes the learning results comprehensively, and a fuzzy comprehensive evaluation system is introduced for evaluating the output results and for multi-factor quantitative judgment. Finally, a data mapping comparison experiment on East China Sea observation data confirms the effectiveness of the model and shows that the multianalyzer clearly reduces the mapping error rate. PMID:25250372
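
A fuzzy comprehensive evaluation can be sketched as a weighted combination of a membership matrix: each row gives one factor's membership degrees over the evaluation grades, and the factor weights combine the rows into a single grade vector. The abstract does not state which composition operator is used. This sketch assumes the common weighted-average form b_j = Σ_i w_i · r_ij. The factors, grades and numbers are hypothetical.

```python
def fuzzy_evaluate(weights, membership):
    """Weighted-average fuzzy comprehensive evaluation: b_j = sum_i w_i * r_ij."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("factor weights must sum to 1")
    n_grades = len(membership[0])
    return [sum(w * row[j] for w, row in zip(weights, membership)) for j in range(n_grades)]


# Hypothetical analyzers judging one candidate schema match on grades (good, fair, poor)
factor_weights = [0.5, 0.3, 0.2]  # e.g. name, data-type and instance analyzers
R = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.3, 0.5],
]
grades = fuzzy_evaluate(factor_weights, R)
verdict = ["good", "fair", "poor"][grades.index(max(grades))]
```

Here the grade vector is (0.54, 0.28, 0.18), so the maximum-membership rule labels the match "good" even though one analyzer leaned towards "poor".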

  3. A multianalyzer machine learning model for marine heterogeneous data schema mapping.

    PubMed

    Yan, Wang; Jiajin, Le; Yun, Zhang

    2014-01-01

    The main challenge facing marine heterogeneous data integration is accurate schema mapping between heterogeneous data sources. To improve schema mapping efficiency and obtain more accurate learning results, this paper proposes a heterogeneous data schema mapping method based on a multianalyzer machine learning model. The multianalyzer analyzes the learning results comprehensively, and a fuzzy comprehensive evaluation system is introduced for evaluating the output results and for multi-factor quantitative judgment. Finally, a data mapping comparison experiment on East China Sea observation data confirms the effectiveness of the model and shows that the multianalyzer clearly reduces the mapping error rate. PMID:25250372

  4. Machine learning approach for the outcome prediction of temporal lobe epilepsy surgery.

    PubMed

    Armañanzas, Rubén; Alonso-Nanclares, Lidia; Defelipe-Oroquieta, Jesús; Kastanauskaite, Asta; de Sola, Rafael G; Defelipe, Javier; Bielza, Concha; Larrañaga, Pedro

    2013-01-01

    Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE). Nevertheless, a significant proportion of these patients continue to suffer seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining, taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose including these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery. PMID:23646148

  5. Machine Learning Approach for the Outcome Prediction of Temporal Lobe Epilepsy Surgery

    PubMed Central

    DeFelipe-Oroquieta, Jesús; Kastanauskaite, Asta; de Sola, Rafael G.; DeFelipe, Javier; Bielza, Concha; Larrañaga, Pedro

    2013-01-01

    Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE). Nevertheless, a significant proportion of these patients continue to suffer seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining, taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose including these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery. PMID:23646148

  6. Mining the Galaxy Zoo Database: Machine Learning Applications

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.

    2010-01-01

    The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we also aim to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunteer-provided classifications). Our second case study will focus on similar data mining analyses and machine learning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years; the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.
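
One way to quantify the "large dispersion in volunteer-provided classifications" used to flag hard galaxies is the Shannon entropy of each galaxy's vote distribution. The abstract does not name its dispersion measure, so entropy is an assumption here, as are the galaxy IDs and vote counts.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy (bits) of the class distribution in a list of volunteer votes."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical vote tallies for two galaxies
galaxies = {
    "gal_001": ["spiral"] * 38 + ["elliptical"] * 2,
    "gal_002": ["spiral"] * 14 + ["elliptical"] * 13 + ["merger"] * 13,
}
hardest = max(galaxies, key=lambda g: vote_entropy(galaxies[g]))
```

Entropy is 0 when every volunteer agrees and reaches log2(k) bits when the votes are spread evenly over k classes, so sorting by it surfaces the galaxies volunteers found hardest.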

  7. Potential application of machine learning in health outcomes research and some statistical cautions.

    PubMed

    Crown, William H

    2015-03-01

    Traditional analytic methods are often ill-suited to the evolving world of health care big data characterized by massive volume, complexity, and velocity. In particular, methods are needed that can estimate models efficiently using very large datasets containing healthcare utilization data, clinical data, data from personal devices, and many other sources. Although very large, such datasets can also be quite sparse (e.g., device data may only be available for a small subset of individuals), which creates problems for traditional regression models. Many machine learning methods address such limitations effectively but are still subject to the usual sources of bias that commonly arise in observational studies. Researchers using machine learning methods such as lasso or ridge regression should assess these models using conventional specification tests. PMID:25773546

  8. Equivalence between learning in noisy perceptrons and tree committee machines

    NASA Astrophysics Data System (ADS)

    Copelli, Mauro; Kinouchi, Osame; Caticha, Nestor

    1996-06-01

    We study learning from a single presentation of examples (on-line learning) in single-layer perceptrons and tree committee machines (TCMs). Lower bounds for the perceptron generalization error as a function of the noise level ɛ in the teacher output are calculated. We find that local learning in a TCM with K hidden units is simply related to learning in a simple perceptron with a corresponding noise level ɛ(K). For a large number of examples and finite K, the generalization error decays as α_CM^(-1), where α_CM is the number of examples per adjustable weight in the TCM. We also show that on-line learning is possible even in the K → ∞ limit, but with the generalization error decaying as α_CM^(-1/2). The simple Hebb rule can also be applied to the TCM, but now the error decays as α_CM^(-1/2) for finite K and α_CM^(-1/4) for K → ∞. Exponential decay of the generalization error in both noisy perceptron learning and the TCM is obtained by using the learning-by-queries strategy.
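
The perceptron half of this setting can be simulated in a few lines. A teacher perceptron labels Gaussian inputs, each label is flipped with probability ɛ (output noise), and the student learns on-line with the Hebb rule, seeing each example once. For isotropic inputs the generalization error is arccos(R)/π, where R is the normalized student-teacher overlap. The sizes, noise level and seeds are arbitrary assumptions. Tree committee machines, the optimal algorithms analysed in the paper and learning by queries are not covered.

```python
import math
import random


def sign(x: float) -> int:
    return 1 if x >= 0 else -1


def hebb_online(n_inputs=50, alpha=40, noise=0.1, seed=0):
    """On-line Hebb learning from a noisy teacher; returns arccos(R) / pi.

    alpha is the number of examples per weight, so alpha * n_inputs examples
    are presented, each exactly once.
    """
    rng = random.Random(seed)
    teacher = [rng.gauss(0, 1) for _ in range(n_inputs)]
    student = [0.0] * n_inputs
    for _ in range(int(alpha * n_inputs)):
        x = [rng.gauss(0, 1) for _ in range(n_inputs)]
        label = sign(sum(t * xi for t, xi in zip(teacher, x)))
        if rng.random() < noise:
            label = -label  # output noise in the teacher
        # Hebb rule: move the student along label * input
        student = [s + label * xi / math.sqrt(n_inputs) for s, xi in zip(student, x)]
    overlap = sum(s * t for s, t in zip(student, teacher))
    norm = math.sqrt(sum(s * s for s in student) * sum(t * t for t in teacher))
    R = max(-1.0, min(1.0, overlap / norm))
    return math.acos(R) / math.pi


errors = [hebb_online(alpha=a) for a in (1, 40)]
```

Sweeping alpha over a wider range and fitting a slope on a log-log plot is a quick way to compare a simulated decay rate against the analytical exponents quoted above, bearing in mind finite-size fluctuations at small n_inputs.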

  9. Method for measuring the contour of a machined part

    DOEpatents

    Bieg, L.F.

    1995-05-30

    A method is disclosed for measuring the contour of a machined part with a contour gage apparatus, having a probe assembly including a probe tip for providing a measure of linear displacement of the tip on the surface of the part. The contour gage apparatus may be moved into and out of position for measuring the part while the part is still carried on the machining apparatus. Relative positions between the part and the probe tip may be changed, and a scanning operation is performed on the machined part by sweeping the part with the probe tip, whereby data points representing linear positions of the probe tip at prescribed rotation intervals in the position changes between the part and the probe tip are recorded. The method further allows real-time adjustment of the apparatus machining the part, including real-time adjustment of the machining apparatus in response to wear of the tool that occurs during machining. 5 figs.

  10. Method for measuring the contour of a machined part

    DOEpatents

    Bieg, Lothar F.

    1995-05-30

    A method for measuring the contour of a machined part with a contour gage apparatus, having a probe assembly including a probe tip for providing a measure of linear displacement of the tip on the surface of the part. The contour gage apparatus may be moved into and out of position for measuring the part while the part is still carried on the machining apparatus. Relative positions between the part and the probe tip may be changed, and a scanning operation is performed on the machined part by sweeping the part with the probe tip, whereby data points representing linear positions of the probe tip at prescribed rotation intervals in the position changes between the part and the probe tip are recorded. The method further allows real-time adjustment of the apparatus machining the part, including real-time adjustment of the machining apparatus in response to wear of the tool that occurs during machining.

  11. Methods and apparatus for controlling rotary machines

    DOEpatents

    Bagepalli, Bharat Sampathkumaran; Jansen, Patrick Lee; Barnes, Gary R.; Fric, Thomas Frank; Lyons, James Patrick Francis; Pierce, Kirk Gee; Holley, William Edwin; Barbu, Corneliu

    2009-09-01

    A control system for a rotary machine is provided. The rotary machine has at least one rotating member and at least one substantially stationary member positioned such that a clearance gap is defined between a portion of the rotating member and a portion of the substantially stationary member. The control system includes at least one clearance gap dimension measurement apparatus and at least one clearance gap adjustment assembly. The adjustment assembly is coupled in electronic data communication with the measurement apparatus. The control system is configured to process a clearance gap dimension signal and modulate the clearance gap dimension.

  12. Identifying hosts of families of viruses: a machine learning approach.

    PubMed

    Raj, Anil; Dewar, Michael; Palacios, Gustavo; Rabadan, Raul; Wiggins, Christopher H

    2011-01-01

    Identifying emerging viral pathogens and characterizing their transmission is essential to developing effective public health measures in response to an epidemic. Phylogenetics, though currently the most popular tool used to characterize the likely host of a virus, can be ambiguous when studying species very distant to known species and when there is very little reliable sequence information available in the early stages of the outbreak of disease. Motivated by an existing framework for representing biological sequence information, we learn sparse, tree-structured models, built from decision rules based on subsequences, to predict viral hosts from protein sequence data using popular discriminative machine learning tools. Furthermore, the predictive motifs robustly selected by the learning algorithm are found to show strong host-specificity and occur in highly conserved regions of the viral proteome. PMID:22174744

  13. Predicting outcome in clinically isolated syndrome using machine learning

    PubMed Central

    Wottschel, V.; Alexander, D.C.; Kwok, P.P.; Chard, D.T.; Stromillo, M.L.; De Stefano, N.; Thompson, A.J.; Miller, D.H.; Ciccarelli, O.

    2014-01-01

    We aim to determine whether machine learning techniques, such as support vector machines (SVMs), can predict the occurrence of a second clinical attack, which leads to the diagnosis of clinically definite multiple sclerosis (CDMS), in patients with a clinically isolated syndrome (CIS), on the basis of each patient's lesion features and clinical/demographic characteristics. Seventy-four patients at onset of CIS were scanned and clinically reviewed after one and three years. CDMS was used as the gold standard against which SVM classification accuracy was tested. Radiological features related to lesion characteristics on conventional MRI were defined a priori and used in combination with clinical/demographic features in an SVM. Forward recursive feature elimination with 100 bootstraps and leave-one-out cross-validation was used to find the most predictive feature combinations. Within one and three years, 30% and 44% of patients developed CDMS, respectively. The SVMs correctly predicted the presence (or absence) of CDMS in 71.4% of patients (sensitivity/specificity: 77%/66%) at 1 year, and in 68% (60%/76%) at 3 years, on average over all bootstraps. Combinations of features consistently gave higher accuracy in predicting outcome than any single feature. Machine-learning-based classification can be used to provide an “individualised” prediction of conversion to MS from subjects' baseline scans and clinical characteristics, with the potential to be incorporated into routine clinical practice. PMID:25610791
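
    The evaluation loop described above pairs leave-one-out cross-validation with accuracy, sensitivity and specificity. The sketch below shows that pattern in plain Python. To keep it short, it swaps the SVM for a toy nearest-centroid classifier and omits the recursive feature elimination and bootstrapping. The features, data and function names are hypothetical.

```python
def nearest_centroid(train_X, train_y, x):
    """Toy stand-in for an SVM: predict the class whose mean feature vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [row for row, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - c) ** 2 for a, c in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out(X, y, fit_predict):
    """Predict each sample with a model trained on all the other samples."""
    return [fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]) for i in range(len(X))]


def summarize(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); label 1 = converted to CDMS."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)


# Hypothetical two-feature patients (e.g. lesion count, lesion volume) and outcomes.
X = [[1.0, 0.2], [2.0, 0.1], [1.5, 0.3], [6.0, 1.2], [7.0, 1.0], [6.5, 1.4]]
y = [0, 0, 0, 1, 1, 1]
print(summarize(y, leave_one_out(X, y, nearest_centroid)))  # (1.0, 1.0, 1.0)
```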

  14. Galaxy Image Processing and Morphological Classification Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Kates-Harbeck, Julian

    2012-03-01

    This work uses data from the Sloan Digital Sky Survey (SDSS) and the Galaxy Zoo Project for classification of galaxy morphologies via machine learning. SDSS imaging data together with reliable human classifications from Galaxy Zoo provide the training set and test set for the machine learning architectures. Classification is performed with hand-picked, pre-computed features from SDSS as well as with the raw imaging data from SDSS that was available to humans in the Galaxy Zoo project. With the hand-picked features and a logistic regression classifier, 95.21% classification accuracy and an area under the ROC curve of 0.986 are attained. In the case of the raw imaging data, the images are first processed to remove background noise, image artifacts, and celestial objects other than the galaxy of interest. They are then rotated onto their principal axis of variance to guarantee rotational invariance. The processed images are used to compute color information, central normalized moments up to 4th order, and radial intensity profiles. These features are used to train a support vector machine with a third-degree polynomial kernel, which achieves a classification accuracy of 95.89% with an ROC area of 0.943.
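
    The rotation onto the principal axis can be computed from the image's second central moments. The sketch below uses the standard moment-based orientation formula. This is an assumption: the abstract does not say how the author computed the axis, and the function name is hypothetical. Rotating the image by the negative of the returned angle would align its principal axis with the x axis.

```python
import math


def principal_axis_angle(image):
    """Orientation (radians) of an intensity image's principal axis from its second central moments.

    x runs along columns and y along rows (downwards), as in image coordinates.
    """
    total = sx = sy = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += value
            sx += x * value
            sy += y * value
    cx, cy = sx / total, sy / total  # intensity-weighted centroid

    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += dx * dx * value
            mu02 += dy * dy * value
            mu11 += dx * dy * value
    # Orientation of the axis of maximum variance.
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


diagonal = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
print(round(math.degrees(principal_axis_angle(diagonal)), 6))  # 45.0
```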

  15. Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming.

    PubMed

    Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J; Bao, Forrest Sheng

    2016-04-01

    13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and the metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform, MFlux (http://mflux.org), that predicts bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolism. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the complex relationship between influential factors and metabolic fluxes. We performed a grid search for the best parameter set for each algorithm and verified their performance through 10-fold cross-validation. SVM yields the highest accuracy of the three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Because existing studies focus on model organisms grown on particular carbon sources, the resulting bias in the fluxome dataset may limit the applicability of machine learning models. This problem can be resolved as more 13C-MFA papers are published for non-model species. PMID:27092947
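
    The tuning procedure described here is a grid search scored by 10-fold cross-validation. The sketch below illustrates that pattern with one-feature k-NN regression on toy data. It assumes unshuffled folds (sample i goes to fold i mod 10) for reproducibility. MFlux's real inputs, models and grids are not reproduced, and all function names are hypothetical.

```python
def kfold_splits(n, n_folds):
    """Yield (train_idx, test_idx); sample i goes to fold i % n_folds (no shuffling)."""
    for fold in range(n_folds):
        test_idx = list(range(fold, n, n_folds))
        test_set = set(test_idx)
        train_idx = [i for i in range(n) if i not in test_set]
        yield train_idx, test_idx


def knn_predict(train_x, train_y, x, k):
    """k-NN regression on one numeric feature: mean target of the k closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_mse(xs, ys, k, n_folds=10):
    """Mean squared error of k-NN, estimated by n_folds-fold cross-validation."""
    squared_error = 0.0
    for train_idx, test_idx in kfold_splits(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            squared_error += (knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
    return squared_error / len(xs)


def grid_search(xs, ys, grid, n_folds=10):
    """Return the grid value with the lowest cross-validated error, plus all scores."""
    scores = {k: cv_mse(xs, ys, k, n_folds) for k in grid}
    return min(scores, key=scores.get), scores


xs = list(range(30))
ys = [float(x) for x in xs]  # toy, noise-free linear target
best_k, scores = grid_search(xs, ys, grid=[1, 2, 5])
print(best_k, scores)
```

    The same loop applies to any model with tunable parameters. Only `knn_predict` and the grid would change for an SVM or a decision tree.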

  16. Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming

    PubMed Central

    Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J.; Bao, Forrest Sheng

    2016-01-01

    13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and the metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform, MFlux (http://mflux.org), that predicts bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolism. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the complex relationship between influential factors and metabolic fluxes. We performed a grid search for the best parameter set for each algorithm and verified their performance through 10-fold cross-validation. SVM yields the highest accuracy of the three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Because existing studies focus on model organisms grown on particular carbon sources, the resulting bias in the fluxome dataset may limit the applicability of machine learning models. This problem can be resolved as more 13C-MFA papers are published for non-model species. PMID:27092947

  17. Short-term wind speed predictions with machine learning techniques

    NASA Astrophysics Data System (ADS)

    Ghorbani, M. A.; Khatibi, R.; FazeliFard, M. H.; Naghipour, L.; Makarynskyy, O.

    2016-02-01

    Hourly wind speed forecasting is presented in a modeling study with possible applications to practical problems including wind energy farming, aircraft safety and airport operations. The modeling techniques employed in this paper for such short-term predictions are the machine learning techniques of artificial neural networks (ANNs) and gene expression programming (GEP). Recorded wind speed values, comprising 8 years of data collected at the Kersey site, Colorado, USA, were used. The January data over the first 7 years (2005-2011) were used for model training, and the January data for 2012 were used for model testing. A number of model structures were investigated to validate the robustness of these two techniques. The prediction results were compared with those of a multiple linear regression (MLR) method and with the persistence method developed for the data. Model performance was evaluated using the correlation coefficient, root mean square error, Nash-Sutcliffe efficiency coefficient and Akaike information criterion. The results indicate that forecasting wind speed is feasible using past records of wind speed alone, but the maximum lead time for the data was found to be 14 h. Different techniques lead to different results, and the choice between them is not easy; decision making should therefore be informed by these modeling results and based on an understanding of the inherent uncertainties. Both GEP and ANN are equally credible choices, and even MLR should not be dismissed, as it has its uses.
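
    Two of the yardsticks mentioned here are simple enough to write out:
    - The persistence baseline predicts that the value at time t + lead equals the value observed at time t.
    - The Nash-Sutcliffe efficiency is one minus the ratio of the squared forecast error to the variance of the observations about their mean.

    The sketch below uses a hypothetical four-hour series and hypothetical function names.

```python
import math


def persistence_forecast(series, lead=1):
    """Persistence: the forecast for time t + lead is the value observed at time t.

    Returns (forecasts, matching observations).
    """
    return series[:-lead], series[lead:]


def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def nash_sutcliffe(observed, predicted):
    """1 is a perfect fit; 0 means no better than predicting the observed mean."""
    mean = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst


hourly_speed = [1.0, 2.0, 3.0, 4.0]  # hypothetical wind speeds (m/s)
forecast, observed = persistence_forecast(hourly_speed)
print(rmse(observed, forecast), nash_sutcliffe(observed, forecast))  # 1.0 -0.5
```

    A negative efficiency, as in this steadily rising toy series, means persistence is worse than simply forecasting the mean. Comparing any learned model against such a baseline is what makes its skill interpretable.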

  18. Bearing fault component identification using information gain and machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Vinay, Vakharia; Kumar, Gupta Vijay; Kumar, Kankar Pavan

    2015-04-01

    In the present study, an attempt has been made to identify various bearing faults using machine learning algorithms. Vibration signals from faults in the inner race, outer race and rolling elements, as well as combined faults, are considered. Raw vibration signals cannot be used directly because they are masked by noise. To overcome this difficulty, a combined time-frequency domain method, the wavelet transform, is used. A wavelet selection criterion based on minimum permutation entropy is then employed to select the most appropriate base wavelet. Statistical features calculated from the selected wavelet coefficients form the feature vector. To reduce the size of the feature vector, the information gain attribute selection method is employed. The reduced feature set is fed into machine learning algorithms such as random forest and self-organizing maps to maximize fault identification efficiency. The results show that attribute selection improves the fault identification accuracy for bearing components.
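
    Information gain ranks each feature by how much it reduces the entropy of the fault labels. The sketch below assumes the wavelet statistics have already been discretized into bins, because the simple formula needs discrete values. The feature names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Sort feature names by information gain, highest first."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


labels = ["inner", "inner", "outer", "outer"]
columns = {
    "kurtosis_bin": ["high", "high", "low", "low"],  # separates the two fault classes perfectly
    "rms_bin": ["mid", "mid", "mid", "mid"],         # carries no information about the class
}
print(rank_features(columns, labels))
```

    Keeping only the top-ranked features is the attribute-selection step. The retained columns would then be passed to the classifier.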

  19. Machine Learning Approach to Extract Diagnostic and Prognostic Thresholds: Application in Prognosis of Cardiovascular Mortality

    PubMed Central

    Mena, Luis J.; Orozco, Eber E.; Felix, Vanessa G.; Ostos, Rodolfo; Melgarejo, Jesus; Maestre, Gladys E.

    2012-01-01

    Machine learning has become a powerful tool for analysing medical domains, assessing the importance of clinical parameters, and extracting medical knowledge for outcomes research. In this paper, we present a machine learning method for extracting diagnostic and prognostic thresholds, based on a symbolic classification algorithm called REMED. We evaluated the performance of our method by determining new prognostic thresholds for well-known and potential cardiovascular risk factors that are used to support medical decisions in the prognosis of fatal cardiovascular diseases. Our approach predicted 36% of cardiovascular deaths with 80% specificity and 75% general accuracy. The new method provides an innovative approach that might be useful to support decisions about medical diagnoses and prognoses. PMID:22924062
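
    A diagnostic or prognostic threshold is a cut-off on a clinical variable, judged by the sensitivity and specificity it yields. The sketch below is a generic illustration, not the REMED algorithm. It scans the observed values and keeps the cut-off with the highest sensitivity whose specificity is at least 80%, echoing the specificity level quoted above. The data and function name are hypothetical.

```python
def best_threshold(values, died, min_specificity=0.8):
    """Choose a cut-off 'value >= threshold -> high risk' that maximizes sensitivity
    subject to specificity >= min_specificity.

    Returns (threshold, sensitivity, specificity), or None if no cut-off qualifies.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    best = None
    for t in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, died) if v >= t and d)
        tn = sum(1 for v, d in zip(values, died) if v < t and not d)
        sensitivity, specificity = tp / n_pos, tn / n_neg
        if specificity >= min_specificity and (best is None or sensitivity > best[1]):
            best = (t, sensitivity, specificity)
    return best


systolic = [110, 120, 130, 140, 150, 160, 170, 180, 190, 200]  # hypothetical mmHg
died = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]                          # hypothetical outcomes
print(best_threshold(systolic, died))  # (170, 0.75, 0.8333333333333334)
```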

  20. Sensor fusion method for machine performance enhancement

    SciTech Connect

    Mou, J.I.; King, C.; Hillaire, R.; Jones, S.; Furness, R.

    1998-03-01

    A sensor fusion methodology was developed to uniquely integrate pre-process, process-intermittent, and post-process measurement and analysis technology to cost-effectively enhance the accuracy and capability of computer-controlled manufacturing equipment. Empirical models and computational algorithms were also developed to model, assess, and then enhance the machine performance.

  1. Classifying Structures in the ISM with Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Beaumont, Christopher; Goodman, A. A.; Williams, J. P.

    2011-01-01

    The processes which govern molecular cloud evolution and star formation often sculpt structures in the ISM: filaments, pillars, shells, outflows, etc. Because of their morphological complexity, these objects are often identified manually. Manual classification has several disadvantages; the process is subjective, not easily reproducible, and does not scale well to handle increasingly large datasets. We have explored to what extent machine learning algorithms can be trained to autonomously identify specific morphological features in molecular cloud datasets. We show that the Support Vector Machine algorithm can successfully locate filaments and outflows blended with other emission structures. When the objects of interest are morphologically distinct from the surrounding emission, this autonomous classification achieves >90% accuracy. We have developed a set of IDL-based tools to apply this technique to other datasets.

  2. Improved Correction of Atmospheric Pressure Data Obtained by Smartphones through Machine Learning.

    PubMed

    Kim, Yong-Hyuk; Ha, Ji-Hun; Yoon, Yourim; Kim, Na-Young; Im, Hyo-Hyuc; Sim, Sangjin; Choi, Reno K Y

    2016-01-01

    A correction method using machine learning aims to improve the conventional linear regression (LR) based method for correcting atmospheric pressure data obtained by smartphones. The method proposed in this study conducts clustering and regression analysis with time domain classification. Smartphone data obtained from July 2014 through December 2014 in Gyeonggi-do, one of the most populous provinces in South Korea, which surrounds Seoul and covers 10,000 km², were classified by time of day (daytime or nighttime), day of the week (weekday or weekend) and the user's mobility, prior to expectation-maximization (EM) clustering. Subsequently, the results were analyzed for comparison by applying machine learning methods such as multilayer perceptron (MLP) and support vector regression (SVR). The mean absolute error (MAE) was on average 26% lower when regression analysis was performed after EM clustering than without it. Among the machine learning methods, the MAE for SVR was around 31% lower than for LR and about 19% lower than for MLP. It is concluded that pressure data from smartphones are as good as those from the national automatic weather station (AWS) network. PMID:27524999
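
    The EM clustering step that precedes regression in this method can be illustrated with a two-component, one-dimensional Gaussian mixture. The sketch below rests on several assumptions that are not from the study:
    - The abstract does not say which variables were clustered or how many clusters were used.
    - The data, initialization at the data extremes and function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(xs, n_iter=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, variances, hard cluster labels).
    """
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs)
    # Deterministic start: one component at each end of the data.
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [var_all, var_all]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(min_var, sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


readings = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]  # hypothetical pressure offsets (hPa)
weights, means, variances, labels = em_two_gaussians(readings)
print([round(m, 3) for m in means], labels)
```

    A separate regression model would then be fitted within each cluster. This is the step whose effect on MAE the study reports.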

  3. Improved Correction of Atmospheric Pressure Data Obtained by Smartphones through Machine Learning

    PubMed Central

    Kim, Yong-Hyuk; Ha, Ji-Hun; Kim, Na-Young; Im, Hyo-Hyuc; Sim, Sangjin; Choi, Reno K. Y.

    2016-01-01

    A correction method using machine learning aims to improve the conventional linear regression (LR) based method for correcting atmospheric pressure data obtained by smartphones. The method proposed in this study conducts clustering and regression analysis with time domain classification. Smartphone data obtained from July 2014 through December 2014 in Gyeonggi-do, one of the most populous provinces in South Korea, which surrounds Seoul and covers 10,000 km², were classified by time of day (daytime or nighttime), day of the week (weekday or weekend) and the user's mobility, prior to expectation-maximization (EM) clustering. Subsequently, the results were analyzed for comparison by applying machine learning methods such as multilayer perceptron (MLP) and support vector regression (SVR). The mean absolute error (MAE) was on average 26% lower when regression analysis was performed after EM clustering than without it. Among the machine learning methods, the MAE for SVR was around 31% lower than for LR and about 19% lower than for MLP. It is concluded that pressure data from smartphones are as good as those from the national automatic weather station (AWS) network. PMID:27524999

  4. A machine learning approach to computer-aided molecular design.

    PubMed

    Bolis, G; Di Pace, L; Fabrocini, F

    1991-12-01

    Preliminary results of a machine learning application to computer-aided molecular design for drug discovery are presented. The machine learning techniques use a sample of active and inactive compounds, viewed as a set of positive and negative examples, to induce a molecular model characterizing the interaction between the compounds and a target molecule. The algorithm has two phases. In the first phase (the specialization step), the program identifies the active/inactive pairs of compounds that appear most useful for making the learning process as effective as possible, and generates a dictionary of molecular fragments deemed responsible for the activity of the compounds. In the second phase (the generalization step), the fragments thus generated are combined and generalized in order to select the most plausible hypothesis with respect to the sample of compounds. A knowledge base of physical and chemical properties is used during the inductive process. PMID:1818094

  5. Method and system for fault accommodation of machines

    NASA Technical Reports Server (NTRS)

    Goebel, Kai Frank (Inventor); Subbu, Rajesh Venkat (Inventor); Rausch, Randal Thomas (Inventor); Frederick, Dean Kimball (Inventor)

    2011-01-01

    A method for multi-objective fault accommodation using predictive modeling is disclosed. The method includes using a simulated machine that simulates a faulted actual machine, and using a simulated controller that simulates an actual controller. A multi-objective optimization process is performed, based on specified control settings for the simulated controller and specified operational scenarios for the simulated machine controlled by the simulated controller, to generate a Pareto frontier-based solution space relating performance of the simulated machine to settings of the simulated controller, including adjustment to the operational scenarios to represent a fault condition of the simulated machine. Control settings of the actual controller are adjusted, represented by the simulated controller, for controlling the actual machine, represented by the simulated machine, in response to a fault condition of the actual machine, based on the Pareto frontier-based solution space, to maximize desirable operational conditions and minimize undesirable operational conditions while operating the actual machine in a region of the solution space defined by the Pareto frontier.
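
    The Pareto frontier at the center of this method is the set of candidate settings that no other candidate beats on every objective at once. The sketch below assumes just two objectives per simulated controller setting: a performance score to maximize and a cost to minimize. The patent's simulation and optimization loop are not reproduced, and the function name and numbers are hypothetical.

```python
def pareto_front(candidates):
    """Return the non-dominated (performance, cost) pairs, sorted by cost.

    A candidate is dominated if another one has performance >= and cost <=,
    with at least one of the two strictly better.
    """
    front = []
    for i, (perf, cost) in enumerate(candidates):
        dominated = any(
            p >= perf and c <= cost and (p > perf or c < cost)
            for j, (p, c) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((perf, cost))
    return sorted(front, key=lambda pc: pc[1])


# Hypothetical (performance, cost) results for five simulated controller settings.
settings = [(0.90, 5.0), (0.80, 3.0), (0.85, 6.0), (0.95, 9.0), (0.70, 3.0)]
print(pareto_front(settings))  # [(0.8, 3.0), (0.9, 5.0), (0.95, 9.0)]
```

    Choosing an operating point then means picking one entry on this frontier. Any setting off the frontier could be improved in one objective without losing in the other.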

  6. Teaching Machines and Programmed Learning, a Source Book.

    ERIC Educational Resources Information Center

    Lumsdaine, A.A., Ed.; Glaser, Robert, Ed.

    Brought together here is the widely scattered literature on self-instructional programs and devices by leaders, past and present, in their development. S.L. Pressey in his articles describes the apparatus, methods, theory, and results attendant upon use of his test-scoring devices. B.F. Skinner in his articles develops theory, describes machines,…

  7. Machine learning approaches in medical image analysis: From detection to diagnosis.

    PubMed

    de Bruijne, Marleen

    2016-10-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols, learning from weak labels, and interpretation and evaluation of results. PMID:27481324

  8. Metabolite Identification through Machine Learning — Tackling CASMI Challenge Using FingerID

    PubMed Central

    Shen, Huibin; Zamboni, Nicola; Heinonen, Markus; Rousu, Juho

    2013-01-01

    Metabolite identification is a major bottleneck in metabolomics due to the number and diversity of the molecules. To alleviate this bottleneck, computational methods and tools that reliably filter the set of candidates are needed for further analysis by human experts. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for developing a new genre of metabolite identification methods that rely on machine learning as the primary vehicle for identification. In this paper we describe the machine learning approach used in FingerID, its application to the CASMI challenges and some results that were not part of our challenge submission. In short, FingerID learns to predict molecular fingerprints from a large collection of MS/MS spectra, and uses the predicted fingerprints to retrieve and rank candidate molecules from a given large molecular database. Furthermore, we introduce a web server for FingerID, which was applied for the first time to the CASMI challenges. The challenge results show that the new machine learning framework produces competitive results on those challenge molecules that were found within the relatively restricted KEGG compound database. Additional experiments on the PubChem database confirm the feasibility of the approach even on a much larger database, although room for improvement still remains. PMID:24958002
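
    The retrieval step described here has two parts: predict a fingerprint, then rank database candidates against it. It can be sketched as follows, under assumptions made only for illustration. Fingerprints are represented as sets of "on" bits, and candidates are scored with Tanimoto similarity. FingerID's actual scoring may differ, and the molecule names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def rank_candidates(predicted_bits, candidates):
    """Order candidate molecules by similarity of their fingerprint to the predicted one."""
    scored = [(tanimoto(predicted_bits, bits), name) for name, bits in candidates.items()]
    return [name for score, name in sorted(scored, key=lambda s: s[0], reverse=True)]


predicted = {1, 2, 3}  # hypothetical predicted fingerprint bits
database = {"mol_A": {1, 2, 3}, "mol_B": {1, 2}, "mol_C": {4, 5}}
print(rank_candidates(predicted, database))  # ['mol_A', 'mol_B', 'mol_C']
```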

  9. Using machine learning for discovery in synoptic survey imaging data

    NASA Astrophysics Data System (ADS)

    Brink, Henrik; Richards, Joseph W.; Poznanski, Dovi; Bloom, Joshua S.; Rice, John; Negahban, Sahand; Wainwright, Martin

    2013-10-01

    Modern time-domain surveys continuously monitor large swaths of the sky to look for astronomical variability. Astrophysical discovery in such data sets is complicated by the fact that detections of real transient and variable sources are heavily outnumbered by 'bogus' detections caused by imperfect subtractions, atmospheric effects and detector artefacts. In this work, we present a machine-learning (ML) framework for discovery of variability in time-domain imaging surveys. Our ML methods provide probabilistic statements, in near real time, about the degree to which each newly observed source is an astrophysically relevant source of variable brightness. We provide details about each of the analysis steps involved, including compilation of the training and testing sets, construction of descriptive image-based and contextual features, and optimization of the feature subset and model tuning parameters. Using a validation set of nearly 30 000 objects from the Palomar Transient Factory, we demonstrate a missed detection rate of at most 7.7 per cent at our chosen false-positive rate of 1 per cent for an optimized ML classifier of 23 features, selected to avoid feature correlation and overfitting from an initial library of 42 attributes. Importantly, we show that our classification methodology is insensitive to mislabelled training data up to a contamination of nearly 10 per cent, making it easier to compile sufficient training sets for accurate performance in future surveys. This ML framework, if so adopted, should enable the maximization of scientific gain from future synoptic surveys and enable fast follow-up decisions on the vast amounts of streaming data produced by such experiments.
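
    The figure of merit quoted above is the missed detection rate at a fixed false-positive rate. It can be computed from classifier scores by first choosing the threshold that lets through at most 1% of bogus candidates. The sketch below then counts the real sources that fall at or below that threshold. The scores and function name are hypothetical.

```python
def missed_detection_rate(scores, is_real, max_fpr=0.01):
    """Label a source 'real' when score > threshold, with the threshold chosen so that at most
    floor(max_fpr * n_bogus) bogus detections pass. Returns (threshold, MDR, FPR)."""
    bogus = sorted((s for s, r in zip(scores, is_real) if not r), reverse=True)
    real = [s for s, r in zip(scores, is_real) if r]
    allowed = int(max_fpr * len(bogus))
    # Threshold = the (allowed + 1)-th highest bogus score; ties can only lower the FPR.
    threshold = bogus[allowed] if allowed < len(bogus) else float("-inf")
    fpr = sum(1 for s in bogus if s > threshold) / len(bogus)
    mdr = sum(1 for s in real if s <= threshold) / len(real)
    return threshold, mdr, fpr


bogus_scores = list(range(100))     # hypothetical classifier scores for bogus candidates
real_scores = [99.5, 98, 50, 99.9]  # hypothetical scores for real transients
scores = bogus_scores + real_scores
is_real = [False] * len(bogus_scores) + [True] * len(real_scores)
print(missed_detection_rate(scores, is_real))  # (98, 0.5, 0.01)
```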

  10. Applying machine learning classification techniques to automate sky object cataloguing

    NASA Astrophysics Data System (ADS)

    Fayyad, Usama M.; Doyle, Richard J.; Weir, W. Nick; Djorgovski, Stanislav

    1993-08-01

    We describe the application of artificial intelligence machine learning techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Mt. Palomar Northern Sky Survey is nearly completed. This survey provides comprehensive coverage of the northern celestial hemisphere in the form of photographic plates. The plates are being transformed into digitized images whose quality will probably not be surpassed in the next ten to twenty years. The images are expected to contain on the order of $10^7$ galaxies and $10^8$ stars. Astronomers wish to determine which of these sky objects belong to various classes of galaxies and stars. Unfortunately, the size of this data set precludes analysis in an exclusively manual fashion. Our approach is to develop a software system that integrates independently developed techniques for image processing and data classification. Digitized sky images are passed through image processing routines to identify sky objects and to extract a set of features for each object. These routines are used to help select a useful set of attributes for classifying sky objects. Then GID3 (Generalized ID3) and O-B Tree, two inductive learning techniques, learn classification decision trees from examples. These classifiers will then be applied to new data. The development process is highly interactive, with astronomer input playing a vital role. Astronomers refine the feature set used to construct sky object descriptions and evaluate the performance of the automated classification technique on new data. This paper gives an overview of the machine learning techniques with an emphasis on their general applicability, describes the details of our specific application, and reports the initial encouraging results. The results indicate that our machine learning approach is well-suited to the problem. The primary benefit of the approach is increased data reduction throughput. Another benefit is

  11. Merged or monolithic? Using machine-learning to reconstruct the dynamical history of simulated star clusters

    NASA Astrophysics Data System (ADS)

    Pasquato, Mario; Chung, Chul

    2016-05-01

    Context. Machine-learning (ML) solves problems by learning patterns from data with limited or no human guidance. In astronomy, ML is mainly applied to large observational datasets, e.g. for morphological galaxy classification. Aims: We apply ML to gravitational N-body simulations of star clusters that are either formed by merging two progenitors or evolved in isolation, planning to later identify globular clusters (GCs) that may have a history of merging from observational data. Methods: We create mock-observations from simulated GCs, from which we measure a set of parameters (also called features in the machine-learning field). After carrying out dimensionality reduction on the feature space, the resulting datapoints are fed into various classification algorithms. Using repeated random subsampling validation, we check whether the groups identified by the algorithms correspond to the underlying physical distinction between mergers and monolithically evolved simulations. Results: The three algorithms we considered (C5.0 trees, k-nearest neighbour, and support-vector machines) all achieve a test misclassification rate of about 10% without parameter tuning, with support-vector machines slightly outperforming the others. The first principal component of feature space correlates with cluster concentration. If we exclude it from the regression, the performance of the algorithms is only slightly reduced.

  12. A hybrid genetic algorithm-extreme learning machine approach for accurate significant wave height reconstruction

    NASA Astrophysics Data System (ADS)

    Alexandre, E.; Cuadra, L.; Nieto-Borge, J. C.; Candil-García, G.; del Pino, M.; Salcedo-Sanz, S.

    2015-08-01

    Wave parameters computed from time series measured by buoys (significant wave height Hs, mean wave period, etc.) play a key role in coastal engineering and in the design and operation of wave energy converters. Storms or navigation accidents can cause measuring buoys to break down, leading to gaps in the data. In this paper we tackle the problem of locally reconstructing Hs at out-of-operation buoys by using wave parameters from nearby buoys, based on the spatial correlation among values at neighboring buoy locations. The novelty of our approach, for its potential application to problems in coastal engineering, is twofold. On one hand, we propose a genetic algorithm hybridized with an extreme learning machine that selects, among the available wave parameters from the nearby buoys, a subset $F_{n_{SP}}$ of $n_{SP}$ parameters that minimizes the Hs reconstruction error. On the other hand, we evaluate to what extent the parameters selected in subset $F_{n_{SP}}$ are good enough to assist other machine learning (ML) regressors (extreme learning machines, support vector machines and Gaussian process regression) in reconstructing Hs. The results show that all the ML methods explored achieve good Hs reconstruction at the two locations studied (Caribbean Sea and West Atlantic).

  13. Coordinated machine learning and decision support for situation awareness.

    SciTech Connect

    Draelos, Timothy John; Zhang, Peng-Chu.; Wunsch, Donald C.; Seiffertt, John; Conrad, Gregory N.; Brannon, Nathan Gregory

    2007-09-01

    For applications such as force protection, an effective decision maker needs to maintain an unambiguous grasp of the environment. Opportunities exist to leverage computational mechanisms for the adaptive fusion of diverse information sources. The current research employs neural networks and Markov chains to process information from sources including sensors, weather data, and law enforcement. Furthermore, the system operator's input is used as a point of reference for the machine learning algorithms. More detailed features of the approach are provided, along with an example force protection scenario.

  14. A machine learning classification broker for the LSST transient database

    NASA Astrophysics Data System (ADS)

    Borne, K. D.

    2008-03-01

    We describe the largest data-producing astronomy project in the coming decade: the LSST (Large Synoptic Survey Telescope). The enormous data output, database contents, knowledge discovery, and community science expected from this project will impose massive data challenges on the astronomical research community. One of these challenge areas is the rapid machine learning, data mining, and classification of all novel astronomical events from each 3-gigapixel (6-GB) image obtained every 20 seconds throughout every night for the project duration of 10 years. We describe these challenges and a particular implementation of a classification broker for this data fire hose.

  15. Machine Learning and the Starship - A Match Made in Heaven

    NASA Astrophysics Data System (ADS)

    Galea, P.

    The computer control system of an unmanned interstellar craft must deal with a variety of complex problems. For example, upon reaching the destination star, the computer may need to make assessments of the planets and other objects to prioritize the most 'interesting', and assign appropriate probes to each. These decisions would normally be regarded as intelligent if they were made by humans. This paper looks at machine learning technologies currently deployed in non-aerospace contexts, such as book recommendation systems, dating websites and social network analysis, and investigates the ways in which they can be adapted for applications in the starship. This paper is a submission of the Project Icarus Study Group.

  16. Suppression of false arrhythmia alarms in the ICU: a machine learning approach.

    PubMed

    Ansari, Sardar; Belle, Ashwin; Ghanbari, Hamid; Salamango, Mark; Najarian, Kayvan

    2016-08-01

    This paper presents a novel approach for false alarm suppression using machine learning tools. It proposes a multi-modal detection algorithm to find the true beats using the information from all the available waveforms. This method uses a variety of beat detection algorithms, some of which are developed by the authors. The outputs of the beat detection algorithms are combined using a machine learning approach. For the ventricular tachycardia and ventricular fibrillation alarms, separate classification models are trained to distinguish between the normal and abnormal beats. This information, along with alarm-specific criteria, is used to decide if the alarm is false. The results indicate that the presented method was effective in suppressing false alarms when it was tested on a hidden validation dataset. PMID:27454017

  17. Semi-supervised and unsupervised extreme learning machines.

    PubMed

    Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

    2014-12-01

    Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency. PMID:25415946

  18. Dynamics of the adaptive natural gradient descent method for soft committee machines

    NASA Astrophysics Data System (ADS)

    Inoue, Masato; Park, Hyeyoung; Okada, Masato

    2004-05-01

    The adaptive natural gradient descent (ANGD) method realizes natural gradient descent (NGD) without needing to know the input distribution of learning data and reduces the calculation cost from a cubic order to a square order. However, no performance analysis of ANGD has been done. We have developed a statistical-mechanical theory of the simplified version of ANGD dynamics for soft committee machines in on-line learning; this method provides deterministic learning dynamics expressed through a few order parameters, even though ANGD intrinsically holds a large approximated Fisher information matrix. Numerical results obtained using this theory were consistent with those of a simulation, with respect not only to the learning curve but also to the learning failure. Utilizing this method, we numerically evaluated ANGD efficiency and found that ANGD generally performs as well as NGD. We also revealed the key condition affecting the learning plateau in ANGD.

  19. Classification of hydration status using electrocardiogram and machine learning

    NASA Astrophysics Data System (ADS)

    Kaveh, Anthony; Chung, Wayne

    2013-10-01

    The electrocardiogram (ECG) has been used extensively in clinical practice for decades to non-invasively characterize the health of heart tissue; however, these techniques are limited to time-domain features. We propose a machine classification system using support vector machines (SVM) that uses temporal and spectral information to classify health state beyond cardiac arrhythmias. Our method uses single-lead ECG to classify volume depletion (or dehydration) without the lengthy and costly blood analysis tests traditionally used for detecting dehydration status. Our method builds on established clinical ECG criteria for identifying electrolyte imbalances and lends itself to automated, computationally efficient implementation. The method was tested on the MIT-BIH PhysioNet database to validate this purely computational method for expedient disease-state classification. The results show high sensitivity, supporting use as a cost- and time-effective screening tool.

  20. A quantum speedup in machine learning: finding an N-bit Boolean function for a classification

    NASA Astrophysics Data System (ADS)

    Yoo, Seokwon; Bang, Jeongho; Lee, Changhyoup; Lee, Jinhyoung

    2014-10-01

    We compare quantum and classical machines designed for learning an N-bit Boolean function in order to address how a quantum system improves the machine learning behavior. The machines of the two types consist of the same number of operations and control parameters, but only the quantum machines utilize the quantum coherence naturally induced by unitary operators. We show that quantum superposition enables quantum learning that is faster than classical learning by expanding the approximate solution regions, i.e., the acceptable regions. This is also demonstrated by means of numerical simulations with a standard feedback model, namely random search, and a practical model, namely differential evolution.

  1. Rectangular tunnel boring machine and method

    SciTech Connect

    Snyder, L.L.

    1984-12-04

    A machine for boring a tunnel having an end face wall, a roof wall, a bottom wall, and opposite side walls. The machine comprises a rotatable cutting wheel means having an annular peripheral wall supporting a plurality of cutting devices and a generally convex-shaped upper wall supporting a plurality of cutting devices. The cutting wheel means is rotatable about an axis of rotation which is inclined in a forward direction relative to a plane perpendicular to the longitudinal axis of the tunnel for simultaneously cutting the tunnel face along two intersecting surfaces defined by the cutting devices on the annular peripheral wall and the cutting devices on the convex-shaped upper wall. Support shoe means are mounted beneath the cutting wheel means for movably supporting the cutting wheel means on the tunnel floor. Drive motor means are mounted on the support shoe means and are operatively associated with the cutting wheel means for causing rotation of the cutting wheel means relative to the tunnel face and the support shoe means. Thrust means are connected to the support shoe means for advancing the cutting wheel means and the support shoe means toward the tunnel face. Gripping means are associated with the thrust means for gripping engagement with the opposite tunnel side walls to prevent axial rearward movement as the cutting wheel means and the support shoe means are advanced toward the tunnel face. Vertical and horizontal steering means for changing the direction of advance of the machine are described. Paddle means and conveyor means for removing rock cuttings from the end face of the tunnel are disclosed. Shield means for shielding workers from dust and debris and for containing the cuttings are also described.

  2. Evaluating data distribution and drift vulnerabilities of machine learning algorithms in secure and adversarial environments

    NASA Astrophysics Data System (ADS)

    Nelson, Kevin; Corbin, George; Blowers, Misty

    2014-05-01

    Machine learning is continuing to gain popularity due to its ability to solve problems that are difficult to model using conventional computer programming logic. Much of the current and past work has focused on algorithm development, data processing, and optimization. Lately, a subset of research has emerged which explores issues related to security. This research is gaining traction as systems employing these methods are being applied to both secure and adversarial environments. One of machine learning's biggest benefits, its data-driven versus logic-driven approach, is also a weakness if the data on which the models rely are corrupted. Adversaries could maliciously influence systems which address drift and data distribution changes using re-training and online learning. Our work is focused on exploring the resilience of various machine learning algorithms to these data-driven attacks. In this paper, we present our initial findings using Monte Carlo simulations, and statistical analysis, to explore the maximal achievable shift to a classification model, as well as the required amount of control over the data.

  3. Machine Learning for Quantum Metrology and Quantum Control

    NASA Astrophysics Data System (ADS)

    Sanders, Barry; Zahedinejad, Ehsan; Palittapongarnpim, Pantita

    Generating quantum metrological procedures and quantum gate designs, subject to constraints such as temporal or particle-number bounds or limits on the number of control parameters, is typically computationally hard. Although greedy machine learning algorithms are ubiquitous for tackling these problems, the severe constraints listed above limit the efficacy of such approaches. Our aim is to devise heuristic machine learning techniques to generate tractable procedures for adaptive quantum metrology and quantum gate design. In particular we have modified differential evolution to generate adaptive interferometric-phase quantum metrology procedures for up to 100 photons including loss and noise, and we have generated policies for designing single-shot high-fidelity three-qubit gates in superconducting circuits by avoided level crossings. Although quantum metrology and quantum control are regarded as disparate, we have developed a unified framework for these two subjects, and this unification enables us to transfer insights and breakthroughs from one of the topics to the other. Thanks to NSERC, AITF and 1000 Talent Plan.

  4. Analyzing angle crashes at unsignalized intersections using machine learning techniques.

    PubMed

    Abdel-Aty, Mohamed; Haleem, Kirolos

    2011-01-01

    A recently developed machine learning technique, multivariate adaptive regression splines (MARS), is introduced in this study to predict vehicles' angle crashes. MARS has promising predictive power and does not suffer from interpretation complexity. Negative Binomial (NB) and MARS models were fitted and compared using extensive data collected on unsignalized intersections in Florida. Two models were estimated for angle crash frequency at 3- and 4-legged unsignalized intersections. Treating crash frequency as a continuous response variable for fitting a MARS model was also examined by considering the natural logarithm of the crash frequency. Finally, combining MARS with another machine learning technique (random forest) was explored and discussed. The fitted NB angle crash models showed several significant factors that contribute to angle crash occurrence at unsignalized intersections, such as traffic volume on the major road, the upstream distance to the nearest signalized intersection, the distance between successive unsignalized intersections, median type on the major approach, percentage of trucks on the major approach, size of the intersection and the geographic location within the state. Based on the mean square prediction error (MSPE) assessment criterion, MARS outperformed the corresponding NB models. Also, using MARS for predicting continuous response variables yielded more favorable results than predicting discrete response variables. The generated MARS models showed the most promising results after screening the covariates using random forest. Based on the results of this study, MARS is recommended as an efficient technique for predicting crashes at unsignalized intersections (angle crashes in this study). PMID:21094345

  5. Effective and efficient optics inspection approach using machine learning algorithms

    SciTech Connect

    Abdulla, G; Kegelmeyer, L; Liao, Z; Carr, W

    2010-11-02

    The Final Optics Damage Inspection (FODI) system automatically acquires and utilizes the Optics Inspection (OI) system to analyze images of the final optics at the National Ignition Facility (NIF). During each inspection cycle up to 1000 images acquired by FODI are examined by OI to identify and track damage sites on the optics. The process of tracking growing damage sites on the surface of an optic can be made more effective by identifying and removing signals associated with debris or reflections. The manual process to filter these false sites is daunting and time consuming. In this paper we discuss the use of machine learning tools and data mining techniques to help with this task. We describe the process to prepare a data set that can be used for training and identifying hardware reflections in the image data. In order to collect training data, the images are first automatically acquired and analyzed with existing software and then relevant features such as spatial, physical and luminosity measures are extracted for each site. A subset of these sites is 'truthed' or manually assigned a class to create training data. A supervised classification algorithm is used to test if the features can predict the class membership of new sites. A suite of self-configuring machine learning tools called 'Avatar Tools' is applied to classify all sites. To verify, we used 10-fold cross-validation and found the accuracy was above 99%. This substantially reduces the number of false alarms that would otherwise be sent for more extensive investigation.

  6. Effective and efficient optics inspection approach using machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Abdulla, Ghaleb M.; Kegelmeyer, Laura Mascio; Liao, Zhi M.; Carr, Wren

    2010-11-01

    The Final Optics Damage Inspection (FODI) system automatically acquires and utilizes the Optics Inspection (OI) system to analyze images of the final optics at the National Ignition Facility (NIF). During each inspection cycle up to 1000 images acquired by FODI are examined by OI to identify and track damage sites on the optics. The process of tracking growing damage sites on the surface of an optic can be made more effective by identifying and removing signals associated with debris or reflections. The manual process to filter these false sites is daunting and time consuming. In this paper we discuss the use of machine learning tools and data mining techniques to help with this task. We describe the process to prepare a data set that can be used for training and identifying hardware reflections in the image data. In order to collect training data, the images are first automatically acquired and analyzed with existing software and then relevant features such as spatial, physical and luminosity measures are extracted for each site. A subset of these sites is "truthed" or manually assigned a class to create training data. A supervised classification algorithm is used to test if the features can predict the class membership of new sites. A suite of self-configuring machine learning tools called "Avatar Tools" is applied to classify all sites. To verify, we used 10-fold cross-validation and found the accuracy was above 99%. This substantially reduces the number of false alarms that would otherwise be sent for more extensive investigation.

  7. Calibrating Building Energy Models Using Supercomputer Trained Machine Learning Agents

    SciTech Connect

    Sanyal, Jibonananda; New, Joshua Ryan; Edwards, Richard; Parker, Lynne Edwards

    2014-01-01

    Building Energy Modeling (BEM) is an approach to model the energy usage in buildings for design and retrofit purposes. EnergyPlus is the flagship Department of Energy software that performs BEM for different types of buildings. The input to EnergyPlus can often extend to the order of a few thousand parameters, which have to be calibrated manually by an expert for realistic energy modeling. This makes calibration challenging and expensive, thereby making building energy modeling infeasible for smaller projects. In this paper, we describe the Autotune research, which employs machine learning algorithms to generate agents for the different kinds of standard reference buildings in the U.S. building stock. The parametric space and the variety of building locations and types make this a challenging computational problem necessitating the use of supercomputers. Millions of EnergyPlus simulations are run on supercomputers, and their results are subsequently used to train machine learning algorithms to generate agents. These agents, once created, can then run in a fraction of the time, thereby allowing cost-effective calibration of building models.

  8. Prediction of brain tumor progression using a machine learning technique

    NASA Astrophysics Data System (ADS)

    Shen, Yuzhong; Banerjee, Debrup; Li, Jiang; Chandler, Adam; Shen, Yufei; McKenzie, Frederic D.; Wang, Jihong

    2010-03-01

    A machine learning technique is presented for assessing brain tumor progression by exploring six patients' complete MRI records scanned during their visits in the past two years. There are ten MRI series, including diffusion tensor image (DTI), for each visit. After registering all series to the corresponding DTI scan at the first visit, annotated normal and tumor regions were overlaid. Intensity values of each pixel inside the annotated regions were then extracted across all of the ten MRI series to compose a 10-dimensional vector. Each feature vector falls into one of three categories: normal, tumor, and normal but progressed to tumor at a later time. In this preliminary study, we focused on the trend of brain tumor progression during three consecutive visits, i.e., visit A, B, and C. A machine learning algorithm was trained using the data containing information from visit A to visit B, and the trained model was used to predict tumor progression from visit A to visit C. Preliminary results showed that prediction for brain tumor progression is feasible. An average of 80.9% pixel-wise accuracy was achieved for tumor progression prediction at visit C.

  9. Predicting submicron air pollution indicators: a machine learning approach.

    PubMed

    Pandey, Gaurav; Zhang, Bin; Jian, Le

    2013-05-01

    The regulation of air pollutant levels is rapidly becoming one of the most important tasks for the governments of developing countries, especially China. Submicron particles, such as ultrafine particles (UFP, aerodynamic diameter ≤ 100 nm) and particulate matter ≤ 1.0 micrometers (PM1.0), are an unregulated emerging health threat to humans, but the relationships between the concentration of these particles and meteorological and traffic factors are poorly understood. To shed some light on these connections, we employed a range of machine learning techniques to predict UFP and PM1.0 levels based on a dataset consisting of observations of weather and traffic variables recorded at a busy roadside in Hangzhou, China. Based on a thorough examination of over twenty-five classifiers used for this task, we find that it is possible to predict PM1.0 and UFP levels reasonably accurately and that tree-based classification models (Alternating Decision Tree and Random Forests) perform best for both particle types. In addition, weather variables show a stronger relationship with PM1.0 and UFP levels, and thus cannot be ignored for predicting submicron particle levels. Overall, this study has demonstrated the potential application value of systematically collecting and analysing datasets using machine learning techniques for the prediction of submicron sized ambient air pollutants. PMID:23535697

  10. Edge detection in grayscale imagery using machine learning

    SciTech Connect

    Glocer, K. A.; Perkins, S. J.

    2004-01-01

    Edge detection can be formulated as a binary classification problem at the pixel level with the goal of identifying individual pixels as either on-edge or off-edge. To solve this classification problem we use both fixed and adaptive feature selection in conjunction with a support vector machine. This approach provides a direct data-driven solution and does not require the intermediate step of learning a distribution to perform a likelihood-based classification. Furthermore, the approach can readily be adapted for other image processing tasks. The algorithm was tested on a data set of 50 object images, each associated with a hand-drawn 'ground truth' image. We computed ROC curves to evaluate the performance of the general feature extraction and machine learning approach, and compared that to the standard Canny edge detector and with recent work on statistical edge detection. Using a direct pixel-by-pixel error metric enabled us to compare to the statistical edge detection approach, and our algorithm compared favorably. Using a more 'natural' metric enabled comparison with work by the authors of the image data set, and our algorithm performed comparably to the suite of state-of-the-art edge detectors in that study.
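
    Treating edge detection as per-pixel binary classification means any classifier's per-pixel scores can be turned into an ROC curve by sweeping a decision threshold. The sketch below assumes such scores already exist (the SVM and feature-selection steps are not shown); the score and label values, and the helper names `roc_points` and `auc`, are hypothetical.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, one per distinct threshold.

    scores: per-pixel classifier scores (higher means "more likely on-edge").
    labels: 1 for on-edge pixels in the ground truth, 0 for off-edge pixels.
    Both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Walk pixels by descending score; lowering the threshold past each group of
    # tied scores reclassifies the whole group as on-edge.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.3, 0.2]   # hypothetical per-pixel edge scores
labels = [1, 1, 0, 1, 0]             # hypothetical ground truth (1 = on-edge)
pts = roc_points(scores, labels)
print(pts)
print(round(auc(pts), 4))            # 0.8333
```

    The area equals the fraction of (on-edge, off-edge) pixel pairs that the scores rank correctly, here 5 of 6.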

  11. Machine Learning Estimates of Natural Product Conformational Energies

    PubMed Central

    Rupp, Matthias; Bauer, Matthias R.; Wilcken, Rainer; Lange, Andreas; Reutlinger, Michael; Boeckler, Frank M.; Schneider, Gisbert

    2014-01-01

    Machine learning has been used for estimation of potential energy surfaces to speed up molecular dynamics simulations of small systems. We demonstrate that this approach is feasible for significantly larger, structurally complex molecules, taking the natural product Archazolid A, a potent inhibitor of vacuolar-type ATPase, from the myxobacterium Archangium gephyra as an example. Our model estimates energies of new conformations by exploiting information from previous calculations via Gaussian process regression. Predictive variance is used to assess whether a conformation is in the interpolation region, allowing a controlled trade-off between prediction accuracy and computational speed-up. For energies of relaxed conformations at the density functional level of theory (implicit solvent, DFT/BLYP-disp3/def2-TZVP), mean absolute errors of less than 1 kcal/mol were achieved. The study demonstrates that predictive machine learning models can be developed for structurally complex, pharmaceutically relevant compounds, potentially enabling considerable speed-ups in simulations of larger molecular structures. PMID:24453952
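
    The variance-gated use of Gaussian process regression described in this abstract can be illustrated on a one-dimensional toy problem. This is a sketch under stated assumptions, not the authors' model: a zero-mean GP with a squared-exponential kernel of fixed length scale, `math.sin` standing in for an expensive energy calculation, and a hypothetical variance threshold `max_var`; the study itself works with molecular conformations and DFT energies. All function and class names are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(a):
    """Lower-triangular L with L @ L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve low @ z = b."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def solve_upper_t(low, z):
    """Back substitution: solve low^T @ x = z."""
    n = len(low)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class GaussianProcess:
    """Zero-mean GP regression with an RBF kernel and a small noise (jitter) term."""

    def __init__(self, xs, ys, length=1.0, noise=1e-6):
        self.xs, self.length = list(xs), length
        k = [[rbf(a, b, length) + (noise if i == j else 0.0)
              for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        self.low = cholesky(k)
        # alpha = K^{-1} y, computed with two triangular solves.
        self.alpha = solve_upper_t(self.low, solve_lower(self.low, ys))

    def predict(self, x):
        """Return (posterior mean, posterior variance) at x."""
        k_star = [rbf(x, xi, self.length) for xi in self.xs]
        mean = sum(k * a for k, a in zip(k_star, self.alpha))
        v = solve_lower(self.low, k_star)
        var = rbf(x, x, self.length) - sum(vi * vi for vi in v)
        return mean, max(var, 0.0)


def estimate_energy(gp, x, expensive, max_var=1e-2):
    """Trust the GP only where its predictive variance is small; otherwise compute."""
    mean, var = gp.predict(x)
    if var <= max_var:
        return mean, "gp"
    return expensive(x), "computed"


# Hypothetical toy problem: math.sin stands in for an expensive energy calculation.
train_x = [0.5 * i for i in range(7)]          # 0.0, 0.5, ..., 3.0
gp = GaussianProcess(train_x, [math.sin(x) for x in train_x])
print(estimate_energy(gp, 1.25, math.sin))      # inside the sampled region
print(estimate_energy(gp, 10.0, math.sin))      # far outside: falls back to computing
```

    Lowering `max_var` sends more queries to the expensive calculation and improves accuracy; raising it trades accuracy for speed, which is the trade-off the abstract describes.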

  12. Parsimonious kernel extreme learning machine in primal via Cholesky factorization.

    PubMed

    Zhao, Yong-Ping

    2016-08-01

    Recently, the extreme learning machine (ELM) has become a popular topic in the machine learning community. By replacing the so-called ELM feature mappings with the nonlinear mappings induced by kernel functions, two kernel ELMs, i.e., P-KELM and D-KELM, are obtained from primal and dual perspectives, respectively. Unfortunately, both P-KELM and D-KELM have dense solutions whose size grows in direct proportion to the number of training data. To this end, a constructive algorithm for P-KELM (CCP-KELM) is first proposed by virtue of Cholesky factorization, in which the training data incurring the largest reductions on the objective function are recruited as significant vectors. To reduce its training cost further, PCCP-KELM is then obtained with the application of a probabilistic speedup scheme into CCP-KELM. Corresponding to CCP-KELM, a destructive P-KELM (CDP-KELM) is presented using a partial Cholesky factorization strategy, where the training data incurring the smallest reductions on the objective function after their removals are pruned from the current set of significant vectors. Finally, to verify the efficacy and feasibility of the proposed algorithms in this paper, experiments are conducted on both small and large benchmark data sets. PMID:27203553

  13. Machine Learning for Knowledge Extraction from PHR Big Data.

    PubMed

    Poulymenopoulou, Michaela; Malamateniou, Flora; Vassilacopoulos, George

    2014-01-01

    Cloud computing, Internet of Things (IoT) and NoSQL database technologies can support a new generation of cloud-based PHR services that contain heterogeneous (unstructured, semi-structured and structured) patient data (health, social and lifestyle) from various sources, including automatically transmitted data from Internet connected devices of patient living space (e.g. medical devices connected to patients at home care). The patient data stored in such PHR systems constitute big data whose analysis with the use of appropriate machine learning algorithms is expected to improve diagnosis and treatment accuracy, to cut healthcare costs and, hence, to improve the overall quality and efficiency of healthcare provided. This paper describes a health data analytics engine which uses machine learning algorithms for analyzing cloud based PHR big health data towards knowledge extraction to support better healthcare delivery as regards disease diagnosis and prognosis. This engine comprises the data preparation, the model generation and the data analysis modules and runs on the cloud, taking advantage of the map/reduce paradigm provided by Apache Hadoop. PMID:25000009

  14. Gaussian Process Regression as a machine learning tool for predicting organic carbon from soil spectra - a machine learning comparison study

    NASA Astrophysics Data System (ADS)

    Schmidt, Andreas; Lausch, Angela; Vogel, Hans-Jörg

    2016-04-01

    Diffuse reflectance spectroscopy as a soil analytical tool is becoming increasingly widespread. There is a wide range of possible applications ranging from the point scale (e.g. simple soil samples, drill cores, vertical profile scans) through the field scale to the regional and even global scale (UAV, airborne and space borne instruments, soil reflectance databases). The basic idea is that the soil's reflectance spectrum holds information about its properties (like organic matter content or mineral composition). The relation between soil properties and the observable spectrum is usually not exactly known and is typically derived from statistical methods. Nowadays these methods are grouped under the term machine learning, which comprises a vast pool of algorithms and methods for learning the relationship between pairs of input-output data (training data set). Within this pool of methods, Gaussian Process Regression (GPR) is a newly emerging method (originating from Bayesian statistics) which is increasingly applied in different fields. For example, it was successfully used to predict vegetation parameters from hyperspectral remote sensing data. In this study we apply GPR to predict soil organic carbon from soil spectroscopy data (400 - 2500 nm). We compare it to more traditional and widely used methods such as Partial Least Squares Regression (PLSR), Random Forest (RF) and Gradient Boosted Regression Trees (GBRT). All these methods have the common ability to calculate a measure for the variable importance (wavelength importance). The main advantage of GPR is its ability to also predict the variance of the target parameter. This makes it easy to see whether a prediction is reliable or not. The ability to choose from various covariance functions makes GPR a flexible method. This allows for including different assumptions or a priori knowledge about the data. For this study we use samples from three different locations to test the prediction accuracies.

  15. Bias correction for selecting the minimal-error classifier from many machine learning models

    PubMed Central

    Ding, Ying; Tang, Shaowu; Liao, Serena G.; Jia, Jia; Oesterreich, Steffi; Lin, Yan; Tseng, George C.

    2014-01-01

    Motivation: Supervised machine learning is commonly applied in genomic research to construct a classifier from the training data that is generalizable to predict independent testing data. When test datasets are not available, cross-validation is commonly used to estimate the error rate. Many machine learning methods are available, and it is well known that no universally best method exists in general. It has been a common practice to apply many machine learning methods and report the method that produces the smallest cross-validation error rate. Theoretically, such a procedure produces a selection bias. Consequently, many clinical studies with moderate sample sizes (e.g. n = 30–60) risk reporting a falsely small cross-validation error rate that could not be validated later in independent cohorts. Results: In this article, we illustrated the probabilistic framework of the problem and explored the statistical and asymptotic properties. We proposed a new bias correction method based on learning curve fitting by inverse power law (IPL) and compared it with three existing methods: nested cross-validation, weighted mean correction and Tibshirani-Tibshirani procedure. All methods were compared in simulation datasets, five moderate size real datasets and two large breast cancer datasets. The result showed that IPL outperforms the other methods in bias correction with smaller variance, and it has an additional advantage to extrapolate error estimates for larger sample sizes, a practical feature to recommend whether more samples should be recruited to improve the classifier and accuracy. An R package ‘MLbias’ and all source files are publicly available. Availability and implementation: tsenglab.biostat.pitt.edu/software.htm. Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25086004
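
    The selection bias described in this abstract can be reproduced with a small simulation. As a simplifying assumption, each of several equally good classifiers gets an independent binomial error estimate with the same true error rate; real cross-validation estimates computed on the same data are correlated, so this sketch likely overstates the size of the effect but shows its direction. The function name and all parameter values are hypothetical, and this is not the IPL correction proposed in the paper.

```python
import random


def simulate_min_cv_error(true_error=0.3, n_samples=40, n_methods=10, trials=2000, seed=0):
    """Average of the smallest estimated error among n_methods equally good classifiers.

    Each method's cross-validation error is modelled as an independent binomial
    proportion: the fraction of n_samples misclassified at rate true_error.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = [
            sum(rng.random() < true_error for _ in range(n_samples)) / n_samples
            for _ in range(n_methods)
        ]
        # Reporting only the best-looking method is what introduces the bias.
        total += min(estimates)
    return total / trials


print(f"one method:  {simulate_min_cv_error(n_methods=1):.3f}")
print(f"best of 10:  {simulate_min_cv_error(n_methods=10):.3f}")
```

    With a single method the average estimate sits near the true error of 0.3; reporting the best of ten pulls it clearly lower, even though no method is actually better than the others.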

  16. Machine Learning of Hierarchical Clustering to Segment 2D and 3D Images

    PubMed Central

    Nunez-Iglesias, Juan; Kennedy, Ryan; Parag, Toufiq; Shi, Jianbo; Chklovskii, Dmitri B.

    2013-01-01

    We aim to improve segmentation through the use of machine learning tools during region agglomeration. We propose an active learning approach for performing hierarchical agglomerative segmentation from superpixels. Our method combines multiple features at all scales of the agglomerative process, works for data with an arbitrary number of dimensions, and scales to very large datasets. We advocate the use of variation of information to measure segmentation accuracy, particularly in 3D electron microscopy (EM) images of neural tissue, and using this metric demonstrate an improvement over competing algorithms in EM and natural images. PMID:23977123
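
The variation of information (VI) between two segmentations A and B is commonly defined as VI(A, B) = H(A|B) + H(B|A), the sum of two conditional entropies computed from the overlap of their labels. Below is a minimal pure-Python sketch. It assumes both segmentations are given as flat label sequences over the same pixels or voxels, and the function name is hypothetical.

```python
import math
from collections import Counter

def variation_of_information(seg_a, seg_b):
    """Return (H(A|B), H(B|A), VI) in nats for two flat label sequences."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must label the same pixels/voxels")
    n = len(seg_a)
    joint = Counter(zip(seg_a, seg_b))  # overlap counts n(a, b)
    count_a = Counter(seg_a)
    count_b = Counter(seg_b)
    h_a_given_b = 0.0
    h_b_given_a = 0.0
    for (la, lb), n_ab in joint.items():
        p_ab = n_ab / n
        # p(a|b) = n(a, b) / n(b) and p(b|a) = n(a, b) / n(a)
        h_a_given_b -= p_ab * math.log(n_ab / count_b[lb])
        h_b_given_a -= p_ab * math.log(n_ab / count_a[la])
    return h_a_given_b, h_b_given_a, h_a_given_b + h_b_given_a
```

Returning the two terms separately is a design choice. If A is the automatic result and B the ground truth, H(A|B) grows when A splits true segments (over-segmentation), and H(B|A) grows when A merges them (under-segmentation). A total of 0 means the two label maps are identical up to relabelling.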

  17. Generalization Evaluation of Machine Learning Numerical Observers for Image Quality Assessment.

    PubMed

    Kalayeh, Mahdi M; Marin, Thibault; Brankov, Jovan G

    2013-06-01

    In this paper, we present two new numerical observers (NO) based on machine learning for image quality assessment. The proposed NOs aim to predict human observer performance in a cardiac perfusion-defect detection task for single-photon emission computed tomography (SPECT) images. Human observer (HumO) studies are now considered the gold standard for task-based evaluation of medical images. However, such studies are impractical in the early stages of development of imaging devices and algorithms, because they require extensive involvement of trained human observers who must evaluate a large number of images. To address this problem, numerical observers (also called model observers) have been developed as surrogates for human observers. The channelized Hotelling observer (CHO), with or without an internal noise model, is currently the most widely used NO of this kind. In our previous work we argued that developing a NO model to predict human observers' performance can be viewed as a machine learning (or system identification) problem. This consideration led us to develop a channelized support vector machine (CSVM) observer, a kernel-based regression model that greatly outperformed the widely used CHO. This was especially evident when the numerical observers were evaluated in terms of generalization performance. To evaluate generalization we used a typical situation for the practical use of a numerical observer: after optimizing the NO (which for a CHO might consist of adjusting the internal noise model) based upon a broad set of reconstructed images, we tested it on a broad (but different) set of images obtained by a different reconstruction method. In this manuscript we aim to evaluate two new regression models that achieve accuracy higher than the CHO and comparable to our earlier CSVM method, while dramatically reducing model complexity and computation time. The new models are defined in a Bayesian machine-learning framework: a channelized

  18. SIRT3 substrate specificity determined by peptide arrays and machine learning.

    PubMed

    Smith, Brian C; Settles, Burr; Hallows, William C; Craven, Mark W; Denu, John M

    2011-02-18

    Accumulating evidence suggests that reversible protein acetylation may be a major regulatory mechanism that rivals phosphorylation. With the recent cataloging of thousands of acetylation sites on hundreds of proteins comes the challenge of identifying the acetyltransferases and deacetylases that regulate acetylation levels. Sirtuins are a conserved family of NAD(+)-dependent protein deacetylases that are implicated in genome maintenance, metabolism, cell survival, and lifespan. SIRT3 is the dominant protein deacetylase in mitochondria, and emerging evidence suggests that SIRT3 may control major pathways by deacetylation of central metabolic enzymes. Here, to identify potential SIRT3 substrates, we have developed an unbiased screening strategy that involves a novel acetyl-lysine analogue (thiotrifluoroacetyl-lysine), SPOT-peptide libraries, machine learning, and kinetic validation. SPOT peptide libraries based on known and potential mitochondrial acetyl-lysine sites were screened for SIRT3 binding and then analyzed using machine learning to establish binding trends. These trends were then applied to the mitochondrial proteome as a whole to predict the binding affinity of all lysine sites within human mitochondria. Machine learning prediction of SIRT3 binding correlated with steady-state kinetic k(cat)/K(m) values for 24 acetyl-lysine peptides that possessed a broad range of predicted binding. Thus, SPOT peptide-binding screens and machine learning prediction provide an accurate and efficient method to evaluate sirtuin substrate specificity from a relatively small learning set. These analyses suggest potential SIRT3 substrates involved in several metabolic pathways such as the urea cycle, ATP synthesis, and fatty acid oxidation. PMID:20945913

  19. Retrieval of Similar Objects in Simulation Data Using Machine Learning Techniques

    SciTech Connect

    Cantu-Paz, E; Cheung, S-C; Kamath, C

    2003-06-19

    Comparing the output of a physics simulation with an experiment is often done by visually comparing the two outputs. To determine which simulation more closely matches the experiment, more quantitative measures are needed. This paper describes our early experiences with this problem, approached through the slightly simpler problem of finding objects in an image that are similar to a given query object. Focusing on a dataset from a fluid mixing problem, we report on our experiments using classification techniques from machine learning to retrieve the objects of interest in the simulation data. The early results reported in this paper suggest that machine learning techniques can retrieve more objects that are similar to the query than distance-based similarity methods.

  20. Controlling misses and false alarms in a machine learning framework for predicting uniformity of printed pages

    NASA Astrophysics Data System (ADS)

    Nguyen, Minh Q.; Allebach, Jan P.

    2015-01-01

    alarms" are not nearly as catastrophic as "misses", which represent potentially serious problems that are never seen by the system developers. This scenario motivates us to develop a machine learning framework that achieves the minimum "false alarm" rate subject to a specified "miss" rate. To construct such a set of receiver operating characteristic (ROC) curves, we examine various prediction tools, ranging from an exhaustive search over the space of nonlinear discriminants to a Cost-Sensitive SVM framework. We then compare the curves obtained from these methods. Our work shows promise for applying a standard framework to obtain a full ROC curve when tackling other machine learning problems in industry.
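
One way to read "minimum false alarm rate subject to a specified miss rate" is as a threshold choice on a classifier's scores. Here is a minimal sketch. It assumes higher scores mean a page is more likely to be defective and that a page is flagged when its score is at least the threshold. These conventions and the function name are assumptions, not the authors' framework.

```python
def threshold_for_miss_rate(pos_scores, neg_scores, max_miss_rate):
    """Largest threshold whose miss rate on positives stays <= max_miss_rate.

    Returns (threshold, miss_rate, false_alarm_rate). A sample is flagged
    when score >= threshold.
    """
    pos = sorted(pos_scores)
    # Number of positives we may miss; int() floors, which errs on the safe side.
    allowed_misses = int(max_miss_rate * len(pos))
    if allowed_misses >= len(pos):
        threshold = float("inf")  # every positive may be missed: flag nothing
    else:
        # Only positives scoring strictly below pos[allowed_misses] are missed,
        # and any higher threshold would miss at least one more.
        threshold = pos[allowed_misses]
    miss_rate = sum(s < threshold for s in pos) / len(pos)
    false_alarm_rate = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return threshold, miss_rate, false_alarm_rate
```

The false-alarm rate can only fall as the threshold rises, so the largest threshold that still meets the miss constraint gives the fewest false alarms. Sweeping `max_miss_rate` from 0 to 1 traces out an ROC curve.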

  1. Machine Learning Assisted Design of Highly Active Peptides for Drug Discovery

    PubMed Central

    Giguère, Sébastien; Laviolette, François; Marchand, Mario; Tremblay, Denise; Moineau, Sylvain; Liang, Xinxia; Biron, Éric; Corbeil, Jacques

    2015-01-01

    The discovery of peptides possessing high biological activity is very challenging because of the enormous diversity of possible peptides, only a minority of which have the desired properties. To lower cost and reduce the time needed to obtain promising peptides, machine learning approaches can greatly assist the process and even partly replace expensive laboratory experiments by learning a predictor from existing data or from a smaller amount of newly generated data. Unfortunately, once the model is learned, selecting the peptides with the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and validation of peptide leads. Moreover, the proposed approach does not require known ligands for the target protein, since it can leverage recent multi-target machine learning predictors in which ligands for similar targets serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/. PMID:25849257
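
The abstract describes a graph-based algorithm that is guaranteed to find the peptides with maximal predicted bioactivity. The sketch below illustrates the general idea under a simplifying assumption that differs from the paper's kernel model. Here the predicted score is taken to be a sum of position-specific k-mer weights, so the best peptide is a longest path through a layered graph whose nodes are (k-1)-mers, and dynamic programming finds that path exactly. The weight function and all names are hypothetical.

```python
from itertools import product

def best_peptide(length, alphabet, k, weight):
    """Return (sequence, score) maximising sum_i weight(i, seq[i:i+k]).

    Assumes the model score decomposes into position-specific k-mer terms.
    """
    if length < k:
        raise ValueError("length must be at least k")
    # Each DP state is the last k-1 residues; value = (best score, best prefix).
    dp = {state: (0.0, state) for state in product(alphabet, repeat=k - 1)}
    for i in range(length - k + 1):
        new_dp = {}
        for state, (score, prefix) in dp.items():
            for residue in alphabet:
                kmer = state + (residue,)
                candidate = score + weight(i, "".join(kmer))
                nxt = kmer[1:]  # follow the edge (k-1)-mer -> next (k-1)-mer
                if nxt not in new_dp or candidate > new_dp[nxt][0]:
                    new_dp[nxt] = (candidate, prefix + (residue,))
        dp = new_dp
    score, seq = max(dp.values())
    return "".join(seq), score
```

The cost grows with |alphabet|^k per position rather than |alphabet|^length, which is why a decomposable score turns an exhaustive search into a tractable longest-path problem.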

  2. Application of machine learning using support vector machines for crater detection from Martian digital topography data

    NASA Astrophysics Data System (ADS)

    Salamunićcar, Goran; Lončarić, Sven

    In our previous work, in order to extend the GT-57633 catalogue [PSS, 56 (15), 1992-2008] with still uncatalogued impact craters, the following was done [GRS, 48 (5), in press, doi:10.1109/TGRS.2009.2037750]: (1) a crater detection algorithm (CDA) based on a digital elevation model (DEM) was developed; (2) using 1/128° MOLA data, this CDA proposed 414631 crater candidates; (3) each crater candidate was analyzed manually; and (4) 57592 were confirmed as correct detections. The resulting GT-115225 catalog is a significant result of this effort. However, checking such a large number of crater candidates manually was a demanding task. This was the main motivation for improving the CDA so that it better classifies crater candidates as true or false detections. To achieve this, we extended the CDA with machine learning capability, using support vector machines (SVM). In the first step, the CDA (re)calculates numerous terrain morphometric attributes from the DEM. For this purpose, existing modules of the CDA from our previous work were reused to prepare these attributes. In addition, new attributes were introduced, such as ellipse eccentricity and tilt. For machine learning purposes, the CDA is additionally extended to provide a 2-D topography profile and 3-D shape for each crater candidate. The latter two pose a performance problem because of the large number of crater candidates combined with the large number of attributes. As a solution, we developed a CDA architecture in which it is possible to combine an SVM with a radial basis function (RBF) or any other kernel (for the initial set of attributes) with an SVM with a linear kernel (for cases in which 2-D and 3-D data are included as well). Another challenge is that, in addition to the diversity of possible crater types, there are numerous morphological differences between the smallest (mostly very circular bowl-shaped craters) and the largest (multi-ring) impact

  3. Inductive machine learning for improved estimation of catchment-scale snow water equivalent

    NASA Astrophysics Data System (ADS)

    Buckingham, David; Skalka, Christian; Bongard, Josh

    2015-05-01

    Infrastructure for the automatic collection of single-point measurements of snow water equivalent (SWE) is well-established. However, because SWE varies significantly over space, the estimation of SWE at the catchment scale based on a single-point measurement is error-prone. We propose low-cost, lightweight methods for near-real-time estimation of mean catchment-wide SWE using existing infrastructure, wireless sensor networks, and machine learning algorithms. Because snowpack distribution is highly nonlinear, we focus on Genetic Programming (GP), a nonlinear, white-box, inductive machine learning algorithm. Because we did not have access to near-real-time catchment-scale SWE data, we used available data as ground truth for machine learning in a set of experiments that are successive approximations of our goal of catchment-wide SWE estimation. First, we used a history of maritime snowpack data collected by manual snow courses. Second, we used distributed snow depth (HS) data collected automatically by wireless sensor networks. We compared the performance of GP against linear regression (LR), binary regression trees (BT), and a widely used basic method (BM) that naively assumes non-variable snowpack. In the first experiment set, GP and LR models predicted SWE with lower error than BM. In the second experiment set, GP had lower error than LR, but outperformed BT only when we applied a technique that specifically mitigated the possibility of over-fitting.

  4. Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework.

    PubMed

    Liu, Wenbo; Li, Ming; Yi, Li

    2016-08-01

    Atypical face scanning patterns in individuals with Autism Spectrum Disorder (ASD) have been repeatedly reported in previous research. The present study examined whether these face scanning patterns could help identify children with ASD by using a machine learning algorithm for classification. Specifically, we applied the machine learning method to an eye movement dataset from a face recognition task [Yi et al., 2016] to classify children with and without ASD. We evaluated the performance of our model in terms of its accuracy, sensitivity, and specificity in classifying ASD. Results indicated promising evidence for applying the machine learning algorithm based on face scanning patterns to identify children with ASD, with a maximum classification accuracy of 88.51%. Nevertheless, our study is still preliminary, with some constraints that may apply in clinical practice. Future research should further evaluate our method and contribute to the development of a multitask and multimodel approach to aid the early detection and diagnosis of ASD. Autism Res 2016, 9: 888-898. © 2016 International Society for Autism Research, Wiley Periodicals, Inc. PMID:27037971
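
Accuracy, sensitivity and specificity, the three measures reported above, follow directly from the confusion matrix. Below is a minimal sketch. It assumes binary labels with ASD coded as the positive class `1`, and both that encoding and the function name are assumptions.

```python
def screening_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    nan = float("nan")  # returned when a rate has an empty denominator
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else nan,  # true negative rate
    }
```

Reporting all three matters in screening. A classifier can reach high accuracy on an imbalanced sample while still missing many positive cases, which only sensitivity reveals.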

  5. Advances in Patient Classification for Traditional Chinese Medicine: A Machine Learning Perspective

    PubMed Central

    Zhao, Changbo; Li, Guo-Zheng; Wang, Chengjun; Niu, Jinling

    2015-01-01

    As a complementary and alternative medicine, traditional Chinese medicine (TCM) has drawn great attention both domestically and overseas. In practice, TCM provides a methodology for patient diagnosis and treatment that is quite distinct from that of western medicine (WM). A syndrome (ZHENG or pattern) is differentiated by a set of symptoms and signs examined in an individual through four main diagnostic methods: inspection, auscultation and olfaction, interrogation, and palpation, which reflect the pathological and physiological changes of disease occurrence and development. Patient classification divides patients into several classes based on different criteria. In this paper, we survey patient classification from a machine learning perspective across three major aspects of TCM: sign classification, syndrome differentiation, and disease classification. Considering the different diagnostic data analyzed by different computational methods, we present an overview of four subfields of TCM diagnosis. For each subfield, we design a rectangular reference list with applications along the horizontal direction and machine learning algorithms along the vertical direction. Based on the current development of objective TCM diagnosis for patient classification, we discuss research issues around machine learning techniques applied to TCM diagnosis to facilitate further research on TCM patient classification. PMID:26246834

  6. Application of neural networks and other machine learning algorithms to DNA sequence analysis

    SciTech Connect

    Lapedes, A.; Barnes, C.; Burks, C.; Farber, R.; Sirotkin, K.

    1988-01-01

    In this article we report initial, quantitative results on the application of simple neural networks, and simple machine learning methods, to two problems in DNA sequence analysis. The two problems we consider are: (1) determination of whether procaryotic and eucaryotic DNA sequence segments are translated to protein. An accuracy of 99.4% is reported for procaryotic DNA (E. coli) and 98.4% for eucaryotic DNA (H. sapiens genes known to be expressed in liver); (2) determination of whether eucaryotic DNA sequence segments containing the dinucleotides "AG" or "GT" are transcribed to RNA splice junctions. Accuracy of 91.2% was achieved on intron/exon splice junctions (acceptor sites) and 92.8% on exon/intron splice junctions (donor sites). The solution of these two problems, by use of information processing algorithms operating on unannotated base sequences and without recourse to biological laboratory work, is relevant to the Human Genome Project. A variety of neural network, machine learning, and information theoretic algorithms are used. The accuracies obtained exceed those of previous investigations for which quantitative results are available in the literature. They result from an ongoing program of research that applies machine learning algorithms to the problem of determining the biological function of DNA sequences. Some predictions of possible new genes using these methods are listed, although a complete survey of the H. sapiens and E. coli sections of GenBank will be given elsewhere. 36 refs., 6 figs., 6 tabs.
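
As an illustration of the splice-junction task, candidate sites can be collected as fixed-width windows around every "AG" (acceptor) or "GT" (donor) dinucleotide and then one-hot encoded as network input. The window width, the encoding and the function names below are assumptions for the sketch, not the parameters used in the report.

```python
def candidate_sites(seq, dinucleotide, flank):
    """Return (position, window) for each occurrence of `dinucleotide` that has
    `flank` bases available on both sides; windows are upper-cased."""
    seq = seq.upper()
    width = len(dinucleotide)
    sites = []
    for i in range(flank, len(seq) - width - flank + 1):
        if seq[i:i + width] == dinucleotide:
            sites.append((i, seq[i - flank:i + width + flank]))
    return sites

def one_hot(window):
    """Flatten a DNA window into 4 indicator values per base (A, C, G, T order)."""
    return [1.0 if base == b else 0.0 for base in window for b in "ACGT"]
```

Each window then becomes one training example, labelled as a true or false junction, so the classifier only has to decide among positions that already carry the required dinucleotide.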

  7. Enhancing nanoscale SEM image segmentation and reconstruction with crystallographic orientation data and machine learning

    SciTech Connect

    Converse, Matthew I.; Fullwood, David T.

    2013-09-15

    Current methods of image segmentation and reconstruction from scanning electron micrographs can be inadequate for resolving nanoscale gaps (1–20 nm) in composite materials. Such information is critical to both accurate material characterizations and models of piezoresistive response. The current work proposes the use of crystallographic orientation data and machine learning to enhance this process. It is first shown how a machine learning algorithm can be used to predict the connectivity of nanoscale grains in a nickel nanostrand/epoxy composite. This results in 71.9% accuracy for a 2D algorithm and 62.4% accuracy in 3D. Finally, it is demonstrated how these algorithms can be used to predict the location of gaps between distinct nanostrands: gaps which would otherwise not be detected with the sole use of a scanning electron microscope. - Highlights: • A method is proposed for enhancing the segmentation/reconstruction of SEM images. • 3D crystallographic orientation data from a nickel nanocomposite is collected. • A machine learning algorithm is used to detect trends in adjacent grains. • This algorithm is then applied to predict likely regions of nanoscale gaps. • These gaps would otherwise be unresolved with the sole use of an SEM.

  8. Axial flux machine, stator and fabrication method

    DOEpatents

    Carl, Ralph James

    2004-03-16

    An axial flux machine comprises: a soft magnetic composite stator extension positioned in parallel with a rotor disk and having slots; soft magnetic composite pole pieces attached to the stator extension and facing a permanent magnet on the rotor disk, each comprising a protrusion situated within a respective one of the slots, each protrusion shaped so as to facilitate orientation of the respective pole piece with respect to the stator extension; electrical coils, each wrapped around a respective one of the pole pieces. In another embodiment the soft magnetic composite pole pieces each comprise a base portion around which the electrical coils are wound and a trapezoidal shield portion having a plurality of heights, with a first height in a first region being longer than a second height in a second region, the second region being closer to a pole-to-pole gap than the first region.

  9. Analysis of Pollution Patterns Using Unsupervised Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Kanevski, M.; Timonin, V.; Pozdnoukhov, A.; Maignan, M.

    2009-04-01

    The research presents an application of Machine Learning Algorithms, mainly unsupervised learning techniques such as self-organising Kohonen maps (SOM), to study spatial patterns in multivariate environmental spatial data. SOM are well-known neural networks widely used for high-dimensional data analysis, modelling (clustering and classification), and visualization. Self-organising maps belong to the unsupervised machine learning algorithms that provide solutions to clustering, classification or density modelling problems using unlabeled data. SOM are efficiently used for dimensionality reduction and for the visualisation of high-dimensional data (projection into a two-dimensional space). Unlabeled data are points/vectors in a high-dimensional feature space that have some attributes (or coordinates) but no target values, neither continuous (as in a regression problem) nor discrete labels (as in a classification problem). The main task of SOM is to "group" or to "range" these input vectors in some manner and to try to capture regularities (find patterns) in the data by preserving topological structure and using well-defined similarity measures. The generic methodology presented in this study consists of detailed spatial exploratory data analysis using statistical and geostatistical tools, analysis and modelling of anisotropic spatial (cross-)correlation structures, and application of SOM as a nonlinear modelling and visualisation tool. The case study considers multivariate data on sediment contamination by heavy metals (eight spatially distributed pollutants) in Lake Geneva. The most important modelling task is formulated as revealing structures or coherent clusters in this multivariate data set that would shed some light on the underlying phenomena of the contamination. Three major, clearly spatially separated clusters were detected and explained using the SOM technique.
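
A self-organising map keeps a grid of weight vectors. For each input it finds the best-matching unit (BMU) and pulls that unit and its grid neighbours toward the input, with the learning rate and neighbourhood radius shrinking over time. Below is a minimal pure-Python sketch of that classic update. It assumes a small rectangular grid, a Gaussian neighbourhood and linear decay schedules, all of which are illustrative choices rather than the study's settings, and the function names are hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # One weight vector per grid node, initialised from randomly chosen samples.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                # Gaussian neighbourhood measured on the map grid, not in data space
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for j in range(len(w)):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights
```

After training, inputs that share a BMU (or neighbouring BMUs) form a cluster. In a multivariate contamination study the inputs would be the pollutant concentrations measured at each sample location.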

  10. Using statistical and machine learning to help institutions detect suspicious access to electronic health records

    PubMed Central

    Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    Objective To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. Methods From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. Results The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined to be truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. Limitations The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. Conclusion The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs. PMID:21672912
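
The area under the ROC curve reported above equals the probability that a randomly chosen positive (suspicious) event scores higher than a randomly chosen negative one, with ties counted as one half. The following is a minimal O(P·N) sketch of that definition. It assumes label 1 marks suspicious accesses, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise form is easy to verify but quadratic. For large logs, a rank-based computation of the same Mann-Whitney statistic after sorting the scores gives the same value in O(n log n).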

  11. Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    PubMed Central

    2011-01-01

    Background Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF. PMID:22369383

  12. Mortality risk score prediction in an elderly population using machine learning.

    PubMed

    Rose, Sherri

    2013-03-01

    Standard practice for prediction often relies on parametric regression methods. Interesting new methods from the machine learning literature have been introduced in epidemiologic studies, such as random forest and neural networks. However, a priori, an investigator will not know which algorithm to select and may wish to try several. Here I apply the super learner, an ensembling machine learning approach that combines multiple algorithms into a single algorithm and returns a prediction function with the best cross-validated mean squared error. Super learning is a generalization of stacking methods. I used super learning in the Study of Physical Performance and Age-Related Changes in Sonomans (SPPARCS) to predict death among 2,066 residents of Sonoma, California, aged 54 years or more during the period 1993-1999. The super learner for predicting death (risk score) improved upon all single algorithms in the collection of algorithms, although its performance was similar to that of several algorithms. Super learner outperformed the worst algorithm (neural networks) by 44% with respect to estimated cross-validated mean squared error and had an R2 value of 0.201. The improvement of super learner over random forest with respect to R2 was approximately 2-fold. Alternatives for risk score prediction include the super learner, which can provide improved performance. PMID:23364879
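
The super learner's core idea can be sketched in three steps. First, obtain V-fold cross-validated predictions from each candidate algorithm. Next, choose combination weights that minimise the cross-validated mean squared error. Finally, refit every candidate on the full data. The sketch below is a simplification, not the published implementation. It uses two toy learners, an intercept-only model and simple linear regression, as hypothetical stand-ins for algorithms such as random forests, and a coarse grid over convex weights instead of a formal optimiser.

```python
def mean_learner(xs, ys):
    mu = sum(ys) / len(ys)
    return lambda x: mu

def linear_learner(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def cv_predictions(learner, xs, ys, v):
    """Out-of-fold predictions; observation i belongs to fold i % v."""
    preds = [0.0] * len(xs)
    for fold in range(v):
        train = [i for i in range(len(xs)) if i % v != fold]
        model = learner([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), v):
            preds[i] = model(xs[i])
    return preds

def simplex_grid(k, steps):
    """All k-tuples of non-negative integers summing to steps."""
    if k == 1:
        yield (steps,)
        return
    for i in range(steps + 1):
        for rest in simplex_grid(k - 1, steps - i):
            yield (i,) + rest

def super_learner(learners, xs, ys, v=5, steps=10):
    """Return (predict, weights, cv_mse) for the best convex combination found."""
    cv = [cv_predictions(learner, xs, ys, v) for learner in learners]
    best = None
    for counts in simplex_grid(len(learners), steps):
        w = [c / steps for c in counts]
        mse = sum((sum(wj * cv[j][i] for j, wj in enumerate(w)) - ys[i]) ** 2
                  for i in range(len(ys))) / len(ys)
        if best is None or mse < best[0]:
            best = (mse, w)
    cv_mse, weights = best
    fitted = [learner(xs, ys) for learner in learners]  # refit on all data
    def predict(x):
        return sum(wj * f(x) for wj, f in zip(weights, fitted))
    return predict, weights, cv_mse
```

The grid includes each single-learner vertex, so the selected combination's cross-validated MSE can never be worse than that of the best individual candidate on the same folds.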

  13. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics

    PubMed Central

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-01-01

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well. PMID:26876223
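
The design loop described here pairs a fast property-prediction model with a genetic algorithm over polymer building blocks. The following is a minimal sketch under loudly stated assumptions. The `BLOCK_CONTRIB` table and the additive `surrogate` are hypothetical placeholders for the trained machine learning model, the block names are illustrative, and the GA uses generic tournament selection, one-point crossover, per-block mutation and elitism rather than the authors' settings.

```python
import random

# Hypothetical per-block contributions to a target property: illustrative
# numbers standing in for a trained ML model, NOT values from the study.
BLOCK_CONTRIB = {"CH2": 0.5, "NH": 1.1, "CO": 1.6, "C6H4": 2.3, "O": 0.8, "CS": 2.9}

def surrogate(polymer):
    """Toy property predictor: mean contribution of the blocks in the repeat unit."""
    return sum(BLOCK_CONTRIB[b] for b in polymer) / len(polymer)

def fitness_error(polymer, target):
    return abs(surrogate(polymer) - target)

def evolve(target, n_blocks=4, pop_size=30, generations=60, mutation_rate=0.2, seed=0):
    """Return (best_polymer, its_error, best_error_history)."""
    if n_blocks < 2:
        raise ValueError("one-point crossover needs at least two blocks")
    rng = random.Random(seed)
    blocks = sorted(BLOCK_CONTRIB)

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness_error(a, target) <= fitness_error(b, target) else b

    pop = [[rng.choice(blocks) for _ in range(n_blocks)] for _ in range(pop_size)]
    best = min(pop, key=lambda p: fitness_error(p, target))
    history = [fitness_error(best, target)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best design so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_blocks)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.choice(blocks) if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        pop = children
        gen_best = min(pop, key=lambda p: fitness_error(p, target))
        if fitness_error(gen_best, target) < fitness_error(best, target):
            best = gen_best[:]
        history.append(fitness_error(best, target))
    return best, fitness_error(best, target), history
```

Because the surrogate is cheap to evaluate, the GA can score thousands of candidate designs. Only the most promising ones would then need the expensive first-principles check.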

  14. Automatic programming of binary morphological machines by PAC learning

    NASA Astrophysics Data System (ADS)

    Barrera, Junior; Tomita, Nina S.; Correa da Silva, Flavio S.; Terada, Routo

    1995-08-01

    Binary image analysis problems can be solved by set operators implemented as programs for a binary morphological machine (BMM). This is a very general and powerful approach to this type of problem. However, designing these programs is not a task manageable by non-experts in mathematical morphology. To overcome this difficulty, we have worked on tools that help users describe their goals at higher levels of abstraction and translate them into BMM programs. Some of these tools are based on representing the user's goals as a collection of input-output pairs of images and estimating the target operator from these data. PAC learning is a well-suited methodology for this task, since in this theory 'concepts' are represented as Boolean functions that are equivalent to set operators. To apply this technique in practice, we must have efficient learning algorithms. In this paper we introduce two PAC learning algorithms, both based on the minimal representation of Boolean functions, which has a straightforward translation to the canonical decomposition of set operators. The first algorithm is based on the classical Quine-McCluskey algorithm for the simplification of Boolean functions, and the second is based on a new idea for the construction of Boolean functions: the incremental splitting of intervals. We also present a comparative complexity analysis of the two algorithms. Finally, we give some application examples.

  15. Machine learning strategy for accelerated design of polymer dielectrics

    DOE PAGESBeta

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-02-15

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. Furthermore, while this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well.

  16. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics

    NASA Astrophysics Data System (ADS)

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-02-01

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well.
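    The evolutionary design step in this abstract, a genetic algorithm optimising polymer constituent blocks toward a target property, can be sketched as follows. Everything concrete here is an assumption for illustration: the block vocabulary, the fixed repeat-unit length, and especially `surrogate`, a toy function with made-up per-block weights standing in for a trained property-prediction model. The operators (tournament selection, one-point crossover, per-position mutation, elitism) are standard GA choices, not necessarily the authors'.

```python
import random

# Hypothetical block vocabulary and toy weights standing in for a trained model.
BLOCKS = ["CH2", "NH", "CO", "C6H4", "O"]
TOY_WEIGHTS = {"CH2": 1.0, "NH": 2.0, "CO": 3.0, "C6H4": 4.0, "O": 5.0}


def surrogate(unit):
    """Toy property predictor: mean of per-block contributions (not real data)."""
    return sum(TOY_WEIGHTS[b] for b in unit) / len(unit)


def evolve(target, length=4, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    """Search for a repeat unit whose predicted property is close to `target`."""
    rng = random.Random(seed)

    def error(unit):
        return abs(surrogate(unit) - target)

    pop = [[rng.choice(BLOCKS) for _ in range(length)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=error)
        best_history.append(error(pop[0]))
        next_pop = pop[:2]  # elitism: keep the two best units unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = min(rng.sample(pop, 3), key=error)
            p2 = min(rng.sample(pop, 3), key=error)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: replace each block with a random one at a small rate.
            child = [rng.choice(BLOCKS) if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    pop.sort(key=error)
    return pop[0], best_history
```

    Because the two best units are carried forward unchanged, the best error per generation never increases; swapping `surrogate` for a real trained model is the only change needed to turn this into a design loop.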

  17. Active Learning Methods

    ERIC Educational Resources Information Center

    Zayapragassarazan, Z.; Kumar, Santosh

    2012-01-01

    Present-generation students are primarily active learners with varied learning experiences, and lecture courses may not suit all their learning needs. Effective learning involves providing students with a sense of progress and control over their own learning. This requires creating a situation where learners have a chance to try out or test their…

  18. 3D prostate segmentation of ultrasound images combining longitudinal image registration and machine learning

    NASA Astrophysics Data System (ADS)

    Yang, Xiaofeng; Fei, Baowei

    2012-02-01

    We developed a three-dimensional (3D) segmentation method for transrectal ultrasound (TRUS) images based on longitudinal image registration and machine learning. Using longitudinal images of each individual patient, we register previously acquired images to the new images of the same subject. Three orthogonal Gabor filter banks were used to extract texture features from each registered image. Patient-specific Gabor features from the registered images are used to train kernel support vector machines (KSVMs), which then segment the newly acquired prostate image. The segmentation method was tested on TRUS data from five patients. The average surface distance between our automatic segmentation and the manual segmentation is 1.18 ± 0.31 mm, indicating that our automatic segmentation method based on longitudinal image registration is feasible for segmenting the prostate in TRUS images.
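    A simplified sketch of the texture-feature step described above follows. The abstract does not give filter parameters or say exactly how the three orthogonal banks are built, so this approximates them by applying a bank of 2D Gabor filters (real part only) to the axial, coronal and sagittal patches centred on one voxel. The patch size, wavelengths, orientation count and sigma = 0.5 × wavelength are all illustrative assumptions. The resulting per-voxel vectors could then be fed to a kernel SVM such as scikit-learn's `SVC(kernel="rbf")`.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2D Gabor filter: Gaussian envelope times an oriented cosine."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + y_rot ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength)


def orthogonal_gabor_features(volume, voxel, size=7, wavelengths=(4.0, 8.0), n_orient=4):
    """Gabor responses on three orthogonal planes through `voxel` (z, y, x)."""
    h = size // 2
    if any(c - h < 0 or c + h >= s for c, s in zip(voxel, volume.shape)):
        raise ValueError("voxel is too close to the volume border for this patch size")
    z, y, x = voxel
    planes = [
        volume[z, y - h:y + h + 1, x - h:x + h + 1],  # axial patch
        volume[z - h:z + h + 1, y, x - h:x + h + 1],  # coronal patch
        volume[z - h:z + h + 1, y - h:y + h + 1, x],  # sagittal patch
    ]
    feats = []
    for patch in planes:
        for lam in wavelengths:
            for k in range(n_orient):
                kern = gabor_kernel(size, lam, np.pi * k / n_orient, sigma=0.5 * lam)
                # Filter response at the centre voxel = sum of elementwise product.
                feats.append(float((patch * kern).sum()))
    return np.array(feats)
```

    With the defaults, each voxel gets 3 planes × 2 wavelengths × 4 orientations = 24 features.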

  19. A User-Oriented Splog Filtering Based on a Machine Learning

    NASA Astrophysics Data System (ADS)

    Yoshinaka, Takayuki; Ishii, Soichi; Fukuhara, Tomohiro; Masuda, Hidetaka; Nakagawa, Hiroshi

    A method for filtering spam blogs (splogs) based on a machine learning technique, and its evaluation results, are described. Spam blogs (splogs) have become one of the major issues on the Web. The difficulty with splogs is that the value of a blog site differs from person to person. We propose a novel user-oriented splog filtering method that can adapt to each user's preferences for valuable blogs. We use a support vector machine (SVM) to create a personalized splog filter for each user. We conducted two experiments: (1) an experiment on individual splog judgement, and (2) an experiment on user-oriented splog filtering. The former experiment revealed the existence of 'gray' blogs that must be judged by each person individually. The latter experiment showed that we can provide appropriate personalized filters by choosing the best feature set for each user.
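    The per-user filter described above (an SVM per user, with the best feature set chosen for each user) can be sketched with scikit-learn. The abstract does not name the features, kernel or selection criterion, so this assumes a linear SVM, mean 5-fold cross-validated accuracy as the criterion, and hypothetical feature-set names that map to column indices of one user's labelled data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def choose_feature_set(X, y, feature_sets, cv=5):
    """Pick the feature set with the best CV accuracy for one user's labels.

    X: (n_blogs, n_features) array; y: that user's labels (1 = splog, 0 = valuable).
    feature_sets: hypothetical mapping from a set name to a list of column indices.
    Returns (best set name, SVM refit on that set, all CV scores).
    """
    scores = {}
    for name, cols in feature_sets.items():
        clf = SVC(kernel="linear")
        scores[name] = cross_val_score(clf, X[:, cols], y, cv=cv).mean()
    best = max(scores, key=scores.get)
    final = SVC(kernel="linear").fit(X[:, feature_sets[best]], y)
    return best, final, scores
```

    Running this once per user yields a separate filter for each person, which is how 'gray' blogs can be labelled differently by different users.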

  20. Teaching a Machine to Feel Postoperative Pain: Combining High-Dimensional Clinical Data with Machine Learning Algorithms to Forecast Acute Postoperative Pain

    PubMed Central

    Tighe, Patrick J.; Harle, Christopher A.; Hurley, Robert W.; Aytug, Haldun; Boezaart, Andre P.; Fillingim, Roger B.

    2015-01-01

    Background: Given their ability to process high-dimensional datasets with hundreds of variables, machine learning algorithms may offer one solution to the vexing challenge of predicting postoperative pain. Methods: Here, we report on the application of machine learning algorithms to predict postoperative pain outcomes in a retrospective cohort of 8071 surgical patients using 796 clinical variables. Five algorithms were compared in terms of their ability to forecast moderate to severe postoperative pain: Least Absolute Shrinkage and Selection Operator (LASSO), gradient-boosted decision tree, support vector machine, neural network, and k-nearest neighbor, with logistic regression included for baseline comparison. Results: In forecasting moderate to severe postoperative pain for postoperative day (POD) 1, the LASSO algorithm, using all 796 variables, had the highest accuracy, with an area under the receiver operating characteristic (ROC) curve of 0.704. Next were the gradient-boosted decision tree, with an area under the ROC curve of 0.665, and the k-nearest neighbor algorithm, with 0.643. For POD 3, the LASSO algorithm, using all variables, again had the highest accuracy, with an area under the ROC curve of 0.727. Logistic regression had a lower area under the ROC curve of 0.5 for predicting pain outcomes on POD 1 and 3. Conclusions: Machine learning algorithms, when combined with complex and heterogeneous data from electronic medical record systems, can forecast acute postoperative pain outcomes with accuracies similar to methods that rely only on variables specifically collected for pain outcome prediction. PMID:26031220
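    The comparison above, LASSO against a logistic-regression baseline scored by area under the ROC curve, can be sketched with scikit-learn. The abstract does not say how LASSO was applied to a binary outcome; this sketch assumes L1-penalized logistic regression (`LogisticRegression(penalty="l1", solver="liblinear")`). The regularization strength C, the 70/30 stratified split and the function name are illustrative assumptions, and no data from the study is reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def compare_auc(X, y, C=0.1, seed=0):
    """Hold-out AUC of L1-penalized vs. plain logistic regression on binary y."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    # L1 penalty drives many coefficients to exactly zero (built-in variable selection).
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X_tr, y_tr)
    baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return {
        "lasso_auc": roc_auc_score(y_te, lasso.predict_proba(X_te)[:, 1]),
        "logistic_auc": roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1]),
        "n_nonzero": int(np.count_nonzero(lasso.coef_)),
    }
```

    An AUC of 0.5 corresponds to chance-level ranking, which is why the baseline figure reported above indicates no discriminative ability.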