Sample records for weka machine learning

  1. Data mining in bioinformatics using Weka.

    PubMed

    Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H

    2004-10-12

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection-common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.

  2. A Comparative Study with RapidMiner and WEKA Tools over some Classification Techniques for SMS Spam

    NASA Astrophysics Data System (ADS)

    Foozy, Cik Feresa Mohd; Ahmad, Rabiah; Faizal Abdollah, M. A.; Chai Wen, Chuah

    2017-08-01

    SMS Spamming is a serious attack that can manipulate the use of the SMS by spreading the advertisement in bulk. By sending the unwanted SMS that contain advertisement can make the users feeling disturb and this against the privacy of the mobile users. To overcome these issues, many studies have proposed to detect SMS Spam by using data mining tools. This paper will do a comparative study using five machine learning techniques such as Naïve Bayes, K-NN (K-Nearest Neighbour Algorithm), Decision Tree, Random Forest and Decision Stumps to observe the accuracy result between RapidMiner and WEKA for dataset SMS Spam UCI Machine Learning repository.

  3. An experimental result of estimating an application volume by machine learning techniques.

    PubMed

    Hasegawa, Tatsuhito; Koshino, Makoto; Kimura, Haruhiko

    2015-01-01

    In this study, we improved the usability of smartphones by automating a user's operations. We developed an intelligent system using machine learning techniques that periodically detects a user's context on a smartphone. We selected the Android operating system because it has the largest market share and highest flexibility of its development environment. In this paper, we describe an application that automatically adjusts application volume. Adjusting the volume can be easily forgotten because users need to push the volume buttons to alter the volume depending on the given situation. Therefore, we developed an application that automatically adjusts the volume based on learned user settings. Application volume can be set differently from ringtone volume on Android devices, and these volume settings are associated with each specific application including games. Our application records a user's location, the volume setting, the foreground application name and other such attributes as learning data, thereby estimating whether the volume should be adjusted using machine learning techniques via Weka.

  4. Improving diagnostic recognition of primary hyperparathyroidism with machine learning.

    PubMed

    Somnay, Yash R; Craven, Mark; McCoy, Kelly L; Carty, Sally E; Wang, Tracy S; Greenberg, Caprice C; Schneider, David F

    2017-04-01

    Parathyroidectomy offers the only cure for primary hyperparathyroidism, but today only 50% of primary hyperparathyroidism patients are referred for operation, in large part, because the condition is widely under-recognized. The diagnosis of primary hyperparathyroidism can be especially challenging with mild biochemical indices. Machine learning is a collection of methods in which computers build predictive algorithms based on labeled examples. With the aim of facilitating diagnosis, we tested the ability of machine learning to distinguish primary hyperparathyroidism from normal physiology using clinical and laboratory data. This retrospective cohort study used a labeled training set and 10-fold cross-validation to evaluate accuracy of the algorithm. Measures of accuracy included area under the receiver operating characteristic curve, precision (sensitivity), and positive and negative predictive value. Several different algorithms and ensembles of algorithms were tested using the Weka platform. Among 11,830 patients managed operatively at 3 high-volume endocrine surgery programs from March 2001 to August 2013, 6,777 underwent parathyroidectomy for confirmed primary hyperparathyroidism, and 5,053 control patients without primary hyperparathyroidism underwent thyroidectomy. Test-set accuracies for machine learning models were determined using 10-fold cross-validation. Age, sex, and serum levels of preoperative calcium, phosphate, parathyroid hormone, vitamin D, and creatinine were defined as potential predictors of primary hyperparathyroidism. Mild primary hyperparathyroidism was defined as primary hyperparathyroidism with normal preoperative calcium or parathyroid hormone levels. After testing a variety of machine learning algorithms, Bayesian network models proved most accurate, classifying correctly 95.2% of all primary hyperparathyroidism patients (area under receiver operating characteristic = 0.989). Omitting parathyroid hormone from the model did not

  5. Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks

    DTIC Science & Technology

    2014-03-27

    intensity D peak. Reprinted with permission from [38]. The SVM classifier is trained using custom written Java code leveraging the Sequential Minimal...Society Encog is a machine learning framework for Java , C++ and .Net applications that supports Bayesian Networks, Hidden Markov Models, SVMs and ANNs [13...SVM classifiers are trained using Weka libraries and leveraging custom written Java code. The data set is created as an Attribute Relationship File

  6. Application of Machine Learning to Predict Dietary Lapses During Weight Loss.

    PubMed

    Goldstein, Stephanie P; Zhang, Fengqing; Thomas, John G; Butryn, Meghan L; Herbert, James D; Forman, Evan M

    2018-05-01

    Individuals who adhere to dietary guidelines provided during weight loss interventions tend to be more successful with weight control. Any deviation from dietary guidelines can be referred to as a "lapse." There is a growing body of research showing that lapses are predictable using a variety of physiological, environmental, and psychological indicators. With recent technological advancements, it may be possible to assess these triggers and predict dietary lapses in real time. The current study sought to use machine learning techniques to predict lapses and evaluate the utility of combining both group- and individual-level data to enhance lapse prediction. The current study trained and tested a machine learning algorithm capable of predicting dietary lapses from a behavioral weight loss program among adults with overweight/obesity (n = 12). Participants were asked to follow a weight control diet for 6 weeks and complete ecological momentary assessment (EMA; repeated brief surveys delivered via smartphone) regarding dietary lapses and relevant triggers. WEKA decision trees were used to predict lapses with an accuracy of 0.72 for the group of participants. However, generalization of the group algorithm to each individual was poor, and as such, group- and individual-level data were combined to improve prediction. The findings suggest that 4 weeks of individual data collection is recommended to attain optimal model performance. The predictive algorithm could be utilized to provide in-the-moment interventions to prevent dietary lapses and therefore enhance weight losses. Furthermore, methods in the current study could be translated to other types of health behavior lapses.

  7. Quantum machine learning.

    PubMed

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  8. Quantum machine learning

    NASA Astrophysics Data System (ADS)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-01

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  9. Prediction of the Chloride Resistance of Concrete Modified with High Calcium Fly Ash Using Machine Learning

    PubMed Central

    Marks, Michał; Glinicki, Michał A.; Gibas, Karolina

    2015-01-01

    The aim of the study was to generate rules for the prediction of the chloride resistance of concrete modified with high calcium fly ash using machine learning methods. The rapid chloride permeability test, according to the Nordtest Method Build 492, was used for determining the chloride ions’ penetration in concrete containing high calcium fly ash (HCFA) for partial replacement of Portland cement. The results of the performed tests were used as the training set to generate rules describing the relation between material composition and the chloride resistance. Multiple methods for rule generation were applied and compared. The rules generated by algorithm J48 from the Weka workbench provided the means for adequate classification of plain concretes and concretes modified with high calcium fly ash as materials of good, acceptable or unacceptable resistance to chloride penetration. PMID:28793740

  10. Machine Learning

    DTIC Science & Technology

    1990-04-01

    DTIC i.LE COPY RADC-TR-90-25 Final Technical Report April 1990 MACHINE LEARNING The MITRE Corporation Melissa P. Chase Cs) CTIC ’- CT E 71 IN 2 11990...S. FUNDING NUMBERS MACHINE LEARNING C - F19628-89-C-0001 PE - 62702F PR - MOlE S. AUTHO(S) TA - 79 Melissa P. Chase WUT - 80 S. PERFORMING...341.280.5500 pm I " Aw Sig rill Ia 2110-01 SECTION 1 INTRODUCTION 1.1 BACKGROUND Research in machine learning has taken two directions in the problem of

  11. Machine Learning

    NASA Astrophysics Data System (ADS)

    Hoffmann, Achim; Mahidadia, Ashesh

    The purpose of this chapter is to present fundamental ideas and techniques of machine learning suitable for the field of this book, i.e., for automated scientific discovery. The chapter focuses on those symbolic machine learning methods, which produce results that are suitable to be interpreted and understood by humans. This is particularly important in the context of automated scientific discovery as the scientific theories to be produced by machines are usually meant to be interpreted by humans. This chapter contains some of the most influential ideas and concepts in machine learning research to give the reader a basic insight into the field. After the introduction in Sect. 1, general ideas of how learning problems can be framed are given in Sect. 2. The section provides useful perspectives to better understand what learning algorithms actually do. Section 3 presents the Version space model which is an early learning algorithm as well as a conceptual framework, that provides important insight into the general mechanisms behind most learning algorithms. In section 4, a family of learning algorithms, the AQ family for learning classification rules is presented. The AQ family belongs to the early approaches in machine learning. The next, Sect. 5 presents the basic principles of decision tree learners. Decision tree learners belong to the most influential class of inductive learning algorithms today. Finally, a more recent group of learning systems are presented in Sect. 6, which learn relational concepts within the framework of logic programming. This is a particularly interesting group of learning systems since the framework allows also to incorporate background knowledge which may assist in generalisation. Section 7 discusses Association Rules - a technique that comes from the related field of Data mining. Section 8 presents the basic idea of the Naive Bayesian Classifier. While this is a very popular learning technique, the learning result is not well suited for

  12. Machine Learning.

    ERIC Educational Resources Information Center

    Kirrane, Diane E.

    1990-01-01

    As scientists seek to develop machines that can "learn," that is, solve problems by imitating the human brain, a gold mine of information on the processes of human learning is being discovered, expert systems are being improved, and human-machine interactions are being enhanced. (SK)

  13. Evaluation of a Machine-Learning Classifier for Keratoconus Detection Based on Scheimpflug Tomography.

    PubMed

    Ruiz Hidalgo, Irene; Rodriguez, Pablo; Rozema, Jos J; Ní Dhubhghaill, Sorcha; Zakaria, Nadia; Tassignon, Marie-José; Koppen, Carina

    2016-06-01

    To evaluate the performance of a support vector machine algorithm that automatically and objectively identifies corneal patterns based on a combination of 22 parameters obtained from Pentacam measurements and to compare this method with other known keratoconus (KC) classification methods. Pentacam data from 860 eyes were included in the study and divided into 5 groups: 454 KC, 67 forme fruste (FF), 28 astigmatic, 117 after refractive surgery (PR), and 194 normal eyes (N). Twenty-two parameters were used for classification using a support vector machine algorithm developed in Weka, a machine-learning computer software. The cross-validation accuracy for 3 different classification tasks (KC vs. N, FF vs. N and all 5 groups) was calculated and compared with other known classification methods. The accuracy achieved in the KC versus N discrimination task was 98.9%, with 99.1% sensitivity and 98.5% specificity for KC detection. The accuracy in the FF versus N task was 93.1%, with 79.1% sensitivity and 97.9% specificity for the FF discrimination. Finally, for the 5-groups classification, the accuracy was 88.8%, with a weighted average sensitivity of 89.0% and specificity of 95.2%. Despite using the strictest definition for FF KC, the present study obtained comparable or better results than the single-parameter methods and indices reported in the literature. In some cases, direct comparisons with the literature were not possible because of differences in the compositions and definitions of the study groups, especially the FF KC.

  14. Machine Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and Probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networksmore » and Hidden Markov Models are introduced as an example of a widely used data driven classification/modeling strategy.« less

  15. Machine Learning and Radiology

    PubMed Central

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  16. Toward Intelligent Machine Learning Algorithms

    DTIC Science & Technology

    1988-05-01

    Machine learning is recognized as a tool for improving the performance of many kinds of systems, yet most machine learning systems themselves are not...directed systems, and with the addition of a knowledge store for organizing and maintaining knowledge to assist learning, a learning machine learning (L...ML) algorithm is possible. The necessary components of L-ML systems are presented along with several case descriptions of existing machine learning systems

  17. Quantum-Enhanced Machine Learning

    NASA Astrophysics Data System (ADS)

    Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.

    2016-09-01

    The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  18. Machine learning and radiology.

    PubMed

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.

  19. Human Machine Learning Symbiosis

    ERIC Educational Resources Information Center

    Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.

    2017-01-01

    Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…

  20. The Security of Machine Learning

    DTIC Science & Technology

    2008-04-24

    Machine learning has become a fundamental tool for computer security, since it can rapidly evolve to changing and complex situations. That...adaptability is also a vulnerability: attackers can exploit machine learning systems. We present a taxonomy identifying and analyzing attacks against machine ...We use our framework to survey and analyze the literature of attacks against machine learning systems. We also illustrate our taxonomy by showing

  1. Machine Learning for Medical Imaging

    PubMed Central

    Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L.

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. ©RSNA, 2017 PMID:28212054

  2. Machine Learning for Medical Imaging.

    PubMed

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. © RSNA, 2017.

  3. Model-based machine learning.

    PubMed

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.

  4. Model-based machine learning

    PubMed Central

    Bishop, Christopher M.

    2013-01-01

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications. PMID:23277612

  5. From machine learning to deep learning: progress in machine intelligence for rational drug discovery.

    PubMed

    Zhang, Lu; Tan, Jianjun; Han, Dan; Zhu, Hao

    2017-11-01

    Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potential biological active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Approaches to Machine Learning.

    DTIC Science & Technology

    1984-02-16

    The field of machine learning strives to develop methods and techniques to automatic the acquisition of new information, new skills, and new ways of organizing existing information. In this article, we review the major approaches to machine learning in symbolic domains, covering the tasks of learning concepts from examples, learning search methods, conceptual clustering, and language acquisition. We illustrate each of the basic approaches with paradigmatic examples. (Author)

  7. Machine Learning Based Malware Detection

    DTIC Science & Technology

    2015-05-18

    A TRIDENT SCHOLAR PROJECT REPORT NO. 440 Machine Learning Based Malware Detection by Midshipman 1/C Zane A. Markel, USN...COVERED (From - To) 4. TITLE AND SUBTITLE Machine Learning Based Malware Detection 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM...suitably be projected into realistic performance. This work explores several aspects of machine learning based malware detection . First, we

  8. Game-powered machine learning

    PubMed Central

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-01-01

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786

  9. Game-powered machine learning.

    PubMed

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.

  10. Intelligible machine learning with malibu.

    PubMed

    Langlois, Robert E; Lu, Hui

    2008-01-01

    malibu is an open-source machine learning work-bench developed in C/C++ for high-performance real-world applications, namely bioinformatics and medical informatics. It leverages third-party machine learning implementations for more robust bug-free software. This workbench handles several well-studied supervised machine learning problems including classification, regression, importance-weighted classification and multiple-instance learning. The malibu interface was designed to create reproducible experiments ideally run in a remote and/or command line environment. The software can be found at: http://proteomics.bioengr. uic.edu/malibu/index.html.

  11. Web Mining: Machine Learning for Web Applications.

    ERIC Educational Resources Information Center

    Chen, Hsinchun; Chau, Michael

    2004-01-01

    Presents an overview of machine learning research and reviews methods used for evaluating machine learning systems. Ways that machine-learning algorithms were used in traditional information retrieval systems in the "pre-Web" era are described, and the field of Web mining and how machine learning has been used in different Web mining…

  12. Language Acquisition and Machine Learning.

    DTIC Science & Technology

    1986-02-01

    machine learning and examine its implications for computational models of language acquisition. As a framework for understanding this research, the authors propose four component tasks involved in learning from experience-aggregation, clustering, characterization, and storage. They then consider four common problems studied by machine learning researchers-learning from examples, heuristics learning, conceptual clustering, and learning macro-operators-describing each in terms of our framework. After this, they turn to the problem of grammar

  13. Relational machine learning for electronic health record-driven phenotyping.

    PubMed

    Peissig, Peggy L; Santos Costa, Vitor; Caldwell, Michael D; Rottscheit, Carla; Berg, Richard L; Mendonca, Eneida A; Page, David

    2014-12-01

    Electronic health records (EHR) offer medical and pharmacogenomics research unprecedented opportunities to identify and classify patients at risk. EHRs are collections of highly inter-dependent records that include biological, anatomical, physiological, and behavioral observations. They comprise a patient's clinical phenome, where each patient has thousands of date-stamped records distributed across many relational tables. Development of EHR computer-based phenotyping algorithms require time and medical insight from clinical experts, who most often can only review a small patient subset representative of the total EHR records, to identify phenotype features. In this research we evaluate whether relational machine learning (ML) using inductive logic programming (ILP) can contribute to addressing these issues as a viable approach for EHR-based phenotyping. Two relational learning ILP approaches and three well-known WEKA (Waikato Environment for Knowledge Analysis) implementations of non-relational approaches (PART, J48, and JRIP) were used to develop models for nine phenotypes. International Classification of Diseases, Ninth Revision (ICD-9) coded EHR data were used to select training cohorts for the development of each phenotypic model. Accuracy, precision, recall, F-Measure, and Area Under the Receiver Operating Characteristic (AUROC) curve statistics were measured for each phenotypic model based on independent manually verified test cohorts. A two-sided binomial distribution test (sign test) compared the five ML approaches across phenotypes for statistical significance. We developed an approach to automatically label training examples using ICD-9 diagnosis codes for the ML approaches being evaluated. Nine phenotypic models for each ML approach were evaluated, resulting in better overall model performance in AUROC using ILP when compared to PART (p=0.039), J48 (p=0.003) and JRIP (p=0.003). ILP has the potential to improve phenotyping by independently delivering

  14. Addressing uncertainty in atomistic machine learning.

    PubMed

    Peterson, Andrew A; Christensen, Rune; Khorshidi, Alireza

    2017-05-10

    Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate of the uncertainty when the width is comparable to that in the training data. Intriguingly, we also show that the uncertainty can be localized to specific atoms in the simulation, which may offer hints for the generation of training data to strategically improve the machine-learned representation.

  15. Paradigms for machine learning

    NASA Technical Reports Server (NTRS)

    Schlimmer, Jeffrey C.; Langley, Pat

    1991-01-01

    Five paradigms are described for machine learning: connectionist (neural network) methods, genetic algorithms and classifier systems, empirical methods for inducing rules and decision trees, analytic learning methods, and case-based approaches. Some dimensions are considered along with these paradigms vary in their approach to learning, and the basic methods are reviewed that are used within each framework, together with open research issues. It is argued that the similarities among the paradigms are more important than their differences, and that future work should attempt to bridge the existing boundaries. Finally, some recent developments in the field of machine learning are discussed, and their impact on both research and applications is examined.

  16. Quantum Machine Learning

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak

    2018-01-01

    Quantum computing promises an unprecedented ability to solve intractable problems by harnessing quantum mechanical effects such as tunneling, superposition, and entanglement. The Quantum Artificial Intelligence Laboratory (QuAIL) at NASA Ames Research Center is the space agency's primary facility for conducting research and development in quantum information sciences. QuAIL conducts fundamental research in quantum physics but also explores how best to exploit and apply this disruptive technology to enable NASA missions in aeronautics, Earth and space sciences, and space exploration. At the same time, machine learning has become a major focus in computer science and captured the imagination of the public as a panacea to myriad big data problems. In this talk, we will discuss how classical machine learning can take advantage of quantum computing to significantly improve its effectiveness. Although we illustrate this concept on a quantum annealer, other quantum platforms could be used as well. If explored fully and implemented efficiently, quantum machine learning could greatly accelerate a wide range of tasks leading to new technologies and discoveries that will significantly change the way we solve real-world problems.

  17. Quantum Machine Learning over Infinite Dimensions

    DOE PAGES

    Lau, Hoi-Kwan; Pooser, Raphael; Siopsis, George; ...

    2017-02-21

    Machine learning is a fascinating and exciting eld within computer science. Recently, this ex- citement has been transferred to the quantum information realm. Currently, all proposals for the quantum version of machine learning utilize the nite-dimensional substrate of discrete variables. Here we generalize quantum machine learning to the more complex, but still remarkably practi- cal, in nite-dimensional systems. We present the critical subroutines of quantum machine learning algorithms for an all-photonic continuous-variable quantum computer that achieve an exponential speedup compared to their equivalent classical counterparts. Finally, we also map out an experi- mental implementation which can be used as amore » blueprint for future photonic demonstrations.« less

  18. Quantum Machine Learning over Infinite Dimensions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lau, Hoi-Kwan; Pooser, Raphael; Siopsis, George

    Machine learning is a fascinating and exciting eld within computer science. Recently, this ex- citement has been transferred to the quantum information realm. Currently, all proposals for the quantum version of machine learning utilize the nite-dimensional substrate of discrete variables. Here we generalize quantum machine learning to the more complex, but still remarkably practi- cal, in nite-dimensional systems. We present the critical subroutines of quantum machine learning algorithms for an all-photonic continuous-variable quantum computer that achieve an exponential speedup compared to their equivalent classical counterparts. Finally, we also map out an experi- mental implementation which can be used as amore » blueprint for future photonic demonstrations.« less

  19. Next-Generation Machine Learning for Biological Networks.

    PubMed

    Camacho, Diogo M; Collins, Katherine M; Powers, Rani K; Costello, James C; Collins, James J

    2018-06-14

    Machine learning, a collection of data-analytical techniques aimed at building predictive models from multi-dimensional datasets, is becoming integral to modern biological research. By enabling one to generate models that learn from large datasets and make predictions on likely outcomes, machine learning can be used to study complex cellular systems such as biological networks. Here, we provide a primer on machine learning for life scientists, including an introduction to deep learning. We discuss opportunities and challenges at the intersection of machine learning and network biology, which could impact disease biology, drug discovery, microbiome research, and synthetic biology. Copyright © 2018 Elsevier Inc. All rights reserved.

  20. Machine learning for neuroimaging with scikit-learn.

    PubMed

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  1. Machine learning for neuroimaging with scikit-learn

    PubMed Central

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain. PMID:24600388

  2. Machine learning for medical images analysis.

    PubMed

    Criminisi, A

    2016-10-01

    This article discusses the application of machine learning for the analysis of medical images. Specifically: (i) We show how a special type of learning models can be thought of as automatically optimized, hierarchically-structured, rule-based algorithms, and (ii) We discuss how the issue of collecting large labelled datasets applies to both conventional algorithms as well as machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods. Crown Copyright © 2016. Published by Elsevier B.V. All rights reserved.

  3. WebWatcher: Machine Learning and Hypertext

    DTIC Science & Technology

    1995-05-29

    WebWatcher: Machine Learning and Hypertext Thorsten Joachims, Tom Mitchell, Dayne Freitag, and Robert Armstrong School of Computer Science Carnegie...HTML-page about machine learning in which we in- serted a hyperlink to WebWatcher (line 6). The user follows this hyperlink and gets to a page which...AND SUBTITLE WebWatcher: Machine Learning and Hypertext 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT

  4. Machine learning in genetics and genomics

    PubMed Central

    Libbrecht, Maxwell W.; Noble, William Stafford

    2016-01-01

    The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244

  5. Machine Learning in Medicine.

    PubMed

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. © 2015 American Heart Association, Inc.

  6. Machine Learning in Medicine

    PubMed Central

    Deo, Rahul C.

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  7. Machine vision systems using machine learning for industrial product inspection

    NASA Astrophysics Data System (ADS)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages, Learning Inspection Features (LIF), and On-Line Inspection (OLI). The LIF is designed to learn visual inspection features from design data and/or from inspection products. During the OLI stage, the inspection system uses the knowledge learnt by the LIF component to inspect the visual features of products. In this paper we will present two machine vision inspection systems developed under the SMV architecture for two different types of products, Printed Circuit Board (PCB) and Vacuum Florescent Displaying (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its displaying patterns. In the PCB board inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies and the PCB inspection system is the process of being deployed in a manufacturing plant.

  8. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.

  9. Evaluating the Security of Machine Learning Algorithms

    DTIC Science & Technology

    2008-05-20

    Two far-reaching trends in computing have grown in significance in recent years. First, statistical machine learning has entered the mainstream as a...computing applications. The growing intersection of these trends compels us to investigate how well machine learning performs under adversarial conditions... machine learning has a structure that we can use to build secure learning systems. This thesis makes three high-level contributions. First, we develop a

  10. Probabilistic machine learning and artificial intelligence.

    PubMed

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  11. Probabilistic machine learning and artificial intelligence

    NASA Astrophysics Data System (ADS)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  12. Learning About Climate and Atmospheric Models Through Machine Learning

    NASA Astrophysics Data System (ADS)

    Lucas, D. D.

    2017-12-01

    From the analysis of ensemble variability to improving simulation performance, machine learning algorithms can play a powerful role in understanding the behavior of atmospheric and climate models. To learn about model behavior, we create training and testing data sets through ensemble techniques that sample different model configurations and values of input parameters, and then use supervised machine learning to map the relationships between the inputs and outputs. Following this procedure, we have used support vector machines, random forests, gradient boosting and other methods to investigate a variety of atmospheric and climate model phenomena. We have used machine learning to predict simulation crashes, estimate the probability density function of climate sensitivity, optimize simulations of the Madden Julian oscillation, assess the impacts of weather and emissions uncertainty on atmospheric dispersion, and quantify the effects of model resolution changes on precipitation. This presentation highlights recent examples of our applications of machine learning to improve the understanding of climate and atmospheric models. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  13. Machine Learning Techniques in Clinical Vision Sciences.

    PubMed

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques for diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at its earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patients' management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve sensitivity and specificity of disease detection and monitoring, increasing objectively the clinical decision-making process. This manuscript presents a review in multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches will be present. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allows creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure a good performance of the machine learning techniques in a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated and the data dimensionally (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring will be presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples will be presented in glaucoma, age-related macular degeneration

  14. Energy landscapes for machine learning

    NASA Astrophysics Data System (ADS)

    Ballard, Andrew J.; Das, Ritankar; Martiniani, Stefano; Mehta, Dhagash; Sagun, Levent; Stevenson, Jacob D.; Wales, David J.

    Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to gain new insight into the solution space involved in training and the nature of the corresponding predictions. In particular, we can define quantities analogous to molecular structure, thermodynamics, and kinetics, and relate these emergent properties to the structure of the underlying landscape. This Perspective aims to describe these analogies with examples from recent applications, and suggest avenues for new interdisciplinary research.

  15. Using Machine Learning to Advance Personality Assessment and Theory.

    PubMed

    Bleidorn, Wiebke; Hopwood, Christopher James

    2018-05-01

    Machine learning has led to important advances in society. One of the most exciting applications of machine learning in psychological science has been the development of assessment tools that can powerfully predict human behavior and personality traits. Thus far, machine learning approaches to personality assessment have focused on the associations between social media and other digital records with established personality measures. The goal of this article is to expand the potential of machine learning approaches to personality assessment by embedding it in a more comprehensive construct validation framework. We review recent applications of machine learning to personality assessment, place machine learning research in the broader context of fundamental principles of construct validation, and provide recommendations for how to use machine learning to advance our understanding of personality.

  16. A Novel Local Learning based Approach With Application to Breast Cancer Diagnosis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Songhua; Tourassi, Georgia

    2012-01-01

    The purpose of this study is to develop and evaluate a novel local learning-based approach for computer-assisted diagnosis of breast cancer. Our new local learning based algorithm using the linear logistic regression method as its base learner is described. Overall, our algorithm will perform its stochastic searching process until the total allowed computing time is used up by our random walk process in identifying the most suitable population subdivision scheme and their corresponding individual base learners. The proposed local learning-based approach was applied for the prediction of breast cancer given 11 mammographic and clinical findings reported by physicians using themore » BI-RADS lexicon. Our database consisted of 850 patients with biopsy confirmed diagnosis (290 malignant and 560 benign). We also compared the performance of our method with a collection of publicly available state-of-the-art machine learning methods. Predictive performance for all classifiers was evaluated using 10-fold cross validation and Receiver Operating Characteristics (ROC) analysis. Figure 1 reports the performance of 54 machine learning methods implemented in the machine learning toolkit Weka (version 3.0). We introduced a novel local learning-based classifier and compared it with an extensive list of other classifiers for the problem of breast cancer diagnosis. Our experiments show that the algorithm superior prediction performance outperforming a wide range of other well established machine learning techniques. Our conclusion complements the existing understanding in the machine learning field that local learning may capture complicated, non-linear relationships exhibited by real-world datasets.« less

  17. Experimental Machine Learning of Quantum States

    NASA Astrophysics Data System (ADS)

    Gao, Jun; Qiao, Lu-Feng; Jiao, Zhi-Qiang; Ma, Yue-Chi; Hu, Cheng-Qiu; Ren, Ruo-Jing; Yang, Ai-Lin; Tang, Hao; Yung, Man-Hong; Jin, Xian-Min

    2018-06-01

    Quantum information technologies provide promising applications in communication and computation, while machine learning has become a powerful technique for extracting meaningful structures in "big data." A crossover between quantum information and machine learning represents a new interdisciplinary area stimulating progress in both fields. Traditionally, a quantum state is characterized by quantum-state tomography, which is a resource-consuming process when scaled up. Here we experimentally demonstrate a machine-learning approach to construct a quantum-state classifier for identifying the separability of quantum states. We show that it is possible to experimentally train an artificial neural network to efficiently learn and classify quantum states, without the need of obtaining the full information of the states. We also show how adding a hidden layer of neurons to the neural network can significantly boost the performance of the state classifier. These results shed new light on how classification of quantum states can be achieved with limited resources, and represent a step towards machine-learning-based applications in quantum information processing.

  18. Entanglement-Based Machine Learning on a Quantum Computer

    NASA Astrophysics Data System (ADS)

    Cai, X.-D.; Wu, D.; Su, Z.-E.; Chen, M.-C.; Wang, X.-L.; Li, Li; Liu, N.-L.; Lu, C.-Y.; Pan, J.-W.

    2015-03-01

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, which is ubiquitous in various fields such as computer sciences, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv.1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.

  19. An introduction to quantum machine learning

    NASA Astrophysics Data System (ADS)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  20. Machine learning of molecular properties: Locality and active learning

    NASA Astrophysics Data System (ADS)

    Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.

    2018-06-01

    In recent years, the machine learning techniques have shown great potent1ial in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on another hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach the chemical accuracy and also show large errors for the so-called outliers—the out-of-sample molecules, not well-represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.

  1. Machine Learning, deep learning and optimization in computer vision

    NASA Astrophysics Data System (ADS)

    Canu, Stéphane

    2017-03-01

    As quoted in the Large Scale Computer Vision Systems NIPS workshop, computer vision is a mature field with a long tradition of research, but recent advances in machine learning, deep learning, representation learning and optimization have provided models with new capabilities to better understand visual content. The presentation will go through these new developments in machine learning covering basic motivations, ideas, models and optimization in deep learning for computer vision, identifying challenges and opportunities. It will focus on issues related with large scale learning that is: high dimensional features, large variety of visual classes, and large number of examples.

  2. Using human brain activity to guide machine learning.

    PubMed

    Fong, Ruth C; Scheirer, Walter J; Cox, David D

    2018-03-29

    Machine learning is a field of computer science that builds algorithms that learn. In many cases, machine learning algorithms are used to recreate a human ability like adding a caption to a photo, driving a car, or playing a game. While the human brain has long served as a source of inspiration for machine learning, little effort has been made to directly use data collected from working brains as a guide for machine learning algorithms. Here we demonstrate a new paradigm of "neurally-weighted" machine learning, which takes fMRI measurements of human brain activity from subjects viewing images, and infuses these data into the training process of an object recognition learning algorithm to make it more consistent with the human brain. After training, these neurally-weighted classifiers are able to classify images without requiring any additional neural data. We show that our neural-weighting approach can lead to large performance gains when used with traditional machine vision features, as well as to significant improvements with already high-performing convolutional neural network features. The effectiveness of this approach points to a path forward for a new class of hybrid machine learning algorithms which take both inspiration and direct constraints from neuronal data.

  3. Toward Harnessing User Feedback For Machine Learning

    DTIC Science & Technology

    2006-10-02

    machine learning systems. If this resource-the users themselves-could somehow work hand-in-hand with machine learning systems, the accuracy of learning systems could be improved and the users? understanding and trust of the system could improve as well. We conducted a think-aloud study to see how willing users were to provide feedback and to understand what kinds of feedback users could give. Users were shown explanations of machine learning predictions and asked to provide feedback to improve the predictions. We found that users

  4. Machine learning applications in genetics and genomics.

    PubMed

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  5. Applications of Machine Learning and Rule Induction,

    DTIC Science & Technology

    1995-02-15

    An important area of application for machine learning is in automating the acquisition of knowledge bases required for expert systems. In this paper...we review the major paradigms for machine learning , including neural networks, instance-based methods, genetic learning, rule induction, and analytic

  6. Machine learning enhanced optical distance sensor

    NASA Astrophysics Data System (ADS)

    Amin, M. Junaid; Riza, N. A.

    2018-01-01

    Presented for the first time is a machine learning enhanced optical distance sensor. The distance sensor is based on our previously demonstrated distance measurement technique that uses an Electronically Controlled Variable Focus Lens (ECVFL) with a laser source to illuminate a target plane with a controlled optical beam spot. This spot with varying spot sizes is viewed by an off-axis camera and the spot size data is processed to compute the distance. In particular, proposed and demonstrated in this paper is the use of a regularized polynomial regression based supervised machine learning algorithm to enhance the accuracy of the operational sensor. The algorithm uses the acquired features and corresponding labels that are the actual target distance values to train a machine learning model. The optimized training model is trained over a 1000 mm (or 1 m) experimental target distance range. Using the machine learning algorithm produces a training set and testing set distance measurement errors of <0.8 mm and <2.2 mm, respectively. The test measurement error is at least a factor of 4 improvement over our prior sensor demonstration without the use of machine learning. Applications for the proposed sensor include industrial scenario distance sensing where target material specific training models can be generated to realize low <1% measurement error distance measurements.

  7. Classification of ECG signal with Support Vector Machine Method for Arrhythmia Detection

    NASA Astrophysics Data System (ADS)

    Turnip, Arjon; Ilham Rizqywan, M.; Kusumandari, Dwi E.; Turnip, Mardi; Sihombing, Poltak

    2018-03-01

    An electrocardiogram is a potential bioelectric record that occurs as a result of cardiac activity. QRS Detection with zero crossing calculation is one method that can precisely determine peak R of QRS wave as part of arrhythmia detection. In this paper, two experimental scheme (2 minutes duration with different activities: relaxed and, typing) were conducted. From the two experiments it were obtained: accuracy, sensitivity, and positive predictivity about 100% each for the first experiment and about 79%, 93%, 83% for the second experiment, respectively. Furthermore, the feature set of MIT-BIH arrhythmia using the support vector machine (SVM) method on the WEKA software is evaluated. By combining the available attributes on the WEKA algorithm, the result is constant since all classes of SVM goes to the normal class with average 88.49% accuracy.

  8. Machine Learning for the Knowledge Plane

    DTIC Science & Technology

    2006-06-01

    this idea is to combine techniques from machine learning with new architectural concepts in networking to make the internet self-aware and self...work on the machine learning portion of the Knowledge Plane. This consisted of three components: (a) we wrote a document formulating the various

  9. Machine learning: Trends, perspectives, and prospects.

    PubMed

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.

  10. Machine learning phases of matter

    NASA Astrophysics Data System (ADS)

    Carrasquilla, Juan; Melko, Roger G.

    2017-02-01

    Condensed-matter physics is the study of the collective behaviour of infinitely complex assemblies of electrons, nuclei, magnetic moments, atoms or qubits. This complexity is reflected in the size of the state space, which grows exponentially with the number of particles, reminiscent of the `curse of dimensionality' commonly encountered in machine learning. Despite this curse, the machine learning community has developed techniques with remarkable abilities to recognize, classify, and characterize complex sets of data. Here, we show that modern machine learning architectures, such as fully connected and convolutional neural networks, can identify phases and phase transitions in a variety of condensed-matter Hamiltonians. Readily programmable through modern software libraries, neural networks can be trained to detect multiple types of order parameter, as well as highly non-trivial states with no conventional order, directly from raw state configurations sampled with Monte Carlo.

  11. Machine Learning Approaches in Cardiovascular Imaging.

    PubMed

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  12. Workshop on Fielded Applications of Machine Learning

    DTIC Science & Technology

    1994-05-11

    This report summaries the talks presented at the Workshop on Fielded Applications of Machine Learning , and draws some initial conclusions about the state of machine learning and its potential for solving real-world problems.

  13. Machine learning, social learning and the governance of self-driving cars.

    PubMed

    Stilgoe, Jack

    2018-02-01

    Self-driving cars, a quintessentially 'smart' technology, are not born smart. The algorithms that control their movements are learning as the technology emerges. Self-driving cars represent a high-stakes test of the powers of machine learning, as well as a test case for social learning in technology governance. Society is learning about the technology while the technology learns about society. Understanding and governing the politics of this technology means asking 'Who is learning, what are they learning and how are they learning?' Focusing on the successes and failures of social learning around the much-publicized crash of a Tesla Model S in 2016, I argue that trajectories and rhetorics of machine learning in transport pose a substantial governance challenge. 'Self-driving' or 'autonomous' cars are misnamed. As with other technologies, they are shaped by assumptions about social needs, solvable problems, and economic opportunities. Governing these technologies in the public interest means improving social learning by constructively engaging with the contingencies of machine learning.

  14. Machine learning methods in chemoinformatics

    PubMed Central

    Mitchell, John B O

    2014-01-01

    Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481. How to cite this article: WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183 PMID:25285160

  15. Machine learning in cardiovascular medicine: are we there yet?

    PubMed

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  16. Bypassing the Kohn-Sham equations with machine learning.

    PubMed

    Brockherde, Felix; Vogt, Leslie; Li, Li; Tuckerman, Mark E; Burke, Kieron; Müller, Klaus-Robert

    2017-10-11

    Last year, at least 30,000 scientific papers used the Kohn-Sham scheme of density functional theory to solve electronic structure problems in a wide variety of scientific fields. Machine learning holds the promise of learning the energy functional via examples, bypassing the need to solve the Kohn-Sham equations. This should yield substantial savings in computer time, allowing larger systems and/or longer time-scales to be tackled, but attempts to machine-learn this functional have been limited by the need to find its derivative. The present work overcomes this difficulty by directly learning the density-potential and energy-density maps for test systems and various molecules. We perform the first molecular dynamics simulation with a machine-learned density functional on malonaldehyde and are able to capture the intramolecular proton transfer process. Learning density models now allows the construction of accurate density functionals for realistic molecular systems.Machine learning allows electronic structure calculations to access larger system sizes and, in dynamical simulations, longer time scales. Here, the authors perform such a simulation using a machine-learned density functional that avoids direct solution of the Kohn-Sham equations.

  17. A Machine Learning and Optimization Toolkit for the Swarm

    DTIC Science & Technology

    2014-11-17

    Machine   Learning  and  Op0miza0on   Toolkit  for  the  Swarm   Ilge  Akkaya,  Shuhei  Emoto...3. DATES COVERED 00-00-2014 to 00-00-2014 4. TITLE AND SUBTITLE A Machine Learning and Optimization Toolkit for the Swarm 5a. CONTRACT NUMBER... machine   learning   methodologies  by  providing  the  right  interfaces  between   machine   learning  tools  and

  18. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  19. MLBCD: a machine learning tool for big clinical data.

    PubMed

    Luo, Gang

    2015-01-01

    Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. The paper describes MLBCD's design in detail. By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.

  20. Machine learning: novel bioinformatics approaches for combating antimicrobial resistance.

    PubMed

    Macesic, Nenad; Polubriaginof, Fernanda; Tatonetti, Nicholas P

    2017-12-01

    Antimicrobial resistance (AMR) is a threat to global health and new approaches to combating AMR are needed. Use of machine learning in addressing AMR is in its infancy but has made promising steps. We reviewed the current literature on the use of machine learning for studying bacterial AMR. The advent of large-scale data sets provided by next-generation sequencing and electronic health records make applying machine learning to the study and treatment of AMR possible. To date, it has been used for antimicrobial susceptibility genotype/phenotype prediction, development of AMR clinical decision rules, novel antimicrobial agent discovery and antimicrobial therapy optimization. Application of machine learning to studying AMR is feasible but remains limited. Implementation of machine learning in clinical settings faces barriers to uptake with concerns regarding model interpretability and data quality.Future applications of machine learning to AMR are likely to be laboratory-based, such as antimicrobial susceptibility phenotype prediction.

  1. Machine learning and medicine: book review and commentary.

    PubMed

    Koprowski, Robert; Foster, Kenneth R

    2018-02-01

    This article is a review of the book "Master machine learning algorithms, discover how they work and implement them from scratch" (ISBN: not available, 37 USD, 163 pages) edited by Jason Brownlee published by the Author, edition, v1.10 http://MachineLearningMastery.com . An accompanying commentary discusses some of the issues that are involved with use of machine learning and data mining techniques to develop predictive models for diagnosis or prognosis of disease, and to call attention to additional requirements for developing diagnostic and prognostic algorithms that are generally useful in medicine. Appendix provides examples that illustrate potential problems with machine learning that are not addressed in the reviewed book.

  2. Machine learning in heart failure: ready for prime time.

    PubMed

    Awan, Saqib Ejaz; Sohel, Ferdous; Sanfilippo, Frank Mario; Bennamoun, Mohammed; Dwivedi, Girish

    2018-03-01

    The aim of this review is to present an up-to-date overview of the application of machine learning methods in heart failure including diagnosis, classification, readmissions and medication adherence. Recent studies have shown that the application of machine learning techniques may have the potential to improve heart failure outcomes and management, including cost savings by improving existing diagnostic and treatment support systems. Recently developed deep learning methods are expected to yield even better performance than traditional machine learning techniques in performing complex tasks by learning the intricate patterns hidden in big medical data. The review summarizes the recent developments in the application of machine and deep learning methods in heart failure management.

  3. Novel Breast Imaging and Machine Learning: Predicting Breast Lesion Malignancy at Cone-Beam CT Using Machine Learning Techniques.

    PubMed

    Uhlig, Johannes; Uhlig, Annemarie; Kunze, Meike; Beissbarth, Tim; Fischer, Uwe; Lotz, Joachim; Wienbeck, Susanne

    2018-05-24

    The purpose of this study is to evaluate the diagnostic performance of machine learning techniques for malignancy prediction at breast cone-beam CT (CBCT) and to compare them to human readers. Five machine learning techniques, including random forests, back propagation neural networks (BPN), extreme learning machines, support vector machines, and K-nearest neighbors, were used to train diagnostic models on a clinical breast CBCT dataset with internal validation by repeated 10-fold cross-validation. Two independent blinded human readers with profound experience in breast imaging and breast CBCT analyzed the same CBCT dataset. Diagnostic performance was compared using AUC, sensitivity, and specificity. The clinical dataset comprised 35 patients (American College of Radiology density type C and D breasts) with 81 suspicious breast lesions examined with contrast-enhanced breast CBCT. Forty-five lesions were histopathologically proven to be malignant. Among the machine learning techniques, BPNs provided the best diagnostic performance, with AUC of 0.91, sensitivity of 0.85, and specificity of 0.82. The diagnostic performance of the human readers was AUC of 0.84, sensitivity of 0.89, and specificity of 0.72 for reader 1 and AUC of 0.72, sensitivity of 0.71, and specificity of 0.67 for reader 2. AUC was significantly higher for BPN when compared with both reader 1 (p = 0.01) and reader 2 (p < 0.001). Machine learning techniques provide a high and robust diagnostic performance in the prediction of malignancy in breast lesions identified at CBCT. BPNs showed the best diagnostic performance, surpassing human readers in terms of AUC and specificity.

  4. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    PubMed

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  5. On the Conditioning of Machine-Learning-Assisted Turbulence Modeling

    NASA Astrophysics Data System (ADS)

    Wu, Jinlong; Sun, Rui; Wang, Qiqi; Xiao, Heng

    2017-11-01

    Recently, several researchers have demonstrated that machine learning techniques can be used to improve the RANS modeled Reynolds stress by training on available database of high fidelity simulations. However, obtaining improved mean velocity field remains an unsolved challenge, restricting the predictive capability of current machine-learning-assisted turbulence modeling approaches. In this work we define a condition number to evaluate the model conditioning of data-driven turbulence modeling approaches, and propose a stability-oriented machine learning framework to model Reynolds stress. Two canonical flows, the flow in a square duct and the flow over periodic hills, are investigated to demonstrate the predictive capability of the proposed framework. The satisfactory prediction performance of mean velocity field for both flows demonstrates the predictive capability of the proposed framework for machine-learning-assisted turbulence modeling. With showing the capability of improving the prediction of mean flow field, the proposed stability-oriented machine learning framework bridges the gap between the existing machine-learning-assisted turbulence modeling approaches and the demand of predictive capability of turbulence models in real applications.

  6. The Efficacy of Machine Learning Programs for Navy Manpower Analysis

    DTIC Science & Technology

    1993-03-01

    This thesis investigated the efficacy of two machine learning programs for Navy manpower analysis. Two machine learning programs, AIM and IXL, were...to generate models from the two commercial machine learning programs. Using a held out sub-set of the data the capabilities of the three models were...partial effects. The author recommended further investigation of AIM’s capabilities, and testing in an operational environment.... Machine learning , AIM, IXL.

  7. Classifying smoking urges via machine learning.

    PubMed

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights

  8. Classifying smoking urges via machine learning

    PubMed Central

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-01-01

    Background and objective Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. Methods To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. Results The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. Conclusions In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms’ performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions

  9. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data

    PubMed Central

    Hepworth, Philip J.; Nefedov, Alexey V.; Muchnik, Ilya B.; Morgan, Kenton L.

    2012-01-01

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide. PMID:22319115

  10. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    PubMed

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-07

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.

  11. Acceleration of saddle-point searches with machine learning.

    PubMed

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  12. Revisit of Machine Learning Supported Biological and Biomedical Studies.

    PubMed

    Yu, Xiang-Tian; Wang, Lu; Zeng, Tao

    2018-01-01

    Generally, machine learning includes many in silico methods to transform the principles underlying natural phenomenon to human understanding information, which aim to save human labor, to assist human judge, and to create human knowledge. It should have wide application potential in biological and biomedical studies, especially in the era of big biological data. To look through the application of machine learning along with biological development, this review provides wide cases to introduce the selection of machine learning methods in different practice scenarios involved in the whole biological and biomedical study cycle and further discusses the machine learning strategies for analyzing omics data in some cutting-edge biological studies. Finally, the notes on new challenges for machine learning due to small-sample high-dimension are summarized from the key points of sample unbalance, white box, and causality.

  13. Machine learning in the string landscape

    NASA Astrophysics Data System (ADS)

    Carifio, Jonathan; Halverson, James; Krioukov, Dmitri; Nelson, Brent D.

    2017-09-01

    We utilize machine learning to study the string landscape. Deep data dives and conjecture generation are proposed as useful frameworks for utilizing machine learning in the landscape, and examples of each are presented. A decision tree accurately predicts the number of weak Fano toric threefolds arising from reflexive polytopes, each of which determines a smooth F-theory compactification, and linear regression generates a previously proven conjecture for the gauge group rank in an ensemble of 4/3× 2.96× {10}^{755} F-theory compactifications. Logistic regression generates a new conjecture for when E 6 arises in the large ensemble of F-theory compactifications, which is then rigorously proven. This result may be relevant for the appearance of visible sectors in the ensemble. Through conjecture generation, machine learning is useful not only for numerics, but also for rigorous results.

  14. Source localization in an ocean waveguide using supervised machine learning.

    PubMed

    Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter

    2017-09-01

    Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.

  15. Acceleration of saddle-point searches with machine learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, Andrew A., E-mail: andrew-peterson@brown.edu

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the numbermore » of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.« less

  16. Interface Metaphors for Interactive Machine Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jasper, Robert J.; Blaha, Leslie M.

    To promote more interactive and dynamic machine learn- ing, we revisit the notion of user-interface metaphors. User-interface metaphors provide intuitive constructs for supporting user needs through interface design elements. A user-interface metaphor provides a visual or action pattern that leverages a user’s knowledge of another domain. Metaphors suggest both the visual representations that should be used in a display as well as the interactions that should be afforded to the user. We argue that user-interface metaphors can also offer a method of extracting interaction-based user feedback for use in machine learning. Metaphors offer indirect, context-based information that can be usedmore » in addition to explicit user inputs, such as user-provided labels. Implicit information from user interactions with metaphors can augment explicit user input for active learning paradigms. Or it might be leveraged in systems where explicit user inputs are more challenging to obtain. Each interaction with the metaphor provides an opportunity to gather data and learn. We argue this approach is especially important in streaming applications, where we desire machine learning systems that can adapt to dynamic, changing data.« less

  17. Machine Learning in Radiology: Applications Beyond Image Interpretation.

    PubMed

    Lakhani, Paras; Prater, Adam B; Hutson, R Kent; Andriole, Kathy P; Dreyer, Keith J; Morey, Jose; Prevedello, Luciano M; Clark, Toshi J; Geis, J Raymond; Itri, Jason N; Hawkins, C Matthew

    2018-02-01

    Much attention has been given to machine learning and its perceived impact in radiology, particularly in light of recent success with image classification in international competitions. However, machine learning is likely to impact radiology outside of image interpretation long before a fully functional "machine radiologist" is implemented in practice. Here, we describe an overview of machine learning, its application to radiology and other domains, and many cases of use that do not involve image interpretation. We hope that better understanding of these potential applications will help radiology practices prepare for the future and realize performance improvement and efficiency gains. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  18. Prostate Cancer Probability Prediction By Machine Learning Technique.

    PubMed

    Jović, Srđan; Miljković, Milica; Ivanović, Miljan; Šaranović, Milena; Arsić, Milena

    2017-11-26

    The main goal of the study was to explore possibility of prostate cancer prediction by machine learning techniques. In order to improve the survival probability of the prostate cancer patients it is essential to make suitable prediction models of the prostate cancer. If one make relevant prediction of the prostate cancer it is easy to create suitable treatment based on the prediction results. Machine learning techniques are the most common techniques for the creation of the predictive models. Therefore in this study several machine techniques were applied and compared. The obtained results were analyzed and discussed. It was concluded that the machine learning techniques could be used for the relevant prediction of prostate cancer.

  19. Machine Learning. Part 1. A Historical and Methodological Analysis.

    DTIC Science & Technology

    1983-05-31

    Machine learning has always been an integral part of artificial intelligence, and its methodology has evolved in concert with the major concerns of the field. In response to the difficulties of encoding ever-increasing volumes of knowledge in modern Al systems, many researchers have recently turned their attention to machine learning as a means to overcome the knowledge acquisition bottleneck. Part 1 of this paper presents a taxonomic analysis of machine learning organized primarily by learning strategies and secondarily by

  20. Applications of machine learning in cancer prediction and prognosis.

    PubMed

    Cruz, Joseph A; Wishart, David S

    2007-02-11

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allows computers to "learn" from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is also helping to improve our basic understanding of cancer development and progression.

  1. Implementing Machine Learning in Radiology Practice and Research.

    PubMed

    Kohli, Marc; Prevedello, Luciano M; Filice, Ross W; Geis, J Raymond

    2017-04-01

    The purposes of this article are to describe concepts that radiologists should understand to evaluate machine learning projects, including common algorithms, supervised as opposed to unsupervised techniques, statistical pitfalls, and data considerations for training and evaluation, and to briefly describe ethical dilemmas and legal risk. Machine learning includes a broad class of computer programs that improve with experience. The complexity of creating, training, and monitoring machine learning indicates that the success of the algorithms will require radiologist involvement for years to come, leading to engagement rather than replacement.

  2. Active learning machine learns to create new quantum experiments.

    PubMed

    Melnikov, Alexey A; Poulsen Nautrup, Hendrik; Krenn, Mario; Dunjko, Vedran; Tiersch, Markus; Zeilinger, Anton; Briegel, Hans J

    2018-02-06

    How useful can machine learning be in a quantum laboratory? Here we raise the question of the potential of intelligent machines in the context of scientific research. A major motivation for the present work is the unknown reachability of various entanglement classes in quantum experiments. We investigate this question by using the projective simulation model, a physics-oriented approach to artificial intelligence. In our approach, the projective simulation system is challenged to design complex photonic quantum experiments that produce high-dimensional entangled multiphoton states, which are of high interest in modern quantum experiments. The artificial intelligence system learns to create a variety of entangled states and improves the efficiency of their realization. In the process, the system autonomously (re)discovers experimental techniques which are only now becoming standard in modern quantum optical experiments-a trait which was not explicitly demanded from the system but emerged through the process of learning. Such features highlight the possibility that machines could have a significantly more creative role in future research.

  3. Statistical Machine Learning for Structured and High Dimensional Data

    DTIC Science & Technology

    2014-09-17

    AFRL-OSR-VA-TR-2014-0234 STATISTICAL MACHINE LEARNING FOR STRUCTURED AND HIGH DIMENSIONAL DATA Larry Wasserman CARNEGIE MELLON UNIVERSITY Final...Re . 8-98) v Prescribed by ANSI Std. Z39.18 14-06-2014 Final Dec 2009 - Aug 2014 Statistical Machine Learning for Structured and High Dimensional...area of resource-constrained statistical estimation. machine learning , high-dimensional statistics U U U UU John Lafferty 773-702-3813 > Research under

  4. Machine learning with quantum relative entropy

    NASA Astrophysics Data System (ADS)

    Tsuda, Koji

    2009-12-01

    Density matrices are a central tool in quantum physics, but it is also used in machine learning. A positive definite matrix called kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in an Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upperbound of the generalization error of online learning.

  5. Machine Learning of Fault Friction

    NASA Astrophysics Data System (ADS)

    Johnson, P. A.; Rouet-Leduc, B.; Hulbert, C.; Marone, C.; Guyer, R. A.

    2017-12-01

    We are applying machine learning (ML) techniques to continuous acoustic emission (AE) data from laboratory earthquake experiments. Our goal is to apply explicit ML methods to this acoustic datathe AE in order to infer frictional properties of a laboratory fault. The experiment is a double direct shear apparatus comprised of fault blocks surrounding fault gouge comprised of glass beads or quartz powder. Fault characteristics are recorded, including shear stress, applied load (bulk friction = shear stress/normal load) and shear velocity. The raw acoustic signal is continuously recorded. We rely on explicit decision tree approaches (Random Forest and Gradient Boosted Trees) that allow us to identify important features linked to the fault friction. A training procedure that employs both the AE and the recorded shear stress from the experiment is first conducted. Then, testing takes place on data the algorithm has never seen before, using only the continuous AE signal. We find that these methods provide rich information regarding frictional processes during slip (Rouet-Leduc et al., 2017a; Hulbert et al., 2017). In addition, similar machine learning approaches predict failure times, as well as slip magnitudes in some cases. We find that these methods work for both stick slip and slow slip experiments, for periodic slip and for aperiodic slip. We also derive a fundamental relationship between the AE and the friction describing the frictional behavior of any earthquake slip cycle in a given experiment (Rouet-Leduc et al., 2017b). Our goal is to ultimately scale these approaches to Earth geophysical data to probe fault friction. References Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. Humphreys and P. A. Johnson, Machine learning predicts laboratory earthquakes, in review (2017). https://arxiv.org/abs/1702.05774Rouet-LeDuc, B. et al., Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning (2017), AGU Fall Meeting Session S025

  6. Machine Learning Applications to Resting-State Functional MR Imaging Analysis.

    PubMed

    Billings, John M; Eder, Maxwell; Flood, William C; Dhami, Devendra Singh; Natarajan, Sriraam; Whitlow, Christopher T

    2017-11-01

    Machine learning is one of the most exciting and rapidly expanding fields within computer science. Academic and commercial research entities are investing in machine learning methods, especially in personalized medicine via patient-level classification. There is great promise that machine learning methods combined with resting state functional MR imaging will aid in diagnosis of disease and guide potential treatment for conditions thought to be impossible to identify based on imaging alone, such as psychiatric disorders. We discuss machine learning methods and explore recent advances. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Thutmose - Investigation of Machine Learning-Based Intrusion Detection Systems

    DTIC Science & Technology

    2016-06-01

    research is being done to incorporate the field of machine learning into intrusion detection. Machine learning is a branch of artificial intelligence (AI...adversarial drift." Proceedings of the 2013 ACM workshop on Artificial intelligence and security. ACM. (2013) Kantarcioglu, M., Xi, B., and Clifton, C. "A...34 Proceedings of the 4th ACM workshop on Security and artificial intelligence . ACM. (2011) Dua, S., and Du, X. Data Mining and Machine Learning in

  8. Learning as a Machine: Crossovers between Humans and Machines

    ERIC Educational Resources Information Center

    Hildebrandt, Mireille

    2017-01-01

    This article is a revised version of the keynote presented at LAK '16 in Edinburgh. The article investigates some of the assumptions of learning analytics, notably those related to behaviourism. Building on the work of Ivan Pavlov, Herbert Simon, and James Gibson as ways of "learning as a machine," the article then develops two levels of…

  9. Multi-Stage Convex Relaxation Methods for Machine Learning

    DTIC Science & Technology

    2013-03-01

    Many problems in machine learning can be naturally formulated as non-convex optimization problems. However, such direct nonconvex formulations have...original nonconvex formulation. We will develop theoretical properties of this method and algorithmic consequences. Related convex and nonconvex machine learning methods will also be investigated.

  10. Studying depression using imaging and machine learning methods.

    PubMed

    Patel, Meenal J; Khalaf, Alexander; Aizenstein, Howard J

    2016-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies.

  11. Application of Weka environment to determine factors that stand behind non-alcoholic fatty liver disease (NAFLD)

    NASA Astrophysics Data System (ADS)

    Plutecki, Michal M.; Wierzbicka, Aldona; Socha, Piotr; Mulawka, Jan J.

    2009-06-01

    The paper describes an innovative approach to discover new knowledge in non-alcoholic fatty liver disease (NAFLD). In order to determine the factors that may cause the disease a number of classification and attribute selection algorithms have been applied. Only those with the best classification results were chosen. Several interesting facts associated with this unclear disease have been discovered. All data mining computations were made in Weka environment.

  12. Characterizing Slow Slip Applying Machine Learning

    NASA Astrophysics Data System (ADS)

    Hulbert, C.; Rouet-Leduc, B.; Bolton, D. C.; Ren, C. X.; Marone, C.; Johnson, P. A.

    2017-12-01

    Over the last two decades it has become apparent from strain and GPS measurements, that slow slip on earthquake faults is a widespread phenomenon. Slow slip is also inferred from small amplitude seismic signals known as tremor and low frequency earthquakes (LFE's) and has been reproduced in laboratory studies, providing useful physical insight into the frictional properties associated with the behavior. From such laboratory studies we ask whether we can obtain quantitative information regarding the physics of friction from only the recorded continuous acoustical data originating from the fault zone. We show that by applying machine learning to the acoustical signal, we can infer upcoming slow slip failure initiation as well as the slip termination, and that we can also infer the magnitudes by a second machine learning procedure based on predicted inter-event times. We speculate that by applying this or other machine learning approaches to continuous seismic data, new information regarding the physics of faulting could be obtained.

  13. Inverse Problems in Geodynamics Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R. N.

    2018-01-01

    During the past few decades numerical studies have been widely employed to explore the style of circulation and mixing in the mantle of Earth and other planets. However, in geodynamical studies there are many properties from mineral physics, geochemistry, and petrology in these numerical models. Machine learning, as a computational statistic-related technique and a subfield of artificial intelligence, has rapidly emerged recently in many fields of sciences and engineering. We focus here on the application of supervised machine learning (SML) algorithms in predictions of mantle flow processes. Specifically, we emphasize on estimating mantle properties by employing machine learning techniques in solving an inverse problem. Using snapshots of numerical convection models as training samples, we enable machine learning models to determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at midmantle depths. Employing support vector machine algorithms, we show that SML techniques can successfully predict the magnitude of mantle density anomalies and can also be used in characterizing mantle flow patterns. The technique can be extended to more complex geodynamic problems in mantle dynamics by employing deep learning algorithms for putting constraints on properties such as viscosity, elastic parameters, and the nature of thermal and chemical anomalies.

  14. Adaptive Learning Systems: Beyond Teaching Machines

    ERIC Educational Resources Information Center

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn, what instructor should do to help students during their learning process. We have adaptive learning technologies that can create much more student oriented learning environments. The purpose of this article is to present these changes and its…

  15. Predicting Solar Activity Using Machine-Learning Methods

    NASA Astrophysics Data System (ADS)

    Bobra, M.

    2017-12-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections. However, we do not, as of yet, fully understand the physical mechanism that triggers solar eruptions. A machine-learning algorithm, which is favorable in cases where the amount of data is large, is one way to [1] empirically determine the signatures of this mechanism in solar image data and [2] use them to predict solar activity. In this talk, we discuss the application of various machine learning algorithms - specifically, a Support Vector Machine, a sparse linear regression (Lasso), and Convolutional Neural Network - to image data from the photosphere, chromosphere, transition region, and corona taken by instruments aboard the Solar Dynamics Observatory in order to predict solar activity on a variety of time scales. Such an approach may be useful since, at the present time, there are no physical models of flares available for real-time prediction. We discuss our results (Bobra and Couvidat, 2015; Bobra and Ilonidis, 2016; Jonas et al., 2017) as well as other attempts to predict flares using machine-learning (e.g. Ahmed et al., 2013; Nishizuka et al. 2017) and compare these results with the more traditional techniques used by the NOAA Space Weather Prediction Center (Crown, 2012). We also discuss some of the challenges in using machine-learning algorithms for space science applications.

  16. Learning algorithms for human-machine interfaces.

    PubMed

    Danziger, Zachary; Fishbach, Alon; Mussa-Ivaldi, Ferdinando A

    2009-05-01

    The goal of this study is to create and examine machine learning algorithms that adapt in a controlled and cadenced way to foster a harmonious learning environment between the user and the controlled device. To evaluate these algorithms, we have developed a simple experimental framework. Subjects wear an instrumented data glove that records finger motions. The high-dimensional glove signals remotely control the joint angles of a simulated planar two-link arm on a computer screen, which is used to acquire targets. A machine learning algorithm was applied to adaptively change the transformation between finger motion and the simulated robot arm. This algorithm was either LMS gradient descent or the Moore-Penrose (MP) pseudoinverse transformation. Both algorithms modified the glove-to-joint angle map so as to reduce the endpoint errors measured in past performance. The MP group performed worse than the control group (subjects not exposed to any machine learning), while the LMS group outperformed the control subjects. However, the LMS subjects failed to achieve better generalization than the control subjects, and after extensive training converged to the same level of performance as the control subjects. These results highlight the limitations of coadaptive learning using only endpoint error reduction.

  17. Learning Algorithms for Human–Machine Interfaces

    PubMed Central

    Fishbach, Alon; Mussa-Ivaldi, Ferdinando A.

    2012-01-01

    The goal of this study is to create and examine machine learning algorithms that adapt in a controlled and cadenced way to foster a harmonious learning environment between the user and the controlled device. To evaluate these algorithms, we have developed a simple experimental framework. Subjects wear an instrumented data glove that records finger motions. The high-dimensional glove signals remotely control the joint angles of a simulated planar two-link arm on a computer screen, which is used to acquire targets. A machine learning algorithm was applied to adaptively change the transformation between finger motion and the simulated robot arm. This algorithm was either LMS gradient descent or the Moore–Penrose (MP) pseudoinverse transformation. Both algorithms modified the glove-to-joint angle map so as to reduce the endpoint errors measured in past performance. The MP group performed worse than the control group (subjects not exposed to any machine learning), while the LMS group outperformed the control subjects. However, the LMS subjects failed to achieve better generalization than the control subjects, and after extensive training converged to the same level of performance as the control subjects. These results highlight the limitations of coadaptive learning using only endpoint error reduction. PMID:19203886

  18. Simulation-driven machine learning: Bearing fault classification

    NASA Astrophysics Data System (ADS)

    Sobie, Cameron; Freitas, Carina; Nicolai, Mike

    2018-01-01

    Increasing the accuracy of mechanical fault detection has the potential to improve system safety and economic performance by minimizing scheduled maintenance and the probability of unexpected system failure. Advances in computational performance have enabled the application of machine learning algorithms across numerous applications including condition monitoring and failure detection. Past applications of machine learning to physical failure have relied explicitly on historical data, which limits the feasibility of this approach to in-service components with extended service histories. Furthermore, recorded failure data is often only valid for the specific circumstances and components for which it was collected. This work directly addresses these challenges for roller bearings with race faults by generating training data using information gained from high resolution simulations of roller bearing dynamics, which is used to train machine learning algorithms that are then validated against four experimental datasets. Several different machine learning methodologies are compared starting from well-established statistical feature-based methods to convolutional neural networks, and a novel application of dynamic time warping (DTW) to bearing fault classification is proposed as a robust, parameter free method for race fault detection.

  19. Interpreting Medical Information Using Machine Learning and Individual Conditional Expectation.

    PubMed

    Nohara, Yasunobu; Wakata, Yoshifumi; Nakashima, Naoki

    2015-01-01

    Recently, machine-learning techniques have spread many fields. However, machine-learning is still not popular in medical research field due to difficulty of interpreting. In this paper, we introduce a method of interpreting medical information using machine learning technique. The method gave new explanation of partial dependence plot and individual conditional expectation plot from medical research field.

  20. Machine learning modelling for predicting soil liquefaction susceptibility

    NASA Astrophysics Data System (ADS)

    Samui, P.; Sitharam, T. G.

    2011-01-01

    This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique which uses Artificial Neural Network (ANN) based on multi-layer perceptions (MLP) that are trained with Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector machine (SVM) that is firmly based on the theory of statistical learning theory, uses classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT [(N1)60] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models, requiring only the two parameters [(N1)60 and peck ground acceleration (amax/g)], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.

  1. Implementing Machine Learning in the PCWG Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clifton, Andrew; Ding, Yu; Stuart, Peter

    The Power Curve Working Group (www.pcwg.org) is an ad-hoc industry-led group to investigate the performance of wind turbines in real-world conditions. As part of ongoing experience-sharing exercises, machine learning has been proposed as a possible way to predict turbine performance. This presentation provides some background information about machine learning and how it might be implemented in the PCWG exercises.

  2. Machine learning and data science in soft materials engineering

    NASA Astrophysics Data System (ADS)

    Ferguson, Andrew L.

    2018-01-01

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by ‘de-jargonizing’ data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  3. Machine learning and data science in soft materials engineering.

    PubMed

    Ferguson, Andrew L

    2018-01-31

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by 'de-jargonizing' data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  4. Cognitive learning: a machine learning approach for automatic process characterization from design

    NASA Astrophysics Data System (ADS)

    Foucher, J.; Baderot, J.; Martinez, S.; Dervilllé, A.; Bernard, G.

    2018-03-01

    Cutting edge innovation requires accurate and fast process-control to obtain fast learning rate and industry adoption. Current tools available for such task are mainly manual and user dependent. We present in this paper cognitive learning, which is a new machine learning based technique to facilitate and to speed up complex characterization by using the design as input, providing fast training and detection time. We will focus on the machine learning framework that allows object detection, defect traceability and automatic measurement tools.

  5. Applications of Machine Learning for Radiation Therapy.

    PubMed

    Arimura, Hidetaka; Nakamoto, Takahiro

    2016-01-01

    Radiation therapy has been highly advanced as image guided radiation therapy (IGRT) by making advantage of image engineering technologies. Recently, novel frameworks based on image engineering technologies as well as machine learning technologies have been studied for sophisticating the radiation therapy. In this review paper, the author introduces several researches of applications of machine learning for radiation therapy. For examples, a method to determine the threshold values for standardized uptake value (SUV) for estimation of gross tumor volume (GTV) in positron emission tomography (PET) images, an approach to estimate the multileaf collimator (MLC) position errors between treatment plans and radiation delivery time, and prediction frameworks for esophageal stenosis and radiation pneumonitis risk after radiation therapy are described. Finally, the author introduces seven issues that one should consider when applying machine learning models to radiation therapy.

  6. Applications of Machine Learning in Cancer Prediction and Prognosis

    PubMed Central

    Cruz, Joseph A.; Wishart, David S.

    2006-01-01

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allows computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15–25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is also helping to improve our basic understanding of cancer development and progression. PMID:19458758

  7. Designing Anticancer Peptides by Constructive Machine Learning.

    PubMed

    Grisoni, Francesca; Neuhaus, Claudia S; Gabernet, Gisela; Müller, Alex T; Hiss, Jan A; Schneider, Gisbert

    2018-04-21

    Constructive (generative) machine learning enables the automated generation of novel chemical structures without the need for explicit molecular design rules. This study presents the experimental application of such a deep machine learning model to design membranolytic anticancer peptides (ACPs) de novo. A recurrent neural network with long short-term memory cells was trained on α-helical cationic amphipathic peptide sequences and then fine-tuned with 26 known ACPs by transfer learning. This optimized model was used to generate unique and novel amino acid sequences. Twelve of the peptides were synthesized and tested for their activity on MCF7 human breast adenocarcinoma cells and selectivity against human erythrocytes. Ten of these peptides were active against cancer cells. Six of the active peptides killed MCF7 cancer cells without affecting human erythrocytes with at least threefold selectivity. These results advocate constructive machine learning for the automated design of peptides with desired biological activities. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. A review of machine learning in obesity.

    PubMed

    DeGregory, K W; Kuiper, P; DeSilvio, T; Pleuss, J D; Miller, R; Roginski, J W; Fisher, C B; Harness, D; Viswanath, S; Heymsfield, S B; Dungan, I; Thomas, D M

    2018-05-01

    Rich sources of obesity-related data arising from sensors, smartphone apps, electronic medical health records and insurance data can bring new insights for understanding, preventing and treating obesity. For such large datasets, machine learning provides sophisticated and elegant tools to describe, classify and predict obesity-related risks and outcomes. Here, we review machine learning methods that predict and/or classify such as linear and logistic regression, artificial neural networks, deep learning and decision tree analysis. We also review methods that describe and characterize data such as cluster analysis, principal component analysis, network science and topological data analysis. We introduce each method with a high-level overview followed by examples of successful applications. The algorithms were then applied to National Health and Nutrition Examination Survey to demonstrate methodology, utility and outcomes. The strengths and limitations of each method were also evaluated. This summary of machine learning algorithms provides a unique overview of the state of data analysis applied specifically to obesity. © 2018 World Obesity Federation.

  9. Machine Shop. Student Learning Guide.

    ERIC Educational Resources Information Center

    Palm Beach County Board of Public Instruction, West Palm Beach, FL.

    This student learning guide contains eight modules for completing a course in machine shop. It is designed especially for use in Palm Beach County, Florida. Each module covers one task, and consists of a purpose, performance objective, enabling objectives, learning activities and resources, information sheets, student self-check with answer key,…

  10. Prediction of antiepileptic drug treatment outcomes using machine learning.

    PubMed

    Colic, Sinisa; Wither, Robert G; Lang, Min; Zhang, Liang; Eubanks, James H; Bardakjian, Berj L

    2017-02-01

    Antiepileptic drug (AED) treatments produce inconsistent outcomes, often necessitating patients to go through several drug trials until a successful treatment can be found. This study proposes the use of machine learning techniques to predict epilepsy treatment outcomes of commonly used AEDs. Machine learning algorithms were trained and evaluated using features obtained from intracranial electroencephalogram (iEEG) recordings of the epileptiform discharges observed in Mecp2-deficient mouse model of the Rett Syndrome. Previous work have linked the presence of cross-frequency coupling (I CFC ) of the delta (2-5 Hz) rhythm with the fast ripple (400-600 Hz) rhythm in epileptiform discharges. Using the I CFC to label post-treatment outcomes we compared support vector machines (SVMs) and random forest (RF) machine learning classifiers for providing likelihood scores of successful treatment outcomes. (a) There was heterogeneity in AED treatment outcomes, (b) machine learning techniques could be used to rank the efficacy of AEDs by estimating likelihood scores for successful treatment outcome, (c) I CFC features yielded the most effective a priori identification of appropriate AED treatment, and (d) both classifiers performed comparably. Machine learning approaches yielded predictions of successful drug treatment outcomes which in turn could reduce the burdens of drug trials and lead to substantial improvements in patient quality of life.

  11. Comparison of Models for the Prediction of Medical Costs of Spinal Fusion in Taiwan Diagnosis-Related Groups by Machine Learning Algorithms.

    PubMed

    Kuo, Ching-Yen; Yu, Liang-Chin; Chen, Hou-Chaung; Chan, Chien-Lung

    2018-01-01

    The aims of this study were to compare the performance of machine learning methods for the prediction of the medical costs associated with spinal fusion in terms of profit or loss in Taiwan Diagnosis-Related Groups (Tw-DRGs) and to apply these methods to explore the important factors associated with the medical costs of spinal fusion. A data set was obtained from a regional hospital in Taoyuan city in Taiwan, which contained data from 2010 to 2013 on patients of Tw-DRG49702 (posterior and other spinal fusion without complications or comorbidities). Naïve-Bayesian, support vector machines, logistic regression, C4.5 decision tree, and random forest methods were employed for prediction using WEKA 3.8.1. Five hundred thirty-two cases were categorized as belonging to the Tw-DRG49702 group. The mean medical cost was US $4,549.7, and the mean age of the patients was 62.4 years. The mean length of stay was 9.3 days. The length of stay was an important variable in terms of determining medical costs for patients undergoing spinal fusion. The random forest method had the best predictive performance in comparison to the other methods, achieving an accuracy of 84.30%, a sensitivity of 71.4%, a specificity of 92.2%, and an AUC of 0.904. Our study demonstrated that the random forest model can be employed to predict the medical costs of Tw-DRG49702, and could inform hospital strategy in terms of increasing the financial management efficiency of this operation.

  12. Large-Scale Machine Learning for Classification and Search

    ERIC Educational Resources Information Center

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  13. Machine-learning in astronomy

    NASA Astrophysics Data System (ADS)

    Hobson, Michael; Graff, Philip; Feroz, Farhan; Lasenby, Anthony

    2014-05-01

    Machine-learning methods may be used to perform many tasks required in the analysis of astronomical data, including: data description and interpretation, pattern recognition, prediction, classification, compression, inference and many more. An intuitive and well-established approach to machine learning is the use of artificial neural networks (NNs), which consist of a group of interconnected nodes, each of which processes information that it receives and then passes this product on to other nodes via weighted connections. In particular, I discuss the first public release of the generic neural network training algorithm, called SkyNet, and demonstrate its application to astronomical problems focusing on its use in the BAMBI package for accelerated Bayesian inference in cosmology, and the identification of gamma-ray bursters. The SkyNet and BAMBI packages, which are fully parallelised using MPI, are available at http://www.mrao.cam.ac.uk/software/.

  14. A review of supervised machine learning applied to ageing research.

    PubMed

    Fabris, Fabio; Magalhães, João Pedro de; Freitas, Alex A

    2017-04-01

    Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works, to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: the link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advance our knowledge and has provided novel insights on ageing, yet future work should have a greater emphasis in validating the predictions.

  15. Machine learning models in breast cancer survival prediction.

    PubMed

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers with a high mortality rate among women. With the early diagnosis of breast cancer survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is the combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of Breast cancer survival. We use a dataset with eight attributes that include the records of 900 patients in which 876 patients (97.3%) and 24 (2.7%) patients were females and males respectively. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-cross fold technique were used with the proposed model for the prediction of breast cancer survival. The performance of machine learning techniques were evaluated with accuracy, precision, sensitivity, specificity, and area under ROC curve. Out of 900 patients, 803 patients and 97 patients were alive and dead, respectively. In this study, Trees Random Forest (TRF) technique showed better results in comparison to other techniques (NB, 1NN, AD, SVM and RBFN, MLP). The accuracy, sensitivity and the area under ROC curve of TRF are 96%, 96%, 93%, respectively. However, 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under ROC curve 78%). This study demonstrates that Trees Random Forest model (TRF) which is a rule-based classification model was the best model with the highest level of

  16. AstroML: Python-powered Machine Learning for Astronomy

    NASA Astrophysics Data System (ADS)

    Vander Plas, Jake; Connolly, A. J.; Ivezic, Z.

    2014-01-01

    As astronomical data sets grow in size and complexity, automated machine learning and data mining methods are becoming an increasingly fundamental component of research in the field. The astroML project (http://astroML.org) provides a common repository for practical examples of the data mining and machine learning tools used and developed by astronomical researchers, written in Python. The astroML module contains a host of general-purpose data analysis and machine learning routines, loaders for openly-available astronomical datasets, and fast implementations of specific computational methods often used in astronomy and astrophysics. The associated website features hundreds of examples of these routines being used for analysis of real astronomical datasets, while the associated textbook provides a curriculum resource for graduate-level courses focusing on practical statistics, machine learning, and data mining approaches within Astronomical research. This poster will highlight several of the more powerful and unique examples of analysis performed with astroML, all of which can be reproduced in their entirety on any computer with the proper packages installed.

  17. Machine-Learning Approach for Design of Nanomagnetic-Based Antennas

    NASA Astrophysics Data System (ADS)

    Gianfagna, Carmine; Yu, Huan; Swaminathan, Madhavan; Pulugurtha, Raj; Tummala, Rao; Antonini, Giulio

    2017-08-01

    We propose a machine-learning approach for design of planar inverted-F antennas with a magneto-dielectric nanocomposite substrate. It is shown that machine-learning techniques can be efficiently used to characterize nanomagnetic-based antennas by accurately mapping the particle radius and volume fraction of the nanomagnetic material to antenna parameters such as gain, bandwidth, radiation efficiency, and resonant frequency. A modified mixing rule model is also presented. In addition, the inverse problem is addressed through machine learning as well, where given the antenna parameters, the corresponding design space of possible material parameters is identified.

  18. A machine learning model with human cognitive biases capable of learning from small and biased datasets.

    PubMed

    Taniguchi, Hidetaka; Sato, Hiroshi; Shirakawa, Tomohiro

    2018-05-09

    Human learners can generalize a new concept from a small number of samples. In contrast, conventional machine learning methods require large amounts of data to address the same types of problems. Humans have cognitive biases that promote fast learning. Here, we developed a method to reduce the gap between human beings and machines in this type of inference by utilizing cognitive biases. We implemented a human cognitive model into machine learning algorithms and compared their performance with the currently most popular methods, naïve Bayes, support vector machine, neural networks, logistic regression and random forests. We focused on the task of spam classification, which has been studied for a long time in the field of machine learning and often requires a large amount of data to obtain high accuracy. Our models achieved superior performance with small and biased samples in comparison with other representative machine learning methods.

  19. Learning Machine Learning: A Case Study

    ERIC Educational Resources Information Center

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  20. Behavioral Profiling of Scada Network Traffic Using Machine Learning Algorithms

    DTIC Science & Technology

    2014-03-27

    BEHAVIORAL PROFILING OF SCADA NETWORK TRAFFIC USING MACHINE LEARNING ALGORITHMS THESIS Jessica R. Werling, Captain, USAF AFIT-ENG-14-M-81 DEPARTMENT...subject to copyright protection in the United States. AFIT-ENG-14-M-81 BEHAVIORAL PROFILING OF SCADA NETWORK TRAFFIC USING MACHINE LEARNING ...AFIT-ENG-14-M-81 BEHAVIORAL PROFILING OF SCADA NETWORK TRAFFIC USING MACHINE LEARNING ALGORITHMS Jessica R. Werling, B.S.C.S. Captain, USAF Approved

  1. Prediction of antiepileptic drug treatment outcomes using machine learning

    NASA Astrophysics Data System (ADS)

    Colic, Sinisa; Wither, Robert G.; Lang, Min; Zhang, Liang; Eubanks, James H.; Bardakjian, Berj L.

    2017-02-01

    Objective. Antiepileptic drug (AED) treatments produce inconsistent outcomes, often necessitating patients to go through several drug trials until a successful treatment can be found. This study proposes the use of machine learning techniques to predict epilepsy treatment outcomes of commonly used AEDs. Approach. Machine learning algorithms were trained and evaluated using features obtained from intracranial electroencephalogram (iEEG) recordings of the epileptiform discharges observed in Mecp2-deficient mouse model of the Rett Syndrome. Previous work have linked the presence of cross-frequency coupling (I CFC) of the delta (2-5 Hz) rhythm with the fast ripple (400-600 Hz) rhythm in epileptiform discharges. Using the I CFC to label post-treatment outcomes we compared support vector machines (SVMs) and random forest (RF) machine learning classifiers for providing likelihood scores of successful treatment outcomes. Main results. (a) There was heterogeneity in AED treatment outcomes, (b) machine learning techniques could be used to rank the efficacy of AEDs by estimating likelihood scores for successful treatment outcome, (c) I CFC features yielded the most effective a priori identification of appropriate AED treatment, and (d) both classifiers performed comparably. Significance. Machine learning approaches yielded predictions of successful drug treatment outcomes which in turn could reduce the burdens of drug trials and lead to substantial improvements in patient quality of life.

  2. Machine Learning and Inverse Problem in Geodynamics

    NASA Astrophysics Data System (ADS)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R.

    2017-12-01

    During the past few decades numerical modeling and traditional HPC have been widely deployed in many diverse fields for problem solutions. However, in recent years the rapid emergence of machine learning (ML), a subfield of the artificial intelligence (AI), in many fields of sciences, engineering, and finance seems to mark a turning point in the replacement of traditional modeling procedures with artificial intelligence-based techniques. The study of the circulation in the interior of Earth relies on the study of high pressure mineral physics, geochemistry, and petrology where the number of the mantle parameters is large and the thermoelastic parameters are highly pressure- and temperature-dependent. More complexity arises from the fact that many of these parameters that are incorporated in the numerical models as input parameters are not yet well established. In such complex systems the application of machine learning algorithms can play a valuable role. Our focus in this study is the application of supervised machine learning (SML) algorithms in predicting mantle properties with the emphasis on SML techniques in solving the inverse problem. As a sample problem we focus on the spin transition in ferropericlase and perovskite that may cause slab and plume stagnation at mid-mantle depths. The degree of the stagnation depends on the degree of negative density anomaly at the spin transition zone. The training and testing samples for the machine learning models are produced by the numerical convection models with known magnitudes of density anomaly (as the class labels of the samples). The volume fractions of the stagnated slabs and plumes which can be considered as measures for the degree of stagnation are assigned as sample features. The machine learning models can determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at mid-mantle depths. Employing support vector machine (SVM) algorithms we show that SML techniques

  3. Machine vision and appearance based learning

    NASA Astrophysics Data System (ADS)

    Bernstein, Alexander

    2017-03-01

    Smart algorithms are used in Machine vision to organize or extract high-level information from the available data. The resulted high-level understanding the content of images received from certain visual sensing system and belonged to an appearance space can be only a key first step in solving various specific tasks such as mobile robot navigation in uncertain environments, road detection in autonomous driving systems, etc. Appearance-based learning has become very popular in the field of machine vision. In general, the appearance of a scene is a function of the scene content, the lighting conditions, and the camera position. Mobile robots localization problem in machine learning framework via appearance space analysis is considered. This problem is reduced to certain regression on an appearance manifold problem, and newly regression on manifolds methods are used for its solution.

  4. Contemporary machine learning: techniques for practitioners in the physical sciences

    NASA Astrophysics Data System (ADS)

    Spears, Brian

    2017-10-01

    Machine learning is the science of using computers to find relationships in data without explicitly knowing or programming those relationships in advance. Often without realizing it, we employ machine learning every day as we use our phones or drive our cars. Over the last few years, machine learning has found increasingly broad application in the physical sciences. This most often involves building a model relationship between a dependent, measurable output and an associated set of controllable, but complicated, independent inputs. The methods are applicable both to experimental observations and to databases of simulated output from large, detailed numerical simulations. In this tutorial, we will present an overview of current tools and techniques in machine learning - a jumping-off point for researchers interested in using machine learning to advance their work. We will discuss supervised learning techniques for modeling complicated functions, beginning with familiar regression schemes, then advancing to more sophisticated decision trees, modern neural networks, and deep learning methods. Next, we will cover unsupervised learning and techniques for reducing the dimensionality of input spaces and for clustering data. We'll show example applications from both magnetic and inertial confinement fusion. Along the way, we will describe methods for practitioners to help ensure that their models generalize from their training data to as-yet-unseen test data. We will finally point out some limitations to modern machine learning and speculate on some ways that practitioners from the physical sciences may be particularly suited to help. This work was performed by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  5. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling

    PubMed Central

    Cuperlovic-Culf, Miroslava

    2018-01-01

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies. PMID:29324649

  6. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling.

    PubMed

    Cuperlovic-Culf, Miroslava

    2018-01-11

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies.

  7. Study of Environmental Data Complexity using Extreme Learning Machine

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2017-04-01

    The main goals of environmental data science using machine learning algorithm deal, in a broad sense, around the calibration, the prediction and the visualization of hidden relationship between input and output variables. In order to optimize the models and to understand the phenomenon under study, the characterization of the complexity (at different levels) should be taken into account. Therefore, the identification of the linear or non-linear behavior between input and output variables adds valuable information for the knowledge of the phenomenon complexity. The present research highlights and investigates the different issues that can occur when identifying the complexity (linear/non-linear) of environmental data using machine learning algorithm. In particular, the main attention is paid to the description of a self-consistent methodology for the use of Extreme Learning Machines (ELM, Huang et al., 2006), which recently gained a great popularity. By applying two ELM models (with linear and non-linear activation functions) and by comparing their efficiency, quantification of the linearity can be evaluated. The considered approach is accompanied by simulated and real high dimensional and multivariate data case studies. In conclusion, the current challenges and future development in complexity quantification using environmental data mining are discussed. References - Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1-3), 489-501. - Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data. EPFL Press; Lausanne, Switzerland, p.392. - Leuenberger, M., Kanevski, M., 2015. Extreme Learning Machines for spatial environmental data. Computers and Geosciences 85, 64-73.

  8. Adiabatic Quantum Anomaly Detection and Machine Learning

    NASA Astrophysics Data System (ADS)

    Pudenz, Kristen; Lidar, Daniel

    2012-02-01

    We present methods of anomaly detection and machine learning using adiabatic quantum computing. The machine learning algorithm is a boosting approach which seeks to optimally combine somewhat accurate classification functions to create a unified classifier which is much more accurate than its components. This algorithm then becomes the first part of the larger anomaly detection algorithm. In the anomaly detection routine, we first use adiabatic quantum computing to train two classifiers which detect two sets, the overlap of which forms the anomaly class. We call this the learning phase. Then, in the testing phase, the two learned classification functions are combined to form the final Hamiltonian for an adiabatic quantum computation, the low energy states of which represent the anomalies in a binary vector space.

  9. Machine Learning: A Crucial Tool for Sensor Design

    PubMed Central

    Zhao, Weixiang; Bhushan, Abhinav; Santamaria, Anthony D.; Simon, Melinda G.; Davis, Cristina E.

    2009-01-01

    Sensors have been widely used for disease diagnosis, environmental quality monitoring, food quality control, industrial process analysis and control, and other related fields. As a key tool for sensor data analysis, machine learning is becoming a core part of novel sensor design. Dividing a complete machine learning process into three steps: data pre-treatment, feature extraction and dimension reduction, and system modeling, this paper provides a review of the methods that are widely used for each step. For each method, the principles and the key issues that affect modeling results are discussed. After reviewing the potential problems in machine learning processes, this paper gives a summary of current algorithms in this field and provides some feasible directions for future studies. PMID:20191110

  10. Machine learning for Big Data analytics in plants.

    PubMed

    Ma, Chuang; Zhang, Hao Helen; Wang, Xiangfeng

    2014-12-01

    Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community not only needs to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Advances in Machine Learning and Data Mining for Astronomy

    NASA Astrophysics Data System (ADS)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  12. Development of E-Learning Materials for Machining Safety Education

    NASA Astrophysics Data System (ADS)

    Nakazawa, Tsuyoshi; Mita, Sumiyoshi; Matsubara, Masaaki; Takashima, Takeo; Tanaka, Koichi; Izawa, Satoru; Kawamura, Takashi

    We developed two e-learning materials for Manufacturing Practice safety education: movie learning materials and hazard-detection learning materials. Using these video and sound media, students can learn how to operate machines safely with movie learning materials, which raise the effectiveness of preparation and review for manufacturing practice. Using these materials, students can realize safety operation well. Students can apply knowledge learned in lectures to the detection of hazards and use study methods for hazard detection during machine operation using the hazard-detection learning materials. Particularly, the hazard-detection learning materials raise students‧ safety consciousness and increase students‧ comprehension of knowledge from lectures and comprehension of operations during Manufacturing Practice.

  13. Modeling Geomagnetic Variations using a Machine Learning Framework

    NASA Astrophysics Data System (ADS)

    Cheung, C. M. M.; Handmer, C.; Kosar, B.; Gerules, G.; Poduval, B.; Mackintosh, G.; Munoz-Jaramillo, A.; Bobra, M.; Hernandez, T.; McGranaghan, R. M.

    2017-12-01

    We present a framework for data-driven modeling of Heliophysics time series data. The Solar Terrestrial Interaction Neural net Generator (STING) is an open source python module built on top of state-of-the-art statistical learning frameworks (traditional machine learning methods as well as deep learning). To showcase the capability of STING, we deploy it for the problem of predicting the temporal variation of geomagnetic fields. The data used includes solar wind measurements from the OMNI database and geomagnetic field data taken by magnetometers at US Geological Survey observatories. We examine the predictive capability of different machine learning techniques (recurrent neural networks, support vector machines) for a range of forecasting times (minutes to 12 hours). STING is designed to be extensible to other types of data. We show how STING can be used on large sets of data from different sensors/observatories and adapted to tackle other problems in Heliophysics.

  14. An iterative learning control method with application for CNC machine tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, D.I.; Kim, S.

    1996-01-01

    A proportional, integral, and derivative (PID) type iterative learning controller is proposed for precise tracking control of industrial robots and computer numerical controller (CNC) machine tools performing repetitive tasks. The convergence of the output error by the proposed learning controller is guaranteed under a certain condition even when the system parameters are not known exactly and unknown external disturbances exist. As the proposed learning controller is repeatedly applied to the industrial robot or the CNC machine tool with the path-dependent repetitive task, the distance difference between the desired path and the actual tracked or machined path, which is one ofmore » the most significant factors in the evaluation of control performance, is progressively reduced. The experimental results demonstrate that the proposed learning controller can improve machining accuracy when the CNC machine tool performs repetitive machining tasks.« less

  15. Robust Fault Diagnosis in Electric Drives Using Machine Learning

    DTIC Science & Technology

    2004-09-08

    detection of fault conditions of the inverter. A machine learning framework is developed to systematically select torque-speed domain operation points...were used to generate various fault condition data for machine learning . The technique is viable for accurate, reliable and fast fault detection in electric drives.

  16. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    ERIC Educational Resources Information Center

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  17. Machine Learning Toolkit for Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-03-31

    Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large scale systems, which includes commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed for improved speed and memory space usage including adaptive and aggressive elimination of samples for faster convergence , and sparse format representation of data samples. Several heuristics for earliest possible to lazy elimination of non-contributing samples are consideredmore » in MaTEx. In many cases, where an early sample elimination might result in a false positive, low overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets« less

  18. Introduction to machine learning for brain imaging.

    PubMed

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in the past years developed to become a working horse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally machine learning techniques should be usable for any non-expert, however, unfortunately they are typically not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. Copyright © 2010 Elsevier Inc. All rights reserved.

  19. Machine learning molecular dynamics for the simulation of infrared spectra.

    PubMed

    Gastegger, Michael; Behler, Jörg; Marquetand, Philipp

    2017-10-01

    Machine learning has emerged as an invaluable tool in many research areas. In the present work, we harness this power to predict highly accurate molecular infrared spectra with unprecedented computational efficiency. To account for vibrational anharmonic and dynamical effects - typically neglected by conventional quantum chemistry approaches - we base our machine learning strategy on ab initio molecular dynamics simulations. While these simulations are usually extremely time consuming even for small molecules, we overcome these limitations by leveraging the power of a variety of machine learning techniques, not only accelerating simulations by several orders of magnitude, but also greatly extending the size of systems that can be treated. To this end, we develop a molecular dipole moment model based on environment dependent neural network charges and combine it with the neural network potential approach of Behler and Parrinello. Contrary to the prevalent big data philosophy, we are able to obtain very accurate machine learning models for the prediction of infrared spectra based on only a few hundreds of electronic structure reference points. This is made possible through the use of molecular forces during neural network potential training and the introduction of a fully automated sampling scheme. We demonstrate the power of our machine learning approach by applying it to model the infrared spectra of a methanol molecule, n -alkanes containing up to 200 atoms and the protonated alanine tripeptide, which at the same time represents the first application of machine learning techniques to simulate the dynamics of a peptide. In all of these case studies we find an excellent agreement between the infrared spectra predicted via machine learning models and the respective theoretical and experimental spectra.

  20. Component Pin Recognition Using Algorithms Based on Machine Learning

    NASA Astrophysics Data System (ADS)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  1. Paradigms for Realizing Machine Learning Algorithms.

    PubMed

    Agneeswaran, Vijay Srinivas; Tonpay, Pranay; Tiwary, Jayati

    2013-12-01

    The article explains the three generations of machine learning algorithms-with all three trying to operate on big data. The first generation tools are SAS, SPSS, etc., while second generation realizations include Mahout and RapidMiner (that work over Hadoop), and the third generation paradigms include Spark and GraphLab, among others. The essence of the article is that for a number of machine learning algorithms, it is important to look beyond the Hadoop's Map-Reduce paradigm in order to make them work on big data. A number of promising contenders have emerged in the third generation that can be exploited to realize deep analytics on big data.

  2. A Machine LearningFramework to Forecast Wave Conditions

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; James, S. C.; O'Donncha, F.

    2017-12-01

    Recently, significant effort has been undertaken to quantify and extract wave energy because it is renewable, environmental friendly, abundant, and often close to population centers. However, a major challenge is the ability to accurately and quickly predict energy production, especially across a 48-hour cycle. Accurate forecasting of wave conditions is a challenging undertaking that typically involves solving the spectral action-balance equation on a discretized grid with high spatial resolution. The nature of the computations typically demands high-performance computing infrastructure. Using a case-study site at Monterey Bay, California, a machine learning framework was trained to replicate numerically simulated wave conditions at a fraction of the typical computational cost. Specifically, the physics-based Simulating WAves Nearshore (SWAN) model, driven by measured wave conditions, nowcast ocean currents, and wind data, was used to generate training data for machine learning algorithms. The model was run between April 1st, 2013 and May 31st, 2017 generating forecasts at three-hour intervals yielding 11,078 distinct model outputs. SWAN-generated fields of 3,104 wave heights and a characteristic period could be replicated through simple matrix multiplications using the mapping matrices from machine learning algorithms. In fact, wave-height RMSEs from the machine learning algorithms (9 cm) were less than those for the SWAN model-verification exercise where those simulations were compared to buoy wave data within the model domain (>40 cm). The validated machine learning approach, which acts as an accurate surrogate for the SWAN model, can now be used to perform real-time forecasts of wave conditions for the next 48 hours using available forecasted boundary wave conditions, ocean currents, and winds. This solution has obvious applications to wave-energy generation as accurate wave conditions can be forecasted with over a three-order-of-magnitude reduction in

  3. Comparison between extreme learning machine and wavelet neural networks in data classification

    NASA Astrophysics Data System (ADS)

    Yahia, Siwar; Said, Salwa; Jemai, Olfa; Zaied, Mourad; Ben Amar, Chokri

    2017-03-01

    Extreme learning Machine is a well known learning algorithm in the field of machine learning. It's about a feed forward neural network with a single-hidden layer. It is an extremely fast learning algorithm with good generalization performance. In this paper, we aim to compare the Extreme learning Machine with wavelet neural networks, which is a very used algorithm. We have used six benchmark data sets to evaluate each technique. These datasets Including Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition and Iris Plant. Experimental results have shown that both extreme learning machine and wavelet neural networks have reached good results.

  4. Ten quick tips for machine learning in computational biology.

    PubMed

    Chicco, Davide

    2017-01-01

    Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices, that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects. We believe our ten suggestions can strongly help any machine learning practitioner to carry on a successful project in computational biology and related sciences.

  5. Newton Methods for Large Scale Problems in Machine Learning

    ERIC Educational Resources Information Center

    Hansen, Samantha Leigh

    2014-01-01

    The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…

  6. An active role for machine learning in drug development

    PubMed Central

    Murphy, Robert F.

    2014-01-01

    Due to the complexity of biological systems, cutting-edge machine-learning methods will be critical for future drug development. In particular, machine-vision methods to extract detailed information from imaging assays and active-learning methods to guide experimentation will be required to overcome the dimensionality problem in drug development. PMID:21587249

  7. Machine learning-based dual-energy CT parametric mapping

    NASA Astrophysics Data System (ADS)

    Su, Kuan-Hao; Kuo, Jung-Wen; Jordan, David W.; Van Hedent, Steven; Klahr, Paul; Wei, Zhouping; Helo, Rose Al; Liang, Fan; Qian, Pengjiang; Pereira, Gisele C.; Rassouli, Negin; Gilkeson, Robert C.; Traughber, Bryan J.; Cheng, Chee-Wai; Muzic, Raymond F., Jr.

    2018-06-01

    The aim is to develop and evaluate machine learning methods for generating quantitative parametric maps of effective atomic number (Zeff), relative electron density (ρ e), mean excitation energy (I x ), and relative stopping power (RSP) from clinical dual-energy CT data. The maps could be used for material identification and radiation dose calculation. Machine learning methods of historical centroid (HC), random forest (RF), and artificial neural networks (ANN) were used to learn the relationship between dual-energy CT input data and ideal output parametric maps calculated for phantoms from the known compositions of 13 tissue substitutes. After training and model selection steps, the machine learning predictors were used to generate parametric maps from independent phantom and patient input data. Precision and accuracy were evaluated using the ideal maps. This process was repeated for a range of exposure doses, and performance was compared to that of the clinically-used dual-energy, physics-based method which served as the reference. The machine learning methods generated more accurate and precise parametric maps than those obtained using the reference method. Their performance advantage was particularly evident when using data from the lowest exposure, one-fifth of a typical clinical abdomen CT acquisition. The RF method achieved the greatest accuracy. In comparison, the ANN method was only 1% less accurate but had much better computational efficiency than RF, being able to produce parametric maps in 15 s. Machine learning methods outperformed the reference method in terms of accuracy and noise tolerance when generating parametric maps, encouraging further exploration of the techniques. Among the methods we evaluated, ANN is the most suitable for clinical use due to its combination of accuracy, excellent low-noise performance, and computational efficiency.

  8. Machine learning-based dual-energy CT parametric mapping.

    PubMed

    Su, Kuan-Hao; Kuo, Jung-Wen; Jordan, David W; Van Hedent, Steven; Klahr, Paul; Wei, Zhouping; Al Helo, Rose; Liang, Fan; Qian, Pengjiang; Pereira, Gisele C; Rassouli, Negin; Gilkeson, Robert C; Traughber, Bryan J; Cheng, Chee-Wai; Muzic, Raymond F

    2018-06-08

    The aim is to develop and evaluate machine learning methods for generating quantitative parametric maps of effective atomic number (Z eff ), relative electron density (ρ e ), mean excitation energy (I x ), and relative stopping power (RSP) from clinical dual-energy CT data. The maps could be used for material identification and radiation dose calculation. Machine learning methods of historical centroid (HC), random forest (RF), and artificial neural networks (ANN) were used to learn the relationship between dual-energy CT input data and ideal output parametric maps calculated for phantoms from the known compositions of 13 tissue substitutes. After training and model selection steps, the machine learning predictors were used to generate parametric maps from independent phantom and patient input data. Precision and accuracy were evaluated using the ideal maps. This process was repeated for a range of exposure doses, and performance was compared to that of the clinically-used dual-energy, physics-based method which served as the reference. The machine learning methods generated more accurate and precise parametric maps than those obtained using the reference method. Their performance advantage was particularly evident when using data from the lowest exposure, one-fifth of a typical clinical abdomen CT acquisition. The RF method achieved the greatest accuracy. In comparison, the ANN method was only 1% less accurate but had much better computational efficiency than RF, being able to produce parametric maps in 15 s. Machine learning methods outperformed the reference method in terms of accuracy and noise tolerance when generating parametric maps, encouraging further exploration of the techniques. Among the methods we evaluated, ANN is the most suitable for clinical use due to its combination of accuracy, excellent low-noise performance, and computational efficiency.

  9. BELM: Bayesian extreme learning machine.

    PubMed

    Soria-Olivas, Emilio; Gómez-Sanchis, Juan; Martín, José D; Vila-Francés, Joan; Martínez, Marcelino; Magdalena, José R; Serrano, Antonio J

    2011-03-01

    The theory of extreme learning machine (ELM) has become very popular on the last few years. ELM is a new approach for learning the parameters of the hidden layers of a multilayer neural network (as the multilayer perceptron or the radial basis function neural network). Its main advantage is the lower computational cost, which is especially relevant when dealing with many patterns defined in a high-dimensional space. This brief proposes a bayesian approach to ELM, which presents some advantages over other approaches: it allows the introduction of a priori knowledge; obtains the confidence intervals (CIs) without the need of applying methods that are computationally intensive, e.g., bootstrap; and presents high generalization capabilities. Bayesian ELM is benchmarked against classical ELM in several artificial and real datasets that are widely used for the evaluation of machine learning algorithms. Achieved results show that the proposed approach produces a competitive accuracy with some additional advantages, namely, automatic production of CIs, reduction of probability of model overfitting, and use of a priori knowledge.

  10. Osteoporosis risk prediction using machine learning and conventional methods.

    PubMed

    Kim, Sung Kean; Yoo, Tae Keun; Oh, Ein; Kim, Deok Won

    2013-01-01

    A number of clinical decision tools for osteoporosis risk assessment have been developed to select postmenopausal women for the measurement of bone mineral density. We developed and validated machine learning models with the aim of more accurately identifying the risk of osteoporosis in postmenopausal women, and compared with the ability of a conventional clinical decision tool, osteoporosis self-assessment tool (OST). We collected medical records from Korean postmenopausal women based on the Korea National Health and Nutrition Surveys (KNHANES V-1). The training data set was used to construct models based on popular machine learning algorithms such as support vector machines (SVM), random forests (RF), artificial neural networks (ANN), and logistic regression (LR) based on various predictors associated with low bone density. The learning models were compared with OST. SVM had significantly better area under the curve (AUC) of the receiver operating characteristic (ROC) than ANN, LR, and OST. Validation on the test set showed that SVM predicted osteoporosis risk with an AUC of 0.827, accuracy of 76.7%, sensitivity of 77.8%, and specificity of 76.0%. We were the first to perform comparisons of the performance of osteoporosis prediction between the machine learning and conventional methods using population-based epidemiological data. The machine learning methods may be effective tools for identifying postmenopausal women at high risk for osteoporosis.

  11. Bidirectional extreme learning machine for regression problem and its learning effectiveness.

    PubMed

    Yang, Yimin; Wang, Yaonan; Yuan, Xiaofang

    2012-09-01

    It is clear that the learning effectiveness and learning speed of neural networks are in general far slower than required, which has been a major bottleneck for many applications. Recently, a simple and efficient learning method, referred to as extreme learning machine (ELM), was proposed by Huang , which has shown that, compared to some conventional methods, the training time of neural networks can be reduced by a thousand times. However, one of the open problems in ELM research is whether the number of hidden nodes can be further reduced without affecting learning effectiveness. This brief proposes a new learning algorithm, called bidirectional extreme learning machine (B-ELM), in which some hidden nodes are not randomly selected. In theory, this algorithm tends to reduce network output error to 0 at an extremely early learning stage. Furthermore, we find a relationship between the network output error and the network output weights in the proposed B-ELM. Simulation results demonstrate that the proposed method can be tens to hundreds of times faster than other incremental ELM algorithms.

  12. Feasibility of Active Machine Learning for Multiclass Compound Classification.

    PubMed

    Lang, Tobias; Flachsenberg, Florian; von Luxburg, Ulrike; Rarey, Matthias

    2016-01-25

    A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.

  13. Recent developments in machine learning applications in landslide susceptibility mapping

    NASA Astrophysics Data System (ADS)

    Lun, Na Kai; Liew, Mohd Shahir; Matori, Abdul Nasir; Zawawi, Noor Amila Wan Abdullah

    2017-11-01

    While the prediction of spatial distribution of potential landslide occurrences is a primary interest in landslide hazard mitigation, it remains a challenging task. To overcome the scarceness of complete, sufficiently detailed geomorphological attributes and environmental conditions, various machine-learning techniques are increasingly applied to effectively map landslide susceptibility for large regions. Nevertheless, limited review papers are devoted to this field, particularly on the various domain specific applications of machine learning techniques. Available literature often report relatively good predictive performance, however, papers discussing the limitations of each approaches are quite uncommon. The foremost aim of this paper is to narrow these gaps in literature and to review up-to-date machine learning and ensemble learning techniques applied in landslide susceptibility mapping. It provides new readers an introductory understanding on the subject matter and researchers a contemporary review of machine learning advancements alongside the future direction of these techniques in the landslide mitigation field.

  14. Application of machine learning methods in bioinformatics

    NASA Astrophysics Data System (ADS)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  15. What is the machine learning?

    NASA Astrophysics Data System (ADS)

    Chang, Spencer; Cohen, Timothy; Ostdiek, Bryan

    2018-03-01

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables—aided by physical intuition—that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus nonlinear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.

  16. Machine learning for micro-tomography

    NASA Astrophysics Data System (ADS)

    Parkinson, Dilworth Y.; Pelt, Daniël. M.; Perciano, Talita; Ushizima, Daniela; Krishnan, Harinarayan; Barnard, Harold S.; MacDowell, Alastair A.; Sethian, James

    2017-09-01

    Machine learning has revolutionized a number of fields, but many micro-tomography users have never used it for their work. The micro-tomography beamline at the Advanced Light Source (ALS), in collaboration with the Center for Applied Mathematics for Energy Research Applications (CAMERA) at Lawrence Berkeley National Laboratory, has now deployed a series of tools to automate data processing for ALS users using machine learning. This includes new reconstruction algorithms, feature extraction tools, and image classification and recommen- dation systems for scientific image. Some of these tools are either in automated pipelines that operate on data as it is collected or as stand-alone software. Others are deployed on computing resources at Berkeley Lab-from workstations to supercomputers-and made accessible to users through either scripting or easy-to-use graphical interfaces. This paper presents a progress report on this work.

  17. A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors.

    PubMed

    Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei

    2017-09-21

    In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors.

  18. A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors

    PubMed Central

    Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei

    2017-01-01

    In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors. PMID:28934163

  19. Machine learning phases of matter

    NASA Astrophysics Data System (ADS)

    Carrasquilla, Juan; Stoudenmire, Miles; Melko, Roger

    We show how the technology that allows automatic teller machines read hand-written digits in cheques can be used to encode and recognize phases of matter and phase transitions in many-body systems. In particular, we analyze the (quasi-)order-disorder transitions in the classical Ising and XY models. Furthermore, we successfully use machine learning to study classical Z2 gauge theories that have important technological application in the coming wave of quantum information technologies and whose phase transitions have no conventional order parameter.

  20. Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare.

    PubMed

    Mozaffari-Kermani, Mehran; Sur-Kolay, Susmita; Raghunathan, Anand; Jha, Niraj K

    2015-11-01

    Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, and thus, any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. Hindrance of a diagnosis may have life-threatening consequences and could cause distrust. On the other hand, not only may a false diagnosis prompt users to distrust the machine-learning algorithm and even abandon the entire system but also such a false positive classification may cause patient distress. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data, which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increase the likelihood of classification into a specific class), or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at a lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks that are based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness.

  1. Informatics and machine learning to define the phenotype.

    PubMed

    Basile, Anna Okula; Ritchie, Marylyn DeRiggi

    2018-03-01

    For the past decade, the focus of complex disease research has been the genotype. From technological advancements to the development of analysis methods, great progress has been made. However, advances in our definition of the phenotype have remained stagnant. Phenotype characterization has recently emerged as an exciting area of informatics and machine learning. The copious amounts of diverse biomedical data that have been collected may be leveraged with data-driven approaches to elucidate trait-related features and patterns. Areas covered: In this review, the authors discuss the phenotype in traditional genetic associations and the challenges this has imposed.Approaches for phenotype refinement that can aid in more accurate characterization of traits are also discussed. Further, the authors highlight promising machine learning approaches for establishing a phenotype and the challenges of electronic health record (EHR)-derived data. Expert commentary: The authors hypothesize that through unsupervised machine learning, data-driven approaches can be used to define phenotypes rather than relying on expert clinician knowledge. Through the use of machine learning and an unbiased set of features extracted from clinical repositories, researchers will have the potential to further understand complex traits and identify patient subgroups. This knowledge may lead to more preventative and precise clinical care.

  2. Learning Activity Packets for Milling Machines. Unit II--Horizontal Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to set up and operate a horizontal mill. Tasks addressed in the LAP include mounting style "A" or "B" arbors and adjusting arbor…

  3. Machine Learning in the Big Data Era: Are We There Yet?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sukumar, Sreenivas Rangan

    In this paper, we discuss the machine learning challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are machine learning algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstandingmore » challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across domains of national security and healthcare to suggest our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data-retrieval); (ii) the science of data challenge the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.« less

  4. Learning Machine, Vietnamese Based Human-Computer Interface.

    ERIC Educational Resources Information Center

    Northwest Regional Educational Lab., Portland, OR.

    The sixth session of IT@EDU98 consisted of seven papers on the topic of the learning machine--Vietnamese based human-computer interface, and was chaired by Phan Viet Hoang (Informatics College, Singapore). "Knowledge Based Approach for English Vietnamese Machine Translation" (Hoang Kiem, Dinh Dien) presents the knowledge base approach,…

  5. A distributed algorithm for machine learning

    NASA Astrophysics Data System (ADS)

    Chen, Shihong

    2018-04-01

    This paper considers a distributed learning problem in which a group of machines in a connected network, each learning its own local dataset, aim to reach a consensus at an optimal model, by exchanging information only with their neighbors but without transmitting data. A distributed algorithm is proposed to solve this problem under appropriate assumptions.

  6. Correct machine learning on protein sequences: a peer-reviewing perspective.

    PubMed

    Walsh, Ian; Pollastri, Gianluca; Tosatto, Silvio C E

    2016-09-01

    Machine learning methods are becoming increasingly popular to predict protein features from sequences. Machine learning in bioinformatics can be powerful but carries also the risk of introducing unexpected biases, which may lead to an overestimation of the performance. This article espouses a set of guidelines to allow both peer reviewers and authors to avoid common machine learning pitfalls. Understanding biology is necessary to produce useful data sets, which have to be large and diverse. Separating the training and test process is imperative to avoid over-selling method performance, which is also dependent on several hidden parameters. A novel predictor has always to be compared with several existing methods, including simple baseline strategies. Using the presented guidelines will help nonspecialists to appreciate the critical issues in machine learning. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  7. Learning Extended Finite State Machines

    NASA Technical Reports Server (NTRS)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  8. Generative Modeling for Machine Learning on the D-Wave

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thulasidasan, Sunil

    These are slides on Generative Modeling for Machine Learning on the D-Wave. The following topics are detailed: generative models; Boltzmann machines: a generative model; restricted Boltzmann machines; learning parameters: RBM training; practical ways to train RBM; D-Wave as a Boltzmann sampler; mapping RBM onto the D-Wave; Chimera restricted RBM; mapping binary RBM to Ising model; experiments; data; D-Wave effective temperature, parameters noise, etc.; experiments: contrastive divergence (CD) 1 step; after 50 steps of CD; after 100 steps of CD; D-Wave (experiments 1, 2, 3); D-Wave observations.

  9. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    PubMed

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.

  10. A comparative analysis of support vector machines and extreme learning machines.

    PubMed

    Liu, Xueyi; Gao, Chuanhou; Li, Ping

    2012-09-01

    The theory of extreme learning machines (ELMs) has recently become increasingly popular. As a new learning algorithm for single-hidden-layer feed-forward neural networks, an ELM offers the advantages of low computational cost, good generalization ability, and ease of implementation. Hence the comparison and model selection between ELMs and other kinds of state-of-the-art machine learning approaches has become significant and has attracted many research efforts. This paper performs a comparative analysis of the basic ELMs and support vector machines (SVMs) from two viewpoints that are different from previous works: one is the Vapnik-Chervonenkis (VC) dimension, and the other is their performance under different training sample sizes. It is shown that the VC dimension of an ELM is equal to the number of hidden nodes of the ELM with probability one. Additionally, their generalization ability and computational complexity are exhibited with changing training sample size. ELMs have weaker generalization ability than SVMs for small sample but can generalize as well as SVMs for large sample. Remarkably, great superiority in computational speed especially for large-scale sample problems is found in ELMs. The results obtained can provide insight into the essential relationship between them, and can also serve as complementary knowledge for their past experimental and theoretical comparisons. Copyright © 2012 Elsevier Ltd. All rights reserved.

  11. A Developmental Approach to Machine Learning?

    PubMed Central

    Smith, Linda B.; Slone, Lauren K.

    2017-01-01

    Visual learning depends on both the algorithms and the training material. This essay considers the natural statistics of infant- and toddler-egocentric vision. These natural training sets for human visual object recognition are very different from the training data fed into machine vision systems. Rather than equal experiences with all kinds of things, toddlers experience extremely skewed distributions with many repeated occurrences of a very few things. And though highly variable when considered as a whole, individual views of things are experienced in a specific order – with slow, smooth visual changes moment-to-moment, and developmentally ordered transitions in scene content. We propose that the skewed, ordered, biased visual experiences of infants and toddlers are the training data that allow human learners to develop a way to recognize everything, both the pervasively present entities and the rarely encountered ones. The joint consideration of real-world statistics for learning by researchers of human and machine learning seems likely to bring advances in both disciplines. PMID:29259573

  12. Evaluation of an Integrated Multi-Task Machine Learning System with Humans in the Loop

    DTIC Science & Technology

    2007-01-01

    machine learning components natural language processing, and optimization...was examined with a test explicitly developed to measure the impact of integrated machine learning when used by a human user in a real world setting...study revealed that integrated machine learning does produce a positive impact on overall performance. This paper also discusses how specific machine learning components contributed to human-system

  13. Application of Machine Learning to Rotorcraft Health Monitoring

    NASA Technical Reports Server (NTRS)

    Cody, Tyler; Dempsey, Paula J.

    2017-01-01

    Machine learning is a powerful tool for data exploration and model building with large data sets. This project aimed to use machine learning techniques to explore the inherent structure of data from rotorcraft gear tests, relationships between features and damage states, and to build a system for predicting gear health for future rotorcraft transmission applications. Classical machine learning techniques are difficult, if not irresponsible to apply to time series data because many make the assumption of independence between samples. To overcome this, Hidden Markov Models were used to create a binary classifier for identifying scuffing transitions and Recurrent Neural Networks were used to leverage long distance relationships in predicting discrete damage states. When combined in a workflow, where the binary classifier acted as a filter for the fatigue monitor, the system was able to demonstrate accuracy in damage state prediction and scuffing identification. The time dependent nature of the data restricted data exploration to collecting and analyzing data from the model selection process. The limited amount of available data was unable to give useful information, and the division of training and testing sets tended to heavily influence the scores of the models across combinations of features and hyper-parameters. This work built a framework for tracking scuffing and fatigue on streaming data and demonstrates that machine learning has much to offer rotorcraft health monitoring by using Bayesian learning and deep learning methods to capture the time dependent nature of the data. Suggested future work is to implement the framework developed in this project using a larger variety of data sets to test the generalization capabilities of the models and allow for data exploration.

  14. RG-inspired machine learning for lattice field theory

    NASA Astrophysics Data System (ADS)

    Foreman, Sam; Giedt, Joel; Meurice, Yannick; Unmuth-Yockey, Judah

    2018-03-01

    Machine learning has been a fast growing field of research in several areas dealing with large datasets. We report recent attempts to use renormalization group (RG) ideas in the context of machine learning. We examine coarse graining procedures for perceptron models designed to identify the digits of the MNIST data. We discuss the correspondence between principal components analysis (PCA) and RG flows across the transition for worm configurations of the 2D Ising model. Preliminary results regarding the logarithmic divergence of the leading PCA eigenvalue were presented at the conference. More generally, we discuss the relationship between PCA and observables in Monte Carlo simulations and the possibility of reducing the number of learning parameters in supervised learning based on RG inspired hierarchical ansatzes.

  15. Testing and Validating Machine Learning Classifiers by Metamorphic Testing☆

    PubMed Central

    Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2011-01-01

    Machine Learning algorithms have provided core functionality to many application domains - such as bioinformatics, computational linguistics, etc. However, it is difficult to detect faults in such applications because often there is no “test oracle” to verify the correctness of the computed outputs. To help address the software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms which support such applications. Our approach is based on the technique “metamorphic testing”, which has been shown to be effective to alleviate the oracle problem. Also presented include a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method has high effectiveness in killing mutants, and that observing expected cross-validation result alone is not sufficiently effective to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program. PMID:21532969

  16. Energy-free machine learning force field for aluminum.

    PubMed

    Kruglov, Ivan; Sergeev, Oleg; Yanilkin, Alexey; Oganov, Artem R

    2017-08-17

    We used the machine learning technique of Li et al. (PRL 114, 2015) for molecular dynamics simulations. Atomic configurations were described by feature matrix based on internal vectors, and linear regression was used as a learning technique. We implemented this approach in the LAMMPS code. The method was applied to crystalline and liquid aluminum and uranium at different temperatures and densities, and showed the highest accuracy among different published potentials. Phonon density of states, entropy and melting temperature of aluminum were calculated using this machine learning potential. The results are in excellent agreement with experimental data and results of full ab initio calculations.

  17. The application of machine learning techniques in the clinical drug therapy.

    PubMed

    Meng, Huan-Yu; Jin, Wan-Lin; Yan, Cheng-Kai; Yang, Huan

    2018-05-25

    The development of a novel drug is an extremely complicated process that includes the target identification, design and manufacture, and proper therapy of the novel drug, as well as drug dose selection, drug efficacy evaluation, and adverse drug reaction control. Due to the limited resources, high costs, long duration, and low hit-to-lead ratio in the development of pharmacogenetics and computer technology, machine learning techniques have assisted novel drug development and have gradually received more attention by researchers. According to current research, machine learning techniques are widely applied in the process of the discovery of new drugs and novel drug targets, the decision surrounding proper therapy and drug dose, and the prediction of drug efficacy and adverse drug reactions. In this article, we discussed the history, workflow, and advantages and disadvantages of machine learning techniques in the processes mentioned above. Although the advantages of machine learning techniques are fairly obvious, the application of machine learning techniques is currently limited. With further research, the application of machine techniques in drug development could be much more widespread and could potentially be one of the major methods used in drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Machine-Learning Algorithms to Code Public Health Spending Accounts

    PubMed Central

    Leider, Jonathon P.; Resnick, Beth A.; Alfonso, Y. Natalia; Bishai, David

    2017-01-01

    Objectives: Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. Methods: We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Results: Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Conclusions: Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence

  19. Machine-Learning Algorithms to Code Public Health Spending Accounts.

    PubMed

    Brady, Eoghan S; Leider, Jonathon P; Resnick, Beth A; Alfonso, Y Natalia; Bishai, David

    Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.

  20. Machine learning strategies for systems with invariance properties

    NASA Astrophysics Data System (ADS)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.

  1. Machine learning strategies for systems with invariance properties

    DOE PAGES

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    2016-05-06

    Here, in many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neuralmore » networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first , a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.« less

  2. Machine learning strategies for systems with invariance properties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    Here, in many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neuralmore » networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first , a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.« less

  3. Machine Learning in Intrusion Detection

    DTIC Science & Technology

    2005-07-01

    machine learning tasks. Anomaly detection provides the core technology for a broad spectrum of security-centric applications. In this dissertation, we examine various aspects of anomaly based intrusion detection in computer security. First, we present a new approach to learn program behavior for intrusion detection. Text categorization techniques are adopted to convert each process to a vector and calculate the similarity between two program activities. Then the k-nearest neighbor classifier is employed to classify program behavior as normal or intrusive. We demonstrate

  4. Machine Learning Techniques for Stellar Light Curve Classification

    NASA Astrophysics Data System (ADS)

    Hinners, Trisha A.; Tat, Kevin; Thorp, Rachel

    2018-07-01

    We apply machine learning techniques in an attempt to predict and classify stellar properties from noisy and sparse time-series data. We preprocessed over 94 GB of Kepler light curves from the Mikulski Archive for Space Telescopes (MAST) to classify according to 10 distinct physical properties using both representation learning and feature engineering approaches. Studies using machine learning in the field have been primarily done on simulated data, making our study one of the first to use real light-curve data for machine learning approaches. We tuned our data using previous work with simulated data as a template and achieved mixed results between the two approaches. Representation learning using a long short-term memory recurrent neural network produced no successful predictions, but our work with feature engineering was successful for both classification and regression. In particular, we were able to achieve values for stellar density, stellar radius, and effective temperature with low error (∼2%–4%) and good accuracy (∼75%) for classifying the number of transits for a given star. The results show promise for improvement for both approaches upon using larger data sets with a larger minority class. This work has the potential to provide a foundation for future tools and techniques to aid in the analysis of astrophysical data.

  5. General Machine Learning Model, Review, and Experimental-Theoretic Study of Magnolol Activity in Enterotoxigenic Induced Oxidative Stress.

    PubMed

    Deng, Yanli; Liu, Yong; Tang, Shaoxun; Zhou, Chuanshe; Han, Xuefeng; Xiao, Wenjun; Pastur-Romay, Lucas Anton; Vazquez-Naya, Jose Manuel; Loureiro, Javier Pereira; Munteanu, Cristian R; Tan, Zhiliang

    2017-01-01

    This study evaluated the antioxidative effects of magnolol based on the mouse model induced by Enterotoxigenic Escherichia coli (E. coli, ETEC). All experimental mice were equally treated with ETEC suspensions (3.45×109 CFU/ml) after oral administration of magnolol for 7 days at the dose of 0, 100, 300 and 500 mg/kg Body Weight (BW), respectively. The oxidative metabolites and antioxidases for each sample (organism of mouse) were determined: Malondialdehyde (MDA), Nitric Oxide (NO), Glutathione (GSH), Myeloperoxidase (MPO), Catalase (CAT), Superoxide Dismutase (SOD), and Glutathione Peroxidase (GPx). In addition, we also determined the corresponding mRNA expressions of CAT, SOD and GPx as well as the Total Antioxidant Capacity (T-AOC). The experiment was completed with a theoretical study that predicts a series of 79 ChEMBL activities of magnolol with 47 proteins in 18 organisms using a Quantitative Structure- Activity Relationship (QSAR) classifier based on the Moving Averages (MAs) of Rcpi descriptors in three types of experimental conditions (biological activity with specific units, protein target and organisms). Six Machine Learning methods from Weka software were tested and the best QSAR classification model was provided by Random Forest with True Positive Rate (TPR) of 0.701 and Area under Receiver Operating Characteristic (AUROC) of 0.790 (test subset, 10-fold crossvalidation). The model is predicting if the new ChEMBL activities are greater or lower than the average values for the magnolol targets in different organisms. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  6. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

    ERIC Educational Resources Information Center

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2015-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead…

  7. A comparison of machine learning and Bayesian modelling for molecular serotyping.

    PubMed

    Newton, Richard; Wernisch, Lorenz

    2017-08-11

    Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only few samples available, a model driven approach was the only option. In the meanwhile, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. We compare the performance of the original Bayesian model with two machine learning algorithms: Gradient Boosting Machines and Random Forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes; due to the large number of possible combinations. Most of the available training data comprises samples with only a single serotype. To overcome the lack of training data we implemented an iterative analysis, creating artificial training data of serotype mixtures by combining raw data from single serotype arrays. With the enhanced training set the machine learning algorithms out perform the original Bayesian model. However, for serotypes currently lacking sufficient training data the best performing implementation was a combination of the results of the Bayesian Model and the Gradient Boosting Machine. As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological

  8. Exploring the Function Space of Deep-Learning Machines

    NASA Astrophysics Data System (ADS)

    Li, Bo; Saad, David

    2018-06-01

    The function space of deep-learning machines is investigated by studying growth in the entropy of functions of a given error with respect to a reference function, realized by a deep-learning machine. Using physics-inspired methods we study both sparsely and densely connected architectures to discover a layerwise convergence of candidate functions, marked by a corresponding reduction in entropy when approaching the reference function, gain insight into the importance of having a large number of layers, and observe phase transitions as the error increases.

  9. ML-o-Scope: A Diagnostic Visualization System for Deep Machine Learning Pipelines

    DTIC Science & Technology

    2014-05-16

    ML-o-scope: a diagnostic visualization system for deep machine learning pipelines Daniel Bruckner Electrical Engineering and Computer Sciences... machine learning pipelines 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f...the system as a support for tuning large scale object-classification pipelines. 1 Introduction A new generation of pipelined machine learning models

  10. ClearTK 2.0: Design Patterns for Machine Learning in UIMA.

    PubMed

    Bethard, Steven; Ogren, Philip; Becker, Lee

    2014-05-01

    ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, and utilities for applying and evaluating machine learning models. Since its inception in 2008, ClearTK has evolved in response to feedback from developers and the community. This evolution has followed a number of important design principles including: conceptually simple annotator interfaces, readable pipeline descriptions, minimal collection readers, type system agnostic code, modules organized for ease of import, and assisting user comprehension of the complex UIMA framework.

  11. Machine learning approaches in medical image analysis: From detection to diagnosis.

    PubMed

    de Bruijne, Marleen

    2016-10-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols, learning from weak labels, and interpretation and evaluation of results. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction.

    PubMed

    Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel

    2010-01-15

    With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.

  13. Quantum machine learning for quantum anomaly detection

    NASA Astrophysics Data System (ADS)

    Liu, Nana; Rebentrost, Patrick

    2018-04-01

    Anomaly detection is used for identifying data that deviate from "normal" data patterns. Its usage on classical data finds diverse applications in many important areas such as finance, fraud detection, medical diagnoses, data cleaning, and surveillance. With the advent of quantum technologies, anomaly detection of quantum data, in the form of quantum states, may become an important component of quantum applications. Machine-learning algorithms are playing pivotal roles in anomaly detection using classical data. Two widely used algorithms are the kernel principal component analysis and the one-class support vector machine. We find corresponding quantum algorithms to detect anomalies in quantum states. We show that these two quantum algorithms can be performed using resources that are logarithmic in the dimensionality of quantum states. For pure quantum states, these resources can also be logarithmic in the number of quantum states used for training the machine-learning algorithm. This makes these algorithms potentially applicable to big quantum data applications.

  14. Predicting the dissolution kinetics of silicate glasses using machine learning

    NASA Astrophysics Data System (ADS)

    Anoop Krishnan, N. M.; Mangalathu, Sujith; Smedskjaer, Morten M.; Tandia, Adama; Burton, Henry; Bauchy, Mathieu

    2018-05-01

    Predicting the dissolution rates of silicate glasses in aqueous conditions is a complex task as the underlying mechanism(s) remain poorly understood and the dissolution kinetics can depend on a large number of intrinsic and extrinsic factors. Here, we assess the potential of data-driven models based on machine learning to predict the dissolution rates of various aluminosilicate glasses exposed to a wide range of solution pH values, from acidic to caustic conditions. Four classes of machine learning methods are investigated, namely, linear regression, support vector machine regression, random forest, and artificial neural network. We observe that, although linear methods all fail to describe the dissolution kinetics, the artificial neural network approach offers excellent predictions, thanks to its inherent ability to handle non-linear data. Overall, we suggest that a more extensive use of machine learning approaches could significantly accelerate the design of novel glasses with tailored properties.

  15. Novel jet observables from machine learning

    NASA Astrophysics Data System (ADS)

    Datta, Kaustuv; Larkoski, Andrew J.

    2018-03-01

    Previous studies have demonstrated the utility and applicability of machine learning techniques to jet physics. In this paper, we construct new observables for the discrimination of jets from different originating particles exclusively from information identified by the machine. The approach we propose is to first organize information in the jet by resolved phase space and determine the effective N -body phase space at which discrimination power saturates. This then allows for the construction of a discrimination observable from the N -body phase space coordinates. A general form of this observable can be expressed with numerous parameters that are chosen so that the observable maximizes the signal vs. background likelihood. Here, we illustrate this technique applied to discrimination of H\\to b\\overline{b} decays from massive g\\to b\\overline{b} splittings. We show that for a simple parametrization, we can construct an observable that has discrimination power comparable to, or better than, widely-used observables motivated from theory considerations. For the case of jets on which modified mass-drop tagger grooming is applied, the observable that the machine learns is essentially the angle of the dominant gluon emission off of the b\\overline{b} pair.

  16. Building machines that learn and think like people.

    PubMed

    Lake, Brenden M; Ullman, Tomer D; Tenenbaum, Joshua B; Gershman, Samuel J

    2017-01-01

    Recent progress in artificial intelligence has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats that of humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (1) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.

  17. ClearTK 2.0: Design Patterns for Machine Learning in UIMA

    PubMed Central

    Bethard, Steven; Ogren, Philip; Becker, Lee

    2014-01-01

    ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, and utilities for applying and evaluating machine learning models. Since its inception in 2008, ClearTK has evolved in response to feedback from developers and the community. This evolution has followed a number of important design principles including: conceptually simple annotator interfaces, readable pipeline descriptions, minimal collection readers, type system agnostic code, modules organized for ease of import, and assisting user comprehension of the complex UIMA framework. PMID:29104966

  18. PMLB: a large benchmark suite for machine learning evaluation and comparison.

    PubMed

    Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

    2017-01-01

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.

  19. Hierarchical extreme learning machine based reinforcement learning for goal localization

    NASA Astrophysics Data System (ADS)

    AlDahoul, Nouar; Zaw Htike, Zaw; Akmeliawati, Rini

    2017-03-01

    The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones. Few regions need to be processed by the agent to reduce the computational effort and increase the speed of convergence. In this paper, reinforcement learning (RL) method was utilized to find optimal series of actions to localize the goal region. The visual data, a set of images, is high dimensional unstructured data and needs to be represented efficiently to get a robust detector. Different deep Reinforcement models have already been used to localize a goal but most of them take long time to learn the model. This long learning time results from the weights fine tuning stage that is applied iteratively to find an accurate model. Hierarchical Extreme Learning Machine (H-ELM) was used as a fast deep model that doesn’t fine tune the weights. In other words, hidden weights are generated randomly and output weights are calculated analytically. H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of Hierarchical Extreme learning machine and Reinforcement learning to find an optimal policy directly from visual input. This combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed by using MATLAB.

  20. Machine learning topological states

    NASA Astrophysics Data System (ADS)

    Deng, Dong-Ling; Li, Xiaopeng; Das Sarma, S.

    2017-11-01

    Artificial neural networks and machine learning have now reached a new era after several decades of improvement where applications are to explode in many fields of science, industry, and technology. Here, we use artificial neural networks to study an intriguing phenomenon in quantum physics—the topological phases of matter. We find that certain topological states, either symmetry-protected or with intrinsic topological order, can be represented with classical artificial neural networks. This is demonstrated by using three concrete spin systems, the one-dimensional (1D) symmetry-protected topological cluster state and the 2D and 3D toric code states with intrinsic topological orders. For all three cases, we show rigorously that the topological ground states can be represented by short-range neural networks in an exact and efficient fashion—the required number of hidden neurons is as small as the number of physical spins and the number of parameters scales only linearly with the system size. For the 2D toric-code model, we find that the proposed short-range neural networks can describe the excited states with Abelian anyons and their nontrivial mutual statistics as well. In addition, by using reinforcement learning we show that neural networks are capable of finding the topological ground states of nonintegrable Hamiltonians with strong interactions and studying their topological phase transitions. Our results demonstrate explicitly the exceptional power of neural networks in describing topological quantum states, and at the same time provide valuable guidance to machine learning of topological phases in generic lattice models.

  1. Geologic map of the Weka Dur gold deposit, Badakhshan Province, Afghanistan, modified from the 1967 original map compilation of M.P. Guguev and others

    USGS Publications Warehouse

    Peters, Stephen G.; Stettner, Will R.; Masonic, Linda M.

    2014-01-01

    The Weka Dur gold deposit lies in a cluster of other gold deposits in Badakhshan Province (Ragh district), such as the Kadar, Nesheb Dur, and Rishaw gold occurrences. These gold occurrences lie within a zone of late Hercynian folding and are most likely related to fluids that originated from orogenic processes. The Weka Dur deposit is the largest recorded gold occurrence in Afghanistan and is hosted in Proterozoic mica schist and amphibolite that is intruded by diabase dikes and other intrusive rocks. The tabular orebody is 350 meters (m) long and 2 m wide and can be traced downdip for 110 m. Mineralization consists of ochreous, brecciated schists containing high gold concentrations along gently and steeply dipping fissures. The brecciated rocks grade to 46.7 grams per ton (g/t) silver and contain arsenopyrite, galena, chalcopyrite, and scheelite. Trenches and adits were constructed, mapped, and sampled during the 1960s. Calculated resources are 958.3 kilograms of gold, averaging 4.1 g/t gold.

  2. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment

    PubMed Central

    2011-01-01

    Background Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. Conclusions AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of

  3. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment.

    PubMed

    Stålring, Jonna C; Carlsson, Lars A; Almeida, Pedro; Boyer, Scott

    2011-07-28

    Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of highly accurate QSAR models

  4. Splendidly blended: a machine learning set up for CDU control

    NASA Astrophysics Data System (ADS)

    Utzny, Clemens

    2017-06-01

    As the concepts of machine learning and artificial intelligence continue to grow in importance in the context of internet related applications it is still in its infancy when it comes to process control within the semiconductor industry. Especially the branch of mask manufacturing presents a challenge to the concepts of machine learning since the business process intrinsically induces pronounced product variability on the background of small plate numbers. In this paper we present the architectural set up of a machine learning algorithm which successfully deals with the demands and pitfalls of mask manufacturing. A detailed motivation of this basic set up followed by an analysis of its statistical properties is given. The machine learning set up for mask manufacturing involves two learning steps: an initial step which identifies and classifies the basic global CD patterns of a process. These results form the basis for the extraction of an optimized training set via balanced sampling. A second learning step uses this training set to obtain the local as well as global CD relationships induced by the manufacturing process. Using two production motivated examples we show how this approach is flexible and powerful enough to deal with the exacting demands of mask manufacturing. In one example we show how dedicated covariates can be used in conjunction with increased spatial resolution of the CD map model in order to deal with pathological CD effects at the mask boundary. The other example shows how the model set up enables strategies for dealing tool specific CD signature differences. In this case the balanced sampling enables a process control scheme which allows usage of the full tool park within the specified tight tolerance budget. Overall, this paper shows that the current rapid developments off the machine learning algorithms can be successfully used within the context of semiconductor manufacturing.

  5. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection.

    PubMed

    Zeng, Xueqiang; Luo, Gang

    2017-12-01

    Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.

  6. Machine learning in laboratory medicine: waiting for the flood?

    PubMed

    Cabitza, Federico; Banfi, Giuseppe

    2018-03-28

    This review focuses on machine learning and on how methods and models combining data analytics and artificial intelligence have been applied to laboratory medicine so far. Although still in its infancy, the potential for applying machine learning to laboratory data for both diagnostic and prognostic purposes deserves more attention by the readership of this journal, as well as by physician-scientists who will want to take advantage of this new computer-based support in pathology and laboratory medicine.

  7. Machine Learning for Biological Trajectory Classification Applications

    NASA Technical Reports Server (NTRS)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms axe compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.

  8. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    PubMed

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.

  9. Machine Learning with Distances

    DTIC Science & Technology

    2015-02-16

    of training class-wise densities p(x|y) to test input density p′(x). For the fitting of qπ to p ′, we may use the Kullback - Leibler (KL...Problems of Information Transmission, 23(9):95–101, 1987. [103] S. Kullback and R. A. Leibler . On information and sufficiency. The Annals of ...distributions. The Kullback - Leibler (KL) distance is the de-facto standard distance measure in statis- tics and machine learning, because

  10. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    PubMed

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information of the protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining the PPIs is essential for the study of the proteomics. In this review, in order to study the application of machine learning in predicting PPI, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and the examples of its applications in PPIs were listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPI. Many examples of success in identification and prediction in the area of PPI prediction have been discussed, and the PPIs research is still in progress. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  11. Learn about Physical Science: Simple Machines. [CD-ROM].

    ERIC Educational Resources Information Center

    2000

    This CD-ROM, designed for students in grades K-2, explores the world of simple machines. It allows students to delve into the mechanical world and learn the ways in which simple machines make work easier. Animated demonstrations are provided of the lever, pulley, wheel, screw, wedge, and inclined plane. Activities include practical matching and…

  12. Visible Machine Learning for Biomedicine.

    PubMed

    Yu, Michael K; Ma, Jianzhu; Fisher, Jasmin; Kreisberg, Jason F; Raphael, Benjamin J; Ideker, Trey

    2018-06-14

    A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology. Copyright © 2018. Published by Elsevier Inc.

  13. Relative optical navigation around small bodies via Extreme Learning Machine

    NASA Astrophysics Data System (ADS)

    Law, Andrew M.

    To perform close proximity operations under a low-gravity environment, relative and absolute positions are vital information to the maneuver. Hence navigation is inseparably integrated in space travel. Extreme Learning Machine (ELM) is presented as an optical navigation method around small celestial bodies. Optical Navigation uses visual observation instruments such as a camera to acquire useful data and determine spacecraft position. The required input data for operation is merely a single image strip and a nadir image. ELM is a machine learning Single Layer feed-Forward Network (SLFN), a type of neural network (NN). The algorithm is developed on the predicate that input weights and biases can be randomly assigned and does not require back-propagation. The learned model is the output layer weights which are used to calculate a prediction. Together, Extreme Learning Machine Optical Navigation (ELM OpNav) utilizes optical images and ELM algorithm to train the machine to navigate around a target body. In this thesis the asteroid, Vesta, is the designated celestial body. The trained ELMs estimate the position of the spacecraft during operation with a single data set. The results show the approach is promising and potentially suitable for on-board navigation.

  14. Machine learning derived risk prediction of anorexia nervosa.

    PubMed

    Guo, Yiran; Wei, Zhi; Keating, Brendan J; Hakonarson, Hakon

    2016-01-20

    Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict risk of diseases in which genetics play an important role. In this study, we collected whole genome genotyping data on 3940 AN cases and 9266 controls from the Genetic Consortium for Anorexia Nervosa (GCAN), the Wellcome Trust Case Control Consortium 3 (WTCCC3), Price Foundation Collaborative Group and the Children's Hospital of Philadelphia (CHOP), and applied machine learning methods for predicting AN disease risk. The prediction performance is measured by area under the receiver operating characteristic curve (AUC), indicating how well the model distinguishes cases from unaffected control subjects. Logistic regression model with the lasso penalty technique generated an AUC of 0.693, while Support Vector Machines and Gradient Boosted Trees reached AUC's of 0.691 and 0.623, respectively. Using different sample sizes, our results suggest that larger datasets are required to optimize the machine learning models and achieve higher AUC values. To our knowledge, this is the first attempt to assess AN risk based on genome wide genotype level data. Future integration of genomic, environmental and family-based information is likely to improve the AN risk evaluation process, eventually benefitting AN patients and families in the clinical setting.

  15. Creating Situational Awareness in Spacecraft Operations with the Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Li, Z.

    2016-09-01

    This paper presents a machine learning approach for the situational awareness capability in spacecraft operations. There are two types of time dependent data patterns for spacecraft datasets: the absolute time pattern (ATP) and the relative time pattern (RTP). The machine learning captures the data patterns of the satellite datasets through the data training during the normal operations, which is represented by its time dependent trend. The data monitoring compares the values of the incoming data with the predictions of machine learning algorithm, which can detect any meaningful changes to a dataset above the noise level. If the difference between the value of incoming telemetry and the machine learning prediction are larger than the threshold defined by the standard deviation of datasets, it could indicate the potential anomaly that may need special attention. The application of the machine-learning approach to the Advanced Himawari Imager (AHI) on Japanese Himawari spacecraft series is presented, which has the same configuration as the Advanced Baseline Imager (ABI) on Geostationary Environment Operational Satellite (GOES) R series. The time dependent trends generated by the data-training algorithm are in excellent agreement with the datasets. The standard deviation in the time dependent trend provides a metric for measuring the data quality, which is particularly useful in evaluating the detector quality for both AHI and ABI with multiple detectors in each channel. The machine-learning approach creates the situational awareness capability, and enables engineers to handle the huge data volume that would have been impossible with the existing approach, and it leads to significant advances to more dynamic, proactive, and autonomous spacecraft operations.

  16. Fifty years of computer analysis in chest imaging: rule-based, machine learning, deep learning.

    PubMed

    van Ginneken, Bram

    2017-03-01

    Half a century ago, the term "computer-aided diagnosis" (CAD) was introduced in the scientific literature. Pulmonary imaging, with chest radiography and computed tomography, has always been one of the focus areas in this field. In this study, I describe how machine learning became the dominant technology for tackling CAD in the lungs, generally producing better results than do classical rule-based approaches, and how the field is now rapidly changing: in the last few years, we have seen how even better results can be obtained with deep learning. The key differences among rule-based processing, machine learning, and deep learning are summarized and illustrated for various applications of CAD in the chest.

  17. Designing Contestability: Interaction Design, Machine Learning, and Mental Health

    PubMed Central

    Hirsch, Tad; Merced, Kritzia; Narayanan, Shrikanth; Imel, Zac E.; Atkins, David C.

    2017-01-01

    We describe the design of an automated assessment and training tool for psychotherapists to illustrate challenges with creating interactive machine learning (ML) systems, particularly in contexts where human life, livelihood, and wellbeing are at stake. We explore how existing theories of interaction design and machine learning apply to the psychotherapy context, and identify “contestability” as a new principle for designing systems that evaluate human behavior. Finally, we offer several strategies for making ML systems more accountable to human actors. PMID:28890949

  18. Classifying Black Hole States with Machine Learning

    NASA Astrophysics Data System (ADS)

    Huppenkothen, Daniela

    2018-01-01

    Galactic black hole binaries are known to go through different states with apparent signatures in both X-ray light curves and spectra, leading to important implications for accretion physics as well as our knowledge of General Relativity. Existing frameworks of classification are usually based on human interpretation of low-dimensional representations of the data, and generally only apply to fairly small data sets. Machine learning, in contrast, allows for rapid classification of large, high-dimensional data sets. In this talk, I will report on advances made in classification of states observed in Black Hole X-ray Binaries, focusing on the two sources GRS 1915+105 and Cygnus X-1, and show both the successes and limitations of using machine learning to derive physical constraints on these systems.

  19. Dimension Reduction With Extreme Learning Machine.

    PubMed

    Kasun, Liyanaarachchi Lekamalage Chamara; Yang, Yan; Huang, Guang-Bin; Zhang, Zhengyou

    2016-08-01

    Data may often contain noise or irrelevant information, which negatively affect the generalization capability of machine learning algorithms. The objective of dimension reduction algorithms, such as principal component analysis (PCA), non-negative matrix factorization (NMF), random projection (RP), and auto-encoder (AE), is to reduce the noise or irrelevant information of the data. The features of PCA (eigenvectors) and linear AE are not able to represent data as parts (e.g. nose in a face image). On the other hand, NMF and non-linear AE are maimed by slow learning speed and RP only represents a subspace of original data. This paper introduces a dimension reduction framework which to some extend represents data as parts, has fast learning speed, and learns the between-class scatter subspace. To this end, this paper investigates a linear and non-linear dimension reduction framework referred to as extreme learning machine AE (ELM-AE) and sparse ELM-AE (SELM-AE). In contrast to tied weight AE, the hidden neurons in ELM-AE and SELM-AE need not be tuned, and their parameters (e.g, input weights in additive neurons) are initialized using orthogonal and sparse random weights, respectively. Experimental results on USPS handwritten digit recognition data set, CIFAR-10 object recognition, and NORB object recognition data set show the efficacy of linear and non-linear ELM-AE and SELM-AE in terms of discriminative capability, sparsity, training time, and normalized mean square error.

  20. CHISSL: A Human-Machine Collaboration Space for Unsupervised Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arendt, Dustin L.; Komurlu, Caner; Blaha, Leslie M.

    We developed CHISSL, a human-machine interface that utilizes supervised machine learning in an unsupervised context to help the user group unlabeled instances by her own mental model. The user primarily interacts via correction (moving a misplaced instance into its correct group) or confirmation (accepting that an instance is placed in its correct group). Concurrent with the user's interactions, CHISSL trains a classification model guided by the user's grouping of the data. It then predicts the group of unlabeled instances and arranges some of these alongside the instances manually organized by the user. We hypothesize that this mode of human andmore » machine collaboration is more effective than Active Learning, wherein the machine decides for itself which instances should be labeled by the user. We found supporting evidence for this hypothesis in a pilot study where we applied CHISSL to organize a collection of handwritten digits.« less

  1. Machine learning in autistic spectrum disorder behavioral research: A review and ways forward.

    PubMed

    Thabtah, Fadi

    2018-02-13

    Autistic Spectrum Disorder (ASD) is a mental disorder that retards acquisition of linguistic, communication, cognitive, and social skills and abilities. Despite being diagnosed with ASD, some individuals exhibit outstanding scholastic, non-academic, and artistic capabilities, in such cases posing a challenging task for scientists to provide answers. In the last few years, ASD has been investigated by social and computational intelligence scientists utilizing advanced technologies such as machine learning to improve diagnostic timing, precision, and quality. Machine learning is a multidisciplinary research topic that employs intelligent techniques to discover useful concealed patterns, which are utilized in prediction to improve decision making. Machine learning techniques such as support vector machines, decision trees, logistic regressions, and others, have been applied to datasets related to autism in order to construct predictive models. These models claim to enhance the ability of clinicians to provide robust diagnoses and prognoses of ASD. However, studies concerning the use of machine learning in ASD diagnosis and treatment suffer from conceptual, implementation, and data issues such as the way diagnostic codes are used, the type of feature selection employed, the evaluation measures chosen, and class imbalances in data among others. A more serious claim in recent studies is the development of a new method for ASD diagnoses based on machine learning. This article critically analyses these recent investigative studies on autism, not only articulating the aforementioned issues in these studies but also recommending paths forward that enhance machine learning use in ASD with respect to conceptualization, implementation, and data. Future studies concerning machine learning in autism research are greatly benefitted by such proposals.

  2. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    NASA Astrophysics Data System (ADS)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interaction can be done by drug synergy score. It needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to realize requirement as mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.

  3. Predicting Market Impact Costs Using Nonparametric Machine Learning Models

    PubMed Central

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance. PMID:26926235

  4. Voice based gender classification using machine learning

    NASA Astrophysics Data System (ADS)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  5. Using machine learning for sequence-level automated MRI protocol selection in neuroradiology.

    PubMed

    Brown, Andrew D; Marotta, Thomas R

    2018-05-01

    Incorrect imaging protocol selection can lead to important clinical findings being missed, contributing to both wasted health care resources and patient harm. We present a machine learning method for analyzing the unstructured text of clinical indications and patient demographics from magnetic resonance imaging (MRI) orders to automatically protocol MRI procedures at the sequence level. We compared 3 machine learning models - support vector machine, gradient boosting machine, and random forest - to a baseline model that predicted the most common protocol for all observations in our test set. The gradient boosting machine model significantly outperformed the baseline and demonstrated the best performance of the 3 models in terms of accuracy (95%), precision (86%), recall (80%), and Hamming loss (0.0487). This demonstrates the feasibility of automating sequence selection by applying machine learning to MRI orders. Automated sequence selection has important safety, quality, and financial implications and may facilitate improvements in the quality and safety of medical imaging service delivery.

  6. Learning Activity Packets for Grinding Machines. Unit I--Grinding Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) is one of three that accompany the curriculum guide on grinding machines. It outlines the study activities and performance tasks for the first unit of this curriculum guide. Its purpose is to aid the student in attaining a working knowledge of this area of training and in achieving a skilled or moderately…

  7. In vitro molecular machine learning algorithm via symmetric internal loops of DNA.

    PubMed

    Lee, Ji-Hoon; Lee, Seung Hwan; Baek, Christina; Chun, Hyosun; Ryu, Je-Hwan; Kim, Jin-Woo; Deaton, Russell; Zhang, Byoung-Tak

    2017-08-01

    Programmable biomolecules, such as DNA strands, deoxyribozymes, and restriction enzymes, have been used to solve computational problems, construct large-scale logic circuits, and program simple molecular games. Although studies have shown the potential of molecular computing, the capability of computational learning with DNA molecules, i.e., molecular machine learning, has yet to be experimentally verified. Here, we present a novel molecular learning in vitro model in which symmetric internal loops of double-stranded DNA are exploited to measure the differences between training instances, thus enabling the molecules to learn from small errors. The model was evaluated on a data set of twenty dialogue sentences obtained from the television shows Friends and Prison Break. The wet DNA-computing experiments confirmed that the molecular learning machine was able to generalize the dialogue patterns of each show and successfully identify the show from which the sentences originated. The molecular machine learning model described here opens the way for solving machine learning problems in computer science and biology using in vitro molecular computing with the data encoded in DNA molecules. Copyright © 2017. Published by Elsevier B.V.

  8. (Machine-)Learning to analyze in vivo microscopy: Support vector machines.

    PubMed

    Wang, Michael F Z; Fernandez-Gonzalez, Rodrigo

    2017-11-01

    The development of new microscopy techniques for super-resolved, long-term monitoring of cellular and subcellular dynamics in living organisms is revealing new fundamental aspects of tissue development and repair. However, new microscopy approaches present several challenges. In addition to unprecedented requirements for data storage, the analysis of high resolution, time-lapse images is too complex to be done manually. Machine learning techniques are ideally suited for the (semi-)automated analysis of multidimensional image data. In particular, support vector machines (SVMs), have emerged as an efficient method to analyze microscopy images obtained from animals. Here, we discuss the use of SVMs to analyze in vivo microscopy data. We introduce the mathematical framework behind SVMs, and we describe the metrics used by SVMs and other machine learning approaches to classify image data. We discuss the influence of different SVM parameters in the context of an algorithm for cell segmentation and tracking. Finally, we describe how the application of SVMs has been critical to study protein localization in yeast screens, for lineage tracing in C. elegans, or to determine the developmental stage of Drosophila embryos to investigate gene expression dynamics. We propose that SVMs will become central tools in the analysis of the complex image data that novel microscopy modalities have made possible. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Analysis of Machine Learning Techniques for Heart Failure Readmissions.

    PubMed

    Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M

    2016-11-01

    The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.

  10. Machine learning for epigenetics and future medical applications

    PubMed Central

    Holder, Lawrence B.; Haque, M. Muksitul; Skinner, Michael K.

    2017-01-01

    ABSTRACT Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review. PMID:28524769

  11. Machine learning for epigenetics and future medical applications.

    PubMed

    Holder, Lawrence B; Haque, M Muksitul; Skinner, Michael K

    2017-07-03

    Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.

  12. A machine learning approach to the accurate prediction of monitor units for a compact proton machine.

    PubMed

    Sun, Baozhou; Lam, Dao; Yang, Deshan; Grantham, Kevin; Zhang, Tiezhi; Mutic, Sasa; Zhao, Tianyu

    2018-05-01

    Clinical treatment planning systems for proton therapy currently do not calculate monitor units (MUs) in passive scatter proton therapy due to the complexity of the beam delivery systems. Physical phantom measurements are commonly employed to determine the field-specific output factors (OFs) but are often subject to limited machine time, measurement uncertainties and intensive labor. In this study, a machine learning-based approach was developed to predict output (cGy/MU) and derive MUs, incorporating the dependencies on gantry angle and field size for a single-room proton therapy system. The goal of this study was to develop a secondary check tool for OF measurements and eventually eliminate patient-specific OF measurements. The OFs of 1754 fields previously measured in a water phantom with calibrated ionization chambers and electrometers for patient-specific fields with various range and modulation width combinations for 23 options were included in this study. The training data sets for machine learning models in three different methods (Random Forest, XGBoost and Cubist) included 1431 (~81%) OFs. Ten-fold cross-validation was used to prevent "overfitting" and to validate each model. The remaining 323 (~19%) OFs were used to test the trained models. The difference between the measured and predicted values from machine learning models was analyzed. Model prediction accuracy was also compared with that of the semi-empirical model developed by Kooy (Phys. Med. Biol. 50, 2005). Additionally, gantry angle dependence of OFs was measured for three groups of options categorized on the selection of the second scatters. Field size dependence of OFs was investigated for the measurements with and without patient-specific apertures. All three machine learning methods showed higher accuracy than the semi-empirical model which shows considerably large discrepancy of up to 7.7% for the treatment fields with full range and full modulation width. The Cubist-based solution

  13. Machine Learning methods for Quantitative Radiomic Biomarkers.

    PubMed

    Parmar, Chintan; Grossmann, Patrick; Bussink, Johan; Lambin, Philippe; Aerts, Hugo J W L

    2015-08-17

    Radiomics extracts and mines large number of medical imaging features quantifying tumor phenotypic characteristics. Highly accurate and reliable machine-learning approaches can drive the success of radiomic applications in clinical care. In this radiomic study, fourteen feature selection methods and twelve classification methods were examined in terms of their performance and stability for predicting overall survival. A total of 440 radiomic features were extracted from pre-treatment computed tomography (CT) images of 464 lung cancer patients. To ensure the unbiased evaluation of different machine-learning methods, publicly available implementations along with reported parameter configurations were used. Furthermore, we used two independent radiomic cohorts for training (n = 310 patients) and validation (n = 154 patients). We identified that Wilcoxon test based feature selection method WLCX (stability = 0.84 ± 0.05, AUC = 0.65 ± 0.02) and a classification method random forest RF (RSD = 3.52%, AUC = 0.66 ± 0.03) had highest prognostic performance with high stability against data perturbation. Our variability analysis indicated that the choice of classification method is the most dominant source of performance variation (34.21% of total variance). Identification of optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.

  14. Detection of longitudinal visual field progression in glaucoma using machine learning.

    PubMed

    Yousefi, Siamak; Kiwaki, Taichi; Zheng, Yuhui; Suigara, Hiroki; Asaoka, Ryo; Murata, Hiroshi; Lemij, Hans; Yamanishi, Kenji

    2018-06-16

    Global indices of standard automated perimerty are insensitive to localized losses, while point-wise indices are sensitive but highly variable. Region-wise indices sit in between. This study introduces a machine-learning-based index for glaucoma progression detection that outperforms global, region-wise, and point-wise indices. Development and comparison of a prognostic index. Visual fields from 2085 eyes of 1214 subjects were used to identify glaucoma progression patterns using machine learning. Visual fields from 133 eyes of 71 glaucoma patients were collected 10 times over 10 weeks to provide a no-change, test-retest dataset. The parameters of all methods were identified using visual field sequences in the test-retest dataset to meet fixed 95% specificity. An independent dataset of 270 eyes of 136 glaucoma patients and survival analysis were utilized to compare methods. The time to detect progression in 25% of the eyes in the longitudinal dataset using global mean deviation (MD) was 5.2 years (95% confidence interval, 4.1 - 6.5 years); 4.5 years (4.0 - 5.5) using region-wise, 3.9 years (3.5 - 4.6) using point-wise, and 3.5 years (3.1 - 4.0) using machine learning analysis. The time until 25% of eyes showed subsequently confirmed progression after two additional visits were included were 6.6 years (5.6 - 7.4 years), 5.7 years (4.8 - 6.7), 5.6 years (4.7 - 6.5), and 5.1 years (4.5 - 6.0) for global, region-wise, point-wise, and machine learning analyses, respectively. Machine learning analysis detects progressing eyes earlier than other methods consistently, with or without confirmation visits. In particular, machine learning detects more slowly progressing eyes than other methods. Copyright © 2018 Elsevier Inc. All rights reserved.

  15. Interactive Algorithms for Unsupervised Machine Learning

    DTIC Science & Technology

    2015-06-01

    committee members, Nina Balcan, Sanjoy Dasgupta, and John Langford. Nina’s unbounded energy and her passion for machine learning are qualities that I...52 3.3.2 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.3.3 Real World Experiments...80 4.4.1 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.4.2 Real World

  16. Health Informatics via Machine Learning for the Clinical Management of Patients.

    PubMed

    Clifton, D A; Niehaus, K E; Charlton, P; Colopy, G W

    2015-08-13

    To review how health informatics systems based on machine learning methods have impacted the clinical management of patients, by affecting clinical practice. We reviewed literature from 2010-2015 from databases such as Pubmed, IEEE xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify those leading examples of health informatics that have advanced the methodology of machine learning. While individual methods may have further examples that might be added, we have chosen some of the most representative, informative exemplars in each case. Our survey highlights that, while much research is taking place in this high-profile field, examples of those that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. Health informatics systems based on machine learning are in their infancy and the translation of such systems into clinical management has yet to be performed at scale.

  17. Machine Learning in Computer-Aided Synthesis Planning.

    PubMed

    Coley, Connor W; Green, William H; Jensen, Klavs F

    2018-05-15

    Computer-aided synthesis planning (CASP) is focused on the goal of accelerating the process by which chemists decide how to synthesize small molecule compounds. The ideal CASP program would take a molecular structure as input and output a sorted list of detailed reaction schemes that each connect that target to purchasable starting materials via a series of chemically feasible reaction steps. Early work in this field relied on expert-crafted reaction rules and heuristics to describe possible retrosynthetic disconnections and selectivity rules but suffered from incompleteness, infeasible suggestions, and human bias. With the relatively recent availability of large reaction corpora (such as the United States Patent and Trademark Office (USPTO), Reaxys, and SciFinder databases), consisting of millions of tabulated reaction examples, it is now possible to construct and validate purely data-driven approaches to synthesis planning. As a result, synthesis planning has been opened to machine learning techniques, and the field is advancing rapidly. In this Account, we focus on two critical aspects of CASP and recent machine learning approaches to both challenges. First, we discuss the problem of retrosynthetic planning, which requires a recommender system to propose synthetic disconnections starting from a target molecule. We describe how the search strategy, necessary to overcome the exponential growth of the search space with increasing number of reaction steps, can be assisted through a learned synthetic complexity metric. We also describe how the recursive expansion can be performed by a straightforward nearest neighbor model that makes clever use of reaction data to generate high quality retrosynthetic disconnections. Second, we discuss the problem of anticipating the products of chemical reactions, which can be used to validate proposed reactions in a computer-generated synthesis plan (i.e., reduce false positives) to increase the likelihood of experimental success

  18. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography.

    PubMed

    Narula, Sukrit; Shameer, Khader; Salem Omar, Alaa Mabrouk; Dudley, Joel T; Sengupta, Partho P

    2016-11-29

    Machine-learning models may aid cardiac phenotypic recognition by using features of cardiac tissue deformation. This study investigated the diagnostic value of a machine-learning framework that incorporates speckle-tracking echocardiographic data for automated discrimination of hypertrophic cardiomyopathy (HCM) from physiological hypertrophy seen in athletes (ATH). Expert-annotated speckle-tracking echocardiographic datasets obtained from 77 ATH and 62 HCM patients were used for developing an automated system. An ensemble machine-learning model with 3 different machine-learning algorithms (support vector machines, random forests, and artificial neural networks) was developed and a majority voting method was used for conclusive predictions with further K-fold cross-validation. Feature selection using an information gain (IG) algorithm revealed that volume was the best predictor for differentiating between HCM ands. ATH (IG = 0.24) followed by mid-left ventricular segmental (IG = 0.134) and average longitudinal strain (IG = 0.131). The ensemble machine-learning model showed increased sensitivity and specificity compared with early-to-late diastolic transmitral velocity ratio (p < 0.01), average early diastolic tissue velocity (e') (p < 0.01), and strain (p = 0.04). Because ATH were younger, adjusted analysis was undertaken in younger HCM patients and compared with ATH with left ventricular wall thickness >13 mm. In this subgroup analysis, the automated model continued to show equal sensitivity, but increased specificity relative to early-to-late diastolic transmitral velocity ratio, e', and strain. Our results suggested that machine-learning algorithms can assist in the discrimination of physiological versus pathological patterns of hypertrophic remodeling. This effort represents a step toward the development of a real-time, machine-learning-based system for automated interpretation of echocardiographic images, which may help novice readers with

  19. Machine Learning and Data Mining Methods in Diabetes Research.

    PubMed

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and e) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.

  20. Making Individual Prognoses in Psychiatry Using Neuroimaging and Machine Learning.

    PubMed

    Janssen, Ronald J; Mourão-Miranda, Janaina; Schnack, Hugo G

    2018-04-22

    Psychiatric prognosis is a difficult problem. Making a prognosis requires looking far into the future, as opposed to making a diagnosis, which is concerned with the current state. During the follow-up period, many factors will influence the course of the disease. Combined with the usually scarcer longitudinal data and the variability in the definition of outcomes/transition, this makes prognostic predictions a challenging endeavor. Employing neuroimaging data in this endeavor introduces the additional hurdle of high dimensionality. Machine-learning techniques are especially suited to tackle this challenging problem. This review starts with a brief introduction to machine learning in the context of its application to clinical neuroimaging data. We highlight a few issues that are especially relevant for prediction of outcome and transition using neuroimaging. We then review the literature that discusses the application of machine learning for this purpose. Critical examination of the studies and their results with respect to the relevant issues revealed the following: 1) there is growing evidence for the prognostic capability of machine-learning-based models using neuroimaging; and 2) reported accuracies may be too optimistic owing to small sample sizes and the lack of independent test samples. Finally, we discuss options to improve the reliability of (prognostic) prediction models. These include new methodologies and multimodal modeling. Paramount, however, is our conclusion that future work will need to provide properly (cross-)validated accuracy estimates of models trained on sufficiently large datasets. Nevertheless, with the technological advances enabling acquisition of large databases of patients and healthy subjects, machine learning represents a powerful tool in the search for psychiatric biomarkers. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  1. Machine learning for science: state of the art and future prospects.

    PubMed

    Mjolsness, E; DeCoste, D

    2001-09-14

    Recent advances in machine learning methods, along with successful applications across a wide variety of fields such as planetary science and bioinformatics, promise powerful new tools for practicing scientists. This viewpoint highlights some useful characteristics of modern machine learning methods and their relevance to scientific applications. We conclude with some speculations on near-term progress and promising directions.

  2. Ship localization in Santa Barbara Channel using machine learning classifiers.

    PubMed

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  3. Machine learning and computer vision approaches for phenotypic profiling

    PubMed Central

    Morris, Quaid

    2017-01-01

    With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach. PMID:27940887

  4. Machine learning and computer vision approaches for phenotypic profiling.

    PubMed

    Grys, Ben T; Lo, Dara S; Sahin, Nil; Kraus, Oren Z; Morris, Quaid; Boone, Charles; Andrews, Brenda J

    2017-01-02

    With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach. © 2017 Grys et al.

  5. Manifold learning in machine vision and robotics

    NASA Astrophysics Data System (ADS)

    Bernstein, Alexander

    2017-02-01

    Smart algorithms are used in Machine vision and Robotics to organize or extract high-level information from the available data. Nowadays, Machine learning is an essential and ubiquitous tool to automate extraction patterns or regularities from data (images in Machine vision; camera, laser, and sonar sensors data in Robotics) in order to solve various subject-oriented tasks such as understanding and classification of images content, navigation of mobile autonomous robot in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and other. Usually such data have high dimensionality, however, due to various dependencies between their components and constraints caused by physical reasons, all "feasible and usable data" occupy only a very small part in high dimensional "observation space" with smaller intrinsic dimensionality. Generally accepted model of such data is manifold model in accordance with which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high dimensional observation space; real-world high-dimensional data obtained from "natural" sources meet, as a rule, this model. The use of Manifold learning technique in Machine vision and Robotics, which discovers a low-dimensional structure of high dimensional data and results in effective algorithms for solving of a large number of various subject-oriented tasks, is the content of the conference plenary speech some topics of which are in the paper.

  6. A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.

    PubMed

    Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.

  7. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Machine Learning

    PubMed Central

    Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates “privacy-insensitive” intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner. PMID:23304414

  8. Time-Frequency Learning Machines for Nonstationarity Detection Using Surrogates

    NASA Astrophysics Data System (ADS)

    Borgnat, Pierre; Flandrin, Patrick; Richard, Cédric; Ferrari, André; Amoud, Hassan; Honeine, Paul

    2012-03-01

    Time-frequency representations provide a powerful tool for nonstationary signal analysis and classification, supporting a wide range of applications [12]. As opposed to conventional Fourier analysis, these techniques reveal the evolution in time of the spectral content of signals. In Ref. [7,38], time-frequency analysis is used to test stationarity of any signal. The proposed method consists of a comparison between global and local time-frequency features. The originality is to make use of a family of stationary surrogate signals for defining the null hypothesis of stationarity and, based upon this information, to derive statistical tests. An open question remains, however, about how to choose relevant time-frequency features. Over the last decade, a number of new pattern recognition methods based on reproducing kernels have been introduced. These learning machines have gained popularity due to their conceptual simplicity and their outstanding performance [30]. Initiated by Vapnik’s support vector machines (SVM) [35], they offer now a wide class of supervised and unsupervised learning algorithms. In Ref. [17-19], the authors have shown how the most effective and innovative learning machines can be tuned to operate in the time-frequency domain. This chapter follows this line of research by taking advantage of learning machines to test and quantify stationarity. Based on one-class SVM, our approach uses the entire time-frequency representation and does not require arbitrary feature extraction. Applied to a set of surrogates, it provides the domain boundary that includes most of these stationarized signals. This allows us to test the stationarity of the signal under investigation. This chapter is organized as follows. In Section 22.2, we introduce the surrogate data method to generate stationarized signals, namely, the null hypothesis of stationarity. The concept of time-frequency learning machines is presented in Section 22.3, and applied to one-class SVM in order

  9. A Machine Learning Concept for DTN Routing

    NASA Technical Reports Server (NTRS)

    Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos

    2017-01-01

    This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification are given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.

  10. Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension

    ERIC Educational Resources Information Center

    Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

    2017-01-01

    This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…

  11. Machine-learning-assisted materials discovery using failed experiments

    NASA Astrophysics Data System (ADS)

    Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.; Falk, Casey; Wenny, Malia B.; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A.; Schrier, Joshua; Norquist, Alexander J.

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted

  12. Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality.

    PubMed

    Braithwaite, Scott R; Giraud-Carrier, Christophe; West, Josh; Barnes, Michael D; Hanson, Carl Lee

    2016-05-16

    One of the leading causes of death in the United States (US) is suicide and new methods of assessment are needed to track its risk in real time. Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population. Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk. Our findings show that people who are at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which accurately identify the clinically significant suicidal rate in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%). Machine learning algorithms are efficient in differentiating people who are at a suicidal risk from those who are not. Evidence for suicidality can be measured in nonclinical populations using social media data.

  13. Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality

    PubMed Central

    2016-01-01

    Background One of the leading causes of death in the United States (US) is suicide and new methods of assessment are needed to track its risk in real time. Objective Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population. Methods Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk. Results Our findings show that people who are at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which accurately identify the clinically significant suicidal rate in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%). Conclusions Machine learning algorithms are efficient in differentiating people who are at a suicidal risk from those who are not. Evidence for suicidality can be measured in nonclinical populations using social media data. PMID:27185366

  14. Survey of Machine Learning Methods for Database Security

    NASA Astrophysics Data System (ADS)

    Kamra, Ashish; Ber, Elisa

    Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.

  15. Data Mining and Machine Learning in Astronomy

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  16. Machine learning of network metrics in ATLAS Distributed Data Management

    NASA Astrophysics Data System (ADS)

    Lassnig, Mario; Toler, Wesley; Vamosi, Ralf; Bogado, Joaquin; ATLAS Collaboration

    2017-10-01

    The increasing volume of physics data poses a critical challenge to the ATLAS experiment. In anticipation of high luminosity physics, automation of everyday data management tasks has become necessary. Previously many of these tasks required human decision-making and operation. Recent advances in hardware and software have made it possible to entrust more complicated duties to automated systems using models trained by machine learning algorithms. In this contribution we show results from one of our ongoing automation efforts that focuses on network metrics. First, we describe our machine learning framework built atop the ATLAS Analytics Platform. This framework can automatically extract and aggregate data, train models with various machine learning algorithms, and eventually score the resulting models and parameters. Second, we use these models to forecast metrics relevant for networkaware job scheduling and data brokering. We show the characteristics of the data and evaluate the forecasting accuracy of our models.

  17. Radio Frequency Interference Detection using Machine Learning.

    NASA Astrophysics Data System (ADS)

    Mosiane, Olorato; Oozeer, Nadeem; Aniyan, Arun; Bassett, Bruce A.

    2017-05-01

    Radio frequency interference (RFI) has plagued radio astronomy which potentially might be as bad or worse by the time the Square Kilometre Array (SKA) comes up. RFI can be either internal (generated by instruments) or external that originates from intentional or unintentional radio emission generated by man. With the huge amount of data that will be available with up coming radio telescopes, an automated aproach will be required to detect RFI. In this paper to try automate this process we present the result of applying machine learning techniques to cross match RFI from the Karoo Array Telescope (KAT-7) data. We found that not all the features selected to characterise RFI are always important. We further investigated 3 machine learning techniques and conclude that the Random forest classifier performs with a 98% Area Under Curve and 91% recall in detecting RFI.

  18. Automatic microseismic event picking via unsupervised machine learning

    NASA Astrophysics Data System (ADS)

    Chen, Yangkang

    2018-01-01

    Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms suffer from the sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example, removing sufficient amount of noise, and second analysed by arrival pickers. To conquer the noise issue in arrival picking for weak microseismic or earthquake event, I leverage the machine learning techniques to help recognizing seismic waveforms in microseismic or earthquake data. Because of the dependency of supervised machine learning algorithm on large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for such purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.

  19. Development of a machine learning potential for graphene

    NASA Astrophysics Data System (ADS)

    Rowe, Patrick; Csányi, Gábor; Alfè, Dario; Michaelides, Angelos

    2018-02-01

    We present an accurate interatomic potential for graphene, constructed using the Gaussian approximation potential (GAP) machine learning methodology. This GAP model obtains a faithful representation of a density functional theory (DFT) potential energy surface, facilitating highly accurate (approaching the accuracy of ab initio methods) molecular dynamics simulations. This is achieved at a computational cost which is orders of magnitude lower than that of comparable calculations which directly invoke electronic structure methods. We evaluate the accuracy of our machine learning model alongside that of a number of popular empirical and bond-order potentials, using both experimental and ab initio data as references. We find that whilst significant discrepancies exist between the empirical interatomic potentials and the reference data—and amongst the empirical potentials themselves—the machine learning model introduced here provides exemplary performance in all of the tested areas. The calculated properties include: graphene phonon dispersion curves at 0 K (which we predict with sub-meV accuracy), phonon spectra at finite temperature, in-plane thermal expansion up to 2500 K as compared to NPT ab initio molecular dynamics simulations and a comparison of the thermally induced dispersion of graphene Raman bands to experimental observations. We have made our potential freely available online at [http://www.libatoms.org].

  20. Machine Learning Force Field Parameters from Ab Initio Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Ying; Li, Hui; Pickard, Frank C.

    Machine learning (ML) techniques with the genetic algorithm (GA) have been applied to determine a polarizable force field parameters using only ab initio data from quantum mechanics (QM) calculations of molecular clusters at the MP2/6-31G(d,p), DFMP2(fc)/jul-cc-pVDZ, and DFMP2(fc)/jul-cc-pVTZ levels to predict experimental condensed phase properties (i.e., density and heat of vaporization). The performance of this ML/GA approach is demonstrated on 4943 dimer electrostatic potentials and 1250 cluster interaction energies for methanol. Excellent agreement between the training data set from QM calculations and the optimized force field model was achieved. The results were further improved by introducing an offset factor duringmore » the machine learning process to compensate for the discrepancy between the QM calculated energy and the energy reproduced by optimized force field, while maintaining the local “shape” of the QM energy surface. Throughout the machine learning process, experimental observables were not involved in the objective function, but were only used for model validation. The best model, optimized from the QM data at the DFMP2(fc)/jul-cc-pVTZ level, appears to perform even better than the original AMOEBA force field (amoeba09.prm), which was optimized empirically to match liquid properties. The present effort shows the possibility of using machine learning techniques to develop descriptive polarizable force field using only QM data. The ML/GA strategy to optimize force fields parameters described here could easily be extended to other molecular systems.« less

  1. Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

    PubMed Central

    Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K

    2015-01-01

    Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95%CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC

  2. Multilayer Extreme Learning Machine With Subnetwork Nodes for Representation Learning.

    PubMed

    Yang, Yimin; Wu, Q M Jonathan

    2016-11-01

    The extreme learning machine (ELM), which was originally proposed for "generalized" single-hidden layer feedforward neural networks, provides efficient unified learning solutions for the applications of clustering, regression, and classification. It presents competitive accuracy with superb efficiency in many applications. However, ELM with subnetwork nodes architecture has not attracted much research attentions. Recently, many methods have been proposed for supervised/unsupervised dimension reduction or representation learning, but these methods normally only work for one type of problem. This paper studies the general architecture of multilayer ELM (ML-ELM) with subnetwork nodes, showing that: 1) the proposed method provides a representation learning platform with unsupervised/supervised and compressed/sparse representation learning and 2) experimental results on ten image datasets and 16 classification datasets show that, compared to other conventional feature learning methods, the proposed ML-ELM with subnetwork nodes performs competitively or much better than other feature learning methods.

  3. Can machine-learning improve cardiovascular risk prediction using routine clinical data?

    PubMed Central

    Kai, Joe; Garibaldi, Jonathan M.; Qureshi, Nadeem

    2017-01-01

    Background Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction. Methods Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10-years. Predictive accuracy was assessed by area under the ‘receiver operating curve’ (AUC); and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) to predict 7.5% cardiovascular risk (threshold for initiating statins). Findings 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723–0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739–0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755–0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755–0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759–0.769). The highest achieving (neural networks) algorithm predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease compared to the established algorithm. Conclusions Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others

  4. Can machine-learning improve cardiovascular risk prediction using routine clinical data?

    PubMed

    Weng, Stephen F; Reps, Jenna; Kai, Joe; Garibaldi, Jonathan M; Qureshi, Nadeem

    2017-01-01

    Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction. Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10-years. Predictive accuracy was assessed by area under the 'receiver operating curve' (AUC); and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) to predict 7.5% cardiovascular risk (threshold for initiating statins). 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723-0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739-0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755-0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755-0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759-0.769). The highest achieving (neural networks) algorithm predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease compared to the established algorithm. Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others.

  5. On the Safety of Machine Learning: Cyber-Physical Systems, Decision Sciences, and Data Products.

    PubMed

    Varshney, Kush R; Alemzadeh, Homa

    2017-09-01

    Machine learning algorithms increasingly influence our decisions and interact with us in all parts of our daily lives. Therefore, just as we consider the safety of power plants, highways, and a variety of other engineered socio-technical systems, we must also take into account the safety of systems involving machine learning. Heretofore, the definition of safety has not been formalized in a machine learning context. In this article, we do so by defining machine learning safety in terms of risk, epistemic uncertainty, and the harm incurred by unwanted outcomes. We then use this definition to examine safety in all sorts of applications in cyber-physical systems, decision sciences, and data products. We find that the foundational principle of modern statistical machine learning, empirical risk minimization, is not always a sufficient objective. We discuss how four different categories of strategies for achieving safety in engineering, including inherently safe design, safety reserves, safe fail, and procedural safeguards can be mapped to a machine learning context. We then discuss example techniques that can be adopted in each category, such as considering interpretability and causality of predictive models, objective functions beyond expected prediction accuracy, human involvement for labeling difficult or rare examples, and user experience design of software and open data.

  6. Galaxy Classification using Machine Learning

    NASA Astrophysics Data System (ADS)

    Fowler, Lucas; Schawinski, Kevin; Brandt, Ben-Elias; widmer, Nicole

    2017-01-01

    We present our current research into the use of machine learning to classify galaxy imaging data with various convolutional neural network configurations in TensorFlow. We are investigating how five-band Sloan Digital Sky Survey imaging data can be used to train on physical properties such as redshift, star formation rate, mass and morphology. We also investigate the performance of artificially redshifted images in recovering physical properties as image quality degrades.

  7. Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets.

    PubMed

    Korotcov, Alexandru; Tkachenko, Valery; Russo, Daniel P; Ekins, Sean

    2017-12-04

    Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many of pharmaceutical applications from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction for many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets that is applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we have used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole cell screens, individual proteins, physicochemical properties as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen's kappa, Matthews correlation coefficient and others. Based on ranked normalized scores for the metrics or data sets Deep Neural Networks (DNN) ranked higher than SVM, which in turn was ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar type plots indicates when models are inferior or perhaps over trained. These results also suggest the need for assessing deep learning further

  8. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution

    PubMed Central

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis. PMID:26958271

  9. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution.

    PubMed

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis.

  10. Applying machine learning to identify autistic adults using imitation: An exploratory study.

    PubMed

    Li, Baihua; Sharma, Arjun; Meng, James; Purushwalkam, Senthil; Gowen, Emma

    2017-01-01

    Autism spectrum condition (ASC) is primarily diagnosed by behavioural symptoms including social, sensory and motor aspects. Although stereotyped, repetitive motor movements are considered during diagnosis, quantitative measures that identify kinematic characteristics in the movement patterns of autistic individuals are poorly studied, preventing advances in understanding the aetiology of motor impairment, or whether a wider range of motor characteristics could be used for diagnosis. The aim of this study was to investigate whether data-driven machine learning based methods could be used to address some fundamental problems with regard to identifying discriminative test conditions and kinematic parameters to classify between ASC and neurotypical controls. Data was based on a previous task where 16 ASC participants and 14 age, IQ matched controls observed then imitated a series of hand movements. 40 kinematic parameters extracted from eight imitation conditions were analysed using machine learning based methods. Two optimal imitation conditions and nine most significant kinematic parameters were identified and compared with some standard attribute evaluators. To our knowledge, this is the first attempt to apply machine learning to kinematic movement parameters measured during imitation of hand movements to investigate the identification of ASC. Although based on a small sample, the work demonstrates the feasibility of applying machine learning methods to analyse high-dimensional data and suggest the potential of machine learning for identifying kinematic biomarkers that could contribute to the diagnostic classification of autism.

  11. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics

    PubMed Central

    HUANG, SHUJUN; CAI, NIANGUANG; PACHECO, PEDRO PENZUTI; NARANDES, SHAVIRA; WANG, YANG; XU, WAYNE

    2017-01-01

    Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. PMID:29275361

  12. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

    DTIC Science & Technology

    2015-09-12

    AFRL-AFOSR-VA-TR-2015-0278 DERIVATIVE FREE OPTIMIZATION OF COMPLEX SYSTEMS WITH THE USE OF STATISTICAL MACHINE LEARNING MODELS Katya Scheinberg...COMPLEX SYSTEMS WITH THE USE OF STATISTICAL MACHINE LEARNING MODELS 5a.  CONTRACT NUMBER 5b.  GRANT NUMBER FA9550-11-1-0239 5c.  PROGRAM ELEMENT...developed, which has been the focus of our research. 15. SUBJECT TERMS optimization, Derivative-Free Optimization, Statistical Machine Learning 16. SECURITY

  13. A strategy for quantum algorithm design assisted by machine learning

    NASA Astrophysics Data System (ADS)

    Bang, Jeongho; Ryu, Junghee; Yoo, Seokwon; Pawłowski, Marcin; Lee, Jinhyoung

    2014-07-01

    We propose a method for quantum algorithm design assisted by machine learning. The method uses a quantum-classical hybrid simulator, where a ‘quantum student’ is being taught by a ‘classical teacher’. In other words, in our method, the learning system is supposed to evolve into a quantum algorithm for a given problem, assisted by a classical main-feedback system. Our method is applicable for designing quantum oracle-based algorithms. We chose, as a case study, an oracle decision problem, called a Deutsch-Jozsa problem. We showed by using Monte Carlo simulations that our simulator can faithfully learn a quantum algorithm for solving the problem for a given oracle. Remarkably, the learning time is proportional to the square root of the total number of parameters, rather than showing the exponential dependence found in the classical machine learning-based method.

  14. Perspectives on Machine Learning for Classification of Schizotypy Using fMRI Data.

    PubMed

    Madsen, Kristoffer H; Krohne, Laerke G; Cai, Xin-Lu; Wang, Yi; Chan, Raymond C K

    2018-03-15

    Functional magnetic resonance imaging is capable of estimating functional activation and connectivity in the human brain, and lately there has been increased interest in the use of these functional modalities combined with machine learning for identification of psychiatric traits. While these methods bear great potential for early diagnosis and better understanding of disease processes, there are wide ranges of processing choices and pitfalls that may severely hamper interpretation and generalization performance unless carefully considered. In this perspective article, we aim to motivate the use of machine learning schizotypy research. To this end, we describe common data processing steps while commenting on best practices and procedures. First, we introduce the important role of schizotypy to motivate the importance of reliable classification, and summarize existing machine learning literature on schizotypy. Then, we describe procedures for extraction of features based on fMRI data, including statistical parametric mapping, parcellation, complex network analysis, and decomposition methods, as well as classification with a special focus on support vector classification and deep learning. We provide more detailed descriptions and software as supplementary material. Finally, we present current challenges in machine learning for classification of schizotypy and comment on future trends and perspectives.

  15. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View

    PubMed Central

    2016-01-01

    Background As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community. PMID:27986644

  16. Current Developments in Machine Learning Techniques in Biological Data Mining.

    PubMed

    Dumancas, Gerard G; Adrianto, Indra; Bello, Ghalib; Dozmorov, Mikhail

    2017-01-01

    This supplement is intended to focus on the use of machine learning techniques to generate meaningful information on biological data. This supplement under Bioinformatics and Biology Insights aims to provide scientists and researchers working in this rapid and evolving field with online, open-access articles authored by leading international experts in this field. Advances in the field of biology have generated massive opportunities to allow the implementation of modern computational and statistical techniques. Machine learning methods in particular, a subfield of computer science, have evolved as an indispensable tool applied to a wide spectrum of bioinformatics applications. Thus, it is broadly used to investigate the underlying mechanisms leading to a specific disease, as well as the biomarker discovery process. With a growth in this specific area of science comes the need to access up-to-date, high-quality scholarly articles that will leverage the knowledge of scientists and researchers in the various applications of machine learning techniques in mining biological data.

  17. Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery.

    PubMed

    Lane, Thomas; Russo, Daniel P; Zorn, Kimberley M; Clark, Alex M; Korotcov, Alexandru; Tkachenko, Valery; Reynolds, Robert C; Perryman, Alexander L; Freundlich, Joel S; Ekins, Sean

    2018-04-26

    Tuberculosis is a global health dilemma. In 2016, the WHO reported 10.4 million incidences and 1.7 million deaths. The need to develop new treatments for those infected with Mycobacterium tuberculosis ( Mtb) has led to many large-scale phenotypic screens and many thousands of new active compounds identified in vitro. However, with limited funding, efforts to discover new active molecules against Mtb needs to be more efficient. Several computational machine learning approaches have been shown to have good enrichment and hit rates. We have curated small molecule Mtb data and developed new models with a total of 18,886 molecules with activity cutoffs of 10 μM, 1 μM, and 100 nM. These data sets were used to evaluate different machine learning methods (including deep learning) and metrics and to generate predictions for additional molecules published in 2017. One Mtb model, a combined in vitro and in vivo data Bayesian model at a 100 nM activity yielded the following metrics for 5-fold cross validation: accuracy = 0.88, precision = 0.22, recall = 0.91, specificity = 0.88, kappa = 0.31, and MCC = 0.41. We have also curated an evaluation set ( n = 153 compounds) published in 2017, and when used to test our model, it showed the comparable statistics (accuracy = 0.83, precision = 0.27, recall = 1.00, specificity = 0.81, kappa = 0.36, and MCC = 0.47). We have also compared these models with additional machine learning algorithms showing Bayesian machine learning models constructed with literature Mtb data generated by different laboratories generally were equivalent to or outperformed deep neural networks with external test sets. Finally, we have also compared our training and test sets to show they were suitably diverse and different in order to represent useful evaluation sets. Such Mtb machine learning models could help prioritize compounds for testing in vitro and in vivo.

  18. Research on knowledge representation, machine learning, and knowledge acquisition

    NASA Technical Reports Server (NTRS)

    Buchanan, Bruce G.

    1987-01-01

    Research in knowledge representation, machine learning, and knowledge acquisition performed at Knowledge Systems Lab. is summarized. The major goal of the research was to develop flexible, effective methods for representing the qualitative knowledge necessary for solving large problems that require symbolic reasoning as well as numerical computation. The research focused on integrating different representation methods to describe different kinds of knowledge more effectively than any one method can alone. In particular, emphasis was placed on representing and using spatial information about three dimensional objects and constraints on the arrangement of these objects in space. Another major theme is the development of robust machine learning programs that can be integrated with a variety of intelligent systems. To achieve this goal, learning methods were designed, implemented and experimented within several different problem solving environments.

  19. Diagnostic Machine Learning Models for Acute Abdominal Pain: Towards an e-Learning Tool for Medical Students.

    PubMed

    Khumrin, Piyapong; Ryan, Anna; Judd, Terry; Verspoor, Karin

    2017-01-01

    Computer-aided learning systems (e-learning systems) can help medical students gain more experience with diagnostic reasoning and decision making. Within this context, providing feedback that matches students' needs (i.e. personalised feedback) is both critical and challenging. In this paper, we describe the development of a machine learning model to support medical students' diagnostic decisions. Machine learning models were trained on 208 clinical cases presenting with abdominal pain, to predict five diagnoses. We assessed which of these models are likely to be most effective for use in an e-learning tool that allows students to interact with a virtual patient. The broader goal is to utilise these models to generate personalised feedback based on the specific patient information requested by students and their active diagnostic hypotheses.

  20. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

  1. Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

    PubMed

    Howard, Rebecca; Rattray, Magnus; Prosperi, Mattia; Custovic, Adnan

    2015-07-01

    Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which are caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as 'asthma endotypes'. The discovery of different asthma subtypes has moved from subjective approaches in which putative phenotypes are assigned by experts to data-driven ones which incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique-latent class analysis-and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies.

  2. Using Simple Machines to Leverage Learning

    ERIC Educational Resources Information Center

    Dotger, Sharon

    2008-01-01

    What would your students say if you told them they could lift you off the ground using a block and a board? Using a simple machine, they'll find out they can, and they'll learn about work, energy, and motion in the process! In addition, this integrated lesson gives students the opportunity to investigate variables while practicing measurement…

  3. Oceanic eddy detection and lifetime forecast using machine learning methods

    NASA Astrophysics Data System (ADS)

    Ashkezari, Mohammad D.; Hill, Christopher N.; Follett, Christopher N.; Forget, Gaël.; Follows, Michael J.

    2016-12-01

    We report a novel altimetry-based machine learning approach for eddy identification and characterization. The machine learning models use daily maps of geostrophic velocity anomalies and are trained according to the phase angle between the zonal and meridional components at each grid point. The trained models are then used to identify the corresponding eddy phase patterns and to predict the lifetime of a detected eddy structure. The performance of the proposed method is examined at two dynamically different regions to demonstrate its robust behavior and region independency.

  4. Using machine learning algorithms to guide rehabilitation planning for home care clients.

    PubMed

    Zhu, Mu; Zhang, Zhanyang; Hirdes, John P; Stolee, Paul

    2007-12-20

    Targeting older clients for rehabilitation is a clinical challenge and a research priority. We investigate the potential of machine learning algorithms - Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) - to guide rehabilitation planning for home care clients. This study is a secondary analysis of data on 24,724 longer-term clients from eight home care programs in Ontario. Data were collected with the RAI-HC assessment system, in which the Activities of Daily Living Clinical Assessment Protocol (ADLCAP) is used to identify clients with rehabilitation potential. For study purposes, a client is defined as having rehabilitation potential if there was: i) improvement in ADL functioning, or ii) discharge home. SVM and KNN results are compared with those obtained using the ADLCAP. For comparison, the machine learning algorithms use the same functional and health status indicators as the ADLCAP. The KNN and SVM algorithms achieved similar substantially improved performance over the ADLCAP, although false positive and false negative rates were still fairly high (FP > .18, FN > .34 versus FP > .29, FN. > .58 for ADLCAP). Results are used to suggest potential revisions to the ADLCAP. Machine learning algorithms achieved superior predictions than the current protocol. Machine learning results are less readily interpretable, but can also be used to guide development of improved clinical protocols.

  5. Behavioral Modeling for Mental Health using Machine Learning Algorithms.

    PubMed

    Srividya, M; Mohanavalli, S; Bhalaji, N

    2018-04-03

    Mental health is an indicator of emotional, psychological and social well-being of an individual. It determines how an individual thinks, feels and handle situations. Positive mental health helps one to work productively and realize their full potential. Mental health is important at every stage of life, from childhood and adolescence through adulthood. Many factors contribute to mental health problems which lead to mental illness like stress, social anxiety, depression, obsessive compulsive disorder, drug addiction, and personality disorders. It is becoming increasingly important to determine the onset of the mental illness to maintain proper life balance. The nature of machine learning algorithms and Artificial Intelligence (AI) can be fully harnessed for predicting the onset of mental illness. Such applications when implemented in real time will benefit the society by serving as a monitoring tool for individuals with deviant behavior. This research work proposes to apply various machine learning algorithms such as support vector machines, decision trees, naïve bayes classifier, K-nearest neighbor classifier and logistic regression to identify state of mental health in a target group. The responses obtained from the target group for the designed questionnaire were first subject to unsupervised learning techniques. The labels obtained as a result of clustering were validated by computing the Mean Opinion Score. These cluster labels were then used to build classifiers to predict the mental health of an individual. Population from various groups like high school students, college students and working professionals were considered as target groups. The research presents an analysis of applying the aforementioned machine learning algorithms on the target groups and also suggests directions for future work.

  6. Recent machine learning advancements in sensor-based mobility analysis: Deep learning for Parkinson's disease assessment.

    PubMed

    Eskofier, Bjoern M; Lee, Sunghoon I; Daneault, Jean-Francois; Golabchi, Fatemeh N; Ferreira-Carvalho, Gabriela; Vergara-Diaz, Gloria; Sapienza, Stefano; Costante, Gianluca; Klucken, Jochen; Kautz, Thomas; Bonato, Paolo

    2016-08-01

    The development of wearable sensors has opened the door for long-term assessment of movement disorders. However, there is still a need for developing methods suitable to monitor motor symptoms in and outside the clinic. The purpose of this paper was to investigate deep learning as a method for this monitoring. Deep learning recently broke records in speech and image classification, but it has not been fully investigated as a potential approach to analyze wearable sensor data. We collected data from ten patients with idiopathic Parkinson's disease using inertial measurement units. Several motor tasks were expert-labeled and used for classification. We specifically focused on the detection of bradykinesia. For this, we compared standard machine learning pipelines with deep learning based on convolutional neural networks. Our results showed that deep learning outperformed other state-of-the-art machine learning algorithms by at least 4.6 % in terms of classification rate. We contribute a discussion of the advantages and disadvantages of deep learning for sensor-based movement assessment and conclude that deep learning is a promising method for this field.

  7. Large-Scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation

    DTIC Science & Technology

    2016-08-10

    AFRL-AFOSR-JP-TR-2016-0073 Large-scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation ...2016 4.  TITLE AND SUBTITLE Large-scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation 5a...performances on various machine learning tasks and it naturally lends itself to fast parallel implementations . Despite this, very little work has been

  8. Exploiting the Dynamics of Soft Materials for Machine Learning

    PubMed Central

    Hauser, Helmut; Li, Tao; Pfeifer, Rolf

    2018-01-01

    Abstract Soft materials are increasingly utilized for various purposes in many engineering applications. These materials have been shown to perform a number of functions that were previously difficult to implement using rigid materials. Here, we argue that the diverse dynamics generated by actuating soft materials can be effectively used for machine learning purposes. This is demonstrated using a soft silicone arm through a technique of multiplexing, which enables the rich transient dynamics of the soft materials to be fully exploited as a computational resource. The computational performance of the soft silicone arm is examined through two standard benchmark tasks. Results show that the soft arm compares well to or even outperforms conventional machine learning techniques under multiple conditions. We then demonstrate that this system can be used for the sensory time series prediction problem for the soft arm itself, which suggests its immediate applicability to a real-world machine learning problem. Our approach, on the one hand, represents a radical departure from traditional computational methods, whereas on the other hand, it fits nicely into a more general perspective of computation by way of exploiting the properties of physical materials in the real world. PMID:29708857

  9. Exploiting the Dynamics of Soft Materials for Machine Learning.

    PubMed

    Nakajima, Kohei; Hauser, Helmut; Li, Tao; Pfeifer, Rolf

    2018-06-01

    Soft materials are increasingly utilized for various purposes in many engineering applications. These materials have been shown to perform a number of functions that were previously difficult to implement using rigid materials. Here, we argue that the diverse dynamics generated by actuating soft materials can be effectively used for machine learning purposes. This is demonstrated using a soft silicone arm through a technique of multiplexing, which enables the rich transient dynamics of the soft materials to be fully exploited as a computational resource. The computational performance of the soft silicone arm is examined through two standard benchmark tasks. Results show that the soft arm compares well to or even outperforms conventional machine learning techniques under multiple conditions. We then demonstrate that this system can be used for the sensory time series prediction problem for the soft arm itself, which suggests its immediate applicability to a real-world machine learning problem. Our approach, on the one hand, represents a radical departure from traditional computational methods, whereas on the other hand, it fits nicely into a more general perspective of computation by way of exploiting the properties of physical materials in the real world.

  10. A Machine Learning Framework for Plan Payment Risk Adjustment.

    PubMed

    Rose, Sherri

    2016-12-01

    To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R 2 . Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.

  11. Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient?

    PubMed

    Skoraczyński, G; Dittwald, P; Miasojedow, B; Szymkuć, S; Gajewska, E P; Grzybowski, B A; Gambin, A

    2017-06-15

    As machine learning/artificial intelligence algorithms are defeating chess masters and, most recently, GO champions, there is interest - and hope - that they will prove equally useful in assisting chemists in predicting outcomes of organic reactions. This paper demonstrates, however, that the applicability of machine learning to the problems of chemical reactivity over diverse types of chemistries remains limited - in particular, with the currently available chemical descriptors, fundamental mathematical theorems impose upper bounds on the accuracy with which raction yields and times can be predicted. Improving the performance of machine-learning methods calls for the development of fundamentally new chemical descriptors.

  12. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex

  13. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Astronomy Data Centre, Canadian

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  14. Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines.

    PubMed

    Neftci, Emre O; Pedroni, Bruno U; Joshi, Siddharth; Al-Shedivat, Maruan; Cauwenberghs, Gert

    2016-01-01

    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines (S2Ms), a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. S2Ms perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate and fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based S2Ms outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware.

  15. Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines

    PubMed Central

    Neftci, Emre O.; Pedroni, Bruno U.; Joshi, Siddharth; Al-Shedivat, Maruan; Cauwenberghs, Gert

    2016-01-01

    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines (S2Ms), a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. S2Ms perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate and fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based S2Ms outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware. PMID:27445650

  16. Amplifying human ability through autonomics and machine learning in IMPACT

    NASA Astrophysics Data System (ADS)

    Dzieciuch, Iryna; Reeder, John; Gutzwiller, Robert; Gustafson, Eric; Coronado, Braulio; Martinez, Luis; Croft, Bryan; Lange, Douglas S.

    2017-05-01

    Amplifying human ability for controlling complex environments featuring autonomous units can be aided by learned models of human and system performance. In developing a command and control system that allows a small number of people to control a large number of autonomous teams, we employ an autonomics framework to manage the networks that represent mission plans and the networks that are composed of human controllers and their autonomous assistants. Machine learning allows us to build models of human and system performance useful for monitoring plans and managing human attention and task loads. Machine learning also aids in the development of tactics that human supervisors can successfully monitor through the command and control system.

  17. Classification of Strawberry Fruit Shape by Machine Learning

    NASA Astrophysics Data System (ADS)

    Ishikawa, T.; Hayashi, A.; Nagamatsu, S.; Kyutoku, Y.; Dan, I.; Wada, T.; Oku, K.; Saeki, Y.; Uto, T.; Tanabata, T.; Isobe, S.; Kochi, N.

    2018-05-01

    Shape is one of the most important traits of agricultural products due to its relationships with the quality, quantity, and value of the products. For strawberries, the nine types of fruit shape were defined and classified by humans based on the sampler patterns of the nine types. In this study, we tested the classification of strawberry shapes by machine learning in order to increase the accuracy of the classification, and we introduce the concept of computerization into this field. Four types of descriptors were extracted from the digital images of strawberries: (1) the Measured Values (MVs) including the length of the contour line, the area, the fruit length and width, and the fruit width/length ratio; (2) the Ellipse Similarity Index (ESI); (3) Elliptic Fourier Descriptors (EFDs), and (4) Chain Code Subtraction (CCS). We used these descriptors for the classification test along with the random forest approach, and eight of the nine shape types were classified with combinations of MVs + CCS + EFDs. CCS is a descriptor that adds human knowledge to the chain codes, and it showed higher robustness in classification than the other descriptors. Our results suggest machine learning's high ability to classify fruit shapes accurately. We will attempt to increase the classification accuracy and apply the machine learning methods to other plant species.

  18. A strategy to apply machine learning to small datasets in materials science

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Ling, Chen

    2018-12-01

    There is growing interest in applying machine learning techniques in the research of materials science. However, although it is recognized that materials datasets are typically smaller and sometimes more diverse compared to other fields, the influence of availability of materials data on training machine learning models has not yet been studied, which prevents the possibility to establish accurate predictive rules using small materials datasets. Here we analyzed the fundamental interplay between the availability of materials data and the predictive capability of machine learning models. Instead of affecting the model precision directly, the effect of data size is mediated by the degree of freedom (DoF) of model, resulting in the phenomenon of association between precision and DoF. The appearance of precision-DoF association signals the issue of underfitting and is characterized by large bias of prediction, which consequently restricts the accurate prediction in unknown domains. We proposed to incorporate the crude estimation of property in the feature space to establish ML models using small sized materials data, which increases the accuracy of prediction without the cost of higher DoF. In three case studies of predicting the band gap of binary semiconductors, lattice thermal conductivity, and elastic properties of zeolites, the integration of crude estimation effectively boosted the predictive capability of machine learning models to state-of-art levels, demonstrating the generality of the proposed strategy to construct accurate machine learning models using small materials dataset.

  19. Classification of large-sized hyperspectral imagery using fast machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Xia, Junshi; Yokoya, Naoto; Iwasaki, Akira

    2017-07-01

    We present a framework of fast machine learning algorithms in the context of large-sized hyperspectral images classification from the theoretical to a practical viewpoint. In particular, we assess the performance of random forest (RF), rotation forest (RoF), and extreme learning machine (ELM) and the ensembles of RF and ELM. These classifiers are applied to two large-sized hyperspectral images and compared to the support vector machines. To give the quantitative analysis, we pay attention to comparing these methods when working with high input dimensions and a limited/sufficient training set. Moreover, other important issues such as the computational cost and robustness against the noise are also discussed.

  20. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    PubMed

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.

  1. 3D Visualization of Machine Learning Algorithms with Astronomical Data

    NASA Astrophysics Data System (ADS)

    Kent, Brian R.

    2016-01-01

    We present innovative machine learning (ML) methods using unsupervised clustering with minimum spanning trees (MSTs) to study 3D astronomical catalogs. Utilizing Python code to build trees based on galaxy catalogs, we can render the results with the visualization suite Blender to produce interactive 360 degree panoramic videos. The catalogs and their ML results can be explored in a 3D space using mobile devices, tablets or desktop browsers. We compare the statistics of the MST results to a number of machine learning methods relating to optimization and efficiency.

  2. Finding New Perovskite Halides via Machine learning

    NASA Astrophysics Data System (ADS)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  3. Finding new perovskite halides via machine learning

    DOE PAGES

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; ...

    2016-04-26

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach toward rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning, henceforth referred to as ML) via building a support vectormore » machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX 3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br, or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 185 experimentally known ABX 3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor, and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. As a result, the trained and validated models then predict, with a high degree of confidence, several novel ABX 3 compositions with perovskite crystal structure.« less

  4. Finding new perovskite halides via machine learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach toward rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning, henceforth referred to as ML) via building a support vectormore » machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX 3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br, or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 185 experimentally known ABX 3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor, and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. As a result, the trained and validated models then predict, with a high degree of confidence, several novel ABX 3 compositions with perovskite crystal structure.« less

  5. Toward accelerating landslide mapping with interactive machine learning techniques

    NASA Astrophysics Data System (ADS)

    Stumpf, André; Lachiche, Nicolas; Malet, Jean-Philippe; Kerle, Norman; Puissant, Anne

    2013-04-01

    Despite important advances in the development of more automated methods for landslide mapping from optical remote sensing images, the elaboration of inventory maps after major triggering events still remains a tedious task. Image classification with expert defined rules typically still requires significant manual labour for the elaboration and adaption of rule sets for each particular case. Machine learning algorithm, on the contrary, have the ability to learn and identify complex image patterns from labelled examples but may require relatively large amounts of training data. In order to reduce the amount of required training data active learning has evolved as key concept to guide the sampling for applications such as document classification, genetics and remote sensing. The general underlying idea of most active learning approaches is to initialize a machine learning model with a small training set, and to subsequently exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labelled by the user and added in the training set. With relatively few queries and labelled samples, an active learning strategy should ideally yield at least the same accuracy than an equivalent classifier trained with many randomly selected samples. Our study was dedicated to the development of an active learning approach for landslide mapping from VHR remote sensing images with special consideration of the spatial distribution of the samples. The developed approach is a region-based query heuristic that enables to guide the user attention towards few compact spatial batches rather than distributed points resulting in time savings of 50% and more compared to standard active learning techniques. The approach was tested with multi-temporal and multi-sensor satellite images capturing recent large scale triggering events in Brazil and China and demonstrated balanced user's and producer's accuracies between 74% and 80%. The assessment also

  6. Amp: A modular approach to machine learning in atomistic simulations

    NASA Astrophysics Data System (ADS)

    Khorshidi, Alireza; Peterson, Andrew A.

    2016-10-01

    Electronic structure calculations, such as those employing Kohn-Sham density functional theory or ab initio wavefunction theories, have allowed for atomistic-level understandings of a wide variety of phenomena and properties of matter at small scales. However, the computational cost of electronic structure methods drastically increases with length and time scales, which makes these methods difficult for long time-scale molecular dynamics simulations or large-sized systems. Machine-learning techniques can provide accurate potentials that can match the quality of electronic structure calculations, provided sufficient training data. These potentials can then be used to rapidly simulate large and long time-scale phenomena at similar quality to the parent electronic structure approach. Machine-learning potentials usually take a bias-free mathematical form and can be readily developed for a wide variety of systems. Electronic structure calculations have favorable properties-namely that they are noiseless and targeted training data can be produced on-demand-that make them particularly well-suited for machine learning. This paper discusses our modular approach to atomistic machine learning through the development of the open-source Atomistic Machine-learning Package (Amp), which allows for representations of both the total and atom-centered potential energy surface, in both periodic and non-periodic systems. Potentials developed through the atom-centered approach are simultaneously applicable for systems with various sizes. Interpolation can be enhanced by introducing custom descriptors of the local environment. We demonstrate this in the current work for Gaussian-type, bispectrum, and Zernike-type descriptors. Amp has an intuitive and modular structure with an interface through the python scripting language yet has parallelizable fortran components for demanding tasks; it is designed to integrate closely with the widely used Atomic Simulation Environment (ASE), which

  7. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies.

    PubMed

    Hansen, Katja; Montavon, Grégoire; Biegler, Franziska; Fazli, Siamac; Rupp, Matthias; Scheffler, Matthias; von Lilienfeld, O Anatole; Tkatchenko, Alexandre; Müller, Klaus-Robert

    2013-08-13

    The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given together with pitfalls and challenges when applying machine learning approaches to the prediction of quantum-mechanical observables.

  8. Machine Learning for Detecting Gene-Gene Interactions

    PubMed Central

    McKinney, Brett A.; Reif, David M.; Ritchie, Marylyn D.; Moore, Jason H.

    2011-01-01

    Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are ‘the norm’ and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics. PMID:16722772

  9. A comparative study of machine learning models for ethnicity classification

    NASA Astrophysics Data System (ADS)

    Trivedi, Advait; Bessie Amali, D. Geraldine

    2017-11-01

    This paper endeavours to adopt a machine learning approach to solve the problem of ethnicity recognition. Ethnicity identification is an important vision problem with its use cases being extended to various domains. Despite the multitude of complexity involved, ethnicity identification comes naturally to humans. This meta information can be leveraged to make several decisions, be it in target marketing or security. With the recent development of intelligent systems a sub module to efficiently capture ethnicity would be useful in several use cases. Several attempts to identify an ideal learning model to represent a multi-ethnic dataset have been recorded. A comparative study of classifiers such as support vector machines, logistic regression has been documented. Experimental results indicate that the logical classifier provides a much accurate classification than the support vector machine.

  10. Sequential Nonlinear Learning for Distributed Multiagent Systems via Extreme Learning Machines.

    PubMed

    Vanli, Nuri Denizcan; Sayin, Muhammed O; Delibalta, Ibrahim; Kozat, Suleyman Serdar

    2017-03-01

    We study online nonlinear learning over distributed multiagent systems, where each agent employs a single hidden layer feedforward neural network (SLFN) structure to sequentially minimize arbitrary loss functions. In particular, each agent trains its own SLFN using only the data that is revealed to itself. On the other hand, the aim of the multiagent system is to train the SLFN at each agent as well as the optimal centralized batch SLFN that has access to all the data, by exchanging information between neighboring agents. We address this problem by introducing a distributed subgradient-based extreme learning machine algorithm. The proposed algorithm provides guaranteed upper bounds on the performance of the SLFN at each agent and shows that each of these individual SLFNs asymptotically achieves the performance of the optimal centralized batch SLFN. Our performance guarantees explicitly distinguish the effects of data- and network-dependent parameters on the convergence rate of the proposed algorithm. The experimental results illustrate that the proposed algorithm achieves the oracle performance significantly faster than the state-of-the-art methods in the machine learning and signal processing literature. Hence, the proposed method is highly appealing for the applications involving big data.

  11. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods, or deploy algorithms dedicated to infer missing genotypes. In this research the performance of eight machine learning methods: Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost compared in terms of the imputation accuracy, computation time and the factors affecting imputation accuracy. The methods employed using real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. The Random Forest and Support Vector Machine were more accurate than the other machine learning methods. The TotalBoost performed slightly worse than the other methods.The running times were different between methods. The ELM was always most fast algorithm. In case of increasing the sample size, the RBF requires long imputation time.The tested methods in this research can be an alternative for imputation of un-typed SNPs in low missing rate of data. However, it is recommended that other machine learning methods to be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Machine learning challenges in Mars rover traverse science

    NASA Technical Reports Server (NTRS)

    Castano, R.; Judd, M.; Anderson, R. C.; Estlin, T.

    2003-01-01

    The successful implementation of machine learning in autonomous rover traverse science requires addressing challenges that range from the analytical technical realm, to the fuzzy, philosophical domain of entrenched belief systems within scientists and mission managers.

  13. Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods

    PubMed Central

    Stone, Bryan L; Johnson, Michael D; Tarczy-Hornoch, Peter; Wilcox, Adam B; Mooney, Sean D; Sheng, Xiaoming; Haug, Peter J; Nkoy, Flory L

    2017-01-01

    Background To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. Health care researchers can work with data scientists with deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage in the United States of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient’s weight kept rising in the past year). This process becomes infeasible with limited budgets. Objective This study’s goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data. Methods This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new

  14. Machine Learning in Medical Imaging.

    PubMed

    Giger, Maryellen L

    2018-03-01

    Advances in both imaging and computers have synergistically led to a rapid rise in the potential use of artificial intelligence in various radiological imaging tasks, such as risk assessment, detection, diagnosis, prognosis, and therapy response, as well as in multi-omics disease discovery. A brief overview of the field is given here, allowing the reader to recognize the terminology, the various subfields, and components of machine learning, as well as the clinical potential. Radiomics, an expansion of computer-aided diagnosis, has been defined as the conversion of images to minable data. The ultimate benefit of quantitative radiomics is to (1) yield predictive image-based phenotypes of disease for precision medicine or (2) yield quantitative image-based phenotypes for data mining with other -omics for discovery (ie, imaging genomics). For deep learning in radiology to succeed, note that well-annotated large data sets are needed since deep networks are complex, computer software and hardware are evolving constantly, and subtle differences in disease states are more difficult to perceive than differences in everyday objects. In the future, machine learning in radiology is expected to have a substantial clinical impact with imaging examinations being routinely obtained in clinical practice, providing an opportunity to improve decision support in medical image interpretation. The term of note is decision support, indicating that computers will augment human decision making, making it more effective and efficient. The clinical impact of having computers in the routine clinical practice may allow radiologists to further integrate their knowledge with their clinical colleagues in other medical specialties and allow for precision medicine. Copyright © 2018. Published by Elsevier Inc.

  15. Biomarkers for Musculoskeletal Pain Conditions: Use of Brain Imaging and Machine Learning.

    PubMed

    Boissoneault, Jeff; Sevel, Landrew; Letzen, Janelle; Robinson, Michael; Staud, Roland

    2017-01-01

    Chronic musculoskeletal pain condition often shows poor correlations between tissue abnormalities and clinical pain. Therefore, classification of pain conditions like chronic low back pain, osteoarthritis, and fibromyalgia depends mostly on self report and less on objective findings like X-ray or magnetic resonance imaging (MRI) changes. However, recent advances in structural and functional brain imaging have identified brain abnormalities in chronic pain conditions that can be used for illness classification. Because the analysis of complex and multivariate brain imaging data is challenging, machine learning techniques have been increasingly utilized for this purpose. The goal of machine learning is to train specific classifiers to best identify variables of interest on brain MRIs (i.e., biomarkers). This report describes classification techniques capable of separating MRI-based brain biomarkers of chronic pain patients from healthy controls with high accuracy (70-92%) using machine learning, as well as critical scientific, practical, and ethical considerations related to their potential clinical application. Although self-report remains the gold standard for pain assessment, machine learning may aid in the classification of chronic pain disorders like chronic back pain and fibromyalgia as well as provide mechanistic information regarding their neural correlates.

  16. Smarter Instruments, Smarter Archives: Machine Learning for Tactical Science

    NASA Astrophysics Data System (ADS)

    Thompson, D. R.; Kiran, R.; Allwood, A.; Altinok, A.; Estlin, T.; Flannery, D.

    2014-12-01

    There has been a growing interest by Earth and Planetary Sciences in machine learning, visualization and cyberinfrastructure to interpret ever-increasing volumes of instrument data. Such tools are commonly used to analyze archival datasets, but they can also play a valuable real-time role during missions. Here we discuss ways that machine learning can benefit tactical science decisions during Earth and Planetary Exploration. Machine learning's potential begins at the instrument itself. Smart instruments endowed with pattern recognition can immediately recognize science features of interest. This allows robotic explorers to optimize their limited communications bandwidth, triaging science products and prioritizing the most relevant data. Smart instruments can also target their data collection on the fly, using principles of experimental design to reduce redundancy and generally improve sampling efficiency for time-limited operations. Moreover, smart instruments can respond immediately to transient or unexpected phenomena. Examples include detections of cometary plumes, terrestrial floods, or volcanism. We show recent examples of smart instruments from 2014 tests including: aircraft and spacecraft remote sensing instruments that recognize cloud contamination, field tests of a "smart camera" for robotic surface geology, and adaptive data collection by X-Ray fluorescence spectrometers. Machine learning can also assist human operators when tactical decision making is required. Terrestrial scenarios include airborne remote sensing, where the decision to re-fly a transect must be made immediately. Planetary scenarios include deep space encounters or planetary surface exploration, where the number of command cycles is limited and operators make rapid daily decisions about where next to collect measurements. Visualization and modeling can reveal trends, clusters, and outliers in new data. This can help operators recognize instrument artifacts or spot anomalies in real time

  17. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.

    PubMed

    Huang, Shujun; Cai, Nianguang; Pacheco, Pedro Penzuti; Narrandes, Shavira; Wang, Yang; Xu, Wayne

    2018-01-01

    Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.

  18. Integrating Machine Learning into Space Operations

    NASA Astrophysics Data System (ADS)

    Kelly, K. G.

    There are significant challenges with managing activities in space, which for the scope of this paper are primarily the identification of objects in orbit, maintaining accurate estimates of the orbits of those objects, detecting changes to those orbits, warning of possible collisions between objects and detection of anomalous behavior. The challenges come from the large amounts of data to be processed, which is often incomplete and noisy, limitations on the ability to influence objects in space and the overall strategic importance of space to national interests. The focus of this paper is on defining an approach to leverage the improved capabilities that are possible using state of the art machine learning in a way that empowers operations personnel without sacrificing the security and mission assurance associated with manual operations performed by trained personnel. There has been significant research in the development of algorithms and techniques for applying machine learning in this domain, but deploying new techniques into such a mission critical domain is difficult and time consuming. Establishing a common framework could improve the efficiency with which new techniques are integrated into operations and the overall effectiveness at providing improvements.

  19. Machine Learning Interface for Medical Image Analysis.

    PubMed

    Zhang, Yi C; Kagen, Alexander C

    2017-10-01

    TensorFlow is a second-generation open-source machine learning software library with a built-in framework for implementing neural networks in wide variety of perceptual tasks. Although TensorFlow usage is well established with computer vision datasets, the TensorFlow interface with DICOM formats for medical imaging remains to be established. Our goal is to extend the TensorFlow API to accept raw DICOM images as input; 1513 DaTscan DICOM images were obtained from the Parkinson's Progression Markers Initiative (PPMI) database. DICOM pixel intensities were extracted and shaped into tensors, or n-dimensional arrays, to populate the training, validation, and test input datasets for machine learning. A simple neural network was constructed in TensorFlow to classify images into normal or Parkinson's disease groups. Training was executed over 1000 iterations for each cross-validation set. The gradient descent optimization and Adagrad optimization algorithms were used to minimize cross-entropy between the predicted and ground-truth labels. Cross-validation was performed ten times to produce a mean accuracy of 0.938 ± 0.047 (95 % CI 0.908-0.967). The mean sensitivity was 0.974 ± 0.043 (95 % CI 0.947-1.00) and mean specificity was 0.822 ± 0.207 (95 % CI 0.694-0.950). We extended the TensorFlow API to enable DICOM compatibility in the context of DaTscan image analysis. We implemented a neural network classifier that produces diagnostic accuracies on par with excellent results from previous machine learning models. These results indicate the potential role of TensorFlow as a useful adjunct diagnostic tool in the clinical setting.

  20. Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques.

    PubMed

    Wang, Guanjin; Lam, Kin-Man; Deng, Zhaohong; Choi, Kup-Sze

    2015-08-01

    Bladder cancer is a common cancer in genitourinary malignancy. For muscle invasive bladder cancer, surgical removal of the bladder, i.e. radical cystectomy, is in general the definitive treatment which, unfortunately, carries significant morbidities and mortalities. Accurate prediction of the mortality of radical cystectomy is therefore needed. Statistical methods have conventionally been used for this purpose, despite the complex interactions of high-dimensional medical data. Machine learning has emerged as a promising technique for handling high-dimensional data, with increasing application in clinical decision support, e.g. cancer prediction and prognosis. Its ability to reveal the hidden nonlinear interactions and interpretable rules between dependent and independent variables is favorable for constructing models of effective generalization performance. In this paper, seven machine learning methods are utilized to predict the 5-year mortality of radical cystectomy, including back-propagation neural network (BPN), radial basis function (RBFN), extreme learning machine (ELM), regularized ELM (RELM), support vector machine (SVM), naive Bayes (NB) classifier and k-nearest neighbour (KNN), on a clinicopathological dataset of 117 patients of the urology unit of a hospital in Hong Kong. The experimental results indicate that RELM achieved the highest average prediction accuracy of 0.8 at a fast learning speed. The research findings demonstrate the potential of applying machine learning techniques to support clinical decision making. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Vowel Imagery Decoding toward Silent Speech BCI Using Extreme Learning Machine with Electroencephalogram

    PubMed Central

    Kim, Jongin; Park, Hyeong-jun

    2016-01-01

    The purpose of this study is to classify EEG data on imagined speech in a single trial. We recorded EEG data while five subjects imagined different vowels, /a/, /e/, /i/, /o/, and /u/. We divided each single trial dataset into thirty segments and extracted features (mean, variance, standard deviation, and skewness) from all segments. To reduce the dimension of the feature vector, we applied a feature selection algorithm based on the sparse regression model. These features were classified using a support vector machine with a radial basis function kernel, an extreme learning machine, and two variants of an extreme learning machine with different kernels. Because each single trial consisted of thirty segments, our algorithm decided the label of the single trial by selecting the most frequent output among the outputs of the thirty segments. As a result, we observed that the extreme learning machine and its variants achieved better classification rates than the support vector machine with a radial basis function kernel and linear discrimination analysis. Thus, our results suggested that EEG responses to imagined speech could be successfully classified in a single trial using an extreme learning machine with a radial basis function and linear kernel. This study with classification of imagined speech might contribute to the development of silent speech BCI systems. PMID:28097128

  2. Quantum adiabatic machine learning

    NASA Astrophysics Data System (ADS)

    Pudenz, Kristen L.; Lidar, Daniel A.

    2013-05-01

    We develop an approach to machine learning and anomaly detection via quantum adiabatic evolution. This approach consists of two quantum phases, with some amount of classical preprocessing to set up the quantum problems. In the training phase we identify an optimal set of weak classifiers, to form a single strong classifier. In the testing phase we adiabatically evolve one or more strong classifiers on a superposition of inputs in order to find certain anomalous elements in the classification space. Both the training and testing phases are executed via quantum adiabatic evolution. All quantum processing is strictly limited to two-qubit interactions so as to ensure physical feasibility. We apply and illustrate this approach in detail to the problem of software verification and validation, with a specific example of the learning phase applied to a problem of interest in flight control systems. Beyond this example, the algorithm can be used to attack a broad class of anomaly detection problems.

  3. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models tomore » curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.« less

  4. Photometric Supernova Classification with Machine Learning

    NASA Astrophysics Data System (ADS)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  5. Machine learning search for variable stars

    NASA Astrophysics Data System (ADS)

    Pashchenko, Ilya N.; Sokolovsky, Kirill V.; Gavras, Panagiotis

    2018-04-01

    Photometric variability detection is often considered as a hypothesis testing problem: an object is variable if the null hypothesis that its brightness is constant can be ruled out given the measurements and their uncertainties. The practical applicability of this approach is limited by uncorrected systematic errors. We propose a new variability detection technique sensitive to a wide range of variability types while being robust to outliers and underestimated measurement uncertainties. We consider variability detection as a classification problem that can be approached with machine learning. Logistic Regression (LR), Support Vector Machines (SVM), k Nearest Neighbours (kNN), Neural Nets (NN), Random Forests (RF), and Stochastic Gradient Boosting classifier (SGB) are applied to 18 features (variability indices) quantifying scatter and/or correlation between points in a light curve. We use a subset of Optical Gravitational Lensing Experiment phase two (OGLE-II) Large Magellanic Cloud (LMC) photometry (30 265 light curves) that was searched for variability using traditional methods (168 known variable objects) as the training set and then apply the NN to a new test set of 31 798 OGLE-II LMC light curves. Among 205 candidates selected in the test set, 178 are real variables, while 13 low-amplitude variables are new discoveries. The machine learning classifiers considered are found to be more efficient (select more variables and fewer false candidates) compared to traditional techniques using individual variability indices or their linear combination. The NN, SGB, SVM, and RF show a higher efficiency compared to LR and kNN.

  6. Use of Advanced Machine-Learning Techniques for Non-Invasive Monitoring of Hemorrhage

    DTIC Science & Technology

    2010-04-01

    that state-of-the-art machine learning techniques when integrated with novel non-invasive monitoring technologies could detect subtle, physiological...decompensation. Continuous, non-invasively measured hemodynamic signals (e.g., ECG, blood pressures, stroke volume) were used for the development of machine ... learning algorithms. Accuracy estimates were obtained by building models using 27 subjects and testing on the 28th. This process was repeated 28 times

  7. Mining the Galaxy Zoo Database: Machine Learning Applications

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.

    2010-01-01

    The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we are also aiming to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunter-provided classifications). Our second case study will focus on similar data mining analyses and machine leaning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years -- the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.

  8. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines

    PubMed Central

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-01-01

    The impact-echo (IE) method is a popular non-destructive testing (NDT) technique widely used for measuring the thickness of plate-like structures and for detecting certain defects inside concrete elements or structures. However, the IE method is not effective for full condition assessment (i.e., defect detection, defect diagnosis, defect sizing and location), because the simple frequency spectrum analysis involved in the existing IE method is not sufficient to capture the IE signal patterns associated with different conditions. In this paper, we attempt to enhance the IE technique and enable it for full condition assessment of concrete elements by introducing advanced machine learning techniques for performing comprehensive analysis and pattern recognition of IE signals. Specifically, we use wavelet decomposition for extracting signatures or features out of the raw IE signals and apply extreme learning machine, one of the recently developed machine learning techniques, as classification models for full condition assessment. To validate the capabilities of the proposed method, we build a number of specimens with various types, sizes, and locations of defects and perform IE testing on these specimens in a lab environment. Based on analysis of the collected IE signals using the proposed machine learning based IE method, we demonstrate that the proposed method is effective in performing full condition assessment of concrete elements or structures. PMID:27023563

  9. Image analysis and machine learning for detecting malaria.

    PubMed

    Poostchi, Mahdieh; Silamut, Kamolrat; Maude, Richard J; Jaeger, Stefan; Thoma, George

    2018-04-01

    Malaria remains a major burden on global health, with roughly 200 million cases worldwide and more than 400,000 deaths per year. Besides biomedical research and political efforts, modern information technology is playing a key role in many attempts at fighting the disease. One of the barriers toward a successful mortality reduction has been inadequate malaria diagnosis in particular. To improve diagnosis, image analysis software and machine learning methods have been used to quantify parasitemia in microscopic blood slides. This article gives an overview of these techniques and discusses the current developments in image analysis and machine learning for microscopic malaria diagnosis. We organize the different approaches published in the literature according to the techniques used for imaging, image preprocessing, parasite detection and cell segmentation, feature computation, and automatic cell classification. Readers will find the different techniques listed in tables, with the relevant articles cited next to them, for both thin and thick blood smear images. We also discussed the latest developments in sections devoted to deep learning and smartphone technology for future malaria diagnosis. Published by Elsevier Inc.

  10. Machine learning-based methods for prediction of linear B-cell epitopes.

    PubMed

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.

  11. Image Reconstruction is a New Frontier of Machine Learning.

    PubMed

    Wang, Ge; Ye, Jong Chu; Mueller, Klaus; Fessler, Jeffrey A

    2018-06-01

    Over past several years, machine learning, or more generally artificial intelligence, has generated overwhelming research interest and attracted unprecedented public attention. As tomographic imaging researchers, we share the excitement from our imaging perspective [item 1) in the Appendix], and organized this special issue dedicated to the theme of "Machine learning for image reconstruction." This special issue is a sister issue of the special issue published in May 2016 of this journal with the theme "Deep learning in medical imaging" [item 2) in the Appendix]. While the previous special issue targeted medical image processing/analysis, this special issue focuses on data-driven tomographic reconstruction. These two special issues are highly complementary, since image reconstruction and image analysis are two of the main pillars for medical imaging. Together we cover the whole workflow of medical imaging: from tomographic raw data/features to reconstructed images and then extracted diagnostic features/readings.

  12. Formation enthalpies for transition metal alloys using machine learning

    NASA Astrophysics Data System (ADS)

    Ubaru, Shashanka; Miedlar, Agnieszka; Saad, Yousef; Chelikowsky, James R.

    2017-06-01

    The enthalpy of formation is an important thermodynamic property. Developing fast and accurate methods for its prediction is of practical interest in a variety of applications. Material informatics techniques based on machine learning have recently been introduced in the literature as an inexpensive means of exploiting materials data, and can be used to examine a variety of thermodynamics properties. We investigate the use of such machine learning tools for predicting the formation enthalpies of binary intermetallic compounds that contain at least one transition metal. We consider certain easily available properties of the constituting elements complemented by some basic properties of the compounds, to predict the formation enthalpies. We show how choosing these properties (input features) based on a literature study (using prior physics knowledge) seems to outperform machine learning based feature selection methods such as sensitivity analysis and LASSO (least absolute shrinkage and selection operator) based methods. A nonlinear kernel based support vector regression method is employed to perform the predictions. The predictive ability of our model is illustrated via several experiments on a dataset containing 648 binary alloys. We train and validate the model using the formation enthalpies calculated using a model by Miedema, which is a popular semiempirical model used for the prediction of formation enthalpies of metal alloys.

  13. Bio-Inspired Human-Level Machine Learning

    DTIC Science & Technology

    2015-10-25

    extensions to high-level cognitive functions such as anagram solving problem. 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF...extensions to high-level cognitive functions such as anagram solving problem. We expect that the bio-inspired human-level machine learning combined with...numbers of 1011 neurons and 1014 synaptic connections in the human brain. In previous work, we experimentally demonstrated the feasibility of cognitive

  14. A machine learning approach for viral genome classification.

    PubMed

    Remita, Mohamed Amine; Halioui, Ahmed; Malick Diouara, Abou Abdallah; Daigle, Bruno; Kiani, Golrokh; Diallo, Abdoulaye Baniré

    2017-04-11

    Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes constitute important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for specific well-studied family of viruses. Thus, the viral comparative genomic studies could benefit from more generic, fast and accurate tools for classifying and typing newly sequenced strains of diverse virus families. Here, we introduce a virus classification platform, CASTOR, based on machine learning methods. CASTOR is inspired by a well-known technique in molecular biology: restriction fragment length polymorphism (RFLP). It simulates, in silico, the restriction digestion of genomic material by different enzymes into fragments. It uses two metrics to construct feature vectors for machine learning algorithms in the classification step. We benchmark CASTOR for the classification of distinct datasets of human papillomaviruses (HPV), hepatitis B viruses (HBV) and human immunodeficiency viruses type 1 (HIV-1). Results reveal true positive rates of 99%, 99% and 98% for HPV Alpha species, HBV genotyping and HIV-1 M subtyping, respectively. Furthermore, CASTOR shows a competitive performance compared to well-known HIV-1 specific classifiers (REGA and COMET) on whole genomes and pol fragments. The performance of CASTOR, its genericity and robustness could permit to perform novel and accurate large scale virus studies. The CASTOR web platform provides an open access, collaborative and reproducible machine learning classifiers. CASTOR can be accessed at http://castor.bioinfo.uqam.ca .

  15. Deep learning of support vector machines with class probability output networks.

    PubMed

    Kim, Sangwook; Yu, Zhibin; Kil, Rhee Man; Lee, Minho

    2015-04-01

    Deep learning methods endeavor to learn features automatically at multiple levels and allow systems to learn complex functions mapping from the input space to the output space for the given data. The ability to learn powerful features automatically is increasingly important as the volume of data and range of applications of machine learning methods continues to grow. This paper proposes a new deep architecture that uses support vector machines (SVMs) with class probability output networks (CPONs) to provide better generalization power for pattern classification problems. As a result, deep features are extracted without additional feature engineering steps, using multiple layers of the SVM classifiers with CPONs. The proposed structure closely approaches the ideal Bayes classifier as the number of layers increases. Using a simulation of classification problems, the effectiveness of the proposed method is demonstrated. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

    PubMed Central

    Belo, David; Gamboa, Hugo

    2017-01-01

    The paper presents results of machine learning approach accuracy applied analysis of cardiac activity. The study evaluates the diagnostics possibilities of the arterial hypertension by means of the short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from the arterial hypertension of II-III degree. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with radial basis, decision trees, and naive Bayes classifier. Moreover, in the study, different methods of feature extraction are analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. Investigation results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of noncorrelated feature set search achieved higher results than data set based on the principal components. PMID:28831239

  17. A Comparison of Machine Learning Approaches for Corn Yield Estimation

    NASA Astrophysics Data System (ADS)

    Kim, N.; Lee, Y. W.

    2017-12-01

    Machine learning is an efficient empirical method for classification and prediction, and it is another approach to crop yield estimation. The objective of this study is to estimate corn yield in the Midwestern United States by employing the machine learning approaches such as the support vector machine (SVM), random forest (RF), and deep neural networks (DNN), and to perform the comprehensive comparison for their results. We constructed the database using satellite images from MODIS, the climate data of PRISM climate group, and GLDAS soil moisture data. In addition, to examine the seasonal sensitivities of corn yields, two period groups were set up: May to September (MJJAS) and July and August (JA). In overall, the DNN showed the highest accuracies in term of the correlation coefficient for the two period groups. The differences between our predictions and USDA yield statistics were about 10-11 %.

  18. Machine learning algorithms to classify spinal muscular atrophy subtypes.

    PubMed

    Srivastava, Tuhin; Darras, Basil T; Wu, Jim S; Rutkove, Seward B

    2012-07-24

    The development of better biomarkers for disease assessment remains an ongoing effort across the spectrum of neurologic illnesses. One approach for refining biomarkers is based on the concept of machine learning, in which individual, unrelated biomarkers are simultaneously evaluated. In this cross-sectional study, we assess the possibility of using machine learning, incorporating both quantitative muscle ultrasound (QMU) and electrical impedance myography (EIM) data, for classification of muscles affected by spinal muscular atrophy (SMA). Twenty-one normal subjects, 15 subjects with SMA type 2, and 10 subjects with SMA type 3 underwent EIM and QMU measurements of unilateral biceps, wrist extensors, quadriceps, and tibialis anterior. EIM and QMU parameters were then applied in combination using a support vector machine (SVM), a type of machine learning, in an attempt to accurately categorize 165 individual muscles. For all 3 classification problems, normal vs SMA, normal vs SMA 3, and SMA 2 vs SMA 3, use of SVM provided the greatest accuracy in discrimination, surpassing both EIM and QMU individually. For example, the accuracy, as measured by the receiver operating characteristic area under the curve (ROC-AUC) for the SVM discriminating SMA 2 muscles from SMA 3 muscles was 0.928; in comparison, the ROC-AUCs for EIM and QMU parameters alone were only 0.877 (p < 0.05) and 0.627 (p < 0.05), respectively. Combining EIM and QMU data categorizes individual SMA-affected muscles with very high accuracy. Further investigation of this approach for classifying and for following the progression of neuromuscular illness is warranted.

  19. Quantum machine learning: a classical perspective

    NASA Astrophysics Data System (ADS)

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.

  20. Quantum machine learning: a classical perspective

    PubMed Central

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed. PMID:29434508

  1. Quantum machine learning: a classical perspective.

    PubMed

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.

  2. Development of machine learning models for diagnosis of glaucoma.

    PubMed

    Kim, Seong Jae; Cho, Kyong Jin; Oh, Sejong

    2017-01-01

    The study aimed to develop machine learning models that have strong prediction power and interpretability for diagnosis of glaucoma based on retinal nerve fiber layer (RNFL) thickness and visual field (VF). We collected various candidate features from the examination of retinal nerve fiber layer (RNFL) thickness and visual field (VF). We also developed synthesized features from original features. We then selected the best features proper for classification (diagnosis) through feature evaluation. We used 100 cases of data as a test dataset and 399 cases of data as a training and validation dataset. To develop the glaucoma prediction model, we considered four machine learning algorithms: C5.0, random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). We repeatedly composed a learning model using the training dataset and evaluated it by using the validation dataset. Finally, we got the best learning model that produces the highest validation accuracy. We analyzed quality of the models using several measures. The random forest model shows best performance and C5.0, SVM, and KNN models show similar accuracy. In the random forest model, the classification accuracy is 0.98, sensitivity is 0.983, specificity is 0.975, and AUC is 0.979. The developed prediction models show high accuracy, sensitivity, specificity, and AUC in classifying among glaucoma and healthy eyes. It will be used for predicting glaucoma against unknown examination records. Clinicians may reference the prediction results and be able to make better decisions. We may combine multiple learning models to increase prediction accuracy. The C5.0 model includes decision rules for prediction. It can be used to explain the reasons for specific predictions.

  3. Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View.

    PubMed

    Luo, Wei; Phung, Dinh; Tran, Truyen; Gupta, Sunil; Rana, Santu; Karmakar, Chandan; Shilton, Alistair; Yearwood, John; Dimitrova, Nevenka; Ho, Tu Bao; Venkatesh, Svetha; Berk, Michael

    2016-12-16

    As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community. ©Wei Luo, Dinh Phung, Truyen Tran, Sunil Gupta, Santu Rana, Chandan Karmakar, Alistair Shilton, John Yearwood, Nevenka Dimitrova, Tu Bao Ho, Svetha Venkatesh, Michael Berk. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 16.12.2016.

  4. Relationships Between the External and Internal Training Load in Professional Soccer: What Can We Learn From Machine Learning?

    PubMed

    Jaspers, Arne; De Beéck, Tim Op; Brink, Michel S; Frencken, Wouter G P; Staes, Filip; Davis, Jesse J; Helsen, Werner F

    2018-05-01

    Machine learning may contribute to understanding the relationship between the external load and internal load in professional soccer. Therefore, the relationship between external load indicators (ELIs) and the rating of perceived exertion (RPE) was examined using machine learning techniques on a group and individual level. Training data were collected from 38 professional soccer players over 2 seasons. The external load was measured using global positioning system technology and accelerometry. The internal load was obtained using the RPE. Predictive models were constructed using 2 machine learning techniques, artificial neural networks and least absolute shrinkage and selection operator (LASSO) models, and 1 naive baseline method. The predictions were based on a large set of ELIs. Using each technique, 1 group model involving all players and 1 individual model for each player were constructed. These models' performance on predicting the reported RPE values for future training sessions was compared with the naive baseline's performance. Both the artificial neural network and LASSO models outperformed the baseline. In addition, the LASSO model made more accurate predictions for the RPE than did the artificial neural network model. Furthermore, decelerations were identified as important ELIs. Regardless of the applied machine learning technique, the group models resulted in equivalent or better predictions for the reported RPE values than the individual models. Machine learning techniques may have added value in predicting RPE for future sessions to optimize training design and evaluation. These techniques may also be used in conjunction with expert knowledge to select key ELIs for load monitoring.

  5. Application of Machine Learning to Proteomics Data: Classification and Biomarker Identification in Postgenomics Biology

    PubMed Central

    Swan, Anna Louise; Mobasheri, Ali; Allaway, David; Liddell, Susan

    2013-01-01

    Abstract Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high throughput abilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways. First, directly to mass spectral peaks and second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also includes the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes. PMID:24116388

  6. Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods.

    PubMed

    Luo, Gang; Stone, Bryan L; Johnson, Michael D; Tarczy-Hornoch, Peter; Wilcox, Adam B; Mooney, Sean D; Sheng, Xiaoming; Haug, Peter J; Nkoy, Flory L

    2017-08-29

    To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. Health care researchers can work with data scientists with deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage in the United States of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets. This study's goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data. This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care

  7. A Machine Learning Approach to Student Modeling.

    DTIC Science & Technology

    1984-05-01

    machine learning , and describe ACN, a student modeling system that incorporates this approach. This system begins with a set of overly general rules, which it uses to search a problem space until it arrives at the same answer as the student. The ACM computer program then uses the solution path it has discovered to determine positive and negative instances of its initial rules, and employs a discrimination learning mechanism to place additional conditions on these rules. The revised rules will reproduce the solution path without search, and constitute a cognitive model of

  8. Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries.

    PubMed

    Ma, Xiao H; Jia, Jia; Zhu, Feng; Xue, Ying; Li, Ze R; Chen, Yu Z

    2009-05-01

    Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds of specific pharmacodynamic, pharmacokinetic or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability in predicting compounds of diverse structures and complex structure-activity relationships without requiring the knowledge of target 3D structure. This article reviews current progresses in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performances of machine learning tools with those of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility to improve the performance of machine learning methods in screening large libraries is discussed.

  9. Machine learning applications in proteomics research: how the past can boost the future.

    PubMed

    Kelchtermans, Pieter; Bittremieux, Wout; De Grave, Kurt; Degroeve, Sven; Ramon, Jan; Laukens, Kris; Valkenborg, Dirk; Barsnes, Harald; Martens, Lennart

    2014-03-01

    Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn solving a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. We here therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Daily sea level prediction at Chiayi coast, Taiwan using extreme learning machine and relevance vector machine

    NASA Astrophysics Data System (ADS)

    Imani, Moslem; Kao, Huan-Chin; Lan, Wen-Hau; Kuo, Chung-Yen

    2018-02-01

    The analysis and the prediction of sea level fluctuations are core requirements of marine meteorology and operational oceanography. Estimates of sea level with hours-to-days warning times are especially important for low-lying regions and coastal zone management. The primary purpose of this study is to examine the applicability and capability of extreme learning machine (ELM) and relevance vector machine (RVM) models for predicting sea level variations and compare their performances with powerful machine learning methods, namely, support vector machine (SVM) and radial basis function (RBF) models. The input dataset from the period of January 2004 to May 2011 used in the study was obtained from the Dongshi tide gauge station in Chiayi, Taiwan. Results showed that the ELM and RVM models outperformed the other methods. The performance of the RVM approach was superior in predicting the daily sea level time series given the minimum root mean square error of 34.73 mm and the maximum determination coefficient of 0.93 (R2) during the testing periods. Furthermore, the obtained results were in close agreement with the original tide-gauge data, which indicates that RVM approach is a promising alternative method for time series prediction and could be successfully used for daily sea level forecasts.

  11. Machine learning approach for the outcome prediction of temporal lobe epilepsy surgery.

    PubMed

    Armañanzas, Rubén; Alonso-Nanclares, Lidia; Defelipe-Oroquieta, Jesús; Kastanauskaite, Asta; de Sola, Rafael G; Defelipe, Javier; Bielza, Concha; Larrañaga, Pedro

    2013-01-01

    Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE). Nevertheless, a significant proportion of these patients continue suffering seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose to include these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery.

  12. Machine Learning Approach for the Outcome Prediction of Temporal Lobe Epilepsy Surgery

    PubMed Central

    DeFelipe-Oroquieta, Jesús; Kastanauskaite, Asta; de Sola, Rafael G.; DeFelipe, Javier; Bielza, Concha; Larrañaga, Pedro

    2013-01-01

    Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE). Nevertheless, a significant proportion of these patients continue suffering seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose to include these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery. PMID:23646148

  13. Improved Extreme Learning Machine based on the Sensitivity Analysis

    NASA Astrophysics Data System (ADS)

    Cui, Licheng; Zhai, Huawei; Wang, Benchao; Qu, Zengtang

    2018-03-01

    Extreme learning machine and its improved ones is weak in some points, such as computing complex, learning error and so on. After deeply analyzing, referencing the importance of hidden nodes in SVM, an novel analyzing method of the sensitivity is proposed which meets people’s cognitive habits. Based on these, an improved ELM is proposed, it could remove hidden nodes before meeting the learning error, and it can efficiently manage the number of hidden nodes, so as to improve the its performance. After comparing tests, it is better in learning time, accuracy and so on.

  14. Advanced methods in NDE using machine learning approaches

    NASA Astrophysics Data System (ADS)

    Wunderlich, Christian; Tschöpe, Constanze; Duckhorn, Frank

    2018-04-01

    Machine learning (ML) methods and algorithms have been applied recently with great success in quality control and predictive maintenance. Its goal to build new and/or leverage existing algorithms to learn from training data and give accurate predictions, or to find patterns, particularly with new and unseen similar data, fits perfectly to Non-Destructive Evaluation. The advantages of ML in NDE are obvious in such tasks as pattern recognition in acoustic signals or automated processing of images from X-ray, Ultrasonics or optical methods. Fraunhofer IKTS is using machine learning algorithms in acoustic signal analysis. The approach had been applied to such a variety of tasks in quality assessment. The principal approach is based on acoustic signal processing with a primary and secondary analysis step followed by a cognitive system to create model data. Already in the second analysis steps unsupervised learning algorithms as principal component analysis are used to simplify data structures. In the cognitive part of the software further unsupervised and supervised learning algorithms will be trained. Later the sensor signals from unknown samples can be recognized and classified automatically by the algorithms trained before. Recently the IKTS team was able to transfer the software for signal processing and pattern recognition to a small printed circuit board (PCB). Still, algorithms will be trained on an ordinary PC; however, trained algorithms run on the Digital Signal Processor and the FPGA chip. The identical approach will be used for pattern recognition in image analysis of OCT pictures. Some key requirements have to be fulfilled, however. A sufficiently large set of training data, a high signal-to-noise ratio, and an optimized and exact fixation of components are required. The automated testing can be done subsequently by the machine. By integrating the test data of many components along the value chain further optimization including lifetime and durability

  15. Anomaly detection for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, Ben; Rau, Markus Michael; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen

    2015-10-01

    We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantity. We select 2.5 million `clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 `anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed `anomaly-removed' sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.

  16. Improved Saturated Hydraulic Conductivity Pedotransfer Functions Using Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Araya, S. N.; Ghezzehei, T. A.

    2017-12-01

    Saturated hydraulic conductivity (Ks) is one of the fundamental hydraulic properties of soils. Its measurement, however, is cumbersome and instead pedotransfer functions (PTFs) are often used to estimate it. Despite a lot of progress over the years, generic PTFs that estimate hydraulic conductivity generally don't have a good performance. We develop significantly improved PTFs by applying state of the art machine learning techniques coupled with high-performance computing on a large database of over 20,000 soils—USKSAT and the Florida Soil Characterization databases. We compared the performance of four machine learning algorithms (k-nearest neighbors, gradient boosted model, support vector machine, and relevance vector machine) and evaluated the relative importance of several soil properties in explaining Ks. An attempt is also made to better account for soil structural properties; we evaluated the importance of variables derived from transformations of soil water retention characteristics and other soil properties. The gradient boosted models gave the best performance with root mean square errors less than 0.7 and mean errors in the order of 0.01 on a log scale of Ks [cm/h]. The effective particle size, D10, was found to be the single most important predictor. Other important predictors included percent clay, bulk density, organic carbon percent, coefficient of uniformity and values derived from water retention characteristics. Model performances were consistently better for Ks values greater than 10 cm/h. This study maximizes the extraction of information from a large database to develop generic machine learning based PTFs to estimate Ks. The study also evaluates the importance of various soil properties and their transformations in explaining Ks.

  17. Online learning control using adaptive critic designs with sparse kernel machines.

    PubMed

    Xu, Xin; Hou, Zhongsheng; Lian, Chuanqiang; He, Haibo

    2013-05-01

    In the past decade, adaptive critic designs (ACDs), including heuristic dynamic programming (HDP), dual heuristic programming (DHP), and their action-dependent ones, have been widely studied to realize online learning control of dynamical systems. However, because neural networks with manually designed features are commonly used to deal with continuous state and action spaces, the generalization capability and learning efficiency of previous ACDs still need to be improved. In this paper, a novel framework of ACDs with sparse kernel machines is presented by integrating kernel methods into the critic of ACDs. To improve the generalization capability as well as the computational efficiency of kernel machines, a sparsification method based on the approximately linear dependence analysis is used. Using the sparse kernel machines, two kernel-based ACD algorithms, that is, kernel HDP (KHDP) and kernel DHP (KDHP), are proposed and their performance is analyzed both theoretically and empirically. Because of the representation learning and generalization capability of sparse kernel machines, KHDP and KDHP can obtain much better performance than previous HDP and DHP with manually designed neural networks. Simulation and experimental results of two nonlinear control problems, that is, a continuous-action inverted pendulum problem and a ball and plate control problem, demonstrate the effectiveness of the proposed kernel ACD methods.

  18. Using machine learning to model dose-response relationships.

    PubMed

    Linden, Ariel; Yarnold, Paul R; Nallamothu, Brahmajee K

    2016-12-01

    Establishing the relationship between various doses of an exposure and a response variable is integral to many studies in health care. Linear parametric models, widely used for estimating dose-response relationships, have several limitations. This paper employs the optimal discriminant analysis (ODA) machine-learning algorithm to determine the degree to which exposure dose can be distinguished based on the distribution of the response variable. By framing the dose-response relationship as a classification problem, machine learning can provide the same functionality as conventional models, but can additionally make individual-level predictions, which may be helpful in practical applications like establishing responsiveness to prescribed drug regimens. Using data from a study measuring the responses of blood flow in the forearm to the intra-arterial administration of isoproterenol (separately for 9 black and 13 white men, and pooled), we compare the results estimated from a generalized estimating equations (GEE) model with those estimated using ODA. Generalized estimating equations and ODA both identified many statistically significant dose-response relationships, separately by race and for pooled data. Post hoc comparisons between doses indicated ODA (based on exact P values) was consistently more conservative than GEE (based on estimated P values). Compared with ODA, GEE produced twice as many instances of paradoxical confounding (findings from analysis of pooled data that are inconsistent with findings from analyses stratified by race). Given its unique advantages and greater analytic flexibility, maximum-accuracy machine-learning methods like ODA should be considered as the primary analytic approach in dose-response applications. © 2016 John Wiley & Sons, Ltd.

  19. Fiber tractography using machine learning.

    PubMed

    Neher, Peter F; Côté, Marc-Alexandre; Houde, Jean-Christophe; Descoteaux, Maxime; Maier-Hein, Klaus H

    2017-09-01

    We present a fiber tractography approach based on a random forest classification and voting process, guiding each step of the streamline progression by directly processing raw diffusion-weighted signal intensities. For comparison to the state-of-the-art, i.e. tractography pipelines that rely on mathematical modeling, we performed a quantitative and qualitative evaluation with multiple phantom and in vivo experiments, including a comparison to the 96 submissions of the ISMRM tractography challenge 2015. The results demonstrate the vast potential of machine learning for fiber tractography. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Classification of older adults with/without a fall history using machine learning methods.

    PubMed

    Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D

    2015-01-01

    Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.

  1. Nonlinear machine learning in soft materials engineering and design

    NASA Astrophysics Data System (ADS)

    Ferguson, Andrew

    The inherently many-body nature of molecular folding and colloidal self-assembly makes it challenging to identify the underlying collective mechanisms and pathways governing system behavior, and has hindered rational design of soft materials with desired structure and function. Fundamentally, there exists a predictive gulf between the architecture and chemistry of individual molecules or colloids and the collective many-body thermodynamics and kinetics. Integrating machine learning techniques with statistical thermodynamics provides a means to bridge this divide and identify emergent folding pathways and self-assembly mechanisms from computer simulations or experimental particle tracking data. We will survey a few of our applications of this framework that illustrate the value of nonlinear machine learning in understanding and engineering soft materials: the non-equilibrium self-assembly of Janus colloids into pinwheels, clusters, and archipelagos; engineering reconfigurable ''digital colloids'' as a novel high-density information storage substrate; probing hierarchically self-assembling onjugated asphaltenes in crude oil; and determining macromolecular folding funnels from measurements of single experimental observables. We close with an outlook on the future of machine learning in soft materials engineering, and share some personal perspectives on working at this disciplinary intersection. We acknowledge support for this work from a National Science Foundation CAREER Award (Grant No. DMR-1350008) and the Donors of the American Chemical Society Petroleum Research Fund (ACS PRF #54240-DNI6).

  2. Signature Verification Using N-tuple Learning Machine.

    PubMed

    Maneechot, Thanin; Kitjaidure, Yuttana

    2005-01-01

    This research presents new algorithm for signature verification using N-tuple learning machine. The features are taken from handwritten signature on Digital Tablet (On-line). This research develops recognition algorithm using four features extraction, namely horizontal and vertical pen tip position(x-y position), pen tip pressure, and pen altitude angles. Verification uses N-tuple technique with Gaussian thresholding.

  3. Using Machine Learning in Adversarial Environments.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Warren Leon Davis

    Intrusion/anomaly detection systems are among the first lines of cyber defense. Commonly, they either use signatures or machine learning (ML) to identify threats, but fail to account for sophisticated attackers trying to circumvent them. We propose to embed machine learning within a game theoretic framework that performs adversarial modeling, develops methods for optimizing operational response based on ML, and integrates the resulting optimization codebase into the existing ML infrastructure developed by the Hybrid LDRD. Our approach addresses three key shortcomings of ML in adversarial settings: 1) resulting classifiers are typically deterministic and, therefore, easy to reverse engineer; 2) ML approachesmore » only address the prediction problem, but do not prescribe how one should operationalize predictions, nor account for operational costs and constraints; and 3) ML approaches do not model attackers’ response and can be circumvented by sophisticated adversaries. The principal novelty of our approach is to construct an optimization framework that blends ML, operational considerations, and a model predicting attackers reaction, with the goal of computing optimal moving target defense. One important challenge is to construct a realistic model of an adversary that is tractable, yet realistic. We aim to advance the science of attacker modeling by considering game-theoretic methods, and by engaging experimental subjects with red teaming experience in trying to actively circumvent an intrusion detection system, and learning a predictive model of such circumvention activities. In addition, we will generate metrics to test that a particular model of an adversary is consistent with available data.« less

  4. 2014 Bio-Acoustics Data Challenge for the International Community on Machine Learning and Bioacoustics

    DTIC Science & Technology

    2014-09-30

    This ONR grant promotes the development and application of advanced machine learning techniques for detection and classification of marine mammal...sounds. The objective is to engage a broad community of data scientists in the development and application of advanced machine learning techniques for detection and classification of marine mammal sounds.

  5. Fall classification by machine learning using mobile phones.

    PubMed

    Albert, Mark V; Kording, Konrad; Herrmann, Megan; Jayaraman, Arun

    2012-01-01

    Fall prevention is a critical component of health care; falls are a common source of injury in the elderly and are associated with significant levels of mortality and morbidity. Automatically detecting falls can allow rapid response to potential emergencies; in addition, knowing the cause or manner of a fall can be beneficial for prevention studies or a more tailored emergency response. The purpose of this study is to demonstrate techniques to not only reliably detect a fall but also to automatically classify the type. We asked 15 subjects to simulate four different types of falls-left and right lateral, forward trips, and backward slips-while wearing mobile phones and previously validated, dedicated accelerometers. Nine subjects also wore the devices for ten days, to provide data for comparison with the simulated falls. We applied five machine learning classifiers to a large time-series feature set to detect falls. Support vector machines and regularized logistic regression were able to identify a fall with 98% accuracy and classify the type of fall with 99% accuracy. This work demonstrates how current machine learning approaches can simplify data collection for prevention in fall-related research as well as improve rapid response to potential injuries due to falls.

  6. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach

    PubMed Central

    Kudisthalert, Wasu

    2018-01-01

    Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network in a conjunction of 16 different similarity coefficients as activation function in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets–Maximum Unbiased Validation Dataset–which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6. PMID:29652912

  7. Machine Learning for Social Services: A Study of Prenatal Case Management in Illinois.

    PubMed

    Pan, Ian; Nolan, Laura B; Brown, Rashida R; Khan, Romana; van der Boor, Paul; Harris, Daniel G; Ghani, Rayid

    2017-06-01

    To evaluate the positive predictive value of machine learning algorithms for early assessment of adverse birth risk among pregnant women as a means of improving the allocation of social services. We used administrative data for 6457 women collected by the Illinois Department of Human Services from July 2014 to May 2015 to develop a machine learning model for adverse birth prediction and improve upon the existing paper-based risk assessment. We compared different models and determined the strongest predictors of adverse birth outcomes using positive predictive value as the metric for selection. Machine learning algorithms performed similarly, outperforming the current paper-based risk assessment by up to 36%; a refined paper-based assessment outperformed the current assessment by up to 22%. We estimate that these improvements will allow 100 to 170 additional high-risk pregnant women screened for program eligibility each year to receive services that would have otherwise been unobtainable. Our analysis exhibits the potential for machine learning to move government agencies toward a more data-informed approach to evaluating risk and providing social services. Overall, such efforts will improve the efficiency of allocating resource-intensive interventions.

  8. NMF-Based Image Quality Assessment Using Extreme Learning Machine.

    PubMed

    Wang, Shuigen; Deng, Chenwei; Lin, Weisi; Huang, Guang-Bin; Zhao, Baojun

    2017-01-01

    Numerous state-of-the-art perceptual image quality assessment (IQA) algorithms share a common two-stage process: distortion description followed by distortion effects pooling. As for the first stage, the distortion descriptors or measurements are expected to be effective representatives of human visual variations, while the second stage should well express the relationship among quality descriptors and the perceptual visual quality. However, most of the existing quality descriptors (e.g., luminance, contrast, and gradient) do not seem to be consistent with human perception, and the effects pooling is often done in ad-hoc ways. In this paper, we propose a novel full-reference IQA metric. It applies non-negative matrix factorization (NMF) to measure image degradations by making use of the parts-based representation of NMF. On the other hand, a new machine learning technique [extreme learning machine (ELM)] is employed to address the limitations of the existing pooling techniques. Compared with neural networks and support vector regression, ELM can achieve higher learning accuracy with faster learning speed. Extensive experimental results demonstrate that the proposed metric has better performance and lower computational complexity in comparison with the relevant state-of-the-art approaches.

  9. Modeling Electronic Quantum Transport with Machine Learning

    DOE PAGES

    Lopez Bezanilla, Alejandro; von Lilienfeld Toal, Otto A.

    2014-06-11

    We present a machine learning approach to solve electronic quantum transport equations of one-dimensional nanostructures. The transmission coefficients of disordered systems were computed to provide training and test data sets to the machine. The system’s representation encodes energetic as well as geometrical information to characterize similarities between disordered configurations, while the Euclidean norm is used as a measure of similarity. Errors for out-of-sample predictions systematically decrease with training set size, enabling the accurate and fast prediction of new transmission coefficients. The remarkable performance of our model to capture the complexity of interference phenomena lends further support to its viability inmore » dealing with transport problems of undulatory nature.« less

  10. Machine learning and next-generation asteroid surveys

    NASA Astrophysics Data System (ADS)

    Nugent, Carrie R.; Dailey, John; Cutri, Roc M.; Masci, Frank J.; Mainzer, Amy K.

    2017-10-01

    Next-generation surveys such as NEOCam (Mainzer et al., 2016) will sift through tens of millions of point source detections daily to detect and discover asteroids. This requires new, more efficient techniques to distinguish between solar system objects, background stars and galaxies, and artifacts such as cosmic rays, scattered light and diffraction spikes.Supervised machine learning is a set of algorithms that allows computers to classify data on a training set, and then apply that classification to make predictions on new datasets. It has been employed by a broad range of fields, including computer vision, medical diagnoses, economics, and natural language processing. It has also been applied to astronomical datasets, including transient identification in the Palomar Transient Factory pipeline (Masci et al., 2016), and in the Pan-STARRS1 difference imaging (D. E. Wright et al., 2015).As part of the NEOCam extended phase A work we apply machine learning techniques to the problem of asteroid detection. Asteroid detection is an ideal application of supervised learning, as there is a wealth of metrics associated with each extracted source, and suitable training sets are easily created. Using the vetted NEOWISE dataset (E. L. Wright et al., 2010, Mainzer et al., 2011) as a proof-of-concept of this technique, we applied the python package sklearn. We report on reliability, feature set selection, and the suitability of various algorithms.

  11. A general-purpose machine learning framework for predicting properties of inorganic materials

    DOE PAGES

    Ward, Logan; Agrawal, Ankit; Choudhary, Alok; ...

    2016-08-26

    A very active area of materials research is to devise methods that use machine learning to automatically extract predictive models from existing materials data. While prior examples have demonstrated successful models for some applications, many more applications exist where machine learning can make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework capable of being applied to a broad range of materials data. Our method works by using a chemically diverse list of attributes, which we demonstrate are suitable for describing a wide variety of properties, and a novel method formore » partitioning the data set into groups of similar materials to boost the predictive accuracy. In this manuscript, we demonstrate how this new method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.« less

  12. A general-purpose machine learning framework for predicting properties of inorganic materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ward, Logan; Agrawal, Ankit; Choudhary, Alok

    A very active area of materials research is to devise methods that use machine learning to automatically extract predictive models from existing materials data. While prior examples have demonstrated successful models for some applications, many more applications exist where machine learning can make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework capable of being applied to a broad range of materials data. Our method works by using a chemically diverse list of attributes, which we demonstrate are suitable for describing a wide variety of properties, and a novel method formore » partitioning the data set into groups of similar materials to boost the predictive accuracy. In this manuscript, we demonstrate how this new method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.« less

  13. A Machine Learns to Predict the Stability of Tightly Packed Planetary Systems

    NASA Astrophysics Data System (ADS)

    Tamayo, Daniel; Silburt, Ari; Valencia, Diana; Menou, Kristen; Ali-Dib, Mohamad; Petrovich, Cristobal; Huang, Chelsea X.; Rein, Hanno; van Laerhoven, Christa; Paradise, Adiv; Obertas, Alysa; Murray, Norman

    2016-12-01

    The requirement that planetary systems be dynamically stable is often used to vet new discoveries or set limits on unconstrained masses or orbital elements. This is typically carried out via computationally expensive N-body simulations. We show that characterizing the complicated and multi-dimensional stability boundary of tightly packed systems is amenable to machine-learning methods. We find that training an XGBoost machine-learning algorithm on physically motivated features yields an accurate classifier of stability in packed systems. On the stability timescale investigated (107 orbits), it is three orders of magnitude faster than direct N-body simulations. Optimized machine-learning classifiers for dynamical stability may thus prove useful across the discipline, e.g., to characterize the exoplanet sample discovered by the upcoming Transiting Exoplanet Survey Satellite. This proof of concept motivates investing computational resources to train algorithms capable of predicting stability over longer timescales and over broader regions of phase space.

  14. The influence of negative training set size on machine learning-based virtual screening.

    PubMed

    Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J

    2014-01-01

    The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.

  15. The influence of negative training set size on machine learning-based virtual screening

    PubMed Central

    2014-01-01

    Background The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. Results The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of dynamics of those variations let us recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. Conclusions In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening. PMID:24976867

  16. Regularised extreme learning machine with misclassification cost and rejection cost for gene expression data classification.

    PubMed

    Lu, Huijuan; Wei, Shasha; Zhou, Zili; Miao, Yanzi; Lu, Yi

    2015-01-01

    The main purpose of traditional classification algorithms on bioinformatics application is to acquire better classification accuracy. However, these algorithms cannot meet the requirement that minimises the average misclassification cost. In this paper, a new algorithm of cost-sensitive regularised extreme learning machine (CS-RELM) was proposed by using probability estimation and misclassification cost to reconstruct the classification results. By improving the classification accuracy of a group of small sample which higher misclassification cost, the new CS-RELM can minimise the classification cost. The 'rejection cost' was integrated into CS-RELM algorithm to further reduce the average misclassification cost. By using Colon Tumour dataset and SRBCT (Small Round Blue Cells Tumour) dataset, CS-RELM was compared with other cost-sensitive algorithms such as extreme learning machine (ELM), cost-sensitive extreme learning machine, regularised extreme learning machine, cost-sensitive support vector machine (SVM). The results of experiments show that CS-RELM with embedded rejection cost could reduce the average cost of misclassification and made more credible classification decision than others.

  17. Machine-learned and codified synthesis parameters of oxide materials

    NASA Astrophysics Data System (ADS)

    Kim, Edward; Huang, Kevin; Tomala, Alex; Matthews, Sara; Strubell, Emma; Saunders, Adam; McCallum, Andrew; Olivetti, Elsa

    2017-09-01

    Predictive materials design has rapidly accelerated in recent years with the advent of large-scale resources, such as materials structure and property databases generated by ab initio computations. In the absence of analogous ab initio frameworks for materials synthesis, high-throughput and machine learning techniques have recently been harnessed to generate synthesis strategies for select materials of interest. Still, a community-accessible, autonomously-compiled synthesis planning resource which spans across materials systems has not yet been developed. In this work, we present a collection of aggregated synthesis parameters computed using the text contained within over 640,000 journal articles using state-of-the-art natural language processing and machine learning techniques. We provide a dataset of synthesis parameters, compiled autonomously across 30 different oxide systems, in a format optimized for planning novel syntheses of materials.

  18. Mass Estimation and Its Applications

    DTIC Science & Technology

    2012-02-23

    parameters); e.g., the rect- angular kernel function has fixed width or fixed per unit size. But the rectangular function used in mass has no parameter...MassTER is implemented in JAVA , and we use DBSCAN in WEKA [13] and a version of DENCLUE implemented in R (www.r-project.org) in our empirical evaluation...Proceedings of SIGKDD, 2010, 989-998. [13] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations

  19. Classification of AB O 3 perovskite solids: a machine learning study

    DOE PAGES

    Pilania, G.; Balachandran, P. V.; Gubernatis, J. E.; ...

    2015-07-23

    Here we explored the use of machine learning methods for classifying whether a particularABO 3chemistry forms a perovskite or non-perovskite structured solid. Starting with three sets of feature pairs (the tolerance and octahedral factors, theAandBionic radii relative to the radius of O, and the bond valence distances between theAandBions from the O atoms), we used machine learning to create a hyper-dimensional partial dependency structure plot using all three feature pairs or any two of them. Doing so increased the accuracy of our predictions by 2–3 percentage points over using any one pair. We also included the Mendeleev numbers of theAandBatomsmore » to this set of feature pairs. Moreover, doing this and using the capabilities of our machine learning algorithm, the gradient tree boosting classifier, enabled us to generate a new type of structure plot that has the simplicity of one based on using just the Mendeleev numbers, but with the added advantages of having a higher accuracy and providing a measure of likelihood of the predicted structure.« less

  20. Optimisation and evaluation of hyperspectral imaging system using machine learning algorithm

    NASA Astrophysics Data System (ADS)

    Suthar, Gajendra; Huang, Jung Y.; Chidangil, Santhosh

    2017-10-01

    Hyperspectral imaging (HSI), also called imaging spectrometer, originated from remote sensing. Hyperspectral imaging is an emerging imaging modality for medical applications, especially in disease diagnosis and image-guided surgery. HSI acquires a three-dimensional dataset called hypercube, with two spatial dimensions and one spectral dimension. Spatially resolved spectral imaging obtained by HSI provides diagnostic information about the objects physiology, morphology, and composition. The present work involves testing and evaluating the performance of the hyperspectral imaging system. The methodology involved manually taking reflectance of the object in many images or scan of the object. The object used for the evaluation of the system was cabbage and tomato. The data is further converted to the required format and the analysis is done using machine learning algorithm. The machine learning algorithms applied were able to distinguish between the object present in the hypercube obtain by the scan. It was concluded from the results that system was working as expected. This was observed by the different spectra obtained by using the machine-learning algorithm.

  1. Machine Learning Estimates of Natural Product Conformational Energies

    PubMed Central

    Rupp, Matthias; Bauer, Matthias R.; Wilcken, Rainer; Lange, Andreas; Reutlinger, Michael; Boeckler, Frank M.; Schneider, Gisbert

    2014-01-01

    Machine learning has been used for estimation of potential energy surfaces to speed up molecular dynamics simulations of small systems. We demonstrate that this approach is feasible for significantly larger, structurally complex molecules, taking the natural product Archazolid A, a potent inhibitor of vacuolar-type ATPase, from the myxobacterium Archangium gephyra as an example. Our model estimates energies of new conformations by exploiting information from previous calculations via Gaussian process regression. Predictive variance is used to assess whether a conformation is in the interpolation region, allowing a controlled trade-off between prediction accuracy and computational speed-up. For energies of relaxed conformations at the density functional level of theory (implicit solvent, DFT/BLYP-disp3/def2-TZVP), mean absolute errors of less than 1 kcal/mol were achieved. The study demonstrates that predictive machine learning models can be developed for structurally complex, pharmaceutically relevant compounds, potentially enabling considerable speed-ups in simulations of larger molecular structures. PMID:24453952

  2. Data-driven advice for applying machine learning to bioinformatics problems

    PubMed Central

    Olson, Randal S.; La Cava, William; Mustahsan, Zairah; Varik, Akshay; Moore, Jason H.

    2017-01-01

    As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems. PMID:29218881

  3. Energy landscapes for a machine learning application to series data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ballard, Andrew J.; Stevenson, Jacob D.; Das, Ritankar

    2016-03-28

    Methods developed to explore and characterise potential energy landscapes are applied to the corresponding landscapes obtained from optimisation of a cost function in machine learning. We consider neural network predictions for the outcome of local geometry optimisation in a triatomic cluster, where four distinct local minima exist. The accuracy of the predictions is compared for fits using data from single and multiple points in the series of atomic configurations resulting from local geometry optimisation and for alternative neural networks. The machine learning solution landscapes are visualised using disconnectivity graphs, and signatures in the effective heat capacity are analysed in termsmore » of distributions of local minima and their properties.« less

  4. Inverse analysis of turbidites by machine learning

    NASA Astrophysics Data System (ADS)

    Naruse, H.; Nakao, K.

    2017-12-01

    This study aims to propose a method to estimate paleo-hydraulic conditions of turbidity currents from ancient turbidites by using machine-learning technique. In this method, numerical simulation was repeated under various initial conditions, which produces a data set of characteristic features of turbidites. Then, this data set of turbidites is used for supervised training of a deep-learning neural network (NN). Quantities of characteristic features of turbidites in the training data set are given to input nodes of NN, and output nodes are expected to provide the estimates of initial condition of the turbidity current. The optimization of weight coefficients of NN is then conducted to reduce root-mean-square of the difference between the true conditions and the output values of NN. The empirical relationship with numerical results and the initial conditions is explored in this method, and the discovered relationship is used for inversion of turbidity currents. This machine learning can potentially produce NN that estimates paleo-hydraulic conditions from data of ancient turbidites. We produced a preliminary implementation of this methodology. A forward model based on 1D shallow-water equations with a correction of density-stratification effect was employed. This model calculates a behavior of a surge-like turbidity current transporting mixed-size sediment, and outputs spatial distribution of volume per unit area of each grain-size class on the uniform slope. Grain-size distribution was discretized 3 classes. Numerical simulation was repeated 1000 times, and thus 1000 beds of turbidites were used as the training data for NN that has 21000 input nodes and 5 output nodes with two hidden-layers. After the machine learning finished, independent simulations were conducted 200 times in order to evaluate the performance of NN. As a result of this test, the initial conditions of validation data were successfully reconstructed by NN. The estimated values show very small

  5. Global Bathymetry: Machine Learning for Data Editing

    NASA Astrophysics Data System (ADS)

    Sandwell, D. T.; Tea, B.; Freund, Y.

    2017-12-01

    The accuracy of global bathymetry depends primarily on the coverage and accuracy of the sounding data and secondarily on the depth predicted from gravity. A main focus of our research is to add newly-available data to the global compilation. Most data sources have 1-12% of erroneous soundings caused by a wide array of blunders and measurement errors. Over the years we have hand-edited this data using undergraduate employees at UCSD (440 million soundings at 500 m resolution). We are developing a machine learning approach to refine the flagging of the older soundings and provide automated editing of newly-acquired soundings. The approach has three main steps: 1) Combine the sounding data with additional information that may inform the machine learning algorithm. The additional parameters include: depth predicted from gravity; distance to the nearest sounding from other cruises; seafloor age; spreading rate; sediment thickness; and vertical gravity gradient. 2) Use available edit decisions as training data sets for a boosted tree algorithm with a binary logistic objective function and L2 regularization. Initial results with poor quality single beam soundings show that the automated algorithm matches the hand-edited data 89% of the time. The results show that most of the information for detecting outliers comes from predicted depth with secondary contributions from distance to the nearest sounding and longitude. A similar analysis using very high quality multibeam data shows that the automated algorithm matches the hand-edited data 93% of the time. Again, most of the information for detecting outliers comes from predicted depth secondary contributions from distance to the nearest sounding and longitude. 3) The third step in the process is to use the machine learning parameters, derived from the training data, to edit 12 million newly acquired single beam sounding data provided by the National Geospatial-Intelligence Agency. The output of the learning algorithm will be

  6. The Value Simulation-Based Learning Added to Machining Technology in Singapore

    ERIC Educational Resources Information Center

    Fang, Linda; Tan, Hock Soon; Thwin, Mya Mya; Tan, Kim Cheng; Koh, Caroline

    2011-01-01

    This study seeks to understand the value simulation-based learning (SBL) added to the learning of Machining Technology in a 15-week core subject course offered to university students. The research questions were: (1) How did SBL enhance classroom learning? (2) How did SBL help participants in their test? (3) How did SBL prepare participants for…

  7. Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model

    NASA Astrophysics Data System (ADS)

    Pathak, Jaideep; Wikner, Alexander; Fussell, Rebeckah; Chandra, Sarthak; Hunt, Brian R.; Girvan, Michelle; Ott, Edward

    2018-04-01

    A model-based approach to forecasting chaotic dynamical systems utilizes knowledge of the mechanistic processes governing the dynamics to build an approximate mathematical model of the system. In contrast, machine learning techniques have demonstrated promising results for forecasting chaotic systems purely from past time series measurements of system state variables (training data), without prior knowledge of the system dynamics. The motivation for this paper is the potential of machine learning for filling in the gaps in our underlying mechanistic knowledge that cause widely-used knowledge-based models to be inaccurate. Thus, we here propose a general method that leverages the advantages of these two approaches by combining a knowledge-based model and a machine learning technique to build a hybrid forecasting scheme. Potential applications for such an approach are numerous (e.g., improving weather forecasting). We demonstrate and test the utility of this approach using a particular illustrative version of a machine learning known as reservoir computing, and we apply the resulting hybrid forecaster to a low-dimensional chaotic system, as well as to a high-dimensional spatiotemporal chaotic system. These tests yield extremely promising results in that our hybrid technique is able to accurately predict for a much longer period of time than either its machine-learning component or its model-based component alone.

  8. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges

    PubMed Central

    Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.

    2017-01-01

    Abstract Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868

  9. A machine learning approach to computer-aided molecular design

    NASA Astrophysics Data System (ADS)

    Bolis, Giorgio; Di Pace, Luigi; Fabrocini, Filippo

    1991-12-01

    Preliminary results of a machine learning application concerning computer-aided molecular design applied to drug discovery are presented. The artificial intelligence techniques of machine learning use a sample of active and inactive compounds, which is viewed as a set of positive and negative examples, to allow the induction of a molecular model characterizing the interaction between the compounds and a target molecule. The algorithm is based on a twofold phase. In the first one — the specialization step — the program identifies a number of active/inactive pairs of compounds which appear to be the most useful in order to make the learning process as effective as possible and generates a dictionary of molecular fragments, deemed to be responsible for the activity of the compounds. In the second phase — the generalization step — the fragments thus generated are combined and generalized in order to select the most plausible hypothesis with respect to the sample of compounds. A knowledge base concerning physical and chemical properties is utilized during the inductive process.

  10. Machine Learning to Discover and Optimize Materials

    NASA Astrophysics Data System (ADS)

    Rosenbrock, Conrad Waldhar

    For centuries, scientists have dreamed of creating materials by design. Rather than discovery by accident, bespoke materials could be tailored to fulfill specific technological needs. Quantum theory and computational methods are essentially equal to the task, and computational power is the new bottleneck. Machine learning has the potential to solve that problem by approximating material behavior at multiple length scales. A full end-to-end solution must allow us to approximate the quantum mechanics, microstructure and engineering tasks well enough to be predictive in the real world. In this dissertation, I present algorithms and methodology to address some of these problems at various length scales. In the realm of enumeration, systems with many degrees of freedom such as high-entropy alloys may contain prohibitively many unique possibilities so that enumerating all of them would exhaust available compute memory. One possible way to address this problem is to know in advance how many possibilities there are so that the user can reduce their search space by restricting the occupation of certain lattice sites. Although tools to calculate this number were available, none performed well for very large systems and none could easily be integrated into low-level languages for use in existing scientific codes. I present an algorithm to solve these problems. Testing the robustness of machine-learned models is an essential component in any materials discovery or optimization application. While it is customary to perform a small number of system-specific tests to validate an approach, this may be insufficient in many cases. In particular, for Cluster Expansion models, the expansion may not converge quickly enough to be useful and reliable. Although the method has been used for decades, a rigorous investigation across many systems to determine when CE "breaks" was still lacking. This dissertation includes this investigation along with heuristics that use only a small training

  11. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction

    PubMed Central

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children’s social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a “mental model” of the robot, tailoring the tutoring to the robot’s performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot’s bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  12. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    PubMed

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance.

  13. Machine Learning for Flood Prediction in Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.

    2015-12-01

    With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct at significant financial cost. In addition, desktop modeling software and limited local server storage can impose restraints on the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable in this platform. Our project pushes these limits by testing the performance of two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, at predicting flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created using the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of deploying unbalanced training data sets based on rare extreme events.

  14. WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning.

    PubMed

    Sutphin, George L; Mahoney, J Matthew; Sheppard, Keith; Walton, David O; Korstanje, Ron

    2016-11-01

    The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species-humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide-spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.

  15. Parsing learning in networks using brain-machine interfaces.

    PubMed

    Orsborn, Amy L; Pesaran, Bijan

    2017-10-01

    Brain-machine interfaces (BMIs) define new ways to interact with our environment and hold great promise for clinical therapies. Motor BMIs, for instance, re-route neural activity to control movements of a new effector and could restore movement to people with paralysis. Increasing experience shows that interfacing with the brain inevitably changes the brain. BMIs engage and depend on a wide array of innate learning mechanisms to produce meaningful behavior. BMIs precisely define the information streams into and out of the brain, but engage wide-spread learning. We take a network perspective and review existing observations of learning in motor BMIs to show that BMIs engage multiple learning mechanisms distributed across neural networks. Recent studies demonstrate the advantages of BMI for parsing this learning and its underlying neural mechanisms. BMIs therefore provide a powerful tool for studying the neural mechanisms of learning that highlights the critical role of learning in engineered neural therapies. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. The impact of machine learning techniques in the study of bipolar disorder: A systematic review.

    PubMed

    Librenza-Garcia, Diego; Kotzian, Bruno Jaskulski; Yang, Jessica; Mwangi, Benson; Cao, Bo; Pereira Lima, Luiza Nunes; Bermudez, Mariane Bagatin; Boeira, Manuela Vianna; Kapczinski, Flávio; Passos, Ives Cavalcante

    2017-09-01

    Machine learning techniques provide new methods to predict diagnosis and clinical outcomes at an individual level. We aim to review the existing literature on the use of machine learning techniques in the assessment of subjects with bipolar disorder. We systematically searched PubMed, Embase and Web of Science for articles published in any language up to January 2017. We found 757 abstracts and included 51 studies in our review. Most of the included studies used multiple levels of biological data to distinguish the diagnosis of bipolar disorder from other psychiatric disorders or healthy controls. We also found studies that assessed the prediction of clinical outcomes and studies using unsupervised machine learning to build more consistent clinical phenotypes of bipolar disorder. We concluded that given the clinical heterogeneity of samples of patients with BD, machine learning techniques may provide clinicians and researchers with important insights in fields such as diagnosis, personalized treatment and prognosis orientation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on Big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear functions accurately under the condition of support vectors sufficiency. Experimental results on a wide variety of real world small instance size and large instance size applications in the context of binary classification, multi-class problem and regression are then reported to show that RKELM can perform at competitive level of generalized performance as the SVM/LS-SVM at only a fraction of the computational effort incurred. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Triangular Quantum Loop Topography for Machine Learning

    NASA Astrophysics Data System (ADS)

    Zhang, Yi; Kim, Eun-Ah

    Despite rapidly growing interest in harnessing machine learning in the study of quantum many-body systems there has been little success in training neural networks to identify topological phases. The key challenge is in efficiently extracting essential information from the many-body Hamiltonian or wave function and turning the information into an image that can be fed into a neural network. When targeting topological phases, this task becomes particularly challenging as topological phases are defined in terms of non-local properties. Here we introduce triangular quantum loop (TQL) topography: a procedure of constructing a multi-dimensional image from the ''sample'' Hamiltonian or wave function using two-point functions that form triangles. Feeding the TQL topography to a fully-connected neural network with a single hidden layer, we demonstrate that the architecture can be effectively trained to distinguish Chern insulator and fractional Chern insulator from trivial insulators with high fidelity. Given the versatility of the TQL topography procedure that can handle different lattice geometries, disorder, interaction and even degeneracy our work paves the route towards powerful applications of machine learning in the study of topological quantum matters.

  19. Galaxy morphology - An unsupervised machine learning approach

    NASA Astrophysics Data System (ADS)

    Schutter, A.; Shamir, L.

    2015-09-01

    Structural properties poses valuable information about the formation and evolution of galaxies, and are important for understanding the past, present, and future universe. Here we use unsupervised machine learning methodology to analyze a network of similarities between galaxy morphological types, and automatically deduce a morphological sequence of galaxies. Application of the method to the EFIGI catalog show that the morphological scheme produced by the algorithm is largely in agreement with the De Vaucouleurs system, demonstrating the ability of computer vision and machine learning methods to automatically profile galaxy morphological sequences. The unsupervised analysis method is based on comprehensive computer vision techniques that compute the visual similarities between the different morphological types. Rather than relying on human cognition, the proposed system deduces the similarities between sets of galaxy images in an automatic manner, and is therefore not limited by the number of galaxies being analyzed. The source code of the method is publicly available, and the protocol of the experiment is included in the paper so that the experiment can be replicated, and the method can be used to analyze user-defined datasets of galaxy images.

  20. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

    PubMed

    Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

    2014-07-01

    Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Comparative Analysis of Automatic Exudate Detection between Machine Learning and Traditional Approaches

    NASA Astrophysics Data System (ADS)

    Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Thomas

    To prevent blindness from diabetic retinopathy, periodic screening and early diagnosis are neccessary. Due to lack of expert ophthalmologists in rural area, automated early exudate (one of visible sign of diabetic retinopathy) detection could help to reduce the number of blindness in diabetic patients. Traditional automatic exudate detection methods are based on specific parameter configuration, while the machine learning approaches which seems more flexible may be computationally high cost. A comparative analysis of traditional and machine learning of exudates detection, namely, mathematical morphology, fuzzy c-means clustering, naive Bayesian classifier, Support Vector Machine and Nearest Neighbor classifier are presented. Detected exudates are validated with expert ophthalmologists' hand-drawn ground-truths. The sensitivity, specificity, precision, accuracy and time complexity of each method are also compared.

  2. e-Learning Application for Machine Maintenance Process using Iterative Method in XYZ Company

    NASA Astrophysics Data System (ADS)

    Nurunisa, Suaidah; Kurniawati, Amelia; Pramuditya Soesanto, Rayinda; Yunan Kurnia Septo Hediyanto, Umar

    2016-02-01

    XYZ Company is a company based on manufacturing part for airplane, one of the machine that is categorized as key facility in the company is Millac 5H6P. As a key facility, the machines should be assured to work well and in peak condition, therefore, maintenance process is needed periodically. From the data gathering, it is known that there are lack of competency from the maintenance staff to maintain different type of machine which is not assigned by the supervisor, this indicate that knowledge which possessed by maintenance staff are uneven. The purpose of this research is to create knowledge-based e-learning application as a realization from externalization process in knowledge transfer process to maintain the machine. The application feature are adjusted for maintenance purpose using e-learning framework for maintenance process, the content of the application support multimedia for learning purpose. QFD is used in this research to understand the needs from user. The application is built using moodle with iterative method for software development cycle and UML Diagram. The result from this research is e-learning application as sharing knowledge media for maintenance staff in the company. From the test, it is known that the application make maintenance staff easy to understand the competencies.

  3. Automatic vetting of planet candidates from ground based surveys: Machine learning with NGTS

    NASA Astrophysics Data System (ADS)

    Armstrong, David J.; Günther, Maximilian N.; McCormac, James; Smith, Alexis M. S.; Bayliss, Daniel; Bouchy, François; Burleigh, Matthew R.; Casewell, Sarah; Eigmüller, Philipp; Gillen, Edward; Goad, Michael R.; Hodgkin, Simon T.; Jenkins, James S.; Louden, Tom; Metrailler, Lionel; Pollacco, Don; Poppenhaeger, Katja; Queloz, Didier; Raynard, Liam; Rauer, Heike; Udry, Stéphane; Walker, Simon R.; Watson, Christopher A.; West, Richard G.; Wheatley, Peter J.

    2018-05-01

    State of the art exoplanet transit surveys are producing ever increasing quantities of data. To make the best use of this resource, in detecting interesting planetary systems or in determining accurate planetary population statistics, requires new automated methods. Here we describe a machine learning algorithm that forms an integral part of the pipeline for the NGTS transit survey, demonstrating the efficacy of machine learning in selecting planetary candidates from multi-night ground based survey data. Our method uses a combination of random forests and self-organising-maps to rank planetary candidates, achieving an AUC score of 97.6% in ranking 12368 injected planets against 27496 false positives in the NGTS data. We build on past examples by using injected transit signals to form a training set, a necessary development for applying similar methods to upcoming surveys. We also make the autovet code used to implement the algorithm publicly accessible. autovet is designed to perform machine learned vetting of planetary candidates, and can utilise a variety of methods. The apparent robustness of machine learning techniques, whether on space-based or the qualitatively different ground-based data, highlights their importance to future surveys such as TESS and PLATO and the need to better understand their advantages and pitfalls in an exoplanetary context.

  4. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach

    PubMed Central

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-01-01

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification. PMID:28629202

  5. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach.

    PubMed

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-06-19

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.

  6. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach.

    PubMed

    Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Al-Dhief, Fahad Taha; Sammour, Mahmoud A M

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%.

  7. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach

    PubMed Central

    Tiun, Sabrina; AL-Dhief, Fahad Taha; Sammour, Mahmoud A. M.

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%. PMID:29672546

  8. Machine Learning Technologies and Their Applications for Science and Engineering Domains Workshop -- Summary Report

    NASA Technical Reports Server (NTRS)

    Ambur, Manjula; Schwartz, Katherine G.; Mavris, Dimitri N.

    2016-01-01

    The fields of machine learning and big data analytics have made significant advances in recent years, which has created an environment where cross-fertilization of methods and collaborations can achieve previously unattainable outcomes. The Comprehensive Digital Transformation (CDT) Machine Learning and Big Data Analytics team planned a workshop at NASA Langley in August 2016 to unite leading experts the field of machine learning and NASA scientists and engineers. The primary goal for this workshop was to assess the state-of-the-art in this field, introduce these leading experts to the aerospace and science subject matter experts, and develop opportunities for collaboration. The workshop was held over a three day-period with lectures from 15 leading experts followed by significant interactive discussions. This report provides an overview of the 15 invited lectures and a summary of the key discussion topics that arose during both formal and informal discussion sections. Four key workshop themes were identified after the closure of the workshop and are also highlighted in the report. Furthermore, several workshop attendees provided their feedback on how they are already utilizing machine learning algorithms to advance their research, new methods they learned about during the workshop, and collaboration opportunities they identified during the workshop.

  9. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning.

    PubMed

    Formisano, Elia; De Martino, Federico; Valente, Giancarlo

    2008-09-01

    Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow detecting subtle, non-strictly localized effects that may remain invisible to the conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are respectively suited for the classification and regression of fMRI patterns. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.

  10. Utilizing Machine Learning for Analysis of Tiara for Texas

    NASA Astrophysics Data System (ADS)

    van Slycke, Jacqueline; Christian, Greg, , Dr.

    2017-09-01

    The Tiara for Texas detector at Texas A&M University consists of a target chamber housing an array of silicon detectors and surrounded by four high purity germanium clovers that generate voltage pulses proportional to detected gamma ray energies. While some radiation is fully absorbed in one photopeak, others undergo Compton scattering between detectors. This process is thoroughly simulated in GEANT4. Machine learning with scikit-learn allows for the reconstruction of scattered photons to the original energy of the incident gamma ray. In a given simulation, a defined number of rays are emitted from the source. Each ray is marked as an event and its path is tracked. Scikit-learn uses the events' paths to train an algorithm, which recognizes which events should be summed to reconstruct the full gamma ray energy and additional events to test the algorithm. These predictions are not exact, but were analyzed to further understand any discrepancies and increase the effectiveness of the simulation. The results from this research project compare various machine learning techniques to determine which methods should be expanded on in the future. National Science Foundation Grant PHY-1659847 and United States Department of Energy Grant DE-FG02-93ER40773.

  11. Machine learning approaches to the social determinants of health in the health and retirement study.

    PubMed

    Seligman, Benjamin; Tuljapurkar, Shripad; Rehkopf, David

    2018-04-01

    Social and economic factors are important predictors of health and of recognized importance for health systems. However, machine learning, used elsewhere in the biomedical literature, has not been extensively applied to study relationships between society and health. We investigate how machine learning may add to our understanding of social determinants of health using data from the Health and Retirement Study. A linear regression of age and gender, and a parsimonious theory-based regression additionally incorporating income, wealth, and education, were used to predict systolic blood pressure, body mass index, waist circumference, and telomere length. Prediction, fit, and interpretability were compared across four machine learning methods: linear regression, penalized regressions, random forests, and neural networks. All models had poor out-of-sample prediction. Most machine learning models performed similarly to the simpler models. However, neural networks greatly outperformed the three other methods. Neural networks also had good fit to the data ( R 2 between 0.4-0.6, versus <0.3 for all others). Across machine learning models, nine variables were frequently selected or highly weighted as predictors: dental visits, current smoking, self-rated health, serial-seven subtractions, probability of receiving an inheritance, probability of leaving an inheritance of at least $10,000, number of children ever born, African-American race, and gender. Some of the machine learning methods do not improve prediction or fit beyond simpler models, however, neural networks performed well. The predictors identified across models suggest underlying social factors that are important predictors of biological indicators of chronic disease, and that the non-linear and interactive relationships between variables fundamental to the neural network approach may be important to consider.

  12. Applying Machine Learning to Star Cluster Classification

    NASA Astrophysics Data System (ADS)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time consuming, thus it creates a bottleneck, and substantially slows down progress in star cluster research.We seek to automate the process of labeling star clusters (the second step) through applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data is HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results, and suggest several directions for further improvement.

  13. Classification of sodium MRI data of cartilage using machine learning.

    PubMed

    Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R

    2015-11-01

    To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interests in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each regions of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interests for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.

  14. Implementation of machine learning for high-volume manufacturing metrology challenges (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Timoney, Padraig; Kagalwala, Taher; Reis, Edward; Lazkani, Houssam; Hurley, Jonathan; Liu, Haibo; Kang, Charles; Isbester, Paul; Yellai, Naren; Shifrin, Michael; Etzioni, Yoav

    2018-03-01

    In recent years, the combination of device scaling, complex 3D device architecture and tightening process tolerances have strained the capabilities of optical metrology tools to meet process needs. Two main categories of approaches have been taken to address the evolving process needs. In the first category, new hardware configurations are developed to provide more spectral sensitivity. Most of this category of work will enable next generation optical metrology tools to try to maintain pace with next generation process needs. In the second category, new innovative algorithms have been pursued to increase the value of the existing measurement signal. These algorithms aim to boost sensitivity to the measurement parameter of interest, while reducing the impact of other factors that contribute to signal variability but are not influenced by the process of interest. This paper will evaluate the suitability of machine learning to address high volume manufacturing metrology requirements in both front end of line (FEOL) and back end of line (BEOL) sectors from advanced technology nodes. In the FEOL sector, initial feasibility has been demonstrated to predict the fin CD values from an inline measurement using machine learning. In this study, OCD spectra were acquired after an etch process that occurs earlier in the process flow than where the inline CD is measured. The fin hard mask etch process is known to impact the downstream inline CD value. Figure 1 shows the correlation of predicted CD vs downstream inline CD measurement obtained after the training of the machine learning algorithm. For BEOL, machine learning is shown to provide an additional source of information in prediction of electrical resistance from structures that are not compatible for direct copper height measurement. Figure 2 compares the trench height correlation to electrical resistance (Rs) and the correlation of predicted Rs to the e-test Rs value for a far back end of line (FBEOL) metallization level

  15. Machine learning for autonomous crystal structure identification.

    PubMed

    Reinhart, Wesley F; Long, Andrew W; Howard, Michael P; Ferguson, Andrew L; Panagiotopoulos, Athanassios Z

    2017-07-21

    We present a machine learning technique to discover and distinguish relevant ordered structures from molecular simulation snapshots or particle tracking data. Unlike other popular methods for structural identification, our technique requires no a priori description of the target structures. Instead, we use nonlinear manifold learning to infer structural relationships between particles according to the topology of their local environment. This graph-based approach yields unbiased structural information which allows us to quantify the crystalline character of particles near defects, grain boundaries, and interfaces. We demonstrate the method by classifying particles in a simulation of colloidal crystallization, and show that our method identifies structural features that are missed by standard techniques.

  16. Efficient Prediction of Low-Visibility Events at Airports Using Machine-Learning Regression

    NASA Astrophysics Data System (ADS)

    Cornejo-Bueno, L.; Casanova-Mateo, C.; Sanz-Justo, J.; Cerro-Prada, E.; Salcedo-Sanz, S.

    2017-11-01

    We address the prediction of low-visibility events at airports using machine-learning regression. The proposed model successfully forecasts low-visibility events in terms of the runway visual range at the airport, with the use of support-vector regression, neural networks (multi-layer perceptrons and extreme-learning machines) and Gaussian-process algorithms. We assess the performance of these algorithms based on real data collected at the Valladolid airport, Spain. We also propose a study of the atmospheric variables measured at a nearby tower related to low-visibility atmospheric conditions, since they are considered as the inputs of the different regressors. A pre-processing procedure of these input variables with wavelet transforms is also described. The results show that the proposed machine-learning algorithms are able to predict low-visibility events well. The Gaussian process is the best algorithm among those analyzed, obtaining over 98% of the correct classification rate in low-visibility events when the runway visual range is {>}1000 m, and about 80% under this threshold. The performance of all the machine-learning algorithms tested is clearly affected in extreme low-visibility conditions ({<}500 m). However, we show improved results of all the methods when data from a neighbouring meteorological tower are included, and also with a pre-processing scheme using a wavelet transform. Also presented are results of the algorithm performance in daytime and nighttime conditions, and for different prediction time horizons.

  17. Exploration of Machine Learning Approaches to Predict Pavement Performance

    DOT National Transportation Integrated Search

    2018-03-23

    Machine learning (ML) techniques were used to model and predict pavement condition index (PCI) for various pavement types using a variety of input variables. The primary objective of this research was to develop and assess PCI predictive models for t...

  18. A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

    PubMed

    S K, Somasundaram; P, Alli

    2017-11-09

    The main complication of diabetes is Diabetic retinopathy (DR), retinal vascular disease and it leads to the blindness. Regular screening for early DR disease detection is considered as an intensive labor and resource oriented task. Therefore, automatic detection of DR diseases is performed only by using the computational technique is the great solution. An automatic method is more reliable to determine the presence of an abnormality in Fundus images (FI) but, the classification process is poorly performed. Recently, few research works have been designed for analyzing texture discrimination capacity in FI to distinguish the healthy images. However, the feature extraction (FE) process was not performed well, due to the high dimensionality. Therefore, to identify retinal features for DR disease diagnosis and early detection using Machine Learning and Ensemble Classification method, called, Machine Learning Bagging Ensemble Classifier (ML-BEC) is designed. The ML-BEC method comprises of two stages. The first stage in ML-BEC method comprises extraction of the candidate objects from Retinal Images (RI). The candidate objects or the features for DR disease diagnosis include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying Machine Learning technique called, t-distributed Stochastic Neighbor Embedding (t-SNE). Besides, t-SNE generates a probability distribution across high-dimensional images where the images are separated into similar and dissimilar pairs. Then, t-SNE describes a similar probability distribution across the points in the low-dimensional map. This lessens the Kullback-Leibler divergence among two distributions regarding the locations of the points on the map. The second stage comprises of application of ensemble classifiers to the extracted features for providing accurate analysis of digital FI using machine learning. In this stage, an automatic detection

  19. Simplify and Accelerate Earth Science Data Preparation to Systemize Machine Learning

    NASA Astrophysics Data System (ADS)

    Kuo, K. S.; Rilee, M. L.; Oloso, A.

    2017-12-01

    Data preparation is the most laborious and time-consuming part of machine learning. The effort required is usually more than linearly proportional to the varieties of data used. From a system science viewpoint, useful machine learning in Earth Science likely involves diverse datasets. Thus, simplifying data preparation to ease the systemization of machine learning in Earth Science is of immense value. The technologies we have developed and applied to an array database, SciDB, are explicitly designed for the purpose, including the innovative SpatioTemporal Adaptive-Resolution Encoding (STARE), a remapping tool suite, and an efficient implementation of connected component labeling (CCL). STARE serves as a universal Earth data representation that homogenizes data varieties and facilitates spatiotemporal data placement as well as alignment, to maximize query performance on massively parallel, distributed computing resources for a major class of analysis. Moreover, it converts spatiotemporal set operations into fast and efficient integer interval operations, supporting in turn moving-object analysis. Integrative analysis requires more than overlapping spatiotemporal sets. For example, meaningful comparison of temperature fields obtained with different means and resolutions requires their transformation to the same grid. Therefore, remapping has been implemented to enable integrative analysis. Finally, Earth Science investigations are generally studies of phenomena, e.g. tropical cyclone, atmospheric river, and blizzard, through their associated events, like hurricanes Katrina and Sandy. Unfortunately, except for a few high-impact phenomena, comprehensive episodic records are lacking. Consequently, we have implemented an efficient CCL tracking algorithm, enabling event-based investigations within climate data records beyond mere event presence. In summary, we have implemented the core unifying capabilities on a Big Data technology to enable systematic machine learning in

  20. Solving a Higgs optimization problem with quantum annealing for machine learning.

    PubMed

    Mott, Alex; Job, Joshua; Vlimant, Jean-Roch; Lidar, Daniel; Spiropulu, Maria

    2017-10-18

    The discovery of Higgs-boson decays in a background of standard-model processes was assisted by machine learning methods. The classifiers used to separate signals such as these from background are trained using highly unerring but not completely perfect simulations of the physical processes involved, often resulting in incorrect labelling of background processes or signals (label noise) and systematic errors. Here we use quantum and classical annealing (probabilistic techniques for approximating the global maximum or minimum of a given function) to solve a Higgs-signal-versus-background machine learning optimization problem, mapped to a problem of finding the ground state of a corresponding Ising spin model. We build a set of weak classifiers based on the kinematic observables of the Higgs decay photons, which we then use to construct a strong classifier. This strong classifier is highly resilient against overtraining and against errors in the correlations of the physical observables in the training data. We show that the resulting quantum and classical annealing-based classifier systems perform comparably to the state-of-the-art machine learning methods that are currently used in particle physics. However, in contrast to these methods, the annealing-based classifiers are simple functions of directly interpretable experimental parameters with clear physical meaning. The annealer-trained classifiers use the excited states in the vicinity of the ground state and demonstrate some advantage over traditional machine learning methods for small training datasets. Given the relative simplicity of the algorithm and its robustness to error, this technique may find application in other areas of experimental particle physics, such as real-time decision making in event-selection problems and classification in neutrino physics.

  1. Solving a Higgs optimization problem with quantum annealing for machine learning

    NASA Astrophysics Data System (ADS)

    Mott, Alex; Job, Joshua; Vlimant, Jean-Roch; Lidar, Daniel; Spiropulu, Maria

    2017-10-01

    The discovery of Higgs-boson decays in a background of standard-model processes was assisted by machine learning methods. The classifiers used to separate signals such as these from background are trained using highly unerring but not completely perfect simulations of the physical processes involved, often resulting in incorrect labelling of background processes or signals (label noise) and systematic errors. Here we use quantum and classical annealing (probabilistic techniques for approximating the global maximum or minimum of a given function) to solve a Higgs-signal-versus-background machine learning optimization problem, mapped to a problem of finding the ground state of a corresponding Ising spin model. We build a set of weak classifiers based on the kinematic observables of the Higgs decay photons, which we then use to construct a strong classifier. This strong classifier is highly resilient against overtraining and against errors in the correlations of the physical observables in the training data. We show that the resulting quantum and classical annealing-based classifier systems perform comparably to the state-of-the-art machine learning methods that are currently used in particle physics. However, in contrast to these methods, the annealing-based classifiers are simple functions of directly interpretable experimental parameters with clear physical meaning. The annealer-trained classifiers use the excited states in the vicinity of the ground state and demonstrate some advantage over traditional machine learning methods for small training datasets. Given the relative simplicity of the algorithm and its robustness to error, this technique may find application in other areas of experimental particle physics, such as real-time decision making in event-selection problems and classification in neutrino physics.

  2. Imaging and machine learning techniques for diagnosis of Alzheimer's disease.

    PubMed

    Mirzaei, Golrokh; Adeli, Anahita; Adeli, Hojjat

    2016-12-01

    Alzheimer's disease (AD) is a common health problem in elderly people. There has been considerable research toward the diagnosis and early detection of this disease in the past decade. The sensitivity of biomarkers and the accuracy of the detection techniques have been defined to be the key to an accurate diagnosis. This paper presents a state-of-the-art review of the research performed on the diagnosis of AD based on imaging and machine learning techniques. Different segmentation and machine learning techniques used for the diagnosis of AD are reviewed including thresholding, supervised and unsupervised learning, probabilistic techniques, Atlas-based approaches, and fusion of different image modalities. More recent and powerful classification techniques such as the enhanced probabilistic neural network of Ahmadlou and Adeli should be investigated with the goal of improving the diagnosis accuracy. A combination of different image modalities can help improve the diagnosis accuracy rate. Research is needed on the combination of modalities to discover multi-modal biomarkers.

  3. Towards Machine Learning of Motor Skills

    NASA Astrophysics Data System (ADS)

    Peters, Jan; Schaal, Stefan; Schölkopf, Bernhard

    Autonomous robots that can adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.

  4. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    PubMed

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2017-06-14

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.

  5. Chemically intuited, large-scale screening of MOFs by machine learning techniques

    NASA Astrophysics Data System (ADS)

    Borboudakis, Giorgos; Stergiannakos, Taxiarchis; Frysali, Maria; Klontzas, Emmanuel; Tsamardinos, Ioannis; Froudakis, George E.

    2017-10-01

    A novel computational methodology for large-scale screening of MOFs is applied to gas storage with the use of machine learning technologies. This approach is a promising trade-off between the accuracy of ab initio methods and the speed of classical approaches, strategically combined with chemical intuition. The results demonstrate that the chemical properties of MOFs are indeed predictable (stochastically, not deterministically) using machine learning methods and automated analysis protocols, with the accuracy of predictions increasing with sample size. Our initial results indicate that this methodology is promising to apply not only to gas storage in MOFs but in many other material science projects.

  6. Elicitation of neurological knowledge with argument-based machine learning.

    PubMed

    Groznik, Vida; Guid, Matej; Sadikov, Aleksander; Možina, Martin; Georgiev, Dejan; Kragelj, Veronika; Ribarič, Samo; Pirtošek, Zvezdan; Bratko, Ivan

    2013-02-01

    The paper describes the use of expert's knowledge in practice and the efficiency of a recently developed technique called argument-based machine learning (ABML) in the knowledge elicitation process. We are developing a neurological decision support system to help the neurologists differentiate between three types of tremors: Parkinsonian, essential, and mixed tremor (comorbidity). The system is intended to act as a second opinion for the neurologists, and most importantly to help them reduce the number of patients in the "gray area" that require a very costly further examination (DaTSCAN). We strive to elicit comprehensible and medically meaningful knowledge in such a way that it does not come at the cost of diagnostic accuracy. To alleviate the difficult problem of knowledge elicitation from data and domain experts, we used ABML. ABML guides the expert to explain critical special cases which cannot be handled automatically by machine learning. This very efficiently reduces the expert's workload, and combines expert's knowledge with learning data. 122 patients were enrolled into the study. The classification accuracy of the final model was 91%. Equally important, the initial and the final models were also evaluated for their comprehensibility by the neurologists. All 13 rules of the final model were deemed as appropriate to be able to support its decisions with good explanations. The paper demonstrates ABML's advantage in combining machine learning and expert knowledge. The accuracy of the system is very high with respect to the current state-of-the-art in clinical practice, and the system's knowledge base is assessed to be very consistent from a medical point of view. This opens up the possibility to use the system also as a teaching tool. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. Nonlinear programming for classification problems in machine learning

    NASA Astrophysics Data System (ADS)

    Astorino, Annabella; Fuduli, Antonio; Gaudioso, Manlio

    2016-10-01

    We survey some nonlinear models for classification problems arising in machine learning. In the last years this field has become more and more relevant due to a lot of practical applications, such as text and web classification, object recognition in machine vision, gene expression profile analysis, DNA and protein analysis, medical diagnosis, customer profiling etc. Classification deals with separation of sets by means of appropriate separation surfaces, which is generally obtained by solving a numerical optimization model. While linear separability is the basis of the most popular approach to classification, the Support Vector Machine (SVM), in the recent years using nonlinear separating surfaces has received some attention. The objective of this work is to recall some of such proposals, mainly in terms of the numerical optimization models. In particular we tackle the polyhedral, ellipsoidal, spherical and conical separation approaches and, for some of them, we also consider the semisupervised versions.

  8. Stochastic subset selection for learning with kernel machines.

    PubMed

    Rhinelander, Jason; Liu, Xiaoping P

    2012-06-01

    Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super linearly in computational effort with the number of training samples and is often used for the offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm works by using a stochastic indexing technique when selecting a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.

  9. Mechanistic models versus machine learning, a fight worth fighting for the biological community?

    PubMed

    Baker, Ruth E; Peña, Jose-Maria; Jayamohan, Jayaratnam; Jérusalem, Antoine

    2018-05-01

    Ninety per cent of the world's data have been generated in the last 5 years ( Machine learning: the power and promise of computers that learn by example Report no. DES4702. Issued April 2017. Royal Society). A small fraction of these data is collected with the aim of validating specific hypotheses. These studies are led by the development of mechanistic models focused on the causality of input-output relationships. However, the vast majority is aimed at supporting statistical or correlation studies that bypass the need for causality and focus exclusively on prediction. Along these lines, there has been a vast increase in the use of machine learning models, in particular in the biomedical and clinical sciences, to try and keep pace with the rate of data generation. Recent successes now beg the question of whether mechanistic models are still relevant in this area. Said otherwise, why should we try to understand the mechanisms of disease progression when we can use machine learning tools to directly predict disease outcome? © 2018 The Author(s).

  10. A Sustainable Model for Integrating Current Topics in Machine Learning Research into the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Georgiopoulos, M.; DeMara, R. F.; Gonzalez, A. J.; Wu, A. S.; Mollaghasemi, M.; Gelenbe, E.; Kysilka, M.; Secretan, J.; Sharma, C. A.; Alnsour, A. J.

    2009-01-01

    This paper presents an integrated research and teaching model that has resulted from an NSF-funded effort to introduce results of current Machine Learning research into the engineering and computer science curriculum at the University of Central Florida (UCF). While in-depth exposure to current topics in Machine Learning has traditionally occurred…

  11. Machine learning methods for the classification of gliomas: Initial results using features extracted from MR spectroscopy.

    PubMed

    Ranjith, G; Parvathy, R; Vikas, V; Chandrasekharan, Kesavadas; Nair, Suresh

    2015-04-01

    With the advent of new imaging modalities, radiologists are faced with handling increasing volumes of data for diagnosis and treatment planning. The use of automated and intelligent systems is becoming essential in such a scenario. Machine learning, a branch of artificial intelligence, is increasingly being used in medical image analysis applications such as image segmentation, registration and computer-aided diagnosis and detection. Histopathological analysis is currently the gold standard for classification of brain tumors. The use of machine learning algorithms along with extraction of relevant features from magnetic resonance imaging (MRI) holds promise of replacing conventional invasive methods of tumor classification. The aim of the study is to classify gliomas into benign and malignant types using MRI data. Retrospective data from 28 patients who were diagnosed with glioma were used for the analysis. WHO Grade II (low-grade astrocytoma) was classified as benign while Grade III (anaplastic astrocytoma) and Grade IV (glioblastoma multiforme) were classified as malignant. Features were extracted from MR spectroscopy. The classification was done using four machine learning algorithms: multilayer perceptrons, support vector machine, random forest and locally weighted learning. Three of the four machine learning algorithms gave an area under ROC curve in excess of 0.80. Random forest gave the best performance in terms of AUC (0.911) while sensitivity was best for locally weighted learning (86.1%). The performance of different machine learning algorithms in the classification of gliomas is promising. An even better performance may be expected by integrating features extracted from other MR sequences. © The Author(s) 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.

  12. Prediction of activity type in preschool children using machine learning techniques.

    PubMed

    Hagenbuchner, Markus; Cliff, Dylan P; Trost, Stewart G; Van Tuc, Nguyen; Peoples, Gregory E

    2015-07-01

    Recent research has shown that machine learning techniques can accurately predict activity classes from accelerometer data in adolescents and adults. The purpose of this study is to develop and test machine learning models for predicting activity type in preschool-aged children. Participants completed 12 standardised activity trials (TV, reading, tablet game, quiet play, art, treasure hunt, cleaning up, active game, obstacle course, bicycle riding) over two laboratory visits. Eleven children aged 3-6 years (mean age=4.8±0.87; 55% girls) completed the activity trials while wearing an ActiGraph GT3X+ accelerometer on the right hip. Activities were categorised into five activity classes: sedentary activities, light activities, moderate to vigorous activities, walking, and running. A standard feed-forward Artificial Neural Network and a Deep Learning Ensemble Network were trained on features in the accelerometer data used in previous investigations (10th, 25th, 50th, 75th and 90th percentiles and the lag-one autocorrelation). Overall recognition accuracy for the standard feed forward Artificial Neural Network was 69.7%. Recognition accuracy for sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running was 82%, 79%, 64%, 36% and 46%, respectively. In comparison, overall recognition accuracy for the Deep Learning Ensemble Network was 82.6%. For sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running recognition accuracy was 84%, 91%, 79%, 73% and 73%, respectively. Ensemble machine learning approaches such as Deep Learning Ensemble Network can accurately predict activity type from accelerometer data in preschool children. Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  13. Performances of Machine Learning Algorithms for Binary Classification of Network Anomaly Detection System

    NASA Astrophysics Data System (ADS)

    Nawir, Mukrimah; Amir, Amiza; Lynn, Ong Bi; Yaakob, Naimah; Badlishah Ahmad, R.

    2018-05-01

    The rapid growth of technologies might endanger them to various network attacks due to the nature of data which are frequently exchange their data through Internet and large-scale data that need to be handle. Moreover, network anomaly detection using machine learning faced difficulty when dealing the involvement of dataset where the number of labelled network dataset is very few in public and this caused many researchers keep used the most commonly network dataset (KDDCup99) which is not relevant to employ the machine learning (ML) algorithms for a classification. Several issues regarding these available labelled network datasets are discussed in this paper. The aim of this paper to build a network anomaly detection system using machine learning algorithms that are efficient, effective and fast processing. The finding showed that AODE algorithm is performed well in term of accuracy and processing time for binary classification towards UNSW-NB15 dataset.

  14. Machine Learning Principles Can Improve Hip Fracture Prediction.

    PubMed

    Kruse, Christian; Eiken, Pia; Vestergaard, Peter

    2017-04-01

    Apply machine learning principles to predict hip fractures and estimate predictor importance in Dual-energy X-ray absorptiometry (DXA)-scanned men and women. Dual-energy X-ray absorptiometry data from two Danish regions between 1996 and 2006 were combined with national Danish patient data to comprise 4722 women and 717 men with 5 years of follow-up time (original cohort n = 6606 men and women). Twenty-four statistical models were built on 75% of data points through k-5, 5-repeat cross-validation, and then validated on the remaining 25% of data points to calculate area under the curve (AUC) and calibrate probability estimates. The best models were retrained with restricted predictor subsets to estimate the best subsets. For women, bootstrap aggregated flexible discriminant analysis ("bagFDA") performed best with a test AUC of 0.92 [0.89; 0.94] and well-calibrated probabilities following Naïve Bayes adjustments. A "bagFDA" model limited to 11 predictors (among them bone mineral densities (BMD), biochemical glucose measurements, general practitioner and dentist use) achieved a test AUC of 0.91 [0.88; 0.93]. For men, eXtreme Gradient Boosting ("xgbTree") performed best with a test AUC of 0.89 [0.82; 0.95], but with poor calibration in higher probabilities. A ten predictor subset (BMD, biochemical cholesterol and liver function tests, penicillin use and osteoarthritis diagnoses) achieved a test AUC of 0.86 [0.78; 0.94] using an "xgbTree" model. Machine learning can improve hip fracture prediction beyond logistic regression using ensemble models. Compiling data from international cohorts of longer follow-up and performing similar machine learning procedures has the potential to further improve discrimination and calibration.

  15. Machine learning for outcome prediction of acute ischemic stroke post intra-arterial therapy.

    PubMed

    Asadi, Hamed; Dowling, Richard; Yan, Bernard; Mitchell, Peter

    2014-01-01

    Stroke is a major cause of death and disability. Accurately predicting stroke outcome from a set of predictive variables may identify high-risk patients and guide treatment approaches, leading to decreased morbidity. Logistic regression models allow for the identification and validation of predictive variables. However, advanced machine learning algorithms offer an alternative, in particular, for large-scale multi-institutional data, with the advantage of easily incorporating newly available data to improve prediction performance. Our aim was to design and compare different machine learning methods, capable of predicting the outcome of endovascular intervention in acute anterior circulation ischaemic stroke. We conducted a retrospective study of a prospectively collected database of acute ischaemic stroke treated by endovascular intervention. Using SPSS®, MATLAB®, and Rapidminer®, classical statistics as well as artificial neural network and support vector algorithms were applied to design a supervised machine capable of classifying these predictors into potential good and poor outcomes. These algorithms were trained, validated and tested using randomly divided data. We included 107 consecutive acute anterior circulation ischaemic stroke patients treated by endovascular technique. Sixty-six were male and the mean age of 65.3. All the available demographic, procedural and clinical factors were included into the models. The final confusion matrix of the neural network, demonstrated an overall congruency of ∼ 80% between the target and output classes, with favourable receiving operative characteristics. However, after optimisation, the support vector machine had a relatively better performance, with a root mean squared error of 2.064 (SD: ± 0.408). We showed promising accuracy of outcome prediction, using supervised machine learning algorithms, with potential for incorporation of larger multicenter datasets, likely further improving prediction. Finally, we

  16. Towards large-scale FAME-based bacterial species identification using machine learning techniques.

    PubMed

    Slabbinck, Bram; De Baets, Bernard; Dawyndt, Peter; De Vos, Paul

    2009-05-01

    In the last decade, bacterial taxonomy witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests have resulted in sensitivity values, respectively, 0.847, 0.901 and 0.708. The random forests models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning proves very useful for FAME-based bacterial species identification. Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species

  17. Acquiring Software Design Schemas: A Machine Learning Perspective

    NASA Technical Reports Server (NTRS)

    Harandi, Mehdi T.; Lee, Hing-Yan

    1991-01-01

    In this paper, we describe an approach based on machine learning that acquires software design schemas from design cases of existing applications. An overview of the technique, design representation, and acquisition system are presented. the paper also addresses issues associated with generalizing common features such as biases. The generalization process is illustrated using an example.

  18. Prototype Vector Machine for Large Scale Semi-Supervised Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Kai; Kwok, James T.; Parvin, Bahram

    2009-04-29

    Practicaldataminingrarelyfalls exactlyinto the supervisedlearning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computationalintensivenessofgraph-based SSLarises largely from the manifold or graph regularization, which in turn lead to large models that are dificult to handle. To alleviate this, we proposed the prototype vector machine (PVM), a highlyscalable,graph-based algorithm for large-scale SSL. Our key innovation is the use of"prototypes vectors" for effcient approximation on both the graph-based regularizer and model representation. The choice of prototypes are grounded upon two important criteria: they not only perform effective low-rank approximation of themore » kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.« less

  19. Machine learning based Intelligent cognitive network using fog computing

    NASA Astrophysics Data System (ADS)

    Lu, Jingyang; Li, Lun; Chen, Genshe; Shen, Dan; Pham, Khanh; Blasch, Erik

    2017-05-01

    In this paper, a Cognitive Radio Network (CRN) based on artificial intelligence is proposed to distribute the limited radio spectrum resources more efficiently. The CRN framework can analyze the time-sensitive signal data close to the signal source using fog computing with different types of machine learning techniques. Depending on the computational capabilities of the fog nodes, different features and machine learning techniques are chosen to optimize spectrum allocation. Also, the computing nodes send the periodic signal summary which is much smaller than the original signal to the cloud so that the overall system spectrum source allocation strategies are dynamically updated. Applying fog computing, the system is more adaptive to the local environment and robust to spectrum changes. As most of the signal data is processed at the fog level, it further strengthens the system security by reducing the communication burden of the communications network.

  20. Leveraging knowledge engineering and machine learning for microbial bio-manufacturing.

    PubMed

    Oyetunde, Tolutola; Bao, Forrest Sheng; Chen, Jiung-Wen; Martin, Hector Garcia; Tang, Yinjie J

    2018-05-03

    Genome scale modeling (GSM) predicts the performance of microbial workhorses and helps identify beneficial gene targets. GSM integrated with intracellular flux dynamics, omics, and thermodynamics have shown remarkable progress in both elucidating complex cellular phenomena and computational strain design (CSD). Nonetheless, these models still show high uncertainty due to a poor understanding of innate pathway regulations, metabolic burdens, and other factors (such as stress tolerance and metabolite channeling). Besides, the engineered hosts may have genetic mutations or non-genetic variations in bioreactor conditions and thus CSD rarely foresees fermentation rate and titer. Metabolic models play important role in design-build-test-learn cycles for strain improvement, and machine learning (ML) may provide a viable complementary approach for driving strain design and deciphering cellular processes. In order to develop quality ML models, knowledge engineering leverages and standardizes the wealth of information in literature (e.g., genomic/phenomic data, synthetic biology strategies, and bioprocess variables). Data driven frameworks can offer new constraints for mechanistic models to describe cellular regulations, to design pathways, to search gene targets, and to estimate fermentation titer/rate/yield under specified growth conditions (e.g., mixing, nutrients, and O 2 ). This review highlights the scope of information collections, database constructions, and machine learning techniques (such as deep learning and transfer learning), which may facilitate "Learn and Design" for strain development. Copyright © 2018. Published by Elsevier Inc.

  1. Common component classification: what can we learn from machine learning?

    PubMed

    Anderson, Ariana; Labus, Jennifer S; Vianna, Eduardo P; Mayer, Emeran A; Cohen, Mark S

    2011-05-15

    Machine learning methods have been applied to classifying fMRI scans by studying locations in the brain that exhibit temporal intensity variation between groups, frequently reporting classification accuracy of 90% or better. Although empirical results are quite favorable, one might doubt the ability of classification methods to withstand changes in task ordering and the reproducibility of activation patterns over runs, and question how much of the classification machines' power is due to artifactual noise versus genuine neurological signal. To examine the true strength and power of machine learning classifiers we create and then deconstruct a classifier to examine its sensitivity to physiological noise, task reordering, and across-scan classification ability. The models are trained and tested both within and across runs to assess stability and reproducibility across conditions. We demonstrate the use of independent components analysis for both feature extraction and artifact removal and show that removal of such artifacts can reduce predictive accuracy even when data has been cleaned in the preprocessing stages. We demonstrate how mistakes in the feature selection process can cause the cross-validation error seen in publication to be a biased estimate of the testing error seen in practice and measure this bias by purposefully making flawed models. We discuss other ways to introduce bias and the statistical assumptions lying behind the data and model themselves. Finally we discuss the complications in drawing inference from the smaller sample sizes typically seen in fMRI studies, the effects of small or unbalanced samples on the Type 1 and Type 2 error rates, and how publication bias can give a false confidence of the power of such methods. Collectively this work identifies challenges specific to fMRI classification and methods affecting the stability of models. Copyright © 2010 Elsevier Inc. All rights reserved.

  2. Conditional High-Order Boltzmann Machines for Supervised Relation Learning.

    PubMed

    Huang, Yan; Wang, Wei; Wang, Liang; Tan, Tieniu

    2017-09-01

    Relation learning is a fundamental problem in many vision tasks. Recently, high-order Boltzmann machine and its variants have shown their great potentials in learning various types of data relation in a range of tasks. But most of these models are learned in an unsupervised way, i.e., without using relation class labels, which are not very discriminative for some challenging tasks, e.g., face verification. In this paper, with the goal to perform supervised relation learning, we introduce relation class labels into conventional high-order multiplicative interactions with pairwise input samples, and propose a conditional high-order Boltzmann Machine (CHBM), which can learn to classify the data relation in a binary classification way. To be able to deal with more complex data relation, we develop two improved variants of CHBM: 1) latent CHBM, which jointly performs relation feature learning and classification, by using a set of latent variables to block the pathway from pairwise input samples to output relation labels and 2) gated CHBM, which untangles factors of variation in data relation, by exploiting a set of latent variables to multiplicatively gate the classification of CHBM. To reduce the large number of model parameters generated by the multiplicative interactions, we approximately factorize high-order parameter tensors into multiple matrices. Then, we develop efficient supervised learning algorithms, by first pretraining the models using joint likelihood to provide good parameter initialization, and then finetuning them using conditional likelihood to enhance the discriminant ability. We apply the proposed models to a series of tasks including invariant recognition, face verification, and action similarity labeling. Experimental results demonstrate that by exploiting supervised relation labels, our models can greatly improve the performance.

  3. Classifying black and white spruce pollen using layered machine learning.

    PubMed

    Punyasena, Surangi W; Tcheng, David K; Wesseln, Cassandra; Mueller, Pietra G

    2012-11-01

    Pollen is among the most ubiquitous of terrestrial fossils, preserving an extended record of vegetation change. However, this temporal continuity comes with a taxonomic tradeoff. Analytical methods that improve the taxonomic precision of pollen identifications would expand the research questions that could be addressed by pollen, in fields such as paleoecology, paleoclimatology, biostratigraphy, melissopalynology, and forensics. We developed a supervised, layered, instance-based machine-learning classification system that uses leave-one-out bias optimization and discriminates among small variations in pollen shape, size, and texture. We tested our system on black and white spruce, two paleoclimatically significant taxa in the North American Quaternary. We achieved > 93% grain-to-grain classification accuracies in a series of experiments with both fossil and reference material. More significantly, when applied to Quaternary samples, the learning system was able to replicate the count proportions of a human expert (R(2) = 0.78, P = 0.007), with one key difference - the machine achieved these ratios by including larger numbers of grains with low-confidence identifications. Our results demonstrate the capability of machine-learning systems to solve the most challenging palynological classification problem, the discrimination of congeneric species, extending the capabilities of the pollen analyst and improving the taxonomic resolution of the palynological record. © 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.

  4. Logic Learning Machine and standard supervised methods for Hodgkin's lymphoma prognosis using gene expression data and clinical variables.

    PubMed

    Parodi, Stefano; Manneschi, Chiara; Verda, Damiano; Ferrari, Enrico; Muselli, Marco

    2018-03-01

    This study evaluates the performance of a set of machine learning techniques in predicting the prognosis of Hodgkin's lymphoma using clinical factors and gene expression data. Analysed samples from 130 Hodgkin's lymphoma patients included a small set of clinical variables and more than 54,000 gene features. Machine learning classifiers included three black-box algorithms ( k-nearest neighbour, Artificial Neural Network, and Support Vector Machine) and two methods based on intelligible rules (Decision Tree and the innovative Logic Learning Machine method). Support Vector Machine clearly outperformed any of the other methods. Among the two rule-based algorithms, Logic Learning Machine performed better and identified a set of simple intelligible rules based on a combination of clinical variables and gene expressions. Decision Tree identified a non-coding gene ( XIST) involved in the early phases of X chromosome inactivation that was overexpressed in females and in non-relapsed patients. XIST expression might be responsible for the better prognosis of female Hodgkin's lymphoma patients.

  5. WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning

    PubMed Central

    Sutphin, George L.; Mahoney, J. Matthew; Sheppard, Keith; Walton, David O.; Korstanje, Ron

    2016-01-01

    The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species—humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide-spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/. PMID:27812085

  6. Machine- z: Rapid machine-learned redshift indicator for Swift gamma-ray bursts

    DOE PAGES

    Ukwatta, T. N.; Wozniak, P. R.; Gehrels, N.

    2016-03-08

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce ‘machine-z’, a redshift prediction algorithm and a ‘high-z’ classifier for Swift GRBs based on machine learning. Our method relies exclusively onmore » canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With 40 per cent false positive rate the classifier can achieve ~100 per cent recall. As a result, the most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.« less

  7. Machine-z: Rapid Machine-Learned Redshift Indicator for Swift Gamma-Ray Bursts

    NASA Technical Reports Server (NTRS)

    Ukwatta, T. N.; Wozniak, P. R.; Gehrels, N.

    2016-01-01

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce 'machine-z', a redshift prediction algorithm and a 'high-z' classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With 40 per cent false positive rate the classifier can achieve approximately 100 per cent recall. The most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.

  8. CAT-PUMA: CME Arrival Time Prediction Using Machine learning Algorithms

    NASA Astrophysics Data System (ADS)

    Liu, Jiajia; Ye, Yudong; Shen, Chenglong; Wang, Yuming; Erdélyi, Robert

    2018-04-01

    CAT-PUMA (CME Arrival Time Prediction Using Machine learning Algorithms) quickly and accurately predicts the arrival of Coronal Mass Ejections (CMEs) of CME arrival time. The software was trained via detailed analysis of CME features and solar wind parameters using 182 previously observed geo-effective partial-/full-halo CMEs and uses algorithms of the Support Vector Machine (SVM) to make its predictions, which can be made within minutes of providing the necessary input parameters of a CME.

  9. Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data.

    PubMed

    Perryman, Alexander L; Stratton, Thomas P; Ekins, Sean; Freundlich, Joel S

    2016-02-01

    Mouse efficacy studies are a critical hurdle to advance translational research of potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability. Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism). "Pruning" out the moderately unstable / moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h. Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive study to date of using machine learning approaches with MLM data from public sources.

  10. Detecting Abnormal Word Utterances in Children With Autism Spectrum Disorders: Machine-Learning-Based Voice Analysis Versus Speech Therapists.

    PubMed

    Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi

    2017-10-01

    Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders ( n = 30) and typical development ( n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.

  11. Mental Health Risk Adjustment with Clinical Categories and Machine Learning.

    PubMed

    Shrestha, Akritee; Bergquist, Savannah; Montz, Ellen; Rose, Sherri

    2017-12-15

    To propose nonparametric ensemble machine learning for mental health and substance use disorders (MHSUD) spending risk adjustment formulas, including considering Clinical Classification Software (CCS) categories as diagnostic covariates over the commonly used Hierarchical Condition Category (HCC) system. 2012-2013 Truven MarketScan database. We implement 21 algorithms to predict MHSUD spending, as well as a weighted combination of these algorithms called super learning. The algorithm collection included seven unique algorithms that were supplied with three differing sets of MHSUD-related predictors alongside demographic covariates: HCC, CCS, and HCC + CCS diagnostic variables. Performance was evaluated based on cross-validated R 2 and predictive ratios. Results show that super learning had the best performance based on both metrics. The top single algorithm was random forests, which improved on ordinary least squares regression by 10 percent with respect to relative efficiency. CCS categories-based formulas were generally more predictive of MHSUD spending compared to HCC-based formulas. Literature supports the potential benefit of implementing a separate MHSUD spending risk adjustment formula. Our results suggest there is an incentive to explore machine learning for MHSUD-specific risk adjustment, as well as considering CCS categories over HCCs. © Health Research and Educational Trust.

  12. Distance Metric Learning via Iterated Support Vector Machines.

    PubMed

    Zuo, Wangmeng; Wang, Faqiang; Zhang, David; Lin, Liang; Huang, Yuchi; Meng, Deyu; Zhang, Lei

    2017-07-11

    Distance metric learning aims to learn from the given training data a valid distance metric, with which the similarity between data samples can be more effectively evaluated for classification. Metric learning is often formulated as a convex or nonconvex optimization problem, while most existing methods are based on customized optimizers and become inefficient for large scale problems. In this paper, we formulate metric learning as a kernel classification problem with the positive semi-definite constraint, and solve it by iterated training of support vector machines (SVMs). The new formulation is easy to implement and efficient in training with the off-the-shelf SVM solvers. Two novel metric learning models, namely Positive-semidefinite Constrained Metric Learning (PCML) and Nonnegative-coefficient Constrained Metric Learning (NCML), are developed. Both PCML and NCML can guarantee the global optimality of their solutions. Experiments are conducted on general classification, face verification and person re-identification to evaluate our methods. Compared with the state-of-the-art approaches, our methods can achieve comparable classification accuracy and are efficient in training.

  13. Machine Translation-Assisted Language Learning: Writing for Beginners

    ERIC Educational Resources Information Center

    Garcia, Ignacio; Pena, Maria Isabel

    2011-01-01

    The few studies that deal with machine translation (MT) as a language learning tool focus on its use by advanced learners, never by beginners. Yet, freely available MT engines (i.e. Google Translate) and MT-related web initiatives (i.e. Gabble-on.com) position themselves to cater precisely to the needs of learners with a limited command of a…

  14. Prediction of outcome in internet-delivered cognitive behaviour therapy for paediatric obsessive-compulsive disorder: A machine learning approach.

    PubMed

    Lenhard, Fabian; Sauer, Sebastian; Andersson, Erik; Månsson, Kristoffer Nt; Mataix-Cols, David; Rück, Christian; Serlachius, Eva

    2018-03-01

    There are no consistent predictors of treatment outcome in paediatric obsessive-compulsive disorder (OCD). One reason for this might be the use of suboptimal statistical methodology. Machine learning is an approach to efficiently analyse complex data. Machine learning has been widely used within other fields, but has rarely been tested in the prediction of paediatric mental health treatment outcomes. To test four different machine learning methods in the prediction of treatment response in a sample of paediatric OCD patients who had received Internet-delivered cognitive behaviour therapy (ICBT). Participants were 61 adolescents (12-17 years) who enrolled in a randomized controlled trial and received ICBT. All clinical baseline variables were used to predict strictly defined treatment response status three months after ICBT. Four machine learning algorithms were implemented. For comparison, we also employed a traditional logistic regression approach. Multivariate logistic regression could not detect any significant predictors. In contrast, all four machine learning algorithms performed well in the prediction of treatment response, with 75 to 83% accuracy. The results suggest that machine learning algorithms can successfully be applied to predict paediatric OCD treatment outcome. Validation studies and studies in other disorders are warranted. Copyright © 2017 John Wiley & Sons, Ltd.

  15. Protein function in precision medicine: deep understanding with machine learning.

    PubMed

    Rost, Burkhard; Radivojac, Predrag; Bromberg, Yana

    2016-08-01

    Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both. © 2016 Federation of European Biochemical Societies.

  16. The Couzens Machine. A Computerized Learning Exchange. Final Report, 1973-74.

    ERIC Educational Resources Information Center

    Davis, Ken, Comp.; Libengood, Richard, Comp.

    The Couzens Machine is a computerized learning exchange and information service developed for the residents of Couzens Hall, a dormitory at the University of Michigan. Organized as a collective within the framework of a course and supported by an instructional development grant from the Center for Research on Learning and Teaching, the Couzens…

  17. Workshop on Fielded Applications of Machine Learning Held in Amherst, Massachusetts on 30 June-1 July 1993. Abstracts.

    DTIC Science & Technology

    1993-01-01

    engineering has led to many AI systems that are now regularly used in industry and elsewhere. The ultimate test of machine learning , the subfield of Al that...applications of machine learning suggest the time was ripe for a meeting on this topic. For this reason, Pat Langley (Siemens Corporate Research) and Yves...Kodratoff (Universite de Paris, Sud) organized an invited workshop on applications of machine learning . The goal of the gathering was to familiarize

  18. An automatic taxonomy of galaxy morphology using unsupervised machine learning

    NASA Astrophysics Data System (ADS)

    Hocking, Alex; Geach, James E.; Sun, Yi; Davey, Neil

    2018-01-01

    We present an unsupervised machine learning technique that automatically segments and labels galaxies in astronomical imaging surveys using only pixel data. Distinct from previous unsupervised machine learning approaches used in astronomy we use no pre-selection or pre-filtering of target galaxy type to identify galaxies that are similar. We demonstrate the technique on the Hubble Space Telescope (HST) Frontier Fields. By training the algorithm using galaxies from one field (Abell 2744) and applying the result to another (MACS 0416.1-2403), we show how the algorithm can cleanly separate early and late type galaxies without any form of pre-directed training for what an 'early' or 'late' type galaxy is. We then apply the technique to the HST Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) fields, creating a catalogue of approximately 60 000 classifications. We show how the automatic classification groups galaxies of similar morphological (and photometric) type and make the classifications public via a catalogue, a visual catalogue and galaxy similarity search. We compare the CANDELS machine-based classifications to human-classifications from the Galaxy Zoo: CANDELS project. Although there is not a direct mapping between Galaxy Zoo and our hierarchical labelling, we demonstrate a good level of concordance between human and machine classifications. Finally, we show how the technique can be used to identify rarer objects and present lensed galaxy candidates from the CANDELS imaging.

  19. Quantum annealing versus classical machine learning applied to a simplified computational biology problem

    PubMed Central

    Li, Richard Y.; Di Felice, Rosa; Rohs, Remo; Lidar, Daniel A.

    2018-01-01

    Transcription factors regulate gene expression, but how these proteins recognize and specifically bind to their DNA targets is still debated. Machine learning models are effective means to reveal interaction mechanisms. Here we studied the ability of a quantum machine learning approach to predict binding specificity. Using simplified datasets of a small number of DNA sequences derived from actual binding affinity experiments, we trained a commercially available quantum annealer to classify and rank transcription factor binding. The results were compared to state-of-the-art classical approaches for the same simplified datasets, including simulated annealing, simulated quantum annealing, multiple linear regression, LASSO, and extreme gradient boosting. Despite technological limitations, we find a slight advantage in classification performance and nearly equal ranking performance using the quantum annealer for these fairly small training data sets. Thus, we propose that quantum annealing might be an effective method to implement machine learning for certain computational biology problems. PMID:29652405

  20. Quantum annealing versus classical machine learning applied to a simplified computational biology problem

    NASA Astrophysics Data System (ADS)

    Li, Richard Y.; Di Felice, Rosa; Rohs, Remo; Lidar, Daniel A.

    2018-03-01

    Transcription factors regulate gene expression, but how these proteins recognize and specifically bind to their DNA targets is still debated. Machine learning models are effective means to reveal interaction mechanisms. Here we studied the ability of a quantum machine learning approach to classify and rank binding affinities. Using simplified data sets of a small number of DNA sequences derived from actual binding affinity experiments, we trained a commercially available quantum annealer to classify and rank transcription factor binding. The results were compared to state-of-the-art classical approaches for the same simplified data sets, including simulated annealing, simulated quantum annealing, multiple linear regression, LASSO, and extreme gradient boosting. Despite technological limitations, we find a slight advantage in classification performance and nearly equal ranking performance using the quantum annealer for these fairly small training data sets. Thus, we propose that quantum annealing might be an effective method to implement machine learning for certain computational biology problems.

  1. Cross-platform normalization of microarray and RNA-seq data for machine learning applications

    PubMed Central

    Thompson, Jeffrey A.; Tan, Jie

    2016-01-01

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019

  2. Optical Coherence Tomography Machine Learning Classifiers for Glaucoma Detection: A Preliminary Study

    PubMed Central

    Burgansky-Eliash, Zvia; Wollstein, Gadi; Chu, Tianjiao; Ramsey, Joseph D.; Glymour, Clark; Noecker, Robert J.; Ishikawa, Hiroshi; Schuman, Joel S.

    2007-01-01

    Purpose Machine-learning classifiers are trained computerized systems with the ability to detect the relationship between multiple input parameters and a diagnosis. The present study investigated whether the use of machine-learning classifiers improves optical coherence tomography (OCT) glaucoma detection. Methods Forty-seven patients with glaucoma (47 eyes) and 42 healthy subjects (42 eyes) were included in this cross-sectional study. Of the glaucoma patients, 27 had early disease (visual field mean deviation [MD] ≥ −6 dB) and 20 had advanced glaucoma (MD < −6 dB). Machine-learning classifiers were trained to discriminate between glaucomatous and healthy eyes using parameters derived from OCT output. The classifiers were trained with all 38 parameters as well as with only 8 parameters that correlated best with the visual field MD. Five classifiers were tested: linear discriminant analysis, support vector machine, recursive partitioning and regression tree, generalized linear model, and generalized additive model. For the last two classifiers, a backward feature selection was used to find the minimal number of parameters that resulted in the best and most simple prediction. The cross-validated receiver operating characteristic (ROC) curve and accuracies were calculated. Results The largest area under the ROC curve (AROC) for glaucoma detection was achieved with the support vector machine using eight parameters (0.981). The sensitivity at 80% and 95% specificity was 97.9% and 92.5%, respectively. This classifier also performed best when judged by cross-validated accuracy (0.966). The best classification between early glaucoma and advanced glaucoma was obtained with the generalized additive model using only three parameters (AROC = 0.854). Conclusions Automated machine classifiers of OCT data might be useful for enhancing the utility of this technology for detecting glaucomatous abnormality. PMID:16249492

  3. Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming.

    PubMed

    Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J; Bao, Forrest Sheng

    2016-04-01

    13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux (http://mflux.org) that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species.

  4. Clinical quality needs complex adaptive systems and machine learning.

    PubMed

    Marsland, Stephen; Buchan, Iain

    2004-01-01

    The vast increase in clinical data has the potential to bring about large improvements in clinical quality and other aspects of healthcare delivery. However, such benefits do not come without cost. The analysis of such large datasets, particularly where the data may have to be merged from several sources and may be noisy and incomplete, is a challenging task. Furthermore, the introduction of clinical changes is a cyclical task, meaning that the processes under examination operate in an environment that is not static. We suggest that traditional methods of analysis are unsuitable for the task, and identify complexity theory and machine learning as areas that have the potential to facilitate the examination of clinical quality. By its nature the field of complex adaptive systems deals with environments that change because of the interactions that have occurred in the past. We draw parallels between health informatics and bioinformatics, which has already started to successfully use machine learning methods.

  5. A Photometric Machine-Learning Method to Infer Stellar Metallicity

    NASA Technical Reports Server (NTRS)

    Miller, Adam A.

    2015-01-01

    Following its formation, a star's metal content is one of the few factors that can significantly alter its evolution. Measurements of stellar metallicity ([Fe/H]) typically require a spectrum, but spectroscopic surveys are limited to a few x 10(exp 6) targets; photometric surveys, on the other hand, have detected > 10(exp 9) stars. I present a new machine-learning method to predict [Fe/H] from photometric colors measured by the Sloan Digital Sky Survey (SDSS). The training set consists of approx. 120,000 stars with SDSS photometry and reliable [Fe/H] measurements from the SEGUE Stellar Parameters Pipeline (SSPP). For bright stars (g' < or = 18 mag), with 4500 K < or = Teff < or = 7000 K, corresponding to those with the most reliable SSPP estimates, I find that the model predicts [Fe/H] values with a root-mean-squared-error (RMSE) of approx.0.27 dex. The RMSE from this machine-learning method is similar to the scatter in [Fe/H] measurements from low-resolution spectra..

  6. Machine-learning in grading of gliomas based on multi-parametric magnetic resonance imaging at 3T.

    PubMed

    Citak-Er, Fusun; Firat, Zeynep; Kovanlikaya, Ilhami; Ture, Ugur; Ozturk-Isik, Esin

    2018-06-15

    The objective of this study was to assess the contribution of multi-parametric (mp) magnetic resonance imaging (MRI) quantitative features in the machine learning-based grading of gliomas with a multi-region-of-interests approach. Forty-three patients who were newly diagnosed as having a glioma were included in this study. The patients were scanned prior to any therapy using a standard brain tumor magnetic resonance (MR) imaging protocol that included T1 and T2-weighted, diffusion-weighted, diffusion tensor, MR perfusion and MR spectroscopic imaging. Three different regions-of-interest were drawn for each subject to encompass tumor, immediate tumor periphery, and distant peritumoral edema/normal. The normalized mp-MRI features were used to build machine-learning models for differentiating low-grade gliomas (WHO grades I and II) from high grades (WHO grades III and IV). In order to assess the contribution of regional mp-MRI quantitative features to the classification models, a support vector machine-based recursive feature elimination method was applied prior to classification. A machine-learning model based on support vector machine algorithm with linear kernel achieved an accuracy of 93.0%, a specificity of 86.7%, and a sensitivity of 96.4% for the grading of gliomas using ten-fold cross validation based on the proposed subset of the mp-MRI features. In this study, machine-learning based on multiregional and multi-parametric MRI data has proven to be an important tool in grading glial tumors accurately even in this limited patient population. Future studies are needed to investigate the use of machine learning algorithms for brain tumor classification in a larger patient cohort. Copyright © 2018. Published by Elsevier Ltd.

  7. Use of machine-learning classifiers to predict requests for preoperative acute pain service consultation.

    PubMed

    Tighe, Patrick J; Lucas, Stephen D; Edwards, David A; Boezaart, André P; Aytug, Haldun; Bihorac, Azra

    2012-10-01

      The purpose of this project was to determine whether machine-learning classifiers could predict which patients would require a preoperative acute pain service (APS) consultation.   Retrospective cohort.   University teaching hospital.   The records of 9,860 surgical patients posted between January 1 and June 30, 2010 were reviewed.   Request for APS consultation. A cohort of machine-learning classifiers was compared according to its ability or inability to classify surgical cases as requiring a request for a preoperative APS consultation. Classifiers were then optimized utilizing ensemble techniques. Computational efficiency was measured with the central processing unit processing times required for model training. Classifiers were tested using the full feature set, as well as the reduced feature set that was optimized using a merit-based dimensional reduction strategy.   Machine-learning classifiers correctly predicted preoperative requests for APS consultations in 92.3% (95% confidence intervals [CI], 91.8-92.8) of all surgical cases. Bayesian methods yielded the highest area under the receiver operating curve (0.87, 95% CI 0.84-0.89) and lowest training times (0.0018 seconds, 95% CI, 0.0017-0.0019 for the NaiveBayesUpdateable algorithm). An ensemble of high-performing machine-learning classifiers did not yield a higher area under the receiver operating curve than its component classifiers. Dimensional reduction decreased the computational requirements for multiple classifiers, but did not adversely affect classification performance.   Using historical data, machine-learning classifiers can predict which surgical cases should prompt a preoperative request for an APS consultation. Dimensional reduction improved computational efficiency and preserved predictive performance. Wiley Periodicals, Inc.

  8. A Machine Learning System for Analyzing Human Tactics in a Game

    NASA Astrophysics Data System (ADS)

    Ito, Hirotaka; Tanaka, Toshimitsu; Sugie, Noboru

    In order to realize advanced man-machine interfaces, it is desired to develop a system that can infer the mental state of human users and then return appropriate responses. As the first step toward the above goal, we developed a system capable of inferring human tactics in a simple game played between the system and a human. We present a machine learning system that plays a color expectation game. The system infers the tactics of the opponent, and then decides the action based on the result. We employed a modified version of classifier system like XCS in order to design the system. In addition, three methods are proposed in order to accelerate the learning rate. They are a masking method, an iterative method, and tactics templates. The results of computer experiments confirmed that the proposed methods effectively accelerate the machine learning. The masking method and the iterative method are effective to a simple strategy that considers only a part of past information. However, study speed of these methods is not enough for the tactics that refers to a lot of past information. For the case, the tactics template was able to settle the study rapidly when the tactics is identified.

  9. Resident Space Object Characterization and Behavior Understanding via Machine Learning and Ontology-based Bayesian Networks

    NASA Astrophysics Data System (ADS)

    Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.

    2016-09-01

    In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.

  10. Semi-supervised prediction of gene regulatory networks using machine learning algorithms.

    PubMed

    Patel, Nihir; Wang, Jason T L

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

  11. Application of machine learning techniques to lepton energy reconstruction in water Cherenkov detectors

    NASA Astrophysics Data System (ADS)

    Drakopoulou, E.; Cowan, G. A.; Needham, M. D.; Playfer, S.; Taani, M.

    2018-04-01

    The application of machine learning techniques to the reconstruction of lepton energies in water Cherenkov detectors is discussed and illustrated for TITUS, a proposed intermediate detector for the Hyper-Kamiokande experiment. It is found that applying these techniques leads to an improvement of more than 50% in the energy resolution for all lepton energies compared to an approach based upon lookup tables. Machine learning techniques can be easily applied to different detector configurations and the results are comparable to likelihood-function based techniques that are currently used.

  12. Time of Flight Estimation in the Presence of Outliers: A Biosonar-Inspired Machine Learning Approach

    DTIC Science & Technology

    2013-08-29

    REPORT Time of Flight Estimation in the Presence of Outliers: A biosonar -inspired machine learning approach 14. ABSTRACT 16. SECURITY CLASSIFICATION OF...installations, biosonar , remote sensing, sonar resolution, sonar accuracy, sonar energy consumption Nathan Intrator, Leon N Cooper Brown University...Presence of Outliers: A biosonar -inspired machine learning approach Report Title ABSTRACT When the Signal-to-Noise Ratio (SNR) falls below a certain

  13. Clinical chemistry in higher dimensions: Machine-learning and enhanced prediction from routine clinical chemistry data.

    PubMed

    Richardson, Alice; Signor, Ben M; Lidbury, Brett A; Badrick, Tony

    2016-11-01

    Big Data is having an impact on many areas of research, not the least of which is biomedical science. In this review paper, big data and machine learning are defined in terms accessible to the clinical chemistry community. Seven myths associated with machine learning and big data are then presented, with the aim of managing expectation of machine learning amongst clinical chemists. The myths are illustrated with four examples investigating the relationship between biomarkers in liver function tests, enhanced laboratory prediction of hepatitis virus infection, the relationship between bilirubin and white cell count, and the relationship between red cell distribution width and laboratory prediction of anaemia. Copyright © 2016 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  14. A Comparison Study of Machine Learning Based Algorithms for Fatigue Crack Growth Calculation.

    PubMed

    Wang, Hongxun; Zhang, Weifang; Sun, Fuqiang; Zhang, Wei

    2017-05-18

    The relationships between the fatigue crack growth rate ( d a / d N ) and stress intensity factor range ( Δ K ) are not always linear even in the Paris region. The stress ratio effects on fatigue crack growth rate are diverse in different materials. However, most existing fatigue crack growth models cannot handle these nonlinearities appropriately. The machine learning method provides a flexible approach to the modeling of fatigue crack growth because of its excellent nonlinear approximation and multivariable learning ability. In this paper, a fatigue crack growth calculation method is proposed based on three different machine learning algorithms (MLAs): extreme learning machine (ELM), radial basis function network (RBFN) and genetic algorithms optimized back propagation network (GABP). The MLA based method is validated using testing data of different materials. The three MLAs are compared with each other as well as the classical two-parameter model ( K * approach). The results show that the predictions of MLAs are superior to those of K * approach in accuracy and effectiveness, and the ELM based algorithms show overall the best agreement with the experimental data out of the three MLAs, for its global optimization and extrapolation ability.

  15. Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes.

    PubMed

    Wang, Yuanjia; Chen, Tianle; Zeng, Donglin

    2016-01-01

    Learning risk scores to predict dichotomous or continuous outcomes using machine learning approaches has been studied extensively. However, how to learn risk scores for time-to-event outcomes subject to right censoring has received little attention until recently. Existing approaches rely on inverse probability weighting or rank-based regression, which may be inefficient. In this paper, we develop a new support vector hazards machine (SVHM) approach to predict censored outcomes. Our method is based on predicting the counting process associated with the time-to-event outcomes among subjects at risk via a series of support vector machines. Introducing counting processes to represent time-to-event data leads to a connection between support vector machines in supervised learning and hazards regression in standard survival analysis. To account for different at risk populations at observed event times, a time-varying offset is used in estimating risk scores. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using kernel trick. We demonstrate an interesting link from the profiled empirical risk function of SVHM to the Cox partial likelihood. We then formally show that SVHM is optimal in discriminating covariate-specific hazard function from population average hazard function, and establish the consistency and learning rate of the predicted risk using the estimated risk scores. Simulation studies show improved prediction accuracy of the event times using SVHM compared to existing machine learning methods and standard conventional approaches. Finally, we analyze two real world biomedical study data where we use clinical markers and neuroimaging biomarkers to predict age-at-onset of a disease, and demonstrate superiority of SVHM in distinguishing high risk versus low risk subjects.

  16. A machine learning system to improve heart failure patient assistance.

    PubMed

    Guidi, Gabriele; Pettenati, Maria Chiara; Melillo, Paolo; Iadanza, Ernesto

    2014-11-01

    In this paper, we present a clinical decision support system (CDSS) for the analysis of heart failure (HF) patients, providing various outputs such as an HF severity evaluation, HF-type prediction, as well as a management interface that compares the different patients' follow-ups. The whole system is composed of a part of intelligent core and of an HF special-purpose management tool also providing the function to act as interface for the artificial intelligence training and use. To implement the smart intelligent functions, we adopted a machine learning approach. In this paper, we compare the performance of a neural network (NN), a support vector machine, a system with fuzzy rules genetically produced, and a classification and regression tree and its direct evolution, which is the random forest, in analyzing our database. Best performances in both HF severity evaluation and HF-type prediction functions are obtained by using the random forest algorithm. The management tool allows the cardiologist to populate a "supervised database" suitable for machine learning during his or her regular outpatient consultations. The idea comes from the fact that in literature there are a few databases of this type, and they are not scalable to our case.

  17. Supervised machine learning for analysing spectra of exoplanetary atmospheres

    NASA Astrophysics Data System (ADS)

    Márquez-Neila, Pablo; Fisher, Chloe; Sznitman, Raphael; Heng, Kevin

    2018-06-01

    The use of machine learning is becoming ubiquitous in astronomy1-3, but remains rare in the study of the atmospheres of exoplanets. Given the spectrum of an exoplanetary atmosphere, a multi-parameter space is swept through in real time to find the best-fit model4-6. Known as atmospheric retrieval, this technique originates in the Earth and planetary sciences7. Such methods are very time-consuming, and by necessity there is a compromise between physical and chemical realism and computational feasibility. Machine learning has previously been used to determine which molecules to include in the model, but the retrieval itself was still performed using standard methods8. Here, we report an adaptation of the `random forest' method of supervised machine learning9,10, trained on a precomputed grid of atmospheric models, which retrieves full posterior distributions of the abundances of molecules and the cloud opacity. The use of a precomputed grid allows a large part of the computational burden to be shifted offline. We demonstrate our technique on a transmission spectrum of the hot gas-giant exoplanet WASP-12b using a five-parameter model (temperature, a constant cloud opacity and the volume mixing ratios or relative abundances of molecules of water, ammonia and hydrogen cyanide)11. We obtain results consistent with the standard nested-sampling retrieval method. We also estimate the sensitivity of the measured spectrum to the model parameters, and we are able to quantify the information content of the spectrum. Our method can be straightforwardly applied using more sophisticated atmospheric models to interpret an ensemble of spectra without having to retrain the random forest.

  18. Machine learning for the New York City power grid.

    PubMed

    Rudin, Cynthia; Waltz, David; Anderson, Roger N; Boulanger, Albert; Salleb-Aouissi, Ansaf; Chow, Maggie; Dutta, Haimonti; Gross, Philip N; Huang, Bert; Ierome, Steve; Isaac, Delfina F; Kressner, Arthur; Passonneau, Rebecca J; Radeva, Axinia; Wu, Leon

    2012-02-01

    Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint, terminator, and transformer rankings, 3) feeder Mean Time Between Failure (MTBF) estimates, and 4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy, sources that are historical (static), semi-real-time, or realtime, incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City’s electrical grid.

  19. A review on machine learning principles for multi-view biological data integration.

    PubMed

    Li, Yifeng; Wu, Fang-Xiang; Ngom, Alioune

    2018-03-01

    Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in a strong need of integrative machine learning models for better use of vast volumes of heterogeneous information in the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review on omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.

  20. A Machine Learning Approach to Automated Gait Analysis for the Noldus Catwalk System.

    PubMed

    Frohlich, Holger; Claes, Kasper; De Wolf, Catherine; Van Damme, Xavier; Michel, Anne

    2018-05-01

    Gait analysis of animal disease models can provide valuable insights into in vivo compound effects and thus help in preclinical drug development. The purpose of this paper is to establish a computational gait analysis approach for the Noldus Catwalk system, in which footprints are automatically captured and stored. We present a - to our knowledge - first machine learning based approach for the Catwalk system, which comprises a step decomposition, definition and extraction of meaningful features, multivariate step sequence alignment, feature selection, and training of different classifiers (gradient boosting machine, random forest, and elastic net). Using animal-wise leave-one-out cross validation we demonstrate that with our method we can reliable separate movement patterns of a putative Parkinson's disease animal model and several control groups. Furthermore, we show that we can predict the time point after and the type of different brain lesions and can even forecast the brain region, where the intervention was applied. We provide an in-depth analysis of the features involved into our classifiers via statistical techniques for model interpretation. A machine learning method for automated analysis of data from the Noldus Catwalk system was established. Our works shows the ability of machine learning to discriminate pharmacologically relevant animal groups based on their walking behavior in a multivariate manner. Further interesting aspects of the approach include the ability to learn from past experiments, improve with more data arriving and to make predictions for single animals in future studies.