Science.gov

Sample records for advanced machine learning

  1. Machine Learning

    NASA Astrophysics Data System (ADS)

    Hoffmann, Achim; Mahidadia, Ashesh

    The purpose of this chapter is to present fundamental ideas and techniques of machine learning suitable for the field of this book, i.e., for automated scientific discovery. The chapter focuses on those symbolic machine learning methods that produce results suitable to be interpreted and understood by humans. This is particularly important in the context of automated scientific discovery, as the scientific theories produced by machines are usually meant to be interpreted by humans. This chapter contains some of the most influential ideas and concepts in machine learning research to give the reader a basic insight into the field. After the introduction in Sect. 1, general ideas of how learning problems can be framed are given in Sect. 2. The section provides useful perspectives to better understand what learning algorithms actually do. Section 3 presents the version space model, an early learning algorithm as well as a conceptual framework that provides important insight into the general mechanisms behind most learning algorithms. Section 4 presents the AQ family of algorithms for learning classification rules, one of the early approaches in machine learning. Section 5 presents the basic principles of decision tree learners, which are among the most influential classes of inductive learning algorithms today. A more recent group of learning systems is presented in Sect. 6; these learn relational concepts within the framework of logic programming. This is a particularly interesting group of learning systems since the framework also allows background knowledge to be incorporated, which may assist generalisation. Section 7 discusses association rules, a technique that comes from the related field of data mining. Section 8 presents the basic idea of the Naive Bayesian Classifier. While this is a very popular learning technique, the learning result is not well suited for…
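
    The chapter's closing topic, the Naive Bayesian Classifier, is easy to make concrete. The sketch below is not from the chapter; it is a minimal categorical Naive Bayes with Laplace smoothing, trained on an invented toy weather dataset:

      import math
      from collections import Counter, defaultdict

      def train_nb(X, y, alpha=1.0):
          """Fit a categorical Naive Bayes model with Laplace smoothing."""
          class_counts = Counter(y)
          value_counts = defaultdict(Counter)   # (class, feature) -> counts of values
          feature_values = defaultdict(set)     # feature -> set of observed values
          for xi, yi in zip(X, y):
              for f, v in enumerate(xi):
                  value_counts[(yi, f)][v] += 1
                  feature_values[f].add(v)
          return class_counts, value_counts, feature_values, alpha

      def predict_nb(model, x):
          class_counts, value_counts, feature_values, alpha = model
          n = sum(class_counts.values())
          def log_posterior(c):
              lp = math.log(class_counts[c] / n)          # log prior
              for f, v in enumerate(x):                   # log likelihood per feature
                  num = value_counts[(c, f)][v] + alpha
                  den = class_counts[c] + alpha * len(feature_values[f])
                  lp += math.log(num / den)
              return lp
          return max(class_counts, key=log_posterior)

      # Toy usage with invented data: classify weather conditions as play/no-play.
      X = [("sunny", "hot"), ("rainy", "mild"), ("sunny", "mild"), ("rainy", "hot")]
      y = ["no", "yes", "yes", "no"]
      print(predict_nb(train_nb(X, y), ("sunny", "mild")))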

  2. Advancing Research in Second Language Writing through Computational Tools and Machine Learning Techniques: A Research Agenda

    ERIC Educational Resources Information Center

    Crossley, Scott A.

    2013-01-01

    This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…

  3. Machine Learning in Medicine.

    PubMed

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to various industries has exploded, and it is thus unsurprising that some analytic companies are turning their attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  4. Inhibitors for the hepatitis C virus RNA polymerase explored by SAR with advanced machine learning methods.

    PubMed

    Weidlich, Iwona E; Filippov, Igor V; Brown, Jodian; Kaushik-Basu, Neerja; Krishnan, Ramalingam; Nicklaus, Marc C; Thorpe, Ian F

    2013-06-01

    Hepatitis C virus (HCV) is a global health challenge, affecting approximately 200 million people worldwide. In this study we developed SAR models with the advanced machine learning classifiers Random Forest and k-Nearest Neighbor with Simulated Annealing for 679 small molecules with measured inhibition activity for NS5B genotype 1b. The activity was expressed as a binary value (active/inactive), where molecules with IC50 ≤ 0.95 μM were considered active. We applied our SAR models to various drug-like databases and identified novel chemical scaffolds for NS5B inhibitors. Subsequent in vitro antiviral assays suggested a new activity for an existing prodrug, Candesartan cilexetil, which is currently used to treat hypertension and heart failure but has not been previously tested for anti-HCV activity. We also identified NS5B inhibitors with two novel non-nucleoside chemical motifs. PMID:23608107
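
    The classification setup described here (binary active/inactive labels from an IC50 cutoff, Random Forest on molecular descriptors) can be sketched as follows. The descriptor matrix and IC50 values below are random placeholders, not the study's data:

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(679, 32))                        # placeholder molecular descriptors
      ic50 = rng.lognormal(mean=0.0, sigma=1.0, size=679)   # placeholder IC50 values (uM)
      y = (ic50 <= 0.95).astype(int)                        # actives: IC50 <= 0.95 uM, per the abstract

      clf = RandomForestClassifier(n_estimators=500, random_state=0)
      print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())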

  5. Advances in Climate Informatics: Accelerating Discovery in Climate Science with Machine Learning

    NASA Astrophysics Data System (ADS)

    Monteleoni, C.

    2015-12-01

    Despite the scientific consensus on climate change, drastic uncertainties remain. The climate system is characterized by complex phenomena that are imperfectly observed and even more imperfectly simulated. Climate data is Big Data, yet the magnitude of data and climate model output increasingly overwhelms the tools currently used to analyze them. Computational innovation is therefore needed. Machine learning is a cutting-edge research area at the intersection of computer science and statistics, focused on developing algorithms for big data analytics. Machine learning has revolutionized scientific discovery (e.g. Bioinformatics), and spawned new technologies (e.g. Web search). The impact of machine learning on climate science promises to be similarly profound. The goal of the novel interdisciplinary field of Climate Informatics is to accelerate discovery in climate science with machine learning, in order to shed light on urgent questions about climate change. In this talk, I will survey my research group's progress in the emerging field of climate informatics. Our work includes algorithms to improve the combined predictions of the IPCC multi-model ensemble, applications to seasonal and subseasonal prediction, and a data-driven technique to detect and define extreme events.

  6. Introduction to machine learning.

    PubMed

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers to make successful predictions using past experiences, has developed impressively in recent years, helped by the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning, such as feature assessment, unsupervised versus supervised learning, and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods. PMID:24272434
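
    The supervised-learning workflow the chapter reviews (train on past data, evaluate predictions on held-out data) looks roughly like this in practice; the dataset and classifier are arbitrary choices for illustration:

      from sklearn.datasets import load_iris
      from sklearn.model_selection import train_test_split
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.metrics import accuracy_score

      X, y = load_iris(return_X_y=True)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
      model = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)   # learn from "past experiences"
      print("held-out accuracy:", accuracy_score(y_te, model.predict(X_te)))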

  7. Machine learning and radiology.

    PubMed

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focus on six categories of applications in radiology: medical image segmentation, registration, computer-aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers.

  8. Leveraging advanced data analytics, machine learning, and metrology models to enable critical dimension metrology solutions for advanced integrated circuit nodes

    NASA Astrophysics Data System (ADS)

    Rana, Narender; Zhang, Yunlin; Kagalwala, Taher; Bailey, Todd

    2014-10-01

    … the useful and more accurate CD and profile information of the structures. This paper presents the optimization of scatterometry and MBIR model calibration and the feasibility of extrapolating not only in design and process space but also from one process step to a previous process step. A well-calibrated scatterometry model or patterning simulation model can be used to accurately extrapolate and interpolate in the design and process space for lithography patterning where AFM is not capable of accurately measuring sub-40 nm trenches. The uncertainty associated with extrapolation can be large and needs to be minimized. We have made use of measurements from CD-SEM and CD-AFM, along with the patterning and scatterometry simulation models, to estimate the uncertainty associated with extrapolation and the methods to reduce it. For the first time, we have reported the application of machine learning (artificial neural networks) to the systematic resist-shrinkage phenomenon, to accurately predict the pre-shrink CD based on supervised learning using the CD-AFM data. The study lays out various basic concepts, approaches, and protocols of multiple-source data processing and integration for a hybrid metrology approach. Impacts of this study include more accurate metrology, patterning models, and better process controls for advanced IC nodes.
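
    The paper's neural-network step (predicting pre-shrink CD from post-shrink measurements, supervised by CD-AFM reference data) might be sketched as below. The shrinkage model and all numbers are invented placeholders, not the paper's calibration:

      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(1)
      cd_sem = rng.uniform(30, 60, size=(500, 1))                   # post-shrink CD-SEM readings (nm), synthetic
      shrink = 2.0 + 0.05 * cd_sem                                  # assumed systematic shrinkage model
      cd_afm = (cd_sem + shrink + rng.normal(0, 0.3, cd_sem.shape)).ravel()  # pre-shrink reference (CD-AFM)

      X_tr, X_te, y_tr, y_te = train_test_split(cd_sem, cd_afm, random_state=0)
      net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0).fit(X_tr, y_tr)
      print("mean abs error (nm):", np.abs(net.predict(X_te) - y_te).mean())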

  9. Paradigms for machine learning

    NASA Technical Reports Server (NTRS)

    Schlimmer, Jeffrey C.; Langley, Pat

    1991-01-01

    Five paradigms are described for machine learning: connectionist (neural network) methods, genetic algorithms and classifier systems, empirical methods for inducing rules and decision trees, analytic learning methods, and case-based approaches. Some dimensions along which these paradigms vary in their approach to learning are considered, and the basic methods used within each framework are reviewed, together with open research issues. It is argued that the similarities among the paradigms are more important than their differences, and that future work should attempt to bridge the existing boundaries. Finally, some recent developments in the field of machine learning are discussed, and their impact on both research and applications is examined.

  10. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances

    PubMed Central

    Abut, Fatih; Akay, Mehmet Fatih

    2015-01-01

    Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume per minute in a state of intense exercise. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, many studies have been conducted in recent years to predict the VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, and cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview of the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in the related literature in terms of two well-known metrics, namely, the multiple correlation coefficient (R) and the standard error of estimate. The survey results reveal that, with respect to the regression methods used to develop prediction models, support vector machine in general shows better performance than other methods, whereas multiple linear regression exhibits the worst performance. PMID:26346869
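
    A rough sketch of the modeling-and-evaluation loop the survey describes, using support vector regression and the two reported metrics (R and SEE). The predictor variables and the linear data-generating rule below are synthetic assumptions, not data from any surveyed study:

      import numpy as np
      from sklearn.svm import SVR
      from sklearn.model_selection import cross_val_predict

      rng = np.random.default_rng(2)
      n = 200
      age = rng.uniform(18, 60, n)
      weight = rng.uniform(50, 100, n)
      hr = rng.uniform(120, 200, n)                                  # e.g. submaximal heart rate
      X = np.column_stack([age, weight, hr])
      vo2max = 80 - 0.4 * age - 0.1 * weight - 0.05 * hr + rng.normal(0, 3, n)  # synthetic target

      pred = cross_val_predict(SVR(C=10.0), X, vo2max, cv=5)
      R = np.corrcoef(vo2max, pred)[0, 1]                            # multiple correlation coefficient
      see = np.sqrt(np.sum((vo2max - pred) ** 2) / (n - 2))          # standard error of estimate
      print(f"R = {R:.2f}, SEE = {see:.2f} ml/kg/min")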

  11. Remediating radium contaminated legacy sites: Advances made through machine learning in routine monitoring of "hot" particles.

    PubMed

    Varley, Adam; Tyler, Andrew; Smith, Leslie; Dale, Paul; Davies, Mike

    2015-07-15

    The extensive use of radium during the 20th century for industrial, military and pharmaceutical purposes has led to a large number of contaminated legacy sites across Europe and North America. Sites that pose a high risk to the general public can require expensive and long-term remediation projects. Often the most pragmatic remediation approach is routine monitoring using gamma-ray detectors to identify, in real time, the signal from the most hazardous heterogeneous contamination (hot particles), thus facilitating their removal and safe disposal. However, current detection systems do not fully utilise all spectral information, resulting in low detection rates and ultimately an increased risk to human health. The aim of this study was to establish an optimised detector-algorithm combination. To achieve this, field data were collected using two handheld detectors (sodium iodide and lanthanum bromide), and a number of Monte Carlo simulated hot particles were randomly injected into the field data. This allowed the detection rates of conventional deterministic (gross counts) and machine learning (neural networks and support vector machines) algorithms to be assessed. The results demonstrated that a neural network operating on a sodium iodide detector provided the best detection capability. Compared to deterministic approaches, this optimised detection system could detect a hot particle on average 10 cm deeper into the soil column, or with half of the activity at the same depth. It was also found that noise presented by internal contamination restricted the suitability of lanthanum bromide for this application.
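
    The comparison at the heart of the study (a machine learning classifier on the full spectrum versus a deterministic gross-count threshold) can be sketched as follows; the spectra, the Gaussian line shape, and the threshold choice are all synthetic stand-ins, not the paper's Monte Carlo setup:

      import numpy as np
      from sklearn.neural_network import MLPClassifier
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(3)
      n, channels = 2000, 64
      background = rng.poisson(5.0, size=(n, channels)).astype(float)
      y = rng.integers(0, 2, n)                                      # 1 = hot particle present
      peak = np.exp(-0.5 * ((np.arange(channels) - 20) / 2.0) ** 2)  # invented photopeak shape
      X = background + np.outer(y * rng.uniform(2, 6, n), peak)      # add peak only to "hot" spectra

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      thresh = np.percentile(X_tr.sum(axis=1), 50)                   # crude gross-count alarm level
      gross_pred = (X_te.sum(axis=1) > thresh).astype(int)
      net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
      print("gross-count accuracy:", (gross_pred == y_te).mean())
      print("neural-net accuracy: ", net.score(X_te, y_te))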

  12. Probabilistic machine learning and artificial intelligence.

    PubMed

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.
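
    One concrete instance of the probabilistic framework the review describes is Bayesian linear regression, where uncertainty about the model weights is represented explicitly: with prior precision α and noise precision β, the posterior over weights is N(m_N, S_N) with S_N = (αI + βXᵀX)⁻¹ and m_N = βS_N Xᵀy. A minimal sketch, with both precisions assumed known and synthetic data:

      import numpy as np

      rng = np.random.default_rng(4)
      X = np.column_stack([np.ones(30), rng.uniform(-1, 1, 30)])   # design matrix with bias column
      w_true = np.array([0.5, -1.2])
      y = X @ w_true + rng.normal(0, 0.2, 30)

      alpha, beta = 2.0, 25.0                                      # prior and noise precisions (assumed)
      S_N = np.linalg.inv(alpha * np.eye(2) + beta * X.T @ X)      # posterior covariance
      m_N = beta * S_N @ X.T @ y                                   # posterior mean
      print("posterior mean:", m_N)
      print("posterior std: ", np.sqrt(np.diag(S_N)))              # uncertainty about each weight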

  13. Quantum-Enhanced Machine Learning

    NASA Astrophysics Data System (ADS)

    Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.

    2016-09-01

    The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.

  14. Machine learning: An artificial intelligence approach. Vol. II

    SciTech Connect

    Michalski, R.S.; Carbonell, J.G.; Mitchell, T.M.

    1986-01-01

    This book reflects the expansion of machine learning research through presentation of recent advances in the field. The book provides an account of current research directions. Major topics covered include the following: learning concepts and rules from examples; cognitive aspects of learning; learning by analogy; learning by observation and discovery; and an exploration of general aspects of learning.

  15. Model-based machine learning.

    PubMed

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications. PMID:23277612

  16. Web Mining: Machine Learning for Web Applications.

    ERIC Educational Resources Information Center

    Chen, Hsinchun; Chau, Michael

    2004-01-01

    Presents an overview of machine learning research and reviews methods used for evaluating machine learning systems. Ways that machine-learning algorithms were used in traditional information retrieval systems in the "pre-Web" era are described, and the field of Web mining and how machine learning has been used in different Web mining applications…

  17. Machine Shop. Student Learning Guide.

    ERIC Educational Resources Information Center

    Palm Beach County Board of Public Instruction, West Palm Beach, FL.

    This student learning guide contains eight modules for completing a course in machine shop. It is designed especially for use in Palm Beach County, Florida. Each module covers one task, and consists of a purpose, performance objective, enabling objectives, learning activities and resources, information sheets, student self-check with answer key,…

  18. Game-powered machine learning

    PubMed Central

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-01-01

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786
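
    The three-stage loop described here (collect labels, train a supervised model, actively direct new annotation to where it helps most) is essentially uncertainty-sampling active learning. A minimal sketch with invented features and a placeholder tag; the real system used the Herd It game rather than simulated labels:

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(5)
      X = rng.normal(size=(1000, 20))                        # placeholder audio features
      w = rng.normal(size=20)
      y = (X @ w + rng.normal(0, 1, 1000) > 0).astype(int)   # placeholder tag, e.g. "funky jazz"

      labeled = list(range(50))                              # songs already annotated via the game
      pool = list(range(50, 1000))
      for _ in range(5):
          clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
          p = clf.predict_proba(X[pool])[:, 1]
          uncertain = np.argsort(np.abs(p - 0.5))[:20]       # songs closest to the decision boundary
          newly = [pool[i] for i in uncertain]               # send these to the game for labeling next
          labeled += newly
          pool = [i for i in pool if i not in newly]
      print("final training-set size:", len(labeled))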

  19. Advanced Learning

    ERIC Educational Resources Information Center

    Hijon-Neira, Raquel, Ed.

    2009-01-01

    The education industry has obviously been influenced by the Internet revolution. Teaching and learning methods have changed significantly since the coming of the Web, and it is very likely they will keep evolving for many years to come thanks to it. A good example of this changing reality is the spectacular development of e-Learning. In a more…

  20. Machine learning methods in chemoinformatics

    PubMed Central

    Mitchell, John B O

    2014-01-01

    Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183 PMID:25285160

  21. Photonic Neurocomputers And Learning Machines

    NASA Astrophysics Data System (ADS)

    Farhat, Nabil H.

    1990-05-01

    The study of complex multidimensional nonlinear dynamical systems and the modeling and emulation of cognitive brain-like processing of sensory information (neural network research), including the study of chaos and its role in such systems, would benefit immensely from the development of a new generation of programmable analog computers capable of carrying out collective, nonlinear and iterative computations at very high speed. The massive interconnectivity and nonlinearity needed in such analog computing structures indicate that a mix of optics and electronics, mediated by judicious choice of device physics, offers benefits for realizing networks with the following desirable properties: (a) large-scale nets, i.e. nets with a high number of decision-making elements (neurons), (b) modifiable structure, i.e. the ability to partition the net into any desired number of layers of prescribed size (number of neurons per layer) with any prescribed pattern of communications between them (e.g. feed-forward or feedback (recurrent)), (c) programmable and/or adaptive connectivity weights between the neurons for self-organization and learning, (d) both synchronous and asynchronous update rules being possible, (e) high-speed update, i.e. neurons with μsec response time to enable rapid iteration and convergence, (f) usability in the study and evaluation of a variety of adaptive learning algorithms, (g) usability in the rapid solution, by fast simulated annealing, of complex optimization problems of the kind encountered in adaptive learning, pattern recognition, and image processing. The aim of this paper is to describe recent efforts and progress made towards achieving these desirable attributes in analog photonic (optoelectronic and/or electron optical) hardware that utilizes primarily incoherent light. A specific example, hardware implementation of a stochastic Boltzmann learning machine, is used as a vehicle for identifying generic issues and clarifying research and development areas for further…
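
    The stochastic Boltzmann machine mentioned at the end admits a compact software simulation: binary units are updated asynchronously by Gibbs sampling while a temperature parameter is annealed. A sketch with random symmetric weights; this illustrates the update rule only, not the paper's photonic hardware:

      import numpy as np

      rng = np.random.default_rng(6)
      n = 16
      W = rng.normal(0, 0.5, size=(n, n))
      W = (W + W.T) / 2                                  # symmetric couplings
      np.fill_diagonal(W, 0)                             # no self-connections
      s = rng.integers(0, 2, n).astype(float)            # binary unit states

      def energy(s):
          return -0.5 * s @ W @ s                        # E(s) = -1/2 s^T W s

      for T in np.geomspace(4.0, 0.1, 200):              # simulated-annealing schedule
          for i in rng.permutation(n):                   # asynchronous stochastic updates
              p_on = 1.0 / (1.0 + np.exp(-(W[i] @ s) / T))
              s[i] = float(rng.random() < p_on)
      print("final energy:", energy(s))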

  22. A Machine Learning System for Recognizing Subclasses (Demo)

    SciTech Connect

    Vatsavai, Raju

    2012-01-01

    Thematic information extraction from remote sensing images is a complex task. In this demonstration, we present the *Miner machine learning system. In particular, we demonstrate an advanced subclass recognition algorithm that is specifically designed to extract finer classes from aggregate classes.

  23. Applications of Machine Learning in Information Retrieval.

    ERIC Educational Resources Information Center

    Cunningham, Sally Jo; Witten, Ian H.; Littin, James

    1999-01-01

    Introduces the basic ideas that underpin applications of machine learning to information retrieval. Describes applications of machine learning to text categorization. Considers how machine learning can be applied to the query-formulation process. Examines methods of document filtering, where the user specifies a query that is to be applied to an…

  24. Learning Extended Finite State Machines

    NASA Technical Reports Server (NTRS)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSMs), combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions on the symbolic constraints provided by a tree query for them to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  25. Learning Machine Learning: A Case Study

    ERIC Educational Resources Information Center

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  26. The Higgs Machine Learning Challenge

    NASA Astrophysics Data System (ADS)

    Adam-Bourdarios, C.; Cowan, G.; Germain-Renaud, C.; Guyon, I.; Kégl, B.; Rousseau, D.

    2015-12-01

    The Higgs Machine Learning Challenge was an open data analysis competition that took place between May and September 2014. Samples of simulated data from the ATLAS Experiment at the LHC corresponding to signal events with Higgs bosons decaying to τ+τ- together with background events were made available to the public through the website of the data science organization Kaggle (kaggle.com). Participants attempted to identify the search region in a space of 30 kinematic variables that would maximize the expected discovery significance of the signal process. One of the primary goals of the Challenge was to promote communication of new ideas between the Machine Learning (ML) and HEP communities. In this regard it was a resounding success, with almost 2,000 participants from HEP, ML and other areas. The process of understanding and integrating the new ideas, particularly from ML into HEP, is currently underway.
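
    The objective participants optimized when choosing a search region was the expected discovery significance, quantified in the Challenge by the approximate median significance (AMS). A transcription of that metric, assuming the regularization term b_reg = 10 used in the Challenge documentation:

      import math

      def ams(s, b, b_reg=10.0):
          """Approximate median significance for expected signal s and background b."""
          return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))

      # Example: significance of a candidate selection region with invented yields.
      print(ams(s=100.0, b=1000.0))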

  27. Finding New Perovskite Halides via Machine learning

    NASA Astrophysics Data System (ADS)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.
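
    The two key features identified here have standard closed forms: the Goldschmidt tolerance factor t = (r_A + r_X) / (√2 (r_B + r_X)) and the octahedral factor μ = r_B / r_X. A sketch of an SVM classifier built on them; the radii, the labels, and the toy rule generating those labels are placeholders, not the study's 181-compound training set:

      import numpy as np
      from sklearn.svm import SVC

      def features(r_A, r_B, r_X):
          t = (r_A + r_X) / (np.sqrt(2) * (r_B + r_X))   # Goldschmidt tolerance factor
          mu = r_B / r_X                                  # octahedral factor
          return [r_A, r_B, r_X, t, mu]

      # Placeholder ionic radii (angstrom) and formability labels from a toy rule.
      rng = np.random.default_rng(7)
      radii = rng.uniform(0.5, 2.5, size=(181, 3))
      X = np.array([features(*r) for r in radii])
      y = ((X[:, 3] > 0.8) & (X[:, 3] < 1.1) & (X[:, 4] > 0.41)).astype(int)

      clf = SVC(kernel="rbf", C=10.0).fit(X, y)
      print(clf.predict([features(1.88, 1.19, 2.20)]))    # hypothetical CsPbI3-like radii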

  28. Finding new perovskite halides via machine learning

    DOE PAGES

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-26

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach toward rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning, henceforth referred to as ML) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br, or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 185 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor, and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. As a result, the trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  29. Machine learning for medical images analysis.

    PubMed

    Criminisi, A

    2016-10-01

    This article discusses the application of machine learning for the analysis of medical images. Specifically: (i) We show how a special type of learning models can be thought of as automatically optimized, hierarchically-structured, rule-based algorithms, and (ii) We discuss how the issue of collecting large labelled datasets applies to both conventional algorithms as well as machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods.

  30. Application of advanced machine learning methods on resting-state fMRI network for identification of mild cognitive impairment and Alzheimer's disease.

    PubMed

    Khazaee, Ali; Ebrahimzadeh, Ata; Babajani-Feremi, Abbas

    2016-09-01

    The study of brain networks by resting-state functional magnetic resonance imaging (rs-fMRI) is a promising method for identifying patients with dementia from healthy controls (HC). Using graph theory, different aspects of the brain network can be efficiently characterized by calculating measures of integration and segregation. In this study, we combined a graph theoretical approach with advanced machine learning methods to study the brain network in 89 patients with mild cognitive impairment (MCI), 34 patients with Alzheimer's disease (AD), and 45 age-matched HC. The rs-fMRI connectivity matrix was constructed using a brain parcellation based on 264 putative functional areas. Using the optimal features extracted from the graph measures, we were able to accurately classify the three groups (i.e., HC, MCI, and AD) with an accuracy of 88.4%. We also investigated the performance of our proposed method for a binary classification of one group (e.g., MCI) from the two other groups (e.g., HC and AD). The classification accuracies for identifying HC from AD and MCI, AD from HC and MCI, and MCI from HC and AD were 87.3, 97.5, and 72.0%, respectively. In addition, results based on the parcellation of 264 regions were compared to those based on the automated anatomical labeling (AAL) atlas, consisting of 90 regions. The accuracy of classification of the three groups using AAL degraded to 83.2%. Our results show that combining the graph measures with the machine learning approach, on the basis of the rs-fMRI connectivity analysis, may assist in the diagnosis of AD and MCI.
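
    The pipeline described (connectivity matrix, then graph measures of integration and segregation, then feature-based classification) might look like the sketch below. The connectivity matrices and labels are random stand-ins, and the three measures chosen here are illustrative, not the study's optimal feature set:

      import numpy as np
      import networkx as nx
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(8)

      def graph_features(conn, threshold=0.3):
          """Integration/segregation measures from a thresholded connectivity matrix."""
          A = (np.abs(conn) > threshold).astype(int)
          np.fill_diagonal(A, 0)
          G = nx.from_numpy_array(A)
          return [nx.global_efficiency(G),               # integration
                  nx.average_clustering(G),              # segregation
                  nx.density(G)]

      # Synthetic stand-ins for rs-fMRI connectivity (the study used 264 regions).
      n_subj, n_roi = 60, 40
      conns = rng.uniform(-1, 1, size=(n_subj, n_roi, n_roi))
      conns = (conns + conns.transpose(0, 2, 1)) / 2     # make each matrix symmetric
      labels = rng.integers(0, 2, n_subj)                # e.g. HC vs MCI (placeholder)

      X = np.array([graph_features(c) for c in conns])
      print(cross_val_score(SVC(), X, labels, cv=5).mean())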

  31. Introducing Machine Learning Concepts with WEKA.

    PubMed

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

  32. MLZ: Machine Learning for photo-Z

    NASA Astrophysics Data System (ADS)

    Carrasco Kind, Matias; Brunner, Robert

    2014-03-01

    The parallel Python framework MLZ (Machine Learning and photo-Z) computes fast and robust photometric redshift PDFs using machine learning algorithms. It provides a supervised technique with prediction trees and random forests, TPZ, which can be used for regression or classification problems, and an unsupervised method with self-organizing maps and a random atlas, SOMz. These machine learning implementations can be efficiently combined into a more powerful one, resulting in robust and accurate probability distributions for photometric redshifts.
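
    One way the per-tree spread of a random forest yields a photometric-redshift PDF can be sketched as follows. This is not MLZ's TPZ implementation; the magnitudes and the magnitude-redshift relation are synthetic:

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(9)
      mags = rng.uniform(18, 25, size=(2000, 5))                    # placeholder 5-band magnitudes
      z = 1.0 + 0.4 * (mags[:, 2] - mags[:, 4]) + 0.05 * rng.normal(size=2000)  # synthetic redshift

      forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(mags[:1500], z[:1500])
      per_tree = np.array([t.predict(mags[1500:1501]) for t in forest.estimators_])
      pdf, edges = np.histogram(per_tree, bins=20, density=True)    # crude photo-z PDF for one galaxy
      print("mean z:", per_tree.mean(), "std:", per_tree.std())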

  33. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Astronomy Data Centre, Canadian

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  34. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  35. Applying Machine Learning to Star Cluster Classification

    NASA Astrophysics Data System (ADS)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time consuming, thus it creates a bottleneck, and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) through applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data is HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results, and suggest several directions for further improvement.
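
    A minimal convolutional network for the four-class labeling task could look like the sketch below; the architecture and the 32x32 cutout size are invented for illustration and are not the LEGUS pipeline:

      import torch
      import torch.nn as nn

      class ClusterCNN(nn.Module):
          """Tiny CNN for 4-way star-cluster classification on small image cutouts."""
          def __init__(self, n_classes=4):
              super().__init__()
              self.features = nn.Sequential(
                  nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
              self.head = nn.Linear(32 * 8 * 8, n_classes)   # 32x32 input -> 8x8 after two poolings

          def forward(self, x):
              return self.head(self.features(x).flatten(1))

      model = ClusterCNN()                       # untrained here; training loop omitted
      cutouts = torch.randn(8, 1, 32, 32)        # placeholder postage stamps
      print(model(cutouts).argmax(dim=1))        # predicted class (0..3) per cutout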

  36. Defect Classification Using Machine Learning

    SciTech Connect

    Carr, A; Kegelmeyer, L; Liao, Z M; Abdulla, G; Cross, D; Kegelmeyer, W P; Raviza, F; Carr, C W

    2008-10-24

    Laser-induced damage growth on the surface of fused silica optics has been extensively studied and has been found to depend on a number of factors including fluence and the surface on which the damage site resides. It has been demonstrated that damage sites as small as a few tens of microns can be detected and tracked on optics installed on a fusion-class laser; however, determining the surface of an optic on which a damage site resides in situ can be a significant challenge. In this work we demonstrate that a machine-learning algorithm can successfully predict the surface location of a damage site using an expanded set of characteristics for each damage site, some of which are not historically associated with growth rate.

  37. Machine learning in motion control

    NASA Technical Reports Server (NTRS)

    Su, Renjeng; Kermiche, Noureddine

    1989-01-01

    The existing methodologies for robot programming originate primarily from robotic applications to manufacturing, where uncertainties of the robots and their task environment may be minimized by repeated off-line modeling and identification. In space applications of robots, however, a higher degree of automation is required for robot programming because of the desire to minimize human intervention. We discuss a new paradigm of robot programming which is based on the concept of machine learning. The goal is to let robots practice tasks by themselves, with the operational data used to automatically improve their motion performance. The underlying mathematical problem is to solve the inverse dynamics problem by iterative methods. One of the key questions is how to ensure the convergence of the iterative process. There have been a few small steps taken toward this important approach to robot programming. We give a representative result on the convergence problem.
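
    The "practice and improve" iteration described here is the idea behind iterative learning control: repeat the task, and correct the input trajectory using the observed tracking error. A sketch on an assumed first-order linear plant with an invented learning gain; convergence depends on the gain and the plant, as the abstract notes:

      import numpy as np

      T = 50
      def run(u):
          """Simulate an assumed plant y[t+1] = 0.9*y[t] + 0.5*u[t]; return y[1:]."""
          y = np.zeros(T + 1)
          for t in range(T):
              y[t + 1] = 0.9 * y[t] + 0.5 * u[t]
          return y[1:]

      y_ref = np.sin(np.linspace(0, 2 * np.pi, T))   # desired trajectory
      u = np.zeros(T)
      gamma = 0.8                                    # learning gain (chosen for contraction)
      for k in range(30):                            # the robot "practices" the task 30 times
          e = y_ref - run(u)
          u = u + gamma * e                          # iterative learning update
      print("final tracking error:", np.abs(y_ref - run(u)).max())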

  7. Adaptive Learning Systems: Beyond Teaching Machines

    ERIC Educational Resources Information Center

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since the 1950s, teaching machines have changed considerably. Today, we have different ideas about how people learn and what instructors should do to help students during their learning process. We have adaptive learning technologies that can create much more student-oriented learning environments. The purpose of this article is to present these changes and its…

  8. Machine learning: An artificial intelligence approach

    SciTech Connect

    Michalski, R.S.; Carbonell, J.G.; Mitchell, T.M.

    1983-01-01

    This book contains tutorial overviews and research papers on contemporary trends in the area of machine learning viewed from an AI perspective. Research directions covered include: learning from examples, modeling human learning strategies, knowledge acquisition for expert systems, learning heuristics, discovery systems, and conceptual data analysis.

  9. Machine vision systems using machine learning for industrial product inspection

    NASA Astrophysics Data System (ADS)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages, Learning Inspection Features (LIF) and On-Line Inspection (OLI). The LIF is designed to learn visual inspection features from design data and/or from inspected products. During the OLI stage, the inspection system uses the knowledge learned by the LIF component to inspect the visual features of products. In this paper we present two machine vision inspection systems developed under the SMV architecture for two different types of products: Printed Circuit Board (PCB) and Vacuum Fluorescent Display (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its display patterns. In the PCB inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies, and the PCB inspection system is in the process of being deployed in a manufacturing plant.

  10. Trends in extreme learning machines: a review.

    PubMed

    Huang, Gao; Huang, Guang-Bin; Song, Shiji; You, Keyou

    2015-01-01

    Extreme learning machine (ELM) has gained increasing interest from various research fields recently. In this review, we aim to report the current state of the theoretical research and practical advances on this subject. We first give an overview of ELM from the theoretical perspective, including the interpolation theory, universal approximation capability, and generalization ability. Then we focus on the various improvements made to ELM which further improve its stability, sparsity, and accuracy under general or specific conditions. Apart from classification and regression, ELM has recently been extended for clustering, feature selection, representational learning, and many other learning tasks. These newly emerging algorithms greatly expand the applications of ELM. From the implementation aspect, hardware implementation and parallel computation techniques have substantially sped up the training of ELM, making it feasible for big data processing and real-time reasoning. Due to its remarkable efficiency, simplicity, and impressive generalization performance, ELM has been applied in a variety of domains, such as biomedical engineering, computer vision, system identification, and control and robotics. In this review, we try to provide a comprehensive view of these advances in ELM together with its future perspectives. PMID:25462632
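
    The basic single-hidden-layer scheme the review builds on fits in a few lines: hidden-layer parameters are drawn at random and only the output weights are solved, by least squares. A minimal sketch with illustrative sizes and toy data:

      # Minimal ELM for regression: random hidden layer, pseudoinverse readout.
      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.uniform(-1, 1, size=(300, 4))
      y = np.sin(X.sum(axis=1))              # toy regression target

      n_hidden = 100
      W = rng.normal(size=(4, n_hidden))     # random input weights (never trained)
      b = rng.normal(size=n_hidden)          # random biases
      H = np.tanh(X @ W + b)                 # hidden-layer feature map
      beta = np.linalg.pinv(H) @ y           # output weights by least squares

      y_hat = H @ beta
      print("train RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))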

  11. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    PubMed

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics. PMID:27478823

  13. Natural Language Processing and Machine Learning (NLP/ML): Applying Advances in Biomedicine to the Earth Sciences

    NASA Astrophysics Data System (ADS)

    Duerr, R.; Myers, S.; Palmer, M.; Jenkins, C. J.; Thessen, A.; Martin, J.

    2015-12-01

    Semantics underlie many of the tools and services available from and on the web. From improving search results to enabling data mashups and other forms of interoperability, semantic technologies have proven themselves. But creating semantic resources, especially re-usable semantic resources, is extremely time-consuming and labor-intensive. Why? Because it is not just a matter of technology but also of obtaining rough consensus, if not full agreement, amongst community members on the meaning and order of things. One way to develop these resources in a more automated way would be to use NLP/ML techniques to extract the required resources from large corpora of subject-specific text, such as peer-reviewed papers, where presumably a rough consensus has been achieved at least about the basics of the particular discipline involved. While not generally applied to the Earth sciences, considerable resources have been spent in other fields such as medicine on these types of techniques with some success. The NSF-funded ClearEarth project is applying the techniques developed for biomedicine to the cryosphere, geology, and biology in order to spur faster development of the semantic resources needed in these fields. The first area being addressed by the project is the cryosphere, specifically sea ice nomenclature, where an existing set of sea ice ontologies is being used as the "Gold Standard" against which to test and validate the NLP/ML techniques. The processes being used, lessons learned, and early results will be described.

  14. Machine learning research 1989-90

    NASA Technical Reports Server (NTRS)

    Porter, Bruce W.; Souther, Arthur

    1990-01-01

    Multifunctional knowledge bases offer a significant advance in artificial intelligence because they can support numerous expert tasks within a domain. As a result they amortize the costs of building a knowledge base over multiple expert systems and they reduce the brittleness of each system. Due to the inevitable size and complexity of multifunctional knowledge bases, their construction and maintenance require knowledge engineering and acquisition tools that can automatically identify interactions between new and existing knowledge. Furthermore, their use requires software for accessing those portions of the knowledge base that coherently answer questions. Considerable progress was made in developing software for building and accessing multifunctional knowledge bases. A language was developed for representing knowledge, along with software tools for editing and displaying knowledge, a machine learning program for integrating new information into existing knowledge, and a question answering system for accessing the knowledge base.

  15. Memristor models for machine learning.

    PubMed

    Carbajal, Juan Pablo; Dambre, Joni; Hermans, Michiel; Schrauwen, Benjamin

    2015-03-01

    In the quest for alternatives to traditional complementary metal-oxide-semiconductor (CMOS) technology, it is being suggested that digital computing efficiency and power can be improved by matching the precision to the application. Many applications do not need the high precision that is being used today. In particular, large gains in area and power efficiency could be achieved by dedicated analog realizations of approximate computing engines. In this work we explore the use of memristor networks for analog approximate computation, based on a machine learning framework called reservoir computing. Most experimental investigations on the dynamics of memristors focus on their nonvolatile behavior. Hence, the volatility that is present in the developed technologies is usually unwanted and is not included in simulation models. In contrast, in reservoir computing, volatility is not only desirable but necessary. Therefore, in this work, we propose two different ways to incorporate it into memristor simulation models. The first is an extension of Strukov's model, and the second is an equivalent Wiener model approximation. We analyze and compare the dynamical properties of these models and discuss their implications for the memory and the nonlinear processing capacity of memristor networks. Our results indicate that device variability, increasingly causing problems in traditional computer design, is an asset in the context of reservoir computing. We conclude that although both models could lead to useful memristor-based reservoir computing systems, their computational performance will differ. Therefore, experimental modeling research is required for the development of accurate volatile memristor models.
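
    A sketch of the modeling idea, assuming Strukov's drift model with a simple relaxation term standing in for volatility (the decay form and all constants below are assumptions for illustration; the paper's actual extension may differ):

      # Volatile memristor toy simulation: Strukov drift plus state decay.
      import numpy as np

      R_on, R_off, mu, D = 100.0, 16e3, 1e-14, 1e-8   # illustrative device constants
      tau = 0.5                                       # assumed volatility time scale
      dt = 1e-4
      t = np.arange(0.0, 2.0, dt)
      v = np.sin(2 * np.pi * t)                       # applied voltage

      x = 0.1                                         # normalized state w/D
      for vk in v:
          R = R_on * x + R_off * (1.0 - x)            # memristance
          i = vk / R
          dx = (mu * R_on / D**2) * i - x / tau       # drift plus volatile decay
          x = min(max(x + dt * dx, 0.0), 1.0)         # keep state in [0, 1]

      print("state after drive:", x)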

  16. Machine Learning and Cosmological Simulations

    NASA Astrophysics Data System (ADS)

    Kamdar, Harshil; Turk, Matthew; Brunner, Robert

    2016-01-01

    We explore the application of machine learning (ML) to the problem of galaxy formation and evolution in a hierarchical universe. Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively evaluating the extent of the influence of dark matter halo properties on small-scale structure formation. For our analyses, we use both semi-analytical models (Millennium simulation) and N-body + hydrodynamical simulations (Illustris simulation). The ML algorithms are trained on important dark matter halo properties (inputs) and galaxy properties (outputs). The trained models are able to robustly predict the gas mass, stellar mass, black hole mass, star formation rate, $g-r$ color, and stellar metallicity. Moreover, the ML simulated galaxies obey fundamental observational constraints implying that the population of ML predicted galaxies is physically and statistically robust. Next, ML algorithms are trained on an N-body + hydrodynamical simulation and applied to an N-body only simulation (Dark Sky simulation, Illustris Dark), populating this new simulation with galaxies. We can examine how structure formation changes with different cosmological parameters and are able to mimic a full-blown hydrodynamical simulation in a computation time that is orders of magnitude smaller. We find that the set of ML simulated galaxies in Dark Sky obey the same observational constraints, further solidifying ML's place as an intriguing and promising technique in future galaxy formation studies and rapid mock galaxy catalog creation.
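
    A sketch of the halo-to-galaxy mapping described above, with a random forest standing in for the (unspecified) ML algorithm and synthetic placeholders for the simulation data: train on one simulation's halo properties and galaxy properties, then predict galaxies for held-out halos.

      # Halo properties in, galaxy properties out (multi-output regression).
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      halos = rng.normal(size=(5000, 5))   # e.g., mass, spin, concentration (illustrative)
      galaxies = np.column_stack([
          0.8 * halos[:, 0] + rng.normal(scale=0.1, size=5000),  # stellar-mass proxy
          0.5 * halos[:, 1] + rng.normal(scale=0.1, size=5000),  # SFR proxy
      ])

      model = RandomForestRegressor(n_estimators=100, random_state=0)
      model.fit(halos[:4000], galaxies[:4000])
      predicted = model.predict(halos[4000:])   # populate "new" halos with galaxies
      print(predicted.shape)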

  17. Machine Translation-Assisted Language Learning: Writing for Beginners

    ERIC Educational Resources Information Center

    Garcia, Ignacio; Pena, Maria Isabel

    2011-01-01

    The few studies that deal with machine translation (MT) as a language learning tool focus on its use by advanced learners, never by beginners. Yet, freely available MT engines (e.g., Google Translate) and MT-related web initiatives (e.g., Gabble-on.com) position themselves to cater precisely to the needs of learners with a limited command of a…

  18. Man-machine interface requirements - advanced technology

    NASA Technical Reports Server (NTRS)

    Remington, R. W.; Wiener, E. L.

    1984-01-01

    Research issues and areas are identified where increased understanding of the human operator and the interaction between the operator and the avionics could lead to improvements in the performance of current and proposed helicopters. Both current and advanced helicopter systems and avionics are considered. Areas critical to man-machine interface requirements include: (1) artificial intelligence; (2) visual displays; (3) voice technology; (4) cockpit integration; and (5) pilot work loads and performance.

  19. Machine learning applications in genetics and genomics.

    PubMed

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets. PMID:25948244

  20. Machine learning: an indispensable tool in bioinformatics.

    PubMed

    Inza, Iñaki; Calvo, Borja; Armañanzas, Rubén; Bengoetxea, Endika; Larrañaga, Pedro; Lozano, José A

    2010-01-01

    The increase in the number and complexity of biological databases has raised the need for modern and powerful data analysis tools and techniques. In order to fulfill these requirements, the machine learning discipline has become an everyday tool in bio-laboratories. The use of machine learning techniques has been extended to a wide spectrum of bioinformatics applications. It is broadly used to investigate the underlying mechanisms and interactions between biological molecules in many diseases, and it is an essential tool in any biomarker discovery process. In this chapter, we provide a basic taxonomy of machine learning algorithms and describe the characteristics of the main data preprocessing, supervised classification, and clustering techniques. Feature selection and classifier evaluation, two supervised classification topics that have a deep impact on current bioinformatics, are presented. We make the interested reader aware of a set of popular web resources, open source software tools, and benchmarking data repositories that are frequently used by the machine learning community. PMID:19957143

  2. An introduction to quantum machine learning

    NASA Astrophysics Data System (ADS)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers have investigated whether quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  3. Machine learning for Big Data analytics in plants.

    PubMed

    Ma, Chuang; Zhang, Hao Helen; Wang, Xiangfeng

    2014-12-01

    Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community not only needs to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences. PMID:25223304

  5. Machine learning in cell biology - teaching computers to recognize phenotypes.

    PubMed

    Sommer, Christoph; Gerlich, Daniel W

    2013-12-15

    Recent advances in microscope automation provide new opportunities for high-throughput cell biology, such as image-based screening. Highly complex image analysis tasks often make the implementation of static, predefined processing rules a cumbersome effort. Machine-learning methods, instead, seek to use intrinsic data structure, as well as the expert annotations of biologists, to infer models that can be used to solve versatile data analysis tasks. Here, we explain how machine-learning methods work and what needs to be considered for their successful application in cell biology. We outline how microscopy images can be converted into a data representation suitable for machine learning, and then introduce various state-of-the-art machine-learning algorithms, highlighting recent applications in image-based screening. Our Commentary aims to provide the biologist with a guide to the application of machine learning to microscopy assays, and we therefore include extensive discussion on how to optimize the experimental workflow as well as the data analysis pipeline. PMID:24259662

  7. Machine learning: Trends, perspectives, and prospects.

    PubMed

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. PMID:26185243

  9. Advances in Distance Learning.

    ERIC Educational Resources Information Center

    1999

    This document contains three symposium papers on advances in distance learning. "The Adoption of Computer Technology and Telecommunications: A Case Study" (Larry M. Dooley, Teri Metcalf, Ann Martinez) reports on a study of the possible applications of two theoretical models (Rogers' Diffusion of Innovations model and the Concerns-Based Adoption…

  10. Machine Learning for Biological Trajectory Classification Applications

    NASA Technical Reports Server (NTRS)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms are compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.

  11. Introduction to machine learning for brain imaging.

    PubMed

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in the past years developed to become a workhorse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever-increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task-relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally, machine learning techniques should be usable by any non-expert; unfortunately, they typically are not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths, and also the inherent dangers, of machine learning usage in the neurosciences. PMID:21172442

  12. Geological Mapping Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Harvey, A. S.; Fotopoulos, G.

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping, in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared in order to assess their performance for correctly identifying geological rock types in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. The percentage of correct classifications was used as the indicator of performance. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to the ground validation map are visualized. Low performance may be the result of poor spectral images of bare rock, which can be covered by vegetation or water. The distribution of calibration clusters and the MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases the required computational effort and time. With the performance levels achievable in this study, the technique is useful in identifying regions of interest and general rock type trends. In particular, phase I geological site investigations will benefit from this approach and lead to the selection of sites for advanced surveys.
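
    A minimal sketch of the four-way comparison, scored by percent correct as in the study (the data here are synthetic stand-ins for the spectral, geophysical, and geodetic inputs):

      # Compare naive Bayes, k-NN, random forest, and SVM on one labeled set.
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=1000, n_features=10, n_classes=4,
                                 n_informative=6, random_state=0)
      for name, clf in [("naive Bayes", GaussianNB()),
                        ("k-NN", KNeighborsClassifier()),
                        ("random forest", RandomForestClassifier(random_state=0)),
                        ("SVM", SVC())]:
          score = cross_val_score(clf, X, y, cv=5).mean()
          print(f"{name}: {100 * score:.1f}% correct")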

  13. Paradigms for Realizing Machine Learning Algorithms.

    PubMed

    Agneeswaran, Vijay Srinivas; Tonpay, Pranay; Tiwary, Jayati

    2013-12-01

    The article explains the three generations of machine learning algorithms, all three trying to operate on big data. The first generation tools are SAS, SPSS, etc., while second generation realizations include Mahout and RapidMiner (which work over Hadoop), and third generation paradigms include Spark and GraphLab, among others. The essence of the article is that for a number of machine learning algorithms, it is important to look beyond Hadoop's MapReduce paradigm in order to make them work on big data. A number of promising contenders have emerged in the third generation that can be exploited to realize deep analytics on big data. PMID:27447253

  15. Machine Learning Toolkit for Extreme Scale

    SciTech Connect

    2014-03-31

    Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, including commodity multi-core machines, tightly connected supercomputers, and cloud computing systems. Several techniques are proposed for improved speed and memory usage, including adaptive and aggressive elimination of samples for faster convergence, and sparse-format representation of data samples. Several heuristics, ranging from earliest-possible to lazy elimination of non-contributing samples, are considered in MaTEx. For the many cases where an early sample elimination might result in a false positive, low-overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.

  16. Using Simple Machines to Leverage Learning

    ERIC Educational Resources Information Center

    Dotger, Sharon

    2008-01-01

    What would your students say if you told them they could lift you off the ground using a block and a board? Using a simple machine, they'll find out they can, and they'll learn about work, energy, and motion in the process! In addition, this integrated lesson gives students the opportunity to investigate variables while practicing measurement…

  17. Vitrification: Machines learn to recognize glasses

    NASA Astrophysics Data System (ADS)

    Ceriotti, Michele; Vitelli, Vincenzo

    2016-05-01

    The dynamics of a viscous liquid undergo a dramatic slowdown when it is cooled to form a solid glass. Recognizing the structural changes across such a transition remains a major challenge. Machine-learning methods, similar to those Facebook uses to recognize groups of friends, have now been applied to this problem.

  18. Digital Signal Processing and Machine Learning

    NASA Astrophysics Data System (ADS)

    Li, Yuanqing; Ang, Kai Keng; Guan, Cuntai

    Any brain-computer interface (BCI) system must translate signals from the user's brain into messages or commands (see Fig. 1). Many signal processing and machine learning techniques have been developed for this signal translation, and this chapter reviews the most common ones. Although these techniques are often illustrated using electroencephalography (EEG) signals in this chapter, they are also suitable for other brain signals.

  19. Machine learning for neuroimaging with scikit-learn.

    PubMed

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain. PMID:24600388
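
    A minimal decoding sketch in the spirit of the paper: relate brain images to behavioral labels with a cross-validated linear SVM from scikit-learn (the arrays below are random stand-ins for voxel data and condition labels):

      # Supervised decoding: 80 "scans" x 5000 "voxels" against two conditions.
      import numpy as np
      from sklearn.svm import LinearSVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(80, 5000))   # placeholder for masked fMRI data
      y = rng.integers(0, 2, size=80)   # two experimental conditions

      print("decoding accuracy:", cross_val_score(LinearSVC(), X, y, cv=5).mean())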

  20. Fast, Continuous Audiogram Estimation using Machine Learning

    PubMed Central

    Song, Xinyu D.; Wallace, Brittany M.; Gardner, Jacob R.; Ledbetter, Noah M.; Weinberger, Kilian Q.; Barbour, Dennis L.

    2016-01-01

    Objectives Pure-tone audiometry has been a staple of hearing assessments for decades. Many different procedures have been proposed for measuring thresholds with pure tones by systematically manipulating intensity one frequency at a time until a discrete threshold function is determined. The authors have developed a novel nonparametric approach for estimating a continuous threshold audiogram using Bayesian estimation and machine learning classification. The objective of this study is to assess the accuracy and reliability of this new method relative to a commonly used threshold measurement technique. Design The authors performed air conduction pure-tone audiometry on 21 participants between the ages of 18 and 90 years with varying degrees of hearing ability. Two repetitions of automated machine learning audiogram estimation and 1 repetition of conventional modified Hughson-Westlake ascending-descending audiogram estimation were acquired by an audiologist. The estimated hearing thresholds of these two techniques were compared at standard audiogram frequencies (i.e., 0.25, 0.5, 1, 2, 4, 8 kHz). Results The two threshold estimate methods delivered very similar estimates at standard audiogram frequencies. Specifically, the mean absolute difference between estimates was 4.16 ± 3.76 dB HL. The mean absolute difference between repeated measurements of the new machine learning procedure was 4.51 ± 4.45 dB HL. These values compare favorably to those of other threshold audiogram estimation procedures. Furthermore, the machine learning method generated threshold estimates from significantly fewer samples than the modified Hughson-Westlake procedure while returning a continuous threshold estimate as a function of frequency. Conclusions The new machine learning audiogram estimation technique produces continuous threshold audiogram estimates accurately, reliably, and efficiently, making it a strong candidate for widespread application in clinical and research audiometry. PMID

  1. Machine learning in soil classification.

    PubMed

    Bhattacharya, B; Solomatine, D P

    2006-03-01

    In a number of engineering problems, e.g. in geotechnics and petroleum engineering, intervals of measured series data (signals) are to be attributed a class while maintaining the constraint of contiguity, and standard classification methods can be inadequate. Classification in this case needs the involvement of an expert who observes the magnitude and trends of the signals in addition to any a priori information that might be available. In this paper, an approach for automating this classification procedure is presented. Firstly, a segmentation algorithm is developed and applied to segment the measured signals. Secondly, the salient features of these segments are extracted using the boundary energy method. Based on the measured data and extracted features, classifiers employing decision trees, ANNs, and support vector machines are built to assign classes to the segments. The methodology was tested in classifying sub-surface soil using measured data from Cone Penetration Testing, and satisfactory results were obtained. PMID:16530382
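
    A toy sketch of the pipeline just described: segment a measured signal into contiguous intervals, extract simple per-segment features (stand-ins for the boundary-energy features), and train one of the named classifiers (segment lengths, features, and labels below are illustrative):

      # Segment a CPT-like signal, featurize segments, classify with a tree.
      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.default_rng(0)
      signal = np.concatenate([rng.normal(m, 0.3, 100) for m in (1.0, 3.0, 2.0)])
      labels = np.repeat([0, 1, 2], 10)       # class of each contiguous segment

      segments = np.array_split(signal, 30)   # contiguity-preserving segmentation
      features = np.array([[s.mean(), s.std(), s.max() - s.min()] for s in segments])

      clf = DecisionTreeClassifier(random_state=0).fit(features, labels)
      print(clf.predict(features[:5]))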

  2. Prototype classification: insights from machine learning.

    PubMed

    Graf, Arnulf B A; Bousquet, Olivier; Rätsch, Gunnar; Schölkopf, Bernhard

    2009-01-01

    We shed light on the discrimination between patterns belonging to two different classes by casting this decoding problem into a generalized prototype framework. The discrimination process is then separated into two stages: a projection stage that reduces the dimensionality of the data by projecting it on a line and a threshold stage where the distributions of the projected patterns of both classes are separated. For this, we extend the popular mean-of-class prototype classification using algorithms from machine learning that satisfy a set of invariance properties. We report a simple yet general approach to express different types of linear classification algorithms in an identical and easy-to-visualize formal framework using generalized prototypes where these prototypes are used to express the normal vector and offset of the hyperplane. We investigate non-margin classifiers such as the classical prototype classifier, the Fisher classifier, and the relevance vector machine. We then study hard and soft margin classifiers such as the support vector machine and a boosted version of the prototype classifier. Subsequently, we relate mean-of-class prototype classification to other classification algorithms by showing that the prototype classifier is a limit of any soft margin classifier and that boosting a prototype classifier yields the support vector machine. While giving novel insights into classification per se by presenting a common and unified formalism, our generalized prototype framework also provides an efficient visualization and a principled comparison of machine learning classification.
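
    The mean-of-class prototype classifier at the heart of this framework is compact enough to state directly: the two class means define the projection direction, and a threshold at their midpoint separates the classes. A sketch on synthetic data:

      # Mean-of-class prototype classification: project, then threshold.
      import numpy as np

      rng = np.random.default_rng(0)
      X_pos = rng.normal(loc=+1.0, size=(100, 2))
      X_neg = rng.normal(loc=-1.0, size=(100, 2))

      w = X_pos.mean(axis=0) - X_neg.mean(axis=0)              # normal vector
      b = 0.5 * (X_pos.mean(axis=0) + X_neg.mean(axis=0)) @ w  # midpoint threshold

      def classify(x):
          return 1 if x @ w - b > 0 else -1

      acc = np.mean([classify(x) == 1 for x in X_pos] +
                    [classify(x) == -1 for x in X_neg])
      print("training accuracy:", acc)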

  3. Advances of implementing NC machine tools discussed

    NASA Astrophysics Data System (ADS)

    Kukuyev, Y. P.; Trukhan, Y. V.

    1984-11-01

    Numerical control (NC) machine tools, one of the principal resources for re-equipping, mechanizing, and automating small-series and series production in machine building, were examined. The continually increasing volume of NC machine tools produced and introduced makes the introduction of these machine tools into operation, and the organization of their effective use, economically significant. Organizational and technical measures were directed at solving these problems. To ensure the fastest introduction of NC machine tools into operation and their technical maintenance, a number of setting-up organizations were established. Setting-up services are also provided by the plants manufacturing the NC machine tools, and appropriate subdivisions are created for this purpose.

  4. Protein function in precision medicine: deep understanding with machine learning.

    PubMed

    Rost, Burkhard; Radivojac, Predrag; Bromberg, Yana

    2016-08-01

    Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both. PMID:27423136

  6. Man-machine cooperation in advanced teleoperation

    NASA Technical Reports Server (NTRS)

    Fiorini, Paolo; Das, Hari; Lee, Sukhan

    1993-01-01

    Teleoperation experiments at JPL have shown that advanced features in a telerobotic system are a necessary condition for good results, but that they are not sufficient to assure consistently good performance by the operators. Two or three operators are normally used during training and experiments to maintain the desired performance. An alternative to this multi-operator control station is a man-machine interface embedding computer programs that can perform some of the operator's functions. In this paper we present our first experiments with these concepts, in which we focused on the areas of real-time task monitoring and interactive path planning. In the first case, when performing a known task, the operator has an automatic aid for setting control parameters and camera views. In the second case, an interactive path planner will rank different path alternatives so that the operator will make the correct control decision. The monitoring function has been implemented with a neural network doing the real-time task segmentation. The interactive path planner was implemented for redundant manipulators to specify arm configurations across the desired path and satisfy geometric, task, and performance constraints.

  7. Machine Learning and Geometric Technique for SLAM

    NASA Astrophysics Data System (ADS)

    Bernal-Marin, Miguel; Bayro-Corrochano, Eduardo

    This paper describes a new approach for building 3D geometric maps using a laser rangefinder, a stereo camera system, and a mathematical framework, Conformal Geometric Algebra. The use of known visual landmarks in the map helps to carry out good localization of the robot. A machine learning technique is used for recognition of objects in the environment. These landmarks are found using the Viola and Jones algorithm and are represented with their position in the 3D virtual map.

  8. Topics in Machine Learning for Astronomers

    NASA Astrophysics Data System (ADS)

    Cisewski, Jessi

    2016-01-01

    As astronomical datasets continue to increase in size and complexity, innovative statistical and machine learning tools are required to address the scientific questions of interest in a computationally efficient manner. I will introduce some tools that astronomers can employ for such problems with a focus on clustering and classification techniques. I will introduce standard methods, but also get into more recent developments that may be of use to the astronomical community.

  9. Prototype-based models in machine learning.

    PubMed

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional, complex datasets. We discuss basic schemes of competitive vector quantization as well as the so-called neural gas approach and Kohonen's topology-preserving self-organizing map. Supervised learning in prototype systems is exemplified in terms of learning vector quantization. Most frequently, the familiar Euclidean distance serves as a dissimilarity measure. We present extensions of the framework to nonstandard measures and give an introduction to the use of adaptive distances in relevance learning.
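
    Supervised prototype learning of the LVQ kind reduces to a simple update rule: pull the winning prototype toward a sample it classifies correctly, push it away otherwise. A minimal LVQ1 sketch on synthetic two-class data:

      # LVQ1: one prototype per class, attract-or-repel updates.
      import numpy as np

      rng = np.random.default_rng(0)
      X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(+1, 0.5, (100, 2))])
      y = np.repeat([0, 1], 100)
      protos = np.array([[-0.2, 0.0], [0.2, 0.0]])   # initial prototypes
      lr = 0.05

      for epoch in range(20):
          for x, label in zip(X, y):
              winner = int(np.argmin(((protos - x) ** 2).sum(axis=1)))
              sign = 1.0 if winner == label else -1.0
              protos[winner] += sign * lr * (x - protos[winner])

      print("learned prototypes:", protos)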

  10. Scaling up: Distributed machine learning with cooperation

    SciTech Connect

    Provost, F.J.; Hennessy, D.N.

    1996-12-31

    Machine-learning methods are becoming increasingly popular for automated data analysis. However, standard methods do not scale up to massive scientific and business data sets without expensive hardware. This paper investigates a practical alternative for scaling up: the use of distributed processing to take advantage of the often dormant PCs and workstations available on local networks. Each workstation runs a common rule-learning program on a subset of the data. We first show that for commonly used rule-evaluation criteria, a simple form of cooperation can guarantee that a rule will look good to the set of cooperating learners if and only if it would look good to a single learner operating with the entire data set. We then show how such a system can further capitalize on different perspectives by sharing learned knowledge for significant reduction in search effort. We demonstrate the power of the method by learning from a massive data set taken from the domain of cellular fraud detection. Finally, we provide an overview of other methods for scaling up machine learning.

  11. Machine learning: how to get more out of HEP data and the Higgs Boson Machine Learning Challenge

    NASA Astrophysics Data System (ADS)

    Wolter, Marcin

    2015-09-01

    Multivariate techniques using machine learning algorithms have become an integral part of many High Energy Physics (HEP) data analyses. The article shows the gain in physics reach of physics experiments due to the adoption of machine learning techniques. Rapid development in the field of machine learning in recent years is a challenge for the HEP community. The open competition for machine learning experts, the "Higgs Boson Machine Learning Challenge", shows that modern techniques developed outside HEP can significantly improve the analysis of data from HEP experiments and improve the sensitivity of searches for new particles and processes.

  12. Sparse extreme learning machine for classification.

    PubMed

    Bai, Zuo; Huang, Guang-Bin; Wang, Danwei; Wang, Han; Westover, M Brandon

    2014-10-01

    Extreme learning machine (ELM) was initially proposed for single-hidden-layer feedforward neural networks (SLFNs). In the hidden layer (feature mapping), nodes are randomly generated independently of training data. Furthermore, a unified ELM was proposed, providing a single framework to simplify and unify different learning methods, such as SLFNs, least square support vector machines, proximal support vector machines, and so on. However, the solution of unified ELM is dense, and thus, usually plenty of storage space and testing time are required for large-scale applications. In this paper, a sparse ELM is proposed as an alternative solution for classification, reducing storage space and testing time. In addition, unified ELM obtains the solution by matrix inversion, whose computational complexity is between quadratic and cubic with respect to the training size. It still requires plenty of training time for large-scale problems, even though it is much faster than many other traditional methods. In this paper, an efficient training algorithm is specifically developed for sparse ELM. The quadratic programming problem involved in sparse ELM is divided into a series of smallest possible sub-problems, each of which are solved analytically. Compared with SVM, sparse ELM obtains better generalization performance with much faster training speed. Compared with unified ELM, sparse ELM achieves similar generalization performance for binary classification applications, and when dealing with large-scale binary classification problems, sparse ELM realizes even faster training speed than unified ELM. PMID:25222727

  13. Optoelectronic neural networks and learning machines

    SciTech Connect

    Farhat, N.H

    1989-09-01

    Optics offers advantages in realizing the parallelism, massive interconnectivity, and plasticity required in the design and construction of large-scale optoelectronic (photonic) neurocomputers that solve optimization problems at potentially very high speeds by learning to perform mappings and associations. To elucidate these advantages, a brief neural net primer based on phase-space and energy landscape considerations is presented. This provides the basis for subsequent discussion of optoelectronic architectures and implementations with self-organization and learning ability that are configured around an optical crossbar interconnect. Stochastic learning in the context of a Boltzmann machine is then described to illustrate the flexibility of optoelectronics in performing tasks that may be difficult for electronics alone. Stochastic nets are studied to gain insight into the possible role of noise in biological neural nets. The authors describe two approaches to realizing large-scale optoelectronic neurocomputers.

  14. Discriminative clustering via extreme learning machine.

    PubMed

    Huang, Gao; Liu, Tianchi; Yang, Yan; Lin, Zhiping; Song, Shiji; Wu, Cheng

    2015-10-01

    Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithms. In this paper, we propose three discriminative clustering approaches based on Extreme Learning Machine (ELM). The first algorithm iteratively trains weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA); but one approach adopts alternative optimization, while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented, and yield competitive clustering accuracy on real world data sets compared to state-of-the-art clustering methods. PMID:26143036

  15. Entanglement-based machine learning on a quantum computer.

    PubMed

    Cai, X-D; Wu, D; Su, Z-E; Chen, M-C; Wang, X-L; Li, Li; Liu, N-L; Lu, C-Y; Pan, J-W

    2015-03-20

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, which is ubiquitous in various fields such as computer sciences, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv.1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning. PMID:25839250

  18. Extreme Learning Machine for Multilayer Perceptron.

    PubMed

    Tang, Jiexiong; Deng, Chenwei; Huang, Guang-Bin

    2016-04-01

    Extreme learning machine (ELM) is an emerging learning algorithm for the generalized single hidden layer feedforward neural networks, of which the hidden node parameters are randomly generated and the output weights are analytically computed. However, due to its shallow architecture, feature learning using ELM may not be effective for natural signals (e.g., images/videos), even with a large number of hidden nodes. To address this issue, in this paper, a new ELM-based hierarchical learning framework is proposed for multilayer perceptron. The proposed architecture is divided into two main components, 1) self-taught feature extraction followed by 2) supervised feature classification, bridged by randomly initialized hidden weights. The novelties of this paper are as follows: 1) unsupervised multilayer encoding is conducted for feature extraction, and an ELM-based sparse autoencoder is developed via an l1 constraint. By doing so, it achieves more compact and meaningful feature representations than the original ELM; 2) by exploiting the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to a better generalization with faster learning speed; and 3) unlike the greedy layerwise training of deep learning (DL), the hidden layers of the proposed framework are trained in a forward manner. Once the previous layer is established, the weights of the current layer are fixed without fine-tuning. Therefore, it has much better learning efficiency than the DL. Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods. Furthermore, multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme. PMID:25966483

  19. Metagenomic taxonomic classification using extreme learning machines.

    PubMed

    Rasheed, Zeehasham; Rangwala, Huzefa

    2012-10-01

    Next-generation sequencing technologies have allowed researchers to determine the collective genomes of microbial communities co-existing within diverse ecological environments. Varying species abundance, length and complexities within different communities, coupled with discovery of new species makes the problem of taxonomic assignment to short DNA sequence reads extremely challenging. We have developed a new sequence composition-based taxonomic classifier using extreme learning machines referred to as TAC-ELM for metagenomic analysis. TAC-ELM uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides. TAC-ELM is evaluated on two metagenomic benchmarks with sequence read lengths reflecting the traditional and current sequencing technologies. Our empirical results indicate the strength of the developed approach, which outperforms state-of-the-art taxonomic classifiers in terms of accuracy and implementation complexity. We also perform experiments that evaluate the pervasive case within metagenome analysis, where a species may not have been previously sequenced or discovered and will not exist in the reference genome databases. TAC-ELM was also combined with BLAST to show improved classification results. Code and Supplementary Results: http://www.cs.gmu.edu/~mlbio/TAC-ELM (BSD License). PMID:22849369

  20. Ozone ensemble forecast with machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Mallet, Vivien; Stoltz, Gilles; Mauricette, Boris

    2009-03-01

    We apply machine learning algorithms to perform sequential aggregation of ozone forecasts. The latter rely on a multimodel ensemble built for ozone forecasting with the modeling system Polyphemus. The ensemble simulations are obtained by changes in the physical parameterizations, the numerical schemes, and the input data to the models. The simulations are carried out for summer 2001 over western Europe in order to forecast ozone daily peaks and ozone hourly concentrations. On the basis of past observations and past model forecasts, the learning algorithms produce a weight for each model. A convex or linear combination of the model forecasts is then formed with these weights. This process is repeated for each round of forecasting and is therefore called sequential aggregation. The aggregated forecasts demonstrate good results; for instance, they always show better performance than the best model in the ensemble and they even compete against the best constant linear combination. In addition, the machine learning algorithms come with theoretical guarantees with respect to their performance that hold for all possible sequences of observations, even nonstochastic ones. Our study also demonstrates the robustness of the methods. We therefore conclude that these aggregation methods are very relevant for operational forecasts.
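
    The paper evaluates several aggregation rules; as a flavour of the general mechanism, here is a minimal sketch of one classical member of that family, the exponentially weighted average forecaster, driven by per-model squared losses. The learning rate eta and the loss function are illustrative assumptions, not the paper's exact algorithms.

```python
import numpy as np

def aggregate_forecasts(F, y, eta=0.1):
    """F: (T, M) array of M model forecasts per step; y: (T,) observations.
    Returns the sequentially aggregated forecasts."""
    T, M = F.shape
    w = np.full(M, 1.0 / M)                        # start from uniform model weights
    out = np.empty(T)
    for t in range(T):
        out[t] = w @ F[t]                          # convex combination of model forecasts
        w = w * np.exp(-eta * (F[t] - y[t]) ** 2)  # downweight models by past error
        w = w / w.sum()                            # renormalise to keep a convex combination
    return out
```

    Regret bounds for this family of forecasters hold for arbitrary observation sequences, which is what the abstract means by guarantees that hold even for nonstochastic data.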

  1. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    ERIC Educational Resources Information Center

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  2. Patient-centered yes/no prognosis using learning machines

    PubMed Central

    König, I.R.; Malley, J.D.; Pajevic, S.; Weimar, C.; Diener, H-C.

    2009-01-01

    In the last 15 years several machine learning approaches have been developed for classification and regression. In an intuitive manner we introduce the main ideas of classification and regression trees, support vector machines, bagging, boosting and random forests. We discuss differences in the use of machine learning in the biomedical community and the computer sciences. We propose methods for comparing machines on a sound statistical basis. Data from the German Stroke Study Collaboration is used for illustration. We compare the results from learning machines to those obtained by a published logistic regression and discuss similarities and differences. PMID:19216340

  3. Finding Density Functionals with Machine Learning

    NASA Astrophysics Data System (ADS)

    Snyder, John C.; Rupp, Matthias; Hansen, Katja; Müller, Klaus-Robert; Burke, Kieron

    2012-06-01

    Machine learning is used to approximate density functionals. For the model problem of the kinetic energy of noninteracting fermions in 1D, mean absolute errors below 1 kcal/mol on test densities similar to the training set are reached with fewer than 100 training densities. A predictor identifies if a test density is within the interpolation region. Via principal component analysis, a projected functional derivative finds highly accurate self-consistent densities. The challenges for application of our method to real electronic structure problems are discussed.
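
    Kernel ridge regression of the kind used here maps a discretised density to an energy. The toy sketch below assumes a Gaussian kernel and a fixed regulariser; both are hypothetical placeholders rather than the paper's tuned settings.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(densities, energies, lam=1e-8, sigma=1.0):
    """densities: (N, G) densities on a grid; energies: (N,) targets."""
    K = gaussian_kernel(densities, densities, sigma)
    return np.linalg.solve(K + lam * np.eye(len(K)), energies)  # dual weights

def krr_predict(test_densities, train_densities, alpha, sigma=1.0):
    return gaussian_kernel(test_densities, train_densities, sigma) @ alpha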

  4. Towards Machine Learning of Motor Skills

    NASA Astrophysics Data System (ADS)

    Peters, Jan; Schaal, Stefan; Schölkopf, Bernhard

    Autonomous robots that can adapt to novel situations have been a long-standing vision of robotics, artificial intelligence, and cognitive science. Early approaches to this goal during the heyday of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insight would not be able to model all the perceptuomotor tasks a robot should fulfill. Instead, new hope was placed in the growing wake of machine learning, which promised fully adaptive control algorithms that learn both by observation and by trial and error. However, to date, learning techniques have yet to fulfill this promise, as only a few methods manage to scale to the high-dimensional domains of manipulator robotics, or the newer trend of humanoid robotics, and scaling was usually achieved only in precisely pre-structured domains. In this paper, we investigate the ingredients of a general approach to motor skill learning in order to get one step closer to human-like performance. To do so, we study two major components of such an approach: first, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, second, appropriate learning algorithms that can be applied in this setting.

  5. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    PubMed Central

    Subbulakshmi, C. V.; Deepa, S. N.

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for over a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself through a training process, because complete expert knowledge for determining classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm that integrates the self-regulated learning capability, a successful exploration mechanism of the particle swarm optimization (PSO) algorithm, with the extreme learning machine (ELM) classifier. A recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN) that has proved to be an excellent classifier with a large number of hidden-layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden-layer neurons and further improving the network's generalization performance. The proposed method is evaluated on five benchmark datasets from the UCI Machine Learning Repository for medical dataset classification. Simulation results show that the proposed approach achieves good generalization performance compared with other classifiers. PMID:26491713

  6. Machine learning of user profiles: Representational issues

    SciTech Connect

    Bloedorn, E.; Mani, I.; MacMillan, T.R.

    1996-12-31

    As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research focuses on the importance of a suitable generalization hierarchy and representation for learning profiles that are predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem.

  7. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to work on the Support Vector Machine (SVM) or Least Squares SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established on a rigorous proof of universal learning involving reduced kernel-based SLFNs. In particular, we prove that RKELM can approximate any nonlinear function accurately given sufficient support vectors. Experimental results on a wide variety of real-world applications of both small and large instance size, spanning binary classification, multi-class problems, and regression, show that RKELM performs at a competitive level of generalization performance to the SVM/LS-SVM at only a fraction of the computational effort.
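
    The abstract's central move, randomly choosing mapping samples instead of iteratively solving for support vectors, reduces to a few lines. This sketch assumes a Gaussian kernel and ridge regularisation with placeholder parameter values; it follows the abstract's description rather than the paper's exact formulation.

```python
import numpy as np

def _kernel(X, S, sigma=1.0):
    d2 = ((X[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rkelm_fit(X, T, n_support=50, lam=1e-3, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_support, replace=False)  # random mapping samples
    S = X[idx]
    K = _kernel(X, S, sigma)
    # one regularised linear solve, no iteration over support vectors
    beta = np.linalg.solve(K.T @ K + lam * np.eye(n_support), K.T @ T)
    return S, beta

def rkelm_predict(X, S, beta, sigma=1.0):
    return _kernel(X, S, sigma) @ beta
```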

  8. Measure Transformer Semantics for Bayesian Machine Learning

    NASA Astrophysics Data System (ADS)

    Borgström, Johannes; Gordon, Andrew D.; Greenberg, Michael; Margetson, James; van Gael, Jurgen

    The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.

  9. Galaxy morphology - An unsupervised machine learning approach

    NASA Astrophysics Data System (ADS)

    Schutter, A.; Shamir, L.

    2015-09-01

    Structural properties provide valuable information about the formation and evolution of galaxies, and are important for understanding the past, present, and future universe. Here we use unsupervised machine learning methodology to analyze a network of similarities between galaxy morphological types and automatically deduce a morphological sequence of galaxies. Application of the method to the EFIGI catalog shows that the morphological scheme produced by the algorithm is largely in agreement with the De Vaucouleurs system, demonstrating the ability of computer vision and machine learning methods to automatically profile galaxy morphological sequences. The unsupervised analysis method is based on comprehensive computer vision techniques that compute the visual similarities between the different morphological types. Rather than relying on human cognition, the proposed system deduces the similarities between sets of galaxy images in an automatic manner, and is therefore not limited by the number of galaxies being analyzed. The source code of the method is publicly available, and the protocol of the experiment is included in the paper so that the experiment can be replicated, and the method can be used to analyze user-defined datasets of galaxy images.

  10. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to work on the Support Vector Machine (SVM) or Least Squares SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established on a rigorous proof of universal learning involving reduced kernel-based SLFNs. In particular, we prove that RKELM can approximate any nonlinear function accurately given sufficient support vectors. Experimental results on a wide variety of real-world applications of both small and large instance size, spanning binary classification, multi-class problems, and regression, show that RKELM performs at a competitive level of generalization performance to the SVM/LS-SVM at only a fraction of the computational effort. PMID:26829605

  11. Mining the Kepler Data using Machine Learning

    NASA Astrophysics Data System (ADS)

    Walkowicz, Lucianne; Howe, A. R.; Nayar, R.; Turner, E. L.; Scargle, J.; Meadows, V.; Zee, A.

    2014-01-01

    Kepler's high cadence and incredible precision have provided an unprecedented view into stars and their planetary companions, revealing both expected and novel phenomena and systems. Due to the large number of Kepler lightcurves, the discovery of novel phenomena in particular has often been serendipitous in the course of searching for known forms of variability (for example, the discovery of the doubly pulsating elliptical binary KOI-54, originally identified by the transiting planet search pipeline). In this talk, we discuss progress on mining the Kepler data through both supervised and unsupervised machine learning, intended to both systematically search the Kepler lightcurves for rare or anomalous variability, and to create a variability catalog for community use. Mining the dataset in this way also allows for a quantitative identification of anomalous variability, and so may also be used as a signal-agnostic form of optical SETI. As the Kepler data are exceptionally rich, they provide an interesting counterpoint to machine learning efforts typically performed on sparser and/or noisier survey data, and will inform similar characterization carried out on future survey datasets.

  12. Photometric Supernova Classification with Machine Learning

    NASA Astrophysics Data System (ADS)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
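
    As a concrete picture of the pipeline's second stage, the sketch below trains boosted decision trees and scores them by AUC. The features here are synthetic stand-ins for the SALT2 or wavelet coefficients the paper extracts; the data-generating rule is entirely invented for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))  # stand-in for per-object light-curve features
# synthetic "type Ia" label correlated with the first two features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.7, size=2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
bdt = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, bdt.predict_proba(X_te)[:, 1]))
```

    The AUC metric used here is the same one the paper reports; an AUC of 0.5 is random guessing and 1.0 is perfect separation.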

  13. Learning Activity Packets for Milling Machines. Unit I--Introduction to Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to identify parts and attachments of vertical and horizontal milling machines, identify work-holding devices, state safety rules, and…

  14. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    PubMed

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-01

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.

  15. Learning to Control Advanced Life Support Systems

    NASA Technical Reports Server (NTRS)

    Subramanian, Devika

    2004-01-01

    Advanced life support systems have many interacting processes and limited resources. Controlling and optimizing advanced life support systems presents unique challenges. In particular, advanced life support systems are nonlinear coupled dynamical systems, and it is difficult for humans to take all interactions into account when designing an effective control strategy. In this project, we developed several reinforcement learning controllers that actively explore the space of possible control strategies, guided by rewards from a user-specified long-term objective function. We evaluated these controllers using a discrete event simulation of an advanced life support system. This simulation, called BioSim, designed by NASA scientists David Kortenkamp and Scott Bell, has multiple interacting life support modules including crew, food production, air revitalization, water recovery, solid waste incineration, and power. They are implemented in a consumer/producer relationship in which certain modules produce resources that are consumed by other modules. Stores hold resources between modules. Control of this simulation is via adjusting flows of resources between modules and into/out of stores. We developed adaptive algorithms that control the flow of resources in BioSim. Our learning algorithms discovered several ingenious strategies for maximizing mission length by controlling the air and water recycling systems as well as crop planting schedules. By exploiting non-linearities in the overall system dynamics, the learned controllers easily outperformed controllers written by human experts. In sum, we accomplished three goals. We (1) developed foundations for learning models of coupled dynamical systems by active exploration of the state space, (2) developed and tested algorithms that learn to efficiently control air and water recycling processes as well as crop scheduling in BioSim, and (3) developed an understanding of the role of machine learning in designing control systems for
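
    The reinforcement-learning machinery involved can be illustrated with a minimal tabular Q-learning loop. The toy environment below is a hypothetical stand-in for BioSim's resource-flow dynamics, not the actual simulator; states, actions, and the reward rule are all invented for the sketch.

```python
import numpy as np

n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def step(s, a):
    # toy stand-in for one simulation tick: the action shifts a resource level
    s_next = (s + a - 1) % n_states
    return s_next, 1.0 if s_next == n_states - 1 else 0.0

s = 0
for _ in range(20000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])  # temporal-difference update
    s = s_next
print("greedy action per state:", Q.argmax(axis=1))
```

    The epsilon-greedy choice is what makes the controller "actively explore the space of possible control strategies" while the objective function supplies the rewards.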

  16. Industrial Inspection with Open Eyes: Advance with Machine Vision Technology

    SciTech Connect

    Liu, Zheng; Ukida, H.; Niel, Kurt; Ramuhalli, Pradeep

    2015-10-01

    Machine vision systems have evolved significantly with the technology advances to tackle the challenges from modern manufacturing industry. A wide range of industrial inspection applications for quality control are benefiting from visual information captured by different types of cameras variously configured in a machine vision system. This chapter screens the state of the art in machine vision technologies in the light of hardware, software tools, and major algorithm advances for industrial inspection. The inspection beyond visual spectrum offers a significant complementary to the visual inspection. The combination with multiple technologies makes it possible for the inspection to achieve a better performance and efficiency in varied applications. The diversity of the applications demonstrates the great potential of machine vision systems for industry.

  17. Teacher Leaders: Advancing Mathematics Learning

    ERIC Educational Resources Information Center

    Kinzer, Cathy J.; Rincón, Mari; Ward, Jana; Rincón, Ricardo; Gomez, Lesli

    2014-01-01

    Four elementary school instructors offer insights into their classrooms, their unique professional roles, and their leadership approaches as they reflect on their journey to advance teacher and student mathematics learning. They note a "teacher leader" serves as an example to other educators and strives to impact student learning;…

  18. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  19. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

    ERIC Educational Resources Information Center

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2015-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead…

  20. Large-Scale Machine Learning for Classification and Search

    ERIC Educational Resources Information Center

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  1. Newton Methods for Large Scale Problems in Machine Learning

    ERIC Educational Resources Information Center

    Hansen, Samantha Leigh

    2014-01-01

    The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss…

  2. Tilinglike learning in the parity machine

    NASA Astrophysics Data System (ADS)

    Biehl, Michael; Opper, Manfred

    1991-11-01

    An algorithm for the training of multilayered feedforward neural networks is presented. The strategy is very similar to the well-known tiling algorithm, yet the resulting architecture is completely different. New hidden units are added to one layer only, in order to correct the errors of the previous ones; standard perceptron learning can be applied. The output of the network is given by the product of these k (±1) neurons (parity machine). In a special case with two hidden units, the capacity αc and stability of the network can be derived exactly by means of a replica-symmetric calculation. Correlations between the two sets of couplings vanish exactly. For the case of arbitrary k, estimates of αc are given. The asymptotic capacity per input neuron of a network trained according to the proposed algorithm is found to be αc ~ k ln k for k → ∞ in the estimation. This is in agreement with recent analytic results for the algorithm-independent capacity of a parity machine.
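
    The network's output rule is simple to state in code: each hidden unit is a ±1 perceptron and the machine outputs their product. The sketch below shows just this forward pass; the tiling-like training loop that adds hidden units to correct earlier errors is omitted, and the tie-breaking convention is an assumption.

```python
import numpy as np

def parity_machine(x, W):
    """W: (k, n) couplings for k hidden perceptrons; x: (n,) input pattern.
    The output is the product of the k (+/-1) hidden units."""
    hidden = np.sign(W @ x)   # each row of W defines one +/-1 perceptron
    hidden[hidden == 0] = 1   # break exact ties deterministically
    return int(np.prod(hidden))
```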

  3. Machine Learning for Dynamical Mean Field Theory

    NASA Astrophysics Data System (ADS)

    Arsenault, Louis-Francois; Lopez-Bezanilla, Alejandro; von Lilienfeld, O. Anatole; Littlewood, P. B.; Millis, Andy

    2014-03-01

    Machine Learning (ML), an approach that infers new results from accumulated knowledge, is in use for a variety of tasks ranging from face and voice recognition to internet searching and has recently been gaining increasing importance in chemistry and physics. In this talk, we investigate the possibility of using ML to solve the equations of dynamical mean field theory, which otherwise requires the (numerically very expensive) solution of a quantum impurity model. Our ML scheme learns the relation between two functions: the hybridization function describing the bare (local) electronic structure of a material and the self-energy describing the many body physics. We discuss the parameterization of the two functions for the exact diagonalization solver and present examples, beginning with the Anderson Impurity model with a fixed bath density of states, demonstrating the advantages and the pitfalls of the method. DOE contract DE-AC02-06CH11357.

  4. Tracking medical genetic literature through machine learning.

    PubMed

    Bornstein, Aaron T; McLoughlin, Matthew H; Aguilar, Jesus; Wong, Wendy S W; Solomon, Benjamin D

    2016-08-01

    There has been remarkable progress in identifying the causes of genetic conditions as well as understanding how changes in specific genes cause disease. Though difficult (and often superficial) to parse, an interesting tension involves emphasis on basic research aimed to dissect normal and abnormal biology versus more clearly clinical and therapeutic investigations. To examine one facet of this question and to better understand progress in Mendelian-related research, we developed an algorithm that classifies medical literature into three categories (Basic, Clinical, and Management) and conducted a retrospective analysis. We built a supervised machine learning classification model using the Azure Machine Learning (ML) Platform and analyzed the literature (1970-2014) from NCBI's Entrez Gene2Pubmed Database (http://www.ncbi.nlm.nih.gov/gene) using genes from the NHGRI's Clinical Genomics Database (http://research.nhgri.nih.gov/CGD/). We applied our model to 376,738 articles: 288,639 (76.6%) were classified as Basic, 54,178 (14.4%) as Clinical, and 24,569 (6.5%) as Management. The average classification accuracy was 92.2%. The rate of Clinical publication was significantly higher than Basic or Management. The rate of publication of article types differed significantly when divided into key eras: Human Genome Project (HGP) planning phase (1984-1990); HGP launch (1990) to publication (2001); following HGP completion to the "Next Generation" advent (2009); the era following 2009. In conclusion, in addition to the findings regarding the pace and focus of genetic progress, our algorithm produced a database that can be used in a variety of contexts including automating the identification of management-related literature.

  5. Tracking medical genetic literature through machine learning.

    PubMed

    Bornstein, Aaron T; McLoughlin, Matthew H; Aguilar, Jesus; Wong, Wendy S W; Solomon, Benjamin D

    2016-08-01

    There has been remarkable progress in identifying the causes of genetic conditions as well as understanding how changes in specific genes cause disease. Though difficult (and often superficial) to parse, an interesting tension involves emphasis on basic research aimed to dissect normal and abnormal biology versus more clearly clinical and therapeutic investigations. To examine one facet of this question and to better understand progress in Mendelian-related research, we developed an algorithm that classifies medical literature into three categories (Basic, Clinical, and Management) and conducted a retrospective analysis. We built a supervised machine learning classification model using the Azure Machine Learning (ML) Platform and analyzed the literature (1970-2014) from NCBI's Entrez Gene2Pubmed Database (http://www.ncbi.nlm.nih.gov/gene) using genes from the NHGRI's Clinical Genomics Database (http://research.nhgri.nih.gov/CGD/). We applied our model to 376,738 articles: 288,639 (76.6%) were classified as Basic, 54,178 (14.4%) as Clinical, and 24,569 (6.5%) as Management. The average classification accuracy was 92.2%. The rate of Clinical publication was significantly higher than Basic or Management. The rate of publication of article types differed significantly when divided into key eras: Human Genome Project (HGP) planning phase (1984-1990); HGP launch (1990) to publication (2001); following HGP completion to the "Next Generation" advent (2009); the era following 2009. In conclusion, in addition to the findings regarding the pace and focus of genetic progress, our algorithm produced a database that can be used in a variety of contexts including automating the identification of management-related literature. PMID:27268407
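
    The classifier itself is a standard supervised text-categorisation setup. The sketch below uses TF-IDF features with logistic regression as a generic equivalent (the paper used the Azure ML platform, whose pipeline is not reproduced here); the example abstracts and the query are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# hypothetical labelled training abstracts, one per category
docs = ["knockout mouse reveals gene function in early development",
        "randomized controlled trial of enzyme replacement therapy",
        "surveillance and counseling guidelines for mutation carriers"]
labels = ["Basic", "Clinical", "Management"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(docs, labels)   # in practice this would be fit on thousands of labelled abstracts
print(clf.predict(["phase II trial of targeted therapy in patients"]))
```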

  6. Advances in learning disabilities.

    PubMed

    Nass, R

    1994-04-01

    This review reports recent findings about diagnostic criteria, epidemiology, genetics, neuropsychological underpinnings, neuroanatomy, etiology, outcome, and treatment for the following: pervasive developmental disorder, the developmental language disorders, dyslexia, and dyscalculia. In addition, recent findings about neurological correlates, medical causes, and gender effects of learning disabilities are discussed. PMID:8019665

  7. ATCA for Machines-- Advanced Telecommunications Computing Architecture

    SciTech Connect

    Larsen, R.S.; /SLAC

    2008-04-22

    The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R&D including application of HA principles to power electronics systems.

  8. Abrasives and Grinding Machines; Machine Shop Work--Advanced: 9557.02.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL.

    The course outline has been prepared as a guide to assist the instructor in systematically planning and presenting a variety of meaningful lessons to facilitate the necessary training for the machine shop student. The material contained in the outline is designed to enable the student to learn the manipulative skills and related knowledge…

  9. Precision Machining and Technology; Machine Shop Work--Advanced: 9557.04.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL.

    The course outline has been prepared as a guide to assist the instructor in systematically planning and presenting a variety of meaningful lessons to facilitate the necessary training for the machine shop student. The material is designed to enable the student to learn the manipulative skills and related knowledge necessary to understand the jig…

  10. Studying depression using imaging and machine learning methods.

    PubMed

    Patel, Meenal J; Khalaf, Alexander; Aizenstein, Howard J

    2016-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies.

  11. Machine Learning for High-Throughput Stress Phenotyping in Plants.

    PubMed

    Singh, Arti; Ganapathysubramanian, Baskar; Singh, Asheesh Kumar; Sarkar, Soumik

    2016-02-01

    Advances in automated and high-throughput imaging technologies have resulted in a deluge of high-resolution images and sensor data of plants. However, extracting patterns and features from this large corpus of data requires the use of machine learning (ML) tools to enable data assimilation and feature identification for stress phenotyping. Four stages of the decision cycle in plant stress phenotyping and plant breeding activities where different ML approaches can be deployed are (i) identification, (ii) classification, (iii) quantification, and (iv) prediction (ICQP). We provide here a comprehensive overview and user-friendly taxonomy of ML tools to enable the plant community to correctly and easily apply the appropriate ML tools and best-practice guidelines for various biotic and abiotic stress traits.

  12. Image Segmentation for Connectomics Using Machine Learning

    SciTech Connect

    Tasdizen, Tolga; Seyedhosseini, Mojtaba; Liu, TIng; Jones, Cory; Jurrus, Elizabeth R.

    2014-12-01

    Reconstruction of neural circuits at the microscopic scale of individual neurons and synapses, also known as connectomics, is an important challenge for neuroscience. While an important motivation of connectomics is providing anatomical ground truth for neural circuit models, the ability to decipher neural wiring maps at the individual cell level is also important in studies of many neurodegenerative diseases. Reconstruction of a neural circuit at the individual neuron level requires the use of electron microscopy images due to their extremely high resolution. Computational challenges include pixel-by-pixel annotation of these images into classes such as cell membrane, mitochondria and synaptic vesicles and the segmentation of individual neurons. State-of-the-art image analysis solutions are still far from the accuracy and robustness of human vision and biologists are still limited to studying small neural circuits using mostly manual analysis. In this chapter, we describe our image analysis pipeline that makes use of novel supervised machine learning techniques to tackle this problem.

  13. Predicting Increased Blood Pressure Using Machine Learning

    PubMed Central

    Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Soares, Telma de Jesus; dos Reis, Luciana Araujo

    2014-01-01

    The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist-hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) from 16 to 63 years old. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The result shows that for women BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42), misclassification (.19), and the highest pseudo R2 (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men BMI, WC, HC, and WHR showed the best prediction with the lowest deviance (57.25), misclassification (.16), and the highest pseudo R2 (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power. PMID:24669313
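
    A compact version of this analysis pattern, a depth-limited classification tree evaluated by sensitivity and specificity on held-out data, is sketched below on synthetic anthropometric values. The data-generating rule and all parameter values are invented and bear no relation to the study's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([rng.normal(25, 4, n),        # BMI
                     rng.normal(80, 10, n),       # waist circumference (cm)
                     rng.normal(0.85, 0.07, n)])  # waist-hip ratio
# synthetic "elevated blood pressure" label, for illustration only
y = (X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 3, n) > 50).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, tree.predict(X_te)).ravel()
print("sensitivity %.2f, specificity %.2f" % (tp / (tp + fn), tn / (tn + fp)))
```

    Fitting separate trees per sex with different predictor subsets, as the study does, amounts to repeating this fit per subgroup and comparing deviance and misclassification across the candidate predictor sets.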

  14. Many-body physics via machine learning

    NASA Astrophysics Data System (ADS)

    Arsenault, Louis-Francois; von Lilienfeld, O. Anatole; Millis, Andrew J.

    We demonstrate a method for the use of machine learning (ML) to solve the equations of many-body physics, which are functional equations linking a bare to an interacting Green's function (or self-energy), offering transferable predictive power for physical quantities in both the forward and the reverse engineering problems of materials. Functions are represented by coefficients in an orthogonal polynomial expansion, and kernel ridge regression is used. The method is demonstrated using as an example a database built from Dynamical Mean Field Theory (DMFT) calculations on the three-dimensional Hubbard model. We discuss the extension to a database for real materials. We also discuss new areas of investigation concerning high-throughput predictions for real materials, offering a perspective on how our scheme is general enough for application to other problems involving the inversion of integral equations from integrated knowledge, such as the analytical continuation of the Green's function and the reconstruction of lattice structures from X-ray spectra. Office of Science of the U.S. Department of Energy under SubContract DOE No. 3F-3138 and FG-ER04169.

  15. Acceleration of saddle-point searches with machine learning.

    PubMed

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.

  16. Acceleration of saddle-point searches with machine learning

    NASA Astrophysics Data System (ADS)

    Peterson, Andrew A.

    2016-08-01

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community.
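
    The accelerate-and-verify loop described here fits in a short script. In the toy sketch below, a 1-D double well stands in for the expensive ab initio call and its barrier top plays the role of the transition state; the kernel, tolerance, and sampling grid are arbitrary assumptions, not the paper's machinery.

```python
import numpy as np

def true_pes(x):                       # toy "ab initio" surface; barrier top near x = 0
    return x ** 4 - 2 * x ** 2

def fit_krr(X, y, sigma=0.5, lam=1e-8):
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * sigma ** 2))
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(xq, X, alpha, sigma=0.5):
    return np.exp(-(xq[:, None] - X[None, :]) ** 2 / (2 * sigma ** 2)) @ alpha

X = np.array([-1.5, -0.8, 0.9, 1.4])   # initial expensive samples
y = true_pes(X)
grid = np.linspace(-1.2, 1.2, 401)
for it in range(10):
    alpha = fit_krr(X, y)
    x_ts = grid[np.argmax(predict(grid, X, alpha))]  # barrier-top estimate on the surrogate
    e_true = true_pes(x_ts)                          # single expensive verification call
    if abs(predict(np.array([x_ts]), X, alpha)[0] - e_true) < 1e-4:
        break                                        # surrogate agrees: accept the estimate
    X, y = np.append(X, x_ts), np.append(y, e_true)  # disagreement flags sparse training data
print("estimated transition state:", x_ts)
```

    The key point the abstract makes is visible in the loop: a wrong surrogate prediction is not wasted work, because the verifying calculation becomes a training point exactly where the model was weakest.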

  17. Reduced multiple empirical kernel learning machine.

    PubMed

    Wang, Zhe; Lu, MingZhe; Gao, Daqi

    2015-02-01

    Multiple kernel learning (MKL) has been demonstrated to be flexible and effective for depicting heterogeneous data sources, since MKL can introduce multiple kernels rather than a single fixed kernel into applications. However, MKL incurs high time and space complexity in contrast to single kernel learning, which is undesirable in real-world applications. Kernel mappings in MKL generally take two forms, implicit kernel mapping and empirical kernel mapping (EKM), of which the latter has attracted less attention. In this paper, we focus on MKL with EKM, and propose a reduced multiple empirical kernel learning machine, named RMEKLM for short. To the best of our knowledge, this is the first work to reduce both the time and space complexity of MKL with EKM. Different from existing MKL, the proposed RMEKLM adopts the Gauss elimination technique to extract a set of feature vectors, and we validate that doing so does not lose much information from the original feature space. RMEKLM then uses the extracted feature vectors to span a reduced orthonormal subspace of the feature space, which is visualized in terms of its geometric structure. It can be demonstrated that the spanned subspace is isomorphic to the original feature space, meaning that the dot product of two vectors in the original feature space equals that of the two corresponding vectors in the generated orthonormal subspace. More importantly, the proposed RMEKLM brings simpler computation and requires less storage space, especially during testing. Finally, the experimental results show that RMEKLM is highly efficient and effective in terms of both complexity and classification. The contributions of this paper can be given as follows: (1) by mapping the input space into an orthonormal subspace, the geometry of the generated subspace is visualized; (2) this paper first reduces both the time and space complexity of the EKM-based MKL; (3

  18. Learning Machine, Vietnamese Based Human-Computer Interface.

    ERIC Educational Resources Information Center

    Northwest Regional Educational Lab., Portland, OR.

    The sixth session of IT@EDU98 consisted of seven papers on the topic of the learning machine--Vietnamese based human-computer interface, and was chaired by Phan Viet Hoang (Informatics College, Singapore). "Knowledge Based Approach for English Vietnamese Machine Translation" (Hoang Kiem, Dinh Dien) presents the knowledge base approach, which…

  19. Learn about Physical Science: Simple Machines. [CD-ROM].

    ERIC Educational Resources Information Center

    2000

    This CD-ROM, designed for students in grades K-2, explores the world of simple machines. It allows students to delve into the mechanical world and learn the ways in which simple machines make work easier. Animated demonstrations are provided of the lever, pulley, wheel, screw, wedge, and inclined plane. Activities include practical matching and…

  20. Machine learning challenges in Mars rover traverse science

    NASA Technical Reports Server (NTRS)

    Castano, R.; Judd, M.; Anderson, R. C.; Estlin, T.

    2003-01-01

    The successful implementation of machine learning in autonomous rover traverse science requires addressing challenges that range from the analytical technical realm, to the fuzzy, philosophical domain of entrenched belief systems within scientists and mission managers.

  1. Shedding Light on Synergistic Chemical Genetic Connections with Machine Learning.

    PubMed

    Ekins, Sean; Siqueira-Neto, Jair Lage

    2015-12-23

    Machine learning can be used to predict compounds acting synergistically, and this could greatly expand the universe of available potential treatments for diseases that are currently hidden in the dark chemical matter. PMID:27136350

  2. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and promises

    PubMed Central

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2014-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead to misinformed conclusions. To illustrate this concern, the current paper critically evaluates and attempts to reproduce results from two studies (Wall et al., 2012a; Wall et al., 2012b) that claim to drastically reduce time to diagnose autism using machine learning. Our failure to generate comparable findings to those reported by Wall and colleagues using larger and more balanced data underscores several conceptual and methodological problems associated with these studies. We conclude with proposed best-practices when using machine learning in autism research, and highlight some especially promising areas for collaborative work at the intersection of computational and behavioral science. PMID:25294649

  3. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics

    PubMed Central

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-01-01

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well. PMID:26876223

  4. Machine learning strategy for accelerated design of polymer dielectrics

    DOE PAGES

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-02-15

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well.

  5. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics

    NASA Astrophysics Data System (ADS)

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-02-01

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well.
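
    The design loop these records describe, map a fingerprint to a property with a learned model and let a genetic algorithm evolve the building blocks, can be sketched generically. Everything below is a hypothetical stand-in: the linear "property model" replaces the trained ML predictor, and the 8-bit fingerprints replace the paper's polymer-block representations.

```python
import numpy as np

rng = np.random.default_rng(0)

def property_model(fp):
    # hypothetical stand-in for the learned fingerprint-to-property mapping
    w = np.array([0.9, -0.4, 0.7, 0.2, -0.8, 0.5, 0.3, -0.1])
    return fp @ w

pop = rng.integers(0, 2, size=(20, 8))                 # population of 8-block fingerprints
for _ in range(50):
    fitness = property_model(pop)
    parents = pop[np.argsort(fitness)[-10:]]           # keep the fittest half
    cuts = rng.integers(1, 8, size=10)
    children = np.array([np.concatenate([parents[i][:c], parents[(i + 1) % 10][c:]])
                         for i, c in enumerate(cuts)])  # one-point crossover
    flips = rng.random(children.shape) < 0.05
    children[flips] ^= 1                               # bit-flip mutation
    pop = np.vstack([parents, children])
print("best fingerprint:", pop[property_model(pop).argmax()])
```

    Because the surrogate evaluates each candidate in microseconds rather than via a first-principles computation, the genetic search can afford thousands of property evaluations, which is the acceleration the abstract refers to.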

  6. Risk prediction with machine learning and regression methods.

    PubMed

    Steyerberg, Ewout W; van der Ploeg, Tjeerd; Van Calster, Ben

    2014-07-01

    This is a discussion of issues in risk prediction based on the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler.

  7. Man-machine interface builders at the Advanced Photon Source

    SciTech Connect

    Anderson, M.D.

    1991-12-31

    Argonne National Laboratory is constructing a 7-GeV Advanced Photon Source for use as a synchrotron radiation source in basic and applied research. The controls and computing environment for this accelerator complex includes graphical operator interfaces to the machine based on Motif, X11, and PHIGS/PEX. Construction and operation of the control system for this accelerator relies upon interactive interface builder and diagram/editor type tools, as well as a run-time environment for the constructed displays which communicate with the physical machine via network connections. This paper discusses our experience with several commercial CUI builders, the inadequacies found in these, motivation for the development of an application-specific builder, and design and implementation strategies employed in the development of our own Man-Machine Interface builder. 5 refs.

  8. Man-machine interface builders at the Advanced Photon Source

    SciTech Connect

    Anderson, M.D.

    1991-01-01

    Argonne National Laboratory is constructing a 7-GeV Advanced Photon Source for use as a synchrotron radiation source in basic and applied research. The controls and computing environment for this accelerator complex includes graphical operator interfaces to the machine based on Motif, X11, and PHIGS/PEX. Construction and operation of the control system for this accelerator relies upon interactive interface builder and diagram/editor type tools, as well as a run-time environment for the constructed displays which communicate with the physical machine via network connections. This paper discusses our experience with several commercial CUI builders, the inadequacies found in these, motivation for the development of an application-specific builder, and design and implementation strategies employed in the development of our own Man-Machine Interface builder. 5 refs.

  9. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), to prevent delay and misdiagnosis of patients, using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods for treating Parkinson's disease include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by a Tabu search algorithm as the classifier and Haar wavelets as the projection filter, is used for relevant feature selection and ranking. The highest accuracy, obtained by linear logistic regression and sparse multinomial logistic regression, is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All experiments are conducted at 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in the software reliability and quality of the computer-aided diagnosis system, and experimentally shows best results with supportive statistical inference.

  10. Machine Learning Assessments of Soil Drying

    NASA Astrophysics Data System (ADS)

    Coopersmith, E. J.; Minsker, B. S.; Wenzel, C.; Gilmore, B. J.

    2011-12-01

    Agricultural activities require the use of heavy equipment and vehicles on unpaved farmlands. When soil conditions are wet, equipment can cause substantial damage, leaving deep ruts. In extreme cases, implements can sink and become mired, causing considerable delays and expense to extricate the equipment. Farm managers, who are often located remotely, cannot assess sites before allocating equipment, and thus struggle to assess the conditions of countless sites with any reliability or frequency. For example, farmers often trace serpentine paths of over one hundred miles each day to assess the overall status of various tracts of land spanning thirty, forty, or fifty miles in each direction. One means of assessing the moisture content of a field lies in the strategic positioning of remotely-monitored in situ sensors. Unfortunately, land owners are often reluctant to place sensors across their properties due to the significant monetary cost and complexity. This work aspires to overcome these limitations by modeling the process of wetting and drying statistically - remotely assessing field readiness using only information that is publicly accessible. Such data include Nexrad radar and state climate network sensors, as well as Twitter-based reports of field conditions for validation. Three algorithms (classification trees, k-nearest-neighbors, and boosted perceptrons) are deployed to deliver statistical field readiness assessments of an agricultural site located in Urbana, IL. Two of the three algorithms performed with 92-94% accuracy, with the majority of misclassifications falling within the calculated margins of error. This demonstrates the feasibility of using a machine learning framework with only public data, knowledge of system memory from previous conditions, and statistical tools to assess "readiness" without the need for real-time, on-site physical observation. Future efforts will produce a workflow assimilating Nexrad, climate network
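
    The kind of classification workflow described above can be sketched in a few lines. The feature set (recent rainfall windows, temperature, days since rain), the labelling rule, and all data below are hypothetical stand-ins for the Nexrad and climate-network inputs; only the choice of classifiers (a decision tree and k-nearest-neighbors) mirrors the record.

    ```python
    # Hypothetical sketch of field-readiness classification; data are synthetic.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    # Assumed features: rainfall over the last 1, 3, and 7 days (mm),
    # mean air temperature (C), and days since the last rain event.
    X = rng.uniform([0, 0, 0, 5, 0], [30, 60, 120, 35, 14], size=(500, 5))
    # Toy labelling rule: a field is "ready" when recent rain is low
    # and it has had a few days to dry.
    y = ((X[:, 0] < 5) & (X[:, 4] > 2)).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    for model in (DecisionTreeClassifier(max_depth=4, random_state=0),
                  KNeighborsClassifier(n_neighbors=5)):
        acc = model.fit(X_tr, y_tr).score(X_te, y_te)
        print(type(model).__name__, f"test accuracy = {acc:.2f}")
    ```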

  11. Learning Activity Packets for Milling Machines. Unit III--Vertical Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to set up and operate a vertical mill. Tasks addressed in the LAP include mounting and removing cutters and cutter holders for vertical…

  12. Learning Activity Packets for Milling Machines. Unit II--Horizontal Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to set up and operate a horizontal mill. Tasks addressed in the LAP include mounting style "A" or "B" arbors and adjusting arbor…

  13. Predicting phenotypes of asthma and eczema with machine learning

    PubMed Central

    2014-01-01

    Background: There is increasing recognition that asthma and eczema are heterogeneous diseases. We investigated the predictive ability of a spectrum of machine learning methods to disambiguate clinical sub-groups of asthma, wheeze and eczema, using a large heterogeneous set of attributes in an unselected population. The aim was to identify to what extent such heterogeneous information can be combined to reveal specific clinical manifestations. Methods: The study population comprised a cross-sectional sample of adults, and included representatives of the general population enriched by subjects with asthma. Linear and non-linear machine learning methods, from logistic regression to random forests, were fit on a large attribute set including demographic, clinical and laboratory features, genetic profiles and environmental exposures. Outcomes of interest were asthma, wheeze and eczema encoded by different operational definitions. Model validation was performed via bootstrapping. Results: The study population included 554 adults, 42% male, 38% previous or current smokers. The proportions of asthma, wheeze, and eczema diagnoses were 16.7%, 12.3%, and 21.7%, respectively. Models were fit on 223 non-genetic variables plus 215 single nucleotide polymorphisms. In general, non-linear models achieved higher sensitivity and specificity than other methods, especially for asthma and wheeze, less so for eczema, with areas under the receiver operating characteristic curve of 84%, 76% and 64%, respectively. Our findings confirm that allergen sensitisation and lung function characterise asthma better in combination than separately. The predictive ability of genetic markers alone is limited. For eczema, new predictors such as bio-impedance were discovered. Conclusions: More usefully-complex modelling is the key to a better understanding of disease mechanisms and personalised healthcare: further advances are likely with the incorporation of more factors/attributes and longitudinal measures. PMID:25077568

  14. Stellar classification from single-band imaging using machine learning

    NASA Astrophysics Data System (ADS)

    Kuntzer, T.; Tewes, M.; Courbin, F.

    2016-06-01

    Information on the spectral types of stars is of great interest in view of the exploitation of space-based imaging surveys. In this article, we investigate the classification of stars into spectral types using only the shape of their diffraction pattern in a single broad-band image. We propose a supervised machine learning approach to this endeavour, based on principal component analysis (PCA) for dimensionality reduction, followed by artificial neural networks (ANNs) estimating the spectral type. Our analysis is performed with image simulations mimicking the Hubble Space Telescope (HST) Advanced Camera for Surveys (ACS) in the F606W and F814W bands, as well as the Euclid VIS imager. We first demonstrate this classification in a simple context, assuming perfect knowledge of the point spread function (PSF) model and the possibility of accurately generating mock training data for the machine learning. We then analyse its performance in a fully data-driven situation, in which the training would be performed with a limited subset of bright stars from a survey, and an unknown PSF with spatial variations across the detector. We use simulations of main-sequence stars with flat distributions in spectral type and in signal-to-noise ratio, and classify these stars into 13 spectral subclasses, from O5 to M5. Under these conditions, the algorithm achieves a high success rate both for Euclid and HST images, with typical errors of half a spectral class. Although more detailed simulations would be needed to assess the performance of the algorithm on a specific survey, this shows that stellar classification from single-band images is indeed feasible.
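
    The PCA-then-ANN pipeline described above translates directly into a few lines of scikit-learn. Everything here is an illustrative assumption: the input dimensionality, the number of components, the network size, and the random stand-in data (which make the reported accuracy sit at chance level).

    ```python
    # Minimal sketch of the PCA + neural network classification pipeline.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    n_classes = 13                     # spectral subclasses O5 ... M5
    X = rng.normal(size=(2000, 400))   # flattened star cutouts (random stand-ins)
    y = rng.integers(0, n_classes, size=2000)

    clf = make_pipeline(
        StandardScaler(),
        PCA(n_components=20),          # dimensionality reduction
        MLPClassifier(hidden_layer_sizes=(50,), max_iter=300, random_state=1),
    )
    clf.fit(X[:1500], y[:1500])
    # With random labels this sits at chance (~1/13); real cutouts would not.
    print("held-out accuracy:", clf.score(X[1500:], y[1500:]))
    ```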

  15. Data Triage of Astronomical Transients: A Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Rebbapragada, U.

    This talk presents real-time machine learning systems for triage of big data streams generated by photometric and image-differencing pipelines. Our first system is a transient event detection system in development for the Palomar Transient Factory (PTF), a fully-automated synoptic sky survey that has demonstrated real-time discovery of optical transient events. The system is tasked with discriminating between real astronomical objects and bogus objects, which are usually artifacts of the image differencing pipeline. We performed a machine learning forensics investigation on PTF’s initial system that led to training data improvements that decreased both false positive and negative rates. The second machine learning system is a real-time classification engine of transients and variables in development for the Australian Square Kilometre Array Pathfinder (ASKAP), an upcoming wide-field radio survey with unprecedented ability to investigate the radio transient sky. The goal of our system is to classify light curves into known classes with as few observations as possible in order to trigger follow-up on costlier assets. We discuss the violation of standard machine learning assumptions incurred by this task, and propose the use of ensemble and hierarchical machine learning classifiers that make predictions most robustly.

  16. Application of Learning Machines and Combinatorial Algorithms in Water Resources Management and Hydrologic Sciences

    SciTech Connect

    Khalil, Abedalrazq F.; Kaheil, Yasir H.; Gill, Kashif; Mckee, Mac

    2010-01-01

    Contemporary water resources engineering and management rely increasingly on pattern recognition techniques that have the ability to capitalize on the unrelenting accumulation of data that is made possible by modern information technology and remote sensing methods. In response to the growing information needs of modern water systems, advanced computational models and tools have been devised to identify and extract relevant information from the mass of data that is now available. This chapter presents innovative applications from computational learning science within the fields of hydrology, hydrogeology, hydroclimatology, and water management. The success of machine learning is evident from the growing number of studies involving the application of Artificial Neural Networks (ANN), Support Vector Machines (SVM), Relevance Vector Machines (RVM), and Locally Weighted Projection Regression (LWPR) to address various issues in hydrologic sciences. The applications that will be discussed within the chapter employ the abovementioned machine learning techniques for intelligent modeling of reservoir operations, temporal downscaling of precipitation, spatial downscaling of soil moisture and evapotranspiration, comparisons of various techniques for groundwater quality modeling, and forecasting of chaotic time series behavior. Combinatorial algorithms to capture the intrinsic complexities in the modeled phenomena and to overcome disparate scales are developed; for example, learning machines have been coupled with geostatistical techniques, non-homogenous hidden Markov models, wavelets, and evolutionary computing techniques. This chapter does not intend to be exhaustive; it reviews the progress that has been made over the past decade in the use of learning machines in applied hydrologic sciences and presents a summary of future needs and challenges for further advancement of these methods.

  17. Building Artificial Vision Systems with Machine Learning

    SciTech Connect

    LeCun, Yann

    2011-02-23

    Three questions pose the next challenge for Artificial Intelligence (AI), robotics, and neuroscience. How do we learn perception (e.g. vision)? How do we learn representations of the perceptual world? How do we learn visual categories from just a few examples?

  18. Machine learning and predictive data analytics enabling metrology and process control in IC fabrication

    NASA Astrophysics Data System (ADS)

    Rana, Narender; Zhang, Yunlin; Wall, Donald; Dirahoui, Bachir; Bailey, Todd C.

    2015-03-01

    Integrated circuit (IC) technology is going through multiple changes in terms of patterning techniques (multiple patterning, EUV and DSA), device architectures (FinFET, nanowire, graphene) and patterning scale (a few nanometers). These changes require tight controls on processes and measurements to achieve the required device performance, and challenge metrology and process control in terms of capability and quality. Multivariate data with complex nonlinear trends and correlations generally cannot be described well by mathematical or parametric models but can be relatively easily learned by computing machines and used to predict or extrapolate. This paper introduces the predictive metrology approach, which has been applied to three different applications. Machine learning and predictive analytics have been leveraged to accurately predict dimensions of EUV resist patterns down to 18 nm half pitch by leveraging resist shrinkage patterns. These patterns could not be directly and accurately measured due to metrology tool limitations. Machine learning has also been applied to predict electrical performance early in the process pipeline for deep trench capacitance and metal line resistance. As the wafer goes through various processes its associated cost multiplies. It may take days to weeks to get the electrical performance readout. Predicting the electrical performance early on can be very valuable in enabling timely actionable decisions such as rework, scrap, or the feedforward and feedback of predicted information (or information derived from prediction) to improve or monitor processes. This paper provides a general overview of machine learning and advanced analytics applications in advanced semiconductor development and manufacturing.

  19. Feasibility of Active Machine Learning for Multiclass Compound Classification.

    PubMed

    Lang, Tobias; Flachsenberg, Florian; von Luxburg, Ulrike; Rarey, Matthias

    2016-01-25

    A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
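
    The iterative query loop at the heart of active learning can be sketched as follows. Least-confident uncertainty sampling is one generic selection criterion, not necessarily the one proposed in the paper, and the data, model, and query budget are synthetic stand-ins.

    ```python
    # Sketch of least-confident uncertainty sampling; data are synthetic.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                               random_state=0)
    labelled = list(range(10))                 # small labelled seed set
    pool = list(range(10, len(y)))             # unlabelled pool

    model = LogisticRegression(max_iter=1000)
    for _ in range(20):                        # 20 query rounds
        model.fit(X[labelled], y[labelled])
        proba = model.predict_proba(X[pool])
        # Query the pool example whose top-class probability is lowest.
        query = pool[int(np.argmin(proba.max(axis=1)))]
        labelled.append(query)                 # "ask the expert" for its label
        pool.remove(query)
    model.fit(X[labelled], y[labelled])
    print("labelled examples used:", len(labelled))
    print("accuracy on the remaining pool:", model.score(X[pool], y[pool]))
    ```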

  20. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations.

    PubMed

    Torkzaban, Bahareh; Kayvanjoo, Amir Hossein; Ardalan, Arman; Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is rapidly becoming a bottleneck for the effective use of large biological data sets. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been less frequently used with empirical population genetics data. In this study, we developed a new combined approach to analysing microsatellite marker data from our previous studies of olive populations using machine learning algorithms. Herein, 267 olive accessions of various origins, including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties, were investigated using a finely selected panel of 11 microsatellite markers. We organized the data in two experiments, '4-targeted' and '16-targeted'. A strategy of assaying different machine-based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles to represent the population and the geography of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and proved the efficiency of our method of clustering olive accessions to reflect their regions of origin. A distinguished highlight of this study was the discovery of the best combination of markers for better differentiation of populations via machine learning models, which can be exploited to distinguish among other biological populations.

  1. Can Machine Learning Methods Predict Extubation Outcome in Premature Infants as well as Clinicians?

    PubMed Central

    Mueller, Martina; Almeida, Jonas S.; Stanislaus, Romesh; Wagner, Carol L.

    2014-01-01

    Rationale: Though treatment of prematurely born infants breathing with the assistance of a mechanical ventilator has advanced greatly in the past decades, predicting extubation outcome at a given point in time remains challenging. Numerous studies have been conducted to identify predictors for extubation outcome; however, the rate of infants failing extubation attempts has not declined. Objective: To develop a decision-support tool for the prediction of extubation outcome in premature infants using a set of machine learning algorithms. Methods: A dataset assembled from 486 premature infants on mechanical ventilation was used to develop predictive models using machine learning algorithms such as artificial neural networks (ANN), support vector machine (SVM), naïve Bayesian classifier (NBC), boosted decision trees (BDT), and multivariable logistic regression (MLR). Performance of all models was evaluated using the area under the curve (AUC). Results: For some of the models (ANN, MLR and NBC) results were satisfactory (AUC: 0.63–0.76); however, two algorithms (SVM and BDT) showed poor performance with AUCs of ~0.5. Conclusion: Clinicians' predictions still outperform machine learning due to the complexity of the data and contextual information that may not be captured in the clinical data used as input for the development of the machine learning algorithms. Inclusion of preprocessing steps in future studies may improve the performance of prediction models. PMID:25419493

  2. Acceleration of saddle-point searches with machine learning.

    PubMed

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategy has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community. PMID:27544086
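
    A one-dimensional caricature of the verify-and-retrain idea: the barrier top of a toy double-well potential (the 1D analogue of a saddle point) is located on a cheap Gaussian-process surrogate and then checked against the "expensive" energy function. The potential, kernel, and tolerance are illustrative assumptions, not the paper's setup.

    ```python
    # 1D toy: locate the barrier top of a double well via a GP surrogate.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def energy(x):                            # stand-in for an ab initio call
        return (x**2 - 1.0)**2                # minima at x = -1 and x = +1

    X_train = np.array([-1.2, -0.8, 0.8, 1.2])   # initial samples near the minima
    grid = np.linspace(-1.0, 1.0, 401)           # search region between minima

    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-8, normalize_y=True)
        gp.fit(X_train.reshape(-1, 1), energy(X_train))
        pred = gp.predict(grid.reshape(-1, 1))
        x_guess = grid[np.argmax(pred)]          # surrogate's barrier-top estimate
        err = abs(energy(x_guess) - pred.max())  # verification with the true energy
        X_train = np.append(X_train, x_guess)    # retrain where the surrogate erred
        if err < 1e-4:
            break
    print(f"barrier estimate x = {x_guess:.3f} after {X_train.size} energy calls")
    ```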

  3. Recognition of printed Arabic text using machine learning

    NASA Astrophysics Data System (ADS)

    Amin, Adnan

    1998-04-01

    Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress, in both on-line and off-line recognition, has been achieved towards the automatic recognition of Arabic characters. This is a result of the lack of adequate support in terms of funding and other utilities, such as Arabic text databases, dictionaries, etc., and of course of the cursive nature of its writing rules. The main theme of this paper is the automatic recognition of Arabic printed text using the machine learning algorithm C4.5. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors which include a label that identifies the class to which an example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalization from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and the symbolic representation is easy to understand. The technique can be divided into three major steps: the first step is pre-processing, in which the original image is transformed into a binary image using a 300 dpi scanner and the connected components are then formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of the complementary characters. Finally, machine learning C4.5 is used for character classification to generate a decision tree.
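
    The rule-learning step can be illustrated with scikit-learn's CART decision tree standing in for C4.5. The symbolic features and the labelling rule below are fabricated; the point of the printout is the human-readable rule set that the abstract highlights as the appeal of symbolic learning.

    ```python
    # CART tree (standing in for C4.5) on fabricated symbolic word features.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(2)
    # Assumed columns: n_subwords, n_peaks, n_complementary_chars, compl_position
    X = rng.integers(0, 6, size=(300, 4))
    y = (X[:, 0] + X[:, 1] > 5).astype(int)   # toy class rule

    tree = DecisionTreeClassifier(max_depth=3, random_state=2).fit(X, y)
    # The learned tree prints as human-readable rules.
    print(export_text(tree, feature_names=[
        "n_subwords", "n_peaks", "n_compl_chars", "compl_position"]))
    ```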

  4. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    PubMed

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs, and reducing it can lower the overall transaction cost; however, it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural networks, Gaussian processes, and support vector regression, to predict market impact cost accurately and to provide a predictive model that is versatile in the number of input variables. We collected a large amount of real single-transaction data from the US stock market via a Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives for reducing transaction costs by considerably improving prediction performance.
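
    A sketch of the kind of comparison the paper describes, on synthetic data. The three inputs and the target are stand-ins, not the variables constructed from the Bloomberg data, and the hyperparameters are arbitrary.

    ```python
    # Synthetic comparison of the nonparametric regressors named above.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import SVR

    rng = np.random.default_rng(3)
    n = 400
    X = rng.uniform(size=(n, 3))    # stand-ins, e.g. order size, volatility, spread
    y = 0.5 * np.sqrt(X[:, 0]) * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.02, n)

    for name, model in [("Gaussian process", GaussianProcessRegressor(normalize_y=True)),
                        ("SVR", SVR(C=10.0)),
                        ("neural network", MLPRegressor(hidden_layer_sizes=(32,),
                                                        max_iter=2000, random_state=3))]:
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        print(f"{name}: mean cross-validated r^2 = {r2:.3f}")
    ```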

  6. Machine Learning: A Crucial Tool for Sensor Design

    PubMed Central

    Zhao, Weixiang; Bhushan, Abhinav; Santamaria, Anthony D.; Simon, Melinda G.; Davis, Cristina E.

    2009-01-01

    Sensors have been widely used for disease diagnosis, environmental quality monitoring, food quality control, industrial process analysis and control, and other related fields. As a key tool for sensor data analysis, machine learning is becoming a core part of novel sensor design. Dividing a complete machine learning process into three steps: data pre-treatment, feature extraction and dimension reduction, and system modeling, this paper provides a review of the methods that are widely used for each step. For each method, the principles and the key issues that affect modeling results are discussed. After reviewing the potential problems in machine learning processes, this paper gives a summary of current algorithms in this field and provides some feasible directions for future studies. PMID:20191110
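
    The three-step decomposition the review describes maps naturally onto a modelling pipeline. The sketch below is a generic illustration (with the digits data standing in for sensor readings), not a method from the paper.

    ```python
    # Generic three-step pipeline: pre-treatment, dimension reduction, modelling.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)       # stand-in for sensor readings
    pipe = Pipeline([
        ("pretreat", StandardScaler()),       # step 1: data pre-treatment
        ("reduce", PCA(n_components=16)),     # step 2: dimension reduction
        ("model", SVC()),                     # step 3: system modelling
    ])
    print("cross-validated accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
    ```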

  7. The Philosophy of Science and its relation to Machine Learning

    NASA Astrophysics Data System (ADS)

    Williamson, Jon

    In this chapter I discuss connections between machine learning and the philosophy of science. First I consider the relationship between the two disciplines. There is a clear analogy between hypothesis choice in science and model selection in machine learning. While this analogy has been invoked to argue that the two disciplines are essentially doing the same thing and should merge, I maintain that the disciplines are distinct but related and that there is a dynamic interaction operating between the two: a series of mutually beneficial interactions that changes over time. I will introduce some particularly fruitful interactions, in particular the consequences of automated scientific discovery for the debate on inductivism versus falsificationism in the philosophy of science, and the importance of philosophical work on Bayesian epistemology and causality for contemporary machine learning. I will close by suggesting the locus of a possible future interaction: evidence integration.

  8. Predicting Market Impact Costs Using Nonparametric Machine Learning Models

    PubMed Central

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs, and reducing it can lower the overall transaction cost; however, it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural networks, Gaussian processes, and support vector regression, to predict market impact cost accurately and to provide a predictive model that is versatile in the number of input variables. We collected a large amount of real single-transaction data from the US stock market via a Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives for reducing transaction costs by considerably improving prediction performance. PMID:26926235

  9. Machine Learning Search for Gamma-Ray Burst Afterglows in Optical Images

    NASA Astrophysics Data System (ADS)

    Topinka, M.

    2016-06-01

    Thanks to advances in robotic telescopes, time-domain astronomy yields a large number of transient events detected in images every night. Data mining and machine learning tools used for object classification are presented. The goal is to automatically classify transient events, both for further follow-up by a larger telescope and for statistical studies of transient events. Special attention is given to the identification of gamma-ray burst afterglows. Machine learning techniques are used to identify GROND gamma-ray burst afterglows among the astrophysical objects present in SDSS archival images, based on the g'-r', r'-i' and i'-z' color indices. The performance of the support vector machine, random forest and neural network algorithms is compared. A joint meta-classifier, built on top of the individual classifiers, can identify GRB afterglows with an overall accuracy of ≳ 90%.

  10. The cerebellum: a neuronal learning machine?

    NASA Technical Reports Server (NTRS)

    Raymond, J. L.; Lisberger, S. G.; Mauk, M. D.

    1996-01-01

    Comparison of two seemingly quite different behaviors yields a surprisingly consistent picture of the role of the cerebellum in motor learning. Behavioral and physiological data about classical conditioning of the eyelid response and motor learning in the vestibulo-ocular reflex suggest that (i) plasticity is distributed between the cerebellar cortex and the deep cerebellar nuclei; (ii) the cerebellar cortex plays a special role in learning the timing of movement; and (iii) the cerebellar cortex guides learning in the deep nuclei, which may allow learning to be transferred from the cortex to the deep nuclei. Because many of the similarities in the data from the two systems typify general features of cerebellar organization, the cerebellar mechanisms of learning in these two systems may represent principles that apply to many motor systems.

  11. A machine learning approach to quantifying noise in medical images

    NASA Astrophysics Data System (ADS)

    Chowdhury, Aritra; Sevinsky, Christopher J.; Yener, Bülent; Aggour, Kareem S.; Gustafson, Steven M.

    2016-03-01

    As advances in medical imaging technology are resulting in significant growth of biomedical image data, new techniques are needed to automate the process of identifying images of low quality. Automation is needed because it is very time consuming for a domain expert such as a medical practitioner or a biologist to manually separate good images from bad ones. While there are plenty of de-noising algorithms in the literature, their focus is on designing filters which are necessary but not sufficient for determining how useful an image is to a domain expert. Thus a computational tool is needed to assign a score to each image based on its perceived quality. In this paper, we introduce a machine learning-based score and call it the Quality of Image (QoI) score. The QoI score is computed by combining the confidence values of two popular classification techniques—support vector machines (SVMs) and Naïve Bayes classifiers. We test our technique on clinical image data obtained from cancerous tissue samples. We used 747 tissue samples that are stained by four different markers (abbreviated as CK15, pck26, E_cad and Vimentin) leading to a total of 2,988 images. The results show that images can be classified as good (high QoI), bad (low QoI) or ugly (intermediate QoI) based on their QoI scores. Our automated labeling is in agreement with the domain experts with a bi-modal classification accuracy of 94%, on average. Furthermore, ugly images can be recovered and forwarded for further post-processing.
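
    A toy version of a combined-confidence score in the spirit of the QoI score described above. The averaging rule and the thresholds used to bin images into good/bad/ugly are assumptions, since the record does not spell out the exact combination formula; the features are synthetic.

    ```python
    # Toy combined-confidence score from an SVM and a naive Bayes classifier.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=600, random_state=4)  # toy image features
    X_tr, y_tr, X_te = X[:500], y[:500], X[500:]

    svm = SVC(probability=True, random_state=4).fit(X_tr, y_tr)
    nb = GaussianNB().fit(X_tr, y_tr)

    # Assumed combination rule: average the two confidences in the "good" class.
    qoi = (svm.predict_proba(X_te)[:, 1] + nb.predict_proba(X_te)[:, 1]) / 2
    label = np.select([qoi > 0.7, qoi < 0.3], ["good", "bad"], default="ugly")
    print(list(zip(qoi[:5].round(2), label[:5])))
    ```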

  12. Task 8.6 -- Advanced man machine interface (MMI)

    SciTech Connect

    1997-12-31

    The Solar/DOE ATS engine program seeks to improve the utilization of turbomachinery resources through the development of an Advanced Man Machine Interface (MMI). The program goals include timely and succinct feedback to the operations personnel to enhance their decision making process. As part of the Solar ATS Phase 2 technology development program, enabling technologies, including graphics environments, communications technology, and operating systems were explored to determine their viability to support the overall MMI requirements. This report discusses the research and prototyping effort, as well as the conclusions reached.

  13. Energy landscapes for a machine learning application to series data

    NASA Astrophysics Data System (ADS)

    Ballard, Andrew J.; Stevenson, Jacob D.; Das, Ritankar; Wales, David J.

    2016-03-01

    Methods developed to explore and characterise potential energy landscapes are applied to the corresponding landscapes obtained from optimisation of a cost function in machine learning. We consider neural network predictions for the outcome of local geometry optimisation in a triatomic cluster, where four distinct local minima exist. The accuracy of the predictions is compared for fits using data from single and multiple points in the series of atomic configurations resulting from local geometry optimisation and for alternative neural networks. The machine learning solution landscapes are visualised using disconnectivity graphs, and signatures in the effective heat capacity are analysed in terms of distributions of local minima and their properties.

  14. RECONCILE: a machine-learning coreference resolution system

    2007-12-10

    RECONCILE is a noun phrase coreference resolution system: it identifies noun phrases in a text document and determines which subsets refer to each real-world entity referenced in the text. The heart of the system is a combination of supervised and unsupervised machine learning systems. It uses a machine learning algorithm (chosen from an extensive suite, including Weka) for training noun phrase coreference classifier models and implements a variety of clustering algorithms to coordinate the pairwise classifications. A number of features have been implemented, including all of the features employed in Ng & Cardie [2002].
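
    The two-stage design (pairwise classification, then clustering to coordinate the decisions) can be caricatured as follows. The mentions, the two pair features, and the transitive-closure clustering are fabricated illustrations, not RECONCILE's actual feature set or algorithms.

    ```python
    # Fabricated mention pairs: a pairwise classifier plus transitive closure.
    import itertools
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    mentions = ["Mr. Smith", "Smith", "he", "the company", "it"]
    pairs = list(itertools.combinations(range(len(mentions)), 2))
    # Hypothetical pair features: [string match, gender/number agreement]
    X = np.array([[1, 1], [0, 1], [0, 0], [0, 0], [0, 1],
                  [0, 0], [0, 0], [0, 0], [0, 0], [0, 1]])
    y = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 1])   # gold coreference labels

    clf = LogisticRegression().fit(X, y)

    parent = list(range(len(mentions)))            # union-find for clustering
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for (i, j), coref in zip(pairs, clf.predict(X)):
        if coref:
            parent[find(j)] = find(i)              # merge coreferent mentions
    clusters = {}
    for i, m in enumerate(mentions):
        clusters.setdefault(find(i), []).append(m)
    print(list(clusters.values()))                 # expect two entities
    ```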

  15. Energy landscapes for a machine learning application to series data.

    PubMed

    Ballard, Andrew J; Stevenson, Jacob D; Das, Ritankar; Wales, David J

    2016-03-28

    Methods developed to explore and characterise potential energy landscapes are applied to the corresponding landscapes obtained from optimisation of a cost function in machine learning. We consider neural network predictions for the outcome of local geometry optimisation in a triatomic cluster, where four distinct local minima exist. The accuracy of the predictions is compared for fits using data from single and multiple points in the series of atomic configurations resulting from local geometry optimisation and for alternative neural networks. The machine learning solution landscapes are visualised using disconnectivity graphs, and signatures in the effective heat capacity are analysed in terms of distributions of local minima and their properties.

  16. 3D Visualization of Machine Learning Algorithms with Astronomical Data

    NASA Astrophysics Data System (ADS)

    Kent, Brian R.

    2016-01-01

    We present innovative machine learning (ML) methods using unsupervised clustering with minimum spanning trees (MSTs) to study 3D astronomical catalogs. Utilizing Python code to build trees based on galaxy catalogs, we can render the results with the visualization suite Blender to produce interactive 360 degree panoramic videos. The catalogs and their ML results can be explored in a 3D space using mobile devices, tablets or desktop browsers. We compare the statistics of the MST results to a number of machine learning methods relating to optimization and efficiency.
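
    The MST clustering idea can be sketched with SciPy: build the tree over a point catalogue, cut the longest edge(s), and read off the connected components as clusters. The 3D "galaxy groups" below are random stand-ins for a real catalog.

    ```python
    # MST clustering of a toy 3D catalogue with SciPy.
    import numpy as np
    from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(5)
    # Two toy "galaxy groups" in 3D.
    pts = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(8, 1, (50, 3))])

    mst = minimum_spanning_tree(squareform(pdist(pts))).toarray()
    longest = np.sort(mst[mst > 0])[-1]       # length of the single longest edge
    mst[mst >= longest] = 0                   # cut it
    n_clusters, labels = connected_components(mst, directed=False)
    print("clusters found:", n_clusters)      # expect 2
    ```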

  17. Molecular learning with DNA kernel machines.

    PubMed

    Noh, Yung-Kyun; Lee, Daniel D; Yang, Kyung-Ae; Kim, Cheongtag; Zhang, Byoung-Tak

    2015-11-01

    We present a computational learning method for bio-molecular classification. This method shows how to design biochemical operations both for learning and pattern classification. As opposed to prior work, our molecular algorithm learns generic classes considering the realization in vitro via a sequence of molecular biological operations on sets of DNA examples. Specifically, hybridization between DNA molecules is interpreted as computing the inner product between embedded vectors in a corresponding vector space, and our algorithm performs learning of a binary classifier in this vector space. We analyze the thermodynamic behavior of these learning algorithms, and show simulations on artificial and real datasets as well as demonstrate preliminary wet experimental results using gel electrophoresis.

  18. Learning Activity Packets for Grinding Machines. Unit I--Grinding Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) is one of three that accompany the curriculum guide on grinding machines. It outlines the study activities and performance tasks for the first unit of this curriculum guide. Its purpose is to aid the student in attaining a working knowledge of this area of training and in achieving a skilled or moderately…

  19. Refining fuzzy logic controllers with machine learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1994-01-01

    In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning which is a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential of being applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.

  20. Supervised machine learning algorithms to diagnose stress for vehicle drivers based on physiological sensor signals.

    PubMed

    Barua, Shaibal; Begum, Shahina; Ahmed, Mobyen Uddin

    2015-01-01

    Machine learning algorithms play an important role in computer science research. Recent advances in sensor data collection in the clinical sciences have led to complex, heterogeneous data processing and analysis for patient diagnosis and prognosis. Diagnosis and treatment of patients based on manual analysis of these sensor data are difficult and time consuming. Therefore, the development of knowledge-based systems to support clinicians in decision-making is important. However, experimental work is necessary to compare the performance of different machine learning methods and to help select the appropriate method for the specific characteristics of a data set. This paper compares the classification performance of three popular machine learning methods, i.e., case-based reasoning, neural networks and support vector machines, to diagnose the stress of vehicle drivers using finger temperature and heart rate variability. The experimental results show that case-based reasoning outperforms the other two methods in terms of classification accuracy. Case-based reasoning achieved 80% and 86% accuracy in classifying stress using finger temperature and heart rate variability, respectively. On the contrary, both the neural network and the support vector machine achieved less than 80% accuracy using both physiological signals.
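
    In its simplest retrieval form, case-based reasoning amounts to nearest-neighbour matching against stored cases. The sketch below uses two hypothetical physiological features and fabricated labels, so it illustrates the mechanism rather than the paper's system.

    ```python
    # Nearest-neighbour retrieval as a minimal case-based reasoner; data synthetic.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(9)
    n = 300
    finger_temp_drop = rng.normal(0, 1, n)    # assumed feature (degrees C)
    hrv_index = rng.normal(0, 1, n)           # assumed HRV summary feature
    stress = ((finger_temp_drop > 0.3) & (hrv_index < 0)).astype(int)

    cases = np.column_stack([finger_temp_drop, hrv_index])
    cbr = KNeighborsClassifier(n_neighbors=3).fit(cases[:250], stress[:250])
    print("accuracy on unseen drivers:", cbr.score(cases[250:], stress[250:]))
    ```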

  1. Outsmarting neural networks: an alternative paradigm for machine learning

    SciTech Connect

    Protopopescu, V.; Rao, N.S.V.

    1996-10-01

    We address three problems in machine learning, namely: (i) function learning, (ii) regression estimation, and (iii) sensor fusion, in the Probably Approximately Correct (PAC) framework. We show that, under certain conditions, the three problems above can be reduced to regression estimation. The latter is usually tackled with artificial neural networks (ANNs) that satisfy the PAC criteria but have high computational complexity. We propose several computationally efficient PAC alternatives to ANNs for solving the regression estimation problem. Thereby we also provide efficient PAC solutions to the function learning and sensor fusion problems. The approach is based on cross-fertilizing concepts and methods from statistical estimation, nonlinear algorithms, and the theory of computational complexity, and is designed as part of a new, coherent paradigm for machine learning.

  2. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution.

    PubMed

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis.
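
    One common construction of an ITE estimator is a "T-learner": fit separate outcome models for treated and untreated patients and take their per-patient difference. The sketch below uses this generic construction on synthetic data; it is not necessarily the estimator used in the paper.

    ```python
    # T-learner sketch for individualized treatment effects on synthetic data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(6)
    n = 2000
    X = rng.normal(size=(n, 3))                    # patient covariates
    t = rng.integers(0, 2, n)                      # randomized treatment
    true_ite = 0.5 * X[:, 0]                       # effect varies by covariate
    y = X[:, 1] + t * true_ite + rng.normal(0, 0.1, n)

    m1 = RandomForestRegressor(random_state=6).fit(X[t == 1], y[t == 1])
    m0 = RandomForestRegressor(random_state=6).fit(X[t == 0], y[t == 0])
    ite_hat = m1.predict(X) - m0.predict(X)        # per-patient effect estimate
    print("ATE estimate:", round(ite_hat.mean(), 3))
    print("ITE RMSE:", round(float(np.sqrt(((ite_hat - true_ite) ** 2).mean())), 3))
    ```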

  3. Probability and Statistics in Astronomical Machine Learning and Data Mining

    NASA Astrophysics Data System (ADS)

    Scargle, Jeffrey

    2012-03-01

    Statistical issues peculiar to astronomy have implications for machine learning and data mining. It should be obvious that statistics lies at the heart of machine learning and data mining. Further it should be no surprise that the passive observational nature of astronomy, the concomitant lack of sampling control, and the uniqueness of its realm (the whole universe!) lead to some special statistical issues and problems. As described in the Introduction to this volume, data analysis technology is largely keeping up with major advances in astrophysics and cosmology, even driving many of them. And I realize that there are many scientists with good statistical knowledge and instincts, especially in the modern era I like to call the Age of Digital Astronomy. Nevertheless, old impediments still lurk, and the aim of this chapter is to elucidate some of them. Many experiences with smart people doing not-so-smart things (cf. the anecdotes collected in the Appendix here) have convinced me that the cautions given here need to be emphasized. Consider these four points: 1. Data analysis often involves searches of many cases, for example, outcomes of a repeated experiment, for a feature of the data. 2. The feature comprising the goal of such searches may not be defined unambiguously until the search is carried out, or perhaps vaguely even then. 3. The human visual system is very good at recognizing patterns in noisy contexts. 4. People are much easier to convince of something they want to believe, or already believe, as opposed to unpleasant or surprising facts. One can argue that all four are good things during the initial, exploratory phases of most data analysis. They represent the curiosity and creativity of the scientific process, especially during the exploration of data collections from new observational programs such as all-sky surveys in wavelengths not accessed before or sets of images of a planetary surface not yet explored. On the other hand, confirmatory scientific

  4. Predicting single-molecule conductance through machine learning

    NASA Astrophysics Data System (ADS)

    Lanzillo, Nicholas A.; Breneman, Curt M.

    2016-10-01

    We present a robust machine learning model that is trained on the experimentally determined electrical conductance values of approximately 120 single-molecule junctions used in scanning tunnelling microscope molecular break junction (STM-MBJ) experiments. Quantum mechanical, chemical, and topological descriptors are used to correlate each molecular structure with a conductance value, and the resulting machine-learning model can predict the corresponding value of conductance with correlation coefficients of r² = 0.95 for the training set and r² = 0.78 for a blind testing set. While neglecting entirely the effects of the metal contacts, this work demonstrates that single molecule conductance can be qualitatively correlated with a number of molecular descriptors through a suitably trained machine learning model. The dominant features in the machine learning model include those based on the electronic wavefunction, the geometry/topology of the molecule as well as the surface chemistry of the molecule. This model can be used to identify promising molecular structures for use in single-molecule electronic circuits and can guide synthesis and experiments in the future.

  5. Acquiring Software Design Schemas: A Machine Learning Perspective

    NASA Technical Reports Server (NTRS)

    Harandi, Mehdi T.; Lee, Hing-Yan

    1991-01-01

    In this paper, we describe an approach based on machine learning that acquires software design schemas from design cases of existing applications. An overview of the technique, design representation, and acquisition system is presented. The paper also addresses issues associated with generalizing common features, such as biases. The generalization process is illustrated using an example.

  6. Machine learning of fault characteristics from rocket engine simulation data

    NASA Technical Reports Server (NTRS)

    Ke, Min; Ali, Moonis

    1990-01-01

    Transformation of data into knowledge through conceptual induction has been the focus of the research described in this paper. We have developed a Machine Learning System (MLS) to analyze rocket engine simulation data. MLS provides its users with fault analyses, fault characteristics, conceptual descriptions of faults, and the relationships between attributes and sensors. All of these results are critically important in identifying faults.

  7. An efficient learning procedure for deep Boltzmann machines.

    PubMed

    Salakhutdinov, Ruslan; Hinton, Geoffrey

    2012-08-01

    We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer pretraining phase that initializes the weights sensibly. The pretraining also allows the variational inference to be initialized sensibly with a single bottom-up pass. We present results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects. We also show that the features discovered by deep Boltzmann machines are a very effective way to initialize the hidden layers of feedforward neural nets, which are then discriminatively fine-tuned.
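
    The layer-by-layer pretraining phase can be sketched with scikit-learn's BernoulliRBM as the building block; note that this omits the joint training stage with the variational approximation and persistent chains that makes the model a true deep Boltzmann machine.

    ```python
    # Greedy layer-wise RBM pretraining (the joint DBM stage is omitted).
    from sklearn.datasets import load_digits
    from sklearn.neural_network import BernoulliRBM
    from sklearn.preprocessing import minmax_scale

    X = minmax_scale(load_digits().data)           # pixels scaled to [0, 1]

    rbm1 = BernoulliRBM(n_components=128, n_iter=20, random_state=0)
    h1 = rbm1.fit_transform(X)                     # train layer 1, get hidden codes
    rbm2 = BernoulliRBM(n_components=64, n_iter=20, random_state=0)
    h2 = rbm2.fit_transform(h1)                    # train layer 2 on layer-1 codes

    # rbm1.components_ and rbm2.components_ could then initialize a feedforward
    # net for discriminative fine-tuning, as the abstract notes.
    print(h1.shape, h2.shape)
    ```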

  8. Testing and Validating Machine Learning Classifiers by Metamorphic Testing.

    PubMed

    Xie, Xiaoyuan; Ho, Joshua W K; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2011-04-01

    Machine learning algorithms have provided core functionality to many application domains, such as bioinformatics, computational linguistics, etc. However, it is difficult to detect faults in such applications because often there is no "test oracle" to verify the correctness of the computed outputs. To help address software quality, in this paper we present a technique for testing the implementations of the machine learning classification algorithms which support such applications. Our approach is based on the technique of "metamorphic testing", which has been shown to be effective in alleviating the oracle problem. Also presented are a case study on a real-world machine learning application framework and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective in killing mutants, and that observing the expected cross-validation result alone is not sufficiently effective to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program.
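
    A minimal metamorphic test for a classifier: permuting the training data is a metamorphic relation for k-nearest-neighbours, so predictions must not change. This illustrates the idea of testing without an oracle; it is not one of the paper's case studies, and the data are synthetic.

    ```python
    # Metamorphic relation: permuting the training set must not change k-NN output.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=200, random_state=7)
    X_train, y_train, X_test = X[:150], y[:150], X[150:]

    base = KNeighborsClassifier().fit(X_train, y_train).predict(X_test)
    perm = np.random.default_rng(7).permutation(len(y_train))
    follow = KNeighborsClassifier().fit(X_train[perm], y_train[perm]).predict(X_test)

    # Any disagreement signals a fault, with no "test oracle" for true labels.
    assert np.array_equal(base, follow), "metamorphic relation violated"
    print("metamorphic test passed")
    ```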

  9. Machine learning techniques for fault isolation and sensor placement

    NASA Technical Reports Server (NTRS)

    Carnes, James R.; Fisher, Douglas H.

    1993-01-01

    Fault isolation and sensor placement are vital for monitoring and diagnosis. A sensor conveys information about a system's state that guides troubleshooting if problems arise. We are using machine learning methods to uncover behavioral patterns over snapshots of system simulations that will aid fault isolation and sensor placement, with an eye towards minimality, fault coverage, and noise tolerance.

  10. Machine Learning Through Signature Trees. Applications to Human Speech.

    ERIC Educational Resources Information Center

    White, George M.

    A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…

  11. AstroML: Python-powered Machine Learning for Astronomy

    NASA Astrophysics Data System (ADS)

    Vander Plas, Jake; Connolly, A. J.; Ivezic, Z.

    2014-01-01

    As astronomical data sets grow in size and complexity, automated machine learning and data mining methods are becoming an increasingly fundamental component of research in the field. The astroML project (http://astroML.org) provides a common repository for practical examples of the data mining and machine learning tools used and developed by astronomical researchers, written in Python. The astroML module contains a host of general-purpose data analysis and machine learning routines, loaders for openly-available astronomical datasets, and fast implementations of specific computational methods often used in astronomy and astrophysics. The associated website features hundreds of examples of these routines being used for analysis of real astronomical datasets, while the associated textbook provides a curriculum resource for graduate-level courses focusing on practical statistics, machine learning, and data mining approaches within Astronomical research. This poster will highlight several of the more powerful and unique examples of analysis performed with astroML, all of which can be reproduced in their entirety on any computer with the proper packages installed.

  12. Health Informatics via Machine Learning for the Clinical Management of Patients

    PubMed Central

    Niehaus, K. E.; Charlton, P.; Colopy, G. W.

    2015-01-01

    Summary Objectives To review how health informatics systems based on machine learning methods have impacted the clinical management of patients, by affecting clinical practice. Methods We reviewed literature from 2010-2015 from databases such as Pubmed, IEEE xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify those leading examples of health informatics that have advanced the methodology of machine learning. While individual methods may have further examples that might be added, we have chosen some of the most representative, informative exemplars in each case. Results Our survey highlights that, while much research is taking place in this high-profile field, examples of those that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. Conclusions Health informatics systems based on machine learning are in their infancy and the translation of such systems into clinical management has yet to be performed at scale. PMID:26293849

  13. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines.

    PubMed

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-01-01

    The impact-echo (IE) method is a popular non-destructive testing (NDT) technique widely used for measuring the thickness of plate-like structures and for detecting certain defects inside concrete elements or structures. However, the IE method is not effective for full condition assessment (i.e., defect detection, defect diagnosis, defect sizing and location), because the simple frequency spectrum analysis involved in the existing IE method is not sufficient to capture the IE signal patterns associated with different conditions. In this paper, we attempt to enhance the IE technique and enable it for full condition assessment of concrete elements by introducing advanced machine learning techniques for performing comprehensive analysis and pattern recognition of IE signals. Specifically, we use wavelet decomposition for extracting signatures or features out of the raw IE signals and apply extreme learning machine, one of the recently developed machine learning techniques, as classification models for full condition assessment. To validate the capabilities of the proposed method, we build a number of specimens with various types, sizes, and locations of defects and perform IE testing on these specimens in a lab environment. Based on analysis of the collected IE signals using the proposed machine learning based IE method, we demonstrate that the proposed method is effective in performing full condition assessment of concrete elements or structures. PMID:27023563
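
    The wavelet-feature plus extreme-learning-machine pipeline can be sketched directly. The toy signals, the hidden-layer width, and the plain least-squares readout are assumptions for illustration, not the paper's configuration; the ELM itself is written out as a fixed random hidden layer with a linear readout.

    ```python
    # Wavelet-energy features + a hand-rolled extreme learning machine; toy data.
    import numpy as np
    import pywt

    rng = np.random.default_rng(8)

    def make_signal(defect):            # toy stand-in for an impact-echo waveform
        t = np.linspace(0, 1, 256)
        f = 40 if defect else 25        # assume defects ring at a higher mode
        return np.sin(2 * np.pi * f * t) + 0.3 * rng.normal(size=t.size)

    y = rng.integers(0, 2, 200)
    # Feature extraction: energy of each wavelet decomposition level.
    X = np.array([[np.sum(c**2) for c in pywt.wavedec(make_signal(d), "db4", level=4)]
                  for d in y])
    X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize features

    # ELM: fixed random hidden layer, tanh, least-squares readout.
    W = rng.normal(size=(X.shape[1], 50))
    H = np.tanh(X @ W)
    beta = np.linalg.lstsq(H, np.where(y == 0, -1.0, 1.0), rcond=None)[0]
    pred = (H @ beta > 0).astype(int)
    print("training accuracy:", (pred == y).mean())
    ```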

  14. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines

    PubMed Central

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-01-01

    The impact-echo (IE) method is a popular non-destructive testing (NDT) technique widely used for measuring the thickness of plate-like structures and for detecting certain defects inside concrete elements or structures. However, the IE method is not effective for full condition assessment (i.e., defect detection, defect diagnosis, defect sizing and location), because the simple frequency spectrum analysis involved in the existing IE method is not sufficient to capture the IE signal patterns associated with different conditions. In this paper, we attempt to enhance the IE technique and enable it for full condition assessment of concrete elements by introducing advanced machine learning techniques for performing comprehensive analysis and pattern recognition of IE signals. Specifically, we use wavelet decomposition for extracting signatures or features out of the raw IE signals and apply extreme learning machine, one of the recently developed machine learning techniques, as classification models for full condition assessment. To validate the capabilities of the proposed method, we build a number of specimens with various types, sizes, and locations of defects and perform IE testing on these specimens in a lab environment. Based on analysis of the collected IE signals using the proposed machine learning based IE method, we demonstrate that the proposed method is effective in performing full condition assessment of concrete elements or structures. PMID:27023563

  15. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines.

    PubMed

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-03-26

    The impact-echo (IE) method is a popular non-destructive testing (NDT) technique widely used for measuring the thickness of plate-like structures and for detecting certain defects inside concrete elements or structures. However, the IE method is not effective for full condition assessment (i.e., defect detection, defect diagnosis, defect sizing and location), because the simple frequency spectrum analysis involved in the existing IE method is not sufficient to capture the IE signal patterns associated with different conditions. In this paper, we attempt to enhance the IE technique and enable it for full condition assessment of concrete elements by introducing advanced machine learning techniques for performing comprehensive analysis and pattern recognition of IE signals. Specifically, we use wavelet decomposition for extracting signatures or features out of the raw IE signals and apply extreme learning machine, one of the recently developed machine learning techniques, as classification models for full condition assessment. To validate the capabilities of the proposed method, we build a number of specimens with various types, sizes, and locations of defects and perform IE testing on these specimens in a lab environment. Based on analysis of the collected IE signals using the proposed machine learning based IE method, we demonstrate that the proposed method is effective in performing full condition assessment of concrete elements or structures.

  16. A 128-Channel Extreme Learning Machine-Based Neural Decoder for Brain Machine Interfaces.

    PubMed

    Chen, Yi; Yao, Enyi; Basu, Arindam

    2016-06-01

    Currently, state-of-the-art motor intention decoding algorithms in brain-machine interfaces are mostly implemented on a PC and consume a significant amount of power. A machine learning coprocessor in 0.35-μm CMOS for motor intention decoding in brain-machine interfaces is presented in this paper. Using the Extreme Learning Machine algorithm and low-power analog processing, it achieves an energy efficiency of 3.45 pJ/MAC at a classification rate of 50 Hz. Learning in a second stage, with the corresponding digitally stored coefficients, is used to increase the robustness of the core analog processor. The chip is verified with neural data recorded in a monkey finger-movement experiment, achieving a decoding accuracy of 99.3% for movement type. The same coprocessor is also used to decode the time of movement from asynchronous neural spikes. With time-delayed feature dimension enhancement, the classification accuracy can be increased by 5% with a limited number of input channels. Further, a sparsity-promoting training scheme enables a reduction of the number of programmable weights by ≈2×.

  17. The immune system, adaptation, and machine learning

    NASA Astrophysics Data System (ADS)

    Farmer, J. Doyne; Packard, Norman H.; Perelson, Alan S.

    1986-10-01

    The immune system is capable of learning, memory, and pattern recognition. By employing genetic operators on a time scale fast enough to observe experimentally, the immune system is able to recognize novel shapes without preprogramming. Here we describe a dynamical model for the immune system that is based on the network hypothesis of Jerne, and is simple enough to simulate on a computer. This model has a strong similarity to an approach to learning and artificial intelligence introduced by Holland, called the classifier system. We demonstrate that simple versions of the classifier system can be cast as a nonlinear dynamical system, and explore the analogy between the immune and classifier systems in detail. Through this comparison we hope to gain insight into the way they perform specific tasks, and to suggest new approaches that might be of value in learning systems.

  18. Efficiently Ranking Hypotheses in Machine Learning

    NASA Technical Reports Server (NTRS)

    Chien, Steve

    1997-01-01

    This paper considers the problem of learning the ranking of a set of alternatives based upon incomplete information (e.g., a limited number of observations). At each decision cycle, the system can output a complete ordering of the hypotheses or decide to gather additional information (e.g., another observation) at some cost.
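
    The setting can be illustrated with a toy sketch (not the paper's algorithm, and with all costs and noise levels invented): hypotheses are ranked by their sample means, and a further observation is purchased only while the confidence intervals of the two leading rivals still overlap and budget remains.

        # Illustrative sketch: rank hypotheses by sample mean; buy another
        # observation only while the top two contenders are not yet separated.
        import numpy as np

        rng = np.random.default_rng(1)
        true_means = [0.30, 0.50, 0.55]   # hidden quality of each hypothesis
        samples = [[rng.normal(m, 0.2)] for m in true_means]
        cost, budget = 1.0, 30.0

        def interval(xs, z=1.96):
            xs = np.asarray(xs)
            half = z * xs.std(ddof=1) / np.sqrt(len(xs)) if len(xs) > 1 else np.inf
            return xs.mean() - half, xs.mean() + half

        while budget >= cost:
            order = sorted(range(3), key=lambda i: -np.mean(samples[i]))
            lo_top = interval(samples[order[0]])[0]
            hi_second = interval(samples[order[1]])[1]
            if lo_top > hi_second:        # leader clearly separated: stop
                break
            # Observe whichever of the two contenders has fewer samples.
            k = min(order[:2], key=lambda i: len(samples[i]))
            samples[k].append(rng.normal(true_means[k], 0.2))
            budget -= cost

        print("ranking (best first):",
              sorted(range(3), key=lambda i: -np.mean(samples[i])))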

  19. Protein secondary structure prediction using logic-based machine learning.

    PubMed

    Muggleton, S; King, R D; Sternberg, M J

    1992-10-01

    Many attempts have been made to solve the problem of predicting protein secondary structure from the primary sequence but the best performance results are still disappointing. In this paper, the use of a machine learning algorithm which allows relational descriptions is shown to lead to improved performance. The Inductive Logic Programming computer program, Golem, was applied to learning secondary structure prediction rules for alpha/alpha domain type proteins. The input to the program consisted of 12 non-homologous proteins (1612 residues) of known structure, together with a background knowledge describing the chemical and physical properties of the residues. Golem learned a small set of rules that predict which residues are part of the alpha-helices--based on their positional relationships and chemical and physical properties. The rules were tested on four independent non-homologous proteins (416 residues) giving an accuracy of 81% (+/- 2%). This is an improvement, on identical data, over the previously reported result of 73% by King and Sternberg (1990, J. Mol. Biol., 216, 441-457) using the machine learning program PROMIS, and of 72% using the standard Garnier-Osguthorpe-Robson method. The best previously reported result in the literature for the alpha/alpha domain type is 76%, achieved using a neural net approach. Machine learning also has the advantage over neural network and statistical methods in producing more understandable results. PMID:1480619

  20. Stacking for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-08-01

    We present an analysis of a general machine learning technique called `stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organizing maps (SOMs), and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, such as the number of layers and the number of base learners per layer. Finally we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9 per cent and 21 per cent on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost) the ratio of improvement shrinks, but still remains positive and is between 0.4 per cent and 2.5 per cent for the explored metrics and comes at almost no additional computational cost.
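
    A minimal sketch of one stacking round as described: the base learner's redshift estimate is appended to the inputs as an extra feature and the same algorithm is retrained. The data below are synthetic stand-ins for photometric colours; in practice, out-of-fold predictions would be used in the second layer to avoid the leakage that refitting on in-sample predictions invites.

        # One stacking layer: feed the base prediction back in as a feature.
        import numpy as np
        from sklearn.tree import DecisionTreeRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(2)
        X = rng.uniform(size=(2000, 5))   # stand-in for photometric colours
        z = X @ np.array([0.8, 0.5, 0.3, 0.1, 0.05]) \
            + 0.05 * rng.standard_normal(2000)

        X_tr, X_te, z_tr, z_te = train_test_split(X, z, random_state=0)

        base = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, z_tr)

        # Layer 2: same algorithm, original features plus the layer-1 estimate.
        X_tr2 = np.column_stack([X_tr, base.predict(X_tr)])
        X_te2 = np.column_stack([X_te, base.predict(X_te)])
        stacked = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr2, z_tr)

        for name, model, Xt in [("base", base, X_te), ("stacked", stacked, X_te2)]:
            err = np.abs(model.predict(Xt) - z_te)
            print(name, "median |dz|:", np.median(err))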

  1. Research on knowledge representation, machine learning, and knowledge acquisition

    NASA Technical Reports Server (NTRS)

    Buchanan, Bruce G.

    1987-01-01

    Research in knowledge representation, machine learning, and knowledge acquisition performed at Knowledge Systems Lab. is summarized. The major goal of the research was to develop flexible, effective methods for representing the qualitative knowledge necessary for solving large problems that require symbolic reasoning as well as numerical computation. The research focused on integrating different representation methods to describe different kinds of knowledge more effectively than any one method can alone. In particular, emphasis was placed on representing and using spatial information about three dimensional objects and constraints on the arrangement of these objects in space. Another major theme is the development of robust machine learning programs that can be integrated with a variety of intelligent systems. To achieve this goal, learning methods were designed, implemented and experimented within several different problem solving environments.

  2. Combining data mining and machine learning for effective user profiling

    SciTech Connect

    Fawcett, T.; Provost, F.

    1996-12-31

    This paper describes the automatic design of methods for detecting fraudulent behavior. Much of the design is accomplished using a series of machine learning methods. In particular, we combine data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior. Specifically, we use a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls. These indicators are used to create profilers, which then serve as features to a system that combines evidence from multiple profilers to generate high-confidence alarms. Experiments indicate that this automatic approach performs nearly as well as the best hand-tuned methods for detecting fraud.
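
    A hedged sketch of the architecture the abstract outlines: hypothetical per-customer profilers report how far today's usage deviates from that customer's own baseline, and a second-level learner combines their evidence into a high-confidence alarm. The profiler names and data below are invented for illustration.

        # Profiler outputs become features; a combiner turns them into alarms.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(3)
        n = 1000
        fraud = rng.random(n) < 0.1

        # Hypothetical profilers: deviation of today's usage from the
        # customer's own baseline (night calls, airtime, distinct cells).
        night_dev = rng.normal(fraud * 2.0, 1.0)
        airtime_dev = rng.normal(fraud * 1.5, 1.0)
        cells_dev = rng.normal(fraud * 1.0, 1.0)
        profilers = np.column_stack([night_dev, airtime_dev, cells_dev])

        combiner = LogisticRegression().fit(profilers[:800], fraud[:800])
        alarm = combiner.predict_proba(profilers[800:])[:, 1] > 0.5
        print("alarm precision:",
              (alarm & fraud[800:]).sum() / max(alarm.sum(), 1))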

  3. Machine Learning for Outcome Prediction of Acute Ischemic Stroke Post Intra-Arterial Therapy

    PubMed Central

    Asadi, Hamed; Dowling, Richard; Yan, Bernard; Mitchell, Peter

    2014-01-01

    Introduction Stroke is a major cause of death and disability. Accurately predicting stroke outcome from a set of predictive variables may identify high-risk patients and guide treatment approaches, leading to decreased morbidity. Logistic regression models allow for the identification and validation of predictive variables. However, advanced machine learning algorithms offer an alternative, in particular for large-scale multi-institutional data, with the advantage of easily incorporating newly available data to improve prediction performance. Our aim was to design and compare different machine learning methods capable of predicting the outcome of endovascular intervention in acute anterior circulation ischaemic stroke. Method We conducted a retrospective study of a prospectively collected database of acute ischaemic stroke treated by endovascular intervention. Using SPSS®, MATLAB®, and Rapidminer®, classical statistics as well as artificial neural network and support vector algorithms were applied to design a supervised machine capable of classifying these predictors into potential good and poor outcomes. These algorithms were trained, validated and tested using randomly divided data. Results We included 107 consecutive acute anterior circulation ischaemic stroke patients treated by endovascular technique. Sixty-six were male, and the mean age was 65.3 years. All the available demographic, procedural and clinical factors were included in the models. The final confusion matrix of the neural network demonstrated an overall congruency of ∼80% between the target and output classes, with favourable receiver operating characteristics. However, after optimisation, the support vector machine had a relatively better performance, with a root mean squared error of 2.064 (SD: ±0.408). Discussion We showed promising accuracy of outcome prediction, using supervised machine learning algorithms, with potential for incorporation of larger multicenter datasets, likely further
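
    A small illustration of the comparison described, on synthetic tabular data rather than the study's 107 patients: a neural network and a support vector machine are trained on the same randomly divided predictors and scored on held-out cases.

        # Compare an ANN and an SVM on randomly split tabular predictors.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.svm import SVC
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline

        # Synthetic stand-in: 107 patients, good (1) vs poor (0) outcome.
        X, y = make_classification(n_samples=107, n_features=12,
                                   n_informative=6, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)

        models = {
            "neural net": make_pipeline(
                StandardScaler(),
                MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                              random_state=0)),
            "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        }
        for name, m in models.items():
            m.fit(X_tr, y_tr)
            print(name, "test accuracy:", round(m.score(X_te, y_te), 2))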

  4. Machine learning bandgaps of double perovskites

    NASA Astrophysics Data System (ADS)

    Pilania, Ghanshyam; Mannodi-Kanakkithodi, Arun; Uberuaga, Blas; Ramprasad, Rampi; Gubernatis, James; Lookman, Turab

    The ability to make rapid and accurate predictions of bandgaps for double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps for double perovskites. After evaluating a set of nearly 1.2 million features, we identify several elemental features of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science (on a dataset of more than 1300 double perovskite bandgaps) and further analyzed to rationalize their prediction performance. This work was supported by the Los Alamos National Laboratory LDRD program and the U.S. Department of Energy, Office of Science, Basic Energy Sciences.

  5. Machine learning bandgaps of double perovskites

    NASA Astrophysics Data System (ADS)

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-01

    The ability to make rapid and accurate predictions on bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance.
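
    The workflow can be sketched as follows, with synthetic stand-ins for the elemental descriptors (the feature names below merely echo the abstract's Kohn-Sham levels and electronegativities and are hypothetical): fit a regressor to bandgaps, cross-validate it, and inspect which features it finds most relevant. The paper's own learning framework and its 1.2 million candidate features are not reproduced here.

        # Regress bandgaps on elemental descriptors; rank feature relevance.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        n = 1300                              # roughly the dataset size cited
        names = ["ks_level_A", "ks_level_B", "chi_A",
                 "chi_B", "radius_A", "radius_B"]   # hypothetical descriptors
        X = rng.uniform(size=(n, len(names)))
        gap = 2.0 * X[:, 0] + 1.5 * X[:, 2] + 0.1 * rng.standard_normal(n)

        model = RandomForestRegressor(n_estimators=200, random_state=0)
        print("cross-validated R^2:",
              cross_val_score(model, X, gap, cv=5).mean().round(2))
        model.fit(X, gap)
        for name, imp in sorted(zip(names, model.feature_importances_),
                                key=lambda t: -t[1]):
            print(f"{name}: {imp:.2f}")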

  6. Machine learning bandgaps of double perovskites

    DOE PAGES

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-19

    The ability to make rapid and accurate predictions on bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. As a result, the developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance.

  7. Machine learning bandgaps of double perovskites

    PubMed Central

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-01

    The ability to make rapid and accurate predictions on bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance. PMID:26783247

  8. Intracortical Brain-Machine Interfaces Advance Sensorimotor Neuroscience.

    PubMed

    Schroeder, Karen E; Chestek, Cynthia A

    2016-01-01

    Brain-machine interfaces (BMIs) decode brain activity to control external devices. Over the past two decades, the BMI community has grown tremendously and reached some impressive milestones, including the first human clinical trials using chronically implanted intracortical electrodes. It has also contributed experimental paradigms and important findings to basic neuroscience. In this review, we discuss neuroscience achievements stemming from BMI research, specifically that based upon upper limb prosthetic control with intracortical microelectrodes. We will focus on three main areas: first, we discuss progress in neural coding of reaches in motor cortex, describing recent results linking high dimensional representations of cortical activity to muscle activation. Next, we describe recent findings on learning and plasticity in motor cortex on various time scales. Finally, we discuss how bidirectional BMIs have led to better understanding of somatosensation in and related to motor cortex. PMID:27445663

  9. Intracortical Brain-Machine Interfaces Advance Sensorimotor Neuroscience

    PubMed Central

    Schroeder, Karen E.; Chestek, Cynthia A.

    2016-01-01

    Brain-machine interfaces (BMIs) decode brain activity to control external devices. Over the past two decades, the BMI community has grown tremendously and reached some impressive milestones, including the first human clinical trials using chronically implanted intracortical electrodes. It has also contributed experimental paradigms and important findings to basic neuroscience. In this review, we discuss neuroscience achievements stemming from BMI research, specifically that based upon upper limb prosthetic control with intracortical microelectrodes. We will focus on three main areas: first, we discuss progress in neural coding of reaches in motor cortex, describing recent results linking high dimensional representations of cortical activity to muscle activation. Next, we describe recent findings on learning and plasticity in motor cortex on various time scales. Finally, we discuss how bidirectional BMIs have led to better understanding of somatosensation in and related to motor cortex. PMID:27445663

  10. Recent Advances in Technologies Required for a ``Salad Machine''

    NASA Astrophysics Data System (ADS)

    Kliss, M.; Heyenga, A. G.; Hoehn, A.; Stodieck, L. S.

    Future long duration, manned space flight missions will require life support systems that minimize resupply requirements and ultimately approach self-sufficiency in space. Bioregenerative life support systems are a promising approach, but they are far from mature. Early in the development of the NASA Controlled Ecological Life Support System Program, the idea of onboard cultivation of salad-type vegetables for crew consumption was proposed as a first step away from the total reliance on resupply for food in space. Since that time, significant advances in space-based plant growth hardware have occurred, and considerable flight experience has been gained. This paper revisits the ``Salad Machine'' concept and describes recent developments in subsystem technologies for both plant root and shoot environments that are directly relevant to the development of such a facility

  11. Recent advances in technologies required for a "Salad Machine".

    PubMed

    Kliss, M; Heyenga, A G; Hoehn, A; Stodieck, L S

    2000-01-01

    Future long duration, manned space flight missions will require life support systems that minimize resupply requirements and ultimately approach self-sufficiency in space. Bioregenerative life support systems are a promising approach, but they are far from mature. Early in the development of the NASA Controlled Ecological Life Support System Program, the idea of onboard cultivation of salad-type vegetables for crew consumption was proposed as a first step away from the total reliance on resupply for food in space. Since that time, significant advances in space-based plant growth hardware have occurred, and considerable flight experience has been gained. This paper revisits the "Salad Machine" concept and describes recent developments in subsystem technologies for both plant root and shoot environments that are directly relevant to the development of such a facility.

  12. Learning machines applied to potential forest distribution.

    PubMed

    Ordóñez, Celestino; Taboada, Javier; Bastante, Fernando; Matías, Jose María; Felicísimo, Angel Manuel

    2005-01-01

    The clearing of forests to obtain land for pasture and agriculture and the replacement of autochthonous species by other faster-growing varieties of trees for timber have both led to the loss of vast areas of forest worldwide. At present, many developed countries are attempting to reverse these effects, establishing policies for the restoration of older woodland systems. Reforestation is a complex matter, planned and carried out by experts who need objective information regarding the type of forest that can be sustained in each area. This information is obtained by drawing up feasibility models constructed using statistical methods that make use of the information provided by morphological and environmental variables (height, gradient, rainfall, etc.) that partially condition the presence or absence of a specific kind of forestation in an area. The aim of this work is to construct a set of feasibility models for woodland located in the basin of the River Liébana (NW Spain), to serve as a support tool for the experts entrusted with carrying out the reforestation project. The techniques used are multilayer perceptron neural networks and support vector machines. Their results will be compared to the results obtained by traditional techniques (such as discriminant analysis and logistic regression) by measuring the degree of fit between each model and the existing distribution of woodlands. The interpretation and problems of the feasibility models are commented on in the Discussion section. PMID:15984068
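
    A minimal sketch of such a feasibility model under invented data: woodland presence or absence is predicted from morphological and environmental covariates, and a classical method is compared against a kernel machine, mirroring the comparison the abstract describes.

        # Feasibility model: presence/absence from terrain covariates.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(5)
        n = 500
        height = rng.uniform(0, 2000, n)      # elevation, m
        gradient = rng.uniform(0, 45, n)      # slope, degrees
        rainfall = rng.uniform(600, 2000, n)  # mm/year
        X = np.column_stack([height, gradient, rainfall])
        # Invented rule: the species favours mid elevations with ample rain.
        present = ((height > 400) & (height < 1200)
                   & (rainfall > 1000)).astype(int)

        for name, clf in [
            ("logistic regression", LogisticRegression(max_iter=1000)),
            ("SVM (RBF kernel)", SVC(kernel="rbf")),
        ]:
            model = make_pipeline(StandardScaler(), clf)
            acc = cross_val_score(model, X, present, cv=5).mean()
            print(f"{name}: {acc:.2f} agreement with the observed distribution")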

  13. Advanced coordinate measuring machine at Sandia National Laboratories/California

    SciTech Connect

    Pilkey, R.D.; Klevgard, P.A.

    1993-03-01

    Sandia National Laboratories/California has acquired a new Moore M-48V CNC five-axis universal coordinate measuring machine (CMM). Site preparation, acceptance testing, and initial performance results are discussed. Unique features of the machine include a ceramic ram and vacuum evacuated laser pathways (VELPS). The implementation of a VELPS system on the machine imposed certain design requirements and entailed certain start-up problems. The machine's projected capabilities, workload, and research possibilities are outlined.

  14. Machine learning approach to optimizing combined stimulation and medication therapies for Parkinson’s disease

    PubMed Central

    Shamir, Reuben R.; Dolber, Trygve; Noecker, Angela M.; Walter, Benjamin L.; McIntyre, Cameron C.

    2015-01-01

    Background Deep brain stimulation (DBS) of the subthalamic region is an established therapy for advanced Parkinson’s disease (PD). However, patients often require time-intensive postoperative management to balance their coupled stimulation and medication treatments. Given the large and complex parameter space associated with this task, we propose that clinical decision support systems (CDSS) based on machine learning algorithms could assist in treatment optimization. Objective Develop a proof-of-concept implementation of a CDSS that incorporates patient-specific details on both stimulation and medication. Methods Clinical data from 10 patients, and 89 post-DBS surgery visits, were used to create a prototype CDSS. The system was designed to provide three key functions: 1) information retrieval; 2) visualization of treatment; and 3) recommendation on expected effective stimulation and drug dosages, based on three machine learning methods that included support vector machines, Naïve Bayes, and random forest. Results Measures of medication dosages, time factors, and symptom-specific preoperative response to levodopa were significantly correlated with postoperative outcomes (p<0.05) and their effect on outcomes was of similar magnitude to that of DBS. Using those results, the combined machine learning algorithms were able to accurately predict 86% (12/14) of the motor improvement scores at one year after surgery. Conclusions Using patient-specific details, an appropriately parameterized CDSS could help select theoretically optimal DBS parameter settings and medication dosages that have potential to improve the clinical management of PD patients. PMID:26140956
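
    A hedged sketch of the recommendation component: the three named learners are combined into a soft-voting ensemble that predicts whether a candidate stimulation/medication setting yields motor improvement. The 89-visit dataset is mimicked with synthetic features; the actual CDSS feature set is not reproduced.

        # Soft-voting ensemble of SVM, naive Bayes and random forest.
        from sklearn.svm import SVC
        from sklearn.naive_bayes import GaussianNB
        from sklearn.ensemble import RandomForestClassifier, VotingClassifier
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score

        # Synthetic stand-in for 89 visits described by treatment features.
        X, y = make_classification(n_samples=89, n_features=8,
                                   n_informative=4, random_state=1)

        ensemble = VotingClassifier([
            ("svm", SVC(probability=True)),
            ("nb", GaussianNB()),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
        ], voting="soft")
        print("cross-validated accuracy:",
              cross_val_score(ensemble, X, y, cv=5).mean().round(2))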

  15. Learning by Design: Good Video Games as Learning Machines

    ERIC Educational Resources Information Center

    Gee, James Paul

    2005-01-01

    This article asks how good video and computer game designers manage to get new players to learn long, complex and difficult games. The short answer is that designers of good games have hit on excellent methods for getting people to learn and to enjoy learning. The longer answer is more complex. Integral to this answer are the good principles of…

  16. Machine learning strategies for systems with invariance properties

    DOE PAGES

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    2016-05-06

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.
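
    The two strategies can be contrasted on a toy rotation-invariant task in which the target depends only on the radius of a 2-D point: strategy 1 embeds the invariance by training on an invariant input basis, while strategy 2 teaches it by training on rotated copies of the raw inputs. This is an illustrative sketch, not the paper's turbulence or elasticity case studies.

        # Invariant input basis vs. training on transformed (augmented) data.
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(6)
        X = rng.uniform(-1, 1, size=(2000, 2))
        y = np.linalg.norm(X, axis=1) ** 2      # invariant under rotations

        def rotate(P, theta):
            c, s = np.cos(theta), np.sin(theta)
            return P @ np.array([[c, -s], [s, c]])

        # Strategy 1: embed -- train on an invariant basis (here r^2).
        inv = (X ** 2).sum(axis=1, keepdims=True)
        m1 = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                          random_state=0).fit(inv, y)

        # Strategy 2: teach -- augment raw inputs with rotated copies.
        thetas = rng.uniform(0, 2 * np.pi, size=4)
        X_aug = np.vstack([rotate(X, t) for t in thetas])
        y_aug = np.tile(y, len(thetas))
        m2 = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                          random_state=0).fit(X_aug, y_aug)

        # Evaluate on points rotated by an angle unseen in training.
        X_test = rotate(X[:200], 1.234)
        print("invariant-basis R^2:",
              m1.score((X_test ** 2).sum(axis=1, keepdims=True), y[:200]))
        print("augmentation R^2:",
              m2.score(X_test, y[:200]))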

  17. Machine learning strategies for systems with invariance properties

    NASA Astrophysics Data System (ADS)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.

  18. Developing a PLC-friendly state machine model: lessons learned

    NASA Astrophysics Data System (ADS)

    Pessemier, Wim; Deconinck, Geert; Raskin, Gert; Saey, Philippe; Van Winckel, Hans

    2014-07-01

    Modern Programmable Logic Controllers (PLCs) have become an attractive platform for controlling real-time aspects of astronomical telescopes and instruments due to their increased versatility, performance and standardization. Likewise, vendor-neutral middleware technologies such as OPC Unified Architecture (OPC UA) have recently demonstrated that they can greatly facilitate the integration of these industrial platforms into the overall control system. Many practical questions arise, however, when building multi-tiered control systems that consist of PLCs for low level control, and conventional software and platforms for higher level control. How should the PLC software be structured, so that it can rely on well-known programming paradigms on the one hand, and be mapped to a well-organized OPC UA interface on the other hand? Which programming languages of the IEC 61131-3 standard closely match the problem domains of the abstraction levels within this structure? How can the recent additions to the standard (such as the support for namespaces and object-oriented extensions) facilitate a model-based development approach? To what degree can our applications already take advantage of the more advanced parts of the OPC UA standard, such as the high expressiveness of the semantic modeling language that it defines, or the support for events, aggregation of data, automatic discovery, ... ? What are the timing and concurrency problems to be expected for the higher level tiers of the control system due to the cyclic execution of control and communication tasks by the PLCs? We try to answer these questions by demonstrating a semantic state machine model that can readily be implemented using IEC 61131 and OPC UA: one that does not aim to capture all possible states of a system, but rather attempts to organize the coarse-grained structure and behaviour of a system. In this paper we focus on the intricacies of this seemingly simple task, and on the lessons that we

  19. Advanced Training Technologies and Learning Environments

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K. (Compiler); Malone, John B. (Compiler)

    1999-01-01

    This document contains the proceedings of the Workshop on Advanced Training Technologies and Learning Environments held at NASA Langley Research Center, Hampton, Virginia, March 9-10, 1999. The workshop was jointly sponsored by the University of Virginia's Center for Advanced Computational Technology and NASA. Workshop attendees were from NASA, other government agencies, industry, and universities. The objective of the workshop was to assess the status and effectiveness of different advanced training technologies and learning environments.

  20. Multivariate Mapping of Environmental Data Using Extreme Learning Machines

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2014-05-01

    In most real cases environmental data are multivariate, highly variable at several spatio-temporal scales, and generated by nonlinear and complex phenomena. Mapping - spatial prediction of such data - is a challenging problem. Machine learning algorithms, being universal nonlinear tools, have demonstrated their efficiency in modelling environmental spatial and space-time data (Kanevski et al. 2009). Recently, a new approach in machine learning - the Extreme Learning Machine (ELM) - has gained great popularity. ELM is a fast and powerful machine learning approach. Developed by G.-B. Huang et al. (2006), it follows the structure of a single-hidden-layer feedforward neural network (SLFN), similar to a multilayer perceptron (MLP). The learning step of classical artificial neural networks, like the MLP, deals with the optimization of weights and biases using a gradient-based learning algorithm (e.g. the back-propagation algorithm). As opposed to this optimization phase, which can fall into local minima, ELM generates the weights between the input layer and the hidden layer, as well as the biases in the hidden layer, at random. After this initialization, it optimizes only the weight vector between the hidden layer and the output layer, in a single step. The main advantage of this algorithm is the speed of the learning step. In theory, by growing the number of hidden nodes, the algorithm can learn any set of training data with zero error. To avoid overfitting, cross-validation or "true validation" (randomly splitting data into training, validation and testing subsets) is recommended in order to find an optimal number of neurons. With its universal property and solid theoretical basis, ELM is a good machine learning algorithm which can push the field forward. The present research deals with an extension of ELM to multivariate output modelling and application of ELM to a real data case study - pollution of the sediments in
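
    The mechanics described above reduce to a few lines. The sketch below is a minimal single-output ELM: input-to-hidden weights and hidden biases are drawn at random and never updated, and the hidden-to-output weights are obtained in one least-squares (pseudoinverse) solve. It is a generic illustration, not the authors' multivariate extension.

        # Minimal extreme learning machine: random hidden layer, one LS solve.
        import numpy as np

        class ELM:
            def __init__(self, n_hidden=100, seed=7):
                self.n_hidden = n_hidden
                self.rng = np.random.default_rng(seed)

            def _hidden(self, X):
                return np.tanh(X @ self.W + self.b)

            def fit(self, X, y):
                # Random, never-updated input weights and hidden biases.
                self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
                self.b = self.rng.normal(size=self.n_hidden)
                H = self._hidden(X)
                # Only the output layer is optimised: one pseudoinverse solve.
                self.beta = np.linalg.pinv(H) @ y
                return self

            def predict(self, X):
                return self._hidden(X) @ self.beta

        # Toy regression: fit y = sin(x) through a fixed random hidden layer.
        X = np.linspace(-3, 3, 300).reshape(-1, 1)
        y = np.sin(X).ravel()
        model = ELM(n_hidden=50).fit(X, y)
        print("max abs error:", np.abs(model.predict(X) - y).max())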

  1. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.
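
    The semisupervised variant of the problem can be sketched as follows: only a few measurement vectors carry secure/attacked labels, the rest are unlabelled, and a graph-based label-spreading step infers the missing labels. The attack statistics below are synthetic, and label spreading stands in for the several semisupervised learners the paper examines.

        # Semisupervised classification of measurements: secure vs. attacked.
        import numpy as np
        from sklearn.semi_supervised import LabelSpreading

        rng = np.random.default_rng(8)
        n = 400
        secure = rng.normal(0.0, 1, size=(n, 10))    # nominal measurements
        attacked = rng.normal(0.8, 1, size=(n, 10))  # biased by injected data
        X = np.vstack([secure, attacked])
        y_true = np.array([0] * n + [1] * n)

        y = np.full(2 * n, -1)                       # -1 marks unlabelled
        labelled = rng.choice(2 * n, size=40, replace=False)
        y[labelled] = y_true[labelled]

        model = LabelSpreading(kernel="rbf", gamma=0.5).fit(X, y)
        mask = y == -1
        print("accuracy on unlabelled measurements:",
              (model.transduction_[mask] == y_true[mask]).mean().round(2))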

  2. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework. PMID:25807571

  3. Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines.

    PubMed

    Neftci, Emre O; Pedroni, Bruno U; Joshi, Siddharth; Al-Shedivat, Maruan; Cauwenberghs, Gert

    2016-01-01

    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines (S2Ms), a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. S2Ms perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate and fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based S2Ms outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware. PMID:27445650
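
    Stripped of spiking dynamics, the core idea can be sketched as a tiny Bernoulli restricted Boltzmann machine trained with one-step contrastive divergence in which every update sees a random mask over the connections, i.e. unreliable synapses acting as a DropConnect-like regularizer. Biases are omitted for brevity; this rate-based toy is only an analogue of the paper's event-driven rule.

        # CD-1 with stochastic synapses: each update masks connections at random.
        import numpy as np

        rng = np.random.default_rng(9)
        n_vis, n_hid, p_keep = 8, 4, 0.75
        W = 0.1 * rng.standard_normal((n_vis, n_hid))

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def cd1_step(v0, W, lr=0.05):
            mask = rng.random(W.shape) < p_keep   # unreliable synapses
            Wm = W * mask
            h0 = (rng.random(n_hid) < sigmoid(v0 @ Wm)).astype(float)
            v1 = (rng.random(n_vis) < sigmoid(h0 @ Wm.T)).astype(float)
            h1 = sigmoid(v1 @ Wm)
            # Contrastive-divergence update, applied only to unmasked synapses.
            return W + lr * mask * (np.outer(v0, h0) - np.outer(v1, h1))

        # Train on a two-pattern dataset; reconstructions should sharpen.
        data = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                         [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
        for step in range(2000):
            W = cd1_step(data[step % 2], W)

        h = sigmoid(data[0] @ W)
        print("reconstruction of pattern 0:", np.round(sigmoid(h @ W.T), 1))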

  4. Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines

    PubMed Central

    Neftci, Emre O.; Pedroni, Bruno U.; Joshi, Siddharth; Al-Shedivat, Maruan; Cauwenberghs, Gert

    2016-01-01

    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines (S2Ms), a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. S2Ms perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate and fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based S2Ms outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware. PMID:27445650

  5. Icing detection from geostationary satellite data using machine learning approaches

    NASA Astrophysics Data System (ADS)

    Lee, J.; Ha, S.; Sim, S.; Im, J.

    2015-12-01

    Icing can cause significant structural damage to aircraft during flight, resulting in various aviation accidents. Icing studies have been typically performed using two approaches: one is a numerical model-based approach and the other is a remote sensing-based approach. The model-based approach diagnoses aircraft icing using numerical atmospheric parameters such as temperature, relative humidity, and vertical thermodynamic structure. This approach tends to over-estimate icing according to the literature. The remote sensing-based approach typically uses meteorological satellite/ground sensor data such as Geostationary Operational Environmental Satellite (GOES) and Dual-Polarization radar data. This approach detects icing areas by applying thresholds to parameters such as liquid water path and cloud optical thickness derived from remote sensing data. In this study, we propose an aircraft icing detection approach that optimizes thresholds for L1B bands and/or Cloud Optical Thickness (COT) from the Communication, Ocean and Meteorological Satellite-Meteorological Imager (COMS MI) and the newly launched Himawari-8 Advanced Himawari Imager (AHI) over East Asia. The proposed approach uses machine learning algorithms including decision trees (DT) and random forest (RF) for optimizing thresholds of L1B data and/or COT. Pilot Reports (PIREPs) from South Korea and Japan were used as icing reference data. Results show that RF produced a lower false alarm rate (1.5%) and a higher overall accuracy (98.8%) than DT (8.5% and 75.3%), respectively. The RF-based approach was also compared with the existing COMS MI and GOES-R icing mask algorithms. The agreements of the proposed approach with the existing two algorithms were 89.2% and 45.5%, respectively. The lower agreement with the GOES-R algorithm was possibly due to the high uncertainty of the cloud phase product from COMS MI.

  6. The ligand binding mechanism to purine nucleoside phosphorylase elucidated via molecular dynamics and machine learning

    PubMed Central

    Decherchi, Sergio; Berteotti, Anna; Bottegoni, Giovanni; Rocchia, Walter; Cavalli, Andrea

    2015-01-01

    The study of biomolecular interactions between a drug and its biological target is of paramount importance for the design of novel bioactive compounds. In this paper, we report on the use of molecular dynamics (MD) simulations and machine learning to study the binding mechanism of a transition state analogue (DADMe–immucillin-H) to the purine nucleoside phosphorylase (PNP) enzyme. Microsecond-long MD simulations allow us to observe several binding events, following different dynamical routes and reaching diverse binding configurations. These simulations are used to estimate kinetic and thermodynamic quantities, such as kon and binding free energy, obtaining a good agreement with available experimental data. In addition, we advance a hypothesis for the slow-onset inhibition mechanism of DADMe–immucillin-H against PNP. Combining extensive MD simulations with machine learning algorithms could therefore be a fruitful approach for capturing key aspects of drug–target recognition and binding. PMID:25625196

  7. Machine learning classification of SDSS transient survey images

    NASA Astrophysics Data System (ADS)

    du Buisson, L.; Sivanandam, N.; Bassett, Bruce A.; Smith, M.

    2015-12-01

    We show that multiple machine learning algorithms can match human performance in classifying transient imaging data from the Sloan Digital Sky Survey (SDSS) supernova survey into real objects and artefacts. This is a first step in any transient science pipeline and is currently still done by humans, but future surveys such as the Large Synoptic Survey Telescope (LSST) will necessitate fully machine-enabled solutions. Using features trained from eigenimage analysis (principal component analysis, PCA) of single-epoch g, r and i difference images, we can reach a completeness (recall) of 96 per cent, while only incorrectly classifying at most 18 per cent of artefacts as real objects, corresponding to a precision (purity) of 84 per cent. In general, random forests performed best, followed by the k-nearest neighbour and the SkyNet artificial neural net algorithms, compared to other methods such as naive Bayes and kernel support vector machine. Our results show that PCA-based machine learning can match human success levels and can naturally be extended by including multiple epochs of data, transient colours and host galaxy information which should allow for significant further improvements, especially at low signal-to-noise.
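
    The pipeline reads directly as code: PCA (eigenimage) features extracted from difference-image stamps feed a random forest that separates real objects from artefacts. The stamps below are synthetic, with 'real' detections given a PSF-like bump; the survey's actual images and feature tuning are not reproduced.

        # Eigenimage (PCA) features plus a random forest classifier.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(10)
        n, side = 600, 16                     # synthetic 16x16 difference stamps
        labels = rng.integers(0, 2, n)        # 1 = real object, 0 = artefact
        images = rng.standard_normal((n, side * side))
        xx, yy = np.meshgrid(np.arange(side), np.arange(side))
        psf = np.exp(-((xx - 8) ** 2 + (yy - 8) ** 2) / 8.0).ravel()
        images[labels == 1] += 3.0 * psf      # real detections look PSF-like

        pipe = make_pipeline(
            PCA(n_components=20),
            RandomForestClassifier(n_estimators=200, random_state=0))
        print("cross-validated accuracy:",
              cross_val_score(pipe, images, labels, cv=5).mean().round(2))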

  8. Prototype Vector Machine for Large Scale Semi-Supervised Learning

    SciTech Connect

    Zhang, Kai; Kwok, James T.; Parvin, Bahram

    2009-04-29

    Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.
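
    The prototype idea can be illustrated with a Nystrom-style low-rank kernel approximation, simplifying by sampling prototypes at random: an n x m cross-kernel against m prototypes replaces the full n x n kernel matrix. The PVM's actual prototype selection and regularizer approximation are more involved than this sketch.

        # Low-rank kernel approximation from a small set of prototype vectors.
        import numpy as np

        rng = np.random.default_rng(11)
        X = rng.standard_normal((1000, 5))
        protos = X[rng.choice(len(X), size=50, replace=False)]

        def rbf(A, B, gamma=0.5):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)

        C = rbf(X, protos)                  # n x m cross-kernel
        Wp = rbf(protos, protos)            # m x m prototype kernel
        K_approx = C @ np.linalg.pinv(Wp) @ C.T   # rank-m surrogate for K

        K_exact = rbf(X, X)
        rel = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
        print("relative Frobenius error:", round(rel, 3))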

  9. Robust Extreme Learning Machine With its Application to Indoor Positioning.

    PubMed

    Lu, Xiaoxuan; Zou, Han; Zhou, Hongming; Xie, Lihua; Huang, Guang-Bin

    2016-01-01

    The increasing demands of location-based services have spurred the rapid development of indoor positioning systems (IPSs). However, the performance of IPSs suffers from noisy measurements. In this paper, two kinds of robust extreme learning machines (RELMs), corresponding to the close-to-mean constraint and the small-residual constraint, are proposed to address the issue of noisy measurements in IPSs. Based on whether the feature mapping in the extreme learning machine is explicit, we provide random-hidden-node and kernelized formulations of RELMs, respectively, by second-order cone programming. Furthermore, the computation of the covariance in feature space is discussed. Simulations and real-world indoor localization experiments are extensively carried out, and the results demonstrate that the proposed algorithms can not only improve accuracy and repeatability, but also reduce the deviation and worst-case error of IPSs compared with other baseline algorithms. PMID:26684258

  10. Prediction of Zeolite Framework Types by a Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Yang, Shujiang; Lach-Hab, Mohammed; Vaisman, Iosif; Blaisten-Barojas, Estela

    2009-03-01

    Zeolites are microporous crystalline materials with highly regular framework structures consisting of molecular-sized pores and channels. Characteristic framework types of zeolites are traditionally determined by the combined information of coordination sequences and vertex symbols. Here we present a machine learning model for classifying zeolite crystals according to their framework types. An eighteen-dimensional feature vector is defined including topological descriptors and physical/chemical properties of zeolite crystals [Microporous and Mesoporous Materials 117, 339 (2009)]. Trained with crystallographic data of known zeolites, the new model can predict the framework types of unknown zeolite crystals with up to 98 % accuracy. Compared with conventional methods, the machine learning model is more robust handling crystal disorder and/or crystal defects in a more effective manner. This model can be adapted for classifying and clustering other crystalline species.

  11. Survey of Machine Learning Methods for Database Security

    NASA Astrophysics Data System (ADS)

    Kamra, Ashish; Ber, Elisa

    Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.

  12. Stochastic Local Interaction (SLI) model: Bridging machine learning and geostatistics

    NASA Astrophysics Data System (ADS)

    Hristopulos, Dionissios T.

    2015-12-01

    Machine learning and geostatistics are powerful mathematical frameworks for modeling spatial data. Both approaches, however, suffer from poor scaling of the required computational resources for large data applications. We present the Stochastic Local Interaction (SLI) model, which employs a local representation to improve computational efficiency. SLI combines geostatistics and machine learning with ideas from statistical physics and computational geometry. It is based on a joint probability density function defined by an energy functional which involves local interactions implemented by means of kernel functions with adaptive local kernel bandwidths. SLI is expressed in terms of an explicit, typically sparse, precision (inverse covariance) matrix. This representation leads to a semi-analytical expression for interpolation (prediction), which is valid in any number of dimensions and avoids the computationally costly covariance matrix inversion.

  13. Machine-learning-assisted materials discovery using failed experiments.

    PubMed

    Raccuglia, Paul; Elbert, Katherine C; Adler, Philip D F; Falk, Casey; Wenny, Malia B; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A; Schrier, Joshua; Norquist, Alexander J

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on 'dark' reactions--failed or unsuccessful hydrothermal syntheses--collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions
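
    A hedged sketch of the training setup: each attempted synthesis is described by physicochemical descriptors (the ones below are invented), labelled by success or failure, and a model is trained on both outcomes, including the 'dark' failed reactions that publications usually omit.

        # Predict reaction success from descriptors of attempted syntheses.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(12)
        n = 500
        # Hypothetical descriptors of each attempted hydrothermal reaction.
        ph = rng.uniform(1, 13, n)
        temperature = rng.uniform(80, 220, n)     # deg C
        amine_mass = rng.uniform(0.01, 0.5, n)    # g of organic building block
        X = np.column_stack([ph, temperature, amine_mass])
        # Invented ground truth: crystals form only in a favourable window.
        success = ((ph > 4) & (ph < 9) & (temperature > 120)).astype(int)

        X_tr, X_te, y_tr, y_te = train_test_split(X, success, random_state=0)
        clf = RandomForestClassifier(n_estimators=300,
                                     random_state=0).fit(X_tr, y_tr)
        print("accuracy on held-out conditions:",
              round(clf.score(X_te, y_te), 2))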

  14. Machine-learning-assisted materials discovery using failed experiments.

    PubMed

    Raccuglia, Paul; Elbert, Katherine C; Adler, Philip D F; Falk, Casey; Wenny, Malia B; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A; Schrier, Joshua; Norquist, Alexander J

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on 'dark' reactions--failed or unsuccessful hydrothermal syntheses--collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions

  15. Machine-learning-assisted materials discovery using failed experiments

    NASA Astrophysics Data System (ADS)

    Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.; Falk, Casey; Wenny, Malia B.; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A.; Schrier, Joshua; Norquist, Alexander J.

    2016-05-01

    Inorganic–organic hybrid materials such as organically templated metal oxides, metal–organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure–property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully

  17. Smarter Instruments, Smarter Archives: Machine Learning for Tactical Science

    NASA Astrophysics Data System (ADS)

    Thompson, D. R.; Kiran, R.; Allwood, A.; Altinok, A.; Estlin, T.; Flannery, D.

    2014-12-01

    There has been growing interest in the Earth and planetary sciences in machine learning, visualization and cyberinfrastructure to interpret ever-increasing volumes of instrument data. Such tools are commonly used to analyze archival datasets, but they can also play a valuable real-time role during missions. Here we discuss ways that machine learning can benefit tactical science decisions during Earth and planetary exploration. Machine learning's potential begins at the instrument itself. Smart instruments endowed with pattern recognition can immediately recognize science features of interest. This allows robotic explorers to optimize their limited communications bandwidth, triaging science products and prioritizing the most relevant data. Smart instruments can also target their data collection on the fly, using principles of experimental design to reduce redundancy and generally improve sampling efficiency for time-limited operations. Moreover, smart instruments can respond immediately to transient or unexpected phenomena. Examples include detections of cometary plumes, terrestrial floods, or volcanism. We show recent examples of smart instruments from 2014 tests, including aircraft and spacecraft remote sensing instruments that recognize cloud contamination, field tests of a "smart camera" for robotic surface geology, and adaptive data collection by X-ray fluorescence spectrometers. Machine learning can also assist human operators when tactical decision making is required. Terrestrial scenarios include airborne remote sensing, where the decision to re-fly a transect must be made immediately. Planetary scenarios include deep space encounters or planetary surface exploration, where the number of command cycles is limited and operators make rapid daily decisions about where next to collect measurements. Visualization and modeling can reveal trends, clusters, and outliers in new data. This can help operators recognize instrument artifacts or spot anomalies in real time

  18. AstroML: Machine learning and data mining in astronomy

    NASA Astrophysics Data System (ADS)

    VanderPlas, Jacob; Fouesneau, Morgan; Taylor, Julia

    2014-07-01

    AstroML is a Python library of statistical and machine learning routines for analyzing astronomical data, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets. An optional companion library, astroML_addons, is available; it requires a C compiler and contains faster and more efficient implementations of certain algorithms in compiled code.
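
    As a flavour of the library, the sketch below uses astroML's Bayesian-blocks histogram, one of its documented statistics routines; the random sample stands in for a real astronomical dataset, and exact function locations may vary between astroML versions.

      # Minimal sketch: adaptive-width histogram via Bayesian blocks (astroML).
      import numpy as np
      import matplotlib.pyplot as plt
      from astroML.plotting import hist  # astroML's drop-in histogram helper

      rng = np.random.RandomState(0)
      sample = np.concatenate([rng.normal(-2.0, 0.5, 500),
                               rng.normal(1.0, 1.5, 500)])
      hist(sample, bins='blocks', histtype='step')  # data-driven bin edges
      plt.xlabel('measured quantity')
      plt.show()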

  19. Galaxy Zoo: reproducing galaxy morphologies via machine learning

    NASA Astrophysics Data System (ADS)

    Banerji, Manda; Lahav, Ofer; Lintott, Chris J.; Abdalla, Filipe B.; Schawinski, Kevin; Bamford, Steven P.; Andreescu, Dan; Murray, Phil; Raddick, M. Jordan; Slosar, Anze; Szalay, Alex; Thomas, Daniel; Vandenberg, Jan

    2010-07-01

    We present morphological classifications obtained using machine learning for objects in the Sloan Digital Sky Survey DR6 that have been classified by Galaxy Zoo into three classes, namely early types, spirals and point sources/artefacts. An artificial neural network is trained on a subset of objects classified by the human eye, and we test whether the machine-learning algorithm can reproduce the human classifications for the rest of the sample. We find that the success of the neural network in matching the human classifications depends crucially on the set of input parameters chosen for the machine-learning algorithm. The colours and parameters associated with profile fitting do a reasonable job of separating the objects into three classes. However, these results are considerably improved by adding adaptive shape parameters as well as concentration and texture. The adaptive moments, concentration and texture parameters alone cannot distinguish between early type galaxies and the point sources/artefacts. Using a set of 12 parameters, the neural network is able to reproduce the human classifications to better than 90 per cent for all three morphological classes. We find that using a training set that is incomplete in magnitude does not degrade our results given our particular choice of the input parameters to the network. We conclude that it is promising to use machine-learning algorithms to perform morphological classification for the next generation of wide-field imaging surveys and that the Galaxy Zoo catalogue provides an invaluable training set for such purposes. This publication has been made possible by the participation of more than 100,000 volunteers in the Galaxy Zoo project. Their contributions are individually acknowledged at http://www.galaxyzoo.org/Volunteers.aspx.
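
    A hedged sketch of the basic setup follows: a small feed-forward network mapping a 12-parameter feature vector to the three Galaxy Zoo classes. The synthetic arrays below merely stand in for the SDSS colours, shape, concentration and texture parameters discussed above.

      # Minimal sketch of morphology classification with an MLP (synthetic data).
      import numpy as np
      from sklearn.neural_network import MLPClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.RandomState(0)
      X = rng.rand(3000, 12)       # stand-in for the 12 input parameters
      y = rng.randint(0, 3, 3000)  # 0 early type, 1 spiral, 2 point source/artefact

      net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)
      print("agreement with labels:", cross_val_score(net, X, y, cv=5).mean())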

  20. Machine Shop Suggested Job and Task Sheets. Part II. 21 Advanced Jobs.

    ERIC Educational Resources Information Center

    Texas A and M Univ., College Station. Vocational Instructional Services.

    This volume consists of advanced job and task sheets adaptable for use in the regular vocational industrial education programs for the training of machinists and machine shop operators. Twenty-one advanced machine shop job sheets are included. Some or all of this material is provided for each job: an introductory sheet with aim, checking…

  1. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W.

    1992-01-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.

  2. Classification of ABO3 perovskite solids: a machine learning study.

    PubMed

    Pilania, G; Balachandran, P V; Gubernatis, J E; Lookman, T

    2015-10-01

    We explored the use of machine learning methods for classifying whether a particular ABO3 chemistry forms a perovskite or non-perovskite structured solid. Starting with three sets of feature pairs (the tolerance and octahedral factors, the A and B ionic radii relative to the radius of O, and the bond valence distances between the A and B ions from the O atoms), we used machine learning to create a hyper-dimensional partial dependency structure plot using all three feature pairs or any two of them. Doing so increased the accuracy of our predictions by 2-3 percentage points over using any one pair. We also added the Mendeleev numbers of the A and B atoms to this set of feature pairs. Doing this and using the capabilities of our machine learning algorithm, the gradient tree boosting classifier, enabled us to generate a new type of structure plot that has the simplicity of one based on using just the Mendeleev numbers, but with the added advantages of having a higher accuracy and providing a measure of likelihood of the predicted structure.
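
    The classification step can be sketched as follows; the gradient tree boosting classifier matches the abstract, while the toy tolerance/octahedral-factor values and the rule generating the labels are invented for illustration.

      # Minimal sketch: gradient tree boosting on (t, mu) feature pairs.
      import numpy as np
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.RandomState(0)
      t = rng.uniform(0.7, 1.1, 500)   # tolerance factor (toy values)
      mu = rng.uniform(0.3, 0.7, 500)  # octahedral factor (toy values)
      X = np.column_stack([t, mu])
      y = ((t > 0.82) & (t < 1.00) & (mu > 0.41)).astype(int)  # toy perovskite rule

      clf = GradientBoostingClassifier(random_state=0)
      print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())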

  3. Experimental Investigation of Three Machine Learning Algorithms for ITS Dataset

    NASA Astrophysics Data System (ADS)

    Yearwood, J. L.; Kang, B. H.; Kelarev, A. V.

    The present article is devoted to an experimental investigation of the performance of three machine learning algorithms on the ITS dataset, in terms of their ability to achieve agreement with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric and the sequences cannot be regarded as points in a finite dimensional space. This is why it is necessary to develop novel machine learning approaches to the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, where the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers, is also presented.
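
    Because alignment scores are only pairwise distances rather than coordinates, classifiers must accept a precomputed distance matrix. The sketch below shows that pattern with a nearest-neighbour classifier; the random symmetric matrix is a stand-in for real alignment distances, and the k-committees method itself is not reproduced here.

      # Minimal sketch: classification from a precomputed distance matrix.
      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier

      rng = np.random.RandomState(0)
      n = 60
      D = rng.rand(n, n)
      D = (D + D.T) / 2.0        # symmetrize; stand-in for alignment distances
      np.fill_diagonal(D, 0.0)
      y = rng.randint(0, 3, n)   # published class labels (toy values)

      knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
      knn.fit(D, y)              # rows/columns index the training sequences
      print(knn.predict(D[:5]))  # distances from five queries to the training set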

  4. Mammogram retrieval through machine learning within BI-RADS standards.

    PubMed

    Wei, Chia-Hung; Li, Yue; Huang, Pai Jung

    2011-08-01

    A content-based mammogram retrieval system can support the comparisons that physicians usually make between images, answering similarity queries over images stored in the database. The importance of searching for similar mammograms lies in the fact that physicians usually try to recall similar cases by seeking images that are pathologically similar to a given image. This paper presents a content-based mammogram retrieval system, which employs a query example to search for similar mammograms in the database. In this system the mammographic lesions are interpreted based on their medical characteristics specified in the Breast Imaging Reporting and Data System (BI-RADS) standards. A hierarchical similarity measurement scheme based on a distance weighting function is proposed to model the user's perception and maximize the effectiveness of each feature in a mammographic descriptor. A machine learning approach based on support vector machines and the user's relevance feedback is also proposed to analyze the user's information need in order to retrieve target images more accurately. Experimental results demonstrate that the proposed machine learning approach with a Radial Basis Function (RBF) kernel achieves the best performance among all those tested. Furthermore, the results also show that the proposed learning approach can improve retrieval performance when applied to retrieve mammograms with similar mass and calcification lesions, respectively. PMID:21277387

  5. Classifying black and white spruce pollen using layered machine learning.

    PubMed

    Punyasena, Surangi W; Tcheng, David K; Wesseln, Cassandra; Mueller, Pietra G

    2012-11-01

    Pollen is among the most ubiquitous of terrestrial fossils, preserving an extended record of vegetation change. However, this temporal continuity comes with a taxonomic tradeoff. Analytical methods that improve the taxonomic precision of pollen identifications would expand the research questions that could be addressed by pollen, in fields such as paleoecology, paleoclimatology, biostratigraphy, melissopalynology, and forensics. We developed a supervised, layered, instance-based machine-learning classification system that uses leave-one-out bias optimization and discriminates among small variations in pollen shape, size, and texture. We tested our system on black and white spruce, two paleoclimatically significant taxa in the North American Quaternary. We achieved > 93% grain-to-grain classification accuracies in a series of experiments with both fossil and reference material. More significantly, when applied to Quaternary samples, the learning system was able to replicate the count proportions of a human expert (R(2) = 0.78, P = 0.007), with one key difference: the machine achieved these ratios by including larger numbers of grains with low-confidence identifications. Our results demonstrate the capability of machine-learning systems to solve the most challenging palynological classification problem, the discrimination of congeneric species, extending the capabilities of the pollen analyst and improving the taxonomic resolution of the palynological record.

  6. Machine Learning for Flood Prediction in Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.

    2015-12-01

    With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct, at significant financial cost. In addition, desktop modeling software and limited local server storage can impose constraints on the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable in this platform. Our project pushes these limits by testing the performance of two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, at predicting flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created with the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of using unbalanced training data sets based on rare extreme events.
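
    A hedged sketch of the core comparison follows, with synthetic per-pixel features standing in for the satellite layers mined in Earth Engine; the actual study validated against MODIS-derived flood maps rather than a synthetic rule.

      # Minimal sketch: compare SVM and Random Forest flood classifiers.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.RandomState(0)
      X = rng.rand(2000, 8)  # stand-in for reflectance bands, slope, etc.
      y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.randn(2000) > 0.8).astype(int)

      for name, model in [("SVM", SVC(kernel="rbf")),
                          ("Random Forest",
                           RandomForestClassifier(n_estimators=200, random_state=0))]:
          print(name, cross_val_score(model, X, y, cv=5).mean())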

  7. Security Aspects of Smart Cards vs. Embedded Security in Machine-to-Machine (M2M) Advanced Mobile Network Applications

    NASA Astrophysics Data System (ADS)

    Meyerstein, Mike; Cha, Inhyok; Shah, Yogendra

    The Third Generation Partnership Project (3GPP) standardisation group currently discusses advanced applications of mobile networks such as Machine-to-Machine (M2M) communication. Several security issues arise in these contexts which warrant a fresh look at mobile networks’ security foundations, resting on smart cards. This paper contributes a security/efficiency analysis to this discussion and highlights the role of trusted platform technology to approach these issues.

  8. Advanced coordinate measuring machine at Sandia National Laboratories

    NASA Astrophysics Data System (ADS)

    Pilkey, R. D.; Klevgard, P. A.

    1993-03-01

    Sandia National Laboratories/California has acquired a new Moore M-48V CNC five-axis universal coordinate measuring machine (CMM). Site preparation, acceptance testing, and initial performance results are discussed. Unique features of the machine include a ceramic ram and vacuum evacuated laser pathways (VELPS). The implementation of a VELPS system on the machine imposed certain design requirements and entailed certain start-up problems. The machine's projected capabilities, workload, and research possibilities are outlined.

  9. Advanced coordinate measuring machine at Sandia National Laboratories/California

    SciTech Connect

    Pilkey, R.D.; Klevgard, P.A.

    1993-03-01

    Sandia National Laboratories/California has acquired a new Moore M-48V CNC five-axis universal coordinate measuring machine (CMM). Site preparation, acceptance testing, and initial performance results are discussed. Unique features of the machine include a ceramic ram and vacuum evacuated laser pathways (VELPS). The implementation of a VELPS system on the machine imposed certain design requirements and entailed certain start-up problems. The machine's projected capabilities, workload, and research possibilities are outlined.

  10. Machine Learning Techniques in Optimal Design

    NASA Technical Reports Server (NTRS)

    Cerbone, Giuseppe

    1992-01-01

    to the problem, is then obtained by solving in parallel each of the sub-problems in the set and computing the one with the minimum cost. In addition to speeding up the optimization process, our use of learning methods also relieves the expert from the burden of identifying rules that exactly pinpoint optimal candidate sub-problems. In real engineering tasks it is usually too costly for the engineers to derive such rules. Therefore, this paper also contributes a further step towards the solution of the knowledge acquisition bottleneck [Feigenbaum, 1977], which has somewhat impaired the construction of rule-based expert systems.

  11. Machine-z: rapid machine-learned redshift indicator for Swift gamma-ray bursts

    NASA Astrophysics Data System (ADS)

    Ukwatta, T. N.; Woźniak, P. R.; Gehrels, N.

    2016-06-01

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce `machine-z', a redshift prediction algorithm and a `high-z' classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With a 40 per cent false positive rate the classifier can achieve ~100 per cent recall. The most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.
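
    The two-headed design (a regressor for machine-z plus a binary high-z classifier) can be sketched as below; the feature matrix is a synthetic stand-in for the prompt Swift measurements, and the redshift threshold is illustrative.

      # Minimal sketch of the machine-z setup with random forests.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
      from sklearn.model_selection import cross_val_predict

      rng = np.random.RandomState(0)
      X = rng.rand(284, 10)              # stand-in for early-time burst features
      z = 6.0 * X[:, 0] + rng.rand(284)  # synthetic redshifts

      z_pred = cross_val_predict(RandomForestRegressor(random_state=0), X, z, cv=10)
      print("corr(machine-z, z):", np.corrcoef(z_pred, z)[0, 1])

      high_z = (z > 4.0).astype(int)     # illustrative high-z cut
      clf = RandomForestClassifier(random_state=0).fit(X, high_z)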

  12. Some Principles of Learning and Learning with the Aid of Machines.

    ERIC Educational Resources Information Center

    Dolyatovskii, V. A.; Sotnikov, E. M.

    A translated Soviet document describes some theories of learning, and the practical problems of developing a teaching machine--as taught in an Industrial Electronics course (in the automation and telemechanics curriculum). The point is stressed that the growing number of students at institutions of higher learning in the Soviet Union, up forty…

  13. Remotely sensed data assimilation technique to develop machine learning models for use in water management

    NASA Astrophysics Data System (ADS)

    Zaman, Bushra

    Increasing population and water conflicts are making water management one of the most important issues of the present world. It has become absolutely necessary to find ways to manage water more efficiently. Technological advancement has introduced various techniques for data acquisition and analysis, and these tools can be used to address some of the critical issues that challenge water resource management. This research used learning machine techniques and information acquired through remote sensing to solve problems related to soil moisture estimation and crop identification on large spatial scales. In this dissertation, solutions were proposed in three problem areas that can be important in the decision making process related to water management in irrigated systems. A data assimilation technique was used to build a learning machine model that generated soil moisture estimates commensurate with the scale of the data. The research was taken further by developing a multivariate machine learning algorithm to predict root zone soil moisture both in space and time. Further, a model was developed for supervised classification of multi-spectral reflectance data using a multi-class machine learning algorithm. The procedure was designed for classifying crops, but the model is data dependent and can be used with other datasets, and hence can be applied to other landcover classification problems. The dissertation compared the performance of relevance vector and support vector machines in estimating soil moisture. A multivariate relevance vector machine algorithm was tested in the spatio-temporal prediction of soil moisture, and the multi-class relevance vector machine model was used for classifying different crop types. It was concluded that the classification scheme may uncover important data patterns contributing greatly to knowledge bases, and to scientific and medical research. The results for the soil moisture models would give a rough idea to farmers

  14. Mining the Galaxy Zoo Database: Machine Learning Applications

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.

    2010-01-01

    The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we are also aiming to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunteer-provided classifications). Our second case study will focus on similar data mining analyses and machine learning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years; the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.
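
    The first case study amounts to ranking catalogue features by how strongly they track the volunteer labels; one hedged way to sketch that is with random-forest feature importances. The feature names and toy labels below are assumptions for illustration, not SDSS data.

      # Minimal sketch: rank features by importance for a morphology label.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.RandomState(0)
      names = ["u-r colour", "concentration", "fracDeV", "petroR90/petroR50"]
      X = rng.rand(5000, len(names))
      y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy volunteer-style label

      rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
      for name, imp in sorted(zip(names, rf.feature_importances_),
                              key=lambda pair: -pair[1]):
          print(f"{name}: {imp:.2f}")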

  15. Automatic pathology classification using a single feature machine learning support-vector machines

    NASA Astrophysics Data System (ADS)

    Yepes-Calderon, Fernando; Pedregosa, Fabian; Thirion, Bertrand; Wang, Yalin; Lepore, Natasha

    2014-03-01

    Magnetic Resonance Imaging (MRI) has been gaining popularity in the clinic in recent years as a safe in-vivo imaging technique. As a result, large troves of data are being gathered and stored daily that may be used as clinical training sets in hospitals. While numerous machine learning (ML) algorithms have been implemented for Alzheimer's disease classification, their outputs are usually difficult to interpret in the clinical setting. Here, we propose a simple method of rapid diagnostic classification for the clinic using Support Vector Machines (SVM) and easy-to-obtain geometrical measurements that, together with a cortical and sub-cortical brain parcellation, create a robust framework capable of automatic diagnosis with high accuracy. On a significantly large imaging dataset consisting of over 800 subjects taken from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, classification-success indexes of up to 99.2% are reached with a single measurement.
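
    A hedged sketch of such a pipeline follows: standardized geometric measurements per parcellated region feeding a linear SVM. The synthetic table below is not ADNI data and its toy labels carry no signal; it only shows the shape of the approach.

      # Minimal sketch: SVM on per-region geometric measurements.
      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.RandomState(0)
      X = rng.rand(800, 60)       # one measurement per brain region (synthetic)
      y = rng.randint(0, 2, 800)  # 0 = control, 1 = patient (toy labels)

      clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
      print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())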

  16. Amp: A modular approach to machine learning in atomistic simulations

    NASA Astrophysics Data System (ADS)

    Khorshidi, Alireza; Peterson, Andrew A.

    2016-10-01

    Electronic structure calculations, such as those employing Kohn-Sham density functional theory or ab initio wavefunction theories, have allowed for atomistic-level understandings of a wide variety of phenomena and properties of matter at small scales. However, the computational cost of electronic structure methods drastically increases with length and time scales, which makes these methods difficult for long time-scale molecular dynamics simulations or large-sized systems. Machine-learning techniques can provide accurate potentials that can match the quality of electronic structure calculations, provided sufficient training data. These potentials can then be used to rapidly simulate large and long time-scale phenomena at similar quality to the parent electronic structure approach. Machine-learning potentials usually take a bias-free mathematical form and can be readily developed for a wide variety of systems. Electronic structure calculations have favorable properties, namely that they are noiseless and that targeted training data can be produced on demand, which make them particularly well-suited for machine learning. This paper discusses our modular approach to atomistic machine learning through the development of the open-source Atomistic Machine-learning Package (Amp), which allows for representations of both the total and atom-centered potential energy surface, in both periodic and non-periodic systems. Potentials developed through the atom-centered approach are simultaneously applicable for systems with various sizes. Interpolation can be enhanced by introducing custom descriptors of the local environment. We demonstrate this in the current work for Gaussian-type, bispectrum, and Zernike-type descriptors. Amp has an intuitive and modular structure with an interface through the python scripting language yet has parallelizable fortran components for demanding tasks; it is designed to integrate closely with the widely used Atomic Simulation Environment (ASE), which
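
    Based on Amp's documented interface (module paths and argument names may differ between versions), training a Gaussian-descriptor neural-network potential on a set of ASE images looks roughly like this; "train.traj" is a hypothetical ASE trajectory file.

      # Hedged sketch of Amp usage; not a verified, version-pinned recipe.
      from amp import Amp
      from amp.descriptor.gaussian import Gaussian
      from amp.model.neuralnetwork import NeuralNetwork

      calc = Amp(descriptor=Gaussian(),                 # atom-centered descriptors
                 model=NeuralNetwork(hiddenlayers=(10, 10)))
      calc.train(images="train.traj")                   # fit to reference data

    Once trained, the object serves as an ASE calculator, so subsequent simulations run at fitted-potential cost rather than electronic-structure cost.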

  17. Machine Learning of Protein Interactions in Fungal Secretory Pathways.

    PubMed

    Kludas, Jana; Arvas, Mikko; Castillo, Sandra; Pakula, Tiina; Oja, Merja; Brouard, Céline; Jäntti, Jussi; Penttilä, Merja; Rousu, Juho

    2016-01-01

    In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely, multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, Input-Output Kernel Regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to a uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways of fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities. PMID:27441920

  19. Machine Learning of Protein Interactions in Fungal Secretory Pathways

    PubMed Central

    Kludas, Jana; Arvas, Mikko; Castillo, Sandra; Pakula, Tiina; Oja, Merja; Brouard, Céline; Jäntti, Jussi; Penttilä, Merja; Rousu, Juho

    2016-01-01

    In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely, multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, Input-Output Kernel Regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to a uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways of fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities. PMID:27441920

  20. Geological applications of machine learning on hyperspectral remote sensing data

    NASA Astrophysics Data System (ADS)

    Tse, C. H.; Li, Yi-liang; Lam, Edmund Y.

    2015-02-01

    The CRISM imaging spectrometer orbiting Mars has been producing a vast amount of data in the visible to infrared wavelengths in the form of hyperspectral data cubes. These data, compared with those obtained from previous remote sensing techniques, yield an unprecedented level of detailed spectral resolution in addition to an ever-increasing level of spatial information. A major challenge brought about by these data is the burden of processing and interpreting the datasets and extracting the relevant information from them. This research aims at approaching the challenge by exploring machine learning methods, especially unsupervised learning, to achieve cluster density estimation and classification, and ultimately devising an efficient means leading to identification of minerals. A set of software tools has been constructed in Python to access and experiment with CRISM hyperspectral cubes selected from two specific Mars locations. A machine learning pipeline is proposed and unsupervised learning methods were applied to pre-processed datasets. The resulting data clusters are compared with the published ASTER spectral library and browse data products from the Planetary Data System (PDS). The results demonstrated that this approach is capable of processing the huge amount of hyperspectral data and potentially providing guidance to scientists for more detailed studies.

  1. Mapping of Estimations and Prediction Intervals Using Extreme Learning Machines

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2015-04-01

    Due to the large amount and complexity of data available nowadays in environmental sciences, we face the need for more robust methodology allowing analysis and understanding of the phenomena under study. One particular but very important aspect of this understanding is the reliability of generated prediction models. From the data collection to the prediction map, several sources of error can occur and affect the final result. These sources are mainly identified as uncertainty in the data (data noise) and uncertainty in the model. Their combination leads to the so-called prediction interval. Quantifying these two categories of uncertainty allows a finer understanding of phenomena under study and a better assessment of the prediction accuracy. The present research deals with a methodology combining a machine learning algorithm (ELM - Extreme Learning Machine) with a bootstrap-based procedure. Developed by G.-B. Huang et al. (2006), ELM is an artificial neural network following the structure of a multilayer perceptron (MLP) with a single hidden layer. Compared to a classical MLP, ELM learns faster without loss of accuracy and needs only one hyper-parameter to be fitted (the number of nodes in the hidden layer). The key steps of the proposed method are as follows: sample from the original data a variety of subsets using bootstrapping; from these subsets, train and validate ELM models; and compute residuals. Then, the same procedure is performed a second time with only the squared training residuals. Finally, taking into account the two modeling levels allows developing the mean prediction map, the model uncertainty variance, and the data noise variance. The proposed approach is illustrated using geospatial data. References: Efron B., and Tibshirani R. 1986, Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy, Statistical Science, vol. 1: 54-75. Huang G.-B., Zhu Q.-Y., and Siew C.-K. 2006
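
    A compact, hedged sketch of the two modeling levels is given below: an ELM is a random, untrained hidden layer with a least-squares readout, and bootstrapping its training set yields the mean prediction and the model-uncertainty variance described above (the data here are synthetic).

      # Minimal ELM plus bootstrap variance, per the method outline above.
      import numpy as np

      def elm_fit(X, y, n_hidden, rng):
          W = rng.randn(X.shape[1], n_hidden)  # random input weights (never trained)
          b = rng.randn(n_hidden)
          H = np.tanh(X @ W + b)               # hidden-layer activations
          beta = np.linalg.pinv(H) @ y         # output weights by least squares
          return W, b, beta

      def elm_predict(X, W, b, beta):
          return np.tanh(X @ W + b) @ beta

      rng = np.random.RandomState(1)
      X = rng.rand(200, 2)
      y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.randn(200)

      preds = []
      for i in range(100):                     # bootstrap resamples
          idx = rng.randint(0, len(X), len(X))
          model = elm_fit(X[idx], y[idx], n_hidden=50,
                          rng=np.random.RandomState(i))
          preds.append(elm_predict(X, *model))
      preds = np.array(preds)
      mean_map, model_variance = preds.mean(axis=0), preds.var(axis=0)

    A second pass trained on the squared training residuals would estimate the data-noise variance, completing the prediction interval.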

  2. Predicting experimental properties of proteins from sequence by machine learning techniques.

    PubMed

    Smialowski, Pawel; Martin-Galiano, Antonio J; Cox, Jürgen; Frishman, Dmitrij

    2007-04-01

    Efficient target selection methods are an important prerequisite for increasing the success rate and reducing the cost of high-throughput structural genomics efforts. There is a high demand for sequence-based methods capable of predicting experimentally tractable proteins and filtering out potentially difficult targets at different stages of the structural genomic pipeline. Simple empirical rules based on anecdotal evidence are being increasingly superseded by rigorous machine-learning algorithms. Although the simplicity of less advanced methods makes them easier for humans to understand, more sophisticated formalized algorithms possess superior classification power. The quickly growing corpus of experimental success and failure data gathered by structural genomics consortia creates a unique opportunity for retrospective data mining using machine learning techniques and results in increased quality of classifiers. For example, current solubility prediction methods are reaching an accuracy of over 70%. Furthermore, automated feature selection leads to better insight into the nature of the correlation between amino acid sequence and experimental outcome. In this review we summarize methods for predicting experimental success in cloning, expression, soluble expression, purification and crystallization of proteins with a special focus on publicly available resources. We also describe experimental data repositories and machine learning techniques used for classification and feature selection. PMID:17430194

  3. Big Data and Machine Learning in Plastic Surgery: A New Frontier in Surgical Innovation.

    PubMed

    Kanevsky, Jonathan; Corban, Jason; Gaster, Richard; Kanevsky, Ari; Lin, Samuel; Gilardino, Mirko

    2016-05-01

    Medical decision-making is increasingly based on quantifiable data. From the moment patients come into contact with the health care system, their entire medical history is recorded electronically. Whether a patient is in the operating room or on the hospital ward, technological advancement has facilitated the expedient and reliable measurement of clinically relevant health metrics, all in an effort to guide care and ensure the best possible clinical outcomes. However, as the volume and complexity of biomedical data grow, it becomes challenging to effectively process "big data" using conventional techniques. Physicians and scientists must be prepared to look beyond classic methods of data processing to extract clinically relevant information. The purpose of this article is to introduce the modern plastic surgeon to machine learning and computational interpretation of large data sets. What is machine learning? Machine learning, a subfield of artificial intelligence, can address clinically relevant problems in several domains of plastic surgery, including burn surgery; microsurgery; and craniofacial, peripheral nerve, and aesthetic surgery. This article provides a brief introduction to current research and suggests future projects that will allow plastic surgeons to explore this new frontier of surgical science. PMID:27119951

  5. Classifying Structures in the ISM with Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Beaumont, Christopher; Goodman, A. A.; Williams, J. P.

    2011-01-01

    The processes which govern molecular cloud evolution and star formation often sculpt structures in the ISM: filaments, pillars, shells, outflows, etc. Because of their morphological complexity, these objects are often identified manually. Manual classification has several disadvantages; the process is subjective, not easily reproducible, and does not scale well to handle increasingly large datasets. We have explored to what extent machine learning algorithms can be trained to autonomously identify specific morphological features in molecular cloud datasets. We show that the Support Vector Machine algorithm can successfully locate filaments and outflows blended with other emission structures. When the objects of interest are morphologically distinct from the surrounding emission, this autonomous classification achieves >90% accuracy. We have developed a set of IDL-based tools to apply this technique to other datasets.

  6. Machine Learning Assisted Design of Highly Active Peptides for Drug Discovery

    PubMed Central

    Giguère, Sébastien; Laviolette, François; Marchand, Mario; Tremblay, Denise; Moineau, Sylvain; Liang, Xinxia; Biron, Éric; Corbeil, Jacques

    2015-01-01

    The discovery of peptides possessing high biological activity is very challenging due to the enormous diversity for which only a minority have the desired properties. To lower cost and reduce the time to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor with existing data or with a smaller amount of data generation. Unfortunately, once the model is learned, selecting peptides having the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory, that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code freely available at http://graal.ift.ulaval.ca/peptide-design/. PMID:25849257

  7. Machine learning assisted design of highly active peptides for drug discovery.

    PubMed

    Giguère, Sébastien; Laviolette, François; Marchand, Mario; Tremblay, Denise; Moineau, Sylvain; Liang, Xinxia; Biron, Éric; Corbeil, Jacques

    2015-04-01

    The discovery of peptides possessing high biological activity is very challenging due to the enormous diversity for which only a minority have the desired properties. To lower cost and reduce the time to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor with existing data or with a smaller amount of data generation. Unfortunately, once the model is learned, selecting peptides having the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory, that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code freely available at http://graal.ift.ulaval.ca/peptide-design/.

  8. Teaching an Old Log New Tricks with Machine Learning.

    PubMed

    Schnell, Krista; Puri, Colin; Mahler, Paul; Dukatz, Carl

    2014-03-01

    To most people, the log file would not be considered an exciting area in technology today. However, these relatively benign, slowly growing data sources can drive large business transformations when combined with modern-day analytics. Accenture Technology Labs has built a new framework that helps to expand existing vendor solutions to create new methods of gaining insights from these benevolent information springs. This framework provides a systematic and effective machine-learning mechanism to understand, analyze, and visualize heterogeneous log files. These techniques enable an automated approach to analyzing log content in real time, learning relevant behaviors, and creating actionable insights applicable in traditionally reactive situations. Using this approach, companies can now tap into a wealth of knowledge residing in log file data that is currently being collected but underutilized because of its overwhelming variety and volume. By using log files as an important data input into the larger enterprise data supply chain, businesses have the opportunity to enhance their current operational log management solution and generate entirely new business insights, no longer limited to the realm of reactive IT management but extending from proactive product improvement to defense against attacks. As we will discuss, this solution has immediate relevance in the telecommunications and security industries. However, the most forward-looking companies can take it even further. How? By thinking beyond the log file and applying the same machine-learning framework to other log file use cases (including logistics, social media, and consumer behavior) and any other transactional data source. PMID:27447306
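
    One common first step in this kind of log mining, sketched here in a hedged form (this is an illustrative pattern, not Accenture's framework), is to vectorize raw log lines and cluster them so that recurring behaviors surface automatically:

      # Minimal sketch: cluster heterogeneous log lines by token similarity.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.cluster import KMeans

      logs = ["user login ok", "user login failed", "disk quota exceeded",
              "user logout", "disk read error", "user login ok"]
      X = TfidfVectorizer().fit_transform(logs)
      labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
      for line, label in zip(logs, labels):
          print(label, line)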

  9. Machine learning and data mining: strategies for hypothesis generation.

    PubMed

    Oquendo, M A; Baca-Garcia, E; Artés-Rodríguez, A; Perez-Cruz, F; Galfalvy, H C; Blasco-Fontecilla, H; Madigan, D; Duan, N

    2012-10-01

    Strategies for generating knowledge in medicine have included observation of associations in clinical or research settings and more recently, development of pathophysiological models based on molecular biology. Although critically important, they limit hypothesis generation to an incremental pace. Machine learning and data mining are alternative approaches to identifying new vistas to pursue, as is already evident in the literature. In concert with these analytic strategies, novel approaches to data collection can enhance the hypothesis pipeline as well. In data farming, data are obtained in an 'organic' way, in the sense that it is entered by patients themselves and available for harvesting. In contrast, in evidence farming (EF), it is the provider who enters medical data about individual patients. EF differs from regular electronic medical record systems because frontline providers can use it to learn from their own past experience. In addition to the possibility of generating large databases with farming approaches, it is likely that we can further harness the power of large data sets collected using either farming or more standard techniques through implementation of data-mining and machine-learning strategies. Exploiting large databases to develop new hypotheses regarding neurobiological and genetic underpinnings of psychiatric illness is useful in itself, but also affords the opportunity to identify novel mechanisms to be targeted in drug discovery and development.

  10. Machine learning approach for objective inpainting quality assessment

    NASA Astrophysics Data System (ADS)

    Frantc, V. A.; Voronin, V. V.; Marchuk, V. I.; Sherstobitov, A. I.; Agaian, S.; Egiazarian, K.

    2014-05-01

    This paper focuses on a machine learning approach to objective inpainting quality assessment. Inpainting has received a lot of attention in recent years, and quality assessment is an important task for evaluating different image reconstruction approaches. Quantitative metrics for successful image inpainting currently do not exist; researchers instead rely upon qualitative human comparisons in order to evaluate their methodologies and techniques. We present an approach to objective inpainting quality assessment based on natural image statistics and machine learning techniques. Our method is based on the observation that when images are properly normalized or transferred to a transform domain, local descriptors can be modeled by parametric distributions. The shapes of these distributions differ between non-inpainted and inpainted images. This permits us to obtain a feature vector strongly correlated with subjective image perception by the human visual system. Next, we use a support vector regression model learned on human-assessed images to predict the perceived quality of inpainted images. We demonstrate that our predicted quality value correlates consistently with qualitative opinions in a human observer study.
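
    The regression step can be sketched as follows; the twelve feature columns stand in for the fitted distribution-shape descriptors, and the synthetic targets stand in for human opinion scores from an observer study.

      # Minimal sketch: support vector regression from features to quality.
      import numpy as np
      from sklearn.svm import SVR
      from sklearn.model_selection import cross_val_score

      rng = np.random.RandomState(0)
      X = rng.rand(300, 12)                    # stand-in descriptor statistics
      scores = 5.0 * X[:, 0] + rng.rand(300)   # stand-in subjective scores

      svr = SVR(kernel="rbf", C=10.0)
      print("CV R^2:", cross_val_score(svr, X, scores, cv=5, scoring="r2").mean())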

  11. Effective feature selection for image steganalysis using extreme learning machine

    NASA Astrophysics Data System (ADS)

    Feng, Guorui; Zhang, Haiyan; Zhang, Xinpeng

    2014-11-01

    Image steganography delivers secret data by slight modifications of the cover. To detect these data, steganalysis tries to create features that embody the discrepancy between cover and steganographic images. Therefore, the urgent problem is how to design an effective classification architecture for given feature vectors extracted from the images. We propose an approach to automatically select effective features based on the well-known JPEG steganographic methods. This approach, referred to as extreme learning machine revisited feature selection (ELM-RFS), can tune input weights in terms of the importance of input features. This idea is derived from cross-validation learning and one-dimensional (1-D) search. While updating input weights, we seek the energy-decreasing direction using leave-one-out (LOO) selection. Furthermore, we optimize the 1-D energy function instead of directly discarding the least significant feature. Since the recent Liu features achieve considerably lower detection errors than previous JPEG steganalysis features, the experimental results demonstrate that the new approach results in less classification error than other classifiers such as SVM, the Kodovsky ensemble classifier, direct ELM-LOO learning, kernel ELM, and conventional ELM on the Liu features. Furthermore, ELM-RFS achieves performance similar to a deep Boltzmann machine using less training time.

  13. Intra-and-Inter Species Biomass Prediction in a Plantation Forest: Testing the Utility of High Spatial Resolution Spaceborne Multispectral RapidEye Sensor and Advanced Machine Learning Algorithms

    PubMed Central

    Dube, Timothy; Mutanga, Onisimo; Adam, Elhadi; Ismail, Riyad

    2014-01-01

    The quantification of aboveground biomass using remote sensing is critical for better understanding the role of forests in carbon sequestration and for informed sustainable management. Although remote sensing techniques have proven useful in assessing forest biomass in general, more work is required to investigate their capabilities in predicting intra-and-inter species biomass, which is mainly characterised by non-linear relationships. In this study, we tested two machine learning algorithms, Stochastic Gradient Boosting (SGB) and Random Forest (RF) regression trees, to predict intra-and-inter species biomass using high resolution RapidEye reflectance bands as well as derived vegetation indices in a commercial plantation. The results showed that the SGB algorithm yielded the best performance for intra-and-inter species biomass prediction, both when using all predictor variables and when using only the most important selected variables. For example, using the most important variables the algorithm produced an R² of 0.80 and RMSE of 16.93 t·ha⁻¹ for E. grandis; an R² of 0.79 and RMSE of 17.27 t·ha⁻¹ for P. taeda; and an R² of 0.61 and RMSE of 43.39 t·ha⁻¹ for the combined species data set. Comparatively, RF yielded plausible results only for E. dunnii (R² of 0.79; RMSE of 7.18 t·ha⁻¹). We demonstrated that although the two statistical methods were able to predict biomass accurately, RF produced weaker results than SGB when applied to the combined species dataset. The result underscores the relevance of stochastic models in predicting biomass drawn from different species and genera using the new generation high resolution RapidEye sensor with strategically positioned bands. PMID:25140631
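
    A minimal sketch of such a comparison, using scikit-learn's subsampled gradient boosting (the stochastic variant) and a random forest on synthetic stand-ins for the reflectance bands and vegetation indices; data and settings are illustrative, not the study's.

      import numpy as np
      from sklearn.datasets import make_regression
      from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
      from sklearn.metrics import mean_squared_error, r2_score
      from sklearn.model_selection import train_test_split

      # 10 predictors standing in for reflectance bands and derived indices
      X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=1)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

      models = {
          "SGB": GradientBoostingRegressor(subsample=0.5, random_state=1),  # stochastic variant
          "RF": RandomForestRegressor(n_estimators=500, random_state=1),
      }
      for name, model in models.items():
          pred = model.fit(X_tr, y_tr).predict(X_te)
          print(name, "R2 =", r2_score(y_te, pred),
                "RMSE =", mean_squared_error(y_te, pred) ** 0.5)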

  14. Kernel-based machine learning techniques for infrasound signal classification

    NASA Astrophysics Data System (ADS)

    Tuma, Matthias; Igel, Christian; Mialle, Pierrick

    2014-05-01

    Infrasound monitoring is one of four remote sensing technologies continuously employed by the CTBTO Preparatory Commission. The CTBTO's infrasound network is designed to monitor the Earth for potential evidence of atmospheric or shallow underground nuclear explosions. Upon completion, it will comprise 60 infrasound array stations distributed around the globe, of which 47 were certified in January 2014. Three stages can be identified in CTBTO infrasound data processing: automated processing at the level of single array stations, automated processing at the level of the overall global network, and interactive review by human analysts. At station level, the cross correlation-based PMCC algorithm is used for initial detection of coherent wavefronts. It produces estimates for trace velocity and azimuth of incoming wavefronts, as well as other descriptive features characterizing a signal. Detected arrivals are then categorized into potentially treaty-relevant versus noise-type signals by a rule-based expert system. This corresponds to a binary classification task at the level of station processing. In addition, incoming signals may be grouped according to their travel path in the atmosphere. The present work investigates automatic classification of infrasound arrivals by kernel-based pattern recognition methods. It aims to explore the potential of state-of-the-art machine learning methods vis-a-vis the current rule-based and task-tailored expert system. To this purpose, we first address the compilation of a representative, labeled reference benchmark dataset as a prerequisite for both classifier training and evaluation. Data representation is based on features extracted by the CTBTO's PMCC algorithm. As classifiers, we employ support vector machines (SVMs) in a supervised learning setting. Different SVM kernel functions are used and adapted through different hyperparameter optimization routines. The resulting performance is compared to several baseline classifiers. All

  15. Machine learning approaches in medical image analysis: From detection to diagnosis.

    PubMed

    de Bruijne, Marleen

    2016-10-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols, learning from weak labels, and interpretation and evaluation of results.

  16. Machine-z: Rapid machine-learned redshift indicator for Swift gamma-ray bursts

    DOE PAGES

    Ukwatta, T. N.; Wozniak, P. R.; Gehrels, N.

    2016-03-08

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce ‘machine-z’, a redshift prediction algorithm and a ‘high-z’ classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With 40 per cent false positive rate the classifier can achieve ~100 per cent recall. As a result, the most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.
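
    The two-pronged design described above can be mimicked as follows, with a random-forest regressor for machine-z and a probability-thresholded high-z classifier; the features, sample, and redshift cut below are synthetic placeholders rather than the paper's data.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
      from sklearn.metrics import roc_curve
      from sklearn.model_selection import cross_val_predict

      rng = np.random.default_rng(0)
      X = rng.normal(size=(284, 10))                   # prompt GRB features (stand-ins)
      z = np.abs(X[:, 0] + rng.normal(0.5, 1.0, 284))  # "true" redshifts (stand-ins)

      z_pred = cross_val_predict(RandomForestRegressor(random_state=0), X, z, cv=10)
      print("corr(machine-z, z) =", np.corrcoef(z_pred, z)[0, 1])

      high_z = (z > np.quantile(z, 0.85)).astype(int)  # placeholder high-z cut
      proba = cross_val_predict(RandomForestClassifier(random_state=0), X, high_z,
                                cv=10, method="predict_proba")[:, 1]
      fpr, tpr, thr = roc_curve(high_z, proba)
      i = np.argmin(np.abs(tpr - 0.8))                 # operating point near 80% recall
      print("recall =", tpr[i], "false positive rate =", fpr[i])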

  17. Machine Learning and the Starship - A Match Made in Heaven

    NASA Astrophysics Data System (ADS)

    Galea, P.

    The computer control system of an unmanned interstellar craft must deal with a variety of complex problems. For example, upon reaching the destination star, the computer may need to make assessments of the planets and other objects to prioritize the most 'interesting', and assign appropriate probes to each. These decisions would normally be regarded as intelligent if they were made by humans. This paper looks at machine learning technologies currently deployed in non-aerospace contexts, such as book recommendation systems, dating websites and social network analysis, and investigates the ways in which they can be adapted for applications in the starship. This paper is a submission of the Project Icarus Study Group.

  18. Characterizing EMG data using machine-learning tools.

    PubMed

    Yousefi, Jamileh; Hamilton-Wright, Andrew

    2014-08-01

    Effective electromyographic (EMG) signal characterization is critical in the diagnosis of neuromuscular disorders. Machine-learning based pattern classification algorithms are commonly used to produce such characterizations. Several classifiers have been investigated to develop accurate and computationally efficient strategies for EMG signal characterization. This paper provides a critical review of some of the classification methodologies used in EMG characterization, and presents the state-of-the-art accomplishments in this field, emphasizing neuromuscular pathology. The techniques studied are grouped by their methodology, and a summary of the salient findings associated with each method is presented.

  19. Extreme learning machine for ranking: generalization analysis and applications.

    PubMed

    Chen, Hong; Peng, Jiangtao; Zhou, Yicong; Li, Luoqing; Pan, Zhibin

    2014-05-01

    The extreme learning machine (ELM) has attracted increasing attention recently with its successful applications in classification and regression. In this paper, we investigate the generalization performance of ELM-based ranking. A new regularized ranking algorithm is proposed based on the combinations of activation functions in ELM. The generalization analysis is established for the ELM-based ranking (ELMRank) in terms of the covering numbers of hypothesis space. Empirical results on the benchmark datasets show the competitive performance of the ELMRank over the state-of-the-art ranking methods. PMID:24590011
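
    For readers unfamiliar with the underlying machinery, a generic ELM can be written in a few lines: a random, untrained hidden layer followed by a regularized least-squares solve for the output weights. This is a plain ELM sketch on synthetic data, not the paper's ELMRank algorithm.

      import numpy as np

      class ELMRegressor:
          """Minimal ELM: random hidden layer, ridge solve for output weights."""

          def __init__(self, n_hidden=200, reg=1e-2, seed=0):
              self.n_hidden, self.reg = n_hidden, reg
              self.rng = np.random.default_rng(seed)

          def _hidden(self, X):
              return np.tanh(X @ self.W + self.b)   # random feature mapping

          def fit(self, X, y):
              self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
              self.b = self.rng.normal(size=self.n_hidden)
              H = self._hidden(X)
              A = H.T @ H + self.reg * np.eye(self.n_hidden)
              self.beta = np.linalg.solve(A, H.T @ y)
              return self

          def predict(self, X):
              return self._hidden(X) @ self.beta

      X = np.random.default_rng(1).normal(size=(500, 8))
      y = np.sin(X[:, 0]) + 0.1 * X[:, 1]
      print(ELMRegressor().fit(X, y).predict(X[:3]))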

  20. Transferable Atomic Multipole Machine Learning Models for Small Organic Molecules.

    PubMed

    Bereau, Tristan; Andrienko, Denis; von Lilienfeld, O Anatole

    2015-07-14

    Accurate representation of the molecular electrostatic potential, which is often expanded in distributed multipole moments, is crucial for an efficient evaluation of intermolecular interactions. Here we introduce a machine learning model for multipole coefficients of atom types H, C, O, N, S, F, and Cl in any molecular conformation. The model is trained on quantum-chemical results for atoms in varying chemical environments drawn from thousands of organic molecules. Multipoles in systems with neutral, cationic, and anionic molecular charge states are treated with individual models. The models' predictive accuracy and applicability are illustrated by evaluating intermolecular interaction energies of nearly 1,000 dimers and the cohesive energy of the benzene crystal.

  1. Coordinated machine learning and decision support for situation awareness.

    SciTech Connect

    Draelos, Timothy John; Zhang, Peng-Chu.; Wunsch, Donald C.; Seiffertt, John; Conrad, Gregory N.; Brannon, Nathan Gregory

    2007-09-01

    For applications such as force protection, an effective decision maker needs to maintain an unambiguous grasp of the environment. Opportunities exist to leverage computational mechanisms for the adaptive fusion of diverse information sources. The current research employs neural networks and Markov chains to process information from sources including sensors, weather data, and law enforcement. Furthermore, the system operator's input is used as a point of reference for the machine learning algorithms. More detailed features of the approach are provided, along with an example force protection scenario.

  2. Large-scale machine learning for metagenomics sequence classification

    PubMed Central

    Vervier, Kévin; Mahé, Pierre; Tournoud, Maud; Veyrieras, Jean-Baptiste; Vert, Jean-Philippe

    2016-01-01

    Motivation: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. Results: We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genomes to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10⁸ samples in 10⁷ dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment- and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and are more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2–17 times with respect to the BWA-MEM short read mapper, depending
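
    In the same compositional spirit (though not the authors' implementation), k-mer counts can be hashed into a fixed-size feature space and fed to a linear model that trains out-of-core; the reads, labels, and k-mer length below are toy stand-ins.

      from sklearn.feature_extraction.text import HashingVectorizer
      from sklearn.linear_model import SGDClassifier
      from sklearn.pipeline import make_pipeline

      reads = ["ACGTACGTGGTAACGT", "TTGACCAGTACAGGTA",
               "GGCATCGATCGTACGT", "ACGTACGAGGTAACGA"]
      labels = ["cladeA", "cladeB", "cladeA", "cladeA"]   # hypothetical taxa

      clf = make_pipeline(
          # character 12-mers hashed into 2**20 dimensions (no vocabulary kept)
          HashingVectorizer(analyzer="char", ngram_range=(12, 12), n_features=2 ** 20),
          SGDClassifier(),   # linear classifier trainable chunk by chunk
      )
      clf.fit(reads, labels)
      print(clf.predict(["ACGTACGTGGTAACGA"]))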

  3. Coadaptive brain-machine interface via reinforcement learning.

    PubMed

    DiGiovanna, Jack; Mahmoudi, Babak; Fortes, Jose; Principe, Jose C; Sanchez, Justin C

    2009-01-01

    This paper introduces and demonstrates a novel brain-machine interface (BMI) architecture based on the concepts of reinforcement learning (RL), coadaptation, and shaping. RL allows the BMI control algorithm to learn to complete tasks from interactions with the environment, rather than from an explicit training signal. Coadaptation enables continuous, synergistic adaptation between the BMI control algorithm and the BMI user working in changing environments. Shaping is designed to reduce the learning curve for BMI users attempting to control a prosthetic. Here, we present the theory and an in vivo experimental paradigm to illustrate how this BMI learns to complete a reaching task with a prosthetic arm in a 3-D workspace based on the user's neuronal activity. This semisupervised learning framework does not require user movements. We quantify BMI performance in closed-loop brain control over six to ten days for three rats as a function of increasing task difficulty. All three subjects coadapted with their BMI control algorithms to control the prosthetic significantly above chance at each level of difficulty.
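
    As a drastically simplified illustration of learning a reaching task from reward alone, the sketch below runs tabular Q-learning on a one-dimensional reach; the states, actions, and rewards are hypothetical, far from the paper's neural-decoding setting.

      import numpy as np

      rng = np.random.default_rng(0)
      n_states, target = 10, 9          # 1-D workspace; actions: 0 = left, 1 = right
      Q = np.zeros((n_states, 2))
      alpha, gamma, eps = 0.1, 0.95, 0.1

      for episode in range(500):
          s = 0
          for _ in range(50):
              a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
              s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
              r = 1.0 if s_next == target else 0.0
              Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
              s = s_next
              if r > 0:
                  break

      print(Q.argmax(axis=1))   # learned policy should mostly point toward the target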

  4. Semi-supervised and unsupervised extreme learning machines.

    PubMed

    Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

    2014-12-01

    Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) inherit the learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. An empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency. PMID:25415946

  5. Extremely Randomized Machine Learning Methods for Compound Activity Prediction.

    PubMed

    Czarnecki, Wojciech M; Podlewska, Sabina; Bojarski, Andrzej J

    2015-11-09

    Speed, relatively low computational requirements, and high effectiveness in evaluating compound bioactivity have caused a rapid growth of interest in applying machine learning methods to virtual screening tasks. However, as the amount of data in cheminformatics and related fields has grown, research has shifted not only towards developing algorithms of high predictive power but also towards simplifying existing methods to obtain results more quickly. In this study, we tested two approaches belonging to the group of so-called 'extremely randomized methods', the Extreme Entropy Machine and Extremely Randomized Trees, for their ability to properly identify compounds that have activity towards particular protein targets. These methods were compared with their 'non-extreme' competitors, i.e., Support Vector Machine and Random Forest. The extreme approaches not only improved the classification of bioactive compounds but also proved less computationally complex, requiring fewer steps in the optimization procedure.
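
    A compact sketch of such a comparison on synthetic fingerprint-like data (the study's datasets and tuning are not reproduced here):

      from sklearn.datasets import make_classification
      from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      # binary bioactivity stand-in: 500 compounds, 100 fingerprint-like features
      X, y = make_classification(n_samples=500, n_features=100, random_state=0)

      for name, clf in [("ExtraTrees", ExtraTreesClassifier(random_state=0)),
                        ("RandomForest", RandomForestClassifier(random_state=0))]:
          print(name, cross_val_score(clf, X, y, cv=5).mean())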

  6. GeneRIF indexing: sentence selection based on machine learning

    PubMed Central

    2013-01-01

    Background A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function. Results We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. Conclusions The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species. PMID:23725347
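
    A minimal stand-in for the sentence-selection step: bag-of-words counts plus a sentence-location feature feed a Naive Bayes classifier. The sentences, locations, and labels are invented for illustration.

      import numpy as np
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB

      sentences = ["We show that GENE1 regulates apoptosis in tumor cells.",
                   "Samples were stored at -80 degrees until analysis.",
                   "Loss of GENE2 expression predicts poor survival."]
      location = np.array([[9], [3], [10]])  # position of each sentence in its abstract
      labels = [1, 0, 1]                     # 1 = select sentence for GeneRIF

      bow = CountVectorizer().fit_transform(sentences).toarray()
      X = np.hstack([bow, location])         # MultinomialNB expects nonnegative counts
      print(MultinomialNB().fit(X, labels).predict(X))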

  7. Calibrating Building Energy Models Using Supercomputer Trained Machine Learning Agents

    SciTech Connect

    Sanyal, Jibonananda; New, Joshua Ryan; Edwards, Richard; Parker, Lynne Edwards

    2014-01-01

    Building Energy Modeling (BEM) is an approach to model the energy usage in buildings for design and retrofit purposes. EnergyPlus is the flagship Department of Energy software that performs BEM for different types of buildings. The input to EnergyPlus often comprises on the order of a few thousand parameters, which must be calibrated manually by an expert for realistic energy modeling. This is challenging and expensive, making building energy modeling infeasible for smaller projects. In this paper, we describe the Autotune research effort, which employs machine learning algorithms to generate agents for the different kinds of standard reference buildings in the U.S. building stock. The parametric space and the variety of building locations and types make this a challenging computational problem necessitating the use of supercomputers. Millions of EnergyPlus simulations are run on supercomputers and subsequently used to train machine learning algorithms to generate agents. These agents, once created, can run in a fraction of the time, thereby allowing cost-effective calibration of building models.
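
    The core idea, training a fast surrogate 'agent' on a large batch of simulation runs so that later calibration avoids the full simulator, can be sketched as follows (synthetic parameters and outputs stand in for EnergyPlus runs):

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      params = rng.uniform(size=(10_000, 8))   # sampled building parameters (stand-ins)
      # pretend simulator output: energy use as an unknown function of the parameters
      energy = params @ rng.uniform(1, 3, size=8) + rng.normal(0, 0.05, 10_000)

      agent = RandomForestRegressor(n_jobs=-1, random_state=0).fit(params, energy)
      # the trained agent now approximates the simulator in a fraction of the time
      print(agent.predict(params[:2]), energy[:2])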

  8. Machine Learning for Quantum Metrology and Quantum Control

    NASA Astrophysics Data System (ADS)

    Sanders, Barry; Zahedinejad, Ehsan; Palittapongarnpim, Pantita

    Generating quantum metrological procedures and quantum gate designs, subject to constraints such as temporal or particle-number bounds or limits on the number of control parameters, is typically computationally hard. Although greedy machine learning algorithms are ubiquitous for tackling these problems, the severe constraints listed above limit the efficacy of such approaches. Our aim is to devise heuristic machine learning techniques to generate tractable procedures for adaptive quantum metrology and quantum gate design. In particular, we have modified differential evolution to generate adaptive interferometric-phase quantum metrology procedures for up to 100 photons including loss and noise, and we have generated policies for designing single-shot high-fidelity three-qubit gates in superconducting circuits by avoided level crossings. Although quantum metrology and quantum control are regarded as disparate, we have developed a unified framework for these two subjects, and this unification enables us to transfer insights and breakthroughs from one topic to the other. Thanks to NSERC, AITF and 1000 Talent Plan.
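
    As a toy rendering of the optimization loop (SciPy's stock differential evolution standing in for the authors' modified variant, and a single-parameter, single-qubit X gate standing in for the three-qubit problem):

      import numpy as np
      from scipy.optimize import differential_evolution

      X_gate = np.array([[0.0, 1.0], [1.0, 0.0]])       # target unitary

      def infidelity(theta):
          # stand-in control: rotation about x by angle theta[0]
          c, s = np.cos(theta[0] / 2), np.sin(theta[0] / 2)
          U = np.array([[c, -1j * s], [-1j * s, c]])
          fidelity = np.abs(np.trace(X_gate.conj().T @ U)) / 2
          return 1.0 - fidelity

      result = differential_evolution(infidelity, bounds=[(0.0, 2 * np.pi)], seed=0)
      print(result.x, 1.0 - result.fun)   # angle near pi, fidelity near 1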

  9. Overlay improvements using a real time machine learning algorithm

    NASA Astrophysics Data System (ADS)

    Schmitt-Weaver, Emil; Kubis, Michael; Henke, Wolfgang; Slotboom, Daan; Hoogenboom, Tom; Mulkens, Jan; Coogans, Martyn; ten Berge, Peter; Verkleij, Dick; van de Mast, Frank

    2014-04-01

    As semiconductor manufacturing moves towards the 14nm node using immersion lithography, overlay requirements are tightened to below 5nm. Next to improvements in the immersion scanner platform, enhancements in overlay optimization and process control are needed to enable these low overlay numbers. Whereas conventional overlay control methods address wafer and lot variation autonomously with wafer pre-exposure alignment metrology and post-exposure overlay metrology, we see a need to reduce these variations by correlating more of the TWINSCAN system's sensor data directly to the post-exposure YieldStar metrology in time. In this paper we present the results of a study on applying a real-time control algorithm based on machine learning technology. Machine learning methods use context and TWINSCAN system sensor data paired with post-exposure YieldStar metrology to recognize generic behavior and train the control system to anticipate this generic behavior. Specific to this study, the data concern immersion scanner context, sensor data and on-wafer measured overlay data. By making the link between the scanner data and the wafer data we are able to establish a real-time relationship. The result is an inline controller that accounts for small changes in scanner hardware performance over time while picking up subtle lot-to-lot and wafer-to-wafer deviations introduced by wafer processing.

  10. Effective and efficient optics inspection approach using machine learning algorithms

    SciTech Connect

    Abdulla, G; Kegelmeyer, L; Liao, Z; Carr, W

    2010-11-02

    The Final Optics Damage Inspection (FODI) system automatically acquires images of the final optics at the National Ignition Facility (NIF), and the Optics Inspection (OI) system is used to analyze them. During each inspection cycle, up to 1000 images acquired by FODI are examined by OI to identify and track damage sites on the optics. The process of tracking growing damage sites on the surface of an optic can be made more effective by identifying and removing signals associated with debris or reflections. The manual process to filter these false sites is daunting and time-consuming. In this paper we discuss the use of machine learning tools and data mining techniques to help with this task. We describe the process of preparing a data set that can be used for training and identifying hardware reflections in the image data. To collect training data, the images are first automatically acquired and analyzed with existing software, and then relevant features such as spatial, physical and luminosity measures are extracted for each site. A subset of these sites is 'truthed', or manually assigned a class, to create training data. A supervised classification algorithm is used to test whether the features can predict the class membership of new sites. A suite of self-configuring machine learning tools called 'Avatar Tools' is applied to classify all sites. To verify, we used 10-fold cross-validation and found the accuracy was above 99%. This substantially reduces the number of false alarms that would otherwise be sent for more extensive investigation.

  11. Parsimonious kernel extreme learning machine in primal via Cholesky factorization.

    PubMed

    Zhao, Yong-Ping

    2016-08-01

    Recently, the extreme learning machine (ELM) has become a popular topic in the machine learning community. By replacing the so-called ELM feature mappings with nonlinear mappings induced by kernel functions, two kernel ELMs, i.e., P-KELM and D-KELM, are obtained from the primal and dual perspectives, respectively. Unfortunately, both P-KELM and D-KELM possess dense solutions in direct proportion to the number of training data. To this end, a constructive algorithm for P-KELM (CCP-KELM) is first proposed by virtue of Cholesky factorization, in which the training data incurring the largest reductions on the objective function are recruited as significant vectors. To reduce its training cost further, PCCP-KELM is then obtained by applying a probabilistic speedup scheme to CCP-KELM. Corresponding to CCP-KELM, a destructive P-KELM (CDP-KELM) is presented using a partial Cholesky factorization strategy, where the training data incurring the smallest reductions on the objective function after their removal are pruned from the current set of significant vectors. Finally, to verify the efficacy and feasibility of the proposed algorithms, experiments on both small and large benchmark data sets are investigated. PMID:27203553
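
    The Cholesky-based solve at the core of such kernel ELMs fits in a few lines; this is the plain batch solution on random data, without the paper's constructive or destructive selection of significant vectors (the kernel and regularization constant are assumptions):

      import numpy as np
      from scipy.linalg import cho_factor, cho_solve
      from sklearn.metrics.pairwise import rbf_kernel

      rng = np.random.default_rng(0)
      X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

      K = rbf_kernel(X)                        # kernel matrix on training data
      C = 10.0                                 # regularization constant (assumed)
      alpha = cho_solve(cho_factor(K + np.eye(len(X)) / C), y)

      X_new = rng.normal(size=(3, 5))
      print(rbf_kernel(X_new, X) @ alpha)      # predictions for new points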

  12. Machine learning of molecular electronic properties in chemical compound space

    NASA Astrophysics Data System (ADS)

    Montavon, Grégoire; Rupp, Matthias; Gobre, Vivekanand; Vazquez-Mayagoitia, Alvaro; Hansen, Katja; Tkatchenko, Alexandre; Müller, Klaus-Robert; Anatole von Lilienfeld, O.

    2013-09-01

    The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies. The machine learning model is based on a deep multi-task artificial neural network, exploiting the underlying correlations between various molecular properties. The input is identical to ab initio methods, i.e. nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a ‘quantum machine’ is similar, and sometimes superior, to modern quantum-chemical methods—at negligible computational cost.
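
    A hedged, scaled-down sketch of the multi-task idea, one network with shared hidden layers regressing several targets at once; random descriptors and synthetic 'properties' stand in for the quantum-chemistry data:

      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 50))               # molecular descriptors (stand-ins)
      W = rng.normal(size=(50, 4))
      Y = X @ W + rng.normal(0, 0.1, (1000, 4))     # four correlated "properties"

      net = MLPRegressor(hidden_layer_sizes=(400, 100), max_iter=500, random_state=0)
      net.fit(X, Y)                                 # one model, multiple targets
      print(net.predict(X[:2]))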

  13. Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Ntampaka, M.; Trac, H.; Sutherland, D. J.; Fromenteau, S.; Póczos, B.; Schneider, J.

    2016-11-01

    We study dynamical mass measurements of galaxy clusters contaminated by interlopers and show that a modern machine learning algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create two mock catalogs from Multidark’s publicly available N-body MDPL1 simulation, one with perfect galaxy cluster membership information and the other where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power-law scaling relation to infer cluster mass from galaxy line-of-sight (LOS) velocity dispersion. Assuming perfect membership knowledge, this unrealistic case produces a wide fractional mass error distribution, with a width of Δε ≈ 0.87. Interlopers introduce additional scatter, significantly widening the error distribution further (Δε ≈ 2.13). We employ the support distribution machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (Δε ≈ 0.67) for the contaminated case. Remarkably, SDM applied to contaminated clusters is better able to recover masses than even the scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.

  14. Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal; Fromenteau, Sebastien; Poczos, Barnabas; Schneider, Jeff

    2016-01-01

    Galaxy clusters are a rich source of information for examining fundamental astrophysical processes and cosmological parameters, however, employing clusters as cosmological probes requires accurate mass measurements derived from cluster observables. We study dynamical mass measurements of galaxy clusters contaminated by interlopers, and show that a modern machine learning (ML) algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create a mock catalog from Multidark's publicly-available N-body MDPL1 simulation where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power law scaling relation to infer cluster mass from galaxy line of sight (LOS) velocity dispersion. The presence of interlopers in the catalog produces a wide, flat fractional mass error distribution, with width = 2.13. We employ the Support Distribution Machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (width = 0.67). Remarkably, SDM applied to contaminated clusters is better able to recover masses than even a scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.

  15. Machine Learning for Knowledge Extraction from PHR Big Data.

    PubMed

    Poulymenopoulou, Michaela; Malamateniou, Flora; Vassilacopoulos, George

    2014-01-01

    Cloud computing, Internet of Things (IoT) and NoSQL database technologies can support a new generation of cloud-based PHR services that contain heterogeneous (unstructured, semi-structured and structured) patient data (health, social and lifestyle) from various sources, including automatically transmitted data from Internet-connected devices in the patient's living space (e.g. medical devices connected to patients at home care). The patient data stored in such PHR systems constitute big data whose analysis with appropriate machine learning algorithms is expected to improve diagnosis and treatment accuracy, to cut healthcare costs and, hence, to improve the overall quality and efficiency of healthcare provided. This paper describes a health data analytics engine which uses machine learning algorithms for analyzing cloud-based PHR big health data towards knowledge extraction to support better healthcare delivery as regards disease diagnosis and prognosis. This engine comprises data preparation, model generation and data analysis modules and runs on the cloud, taking advantage of the map/reduce paradigm provided by Apache Hadoop. PMID:25000009

  16. Estimation of Alpine Skier Posture Using Machine Learning Techniques

    PubMed Central

    Nemec, Bojan; Petrič, Tadej; Babič, Jan; Supej, Matej

    2014-01-01

    High precision Global Navigation Satellite System (GNSS) measurements are becoming more and more popular in alpine skiing due to the relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements that are defined with the antenna placed typically behind the skier's neck. A key issue is how to estimate other more relevant parameters of the skier's body, like the center of mass (COM) and ski trajectories. Previously, these parameters were estimated by modeling the skier's body with an inverted-pendulum model that oversimplified the skier's body. In this study, we propose two machine learning methods that overcome this shortcoming and estimate the COM and ski trajectories based on a more faithful approximation of the skier's body with nine degrees of freedom. The first method utilizes a well-established approach of artificial neural networks, while the second method is based on a state-of-the-art statistical generalization method. Both methods were evaluated using the reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. Our results outperform those of the commonly used inverted-pendulum methods and demonstrate the applicability of machine learning techniques in biomechanical measurements of alpine skiing. PMID:25313492

  17. Analyzing angle crashes at unsignalized intersections using machine learning techniques.

    PubMed

    Abdel-Aty, Mohamed; Haleem, Kirolos

    2011-01-01

    A recently developed machine learning technique, multivariate adaptive regression splines (MARS), is introduced in this study to predict vehicles' angle crashes. MARS has a promising prediction power, and does not suffer from interpretation complexity. Negative Binomial (NB) and MARS models were fitted and compared using extensive data collected on unsignalized intersections in Florida. Two models were estimated for angle crash frequency at 3- and 4-legged unsignalized intersections. Treating crash frequency as a continuous response variable for fitting a MARS model was also examined by considering the natural logarithm of the crash frequency. Finally, combining MARS with another machine learning technique (random forest) was explored and discussed. The fitted NB angle crash models showed several significant factors that contribute to angle crash occurrence at unsignalized intersections such as traffic volume on the major road, the upstream distance to the nearest signalized intersection, the distance between successive unsignalized intersections, median type on the major approach, percentage of trucks on the major approach, size of the intersection and the geographic location within the state. Based on the mean square prediction error (MSPE) assessment criterion, MARS outperformed the corresponding NB models. Also, using MARS for predicting continuous response variables yielded more favorable results than predicting discrete response variables. The generated MARS models showed the most promising results after screening the covariates using random forest. Based on the results of this study, MARS is recommended as an efficient technique for predicting crashes at unsignalized intersections (angle crashes in this study).

  18. Edge detection in grayscale imagery using machine learning

    SciTech Connect

    Glocer, K. A.; Perkins, S. J.

    2004-01-01

    Edge detection can be formulated as a binary classification problem at the pixel level with the goal of identifying individual pixels as either on-edge or off-edge. To solve this classification problem we use both fixed and adaptive feature selection in conjunction with a support vector machine. This approach provides a direct data-driven solution and does not require the intermediate step of learning a distribution to perform a likelihood-based classification. Furthermore, the approach can readily be adapted for other image processing tasks. The algorithm was tested on a data set of 50 object images, each associated with a hand-drawn 'ground truth' image. We computed ROC curves to evaluate the performance of the general feature extraction and machine learning approach, and compared that to the standard Canny edge detector and with recent work on statistical edge detection. Using a direct pixel-by-pixel error metric enabled us to compare to the statistical edge detection approach, and our algorithm compared favorably. Using a more 'natural' metric enabled comparison with work by the authors of the image data set, and our algorithm performed comparably to the suite of state-of-the-art edge detectors in that study.
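
    A minimal rendering of this formulation, with per-pixel gradient features, an SVM classifier, and a toy image plus hand-drawn 'ground truth' as stand-ins:

      import numpy as np
      from scipy import ndimage
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      img = np.zeros((32, 32))
      img[:, 16:] = 1.0                           # toy image with a single vertical edge
      img += rng.normal(0, 0.05, img.shape)

      gx, gy = ndimage.sobel(img, axis=0), ndimage.sobel(img, axis=1)
      feats = np.stack([gx, gy, np.hypot(gx, gy)], axis=-1).reshape(-1, 3)
      truth = np.zeros((32, 32), dtype=bool)
      truth[:, 15:17] = True                      # hand-drawn 'ground truth' edge pixels

      clf = SVC().fit(feats, truth.ravel())       # per-pixel on-edge / off-edge classifier
      pred = clf.predict(feats).reshape(32, 32)
      print("on-edge recall:", pred[truth].mean())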

  19. Prediction of brain tumor progression using a machine learning technique

    NASA Astrophysics Data System (ADS)

    Shen, Yuzhong; Banerjee, Debrup; Li, Jiang; Chandler, Adam; Shen, Yufei; McKenzie, Frederic D.; Wang, Jihong

    2010-03-01

    A machine learning technique is presented for assessing brain tumor progression by exploring six patients' complete MRI records scanned during their visits in the past two years. There are ten MRI series, including the diffusion tensor image (DTI), for each visit. After registering all series to the corresponding DTI scan at the first visit, annotated normal and tumor regions were overlaid. The intensity value of each pixel inside the annotated regions was then extracted across all ten MRI series to compose a 10-dimensional feature vector. Each feature vector falls into one of three categories: normal, tumor, and normal but progressed to tumor at a later time. In this preliminary study, we focused on the trend of brain tumor progression during three consecutive visits, i.e., visits A, B, and C. A machine learning algorithm was trained using the data containing information from visit A to visit B, and the trained model was used to predict tumor progression from visit A to visit C. Preliminary results showed that prediction of brain tumor progression is feasible. An average pixel-wise accuracy of 80.9% was achieved for tumor progression prediction at visit C.

  1. Machine Learning Based Road Detection from High Resolution Imagery

    NASA Astrophysics Data System (ADS)

    Lv, Ye; Wang, Guofeng; Hu, Xiangyun

    2016-06-01

    At present, remote sensing technology is the best tool for gathering information about the Earth's surface, and it is very useful in geo-information updating and related applications. Extracting roads from remote sensing images is one of the biggest demands of rapid city development and has therefore become a hot issue. Roads in high-resolution images are complex and their patterns vary widely, which poses obstacles for road extraction. In this paper, a machine learning based strategy is presented that uses geometry, radiation, topology and texture features. Because high-resolution remote sensing images cover large areas of landscape, road extraction over the full scene is slow; road ROIs are therefore first detected using Hough line detection and a buffering method to narrow down the detection area. As roads in high-resolution images normally have a ribbon shape, mean-shift and watershed segmentation methods are used to extract road segments. Then, the Real AdaBoost supervised machine learning algorithm is used to pick out segments that contain road patterns. Finally, geometric shape analysis and morphology methods are used to prune and restore the whole road area and to detect the road centerlines.

  2. Forecasting daily streamflow using online sequential extreme learning machines

    NASA Astrophysics Data System (ADS)

    Lima, Aranildo R.; Cannon, Alex J.; Hsieh, William W.

    2016-06-01

    While nonlinear machine learning methods have been widely used in environmental forecasting, in situations where new data arrive continually, the need to make frequent model updates can become cumbersome and computationally costly. To alleviate this problem, an online sequential learning algorithm for single hidden layer feedforward neural networks, the online sequential extreme learning machine (OSELM), is updated inexpensively as new data arrive (and the new data can then be discarded). OSELM was applied to forecast daily streamflow at two small watersheds in British Columbia, Canada, at lead times of 1-3 days. Predictors used were weather forecast data generated by the NOAA Global Ensemble Forecasting System (GEFS) and local hydro-meteorological observations. OSELM forecasts were tested with daily, monthly or yearly model updates. More frequent updating gave smaller forecast errors, including errors for data above the 90th percentile. Larger datasets used in the initial training of OSELM helped to find better parameters (number of hidden nodes) for the model, yielding better predictions. With online sequential multiple linear regression (OSMLR) as a benchmark, we conclude that OSELM is an attractive approach, as it easily outperformed OSMLR in forecast accuracy.
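
    The sequential step that makes OSELM inexpensive is a recursive least-squares update of the output weights; a bare-bones sketch on random data (hypothetical sizes, tanh hidden layer) follows.

      import numpy as np

      rng = np.random.default_rng(0)
      n_in, n_hidden = 5, 50
      W = rng.normal(size=(n_in, n_hidden))     # fixed random input weights
      b = rng.normal(size=n_hidden)
      hidden = lambda X: np.tanh(X @ W + b)

      # initial batch solve
      X0, y0 = rng.normal(size=(100, n_in)), rng.normal(size=100)
      H0 = hidden(X0)
      P = np.linalg.inv(H0.T @ H0 + 1e-3 * np.eye(n_hidden))
      beta = P @ H0.T @ y0

      # one sequential update; the chunk can be discarded afterwards
      X1, y1 = rng.normal(size=(20, n_in)), rng.normal(size=20)
      H1 = hidden(X1)
      P -= P @ H1.T @ np.linalg.inv(np.eye(len(X1)) + H1 @ P @ H1.T) @ H1 @ P
      beta += P @ H1.T @ (y1 - H1 @ beta)
      print(beta[:3])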

  3. Predicting submicron air pollution indicators: a machine learning approach.

    PubMed

    Pandey, Gaurav; Zhang, Bin; Jian, Le

    2013-05-01

    The regulation of air pollutant levels is rapidly becoming one of the most important tasks for the governments of developing countries, especially China. Submicron particles, such as ultrafine particles (UFP, aerodynamic diameter ≤ 100 nm) and particulate matter ≤ 1.0 micrometers (PM1.0), are an unregulated emerging health threat to humans, but the relationships between the concentration of these particles and meteorological and traffic factors are poorly understood. To shed some light on these connections, we employed a range of machine learning techniques to predict UFP and PM1.0 levels based on a dataset consisting of observations of weather and traffic variables recorded at a busy roadside in Hangzhou, China. Based upon the thorough examination of over twenty-five classifiers used for this task, we find that it is possible to predict PM1.0 and UFP levels reasonably accurately and that tree-based classification models (Alternating Decision Tree and Random Forests) perform the best for both these particles. In addition, weather variables show a stronger relationship with PM1.0 and UFP levels, and thus cannot be ignored for predicting submicron particle levels. Overall, this study has demonstrated the potential application value of systematically collecting and analysing datasets using machine learning techniques for the prediction of submicron sized ambient air pollutants.

  4. Distributed machine learning: Scaling up with coarse-grained parallelism

    SciTech Connect

    Provost, F.J.; Hennessy, D.N.

    1994-12-31

    Machine learning methods are becoming accepted as additions to the biologist's data-analysis tool kit. However, scaling these techniques up to large data sets, such as those in biological and medical domains, is problematic in terms of both the required computational search effort and the required memory (and the detrimental effects of excessive swapping). Our approach to tackling the problem of scaling up to large datasets is to take advantage of the ubiquitous workstation networks that are generally available in scientific and engineering environments. This paper introduces the notion of the invariant-partitioning property: that for certain evaluation criteria it is possible to partition a data set across multiple processors such that any rule that is satisfactory over the entire data set will also be satisfactory on at least one subset. In addition, by taking advantage of cooperation through interprocess communication, it is possible to build distributed learning algorithms such that only rules that are satisfactory over the entire data set will be learned. We describe a distributed learning system, CorPRL, that takes advantage of the invariant-partitioning property to learn from very large data sets, and present results demonstrating CorPRL's effectiveness in analyzing data from two databases.

  5. Application of machine learning using support vector machines for crater detection from Martian digital topography data

    NASA Astrophysics Data System (ADS)

    Salamunićcar, Goran; Lončarić, Sven

    In our previous work, in order to extend the GT-57633 catalogue [PSS, 56 (15), 1992-2008] with still uncatalogued impact-craters, the following was done [GRS, 48 (5), in press, doi:10.1109/TGRS.2009.2037750]: (1) a crater detection algorithm (CDA) based on the digital elevation model (DEM) was developed; (2) using 1/128° MOLA data, this CDA proposed 414631 crater-candidates; (3) each crater-candidate was analyzed manually; and (4) 57592 were confirmed as correct detections. The resulting GT-115225 catalog is the significant result of this effort. However, checking such a large number of crater-candidates manually was a demanding task. This was the main motivation for work on improving the CDA to provide better classification of crater-candidates as true and false detections. To achieve this, we extended the CDA with a machine learning capability, using support vector machines (SVM). In the first step, the CDA (re)calculates numerous terrain morphometric attributes from the DEM. For this purpose, existing modules of the CDA from our previous work were reused to prepare these attributes. In addition, new attributes were introduced, such as ellipse eccentricity and tilt. For machine learning purposes, the CDA is additionally extended to provide a 2-D topography profile and a 3-D shape for each crater-candidate. The latter two pose a performance problem because of the large number of crater-candidates in combination with the large number of attributes. As a solution, we developed a CDA architecture wherein it is possible to combine the SVM with a radial basis function (RBF) or any other kernel (for the initial set of attributes) with the SVM with a linear kernel (for the cases when 2-D and 3-D data are included as well). Another challenge is that, in addition to the diversity of possible crater types, there are numerous morphological differences between the smallest (mostly very circular bowl-shaped craters) and the largest (multi-ring) impact

  6. Two-Stage Machine Learning model for guideline development.

    PubMed

    Mani, S; Shankle, W R; Dick, M B; Pazzani, M J

    1999-05-01

    We present a Two-Stage Machine Learning (ML) model as a data mining method to develop practice guidelines and apply it to the problem of dementia staging. Dementia staging in clinical settings is at present complex and highly subjective because of the ambiguities and the complicated nature of existing guidelines. Our model abstracts the two-stage process used by physicians to arrive at the global Clinical Dementia Rating Scale (CDRS) score. The model incorporates learning intermediate concepts (CDRS category scores) in the first stage that then become the feature space for the second stage (global CDRS score). The sample consisted of 678 patients evaluated in the Alzheimer's Disease Research Center at the University of California, Irvine. The demographic variables, functional and cognitive test results used by physicians for the task of dementia severity staging were used as input to the machine learning algorithms. Decision tree learners and rule inducers (C4.5, Cart, C4.5 rules) were selected for our study as they give expressive models, and Naive Bayes was used as a baseline algorithm for comparison purposes. We first learned the six CDRS category scores (memory, orientation, judgement and problem solving, personal care, home and hobbies, and community affairs). These learned CDRS category scores were then used to learn the global CDRS scores. The Two-Stage ML model classified as well as or better than the published inter-rater agreements for both the category and global CDRS scoring by dementia experts. Furthermore, for the most critical distinction, normal versus very mildly impaired, the Two-Stage ML model was 28.1 and 6.6% more accurate than published performances by domain experts. Our study of the CDRS examined one of the largest, most diverse samples in the literature, suggesting that our findings are robust. The Two-Stage ML model also identified a CDRS category, Judgment and Problem Solving, which has low classification accuracy similar to published

  7. Of Genes and Machines: Application of a Combination of Machine Learning Tools to Astronomy Data Sets

    NASA Astrophysics Data System (ADS)

    Heinis, S.; Kumar, S.; Gezari, S.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Waters, C.

    2016-04-01

    We apply a combination of genetic algorithm (GA) and support vector machine (SVM) machine learning algorithms to solve two important problems faced by the astronomical community: star–galaxy separation and photometric redshift estimation of galaxies in survey catalogs. We use the GA to select the relevant features in the first step, followed by optimization of SVM parameters in the second step to obtain an optimal set of parameters to classify or regress, in the process of which we avoid overfitting. We apply our method to star–galaxy separation in Pan-STARRS1 data. We show that our method correctly classifies 98% of objects down to i_P1 = 24.5, with a completeness (or true positive rate) of 99% for galaxies and 88% for stars. By combining colors with morphology, our star–galaxy separation method yields better results than the new SExtractor classifier spread_model, in particular at the faint end (i_P1 > 22). We also use our method to derive photometric redshifts for galaxies in the COSMOS bright multiwavelength data set down to an error in (1+z) of σ = 0.013, which compares well with estimates from spectral energy distribution fitting on the same data (σ = 0.007) while making a significantly smaller number of assumptions.

  8. Machine Learning Techniques for Combining Multi-Model Climate Projections (Invited)

    NASA Astrophysics Data System (ADS)

    Monteleoni, C.

    2013-12-01

    The threat of climate change is one of the greatest challenges currently facing society. Given the profound impact machine learning has made on the natural sciences to which it has been applied, such as the field of bioinformatics, machine learning is poised to accelerate discovery in climate science. Recent advances in the fledgling field of climate informatics have demonstrated the promise of machine learning techniques for problems in climate science. A key problem in climate science is how to combine the projections of the multi-model ensemble of global climate models that inform the Intergovernmental Panel on Climate Change (IPCC). I will present three approaches to this problem. Our Tracking Climate Models (TCM) work demonstrated the promise of an algorithm for online learning with expert advice, for this task. Given temperature projections and hindcasts from 20 IPCC global climate models, and over 100 years of historical temperature data, TCM generated predictions that tracked the changing sequence of which model currently predicts best. On historical data, at both annual and monthly time-scales, and in future simulations, TCM consistently outperformed the average over climate models, the existing benchmark in climate science, at both global and continental scales. We then extended TCM to take into account climate model projections at higher spatial resolutions, and to model geospatial neighborhood influence between regions. Our second algorithm enables neighborhood influence by modifying the transition dynamics of the Hidden Markov Model from which TCM is derived, allowing the performance of spatial neighbors to influence the temporal switching probabilities for the best climate model at a given location. We recently applied a third technique, sparse matrix completion, in which we create a sparse (incomplete) matrix from climate model projections/hindcasts and observed temperature data, and apply a matrix completion algorithm to recover it, yielding
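
    The flavor of the expert-advice approach can be conveyed in a few lines: an exponentially weighted forecaster reweights ensemble members by their recent squared error. The data and the update rule below are illustrative, not TCM's exact algorithm.

      import numpy as np

      rng = np.random.default_rng(0)
      T, n_models = 200, 20
      preds = rng.normal(size=(T, n_models))        # model projections (stand-ins)
      truth = preds[:, 3] + rng.normal(0, 0.1, T)   # one model tracks "reality"

      weights = np.ones(n_models) / n_models
      eta = 0.5                                     # learning rate (assumed)
      for t in range(T):
          forecast = weights @ preds[t]             # weighted ensemble prediction
          weights *= np.exp(-eta * (preds[t] - truth[t]) ** 2)
          weights /= weights.sum()

      print(weights.argmax())                       # weight concentrates on model 3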

  11. Statistical and Machine-Learning Classifier Framework to Improve Pulse Shape Discrimination System Design

    SciTech Connect

    Wurtz, R.; Kaplan, A.

    2015-10-28

    Pulse shape discrimination (PSD) is a type of statistical classifier. Fully-realized statistical classifiers rely on a comprehensive set of tools for designing, building, and implementing. Advances in PSD rely on improvements to the implemented algorithm, which can draw on conventional statistical classifiers or machine learning methods. This paper provides the reader with a glossary of classifier-building elements and their functions in a fully-designed and operational classifier framework that can be used to discover opportunities for improving PSD classifier projects. This paper recommends reporting the PSD classifier’s receiver operating characteristic (ROC) curve and its behavior at a gamma rejection rate (GRR) relevant for realistic applications.
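
    The recommended ROC/GRR reporting might look like the following sketch. The score distributions and the 99.9% gamma rejection operating point are illustrative assumptions, not values from the paper.

    ```python
    # Sketch: report a PSD classifier's ROC curve and its neutron acceptance at a
    # fixed gamma rejection rate (GRR); scores here are synthetic, for illustration.
    import numpy as np
    from sklearn.metrics import roc_curve, auc

    rng = np.random.default_rng(2)
    gamma_scores = rng.normal(0.0, 1.0, 5000)    # label 0: gamma pulses
    neutron_scores = rng.normal(1.5, 1.0, 500)   # label 1: neutron pulses
    y = np.r_[np.zeros(5000), np.ones(500)]
    s = np.r_[gamma_scores, neutron_scores]

    fpr, tpr, thr = roc_curve(y, s)
    print("AUC:", auc(fpr, tpr))

    grr = 0.999                                  # reject 99.9% of gammas (assumed target)
    i = np.searchsorted(fpr, 1 - grr, side="right") - 1
    print(f"at GRR={grr}: threshold={thr[i]:.2f}, neutron acceptance={tpr[i]:.3f}")
    ```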

  12. Visual tracking based on extreme learning machine and sparse representation.

    PubMed

    Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

    2015-01-01

    The existing sparse representation-based visual trackers mostly suffer from being time consuming and lacking robustness. To address these issues, a novel tracking method is presented that combines sparse representation with an emerging learning technique, namely the extreme learning machine (ELM). Specifically, visual tracking can be divided into two consecutive processes. First, ELM is used to find the optimal separating hyperplane between the target observations and background ones. The trained ELM classification function is thus able to efficiently remove most of the candidate samples related to background contents, reducing the total computational cost of the subsequent sparse representation. Second, to further combine ELM and sparse representation, the resulting confidence values (i.e., probabilities of being the target) of samples under the ELM classification function are used to construct a new manifold learning constraint term for the sparse representation framework, which tends to achieve more robust results. Moreover, the accelerated proximal gradient method is used to derive the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix form solution allows the candidate samples to be calculated in parallel, leading to higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359
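
    At its core, the ELM stage reduces to a random hidden layer with a least-squares readout. The sketch below is a generic minimal ELM classifier on toy 2-D data, not the tracker's implementation; the hidden-layer size and activation are assumptions.

    ```python
    # Minimal extreme learning machine (ELM): random hidden layer, least-squares
    # readout. A generic sketch of the ELM component, not the tracker itself.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    X, y = make_moons(600, noise=0.2, random_state=3)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=3)

    L = 100                                       # hidden neurons (assumed size)
    W = rng.normal(size=(X.shape[1], L))          # random input weights, never trained
    b = rng.normal(size=L)

    def hidden(X):
        return np.tanh(X @ W + b)                 # random feature map

    H = hidden(Xtr)
    T = np.where(ytr == 1, 1.0, -1.0)             # +/-1 targets
    beta = np.linalg.lstsq(H, T, rcond=None)[0]   # only the readout is solved for

    acc = ((hidden(Xte) @ beta > 0).astype(int) == yte).mean()
    print("ELM test accuracy:", round(acc, 3))
    ```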

  13. Visual Tracking Based on Extreme Learning Machine and Sparse Representation

    PubMed Central

    Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

    2015-01-01

    The existing sparse representation-based visual trackers mostly suffer from being time consuming and lacking robustness. To address these issues, a novel tracking method is presented that combines sparse representation with an emerging learning technique, namely the extreme learning machine (ELM). Specifically, visual tracking can be divided into two consecutive processes. First, ELM is used to find the optimal separating hyperplane between the target observations and background ones. The trained ELM classification function is thus able to efficiently remove most of the candidate samples related to background contents, reducing the total computational cost of the subsequent sparse representation. Second, to further combine ELM and sparse representation, the resulting confidence values (i.e., probabilities of being the target) of samples under the ELM classification function are used to construct a new manifold learning constraint term for the sparse representation framework, which tends to achieve more robust results. Moreover, the accelerated proximal gradient method is used to derive the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix form solution allows the candidate samples to be calculated in parallel, leading to higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359

  14. Automatic programming of binary morphological machines by PAC learning

    NASA Astrophysics Data System (ADS)

    Barrera, Junior; Tomita, Nina S.; Correa da Silva, Flavio S.; Terada, Routo

    1995-08-01

    Binary image analysis problems can be solved by set operators implemented as programs for a binary morphological machine (BMM). This is a very general and powerful approach to solve this type of problem. However, the design of these programs is not a task manageable by nonexperts on mathematical morphology. In order to overcome this difficulty we have worked on tools that help users describe their goals at higher levels of abstraction and to translate them into BMM programs. Some of these tools are based on the representation of the goals of the user as a collection of input-output pairs of images and the estimation of the target operator from these data. PAC learning is a well suited methodology for this task, since in this theory 'concepts' are represented as Boolean functions that are equivalent to set operators. In order to apply this technique in practice we must have efficient learning algorithms. In this paper we introduce two PAC learning algorithms, both are based on the minimal representation of Boolean functions, which has a straightforward translation to the canonical decomposition of set operators. The first algorithm is based on the classical Quine-McCluskey algorithm for the simplification of Boolean functions, and the second one is based on a new idea for the construction of Boolean functions: the incremental splitting of intervals. We also present a comparative complexity analysis of the two algorithms. Finally, we give some application examples.

  15. Enhancement of Plant Metabolite Fingerprinting by Machine Learning

    PubMed Central

    Scott, Ian M.; Vermeer, Cornelia P.; Liakata, Maria; Corol, Delia I.; Ward, Jane L.; Lin, Wanchang; Johnson, Helen E.; Whitehead, Lynne; Kular, Baldeep; Baker, John M.; Walsh, Sean; Dave, Anuja; Larson, Tony R.; Graham, Ian A.; Wang, Trevor L.; King, Ross D.; Draper, John; Beale, Michael H.

    2010-01-01

    Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by 1H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, 1H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted. PMID
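
    The random-forest importance scores described above can be reproduced in outline as follows. Synthetic data stand in for the NMR fingerprint bins, so the sketch mirrors the workflow only, not the study's data or results.

    ```python
    # Sketch: random-forest phenotype discrimination with per-feature importance
    # scores, mirroring the RF usage described above (synthetic fingerprints).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                               n_classes=2, random_state=0)  # stand-in for NMR bins
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())

    rf.fit(X, y)
    top = np.argsort(rf.feature_importances_)[::-1][:5]
    print("most discriminating fingerprint features:", top)
    ```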

  16. A Sustainable Model for Integrating Current Topics in Machine Learning Research into the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Georgiopoulos, M.; DeMara, R. F.; Gonzalez, A. J.; Wu, A. S.; Mollaghasemi, M.; Gelenbe, E.; Kysilka, M.; Secretan, J.; Sharma, C. A.; Alnsour, A. J.

    2009-01-01

    This paper presents an integrated research and teaching model that has resulted from an NSF-funded effort to introduce results of current Machine Learning research into the engineering and computer science curriculum at the University of Central Florida (UCF). While in-depth exposure to current topics in Machine Learning has traditionally occurred…

  17. Advancing Research on Undergraduate Science Learning

    ERIC Educational Resources Information Center

    Singer, Susan Rundell

    2013-01-01

    This special issue of "Journal of Research in Science Teaching" reflects conclusions and recommendations in the "Discipline-Based Education Research" (DBER) report and makes a substantial contribution to advancing the field. Research on undergraduate science learning is currently a loose affiliation of related fields. The…

  18. Tasks to Advance the Learning of Mathematics

    ERIC Educational Resources Information Center

    Greenes, Carole

    2014-01-01

    Tasks to Advance the Learning of Mathematics (TALMs) were developed to stimulate grades 5-8 students' curiosity about complex mathematical relationships, inspire them to reason abstractly and quantitatively, encourage them to consider and create alternative solution approaches, develop their skills to persuade others about the viability of one…

  19. Network Modeling and Energy-Efficiency Optimization for Advanced Machine-to-Machine Sensor Networks

    PubMed Central

    Jung, Sungmo; Kim, Jong Hyun; Kim, Seoksoo

    2012-01-01

    Wireless machine-to-machine sensor networks with multiple radio interfaces are expected to have several advantages, including high spatial scalability, low event detection latency, and low energy consumption. Here, we propose a network model design method involving network approximation and an optimized multi-tiered clustering algorithm that maximizes node lifespan by minimizing energy consumption in a non-uniformly distributed network. Simulation results show that the cluster scales and network parameters determined with the proposed method facilitate a more efficient performance compared to existing methods. PMID:23202190

  20. Application of Machine Learning to the Prediction of Vegetation Health

    NASA Astrophysics Data System (ADS)

    Burchfield, Emily; Nay, John J.; Gilligan, Jonathan

    2016-06-01

    This project applies machine learning techniques to remotely sensed imagery to train and validate predictive models of vegetation health in Bangladesh and Sri Lanka. For both locations, we downloaded and processed eleven years of imagery from multiple MODIS datasets, which were combined and transformed into two-dimensional matrices. We applied a gradient boosted machines model to the lagged dataset values to forecast future values of the Enhanced Vegetation Index (EVI). The predictive power of raw spectral data and MODIS products was compared across time periods and land use categories. Our models have significantly more predictive power on held-out datasets than a baseline. Though the tool was built to increase capacity to monitor vegetation health in data-scarce regions like South Asia, users may include ancillary spatiotemporal datasets relevant to their region of interest to increase predictive power and to facilitate interpretation of model results. The tool can automatically update predictions as new MODIS data are made available by NASA. The tool is particularly well-suited for decision makers interested in understanding and predicting vegetation health dynamics in countries in which environmental data are scarce and cloud cover is a significant concern.
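
    A minimal version of the lag-and-forecast workflow might look like the sketch below, with a synthetic seasonal EVI series in place of the MODIS matrices and a persistence baseline for comparison. It illustrates the idea, not the project's tool; the lag count and series shape are assumptions.

    ```python
    # Sketch of forecasting a vegetation index from its own lags with gradient
    # boosting, in the spirit of the pipeline above (synthetic EVI series).
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(4)
    t = np.arange(11 * 46)                        # ~11 years of 8-day composites
    evi = 0.4 + 0.2 * np.sin(2 * np.pi * t / 46) + rng.normal(0, 0.02, t.size)

    n_lags = 6
    X = np.column_stack([evi[i:i - n_lags] for i in range(n_lags)])  # lagged values
    y = evi[n_lags:]
    split = int(0.8 * len(y))                     # hold out the most recent data

    model = GradientBoostingRegressor(random_state=0).fit(X[:split], y[:split])
    pred = model.predict(X[split:])
    baseline = X[split:, -1]                      # persistence: last observed value
    print("GBM MAE:", np.abs(pred - y[split:]).mean())
    print("persistence MAE:", np.abs(baseline - y[split:]).mean())
    ```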

  1. Detecting falls with wearable sensors using machine learning techniques.

    PubMed

    Özdemir, Ahmet Turan; Barshan, Billur

    2014-01-01

    Falls are a serious public health problem and possibly life-threatening for people in fall risk groups. We develop an automated fall detection system with wearable motion sensor units fitted to the subjects' bodies at six different positions. Each unit comprises three tri-axial devices (accelerometer, gyroscope, and magnetometer/compass). Fourteen volunteers perform a standardized set of movements including 20 voluntary falls and 16 activities of daily living (ADLs), resulting in a large dataset with 2520 trials. To reduce the computational complexity of training and testing the classifiers, we focus on the raw data for each sensor in a 4 s time window around the point of peak total acceleration of the waist sensor, and then perform feature extraction and reduction. Most earlier studies on fall detection employ rule-based approaches that rely on simple thresholding of the sensor outputs. We successfully distinguish falls from ADLs using six machine learning techniques (classifiers): the k-nearest neighbor (k-NN) classifier, the least squares method (LSM), support vector machines (SVM), Bayesian decision making (BDM), dynamic time warping (DTW), and artificial neural networks (ANNs). We compare the performance and the computational complexity of the classifiers and achieve the best results with the k-NN classifier and LSM, with sensitivity, specificity, and accuracy all above 99%. These classifiers also have acceptable computational requirements for training and testing. Our approach would be applicable in real-world scenarios where data records of indeterminate length, containing multiple activities in sequence, are recorded.
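
    A stripped-down version of the best-performing setup (k-NN on features extracted from a 4 s window) is sketched below on synthetic acceleration windows. The spike model and the four features are illustrative assumptions, not the study's 2520-trial dataset.

    ```python
    # Sketch: k-NN on simple window features (as in the study's best classifier),
    # with synthetic accelerometer windows standing in for the real recordings.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(5)

    def window(fall):
        sig = rng.normal(1.0, 0.05, 400)           # ~4 s of total acceleration (g)
        if fall:
            sig[180:220] += rng.normal(3.0, 0.5)   # impact spike
        return sig

    X = np.array([[w.max(), w.min(), w.std(), np.abs(np.diff(w)).mean()]
                  for w in (window(i % 2 == 0) for i in range(200))])
    y = np.array([i % 2 == 0 for i in range(200)], dtype=int)

    clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
    ```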

  2. A global prediction of seafloor sediment porosity using machine learning

    NASA Astrophysics Data System (ADS)

    Martin, Kylara M.; Wood, Warren T.; Becker, Joseph J.

    2015-12-01

    Porosity (void ratio) is a critical parameter in models of acoustic propagation, bearing strength, and many other seafloor phenomena. However, as with many seafloor properties, direct measurements are expensive and sparse. We show here how porosity everywhere at the seafloor can be estimated using a machine learning technique (specifically, Random Forests). Such techniques use sparsely acquired direct samples and dense grids of other parameters to produce a statistically optimal estimate where direct measurements are lacking. Our porosity estimate is both qualitatively more consistent with geologic principles than the results produced by interpolation and quantitatively more accurate than results produced by interpolation or regression methods. We present here a seafloor porosity estimate on a 5 arc min, pixel-registered grid, produced using widely available, densely sampled grids of other seafloor properties. These techniques represent the only practical means of estimating seafloor properties in inaccessible regions of the seafloor (e.g., the Arctic).

  3. Machine learning, medical diagnosis, and biomedical engineering research - commentary.

    PubMed

    Foster, Kenneth R; Koprowski, Robert; Skufca, Joseph D

    2014-07-05

    A large number of papers are appearing in the biomedical engineering literature that describe the use of machine learning techniques to develop classifiers for detection or diagnosis of disease. However, the usefulness of this approach in developing clinically validated diagnostic techniques so far has been limited and the methods are prone to overfitting and other problems which may not be immediately apparent to the investigators. This commentary is intended to help sensitize investigators as well as readers and reviewers of papers to some potential pitfalls in the development of classifiers, and suggests steps that researchers can take to help avoid these problems. Building classifiers should be viewed not simply as an add-on statistical analysis, but as part and parcel of the experimental process. Validation of classifiers for diagnostic applications should be considered as part of a much larger process of establishing the clinical validity of the diagnostic technique.

  4. Liquid intake monitoring through breathing signal using machine learning

    NASA Astrophysics Data System (ADS)

    Dong, Bo; Biswas, Subir

    2013-05-01

    This paper presents the design, system structure, and performance of a wireless and wearable diet monitoring system. Food and drink intake can be detected by detecting a person's swallow events. The system works based on the key observation that a person's otherwise continuous breathing process is interrupted by a short apnea when she or he swallows as part of a solid or liquid intake process. We detect the swallows through the difference between a normal breathing cycle and a breathing cycle with swallows, using a wearable chest belt. Three popular machine learning algorithms have been applied to both time and frequency domain features. The discrimination power of the features is then analyzed for applications where only a small number of features is allowed. It is shown that high detection performance can be achieved with only a few features.
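
    The kind of time- and frequency-domain features involved can be sketched on a toy breathing signal with a brief swallow apnea, as below. The sample rate and feature set are assumptions for illustration, not the paper's.

    ```python
    # Sketch: extracting time- and frequency-domain features of the kind fed to
    # the classifiers, from a toy breathing signal with a swallow apnea.
    import numpy as np

    fs = 50                                        # Hz, assumed belt sample rate
    t = np.arange(0, 8, 1 / fs)
    breath = np.sin(2 * np.pi * 0.25 * t)          # ~15 breaths/min
    breath[200:260] *= 0.05                        # brief apnea during a swallow

    def features(x):
        spec = np.abs(np.fft.rfft(x)) ** 2
        freqs = np.fft.rfftfreq(len(x), 1 / fs)
        return {
            "rms": np.sqrt((x ** 2).mean()),                          # time domain
            "zero_crossings": int((np.diff(np.sign(x)) != 0).sum()),
            "dominant_freq_hz": float(freqs[spec[1:].argmax() + 1]),  # skip DC
            "spectral_energy": float(spec.sum()),
        }

    print("normal: ", features(np.sin(2 * np.pi * 0.25 * t)))
    print("swallow:", features(breath))
    ```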

  5. Machine learning, medical diagnosis, and biomedical engineering research - commentary

    PubMed Central

    2014-01-01

    A large number of papers are appearing in the biomedical engineering literature that describe the use of machine learning techniques to develop classifiers for detection or diagnosis of disease. However, the usefulness of this approach in developing clinically validated diagnostic techniques so far has been limited and the methods are prone to overfitting and other problems which may not be immediately apparent to the investigators. This commentary is intended to help sensitize investigators as well as readers and reviewers of papers to some potential pitfalls in the development of classifiers, and suggests steps that researchers can take to help avoid these problems. Building classifiers should be viewed not simply as an add-on statistical analysis, but as part and parcel of the experimental process. Validation of classifiers for diagnostic applications should be considered as part of a much larger process of establishing the clinical validity of the diagnostic technique. PMID:24998888

  6. Nonlinear machine learning and design of reconfigurable digital colloids.

    PubMed

    Long, Andrew W; Phillips, Carolyn L; Jankowksi, Eric; Ferguson, Andrew L

    2016-09-14

    Digital colloids, a cluster of freely rotating "halo" particles tethered to the surface of a central particle, were recently proposed as ultra-high density memory elements for information storage. Rational design of these digital colloids for memory storage applications requires a quantitative understanding of the thermodynamic and kinetic stability of the configurational states within which information is stored. We apply nonlinear machine learning to Brownian dynamics simulations of these digital colloids to extract the low-dimensional intrinsic manifold governing digital colloid morphology, thermodynamics, and kinetics. By modulating the relative size ratio between halo particles and central particles, we investigate the size-dependent configurational stability and transition kinetics for the 2-state tetrahedral (N = 4) and 30-state octahedral (N = 6) digital colloids. We demonstrate the use of this framework to guide the rational design of a memory storage element to hold a block of text that trades off the competing design criteria of memory addressability and volatility. PMID:27498992
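
    The manifold-extraction step can be illustrated generically. The sketch below applies Isomap, one common nonlinear manifold learner (the paper does not necessarily use it), to a swiss-roll dataset standing in for the Brownian-dynamics configurations.

    ```python
    # Sketch of nonlinear manifold learning: recovering low-dimensional intrinsic
    # coordinates from higher-dimensional samples with Isomap (toy data).
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap

    X, t = make_swiss_roll(n_samples=1000, random_state=0)   # 3-D ambient data
    emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
    print("embedding shape:", emb.shape)                     # 2-D intrinsic coordinates
    ```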

  7. Robust Machine Learning Applied to Terascale Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, N. M.; Brunner, R. J.; Myers, A. D.

    2008-08-01

    We present recent results from the Laboratory for Cosmological Data Mining {http://lcdm.astro.uiuc.edu} at the National Center for Supercomputing Applications (NCSA) to provide robust classifications and photometric redshifts for objects in the terascale-class Sloan Digital Sky Survey (SDSS). Through a combination of machine learning in the form of decision trees, k-nearest neighbor, and genetic algorithms, the use of supercomputing resources at NCSA, and the cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million objects in the SDSS, improved photometric redshifts, and a full exploitation of the powerful k-nearest neighbor algorithm. This work is the first to apply the full power of these algorithms to contemporary terascale astronomical datasets, and the improvement over existing results is demonstrable. We discuss issues that we have encountered in dealing with data on the terascale, and possible solutions that can be implemented to deal with upcoming petascale datasets.

  8. Monitoring frog communities: An application of machine learning

    SciTech Connect

    Taylor, A.; Watson, G.; Grigg, G.; McCallum, H.

    1996-12-31

    Automatic recognition of animal vocalizations would be a valuable tool for a variety of biological research and environmental monitoring applications. We report the development of a software system which can recognize the vocalizations of 22 species of frogs which occur in an area of northern Australia. This software system will be used in unattended operation to monitor the effect on frog populations of the introduced Cane Toad. The system is based around classification of local peaks in the spectrogram of the audio signal using Quinlan's machine learning system, C4.5. Unreliable identifications of peaks are aggregated together using a hierarchical structure of segments based on the species' typical temporal vocalization patterns. This produces robust system performance.
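
    In outline, the peak-classification core might look like the sketch below, with scikit-learn's CART decision tree standing in for Quinlan's C4.5 and two synthetic "species" calls in place of field recordings.

    ```python
    # Sketch: classify local spectrogram peaks with a decision tree (CART as a
    # stand-in for C4.5), on two synthetic "species" calls.
    import numpy as np
    from scipy.signal import spectrogram
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(6)
    fs = 8000

    def call(f0):                                  # toy frog call at carrier f0
        t = np.arange(0, 0.5, 1 / fs)
        return np.sin(2 * np.pi * f0 * t) * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 0.1, t.size)

    def peak_features(x):
        f, tt, S = spectrogram(x, fs=fs, nperseg=256)
        i, j = np.unravel_index(S.argmax(), S.shape)   # strongest local peak
        return [f[i], S[i, j], S[:, j].std()]          # peak freq, power, spread

    X = np.array([peak_features(call(f0)) for f0 in [600] * 50 + [1800] * 50])
    y = np.array([0] * 50 + [1] * 50)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    print("CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
    ```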

  9. Combining satellite imagery and machine learning to predict poverty.

    PubMed

    Jean, Neal; Burke, Marshall; Xie, Michael; Davis, W Matthew; Lobell, David B; Ermon, Stefano

    2016-08-19

    Reliable data on economic livelihoods remain scarce in the developing world, hampering efforts to study these outcomes and to design policies that improve them. Here we demonstrate an accurate, inexpensive, and scalable method for estimating consumption expenditure and asset wealth from high-resolution satellite imagery. Using survey and satellite data from five African countries--Nigeria, Tanzania, Uganda, Malawi, and Rwanda--we show how a convolutional neural network can be trained to identify image features that can explain up to 75% of the variation in local-level economic outcomes. Our method, which requires only publicly available data, could transform efforts to track and target poverty in developing countries. It also demonstrates how powerful machine learning techniques can be applied in a setting with limited training data, suggesting broad potential application across many scientific domains.

  10. Gene discovery for facioscapulohumeral muscular dystrophy by machine learning techniques.

    PubMed

    González-Navarro, Félix F; Belanche-Muñoz, Lluís A; Gámez-Moreno, María G; Flores-Ríos, Brenda L; Ibarra-Esquer, Jorge E; López-Morteo, Gabriel A

    2016-04-28

    Facioscapulohumeral muscular dystrophy (FSHD) is a neuromuscular disorder that shows a preference for the facial, shoulder and upper arm muscles. FSHD affects about one in 20-400,000 people, and no effective therapeutic strategies are known to halt disease progression or reverse muscle weakness or atrophy. Many genes may be incorrectly regulated in affected muscle tissue, but the mechanisms responsible for the progressive muscle weakness remain largely unknown. Although machine learning (ML) has made significant inroads in biomedical disciplines such as cancer research, no reports have yet addressed FSHD analysis using ML techniques. This study explores a specific FSHD data set from a ML perspective. We report results showing a very promising small group of genes that clearly separates FSHD samples from healthy samples. In addition to numerical prediction figures, we show data visualizations and biological evidence illustrating the potential usefulness of these results. PMID:26960968

  11. Machine-Learning Techniques Applied to Antibacterial Drug Discovery

    PubMed Central

    Durrant, Jacob D.; Amaro, Rommie E.

    2014-01-01

    The emergence of drug-resistant bacteria threatens to catapult humanity back to the pre-antibiotic era. Even now, multi-drug-resistant bacterial infections annually result in millions of hospital days, billions in healthcare costs, and, most importantly, tens of thousands of lives lost. As many pharmaceutical companies have abandoned antibiotic development in search of more lucrative therapeutics, academic researchers are uniquely positioned to fill the resulting vacuum. Traditional high-throughput screens and lead-optimization efforts are expensive and labor intensive. Computer-aided drug discovery techniques, which are cheaper and faster, can accelerate the identification of novel antibiotics in an academic setting, leading to improved hit rates and faster transitions to pre-clinical and clinical testing. The current review describes two machine-learning techniques, neural networks and decision trees, that have been used to identify experimentally validated antibiotics. We conclude by describing the future directions of this exciting field. PMID:25521642

  12. Combining satellite imagery and machine learning to predict poverty.

    PubMed

    Jean, Neal; Burke, Marshall; Xie, Michael; Davis, W Matthew; Lobell, David B; Ermon, Stefano

    2016-08-19

    Reliable data on economic livelihoods remain scarce in the developing world, hampering efforts to study these outcomes and to design policies that improve them. Here we demonstrate an accurate, inexpensive, and scalable method for estimating consumption expenditure and asset wealth from high-resolution satellite imagery. Using survey and satellite data from five African countries--Nigeria, Tanzania, Uganda, Malawi, and Rwanda--we show how a convolutional neural network can be trained to identify image features that can explain up to 75% of the variation in local-level economic outcomes. Our method, which requires only publicly available data, could transform efforts to track and target poverty in developing countries. It also demonstrates how powerful machine learning techniques can be applied in a setting with limited training data, suggesting broad potential application across many scientific domains. PMID:27540167

  13. Applying Machine Learning to GlueX Data Analysis

    NASA Astrophysics Data System (ADS)

    Boettcher, Thomas

    2014-03-01

    GlueX is a high energy physics experiment with the goal of collecting data necessary for understanding confinement in quantum chromodynamics. Beginning in 2015, GlueX will collect huge amounts of data describing billions of particle collisions. In preparation for data collection, efforts are underway to develop a methodology for analyzing these large data sets. One of the primary challenges in GlueX data analysis is isolating events of interest from a proportionally large background. GlueX has recently begun approaching this selection problem using machine learning algorithms, specifically boosted decision trees. Preliminary studies indicate that these algorithms have the potential to offer vast improvements in both signal selection efficiency and purity over more traditional techniques.

  14. Calibration transfer via an extreme learning machine auto-encoder.

    PubMed

    Chen, Wo-Ruo; Bin, Jun; Lu, Hong-Mei; Zhang, Zhi-Min; Liang, Yi-Zeng

    2016-03-21

    In order to solve the spectra standardization problem in near-infrared (NIR) spectroscopy, a Transfer via Extreme learning machine Auto-encoder Method (TEAM) has been proposed in this study. A comparative study among TEAM, piecewise direct standardization (PDS), generalized least squares (GLS) and calibration transfer methods based on canonical correlation analysis (CCA) was conducted, and the performances of these algorithms were benchmarked with three spectral datasets: corn, tobacco and pharmaceutical tablet spectra. The results show that TEAM is a stable method and can significantly reduce prediction errors compared with PDS, GLS and CCA. TEAM can also achieve the best RMSEPs in most cases with a small number of calibration sets. TEAM is implemented in Python language and available as an open source package at https://github.com/zmzhang/TEAM. PMID:26846329
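
    The calibration-transfer idea itself (not TEAM's auto-encoder) can be sketched as a linear map from slave-instrument to master-instrument spectra, learned on a small transfer set. The synthetic spectra and ridge penalty below are illustrative assumptions.

    ```python
    # Sketch of calibration transfer: learn a linear map from slave spectra to
    # master spectra on a few transfer samples (ridge regression, toy data).
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(9)
    n_wl = 100                                     # wavelength channels
    master = rng.normal(size=(30, n_wl)).cumsum(axis=1)   # toy spectra
    slave = 0.9 * master + 0.05 * np.roll(master, 2, axis=1) + rng.normal(0, 0.01, master.shape)

    fit = slice(0, 20)                             # transfer (standardization) set
    model = Ridge(alpha=1.0).fit(slave[fit], master[fit])
    corrected = model.predict(slave[20:])
    print("RMSE before transfer:", np.sqrt(((slave[20:] - master[20:]) ** 2).mean()))
    print("RMSE after transfer: ", np.sqrt(((corrected - master[20:]) ** 2).mean()))
    ```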

  15. Stochastic upscaling in solid mechanics: An exercise in machine learning

    SciTech Connect

    Koutsourelakis, P.S.

    2007-09-10

    This paper presents a consistent theoretical and computational framework for upscaling in random microstructures. We adopt an information theoretic approach in order to quantify the informational content of the microstructural details and find ways to condense it while assessing quantitatively the approximation introduced. In particular, we substitute the high-dimensional microscale description by a lower-dimensional representation corresponding for example to an equivalent homogeneous medium. The probabilistic characteristics of the latter are determined by minimizing the distortion between actual macroscale predictions and the predictions made using the coarse model. A machine learning framework is essentially adopted in which a vector quantizer is trained using data generated computationally or collected experimentally. Several parallels and differences with similar problems in source coding theory are pointed out and an efficient computational tool is employed. Various applications in linear and non-linear problems in solid mechanics are examined.

  16. Machine-learning techniques applied to antibacterial drug discovery.

    PubMed

    Durrant, Jacob D; Amaro, Rommie E

    2015-01-01

    The emergence of drug-resistant bacteria threatens to revert humanity back to the preantibiotic era. Even now, multidrug-resistant bacterial infections annually result in millions of hospital days, billions in healthcare costs, and, most importantly, tens of thousands of lives lost. As many pharmaceutical companies have abandoned antibiotic development in search of more lucrative therapeutics, academic researchers are uniquely positioned to fill the pipeline. Traditional high-throughput screens and lead-optimization efforts are expensive and labor intensive. Computer-aided drug-discovery techniques, which are cheaper and faster, can accelerate the identification of novel antibiotics, leading to improved hit rates and faster transitions to preclinical and clinical testing. The current review describes two machine-learning techniques, neural networks and decision trees, that have been used to identify experimentally validated antibiotics. We conclude by describing the future directions of this exciting field.

  17. Identifying Energy-Efficient Concurrency Levels using Machine Learning

    SciTech Connect

    Curtis-Maury, M; Singh, K; Blagojevic, F; Nikolopoulos, D S; de Supinski, B R; Schulz, M; McKee, S A

    2007-07-23

    Multicore microprocessors have been largely motivated by the diminishing returns in performance and the increased power consumption of single-threaded ILP microprocessors. With the industry already shifting from multicore to many-core microprocessors, software developers must extract more thread-level parallelism from applications. Unfortunately, low power-efficiency and diminishing returns in performance remain major obstacles with many cores. Poor interaction between software and hardware, and bottlenecks in shared hardware structures often prevent scaling to many cores, even in applications where a high degree of parallelism is potentially available. In some cases, throwing additional cores at a problem may actually harm performance and increase power consumption. Better use of cores that otherwise provide limited benefit by software components such as hypervisors and operating systems can improve system-wide performance and reliability, even in cases where power consumption is not a main concern. In response to these observations, we evaluate an approach to throttle concurrency in parallel programs dynamically. We throttle concurrency to levels with higher predicted efficiency from both performance and energy standpoints, and we do so via machine learning, specifically artificial neural networks (ANNs). One advantage of using ANNs over similar techniques previously explored is that the training phase is greatly simplified, thereby reducing the burden on the end user. Using machine learning in the context of concurrency throttling is novel. We show that ANNs are effective for identifying energy-efficient concurrency levels in multithreaded scientific applications, and we do so using physical experimentation on a state-of-the-art quad-core Xeon platform.

  18. Advancement in Productivity of Arabic into English Machine Translation Systems from 2008 to 2013

    ERIC Educational Resources Information Center

    Abu-Al-Sha'r, Awatif M.; AbuSeileek, Ali F.

    2013-01-01

    This paper compares the advancements in the productivity of Arabic into English Machine Translation Systems between two years, 2008 and 2013. It also aims to evaluate the progress achieved by various systems of Arabic into English electronic translation between the two years. For tracing such advancement, a comparative analysis…

  19. Active machine learning-driven experimentation to determine compound effects on protein patterns

    PubMed Central

    Naik, Armaghan W; Kangas, Joshua D; Sullivan, Devin P; Murphy, Robert F

    2016-01-01

    High throughput screening determines the effects of many conditions on a given biological target. Currently, to estimate the effects of those conditions on other targets requires either strong modeling assumptions (e.g. similarities among targets) or separate screens. Ideally, data-driven experimentation could be used to learn accurate models for many conditions and targets without doing all possible experiments. We have previously described an active machine learning algorithm that can iteratively choose small sets of experiments to learn models of multiple effects. We now show that, with no prior knowledge and with liquid handling robotics and automated microscopy under its control, this learner accurately learned the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments. The results represent the first practical demonstration of the utility of active learning-driven biological experimentation in which the set of possible phenotypes is unknown in advance. DOI: http://dx.doi.org/10.7554/eLife.10047.001 PMID:26840049
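
    The core active-learning loop, uncertainty sampling with a model retrained each round, can be sketched as follows. This is a generic single-target toy, not the authors' multi-condition, multi-target algorithm; the seed-set size and number of rounds are assumptions.

    ```python
    # Sketch of the active-learning loop: repeatedly "run the experiment" the
    # current model is least certain about (uncertainty sampling).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(7)
    X, y = make_classification(n_samples=500, n_features=10, random_state=7)
    labeled = list(rng.choice(len(X), 10, replace=False))   # small seed set
    pool = [i for i in range(len(X)) if i not in labeled]

    model = LogisticRegression(max_iter=1000)
    for round_ in range(20):
        model.fit(X[labeled], y[labeled])
        proba = model.predict_proba(X[pool])[:, 1]
        pick = pool[int(np.abs(proba - 0.5).argmin())]      # most uncertain point
        labeled.append(pick)                                # "run the experiment"
        pool.remove(pick)

    print("accuracy after", len(labeled), "experiments:", round(model.score(X, y), 3))
    ```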

  20. Is extreme learning machine feasible? A theoretical assessment (part II).

    PubMed

    Lin, Shaobo; Liu, Xia; Fang, Jian; Xu, Zongben

    2015-01-01

    An extreme learning machine (ELM) can be regarded as a two-stage feed-forward neural network (FNN) learning system that randomly assigns the connections with and within hidden neurons in the first stage and tunes the connections with output neurons in the second stage. Therefore, ELM training is essentially a linear learning problem, which significantly reduces the computational burden. Numerous applications show that such a reduction in the computational burden does not degrade the generalization capability. Whether this is true in theory, however, has remained an open question. The aim of this paper is to study the theoretical feasibility of ELM by analyzing its pros and cons. In the previous part of this topic, we pointed out that with appropriately selected activation functions, ELM does not degrade the generalization capability in the sense of expectation. In this paper, we take the study in a different direction and show that the randomness of ELM also leads to certain negative consequences. On one hand, we find that the randomness causes an additional uncertainty problem for ELM, both in approximation and in learning. On the other hand, we theoretically justify that there also exist activation functions for which the corresponding ELM degrades the generalization capability. In particular, we prove that the generalization capability of ELM with a Gaussian kernel is essentially worse than that of an FNN with a Gaussian kernel. To facilitate the use of ELM, we also provide a remedy for such degradation. We find that the well-developed coefficient regularization technique can essentially improve the generalization capability. The obtained results reveal the essential characteristics of ELM in a certain sense and give theoretical guidance concerning how to use ELM.

  1. Gravity Spy - Integrating LIGO detector characterization, citizen science, and machine learning

    NASA Astrophysics Data System (ADS)

    Zevin, Michael; Gravity Spy

    2016-06-01

    On September 14th 2015, the Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO) made the first direct observation of gravitational waves and opened a new field of observational astronomy. However, being the most complicated and sensitive experiment ever undertaken in gravitational physics, aLIGO is susceptible to various sources of environmental and instrumental noise that hinder the search for more gravitational waves. Of particular concern are transient, non-Gaussian noise features known as glitches. Glitches can mimic true astrophysical gravitational waves, occur at a high enough frequency to be coherent between the two detectors, and generally worsen aLIGO's detection capabilities. The proper classification and characterization of glitches is paramount in optimizing aLIGO's ability to detect gravitational waves. However, teaching computers to identify and morphologically classify these artifacts is exceedingly difficult. Human intuition has proven to be a useful tool in classification problems such as this. Gravity Spy is an innovative, interdisciplinary project hosted by Zooniverse that combines aLIGO detector characterization, citizen science, machine learning, and social science. In this project, citizen scientists and computers will work together in a symbiotic relationship that leverages human pattern recognition and the ability of machine learning to process large amounts of data systematically: volunteers classify triggers from the aLIGO data stream that are constantly updated as aLIGO takes in new data, and these classifications are used to train machine learning algorithms which proceed to classify the bulk of aLIGO data and feed questionable glitches back to the users. In this talk, I will discuss the workflow and initial results of the Gravity Spy project with regard to aLIGO's future observing runs and highlight the potential of such citizen science projects in promoting nascent fields such as gravitational wave astrophysics.

  2. Cost-Benefit Analysis of Computer Resources for Machine Learning

    USGS Publications Warehouse

    Champion, Richard A.

    2007-01-01

    Machine learning describes pattern-recognition algorithms - in this case, probabilistic neural networks (PNNs). These can be computationally intensive, in part because of the nonlinear optimizer, a numerical process that calibrates the PNN by minimizing a sum of squared errors. This report suggests efficiencies that are expressed as cost and benefit. The cost is computer time needed to calibrate the PNN, and the benefit is goodness-of-fit, how well the PNN learns the pattern in the data. There may be a point of diminishing returns where a further expenditure of computer resources does not produce additional benefits. Sampling is suggested as a cost-reduction strategy. One consideration is how many points to select for calibration and another is the geometric distribution of the points. The data points may be nonuniformly distributed across space, so that sampling at some locations provides additional benefit while sampling at other locations does not. A stratified sampling strategy can be designed to select more points in regions where they reduce the calibration error and fewer points in regions where they do not. Goodness-of-fit tests ensure that the sampling does not introduce bias. This approach is illustrated by statistical experiments for computing correlations between measures of roadless area and population density for the San Francisco Bay Area. The alternative to training efficiencies is to rely on high-performance computer systems. These may require specialized programming and algorithms that are optimized for parallel performance.
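
    The cost-benefit trade-off can be probed directly with a learning curve: goodness-of-fit as a function of calibration-set size. In the sketch below, an MLP stands in for the report's probabilistic neural network, on synthetic data; the point of diminishing returns shows up where the validation score flattens.

    ```python
    # Sketch: goodness-of-fit vs. number of calibration points (learning curve);
    # an MLP stands in for the report's probabilistic neural network.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import learning_curve
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=800, n_features=5, noise=10.0, random_state=0)
    sizes, train_scores, val_scores = learning_curve(
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
        X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=3)

    for n, s in zip(sizes, val_scores.mean(axis=1)):
        print(f"{n:4d} calibration points -> validation R^2 = {s:.3f}")
    ```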

  3. Using machine learning techniques to automate sky survey catalog generation

    NASA Technical Reports Server (NTRS)

    Fayyad, Usama M.; Roden, J. C.; Doyle, R. J.; Weir, Nicholas; Djorgovski, S. G.

    1993-01-01

    We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10^7 galaxies and 10^8 stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.

  4. Extreme learning machine and adaptive sparse representation for image classification.

    PubMed

    Cao, Jiuwen; Zhang, Kai; Luo, Minxia; Yin, Chun; Lai, Xiaoping

    2016-09-01

    Recent research has shown the speed advantage of extreme learning machine (ELM) and the accuracy advantage of sparse representation classification (SRC) in the area of image classification. Those two methods, however, have their respective drawbacks, e.g., in general, ELM is known to be less robust to noise while SRC is known to be time-consuming. Consequently, ELM and SRC complement each other in computational complexity and classification accuracy. In order to unify such mutual complementarity and thus further enhance the classification performance, we propose an efficient hybrid classifier to exploit the advantages of ELM and SRC in this paper. More precisely, the proposed classifier consists of two stages: first, an ELM network is trained by supervised learning. Second, a discriminative criterion about the reliability of the obtained ELM output is adopted to decide whether the query image can be correctly classified or not. If the output is reliable, the classification will be performed by ELM; otherwise the query image will be fed to SRC. Meanwhile, in the stage of SRC, a sub-dictionary that is adaptive to the query image instead of the entire dictionary is extracted via the ELM output. The computational burden of SRC thus can be reduced. Extensive experiments on handwritten digit classification, landmark recognition and face recognition demonstrate that the proposed hybrid classifier outperforms ELM and SRC in classification accuracy with outstanding computational efficiency.

  5. Extreme learning machine and adaptive sparse representation for image classification.

    PubMed

    Cao, Jiuwen; Zhang, Kai; Luo, Minxia; Yin, Chun; Lai, Xiaoping

    2016-09-01

    Recent research has shown the speed advantage of extreme learning machine (ELM) and the accuracy advantage of sparse representation classification (SRC) in the area of image classification. Those two methods, however, have their respective drawbacks, e.g., in general, ELM is known to be less robust to noise while SRC is known to be time-consuming. Consequently, ELM and SRC complement each other in computational complexity and classification accuracy. In order to unify such mutual complementarity and thus further enhance the classification performance, we propose an efficient hybrid classifier to exploit the advantages of ELM and SRC in this paper. More precisely, the proposed classifier consists of two stages: first, an ELM network is trained by supervised learning. Second, a discriminative criterion about the reliability of the obtained ELM output is adopted to decide whether the query image can be correctly classified or not. If the output is reliable, the classification will be performed by ELM; otherwise the query image will be fed to SRC. Meanwhile, in the stage of SRC, a sub-dictionary that is adaptive to the query image instead of the entire dictionary is extracted via the ELM output. The computational burden of SRC thus can be reduced. Extensive experiments on handwritten digit classification, landmark recognition and face recognition demonstrate that the proposed hybrid classifier outperforms ELM and SRC in classification accuracy with outstanding computational efficiency. PMID:27389571

  6. Skull-Stripping with Machine Learning Deformable Organisms

    PubMed Central

    Prasad, Gautam; Joshi, Anand A.; Feng, Albert; Toga, Arthur W.; Thompson, Paul M.; Terzopoulos, Demetri

    2014-01-01

    Background: Segmentation methods for medical images may not generalize well to new data sets or new tasks, hampering their utility. We attempt to remedy these issues using deformable organisms to create an easily customizable segmentation plan. We validate our framework by creating a plan to locate the brain in 3D magnetic resonance images of the head (skull-stripping). New method: Our method borrows ideas from artificial life to govern a set of deformable models. We use control processes such as sensing, proactive planning, reactive behavior, and knowledge representation to segment an image. The image may have landmarks and features specific to that dataset; these may be easily incorporated into the plan. In addition, we use a machine learning method to make our segmentation more accurate. Results: Our method had the lowest Hausdorff distance error but included slightly fewer brain voxels (false negatives). It also had the lowest false positive error and performed on par with skull-stripping-specific methods on other metrics. Comparison with existing method(s): We tested our method on 838 T1-weighted images, evaluating results using distance and overlap error metrics based on expert gold standard segmentations. We evaluated the results before and after the learning step to quantify its benefit; we also compared our results to three other widely used methods: BSE, BET, and the Hybrid Watershed algorithm. Conclusions: Our framework captures diverse categories of information needed for brain segmentation and will provide a foundation for tackling a wealth of segmentation problems. PMID:25124851

  7. Relational Machine Learning for Electronic Health Record-Driven Phenotyping

    PubMed Central

    Peissig, Peggy L.; Costa, Vitor Santos; Caldwell, Michael D.; Rottscheit, Carla; Berg, Richard L.; Mendonca, Eneida A.; Page, David

    2014-01-01

    Objective: Electronic health records (EHR) offer medical and pharmacogenomics research unprecedented opportunities to identify and classify patients at risk. EHRs are collections of highly inter-dependent records that include biological, anatomical, physiological, and behavioral observations. They comprise a patient’s clinical phenome, where each patient has thousands of date-stamped records distributed across many relational tables. Development of EHR computer-based phenotyping algorithms requires time and medical insight from clinical experts, who most often can only review a small patient subset representative of the total EHR records to identify phenotype features. In this research we evaluate whether relational machine learning (ML) using Inductive Logic Programming (ILP) can contribute to addressing these issues as a viable approach for EHR-based phenotyping. Methods: Two relational learning ILP approaches and three well-known WEKA (Waikato Environment for Knowledge Analysis) implementations of non-relational approaches (PART, J48, and JRIP) were used to develop models for nine phenotypes. International Classification of Diseases, Ninth Revision (ICD-9) coded EHR data were used to select training cohorts for the development of each phenotypic model. Accuracy, precision, recall, F-measure, and area under the receiver operating characteristic (AUROC) curve statistics were measured for each phenotypic model based on independent manually verified test cohorts. A two-sided binomial distribution test (sign test) compared the five ML approaches across phenotypes for statistical significance. Results: We developed an approach to automatically label training examples using ICD-9 diagnosis codes for the ML approaches being evaluated. Nine phenotypic models for each ML approach were evaluated, resulting in better overall model performance in AUROC using ILP when compared to PART (p=0.039), J48 (p=0.003) and JRIP (p=0.003). Discussion: ILP has the potential to improve

  8. Machine learning techniques for energy optimization in mobile embedded systems

    NASA Astrophysics Data System (ADS)

    Donohoo, Brad Kyoshi

    Mobile smartphones and other portable battery operated embedded systems (PDAs, tablets) are pervasive computing devices that have emerged in recent years as essential instruments for communication, business, and social interactions. While performance, capabilities, and design are all important considerations when purchasing a mobile device, a long battery lifetime is one of the most desirable attributes. Battery technology and capacity has improved over the years, but it still cannot keep pace with the power consumption demands of today's mobile devices. This key limiter has led to a strong research emphasis on extending battery lifetime by minimizing energy consumption, primarily using software optimizations. This thesis presents two strategies that attempt to optimize mobile device energy consumption with negligible impact on user perception and quality of service (QoS). The first strategy proposes an application and user interaction aware middleware framework that takes advantage of user idle time between interaction events of the foreground application to optimize CPU and screen backlight energy consumption. The framework dynamically classifies mobile device applications based on their received interaction patterns, then invokes a number of different power management algorithms to adjust processor frequency and screen backlight levels accordingly. The second strategy proposes the usage of machine learning techniques to learn a user's mobile device usage pattern pertaining to spatiotemporal and device contexts, and then predict energy-optimal data and location interface configurations. By learning where and when a mobile device user uses certain power-hungry interfaces (3G, WiFi, and GPS), the techniques, which include variants of linear discriminant analysis, linear logistic regression, non-linear logistic regression, and k-nearest neighbor, are able to dynamically turn off unnecessary interfaces at runtime in order to save energy.

  9. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage labeling method for large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or tag different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to easily analyze biomedical datasets by a non-expert user. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, allowing parallel programming paradigms to be applied in conventional personal computers. The Adaboost classifier is one of the most widely applied methods for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and
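
    The Adaboost construction described above, a weighted vote over one-feature decision stumps, is easy to reproduce on the CPU. The sketch below uses scikit-learn and synthetic data rather than OpenCL, so it shows the classifier itself, not the paper's parallelization.

    ```python
    # Sketch of Adaboost as described above: a weighted vote over weak
    # classifiers, each a one-feature decision stump (CPU-only, no OpenCL).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    stump = DecisionTreeClassifier(max_depth=1)   # weak classifier: one feature, one threshold
    # note: the keyword is `base_estimator` in scikit-learn < 1.2
    ada = AdaBoostClassifier(estimator=stump, n_estimators=100, random_state=0)
    print("CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
    ```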

  10. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

    Architectures of planetary systems are observable snapshots in time that can indicate the formation and dynamical evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, Keplerian motion gives P^2 = a^3, where P is the orbital period in units of years and a is the semi-major axis in units of Astronomical Units (AU). Keplerian motion works on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems based on stellar age to see whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression-analysis-based techniques for detecting outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we use unsupervised learning algorithms with only the proximity-based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is whether the ratio of planetary mass to stellar mass is less than 0.001. We present our statistical analysis of the outliers thus detected.
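
    The Keplerian check at the heart of the validation can be worked through directly: residuals of log(P^2) - log(a^3) should be near zero for Keplerian systems, and a simple z-score flags deviants. The values below are illustrative (five Solar System planets plus one fabricated outlier), not the confirmed-exoplanet catalog.

    ```python
    # Worked check of the Keplerian relation P^2 = a^3 (P in years, a in AU) and
    # a simple z-score outlier flag on the residuals; values are illustrative.
    import numpy as np

    P = np.array([0.24, 0.62, 1.00, 1.88, 11.86, 3.5])   # orbital periods (yr)
    a = np.array([0.39, 0.72, 1.00, 1.52, 5.20, 1.1])    # semi-major axes (AU)

    resid = np.log(P ** 2) - np.log(a ** 3)              # ~0 for Keplerian motion
    z = (resid - resid.mean()) / resid.std()
    print("residuals:", np.round(resid, 3))
    print("outliers (|z| > 2):", np.flatnonzero(np.abs(z) > 2))  # flags the last entry
    ```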

  11. The identification of cis-regulatory elements: A review from a machine learning perspective.

    PubMed

    Li, Yifeng; Chen, Chih-Yu; Kaye, Alice M; Wasserman, Wyeth W

    2015-12-01

    The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have unveiled that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, insulators, etc. These regulatory elements can play crucial roles in controlling gene expressions in specific cell types, conditions, and developmental stages. Disruption to these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency for detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular, machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided in order to facilitate testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field.
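
    As a toy illustration of sequence-based prediction, far simpler than the deep and kernel methods the review surveys, the sketch below classifies synthetic sequences carrying a planted motif using 3-mer counts and logistic regression. The motif and sequence lengths are arbitrary assumptions.

    ```python
    # Toy sketch of sequence-based regulatory-element classification: k-mer
    # counts plus logistic regression on synthetic sequences with a planted motif.
    import numpy as np
    from itertools import product
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    kmers = ["".join(p) for p in product("ACGT", repeat=3)]

    def seq(enhancer):
        s = "".join(rng.choice(list("ACGT"), 200))       # random background
        return s + "TATAAA" * 3 if enhancer else s       # planted motif (toy)

    def counts(s):
        return [s.count(k) for k in kmers]               # 64-dim k-mer profile

    X = np.array([counts(seq(i % 2 == 0)) for i in range(200)])
    y = np.array([i % 2 == 0 for i in range(200)], dtype=int)
    clf = LogisticRegression(max_iter=2000)
    print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
    ```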

  12. Applications of machine learning and high-dimensional visualization in cancer detection, diagnosis, and management.

    PubMed

    McCarthy, John F; Marx, Kenneth A; Hoffman, Patrick E; Gee, Alexander G; O'Neil, Philip; Ujwal, M L; Hotchkiss, John

    2004-05-01

    Recent technical advances in combinatorial chemistry, genomics, and proteomics have made available large databases of biological and chemical information that have the potential to dramatically improve our understanding of cancer biology at the molecular level. Such an understanding of cancer biology could have a substantial impact on how we detect, diagnose, and manage cancer cases in the clinical setting. One of the biggest challenges facing clinical oncologists is how to extract clinically useful knowledge from the overwhelming amount of raw molecular data that are currently available. In this paper, we discuss how the exploratory data analysis techniques of machine learning and high-dimensional visualization can be applied to extract clinically useful knowledge from a heterogeneous assortment of molecular data. After an introductory overview of machine learning and visualization techniques, we describe two proprietary algorithms (PURS and RadViz) that we have found to be useful in the exploratory analysis of large biological data sets. We next illustrate, by way of three examples, the applicability of these techniques to cancer detection, diagnosis, and management using three very different types of molecular data. We first discuss the use of our exploratory analysis techniques on proteomic mass spectrometry data for the detection of ovarian cancer. Next, we discuss the diagnostic use of these techniques on gene expression data to differentiate between squamous cell carcinoma and adenocarcinoma of the lung. Finally, we illustrate the use of such techniques in selecting, from a database of chemical compounds, those most effective in managing patients with melanoma versus leukemia.
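
    The PURS and RadViz algorithms referenced here are proprietary, but a basic RadViz projection ships with pandas, so the flavour of the visualization can be sketched on synthetic data; the column names and class rule below are invented for illustration.

      import numpy as np
      import pandas as pd
      import matplotlib.pyplot as plt

      # Synthetic stand-in for an expression table: samples in rows,
      # a few features in columns, plus a class label per sample.
      rng = np.random.default_rng(1)
      df = pd.DataFrame(rng.random((60, 4)),
                        columns=["gene_a", "gene_b", "gene_c", "gene_d"])
      df["diagnosis"] = np.where(df["gene_a"] + df["gene_b"] > 1.0,
                                 "adenocarcinoma", "squamous")

      # RadViz anchors each feature on a circle and places every sample
      # at the weighted average of the anchors, so class structure in
      # high-dimensional data can show up as visible clusters.
      pd.plotting.radviz(df, "diagnosis")
      plt.show()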

  14. The applications of machine learning algorithms in the modeling of estrogen-like chemicals.

    PubMed

    Liu, Huanxiang; Yao, Xiaojun; Gramatica, Paola

    2009-06-01

    Increasing concern is being shown by the scientific community, government regulators, and the public about endocrine-disrupting chemicals that, in the environment, are adversely affecting human and wildlife health through a variety of mechanisms, mainly estrogen receptor-mediated mechanisms of toxicity. Because of the large number of such chemicals in the environment, there is a great need for an effective means of rapidly assessing endocrine-disrupting activity in the toxicology assessment process. When faced with the challenging task of screening large libraries of molecules for biological activity, the benefits of computational predictive models based on quantitative structure-activity relationships to identify possible estrogens become immediately obvious. Recently, in order to improve the accuracy of prediction, some machine learning techniques were introduced to build more effective predictive models. In this review we will focus our attention on some recent advances in the use of these methods in modeling estrogen-like chemicals. The advantages and disadvantages of the machine learning algorithms used in solving this problem, the importance of the validation and performance assessment of the built models as well as their applicability domains will be discussed.

  15. Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text.

    PubMed

    Bravo, Àlex; Li, Tong Shu; Su, Andrew I; Good, Benjamin M; Furlong, Laura I

    2016-01-01

    Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on drug adverse reactions. We present a new system for the identification of drug side effects from the literature that combines three approaches: machine learning, rule-based, and knowledge-based. The system was developed to address Task 3.B of the BioCreative V challenge (BC5), dealing with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence level, while the knowledge-based approach is applied at both the sentence and abstract levels. The machine learning method is based on the BeFree system using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassification and areas for improvement in our system, and highlighted the need for consistent gold standard data sets for advancing the state of the art in text mining of drug side effects. Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V PMID:27307137
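
    A toy sketch in Python of how sentence-level outputs from several extractors might be combined: taking the union of the predicted (chemical, disease) pairs favours recall, which matches the recall-oriented result reported above. The pair sets and names are placeholders, not output of the actual BeFree-based system.

      # Each approach yields the set of (chemical, disease) relations it
      # supports; the pairs below are invented placeholders.
      ml_hits = {("aspirin", "ulcer"), ("drug_x", "nephropathy")}
      rule_hits = {("aspirin", "ulcer"), ("drug_y", "hepatitis")}
      kb_hits = {("drug_x", "nephropathy"), ("drug_z", "rash")}

      # Union lets any single approach contribute a relation
      # (recall-oriented); intersecting with the other sources
      # would instead favour precision.
      high_recall = ml_hits | rule_hits | kb_hits
      high_precision = ml_hits & (rule_hits | kb_hits)

      print(sorted(high_recall))
      print(sorted(high_precision))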

  16. Color-texture based extreme learning machines for tissue tumor classification

    NASA Astrophysics Data System (ADS)

    Yang, X.; Yeo, S. Y.; Wong, S. T.; Lee, G.; Su, Y.; Hong, J. M.; Choo, A.; Chen, S.

    2016-03-01

    In the histopathological classification and diagnosis of cancer cases, pathologists perform visual assessments of immunohistochemistry (IHC)-stained biomarkers in cells to determine tumor versus non-tumor tissues. One of the prerequisites for such assessments is the correct identification of regions-of-interest (ROIs) with relevant histological features. Advances in image processing and machine learning raise the possibility of fully automating ROI identification using image features such as colors and textures. Such computer-aided diagnostic systems could enhance research output and efficiency in identifying the pathology (normal, non-tumor, or tumor) of a tissue pattern from ROI images. In this paper, a computational method using color-texture based extreme learning machines (ELM) is proposed for automatic tissue tumor classification. Our approach consists of three steps: (1) ROIs are manually identified and annotated from individual cores of tissue microarrays (TMAs); (2) color and texture features are extracted from the ROI images; (3) ELM is applied to the extracted features to classify the ROIs into non-tumor or tumor categories. The proposed approach is tested on 100 sets of images from a kidney cancer TMA, and the results show that ELM achieves classification accuracies of 91.19% and 88.72% with a Gaussian radial basis function (RBF) and a linear kernel, respectively, which are superior to SVM with the same kernels.
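
    At its core, an extreme learning machine fixes a random hidden layer and solves only the output weights in closed form. A minimal NumPy sketch follows, with random vectors standing in for the extracted color-texture features; the layer size and labels are illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(2)

      # Stand-ins for extracted color/texture feature vectors and labels
      # (1 = tumor, 0 = non-tumor); real inputs would come from the ROIs.
      X = rng.normal(size=(100, 20))
      y = (X[:, 0] + X[:, 1] > 0).astype(float)

      # ELM: a random, untrained hidden layer ...
      n_hidden = 64
      W = rng.normal(size=(X.shape[1], n_hidden))
      b = rng.normal(size=n_hidden)
      H = np.tanh(X @ W + b)

      # ... and output weights solved in closed form by least squares.
      beta, *_ = np.linalg.lstsq(H, y, rcond=None)

      # Predict by thresholding the network output at 0.5.
      pred = (np.tanh(X @ W + b) @ beta > 0.5).astype(float)
      print("training accuracy:", (pred == y).mean())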

  18. Cortical feature analysis and machine learning improves detection of “MRI-negative” focal cortical dysplasia

    PubMed Central

    Ahmed, Bilal; Brodley, Carla E.; Blackmon, Karen E.; Kuzniecky, Ruben; Barash, Gilad; Carlson, Chad; Quinn, Brian T.; Doyle, Werner; French, Jacqueline; Devinsky, Orrin; Thesen, Thomas

    2015-01-01

    Focal cortical dysplasia (FCD) is the most common cause of pediatric epilepsy and the third most common lesion in adults with treatment-resistant epilepsy. Advances in MRI have revolutionized the diagnosis of FCD, resulting in higher success rates for resective epilepsy surgery. However, many histologically confirmed FCD patients have normal pre-surgical MRI studies (‘MRI-negative’), making pre-surgical diagnosis difficult. The purpose of this study is to test whether a novel MRI post-processing method successfully detects histopathologically-verified FCD in a sample of patients without visually appreciable lesions. We applied an automated quantitative morphometry approach which computed five surface-based MRI features and combined them in a machine learning model to classify lesional and non-lesional vertices. Accuracy was defined by classifying contiguous vertices as “lesional” when they fell within the surgical resection region. Our multivariate method correctly detected the lesion in 6 of 7 MRI-positive patients, which is comparable with the detection rates that have been reported in univariate vertex-based morphometry studies. More significantly, in patients that were MRI-negative, machine learning correctly identified 14 out of 24 FCD lesions (58%). This was achieved after separating abnormal thickness and thinness into distinct classifiers, as well as separating sulcal and gyral regions. Results demonstrate that MRI-negative images contain sufficient information to aid in the in-vivo detection of visually elusive FCD lesions. PMID:26037845

  19. Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text

    PubMed Central

    Bravo, Àlex; Li, Tong Shu; Su, Andrew I.; Good, Benjamin M.; Furlong, Laura I.

    2016-01-01

    Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on drug adverse reactions. We present a new system for the identification of drug side effects from the literature that combines three approaches: machine learning, rule-based, and knowledge-based. The system was developed to address Task 3.B of the BioCreative V challenge (BC5), dealing with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence level, while the knowledge-based approach is applied at both the sentence and abstract levels. The machine learning method is based on the BeFree system using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassification and areas for improvement in our system, and highlighted the need for consistent gold standard data sets for advancing the state of the art in text mining of drug side effects. Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V PMID:27307137

  20. Structure classification and melting temperature prediction in octet AB solids via machine learning

    NASA Astrophysics Data System (ADS)

    Pilania, G.; Gubernatis, J. E.; Lookman, T.

    2015-06-01

    resulting model to be 11% of the mean melting temperature of the data, but we note that if the accuracy of this predicted error is itself measured, our estimated fitting error has a root-mean-square error of 50%. In short, what we illustrate is that classification and regression predictions can vary significantly, depending on the details of how machine learning methods are applied to small data sets. This variation makes it important, if not essential, to average the predictions and compute confidence intervals about these averages to report results meaningfully. However, when properly used, these statistical methods can advance our understanding and improve predictions of material properties even for small data sets.
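
    The closing point, that small-data predictions should be averaged and reported with confidence intervals, can be sketched with a simple bootstrap; the ridge model and synthetic data below are stand-in assumptions, not the paper's actual regression setup.

      import numpy as np
      from sklearn.linear_model import Ridge
      from sklearn.utils import resample

      rng = np.random.default_rng(3)

      # Small synthetic data set standing in for the octet AB solids.
      X = rng.normal(size=(60, 5))
      y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=60)
      x_new = rng.normal(size=(1, 5))  # a hypothetical new compound

      # Refit on bootstrap resamples; the spread of the predictions
      # gives a simple confidence interval around their average.
      preds = []
      for i in range(500):
          Xb, yb = resample(X, y, random_state=i)
          preds.append(Ridge(alpha=1.0).fit(Xb, yb).predict(x_new)[0])

      pred_mean = np.mean(preds)
      ci_lo, ci_hi = np.percentile(preds, [2.5, 97.5])
      print(f"prediction {pred_mean:.2f}, "
            f"95% interval [{ci_lo:.2f}, {ci_hi:.2f}]")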

  1. Detection of Alzheimer's disease by displacement field and machine learning.

    PubMed

    Zhang, Yudong; Wang, Shuihua

    2015-01-01

    Aim. Alzheimer's disease (AD) is a chronic neurodegenerative disease. Recently, computer scientists have developed various methods for early detection based on computer vision and machine learning techniques. Method. In this study, we proposed a novel AD detection method based on displacement field (DF) estimation between a normal brain and an AD brain. The DF was treated as the set of AD-related features, reduced by principal component analysis (PCA), and finally fed into three classifiers: support vector machine (SVM), generalized eigenvalue proximal SVM (GEPSVM), and twin SVM (TSVM). The 10-fold cross-validation was repeated 50 times. Results. The results showed that "DF + PCA + TSVM" achieved an accuracy of 92.75 ± 1.77, a sensitivity of 90.56 ± 1.15, a specificity of 93.37 ± 2.05, and a precision of 79.61 ± 2.21. This result is better than or comparable with not only the other two proposed methods but also ten state-of-the-art methods. Besides, our method finds that AD is related to the following brain regions, as disclosed in recent publications: Angular Gyrus, Anterior Cingulate, Cingulate Gyrus, Culmen, Cuneus, Fusiform Gyrus, Inferior Frontal Gyrus, Inferior Occipital Gyrus, Inferior Parietal Lobule, Inferior Semi-Lunar Lobule, Inferior Temporal Gyrus, Insula, Lateral Ventricle, Lingual Gyrus, Medial Frontal Gyrus, Middle Frontal Gyrus, Middle Occipital Gyrus, Middle Temporal Gyrus, Paracentral Lobule, Parahippocampal Gyrus, Postcentral Gyrus, Posterior Cingulate, Precentral Gyrus, Precuneus, Sub-Gyral, Superior Parietal Lobule, Superior Temporal Gyrus, Supramarginal Gyrus, and Uncus. Conclusion. The displacement field is effective in the detection of AD and the related brain regions.
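
    The "DF + PCA + classifier" chain can be sketched as a scikit-learn pipeline. The displacement-field extraction itself is not reproduced here, random vectors stand in for the flattened fields, and a standard SVC substitutes for the twin SVM, which scikit-learn does not provide.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.model_selection import StratifiedKFold, cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import SVC

      rng = np.random.default_rng(4)

      # Stand-in for flattened displacement-field features per subject
      # (real features would come from registering each brain to a
      # template and estimating the displacement field).
      X = rng.normal(size=(120, 300))
      y = rng.integers(0, 2, size=120)  # 1 = AD, 0 = normal control

      # PCA reduces the high-dimensional field; the SVM classifies.
      model = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))

      # 10-fold cross-validation, in the spirit of the abstract.
      cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
      print(cross_val_score(model, X, y, cv=cv).mean())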

  2. Machine learning applications in cancer prognosis and prediction.

    PubMed

    Kourou, Konstantina; Exarchos, Themis P; Exarchos, Konstantinos P; Karamouzis, Michalis V; Fotiadis, Dimitrios I

    2015-01-01

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) methods. These techniques have therefore been utilized with the aim of modeling the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features from complex datasets reveals their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs), have been widely applied in cancer research for the development of predictive models, resulting in effective and accurate decision making. Even though it is evident that the use of ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed in order for these methods to be considered in everyday clinical practice. In this work, we present a review of recent ML approaches employed in the modeling of cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend in the application of ML methods in cancer research, we present here the most recent publications that employ these techniques to model cancer risk or patient outcomes.

  3. Modeling the Swift BAT Trigger Algorithm with Machine Learning

    NASA Technical Reports Server (NTRS)

    Graff, Philip B.; Lien, Amy Y.; Baker, John G.; Sakamoto, Takanori

    2015-01-01

    To draw inferences about gamma-ray burst (GRB) source populations based on Swift observations, it is essential to understand the detection efficiency of the Swift Burst Alert Telescope (BAT). This study considers the problem of modeling the Swift BAT triggering algorithm for long GRBs, a computationally expensive procedure, and models it using machine learning algorithms. A large sample of simulated GRBs from Lien et al. (2014) is used to train various models: random forests, boosted decision trees (with AdaBoost), support vector machines, and artificial neural networks. The best models have accuracies of greater than approximately 97% (less than approximately 3% error), which is a significant improvement on a cut in GRB flux, which has an accuracy of 89.6% (10.4% error). These models are then used to measure the detection efficiency of Swift as a function of redshift z, which is used to perform Bayesian parameter estimation on the GRB rate distribution. We find a local GRB rate density of η₀ ≈ 0.48 (+0.41/−0.23) Gpc⁻³ yr⁻¹ with power-law indices of η₁ ≈ 1.7 (+0.6/−0.5) and η₂ ≈ −5.9 (+5.7/−0.1) for GRBs above and below a break point of z₁ ≈ 6.8 (+2.8/−3.2). This methodology is able to improve upon earlier studies by more accurately modeling Swift detection and using this for fully Bayesian model fitting. The code used in this analysis is publicly available online.

  4. Machine learning applications in cancer prognosis and prediction

    PubMed Central

    Kourou, Konstantina; Exarchos, Themis P.; Exarchos, Konstantinos P.; Karamouzis, Michalis V.; Fotiadis, Dimitrios I.

    2014-01-01

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) methods. These techniques have therefore been utilized with the aim of modeling the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features from complex datasets reveals their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs), have been widely applied in cancer research for the development of predictive models, resulting in effective and accurate decision making. Even though it is evident that the use of ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed in order for these methods to be considered in everyday clinical practice. In this work, we present a review of recent ML approaches employed in the modeling of cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend in the application of ML methods in cancer research, we present here the most recent publications that employ these techniques to model cancer risk or patient outcomes. PMID:25750696

  5. Advancing the Relationship between Business School Ranking and Student Learning

    ERIC Educational Resources Information Center

    Elbeck, Matt

    2009-01-01

    This commentary advances a positive relationship between a business school's ranking in the popular press and student learning by advocating market-oriented measures of student learning. A framework for student learning is based on the Assurance of Learning mandated by the Association to Advance Collegiate Schools of Business International,…

  6. Using financial risk measures for analyzing generalization performance of machine learning models.

    PubMed

    Takeda, Akiko; Kanamori, Takafumi

    2014-09-01

    We propose a unified machine learning model (UMLM) for two-class classification, regression and outlier (or novelty) detection via a robust optimization approach. The model embraces various machine learning models such as support vector machine-based and minimax probability machine-based classification and regression models. The unified framework makes it possible to compare and contrast existing learning models and to explain their differences and similarities. In this paper, after relating existing learning models to UMLM, we show some theoretical properties for UMLM. Concretely, we show an interpretation of UMLM as minimizing a well-known financial risk measure (worst-case value-at-risk (VaR) or conditional VaR), derive generalization bounds for UMLM using such a risk measure, and prove that solving problems of UMLM leads to estimators with the minimized generalization bounds. Those theoretical properties are applicable to related existing learning models.
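
    For readers unfamiliar with the risk measures named here, an empirical sketch: VaR is a tail quantile of the loss distribution and conditional VaR is the average of the losses beyond it. The loss sample below is synthetic, standing in for per-example losses of a trained model.

      import numpy as np

      def var_cvar(losses, alpha=0.95):
          # Empirical value-at-risk: the alpha-quantile of the losses.
          # Conditional value-at-risk: the mean of the losses at or
          # beyond that quantile (the tail average).
          losses = np.asarray(losses)
          var = np.quantile(losses, alpha)
          cvar = losses[losses >= var].mean()
          return var, cvar

      # Stand-in for per-example losses of a trained classifier.
      rng = np.random.default_rng(5)
      losses = rng.standard_exponential(1000)
      print(var_cvar(losses, alpha=0.95))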

  7. Application of machine learning for the evaluation of turfgrass plots using aerial images

    NASA Astrophysics Data System (ADS)

    Ding, Ke; Raheja, Amar; Bhandari, Subodh; Green, Robert L.

    2016-05-01

    Historically, investigation of turfgrass characteristics has been limited to visual ratings. Although relevant information may result from such evaluations, final inferences may be questionable because of the subjective manner in which the data are collected. Recent advances in computer vision techniques allow researchers to objectively measure turfgrass characteristics such as percent ground cover, turf color, and turf quality from digital images. This paper focuses on developing a methodology for automated assessment of turfgrass quality from aerial images. Images of several turfgrass plots of varying quality were gathered using a camera mounted on an unmanned aerial vehicle. The quality of these plots was also evaluated based on visual ratings. The goal was to use the aerial images to generate quality evaluations on a regular basis for the optimization of water treatment. Aerial images are used to train a neural network so that appropriate features such as intensity, color, and texture of the turfgrass are extracted from these images. A neural network is a nonlinear classifier commonly used in machine learning. The output of the trained neural network model is the rating of the grass, which is compared to the visual ratings. Currently, the quality and the color of turfgrass, measured as the greenness of the grass, are evaluated. The textures are calculated using the Gabor filter and the co-occurrence matrix. Other classifiers such as support vector machines, and simpler linear regression models such as Ridge regression and LARS regression, are also used. The performance of each model is compared. The results show encouraging potential for using machine learning techniques for the evaluation of turfgrass quality and color.
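
    A minimal sketch of the simpler regression baselines mentioned above, comparing Ridge and LARS by cross-validated R² on synthetic stand-ins for per-plot image features and visual quality ratings; the feature construction is an assumption for illustration.

      import numpy as np
      from sklearn.linear_model import Lars, Ridge
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(6)

      # Stand-ins for per-plot image features (greenness, texture, ...)
      # and the corresponding human visual quality ratings.
      X = rng.normal(size=(80, 10))
      y = X[:, 0] - 0.5 * X[:, 1] + 0.2 * rng.normal(size=80)

      for name, model in [("ridge", Ridge(alpha=1.0)),
                          ("lars", Lars())]:
          r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
          print(name, round(r2, 3))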

  8. Auto-adaptive robot-aided therapy using machine learning techniques.

    PubMed

    Badesa, Francisco J; Morales, Ricardo; Garcia-Aracil, Nicolas; Sabater, J M; Casals, Alicia; Zollo, Loredana

    2014-09-01

    This paper presents an application of a classification method to adaptively and dynamically modify the therapy and real-time displays of a virtual reality system in accordance with the specific state of each patient, using his/her physiological reactions. First, a theoretical background on several machine learning techniques for classification is presented. Then, nine machine learning techniques are compared in order to select the best candidate in terms of accuracy. Finally, initial experimental results are presented to show that the therapy can be modulated as a function of the patient's state using machine learning classification techniques.
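
    The paper's comparison of candidate classifiers by accuracy can be sketched as follows with scikit-learn; the four models and the synthetic physiological features are illustrative, not the nine techniques actually benchmarked.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC

      rng = np.random.default_rng(7)

      # Stand-in for physiological features (heart rate, skin
      # conductance, ...) with a binary patient-state label per window.
      X = rng.normal(size=(200, 6))
      y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

      candidates = {
          "logistic": LogisticRegression(max_iter=1000),
          "svm-rbf": SVC(),
          "k-nn": KNeighborsClassifier(),
          "forest": RandomForestClassifier(random_state=0),
      }
      # Pick the best candidate by cross-validated accuracy.
      for name, clf in candidates.items():
          acc = cross_val_score(clf, X, y, cv=5).mean()
          print(f"{name}: {acc:.3f}")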

  9. Rapid Probabilistic Source Inversion Using Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Kaeufl, P.; Valentine, A. P.; Trampert, J.

    2013-12-01

    Determination of earthquake source parameters is an important task in seismology. For many applications, it is also valuable to understand the uncertainties associated with these determinations, and this is particularly true in the context of earthquake early warning and hazard mitigation. We present a framework for probabilistic centroid moment tensor point source inversions in near real-time, applicable to a wide variety of data types. Our methodology allows us to find an approximation to p(m|d), the conditional probability of source parameters (m) given observations (d). This approximation is obtained by smoothly interpolating a set of random prior samples, using a machine learning algorithm able to learn the mapping from d to m. The approximation obtained can be evaluated within milliseconds on a standard desktop computer for a new observation (d). This makes the method well suited for use in situations such as earthquake early warning, where inversions must be performed routinely, for a fixed station geometry, and where it is important that results are obtained rapidly. This is a major advantage over traditional sampling-based techniques, such as Markov chain Monte Carlo methods, where a re-sampling of the posterior is necessary every time a new observation is made. We demonstrated the method by applying it to a regional static GPS displacement data set for the 2010 Mw 7.2 El Mayor-Cucapah earthquake in Baja California and obtained estimates of logarithmic magnitude, centroid location and depth, and focal mechanism (Käufl et al., submitted). We will present an extension of this approach to the inversion of full waveforms and explore possibilities for jointly inverting seismic and geodetic data. (1) P. Käufl, A. P. Valentine, T. B. O'Toole, J. Trampert, submitted, Geophysical Journal International

  10. Prediction of preterm deliveries from EHG signals using machine learning.

    PubMed

    Fergus, Paul; Cheung, Pauline; Hussain, Abir; Al-Jumeily, Dhiya; Dobbins, Chelsea; Iram, Shamaila

    2013-01-01

    There has been some improvement in the treatment of preterm infants, which has helped to increase their chance of survival. However, the rate of premature births is still increasing globally. As a result, this group of infants is most at risk of developing severe medical conditions that can affect the respiratory, gastrointestinal, immune, central nervous, auditory and visual systems. In extreme cases, this can also lead to long-term conditions such as cerebral palsy, mental retardation and learning difficulties, as well as poor health and growth. In the US alone, the societal and economic cost of preterm births was estimated at $26.2 billion per annum in 2005. In the UK, this value was close to £2.95 billion in 2009. Many believe that a better understanding of why preterm births occur, and a strategic focus on prevention, will help to improve the health of children and reduce healthcare costs. At present, most methods of preterm birth prediction are subjective. However, a strong body of evidence suggests that the analysis of uterine electrical signals (Electrohysterography) could provide a viable way of diagnosing true labour and predicting preterm deliveries. Most Electrohysterography studies focus on true labour detection during the final seven days before labour. The challenge is to utilise Electrohysterography techniques to predict preterm delivery earlier in the pregnancy. This paper explores this idea further and presents a supervised machine learning approach that classifies term and preterm records, using an open source dataset containing 300 records (38 preterm and 262 term). The synthetic minority oversampling technique is used to oversample the minority preterm class, and cross-validation techniques are used to evaluate the dataset against other similar studies. Our approach shows an improvement on existing studies, with 96% sensitivity, 90% specificity, and a 95% area under the curve value with 8% global error using the polynomial classifier.
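
    A minimal sketch of the oversampling-plus-cross-validation setup, assuming the third-party imbalanced-learn package; the synthetic features, the SVC with polynomial kernel, and the class counts mirror only the broad shape of the study. Placing SMOTE inside the pipeline keeps oversampling off the test folds.

      import numpy as np
      from imblearn.over_sampling import SMOTE
      from imblearn.pipeline import Pipeline
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      rng = np.random.default_rng(8)

      # Stand-in for EHG features: 262 term (0) vs 38 preterm (1).
      X = rng.normal(size=(300, 12))
      y = np.array([0] * 262 + [1] * 38)

      # SMOTE inside the pipeline means the minority class is
      # oversampled only on the training folds, never the test fold.
      model = Pipeline([
          ("smote", SMOTE(random_state=0)),
          ("clf", SVC(kernel="poly", degree=3)),  # 'polynomial classifier'
      ])
      print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())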

  11. Prediction of Preterm Deliveries from EHG Signals Using Machine Learning

    PubMed Central

    Fergus, Paul; Cheung, Pauline; Hussain, Abir; Al-Jumeily, Dhiya; Dobbins, Chelsea; Iram, Shamaila

    2013-01-01

    There has been some improvement in the treatment of preterm infants, which has helped to increase their chance of survival. However, the rate of premature births is still increasing globally. As a result, this group of infants is most at risk of developing severe medical conditions that can affect the respiratory, gastrointestinal, immune, central nervous, auditory and visual systems. In extreme cases, this can also lead to long-term conditions such as cerebral palsy, mental retardation and learning difficulties, as well as poor health and growth. In the US alone, the societal and economic cost of preterm births was estimated at $26.2 billion per annum in 2005. In the UK, this value was close to £2.95 billion in 2009. Many believe that a better understanding of why preterm births occur, and a strategic focus on prevention, will help to improve the health of children and reduce healthcare costs. At present, most methods of preterm birth prediction are subjective. However, a strong body of evidence suggests that the analysis of uterine electrical signals (Electrohysterography) could provide a viable way of diagnosing true labour and predicting preterm deliveries. Most Electrohysterography studies focus on true labour detection during the final seven days before labour. The challenge is to utilise Electrohysterography techniques to predict preterm delivery earlier in the pregnancy. This paper explores this idea further and presents a supervised machine learning approach that classifies term and preterm records, using an open source dataset containing 300 records (38 preterm and 262 term). The synthetic minority oversampling technique is used to oversample the minority preterm class, and cross-validation techniques are used to evaluate the dataset against other similar studies. Our approach shows an improvement on existing studies, with 96% sensitivity, 90% specificity, and a 95% area under the curve value with 8% global error using the polynomial classifier.

  12. Machine learning amplifies the effect of parental family history of Alzheimer's disease on list learning strategy.

    PubMed

    Chang, Timothy S; Coen, Michael H; La Rue, Asenath; Jonaitis, Erin; Koscik, Rebecca L; Hermann, Bruce; Sager, Mark A

    2012-05-01

    Identification of preclinical Alzheimer's disease (AD) is an essential first step in developing interventions to prevent or delay disease onset. In this study, we examine the hypothesis that deeper analyses of traditional cognitive tests may be useful in identifying subtle but potentially important learning and memory differences in asymptomatic populations that differ in risk for developing Alzheimer's disease. Subjects included 879 asymptomatic higher-risk persons (middle-aged children of parents with AD) and 355 asymptomatic lower-risk persons (middle-aged children of parents without AD). All were administered the Rey Auditory Verbal Learning Test at baseline. Using machine learning approaches, we constructed a new measure that exploited finer differences in memory strategy than previous work focused on serial position and subjective organization. The new measure, based on stochastic gradient descent, provides a greater degree of statistical separation (p = 1.44 × 10⁻⁵) than previously observed for asymptomatic family history and non-family history groups, while controlling for apolipoprotein epsilon 4, age, gender, and education level. The results of our machine learning approach support analyzing memory strategy in detail to probe potential disease onset. Such distinct differences may be exploited in asymptomatic middle-aged persons as a potential risk factor for AD.

  13. Detection of blue-white veil areas in dermoscopy images using machine learning techniques

    NASA Astrophysics Data System (ADS)

    Celebi, M. E.; Kingravi, Hassan A.; Aslandogan, Y. A.; Stoecker, William V.

    2006-03-01

    As a result of the advances in skin imaging technology and the development of suitable image processing techniques, during the last decade, there has been a significant increase of interest in the computer-aided diagnosis of skin cancer. Dermoscopy is a non-invasive skin imaging technique which permits visualization of features of pigmented melanocytic neoplasms that are not discernable by examination with the naked eye. One of the useful features in dermoscopic diagnosis is the blue-white veil (irregular, structureless areas of confluent blue pigmentation with an overlying white "ground-glass" film) which is mostly associated with invasive melanoma. In this preliminary study, a machine learning approach to the detection of blue-white veil areas in dermoscopy images is presented. The method involves pixel classification based on relative and absolute color features using a decision tree classifier. Promising results were obtained on a set of 224 dermoscopy images.
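
    The pixel-classification step described above (relative and absolute color features fed to a decision tree) might look roughly like the following; the synthetic RGB values and the veil-labelling rule are placeholders for real annotated dermoscopy pixels.

      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.default_rng(9)

      # Stand-in for per-pixel features: absolute RGB values plus simple
      # relative colour features (differences from the image mean colour).
      rgb = rng.random((5000, 3))
      relative = rgb - rgb.mean(axis=0)
      X = np.hstack([rgb, relative])

      # Placeholder labels: 1 = blue-white veil pixel, 0 = other skin.
      y = ((rgb[:, 2] > 0.6) & (rgb[:, 0] < 0.5)).astype(int)

      # Train a decision tree on held-out pixel data and evaluate it.
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      tree = DecisionTreeClassifier(max_depth=5).fit(X_tr, y_tr)
      print("held-out accuracy:", tree.score(X_te, y_te))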

  14. Identifying relatively high-risk group of coronary artery calcification based on progression rate: statistical and machine learning methods.

    PubMed

    Kim, Ha-Young; Yoo, Sanghyun; Lee, Jihyun; Kam, Hye Jin; Woo, Kyoung-Gu; Choi, Yoon-Ho; Sung, Jidong; Kang, Mira

    2012-01-01

    The coronary artery calcification (CAC) score is an important predictor of coronary artery disease (CAD), which is the primary cause of death in advanced countries. Early prediction of a high risk of CAC, based on progression rate, enables people to prevent CAD from developing into severe symptoms and disease. In this study, we developed various classifiers to identify patients at high risk of CAC using statistical and machine learning methods, and compared their performance. For the statistical approaches, a linear regression based classifier and a logistic regression model were developed. For the machine learning approaches, we suggest three kinds of ensemble-based classifiers (best, top-k, and voting) to deal with the imbalanced distribution of our data set. The ensemble voting method outperformed all other methods, including the regression methods, with an AUC of 0.781. PMID:23366360
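
    A voting ensemble of the kind reported best here can be sketched with scikit-learn's VotingClassifier; the three base models, the synthetic risk-factor features, and the AUC scoring are assumptions for illustration, not the study's actual ensemble.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier, VotingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      rng = np.random.default_rng(10)

      # Stand-in for clinical risk-factor features and a binary label
      # marking patients with a high CAC progression rate.
      X = rng.normal(size=(250, 8))
      y = (X[:, 0] + X[:, 3] > 0.5).astype(int)

      vote = VotingClassifier(
          estimators=[
              ("lr", LogisticRegression(max_iter=1000)),
              ("rf", RandomForestClassifier(random_state=0)),
              ("svm", SVC(probability=True)),
          ],
          voting="soft",  # average the predicted probabilities
      )
      print(cross_val_score(vote, X, y, cv=5, scoring="roc_auc").mean())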

  15. The Value Simulation-Based Learning Added to Machining Technology in Singapore

    ERIC Educational Resources Information Center

    Fang, Linda; Tan, Hock Soon; Thwin, Mya Mya; Tan, Kim Cheng; Koh, Caroline

    2011-01-01

    This study seeks to understand the value that simulation-based learning (SBL) adds to the learning of Machining Technology in a 15-week core subject course offered to university students. The research questions were: (1) How did SBL enhance classroom learning? (2) How did SBL help participants in their test? (3) How did SBL prepare participants for…

  16. Automated mapping of building facades by machine learning

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-08-01

    Facades of buildings contain various types of objects which have to be recorded for information systems. The article describes a solution for this task, focussing on automated classification by means of machine learning techniques. Stereo pairs of oblique images are used to derive 3D point clouds of buildings. The planes of the buildings are automatically detected. The derived planes are supplemented with a regular grid of points for which the colour values are found in the images. For each grid point of the façade, additional attributes are derived from image and object data. This "intelligent" point cloud is analysed by a decision tree, which is derived from a small training set. The derived decision tree is then used to classify the complete point cloud. Each point of the regular façade grid is assigned a class, and a façade plan is mapped with a colour palette representing the different objects. Some image processing methods are applied to improve the appearance of the interpreted façade plot and to extract additional information. The proposed method is tested on the facades of a church. Accuracy measures were derived from 140 independent checkpoints, which were randomly selected. When selecting four classes ("window", "stonework", "painted wall", and "vegetation"), the overall accuracy is assessed at 80% (95% confidence interval: 71%-88%). The user accuracy of the class "stonework" was assessed at 90% (95% CI: 80%-97%). The proposed methodology has a high potential for automation and fast processing.

  17. Parsimonious extreme learning machine using recursive orthogonal least squares.

    PubMed

    Wang, Ning; Er, Meng Joo; Han, Min

    2014-10-01

    Novel constructive and destructive parsimonious extreme learning machines (CP- and DP-ELM) are proposed in this paper. By virtue of the proposed ELMs, a parsimonious structure and excellent generalization of multi-input multi-output single hidden-layer feedforward networks (SLFNs) are obtained. The proposed ELMs are developed by an innovative decomposition of the recursive orthogonal least squares procedure into sequential partial orthogonalization (SPO). The salient features of the proposed approaches are as follows: 1) initial hidden nodes are randomly generated by the ELM methodology and recursively orthogonalized into an upper triangular matrix with a dramatic reduction in matrix size; 2) the constructive SPO in the CP-ELM focuses on the partial matrix with the subcolumn of the selected regressor including nonzeros as the first column, while the destructive SPO in the DP-ELM operates on the partial matrix including elements determined by the removed regressor; 3) termination criteria for CP- and DP-ELM are simplified by the additional residual error reduction method; and 4) the output weights of the SLFN need not be solved in the model selection procedure and are instead derived from the final upper triangular equation by backward substitution. Both single- and multi-output real-world regression data sets are used to verify the effectiveness and superiority of CP- and DP-ELM in terms of parsimonious architecture and generalization accuracy. Innovative applications to nonlinear time-series modeling demonstrate superior identification results. PMID:25291736

  18. A machine learning approach for detecting cell phone usage

    NASA Astrophysics Data System (ADS)

    Xu, Beilei; Loce, Robert P.

    2015-03-01

    Cell phone usage while driving is common but widely considered dangerous due to the distraction it causes the driver. Because of the high number of accidents related to cell phone usage while driving, several states have enacted regulations that prohibit cell phone usage while driving. However, to enforce these regulations, current practice requires dispatching law enforcement officers to the roadside to visually examine incoming cars, or having human operators manually examine image/video records to identify violators. Both of these practices are expensive, difficult, and ultimately ineffective. Therefore, there is a need for a semi-automatic or automatic solution to detect driver cell phone usage. In this paper, we propose a machine-learning-based method for detecting driver cell phone usage using a camera system directed at the vehicle's front windshield. The developed method consists of two stages: first, the frontal windshield region is localized using the deformable part model (DPM); next, a Fisher vector (FV) representation is used to classify the driver's side of the windshield into cell phone usage violation and non-violation classes. The proposed method achieved about 95% accuracy on a data set of more than 100 images with drivers in a variety of challenging poses, with or without cell phones.

  19. A novel extreme learning machine for hypoglycemia detection.

    PubMed

    San, Phyo Phyo; Ling, Sai Ho; Soe, Ni Ni; Nguyen, Hung T

    2014-01-01

    Hypoglycemia is a common side-effect of insulin therapy for patients with type 1 diabetes mellitus (T1DM) and is the major limiting factor in maintaining tight glycemic control. A deficiency in glucose counter-regulation may even lead to severe hypoglycemia. It is a constant threat to the well-being of patients with T1DM, since severe hypoglycemia can lead to seizures or loss of consciousness and, under certain circumstances, the possible development of permanent brain dysfunction. Thus, accurate early detection of hypoglycemia is an important research topic. Using new emerging technology, an extreme learning machine (ELM) based hypoglycemia detection system is developed to recognize the presence of hypoglycemic episodes. In a clinical study of 16 children with T1DM, the natural occurrence of nocturnal hypoglycemic episodes was associated with increased heart rates (p < 0.06) and increased corrected QT intervals (p < 0.001). The overall data were organized into a training set with 8 patients (320 data points) and a testing set with 8 patients (269 data points). Using the ELM-trained feed-forward neural network (ELM-FFNN), the testing sensitivity (true positive rate) and specificity (true negative rate) for the detection of hypoglycemia are 78% and 60%, respectively. PMID:25569957

  20. Machine learning methods for quantitative analysis of Raman spectroscopy data

    NASA Astrophysics Data System (ADS)

    Madden, Michael G.; Ryder, Alan G.

    2003-03-01

    The automated identification and quantification of illicit materials using Raman spectroscopy is of significant importance for law enforcement agencies. This paper explores the use of Machine Learning (ML) methods, in comparison with standard statistical regression techniques, for developing automated identification methods. In this work, the ML task is broken into two sub-tasks: data reduction and prediction. In well-conditioned data, the number of samples should be much larger than the number of attributes per sample, to limit the degrees of freedom in predictive models. In this spectroscopy data, the opposite is normally true. Predictive models based on such data have a high number of degrees of freedom, which increases the risk of models over-fitting to the sample data and having poor predictive power. Here, an approach to data reduction based on Genetic Algorithms is described. For the prediction sub-task, the objective is to estimate the concentration of a component in a mixture, based on its Raman spectrum and the known concentrations of previously seen mixtures. Here, Neural Networks and k-Nearest Neighbours are used for prediction. Preliminary results are presented for the problem of estimating the concentration of cocaine in solid mixtures, and compared with previously published results in which statistical analysis of the same dataset was performed. Finally, this paper demonstrates how more accurate results may be achieved by using an ensemble of prediction techniques.
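
    The k-Nearest Neighbours prediction sub-task can be sketched as follows; the synthetic "spectra", the concentration rule, and the choice of k = 5 are illustrative assumptions, not the paper's dataset or tuning.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsRegressor

      rng = np.random.default_rng(11)

      # Stand-ins for (already reduced) Raman spectra of solid mixtures
      # and the known cocaine concentration of each training mixture.
      spectra = rng.random((120, 30))
      concentration = spectra[:, :5].sum(axis=1) / 5.0

      # k-nearest-neighbour regression: predict a new mixture's
      # concentration from the k most similar previously seen spectra.
      knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
      score = cross_val_score(knn, spectra, concentration, cv=5,
                              scoring="neg_mean_absolute_error").mean()
      print("CV mean absolute error:", -score)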

  1. Machine Learning Estimation of Atom Condensed Fukui Functions.

    PubMed

    Zhang, Qingyou; Zheng, Fangfang; Zhao, Tanfeng; Qu, Xiaohui; Aires-de-Sousa, João

    2016-02-01

    To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as high/low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94% accuracy) of the atom with the highest Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50% of the cases and, on average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R² = 0.68-0.69, improving the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was obtained with RF with a sensitivity of 55-61% and a specificity of 94-95%. PMID:27491791

  2. Neural Network Machine Learning and Dimension Reduction for Data Visualization

    NASA Technical Reports Server (NTRS)

    Liles, Charles A.

    2014-01-01

    Neural network machine learning in computer science is a continuously developing field of study. Although neural network models have been developed which can accurately predict a numeric value or nominal classification, a general purpose method for constructing neural network architecture has yet to be developed. Computer scientists are often forced to rely on a trial-and-error process of developing and improving accurate neural network models. In many cases, models are constructed from a large number of input parameters. Understanding which input parameters have the greatest impact on the prediction of the model is often difficult to surmise, especially when the number of input variables is very high. This challenge is often labeled the "curse of dimensionality" in scientific fields. However, techniques exist for reducing the dimensionality of problems to just two dimensions. Once a problem's dimensions have been mapped to two dimensions, it can be easily plotted and understood by humans. The ability to visualize a multi-dimensional dataset can provide a means of identifying which input variables have the highest effect on determining a nominal or numeric output. Identifying these variables can provide a better means of training neural network models; models can be more easily and quickly trained using only input variables which appear to affect the outcome variable. The purpose of this project is to explore varying means of training neural networks and to utilize dimensional reduction for visualizing and understanding complex datasets.

  3. A Machine Learning Approach for Accurate Annotation of Noncoding RNAs.

    PubMed

    Song, Yinglei; Liu, Chunmei; Wang, Zhi

    2015-01-01

    Searching genomes to locate noncoding RNA genes with known secondary structure is an important problem in bioinformatics. In general, the secondary structure of a searched noncoding RNA is defined with a structure model constructed from the structural alignment of a set of sequences from its family. Computing the optimal alignment between a sequence and a structure model is the core part of an algorithm that can search genomes for noncoding RNAs. In practice, a single structure model may not be sufficient to capture all crucial features important for a noncoding RNA family. In this paper, we develop a novel machine learning approach that can efficiently search genomes for noncoding RNAs with high accuracy. During the search procedure, a sequence segment in the searched genome sequence is processed and a feature vector is extracted to represent it. Based on the feature vector, a classifier is used to determine whether the sequence segment is the searched ncRNA or not. Our testing results show that this approach is able to efficiently capture crucial features of a noncoding RNA family. Compared with existing search tools, it significantly improves the accuracy of genome annotation. PMID:26357266

  4. Machine learning and cosmological simulations - II. Hydrodynamical simulations

    NASA Astrophysics Data System (ADS)

    Kamdar, Harshil M.; Turk, Matthew J.; Brunner, Robert J.

    2016-04-01

    We extend a machine learning (ML) framework presented previously to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique to study galaxy formation against the backdrop of a hydrodynamical simulation. We use the Illustris simulation to train and test various sophisticated ML algorithms. Using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, g - r colour, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon a solid hydrodynamical simulation. The robust reproduction of the listed galaxy properties demonstrably places ML as a promising and significantly more computationally efficient tool for studying small-scale structure formation. We find that ML mimics a full-blown hydrodynamical simulation surprisingly well, in a computation time of mere minutes. The population of galaxies simulated by ML, while not numerically identical to Illustris, is statistically robust and physically consistent with Illustris galaxies and follows the same fundamental observational constraints. ML offers an intriguing and promising technique to create quick mock galaxy catalogues in the future.

  5. Using EHRs and Machine Learning for Heart Failure Survival Analysis

    PubMed Central

    Panahiazar, Maryam; Taslimitehrani, Vahid; Pereira, Naveen; Pathak, Jyotishman

    2016-01-01

    “Heart failure (HF) is a frequent health problem with high morbidity and mortality, increasing prevalence and escalating healthcare costs” [1]. By calculating an HF survival risk score based on patient-specific characteristics from Electronic Health Records (EHRs), we can identify high-risk patients and apply individualized treatment and healthy living choices to potentially reduce their mortality risk. The Seattle Heart Failure Model (SHFM) is one of the most popular models for calculating HF survival risk; it uses multiple clinical variables to predict HF prognosis and also incorporates the impact of HF therapy on patient outcomes. Although the SHFM has been validated across multiple cohorts [1–5], these studies were primarily done using clinical trials databases that do not reflect routine clinical care in the community. Further, the impact of contemporary therapeutic interventions, such as beta-blockers or defibrillators, was incorporated in the SHFM by extrapolation from external trials. In this study, we assess the performance of the SHFM using EHRs at Mayo Clinic, and sought to develop a risk prediction model using machine learning techniques that applies routine clinical care data. Our results show that the models built using EHR data are more accurate (11% improvement in AUC), with the convenience of being more readily applicable in routine clinical care. Furthermore, we demonstrate that new predictive markers (such as co-morbidities), when incorporated into our models, improve prognostic performance significantly (8% improvement in AUC). PMID:26262006

  6. Machine-learned pattern identification in olfactory subtest results

    PubMed Central

    Lötsch, Jörn; Hummel, Thomas; Ultsch, Alfred

    2016-01-01

    The human sense of smell is often analyzed as being composed of three main components comprising olfactory threshold, odor discrimination and the ability to identify odors. A relevant distinction of the three components and their differential changes in distinct disorders remains a research focus. The present data-driven analysis aimed at establishing a cluster structure in the pattern of olfactory subtest results. Therefore, unsupervised machine learning was applied to olfactory subtest results acquired in 10,714 subjects with nine different olfactory pathologies. Using the U-matrix, emergent self-organizing feature maps (ESOM) identified three different clusters characterized by (i) low threshold and good discrimination and identification, (ii) very high threshold associated with absent to poor discrimination and identification ability, or (iii) medium threshold, i.e., in the mid-range of possible thresholds, associated with reduced discrimination and identification ability. Specific etiologies of olfactory (dys)function were unequally represented in the clusters (p < 2.2 × 10⁻¹⁶). Patients with congenital anosmia were overrepresented in the second cluster, while subjects with postinfectious olfactory dysfunction frequently belonged to the third cluster. However, the clusters provided no clear separation between etiologies. Hence, the present verification of a distinct cluster structure encourages continued scientific efforts at olfactory test pattern recognition. PMID:27762302

  7. Characterization of decohering quantum systems: Machine learning approach

    NASA Astrophysics Data System (ADS)

    Stenberg, Markku P. V.; Köhn, Oliver; Wilhelm, Frank K.

    2016-01-01

    Adaptive data collection and analysis, where data are being fed back to update the measurement settings, can greatly increase the speed, precision, and reliability of the characterization of quantum systems. However, decoherence tends to make adaptive characterization difficult. As an example, we consider two coupled discrete quantum systems. When one of the systems can be controlled and measured, the standard method to characterize the other, with an unknown frequency ω_r, is swap spectroscopy. Here, adapting measurements can provide estimates whose error decreases exponentially in the number of measurement shots rather than as a power law, as in conventional swap spectroscopy. However, when the decoherence time is so short that an excitation oscillating between the two systems can only undergo less than a few tens of vacuum Rabi oscillations, this approach can be marred by a severe limit on accuracy unless carefully designed. We adopt machine learning techniques to search for efficient policies for the characterization of decohering quantum systems. We find, for instance, that when the system undergoes more than 2 Rabi oscillations during its relaxation time T_1, O(10³) measurement shots are sufficient to reduce the squared error of the Bayesian initial prior of the unknown frequency ω_r by a factor of O(10⁴) or larger. We also develop policies optimized for extreme initial parameter uncertainty and for the presence of imperfections in the readout.

  8. Using Machine Learning to classify the diffuse interstellar bands

    NASA Astrophysics Data System (ADS)

    Baron, Dalya; Poznanski, Dovi; Watson, Darach; Yao, Yushu; Cox, Nick L. J.; Prochaska, J. Xavier

    2015-07-01

    Using over a million and a half extragalactic spectra from the Sloan Digital Sky Survey, we study the correlations of the diffuse interstellar bands (DIBs) in the Milky Way. We measure the correlation between DIB strength and dust extinction for 142 DIBs using 24 stacked spectra in the reddening range E(B - V) < 0.2, many more lines than ever studied before. Most of the DIBs do not correlate with dust extinction. However, we find 10 weak and barely studied DIBs with correlations higher than 0.7 with dust extinction, and confirm the high correlation of five additional strong DIBs. Furthermore, we find a pair of DIBs, 5925.9 and 5927.5 Å, which exhibits a significant negative correlation with dust extinction, indicating that their carrier may be depleted on dust. We use Machine Learning algorithms to divide the DIBs into spectroscopic families based on 250 stacked spectra. By removing the dust dependence, we study how DIBs follow their local environment. We thus obtain six groups of weak DIBs, four of which are tightly associated with C2 or CN absorption lines.
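
    The sketch below illustrates the two analysis steps on fabricated data: correlating per-DIB strengths with E(B-V) across stacked spectra, then clustering the dust-corrected strength profiles into families. It is not the authors' pipeline; cluster counts and coefficients are placeholders.

```python
# Correlate synthetic DIB equivalent widths with E(B-V), then group the
# dust-corrected strength profiles by hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n_stacks, n_dibs = 24, 12
ebv = np.linspace(0.01, 0.2, n_stacks)   # reddening of each stacked spectrum
env = rng.normal(size=n_stacks)          # an independent "environment" driver
strengths = np.empty((n_dibs, n_stacks))
for i in range(n_dibs):
    driver = ebv if i < 6 else env       # half the DIBs track dust, half do not
    strengths[i] = 2.0 * driver + 0.2 * rng.normal(size=n_stacks)

for i in range(n_dibs):
    r, _ = pearsonr(strengths[i], ebv)
    print(f"DIB {i}: r(E(B-V)) = {r:+.2f}")

# Crudely remove the dust dependence by least-squares projection, then
# cluster DIBs on the residual profiles to find "families".
coef = strengths @ ebv / (ebv @ ebv)
residual = strengths - np.outer(coef, ebv)
families = fcluster(linkage(residual, method="ward"), t=2, criterion="maxclust")
print("family labels:", families)
```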

  9. Machine learning of parameters for accurate semiempirical quantum chemical calculations

    SciTech Connect

    Dral, Pavlo O.; von Lilienfeld, O. Anatole; Thiel, Walter

    2015-04-14

    We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C7H10O2, for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules.

  10. Electronic spectra from TDDFT and machine learning in chemical space

    SciTech Connect

    Ramakrishnan, Raghunathan; Hartmann, Mia; Tapavicza, Enrico; Lilienfeld, O. Anatole von

    2015-08-28

    Due to its favorable computational efficiency, time-dependent (TD) density functional theory (DFT) enables the prediction of electronic spectra in a high-throughput manner across chemical space. Its predictions, however, can be quite inaccurate. We resolve this issue with machine learning models trained on deviations of reference second-order approximate coupled-cluster (CC2) singles and doubles spectra from TDDFT counterparts, or even from the DFT gap. We applied this approach to low-lying singlet-singlet vertical electronic spectra of over 20 000 synthetically feasible small organic molecules with up to eight CONF atoms. The prediction errors decay monotonically as a function of training set size. For a training set of 10 000 molecules, CC2 excitation energies can be reproduced to within ±0.1 eV for the remaining molecules. Analysis of our spectral database via chromophore counting suggests that even higher accuracies can be achieved. Based on the evidence collected, we discuss open challenges associated with data-driven modeling of high-lying spectra and transition intensities.
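
    A compact sketch of the Δ-ML idea described above, using kernel ridge regression on synthetic descriptors: the model learns the deviation between a cheap method and an accurate reference, and the learned correction is added to the cheap prediction. Descriptors, energies, and hyperparameters below are all stand-ins.

```python
# Delta-ML sketch: learn the CC2 - TDDFT deviation with kernel ridge
# regression on fabricated molecular descriptors, then correct TDDFT.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 3000
X = rng.normal(size=(n, 20))                        # stand-in descriptors
e_tddft = X @ rng.normal(size=20)                   # cheap-method energies
delta = 0.3 * np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2  # systematic CC2 - TDDFT gap

X_tr, X_te, d_tr, d_te, cheap_tr, cheap_te = train_test_split(
    X, delta, e_tddft, test_size=0.3, random_state=4)

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.05).fit(X_tr, d_tr)
corrected = cheap_te + model.predict(X_te)          # Delta-ML corrected energies
print(f"TDDFT-only MAE: {np.mean(np.abs(d_te)):.3f}")
print(f"Delta-ML  MAE: {np.mean(np.abs(corrected - (cheap_te + d_te))):.3f}")
```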

  11. Machine learning of parameters for accurate semiempirical quantum chemical calculations

    DOE PAGES

    Dral, Pavlo O.; von Lilienfeld, O. Anatole; Thiel, Walter

    2015-04-14

    We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C7H10O2, for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules.

  12. Machine Learning Estimation of Atom Condensed Fukui Functions.

    PubMed

    Zhang, Qingyou; Zheng, Fangfang; Zhao, Tanfeng; Qu, Xiaohui; Aires-de-Sousa, João

    2016-02-01

    To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as high/low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94% accuracy) of the atom with the highest Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50% of the cases and, on average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R² = 0.68-0.69, improving the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was obtained with RF with a sensitivity of 55-61% and a specificity of 94-95%.
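
    The pairwise-ranking component can be illustrated with a logistic model on descriptor differences, in the Bradley-Terry spirit: σ(w·(x_i − x_j)) estimates the probability that atom i has the higher Fukui function. Descriptors and Fukui values below are synthetic placeholders, not the paper's data.

```python
# Bradley-Terry-style pairwise ranking via logistic regression on
# differences of (fabricated) atomic descriptor vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n_atoms = 4000
X = rng.normal(size=(n_atoms, 10))          # stand-in atomic descriptors
f = X @ rng.normal(size=10) + 0.1 * rng.normal(size=n_atoms)  # "true" Fukui values

# Build training pairs (i, j) labelled by which atom ranks higher.
i, j = rng.integers(0, n_atoms, (2, 20000))
keep = np.abs(f[i] - f[j]) >= 0.1           # mirror the paper's >=0.1 margin
diff = X[i][keep] - X[j][keep]
y = (f[i][keep] > f[j][keep]).astype(int)

d_tr, d_te, y_tr, y_te = train_test_split(diff, y, test_size=0.3, random_state=5)
clf = LogisticRegression(fit_intercept=False).fit(d_tr, y_tr)
print(f"pairwise ranking accuracy (held-out pairs): {clf.score(d_te, y_te):.3f}")
```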

  13. A Machine Learning Approach for Accurate Annotation of Noncoding RNAs

    PubMed Central

    Liu, Chunmei; Wang, Zhi

    2016-01-01

    Searching genomes to locate noncoding RNA genes with known secondary structure is an important problem in bioinformatics. In general, the secondary structure of a searched noncoding RNA is defined with a structure model constructed from the structural alignment of a set of sequences from its family. Computing the optimal alignment between a sequence and a structure model is the core part of an algorithm that can search genomes for noncoding RNAs. In practice, a single structure model may not be sufficient to capture all crucial features important for a noncoding RNA family. In this paper, we develop a novel machine learning approach that can efficiently search genomes for noncoding RNAs with high accuracy. During the search procedure, a sequence segment in the searched genome sequence is processed and a feature vector is extracted to represent it. Based on the feature vector, a classifier is used to determine whether the sequence segment is the searched ncRNA or not. Our testing results show that this approach is able to efficiently capture crucial features of a noncoding RNA family. Compared with existing search tools, it significantly improves the accuracy of genome annotation. PMID:26357266

  14. Protein Sequence Classification with Improved Extreme Learning Machine Algorithms

    PubMed Central

    2014-01-01

    Precisely classifying a protein sequence from a large database of biological protein sequences plays an important role in developing competitive pharmacological products. Conventional methods, which compare an unseen sequence with all the identified protein sequences and return the category of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent, efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimally pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed, where multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
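
    For readers unfamiliar with ELMs, the sketch below implements the core algorithm (a random hidden layer whose output weights are solved by least squares) and a small majority-vote ensemble of such SLFNs. It is a generic illustration on synthetic data, not the paper's OP-ELM implementation.

```python
# Minimal extreme learning machine (ELM) plus a majority-vote ensemble.
import numpy as np

rng = np.random.default_rng(6)

class ELM:
    def __init__(self, n_hidden, seed):
        self.n_hidden, self.seed = n_hidden, seed
    def fit(self, X, y_onehot):
        r = np.random.default_rng(self.seed)
        self.W = r.normal(size=(X.shape[1], self.n_hidden))  # random input weights
        self.b = r.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                     # random feature map
        # Output weights via least squares (the only "training" in an ELM).
        self.beta, *_ = np.linalg.lstsq(H, y_onehot, rcond=None)
        return self
    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)

# Synthetic 3-class stand-in for protein-sequence feature vectors.
X = rng.normal(size=(600, 30))
y = rng.integers(0, 3, 600)
X[np.arange(600), y] += 2.0                 # make the classes separable
onehot = np.eye(3)[y]

votes = np.stack([ELM(100, s).fit(X, onehot).predict(X) for s in range(9)])
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print(f"ensemble training accuracy: {np.mean(majority == y):.3f}")
```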

  15. Machine learning and cosmological simulations - I. Semi-analytical models

    NASA Astrophysics Data System (ADS)

    Kamdar, Harshil M.; Turk, Matthew J.; Brunner, Robert J.

    2016-01-01

    We present a new exploratory framework to model galaxy formation and evolution in a hierarchical Universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analysing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various sophisticated ML algorithms (k-Nearest Neighbors, decision trees, random forests, and extremely randomized trees). By using only essential dark matter halo physical properties for haloes of M > 10¹² M⊙ and a partial merger tree, our model predicts the hot gas mass, cold gas mass, bulge mass, total stellar mass, black hole mass and cooling radius at z = 0 for each central galaxy in a dark matter halo for the Millennium run. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon SAMs and demonstrably place ML as a promising and a computationally efficient tool to study small-scale structure formation.
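
    A schematic version of the regression task, assuming fabricated halo properties: fit random forests and extremely randomized trees to map halo variables to a baryonic property and compare cross-validated r². The variable names and functional form are invented for illustration.

```python
# Map mock dark matter halo properties to a mock "stellar mass" with
# two tree ensembles, as in the record above (data fabricated).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 5000
halo = rng.normal(size=(n, 4))  # e.g. log halo mass, spin, concentration, z_form
mstar = 0.8 * halo[:, 0] + 0.3 * halo[:, 0] * halo[:, 2] + 0.1 * rng.normal(size=n)

for model in (RandomForestRegressor(200, random_state=7),
              ExtraTreesRegressor(200, random_state=7)):
    r2 = cross_val_score(model, halo, mstar, cv=5, scoring="r2").mean()
    print(f"{type(model).__name__}: mean CV r^2 = {r2:.3f}")
```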

  16. Machine Learning in the Big Data Era: Are We There Yet?

    SciTech Connect

    Sukumar, Sreenivas Rangan

    2014-01-01

    In this paper, we discuss the machine learning challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are machine learning algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across domains of national security and healthcare to suggest our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data-retrieval); (ii) the science of data challenge - the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge - the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.

  17. Recent advances in computer camera methods for machine vision

    NASA Astrophysics Data System (ADS)

    Olson, Gaylord G.; Walker, Jo N.

    1998-10-01

    During the past year, several new computer camera methods (hardware and software) have been developed which have applications in machine vision. These are described below, along with some test results. The improvements are generally in the direction of higher speed and greater parallelism. A PCI interface card has been designed which is adaptable to multiple CCD types, both color and monochrome. A newly designed A/D converter allows for a choice of 8- or 10-bit conversion resolution and a choice of two different analog inputs. Thus, by using four of these converters feeding the 32-bit PCI data bus, up to 8 camera heads can be used with a single PCI card, and four camera heads can be operated in parallel. The card has been designed so that any of 8 different CCD types can be used with it (6 monochrome and 2 color CCDs) ranging in resolution from 192 by 165 pixels up to 1134 by 972 pixels. In the area of software, a method has been developed to better utilize the decision-making capability of the computer along with the sub-array scan capabilities of many CCDs. Specifically, it is shown below how to achieve a dual-scan-mode camera system wherein one scan mode is a low-density, high-speed scan of a complete image area, and a higher-density sub-array scan is used in those areas where changes have been observed. The name given to this technique is adaptive sub-array scanning.

  18. An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge

    ERIC Educational Resources Information Center

    Mivule, Kato

    2014-01-01

    The purpose of this investigation is to study and pursue a user-defined approach in preserving data privacy while maintaining an acceptable level of data utility using machine learning classification techniques as a gauge in the generation of synthetic data sets. This dissertation will deal with data privacy, data utility, machine learning…

  19. Phishtest: Measuring the Impact of Email Headers on the Predictive Accuracy of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Tout, Hicham

    2013-01-01

    The majority of documented phishing attacks have been carried by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that the inclusion of a limited subset of email headers as features in training machine learning…

  20. Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm.

    PubMed

    Azé, Jérôme; Sola, Christophe; Zhang, Jian; Lafosse-Marin, Florian; Yasmin, Memona; Siddiqui, Rubina; Kremer, Kristin; van Soolingen, Dick; Refrégier, Guislaine

    2015-01-01

    Infra-species taxonomy is a prerequisite to compare features such as virulence in different pathogen lineages. Mycobacterium tuberculosis complex taxonomy has rapidly evolved in the last 20 years through intensive clinical isolation, advances in sequencing and in the description of fast-evolving loci (CRISPR and MIRU-VNTR). On-line tools to describe new isolates have been set up based on known diversity either on CRISPRs (also known as spoligotypes) or on MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an on-line tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacers spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignations were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignations of all the isolates from the database were subsequently implemented into an ensemble learning approach based on the machine learning tool Weka to derive a classification scheme. All assignations were reproduced with very good sensitivities and specificities. When applied to independent datasets, it was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, efficient on 15-MIRU, 24-VNTR and spoligotype data, is available on the web interface "TBminer." Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance. Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first consensual

  1. Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm

    PubMed Central

    Azé, Jérôme; Sola, Christophe; Zhang, Jian; Lafosse-Marin, Florian; Yasmin, Memona; Siddiqui, Rubina; Kremer, Kristin; van Soolingen, Dick; Refrégier, Guislaine

    2015-01-01

    Infra-species taxonomy is a prerequisite to compare features such as virulence in different pathogen lineages. Mycobacterium tuberculosis complex taxonomy has rapidly evolved in the last 20 years through intensive clinical isolation, advances in sequencing and in the description of fast-evolving loci (CRISPR and MIRU-VNTR). On-line tools to describe new isolates have been set up based on known diversity either on CRISPRs (also known as spoligotypes) or on MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an on-line tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacers spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignations were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignations of all the isolates from the database were subsequently implemented into an ensemble learning approach based on the machine learning tool Weka to derive a classification scheme. All assignations were reproduced with very good sensitivities and specificities. When applied to independent datasets, it was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, efficient on 15-MIRU, 24-VNTR and spoligotype data, is available on the web interface “TBminer.” Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance. Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first

  2. Correct machine learning on protein sequences: a peer-reviewing perspective.

    PubMed

    Walsh, Ian; Pollastri, Gianluca; Tosatto, Silvio C E

    2016-09-01

    Machine learning methods are becoming increasingly popular for predicting protein features from sequences. Machine learning in bioinformatics can be powerful, but it also carries the risk of introducing unexpected biases, which may lead to an overestimation of performance. This article espouses a set of guidelines to allow both peer reviewers and authors to avoid common machine learning pitfalls. Understanding biology is necessary to produce useful data sets, which have to be large and diverse. Separating the training and test process is imperative to avoid over-selling method performance, which is also dependent on several hidden parameters. A novel predictor must always be compared with several existing methods, including simple baseline strategies. Using the presented guidelines will help nonspecialists to appreciate the critical issues in machine learning.
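
    Two of the guidelines, strict train/test separation and comparison against a trivial baseline, can be shown in a few lines on synthetic stand-in data:

```python
# Guideline sketch: split before any tuning, and always report a
# trivial baseline alongside the proposed model.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(1000, 25))               # per-protein feature vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# The split must happen first; reusing test data for tuning inflates scores.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=8)

baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = RandomForestClassifier(random_state=8).fit(X_tr, y_tr)
print(f"baseline: {baseline.score(X_te, y_te):.3f}  "
      f"model: {model.score(X_te, y_te):.3f}")
```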

  3. Position Paper: Applying Machine Learning to Software Analysis to Achieve Trusted, Repeatable Scientific Computing

    SciTech Connect

    Prowell, Stacy J; Symons, Christopher T

    2015-01-01

    Producing trusted results from high-performance codes is essential for policy and has significant economic impact. We propose combining rigorous analytical methods with machine learning techniques to achieve the goal of repeatable, trustworthy scientific computing.

  4. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time, and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. The Random Forest and Support Vector Machine were more accurate than the other machine learning methods, while TotalBoost performed slightly worse than the others. Running times differed between methods: the ELM was always the fastest algorithm, whereas the RBF required long imputation times as the sample size increased. The tested methods can be an alternative for imputation of un-typed SNPs when the rate of missing data is low. However, it is recommended that other machine learning methods also be evaluated for imputation.
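
    A sketch of the comparison protocol on simulated genotypes: several classifiers impute a masked SNP (coded 0/1/2) from flanking SNPs, with test accuracy and wall-clock time reported for each. The method list is a subset of the eight in the study, and the genotype model is fabricated.

```python
# Compare classifiers for imputing a masked SNP from flanking SNPs,
# reporting accuracy and timing (simulated data, not the study's trios).
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n, n_snps = 2000, 30
G = rng.integers(0, 3, size=(n, n_snps))        # flanking genotypes in {0,1,2}
# A linked target SNP, deterministically tied to a few neighbours.
target = ((G[:, 0] + G[:, 1]) > 2).astype(int) + (G[:, 2] > 1)

X_tr, X_te, y_tr, y_te = train_test_split(G, target, test_size=0.3, random_state=9)
for clf in (KNeighborsClassifier(), SVC(),
            RandomForestClassifier(random_state=9),
            AdaBoostClassifier(random_state=9)):
    t0 = time.perf_counter()
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{type(clf).__name__:24s} acc={acc:.3f}  "
          f"time={time.perf_counter() - t0:.2f}s")
```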

  5. Advanced human machine interaction for an image interpretation workstation

    NASA Astrophysics Data System (ADS)

    Maier, S.; Martin, M.; van de Camp, F.; Peinsipp-Byma, E.; Beyerer, J.

    2016-05-01

    In recent years, many new interaction technologies have been developed that enhance the usability of computer systems and allow for novel types of interaction. The areas of application for these technologies have mostly been in gaming and entertainment. However, in professional environments there are especially demanding tasks that would greatly benefit from improved human machine interfaces as well as an overall improved user experience. We therefore envisioned and built an image interpretation workstation of the future, a multi-monitor workplace composed of four screens. Each screen is dedicated to a complex software product, such as a geo-information system to provide geographic context, an image annotation tool, software to generate standardized reports, and a tool to aid in the identification of objects. Using self-developed systems for hand tracking, pointing gestures and head pose estimation, in addition to touchscreens, face identification, and speech recognition systems, we created a novel approach to this complex task. For example, head pose information is used to save the position of the mouse cursor on the currently focused screen and to restore it as soon as the same screen is focused again, while hand gestures allow for intuitive manipulation of 3D objects in mid-air. While the primary focus is on the task of image interpretation, all of the technologies involved provide generic ways of efficiently interacting with a multi-screen setup and could be utilized in other fields as well. In preliminary experiments, we received promising feedback from users in the military and have started to tailor the functionality to their needs.

  6. Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare.

    PubMed

    Mozaffari-Kermani, Mehran; Sur-Kolay, Susmita; Raghunathan, Anand; Jha, Niraj K

    2015-11-01

    Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, and thus, any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. A hindered diagnosis may have life-threatening consequences, while a false diagnosis may not only prompt users to distrust the machine-learning algorithm and even abandon the entire system, but may also cause patient distress. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increase the likelihood of classification into a specific class), or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at a lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks, based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness.
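
    A toy label-flip demonstration of the poisoning effect on synthetic data (a generic illustration, not the paper's attack procedure): as the fraction of mislabelled injected points grows, test accuracy falls.

```python
# Poisoning demo: augment the training set with wrongly labelled copies
# of training points and watch held-out accuracy degrade.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
X = rng.normal(size=(1500, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=10)

for frac in (0.0, 0.1, 0.3):
    n_poison = int(frac * len(y_tr))
    X_p = np.vstack([X_tr, X_tr[:n_poison]])
    y_p = np.concatenate([y_tr, 1 - y_tr[:n_poison]])  # injected wrong labels
    acc = LogisticRegression().fit(X_p, y_p).score(X_te, y_te)
    print(f"poison fraction {frac:.0%}: test accuracy {acc:.3f}")
```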

  7. Constructing and validating readability models: the method of integrating multilevel linguistic features with machine learning.

    PubMed

    Sung, Yao-Ting; Chen, Ju-Ling; Cha, Ji-Her; Tseng, Hou-Chiang; Chang, Tao-Hsing; Chang, Kuo-En

    2015-06-01

    Multilevel linguistic features have been proposed for discourse analysis, but there have been few applications of multilevel linguistic features to readability models and also few validations of such models. Most traditional readability formulae are based on generalized linear models (GLMs; e.g., discriminant analysis and multiple regression), but these models have to comply with certain statistical assumptions about data properties and include all of the data in formulae construction without pruning the outliers in advance. The use of such readability formulae tends to produce a low text classification accuracy, while using a support vector machine (SVM) in machine learning can enhance the classification outcome. The present study constructed readability models by integrating multilevel linguistic features with SVM, which is more appropriate for text classification. Taking the Chinese language as an example, this study developed 31 linguistic features as the predicting variables at the word, semantic, syntax, and cohesion levels, with grade levels of texts as the criterion variable. The study compared four types of readability models by integrating unilevel and multilevel linguistic features with GLMs and an SVM. The results indicate that adopting a multilevel approach in readability analysis provides a better representation of the complexities of both texts and the reading comprehension process.
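
    A schematic version of the model comparison in this record: an SVM versus a linear classifier predicting grade level from 31 synthetic stand-ins for the multilevel linguistic features. The grade-generating formula is invented purely so that a nonlinear method has an advantage.

```python
# Compare an SVM against a GLM-style linear baseline for grade-level
# classification from fabricated multilevel linguistic features.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
n = 1200
feats = rng.normal(size=(n, 31))   # word/semantic/syntax/cohesion stand-ins
raw = feats[:, 0] + 0.5 * feats[:, 1] ** 2 + 0.3 * rng.normal(size=n)
grade = np.clip(raw.round().astype(int), -2, 2) + 3   # grade levels 1..5

for model in (LogisticRegression(max_iter=1000), SVC(kernel="rbf")):
    acc = cross_val_score(model, feats, grade, cv=5).mean()
    print(f"{type(model).__name__}: mean CV accuracy = {acc:.3f}")
```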

  8. Machine Learning Helps Identify CHRONO as a Circadian Clock Component

    PubMed Central

    Venkataraman, Anand; Ramanathan, Chidambaram; Kavakli, Ibrahim H.; Hughes, Michael E.; Baggs, Julie E.; Growe, Jacqueline; Liu, Andrew C.; Kim, Junhyong; Hogenesch, John B.

    2014-01-01

    Over the last decades, researchers have characterized a set of “clock genes” that drive daily rhythms in physiology and behavior. This arduous work has yielded results with far-reaching consequences in metabolic, psychiatric, and neoplastic disorders. Recent attempts to expand our understanding of circadian regulation have moved beyond the mutagenesis screens that identified the first clock components, employing higher throughput genomic and proteomic techniques. In order to further accelerate clock gene discovery, we utilized a computer-assisted approach to identify and prioritize candidate clock components. We used a simple form of probabilistic machine learning to integrate biologically relevant, genome-scale data and ranked genes on their similarity to known clock components. We then used a secondary experimental screen to characterize the top candidates. We found that several physically interact with known clock components in a mammalian two-hybrid screen and modulate in vitro cellular rhythms in an immortalized mouse fibroblast line (NIH 3T3). One candidate, Gene Model 129, interacts with BMAL1 and functionally represses the key driver of molecular rhythms, the BMAL1/CLOCK transcriptional complex. Given these results, we have renamed the gene CHRONO (computationally highlighted repressor of the network oscillator). Bi-molecular fluorescence complementation and co-immunoprecipitation demonstrate that CHRONO represses by abrogating the binding of BMAL1 to its transcriptional co-activator CBP. Most importantly, CHRONO knockout mice display a prolonged free-running circadian period similar to, or more drastic than, six other clock components. We conclude that CHRONO is a functional clock component providing a new layer of control on circadian molecular dynamics. PMID:24737000

  9. Short-term wind speed predictions with machine learning techniques

    NASA Astrophysics Data System (ADS)

    Ghorbani, M. A.; Khatibi, R.; FazeliFard, M. H.; Naghipour, L.; Makarynskyy, O.

    2016-02-01

    Hourly wind speed forecasting is presented by a modeling study with possible applications to practical problems including farming wind energy, aircraft safety and airport operations. Modeling techniques employed in this paper for such short-term predictions are based on the machine learning techniques of artificial neural networks (ANNs) and genetic expression programming (GEP). Recorded values of wind speed were used, which comprised 8 years of collected data at the Kersey site, Colorado, USA. The January data over the first 7 years (2005-2011) were used for model training; and the January data for 2012 were used for model testing. A number of model structures were investigated for the validation of the robustness of these two techniques. The prediction results were compared with those of a multiple linear regression (MLR) method and with the Persistence method developed for the data. The model performances were evaluated using the correlation coefficient, root mean square error, Nash-Sutcliffe efficiency coefficient and Akaike information criterion. The results indicate that forecasting wind speed is feasible using past records of wind speed alone, but the maximum lead time for the data was found to be 14 h. The results show that different techniques would lead to different results, where the choice between them is not easy. Thus, decision making has to be informed of these modeling results and decisions should be arrived at on the basis of an understanding of inherent uncertainties. The results show that both GEP and ANN are equally credible selections and even MLR should not be dismissed, as it has its uses.
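
    The forecasting setup can be sketched as follows on a synthetic hourly series: lagged wind speeds form the features, a small neural network is trained on the first part of the record, and its RMSE is compared against the Persistence rule (forecast = last observed value). Lag count, horizon, and the series itself are assumptions.

```python
# Lag-feature wind speed forecasting: MLP vs the persistence baseline.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(12)
t = np.arange(3000)
wind = 6 + 2 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.8, t.size)  # hourly

lags, horizon = 6, 1
X = np.column_stack([wind[i:len(wind) - lags - horizon + 1 + i]
                     for i in range(lags)])   # rows: 6 consecutive past hours
y = wind[lags + horizon - 1:]                 # target: the next hour
split = int(0.8 * len(y))                     # chronological train/test split

mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=12)
mlp.fit(X[:split], y[:split])
rmse_mlp = mean_squared_error(y[split:], mlp.predict(X[split:])) ** 0.5
rmse_persist = mean_squared_error(y[split:], X[split:, -1]) ** 0.5
print(f"ANN RMSE={rmse_mlp:.2f}  persistence RMSE={rmse_persist:.2f}")
```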

  10. ASAP: a machine learning framework for local protein properties

    PubMed Central

    Brandes, Nadav; Ofer, Dan; Linial, Michal

    2016-01-01

    Determining residue-level protein properties, such as sites of post-translational modifications (PTMs), is vital to understanding protein function. Experimental methods are costly and time-consuming, while traditional rule-based computational methods fail to annotate sites lacking substantial similarity. Machine Learning (ML) methods are becoming fundamental in annotating unknown proteins and their heterogeneous properties. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for predicting residue-level properties. ASAP extracts numerous features from raw sequences, and supports easy integration of external features such as secondary structure, solvent accessibility, intrinsic disorder, or PSSM profiles. Features are then used to train ML classifiers. ASAP can create new classifiers within minutes for a variety of tasks, including PTM prediction (e.g. cleavage sites by convertase, phosphoserine modification). We present a detailed case study for ASAP: CleavePred, an ASAP-based model to predict protein precursor cleavage sites, with state-of-the-art results. Protein cleavage is a PTM shared by a wide variety of proteins sharing minimal sequence similarity. Current rule-based methods suffer from high false positive rates, making them suboptimal. The high performance of CleavePred makes it suitable for analyzing new proteomes at a genomic scale. The tool is attractive to protein design, mass spectrometry search engines and the discovery of new bioactive peptides from precursors. ASAP functions as a baseline approach for residue-level protein sequence prediction. CleavePred is freely accessible as a web-based application. Both ASAP and CleavePred are open-source with a flexible Python API. Database URL: ASAP’s and CleavePred source code, webtool and tutorials are available at: https://github.com/ddofer/asap; http://protonet.cs.huji.ac.il/cleavepred. PMID:27694209

  11. Using machine learning for discovery in synoptic survey imaging data

    NASA Astrophysics Data System (ADS)

    Brink, Henrik; Richards, Joseph W.; Poznanski, Dovi; Bloom, Joshua S.; Rice, John; Negahban, Sahand; Wainwright, Martin

    2013-10-01

    Modern time-domain surveys continuously monitor large swaths of the sky to look for astronomical variability. Astrophysical discovery in such data sets is complicated by the fact that detections of real transient and variable sources are highly outnumbered by `bogus' detections caused by imperfect subtractions, atmospheric effects and detector artefacts. In this work, we present a machine-learning (ML) framework for discovery of variability in time-domain imaging surveys. Our ML methods provide probabilistic statements, in near real time, about the degree to which each newly observed source is an astrophysically relevant source of variable brightness. We provide details about each of the analysis steps involved, including compilation of the training and testing sets, construction of descriptive image-based and contextual features, and optimization of the feature subset and model tuning parameters. Using a validation set of nearly 30 000 objects from the Palomar Transient Factory, we demonstrate a missed detection rate of at most 7.7 per cent at our chosen false-positive rate of 1 per cent for an optimized ML classifier of 23 features, selected to avoid feature correlation and overfitting from an initial library of 42 attributes. Importantly, we show that our classification methodology is insensitive to mislabelled training data up to a contamination of nearly 10 per cent, making it easier to compile sufficient training sets for accurate performance in future surveys. This ML framework, if so adopted, should enable the maximization of scientific gain from future synoptic surveys and enable fast follow-up decisions on the vast amounts of streaming data produced by such experiments.

  12. Machine Learning in Ionospheric Phenomena Detection Using Passive Radar

    NASA Astrophysics Data System (ADS)

    Pankratius, V.; Barari, S.; Lind, F. D.

    2015-12-01

    This work describes an approach to automate ionospheric feature detection in passive radar data using a tunable pipeline of Python-implemented algorithms for detection and classification. In particular, our detector is tuned to capture E-region irregularities and various other events such as meteors, aircraft, and ambiguities that result from poor transmission of signals or noise interference. The detection stage applies to passive radar images with pixels normalized to a defined value range. To separate the background, we apply a thresholding value and an area cutoff to keep regions with connected pixels of a minimum size; for each particular image, these parameters can be determined algorithmically in two ways, through our ExplainedEntropy (EE) and MaximumRegionArea (MRA) techniques. EE identifies the smallest set of regions that explain the most entropy of the image. MRA sets the area threshold to be a function of the largest region size. The classification stage picks up on these detected areas and applies neural networks and random forests to the image feature space. In this way we are able to categorize images based on their scientific content and make them searchable for scientists. A training set of real radar images was available to evaluate our approach and its adaptivity. Based on these labeled real images, we also evaluated the robustness of the detection with an enhanced set of perturbed images generated through a model-based simulator. The simulator also allowed for controlled experiments in the amount of perturbation and noise added, to precisely characterize the operation ranges of our machine learning algorithms. We will discuss the performance of the algorithms and potential scientific applications. Acknowledgements. We would like to acknowledge support from the NSF ACI-1442997 (PI V. Pankratius).
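
    The detection stage described above reduces to a few array operations; the sketch below thresholds a normalized synthetic image, labels connected pixel regions, and applies an MRA-style area cutoff (a fraction of the largest region's area, here 0.5 by assumption). The threshold value and blob sizes are placeholders.

```python
# Threshold + connected-component detection with an MRA-style area cutoff.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(13)
img = rng.normal(0, 0.1, (200, 200))
img[40:60, 50:90] += 1.0        # synthetic "event" blob
img[120:124, 30:33] += 1.0      # small blob that the area cutoff rejects
img = (img - img.min()) / (img.max() - img.min())   # normalize to [0, 1]

mask = img > 0.6                                    # thresholding value
labels, n_regions = ndimage.label(mask)             # connected components
areas = ndimage.sum(mask, labels, index=np.arange(1, n_regions + 1))
keep = np.flatnonzero(areas >= 0.5 * areas.max()) + 1   # MRA-style cutoff
print(f"{n_regions} regions found, {keep.size} kept, areas:", areas[keep - 1])
```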

  13. Machine Learning to Assess Grassland Productivity in Southeastern Arizona

    NASA Astrophysics Data System (ADS)

    Ponce-Campos, G. E.; Heilman, P.; Armendariz, G.; Moser, E.; Archer, V.; Vaughan, R.

    2015-12-01

    We present preliminary results of machine learning (ML) techniques modeling the combined effects of climate, management, and inherent potential on the productivity of grazed semi-arid grasslands in southeastern Arizona. Our goal is to help public land managers determine whether agency management policies are meeting objectives and where to focus attention. Monitoring in the field is becoming more and more limited in space and time. Remotely sensed data cover the entire allotments and go back in time, but do not consider the key issue of species composition. By estimating expected vegetative production as a function of site potential and climatic inputs, management skill can be assessed through time, across individual allotments, and between allotments. Here we present the use of Random Forest (RF) as the main ML technique, in this case for the purpose of regression. Our response variable is the maximum annual NDVI, a surrogate for grassland productivity, as generated by the Google Earth Engine cloud computing platform based on Landsat 5, 7, and 8 datasets. PRISM 33-year normal precipitation (1980-2013) was resampled to the Landsat scale. In addition, the GRIDMET climate dataset was the source for the calculation of the annual SPEI (Standardized Precipitation Evapotranspiration Index), a drought index. We also included information about landscape position, aspect, streams, ponds, roads and fire disturbances as part of the modeling process. Our results show that, in terms of variable importance, the 33-year normal precipitation, along with SPEI, are the most important features affecting grassland productivity within the study area. The RF approach was compared to a linear regression model with the same variables. The linear model resulted in an r² = 0.41, whereas RF showed a significant improvement with an r² = 0.79. We continue refining the model by comparison with aerial photography and to include grazing intensity and infrastructure from units/allotments to assess the
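
    A minimal sketch of the reported comparison on fabricated inputs: random forest regression versus a linear model for peak NDVI as a function of normal precipitation, SPEI, and a terrain variable. Variable names follow the abstract; values and coefficients are invented.

```python
# Random forest vs linear regression for an NDVI-like response.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(14)
n = 4000
ppt_norm = rng.normal(350, 60, n)    # 33-yr normal precipitation (mm), mock
spei = rng.normal(0, 1, n)           # drought index, mock
slope = rng.uniform(0, 25, n)        # terrain variable, mock
ndvi = (0.001 * ppt_norm + 0.05 * spei + 0.04 * np.sin(slope / 8)
        + 0.02 * rng.normal(size=n)) # nonlinear terrain effect by assumption
X = np.column_stack([ppt_norm, spei, slope])

for model in (LinearRegression(), RandomForestRegressor(200, random_state=14)):
    r2 = cross_val_score(model, X, ndvi, cv=5, scoring="r2").mean()
    print(f"{type(model).__name__}: mean CV r^2 = {r2:.2f}")
```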

  14. Machine learning helps identify CHRONO as a circadian clock component.

    PubMed

    Anafi, Ron C; Lee, Yool; Sato, Trey K; Venkataraman, Anand; Ramanathan, Chidambaram; Kavakli, Ibrahim H; Hughes, Michael E; Baggs, Julie E; Growe, Jacqueline; Liu, Andrew C; Kim, Junhyong; Hogenesch, John B

    2014-04-01

    Over the last decades, researchers have characterized a set of "clock genes" that drive daily rhythms in physiology and behavior. This arduous work has yielded results with far-reaching consequences in metabolic, psychiatric, and neoplastic disorders. Recent attempts to expand our understanding of circadian regulation have moved beyond the mutagenesis screens that identified the first clock components, employing higher throughput genomic and proteomic techniques. In order to further accelerate clock gene discovery, we utilized a computer-assisted approach to identify and prioritize candidate clock components. We used a simple form of probabilistic machine learning to integrate biologically relevant, genome-scale data and ranked genes on their similarity to known clock components. We then used a secondary experimental screen to characterize the top candidates. We found that several physically interact with known clock components in a mammalian two-hybrid screen and modulate in vitro cellular rhythms in an immortalized mouse fibroblast line (NIH 3T3). One candidate, Gene Model 129, interacts with BMAL1 and functionally represses the key driver of molecular rhythms, the BMAL1/CLOCK transcriptional complex. Given these results, we have renamed the gene CHRONO (computationally highlighted repressor of the network oscillator). Bi-molecular fluorescence complementation and co-immunoprecipitation demonstrate that CHRONO represses by abrogating the binding of BMAL1 to its transcriptional co-activator CBP. Most importantly, CHRONO knockout mice display a prolonged free-running circadian period similar to, or more drastic than, six other clock components. We conclude that CHRONO is a functional clock component providing a new layer of control on circadian molecular dynamics.

  15. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction

    PubMed Central

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children’s social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a “mental model” of the robot, tailoring the tutoring to the robot’s performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot’s bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  16. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    PubMed

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in specific. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  17. Classification of BMI control commands from rat's neural signals using extreme learning machine.

    PubMed

    Lee, Youngbum; Lee, Hyunjoo; Kim, Jinkwon; Shin, Hyung-Cheul; Lee, Myoungho

    2009-01-01

    A recently developed machine learning algorithm referred to as Extreme Learning Machine (ELM) was used to classify machine control commands out of time series of spike trains of ensembles of CA1 hippocampus neurons (n = 34) of a rat, which was performing a target-to-goal task on a two-dimensional space through a brain-machine interface system. Performance of ELM was analyzed in terms of training time and classification accuracy. The results showed that some processes such as class code prefix, redundancy code suffix and smoothing effect of the classifiers' outputs could improve the accuracy of classification of robot control commands for a brain-machine interface system. PMID:19860924

  18. Classifying chemical mode of action using gene networks and machine learning: a case study with the herbicide linuron.

    PubMed

    Ornostay, Anna; Cowie, Andrew M; Hindle, Matthew; Baker, Christopher J O; Martyniuk, Christopher J

    2013-12-01

    The herbicide linuron (LIN) is an endocrine disruptor with an anti-androgenic mode of action. The objectives of this study were to (1) improve knowledge of androgen and anti-androgen signaling in the teleostean ovary and to (2) assess the ability of gene networks and machine learning to classify LIN as an anti-androgen using transcriptomic data. Ovarian explants from vitellogenic fathead minnows (FHMs) were exposed to three concentrations of either 5α-dihydrotestosterone (DHT), flutamide (FLUT), or LIN for 12h. Ovaries exposed to DHT showed a significant increase in 17β-estradiol (E2) production while FLUT and LIN had no effect on E2. To improve understanding of androgen receptor signaling in the ovary, a reciprocal gene expression network was constructed for DHT and FLUT using pathway analysis and these data suggested that steroid metabolism, translation, and DNA replication are processes regulated through AR signaling in the ovary. Sub-network enrichment analysis revealed that FLUT and LIN shared more regulated gene networks in common compared to DHT. Using transcriptomic datasets from different fish species, machine learning algorithms classified LIN successfully with other anti-androgens. This study advances knowledge regarding molecular signaling cascades in the ovary that are responsive to androgens and anti-androgens and provides proof of concept that gene network analysis and machine learning can classify priority chemicals using experimental transcriptomic data collected from different fish species.

  1. Classification of older adults with/without a fall history using machine learning methods.

    PubMed

    Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D

    2015-01-01

    Falling is a serious problem in an aged society, such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier capable of classifying individual older adults into a high-risk group and a low-risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training on these features, classification hypotheses are obtained based on machine learning techniques (K-Nearest Neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies, with sensitivity and specificity, of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls. PMID:26737845
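
    The evaluation in this record can be sketched as follows on synthetic gait features, reporting sensitivity and specificity for each classifier family named in the abstract (feature names and the label rule are invented):

```python
# Compare faller/non-faller classifiers by sensitivity and specificity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(15)
X = rng.normal(size=(400, 8))                    # mock gait features
y = (X[:, 0] + 0.7 * X[:, 3] > 0.2).astype(int)  # 1 = fall history (synthetic)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=15)

for clf in (KNeighborsClassifier(), GaussianNB(), LogisticRegression(), SVC()):
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print(f"{type(clf).__name__:22s} "
          f"sens={tp / (tp + fn):.2f} spec={tn / (tn + fp):.2f}")
```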

  3. The use of machine learning and nonlinear statistical tools for ADME prediction.

    PubMed

    Sakiyama, Yojiro

    2009-02-01

    Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning and nonlinear statistical tools have been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it is a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D-visualisation. We applied six new machine learning methods to four different data sets: the naive Bayes classifier, classification and regression trees, random forests, Gaussian processes, support vector machines and k-nearest neighbours. The results demonstrated that ensemble learning and kernel machines displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future. PMID:19239395
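
    A rough illustration of the 2D-visualisation idea above: several of the named methods trained on a small synthetic nonlinear data set, with their decision regions plotted. scikit-learn's make_moons stands in for the authors' synthetic data set, and the Gaussian process classifier is omitted for brevity.

    ```python
    # Plot decision regions of several classifiers on a nonlinear 2-D data set.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
    models = {
        "NaiveBayes": GaussianNB(),
        "CART": DecisionTreeClassifier(random_state=0),
        "RandomForest": RandomForestClassifier(random_state=0),
        "kNN": KNeighborsClassifier(),
        "SVM": SVC(),
    }
    xx, yy = np.meshgrid(np.linspace(-2, 3, 300), np.linspace(-1.5, 2, 300))
    grid = np.c_[xx.ravel(), yy.ravel()]

    fig, axes = plt.subplots(1, len(models), figsize=(3 * len(models), 3))
    for ax, (name, model) in zip(axes, models.items()):
        model.fit(X, y)
        Z = model.predict(grid).reshape(xx.shape)
        ax.contourf(xx, yy, Z, alpha=0.3)          # decision regions
        ax.scatter(X[:, 0], X[:, 1], c=y, s=10)
        ax.set_title(name)
    plt.tight_layout()
    plt.show()
    ```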

  3. On-line Machine Learning and Event Detection in Petascale Data Streams

    NASA Astrophysics Data System (ADS)

    Thompson, David R.; Wagstaff, K. L.

    2012-01-01

    Traditional statistical data mining involves off-line analysis in which all data are available and equally accessible. However, petascale datasets have challenged this premise since it is often impossible to store, let alone analyze, the relevant observations. This has led the machine learning community to investigate adaptive processing chains where data mining is a continuous process. Here pattern recognition permits triage and followup decisions at multiple stages of a processing pipeline. Such techniques can also benefit new astronomical instruments such as the Large Synoptic Survey Telescope (LSST) and Square Kilometre Array (SKA) that will generate petascale data volumes. We summarize some machine learning perspectives on real time data mining, with representative cases of astronomical applications and event detection in high volume datastreams. The first is a "supervised classification" approach currently used for transient event detection at the Very Long Baseline Array (VLBA). It injects known signals of interest - faint single-pulse anomalies - and tunes system parameters to recover these events. This permits meaningful event detection for diverse instrument configurations and observing conditions whose noise cannot be well-characterized in advance. Second, "semi-supervised novelty detection" finds novel events based on statistical deviations from previous patterns. It detects outlier signals of interest while considering known examples of false alarm interference. Applied to data from the Parkes pulsar survey, the approach identifies anomalous "peryton" phenomena that do not match previous event models. Finally, we consider online light curve classification that can trigger adaptive followup measurements of candidate events. Classifier performance analyses suggest optimal survey strategies, and permit principled followup decisions from incomplete data. These examples trace a broad range of algorithm possibilities available for online astronomical data
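
    The "semi-supervised novelty detection" case above can be sketched as: fit a model of previously seen "normal" events, then flag arriving events that deviate from it. IsolationForest and the simulated stream below are generic stand-ins, not the pipeline actually used at Parkes or the VLBA.

    ```python
    # Flag stream events that deviate from a model of past "normal" events.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(1)
    baseline = rng.normal(size=(5000, 3))          # historical normal events
    detector = IsolationForest(random_state=0).fit(baseline)

    for t in range(10):                            # stand-in for a live stream
        event = rng.normal(size=(1, 3)) + (5.0 if t == 7 else 0.0)  # one injected anomaly
        if detector.predict(event)[0] == -1:       # -1 marks an outlier
            print(f"t={t}: candidate novel event {np.round(event, 2)}")
    ```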

  4. Environmental Monitoring Networks Optimization Using Advanced Active Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail; Volpi, Michele; Copa, Loris

    2010-05-01

    The problem of environmental monitoring networks optimization (MNO) is one of the basic and fundamental tasks in spatio-temporal data collection, analysis, and modeling. There are several approaches to this problem, which can be considered as the design or redesign of a monitoring network by applying some optimization criteria. The most developed and widespread methods are based on geostatistics (the family of kriging models, conditional stochastic simulations). In geostatistics the variance is mainly used as an optimization criterion, which has some advantages and drawbacks. In the present research we study an application of advanced techniques following from statistical learning theory (SLT) - support vector machines (SVM) - and consider the optimization of monitoring networks when dealing with a classification problem (data are discrete values/classes: hydrogeological units, soil types, pollution decision levels, etc.). SVM is a universal nonlinear modeling tool for classification problems in high dimensional spaces. The SVM solution maximizes the margin between classes and has a good generalization property for noisy data. The sparse solution of SVM is based on support vectors - data which contribute to the solution with nonzero weights. Fundamentally, MNO for classification problems can be considered as a task of selecting new measurement points which increase the quality of spatial classification and reduce the testing error (the error on new independent measurements). In SLT this is a typical problem of active learning - the selection of new unlabelled points which efficiently reduce the testing error. A classical approach to active learning (margin sampling) is to sample the points closest to the classification boundary. This solution is suboptimal when points (or generally the dataset) are redundant for the same class. In the present research we propose and study two new advanced methods of active learning adapted to the solution of
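
    A minimal sketch of the classical margin-sampling baseline discussed above, assuming an RBF SVM and synthetic 2-D data in place of real monitoring-network measurements: at each step, the unlabelled point closest to the current decision boundary is "measured" and added to the design.

    ```python
    # Margin sampling: iteratively label the pool point nearest the boundary.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                               random_state=0)
    labeled = list(range(10))                      # small initial network design
    pool = [i for i in range(len(X)) if i not in labeled]

    for step in range(20):                         # add 20 new "measurements"
        clf = SVC(kernel="rbf").fit(X[labeled], y[labeled])
        margins = np.abs(clf.decision_function(X[pool]))
        labeled.append(pool.pop(int(np.argmin(margins))))  # closest to boundary

    clf = SVC(kernel="rbf").fit(X[labeled], y[labeled])
    print("test error estimate:", 1 - clf.score(X[pool], y[pool]))
    ```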

  5. Smart Interpretation - Application of Machine Learning in Geological Interpretation of AEM Data

    NASA Astrophysics Data System (ADS)

    Bach, T.; Gulbrandsen, M. L.; Jacobsen, R.; Pallesen, T. M.; Jørgensen, F.; Høyer, A. S.; Hansen, T. M.

    2015-12-01

    When using airborne geophysical measurements in e.g. groundwater mapping, an overwhelming amount of data is collected. Increasingly larger survey areas, denser data collection and limited resources combine into a growing problem: building geological models that use all the available data in a manner that is consistent with the geologist's knowledge about the geology of the survey area. In the ERGO project, funded by The Danish National Advanced Technology Foundation, we address this problem by developing new, usable tools enabling the geologist to utilize her geological knowledge directly in the interpretation of the AEM data, and thereby handle the large amount of data. In the project we have developed the mathematical basis for capturing geological expertise in a statistical model, and on this basis we have implemented new algorithms that have been operationalized and embedded in user-friendly software. In this software, the machine learning algorithm, Smart Interpretation, enables the geologist to use the system as an assistant in the geological modelling process. As the software 'learns' the geology from the geologist, the system suggests new modelling features in the data. In this presentation we demonstrate the application of the results from the ERGO project, including the proposed modelling workflow, on a variety of data examples.

  6. Using machine learning to drive computational discovery in self-organizing material systems

    NASA Astrophysics Data System (ADS)

    Phillips, Carolyn; Voth, Gregory

    2014-03-01

    In a complex self-organizing system, small changes in the interactions between the system's components can result in different emergent macrostructures or macrobehavior. In chemical engineering and material science, such spontaneously self-assembling systems, using polymers, nanoscale or colloidal-scale particles, and DNA, are an attractive way to create precisely engineered materials. Computer simulations of such systems are a powerful tool for discovery. However, as the rate at which data can be amassed continues to accelerate, the pace of discovery becomes limited not by the rate at which data can be generated, but by the rate at which they can be analyzed. We consider this problem from two ends, using a model particle system that self-assembles simple and complex crystals. First, we show how the ordered states can be discovered in a large data set of simulation results by using a hierarchy of pattern analysis techniques, including shape matching and machine learning algorithms. Second, we introduce a learning algorithm, inspired by adaptive mesh refinement, that guides the deployment of computational experiments. This algorithm densely searches the space of the degrees of freedom for a self-organizing system while targeting certain features, thus gathering more information for less computational effort. New algorithmic techniques such as these, for managing the growing volume of simulation data, help turn advancing computational power into a tool for discovery. We acknowledge ONR and DOE-OS DE-AC02-06CH11357.

  7. Machine learning applications in proteomics research: how the past can boost the future.

    PubMed

    Kelchtermans, Pieter; Bittremieux, Wout; De Grave, Kurt; Degroeve, Sven; Ramon, Jan; Laukens, Kris; Valkenborg, Dirk; Barsnes, Harald; Martens, Lennart

    2014-03-01

    Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. Here we therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis. PMID:24323524

  8. Cross-person activity recognition using reduced kernel extreme learning machine.

    PubMed

    Deng, Wan-Yu; Zheng, Qing-Hua; Wang, Zhong-Min

    2014-05-01

    Activity recognition based on mobile embedded accelerometers is very important for developing human-centric pervasive applications such as healthcare and personalized recommendation. However, the distribution of accelerometer data is heavily affected by varying users, and performance degrades when a model trained on one person is applied to others. To solve this problem, we propose a fast and accurate cross-person activity recognition model, known as TransRKELM (Transfer learning Reduced Kernel Extreme Learning Machine), which uses RKELM (Reduced Kernel Extreme Learning Machine) to build the initial activity recognition model. In the online phase, OS-RKELM (Online Sequential Reduced Kernel Extreme Learning Machine) is applied to update the initial model and efficiently adapt it to new device users, based on recognition results with a high confidence level. Experimental results show that the proposed model can adapt the classifier to new device users quickly and obtain good recognition performance. PMID:24513850
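
    A minimal sketch of the RKELM building block, assuming the standard reduced-kernel formulation (a random subset of training points as kernel centres and a ridge solve for the output weights). The transfer and online-sequential parts of TransRKELM are not reproduced here, and the accelerometer features are random placeholders.

    ```python
    # Reduced kernel ELM: random kernel centres + ridge solution for weights.
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    def rkelm_fit(X, y_onehot, n_centres=50, C=10.0, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), n_centres, replace=False)]
        K = rbf_kernel(X, centres, gamma=gamma)               # n x m kernel block
        # ridge solution for output weights beta (m x n_classes)
        beta = np.linalg.solve(K.T @ K + np.eye(n_centres) / C, K.T @ y_onehot)
        return centres, beta

    def rkelm_predict(X, centres, beta, gamma=1.0):
        return rbf_kernel(X, centres, gamma=gamma) @ beta     # class scores

    # toy usage with random stand-in accelerometer features
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 8))
    y = (X[:, 0] > 0).astype(int)
    centres, beta = rkelm_fit(X, np.eye(2)[y])                # one-hot targets
    pred = rkelm_predict(X, centres, beta).argmax(axis=1)
    print("training accuracy:", (pred == y).mean())
    ```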

  9. Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems.

    PubMed

    Kuusisto, Finn; Dutra, Inês; Elezaby, Mai; Mendonça, Eneida A; Shavlik, Jude; Burnside, Elizabeth S

    2015-01-01

    While the use of machine learning methods in clinical decision support has great potential for improving patient care, acquiring standardized, complete, and sufficient training data presents a major challenge for methods relying exclusively on machine learning techniques. Domain experts possess knowledge that can address these challenges and guide model development. We present Advice-Based-Learning (ABLe), a framework for incorporating expert clinical knowledge into machine learning models, and show results for an example task: estimating the probability of malignancy following a non-definitive breast core needle biopsy. By applying ABLe to this task, we demonstrate a statistically significant improvement in specificity (24.0% with p=0.004) without missing a single malignancy.

  10. Collaborative Learning: Cognitive and Computational Approaches. Advances in Learning and Instruction Series.

    ERIC Educational Resources Information Center

    Dillenbourg, Pierre, Ed.

    Intended to illustrate the benefits of collaboration between scientists from psychology and computer science, namely machine learning, this book contains the following chapters, most of which are co-authored by scholars from both sides: (1) "Introduction: What Do You Mean by 'Collaborative Learning'?" (Pierre Dillenbourg); (2) "Learning Together:…

  11. Uncertainty "escalation" and use of machine learning to forecast residual and data model uncertainties

    NASA Astrophysics Data System (ADS)

    Solomatine, Dimitri

    2016-04-01

    When speaking about model uncertainty, many authors implicitly assume the data uncertainty (mainly in parameters or inputs), which is probabilistically described by distributions. Often, however, it is useful to look at the residual uncertainty as well. It is hence reasonable to classify the main approaches to uncertainty analysis with respect to the two main types of model uncertainty that can be distinguished: A. The residual uncertainty of models. In this case the model parameters and/or model inputs are considered to be fixed (deterministic), i.e. the model is considered to be optimal (calibrated) and deterministic, and the model error is considered as the manifestation of uncertainty. If there are enough past data about the model errors (i.e. its uncertainty), it is possible to build a statistical or machine learning model of uncertainty trained on these data. The following methods can be mentioned: (a) the quantile regression (QR) method of Koenker and Basset, in which linear regression is used to build predictive models for distribution quantiles [1]; (b) a more recent approach that takes into account the input variables influencing such uncertainty and uses more advanced (non-linear) machine learning methods such as neural networks and model trees - the UNEEC method [2,3,7]; (c) the even more recent DUBRAUE method (Dynamic Uncertainty Model By Regression on Absolute Error), an autoregressive model of model residuals (it corrects the model residual first and then carries out the uncertainty prediction by an autoregressive statistical model) [5]. B. The data uncertainty (parametric and/or input) - in this case we study the propagation of uncertainty (presented typically probabilistically) from parameters or inputs to the model outputs. In the case of simple functions representing models, analytical approaches can be used, or approximation methods (e.g., the first-order second moment method). However, for real complex non-linear models implemented in software there is no other choice except using
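
    Approach A(a) above can be sketched as quantile regression on a calibrated model's residuals. In the sketch below, gradient boosting with a quantile loss stands in for the linear quantile regression of Koenker and Basset, and the "calibrated deterministic model" is a toy sine function.

    ```python
    # Quantile regression on residuals yields a 90% prediction band.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(500, 1))
    y_obs = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.05 * X[:, 0])  # heteroscedastic

    # residuals of the "optimal deterministic model" (here, sin(x))
    residual = y_obs - np.sin(X[:, 0])

    bands = {}
    for q in (0.05, 0.95):
        m = GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0)
        bands[q] = m.fit(X, residual).predict(X)

    covered = (residual >= bands[0.05]) & (residual <= bands[0.95])
    print(f"empirical coverage of 90% band: {covered.mean():.2f}")
    ```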

  12. Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques.

    PubMed

    Wang, Guanjin; Lam, Kin-Man; Deng, Zhaohong; Choi, Kup-Sze

    2015-08-01

    Bladder cancer is a common genitourinary malignancy. For muscle invasive bladder cancer, surgical removal of the bladder, i.e. radical cystectomy, is in general the definitive treatment, which, unfortunately, carries significant morbidity and mortality. Accurate prediction of the mortality of radical cystectomy is therefore needed. Statistical methods have conventionally been used for this purpose, despite the complex interactions of high-dimensional medical data. Machine learning has emerged as a promising technique for handling high-dimensional data, with increasing application in clinical decision support, e.g. cancer prediction and prognosis. Its ability to reveal hidden nonlinear interactions and interpretable rules between dependent and independent variables is favorable for constructing models of effective generalization performance. In this paper, seven machine learning methods are utilized to predict the 5-year mortality of radical cystectomy, including back-propagation neural network (BPN), radial basis function network (RBFN), extreme learning machine (ELM), regularized ELM (RELM), support vector machine (SVM), naive Bayes (NB) classifier and k-nearest neighbour (KNN), on a clinicopathological dataset of 117 patients from the urology unit of a hospital in Hong Kong. The experimental results indicate that RELM achieved the highest average prediction accuracy of 0.8 at a fast learning speed. The research findings demonstrate the potential of applying machine learning techniques to support clinical decision making.

  13. Integrating data sources to improve hydraulic head predictions: a hierarchical machine learning approach.

    SciTech Connect

    Michael, W. J.; Minsker, B. S.; Tcheng, D.; Valocchi, A. J.; Quinn, J. J.; Environmental Assessment; Univ. of Illinois

    2005-03-26

    This study investigates how machine learning methods can be used to improve hydraulic head predictions by integrating different types of data, including data from numerical models, in a hierarchical approach. A suite of four machine learning methods (decision trees, instance-based weighting, inverse distance weighting, and neural networks) are tested in several hierarchical configurations with different types of data from the 317/319 area at Argonne National Laboratory-East. The best machine learning model had a mean predicted head error 50% smaller than an existing MODFLOW numerical flow model, and a standard deviation of predicted head error 67% lower than the MODFLOW model, computed across all sampled locations used for calibrating the MODFLOW model. These predictions were obtained using decision trees trained with all historical quarterly data; the hourly head measurements were not as useful for prediction, most likely because of their poor spatial coverage. The results show promise for using hierarchical machine learning approaches to improve predictions and to identify the most essential types of data to guide future sampling efforts. Decision trees were also combined with an existing MODFLOW model to test their capabilities for updating numerical models to improve predictions as new data are collected. The combined model had a mean error 50% lower than the MODFLOW model alone. These results demonstrate that hierarchical machine learning approaches can be used to improve predictive performance of existing numerical models in areas with good data coverage. Further research is needed to compare this approach with methods such as Kalman filtering.

  14. Automatic Quality Inspection of Percussion Cap Mass Production by Means of 3D Machine Vision and Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Tellaeche, A.; Arana, R.; Ibarguren, A.; Martínez-Otzeta, J. M.

    Exhaustive quality control is becoming very important in the world's globalized market. One example where quality control becomes critical is percussion cap mass production: these elements must be fabricated within a minimum tolerance deviation. This paper outlines a machine vision development using a 3D camera for the inspection of the whole production of percussion caps. This system presents multiple problems, such as metallic reflections on the percussion caps, high-speed movement of the system, and mechanical errors and irregularities in percussion cap placement. Due to these problems, it is impossible to solve the task with traditional image processing methods, and hence machine learning algorithms have been tested to provide a feasible classification of the possible errors present in the percussion caps.

  15. Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning

    PubMed Central

    2013-01-01

    Background Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day, and the predicted proteomes of these genomes need annotation at a faster pace. In view of this, one such annotation need is to develop an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity search and machine learning. Results In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features were used to develop the SVM models: amino acid composition, dipeptide composition, pseudo amino acid composition, N-terminal-Center-C-terminal composition and protein physicochemical properties. Overall, the dipeptide composition-based module shows the best performance, with an accuracy of 86.80% and a Matthews Correlation Coefficient (MCC) of 0.74 in phase I, and 78.60% with an MCC of 0.44 in phase II. On independent test data, this model also performs better, with an overall accuracy of 76.58% and 74.97% in phase I and phase II, respectively. The similarity-based PSI-BLAST module shows very low performance with about 50% prediction
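
    The dipeptide composition feature that drives the best-performing module above is straightforward to compute: a 400-dimensional vector of dipeptide frequencies. The sketch below feeds it to an SVM; the sequences and labels are hypothetical placeholders, and the published models used additional features and proper training sets.

    ```python
    # Dipeptide composition (400 features) as SVM input for protein sequences.
    from itertools import product
    import numpy as np
    from sklearn.svm import SVC

    AA = "ACDEFGHIKLMNPQRSTVWY"
    DIPEPTIDES = ["".join(p) for p in product(AA, repeat=2)]   # 400 dipeptides
    INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}

    def dipeptide_composition(seq):
        v = np.zeros(len(DIPEPTIDES))
        for a, b in zip(seq, seq[1:]):             # count overlapping pairs
            if a + b in INDEX:
                v[INDEX[a + b]] += 1
        return v / max(len(seq) - 1, 1)            # normalise to frequencies

    # toy usage: placeholder sequences, non-plastid (0) vs plastid (1)
    seqs = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MSSPATVALLRRGLA"]
    X = np.array([dipeptide_composition(s) for s in seqs])
    y = np.array([0, 1])
    clf = SVC(kernel="rbf").fit(X, y)
    print(clf.predict(X))
    ```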

  16. Machine learning aid for knowledge engineering: learning new plans for Pilot's Associate

    NASA Astrophysics Data System (ADS)

    Miller, Christopher A.; Levi, Keith R.; Druhan, Barry; Shalin, Valerie L.

    1992-08-01

    DARPA and Lockheed's Pilot's Associate (PA) represents one of the largest and most complex artificially intelligent systems constructed to date. Its architecture of five modular, cooperative expert systems poses a knowledge engineering problem unique in its scope, though not in its basic nature. The knowledge bases for each of PA's modules will be very large, constantly changing (in response to new tactics and new technological capabilities), and highly specialized for the task of the specific module. For efficiency, each module must contain only that knowledge necessary for its task, yet for cooperation, each system's knowledge must be consistent with the others'. Machine learning approaches hold the promise of greatly reducing knowledge acquisition and knowledge engineering time and of making the entire PA system more flexible, more accurate, and more consistent. We present the results of a three-year program investigating an Explanation-Based Learning approach to acquiring new plans from a simulator-based learning scenario and then propagating this knowledge to two of the five PA modules--as a tactical plan which focuses on changing world states for the Tactics Planner module, and as a list of pilot information needs for the dynamic display configuration algorithm used in the Pilot-Vehicle Interface module.

  17. Gaussian Process Regression as a machine learning tool for predicting organic carbon from soil spectra - a machine learning comparison study

    NASA Astrophysics Data System (ADS)

    Schmidt, Andreas; Lausch, Angela; Vogel, Hans-Jörg

    2016-04-01

    Diffuse reflectance spectroscopy as a soil analytical tool is spreading more and more. There is a wide range of possible applications, from the point scale (e.g. simple soil samples, drill cores, vertical profile scans) through the field scale to the regional and even global scale (UAV, airborne and space borne instruments, soil reflectance databases). The basic idea is that the soil's reflectance spectrum holds information about its properties (like organic matter content or mineral composition). The relation between soil properties and the observable spectrum is usually not exactly known and is typically derived by statistical methods. Nowadays these methods are classified under the term machine learning, which comprises a vast pool of algorithms and methods for learning the relationship between pairs of input-output data (the training data set). Within this pool of methods, Gaussian Process Regression (GPR) is a newly emerging method (originating from Bayesian statistics) which is increasingly applied in different fields. For example, it was successfully used to predict vegetation parameters from hyperspectral remote sensing data. In this study we apply GPR to predict soil organic carbon from soil spectroscopy data (400 - 2500 nm). We compare it to more traditional and widely used methods such as Partial Least Squares Regression (PLSR), Random Forest (RF) and Gradient Boosted Regression Trees (GBRT). All these methods have the common ability to calculate a measure of variable importance (wavelength importance). The main advantage of GPR is its ability to also predict the variance of the target parameter, which makes it easy to see whether a prediction is reliable or not. The ability to choose from various covariance functions makes GPR a flexible method, allowing different assumptions or a priori knowledge about the data to be included. For this study we use samples from three different locations to test the prediction accuracies. One
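
    A minimal sketch of the headline advantage named above, GPR returning a predictive standard deviation alongside each prediction. The "spectra" and organic carbon values are random placeholders rather than real 400-2500 nm measurements.

    ```python
    # GPR predicting soil organic carbon with per-sample uncertainty.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(60, 50))            # placeholder spectra (50 bands)
    y = X[:, :5].sum(axis=1) + rng.normal(scale=0.05, size=60)  # fake SOC values

    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    mean, std = gpr.predict(X[:5], return_std=True)
    for m, s in zip(mean, std):
        print(f"predicted SOC: {m:.2f} +/- {s:.2f}")   # std flags unreliable predictions
    ```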

  1. Learning Simple Machines through Cross-Age Collaborations

    ERIC Educational Resources Information Center

    Lancor, Rachael; Schiebel, Amy

    2008-01-01

    In this project, introductory college physics students (noneducation majors) were asked to teach simple machines to a class of second graders. This nontraditional activity proved to be a successful way to encourage college students to think critically about physics and how it applied to their everyday lives. The noneducation majors benefited by…

  2. Analysis of multidimensional signals as classifiers for machine learning prediction of disruptions

    NASA Astrophysics Data System (ADS)

    Parsons, Matthew; Tang, William; Feibush, Eliot

    2015-11-01

    ITER, and future tokamaks beyond it, will require systems to predict oncoming disruptions so that damage to the machine can be avoided or mitigated. The use of supervised machine learning has proven successful in predicting the onset of disruptions with higher accuracy than a simple locked-mode detector, but to date only zero-dimensional time-trace signals have been considered for this purpose. We present initial results from our analysis of multidimensional signals (time + spatial dimensions) from the JET database to identify higher fidelity, physics-based classifiers that would allow the development of disruption prediction tools that are portable between machines.

  3. Machine Learning for Power System Disturbance and Cyber-attack Discrimination

    SciTech Connect

    Borges, Raymond Charles; Beaver, Justin M; Buckner, Mark A; Morris, Thomas; Adhikari, Uttam; Pan, Shengyi

    2014-01-01

    Power system disturbances are inherently complex and can be attributed to a wide range of sources, including both natural and man-made events. Currently, the power system operators are heavily relied on to make decisions regarding the causes of experienced disturbances and the appropriate course of action as a response. In the case of cyber-attacks against a power system, human judgment is less certain since there is an overt attempt to disguise the attack and deceive the operators as to the true state of the system. To enable the human decision maker, we explore the viability of machine learning as a means for discriminating types of power system disturbances, and focus specifically on detecting cyber-attacks where deception is a core tenet of the event. We evaluate various machine learning methods as disturbance discriminators and discuss the practical implications for deploying machine learning systems as an enhancement to existing power system architectures.

  4. Recent progresses in the exploration of machine learning methods as in-silico ADME prediction tools.

    PubMed

    Tao, L; Zhang, P; Qin, C; Chen, S Y; Zhang, C; Chen, Z; Zhu, F; Yang, S Y; Wei, Y Q; Chen, Y Z

    2015-06-23

    In-silico methods have been explored as potential tools for assessing ADME and ADME regulatory properties, particularly in early drug discovery stages. Machine learning methods, with their ability to classify diverse structures and complex mechanisms, are well suited for predicting ADME and ADME regulatory properties. Recent efforts have been directed at broadening the application scopes and improving the predictive performance, with particular focus on the coverage of ADME properties and the exploration of more diversified training data, appropriate molecular features, and consensus modeling. Moreover, several online machine learning ADME prediction servers have emerged. Here we review these progresses and discuss the performances, application prospects and challenges of exploring machine learning methods as useful tools in predicting ADME and ADME regulatory properties.

  5. Cross-platform normalization of microarray and RNA-seq data for machine learning applications.

    PubMed

    Thompson, Jeffrey A; Tan, Jie; Greene, Casey S

    2016-01-01

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log 2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019
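
    TDM itself is distributed as an R package, but one of the baselines evaluated above, quantile normalization, is easy to sketch: each sample from the new platform is forced to follow the pooled value distribution of the reference platform. The data below are random placeholders.

    ```python
    # Quantile-normalize new-platform samples onto a reference distribution.
    import numpy as np

    def quantile_normalize_to_reference(test, reference):
        """Map each column (sample) of `test` onto the pooled reference values."""
        ref_sorted = np.sort(reference.ravel())
        out = np.empty_like(test, dtype=float)
        n = test.shape[0]
        for j in range(test.shape[1]):
            ranks = np.argsort(np.argsort(test[:, j]))        # ranks 0..n-1
            # evenly spaced quantiles of the reference distribution
            out[:, j] = np.quantile(ref_sorted, (ranks + 0.5) / n)
        return out

    rng = np.random.default_rng(0)
    microarray = rng.normal(8, 2, size=(1000, 20))    # reference (legacy) platform
    rnaseq = rng.lognormal(2, 1, size=(1000, 5))      # new platform, different scale
    print(quantile_normalize_to_reference(rnaseq, microarray).mean(axis=0))
    ```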

  6. Deep learning of support vector machines with class probability output networks.

    PubMed

    Kim, Sangwook; Yu, Zhibin; Kil, Rhee Man; Lee, Minho

    2015-04-01

    Deep learning methods endeavor to learn features automatically at multiple levels and allow systems to learn complex functions mapping from the input space to the output space for the given data. The ability to learn powerful features automatically is increasingly important as the volume of data and range of applications of machine learning methods continues to grow. This paper proposes a new deep architecture that uses support vector machines (SVMs) with class probability output networks (CPONs) to provide better generalization power for pattern classification problems. As a result, deep features are extracted without additional feature engineering steps, using multiple layers of the SVM classifiers with CPONs. The proposed structure closely approaches the ideal Bayes classifier as the number of layers increases. Using a simulation of classification problems, the effectiveness of the proposed method is demonstrated.

  7. Developing Prognosis Tools to Identify Learning Difficulties in Children Using Machine Learning Technologies.

    PubMed

    Loizou, Antonis; Laouris, Yiannis

    2011-09-01

    The Mental Attributes Profiling System was developed in 2002 (Laouris and Makris, Proceedings of multilingual & cross-cultural perspectives on Dyslexia, Omni Shoreham Hotel, Washington, D.C, 2002), to provide a multimodal evaluation of the learning potential and abilities of young children's brains. The method is based on the assessment of non-verbal abilities using video-like interfaces and was compared to more established methodologies in (Papadopoulos, Laouris, Makris, Proceedings of IDA 54th annual conference, San Diego, 2003), such as the Wechsler Intelligence Scale for Children (Watkins et al., Psychol Sch 34(4):309-319, 1997). To do so, various tests have been applied to a population of 134 children aged 7-12 years old. This paper addresses the issue of identifying a minimal set of variables that are able to accurately predict the learning abilities of a given child. The use of Machine Learning technologies to do this provides the advantage of making no prior assumptions about the nature of the data and eliminating natural bias associated with data processing carried out by humans. Kohonen's Self Organising Maps (Kohonen, Biol Cybern 43:59-69, 1982) algorithm is able to split a population into groups based on large and complex sets of observations. Once the population is split, the individual groups can then be probed for their defining characteristics providing insight into the rationale of the split. The characteristics identified form the basis of classification systems that are able to accurately predict which group an individual will belong to, using only a small subset of the tests available. The specifics of this methodology are detailed herein, and the resulting classification systems provide an effective tool to prognose the learning abilities of new subjects.

  8. DOE FreedomCAR and vehicle technologies program advanced power electronic and electrical machines annual review report

    SciTech Connect

    Olszewski, Mitch

    2006-10-11

    This report is a summary of the Review Panel at the FY06 DOE FreedomCAR and Vehicle Technologies (FCVT) Annual Review of Advanced Power Electronics and Electric Machine (APEEM) research activities held on August 15-17, 2006.

  9. Machine-learning to characterise neonatal functional connectivity in the preterm brain.

    PubMed

    Ball, G; Aljabar, P; Arichi, T; Tusor, N; Cox, D; Merchant, N; Nongena, P; Hajnal, J V; Edwards, A D; Counsell, S J

    2016-01-01

    Brain development is adversely affected by preterm birth. Magnetic resonance image analysis has revealed a complex fusion of structural alterations across all tissue compartments that are apparent by term-equivalent age, persistent into adolescence and adulthood, and associated with wide-ranging neurodevelopmental disorders. Although functional MRI has revealed the relatively advanced organisational state of the neonatal brain, the full extent and nature of functional disruptions following preterm birth remain unclear. In this study, we apply machine-learning methods to compare whole-brain functional connectivity in preterm infants at term-equivalent age and healthy term-born neonates in order to test the hypothesis that preterm birth results in specific alterations to functional connectivity by term-equivalent age. Functional connectivity networks were estimated in 105 preterm infants and 26 term controls using group-independent component analysis and a graphical lasso model. A random forest-based feature selection method was used to identify discriminative edges within each network and a nonlinear support vector machine was used to classify subjects based on functional connectivity alone. We achieved 80% cross-validated classification accuracy informed by a small set of discriminative edges. These edges connected a number of functional nodes in subcortical and cortical grey matter, and most were stronger in term neonates compared to those born preterm. Half of the discriminative edges connected one or more nodes within the basal ganglia. These results demonstrate that functional connectivity in the preterm brain is significantly altered by term-equivalent age, confirming previous reports of altered connectivity between subcortical structures and higher-level association cortex following preterm birth.
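
    The pipeline described above (random-forest-based edge selection followed by a nonlinear SVM, evaluated by cross-validation) might look roughly like this in scikit-learn; the "connectivity edges" are random placeholders with a planted group difference, not the infants' data.

    ```python
    # Feature selection + nonlinear SVM, cross-validated.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(131, 300))        # 131 subjects x 300 edge strengths
    y = np.r_[np.ones(105), np.zeros(26)]  # 105 preterm, 26 term controls
    X[y == 0, :10] += 1.0                  # a few edges stronger in term infants

    pipe = make_pipeline(
        SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0)),
        SVC(kernel="rbf"),
    )
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"cross-validated accuracy: {scores.mean():.2f}")
    ```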

  10. Collaborative Learning in Advanced Supply Systems: The KLASS Pilot Project.

    ERIC Educational Resources Information Center

    Rhodes, Ed; Carter, Ruth

    2003-01-01

    The Knowledge and Learning in Advanced Supply Systems (KLASS) project developed collaborative learning networks of suppliers in the British automotive and aerospace industries. Methods included face-to-face and distance learning, work toward National Vocational Qualifications, and diagnostic workshops for senior managers on improving quality,…

  11. Demarcating Advanced Learning Approaches from Methodological and Technological Perspectives

    ERIC Educational Resources Information Center

    Horvath, Imre; Peck, David; Verlinden, Jouke

    2009-01-01

    In the field of design and engineering education, the fast and expansive evolution of information and communication technologies is steadily converting traditional learning approaches into more advanced ones. Facilitated by Broadband (high bandwidth) personal computers, distance learning has developed into web-hosted electronic learning. The…

  12. Lifelong Learning in Artistic Context Mediated by Advanced Technologies

    ERIC Educational Resources Information Center

    Ferrari, Mirella

    2016-01-01

    This research starts by analysing the current state of artistic heritage in Italy and studying some examples in Europe: we try to investigate the scope of non-formal learning in artistic context, mediated by advanced technology. The framework within which we have placed our investigation is that of lifelong learning and lifedeep learning. The…

  13. Machine Learning Approaches for High-resolution Urban Land Cover Classification: A Comparative Study

    SciTech Connect

    Vatsavai, Raju; Chandola, Varun; Cheriyadat, Anil M; Bright, Eddie A; Bhaduri, Budhendra L; Graesser, Jordan B

    2011-01-01

    The proliferation of machine learning approaches makes it difficult to identify a suitable classification technique for analyzing high-resolution remote sensing images. In this study, ten classification techniques from five broad machine learning categories were compared. Surprisingly, the performance of simple statistical classification schemes like maximum likelihood and logistic regression is very close to that of more complex and recent techniques. Given that these two classifiers require little input from the user, they should still be considered for most classification tasks. Multiple classifier systems are a good choice if the resources permit.

  14. A machine learning approach to improve contactless heart rate monitoring using a webcam.

    PubMed

    Monkaresi, Hamed; Calvo, Rafael A; Yan, Hong

    2014-07-01

    Unobtrusive, contactless recordings of physiological signals are very important for many health and human-computer interaction applications. Most current systems require sensors which intrusively touch the user's skin. Recent advances in contact-free physiological signals open the door to many new types of applications. This technology promises to measure heart rate (HR) and respiration using video only. The effectiveness of this technology, its limitations, and ways of overcoming them deserves particular attention. In this paper, we evaluate this technique for measuring HR in a controlled situation, in a naturalistic computer interaction session, and in an exercise situation. For comparison, HR was measured simultaneously using an electrocardiography device during all sessions. The results replicated the published results in controlled situations, but show that they cannot yet be considered as a valid measure of HR in naturalistic human-computer interaction. We propose a machine learning approach to improve the accuracy of HR detection in naturalistic measurements. The results demonstrate that the root mean squared error is reduced from 43.76 to 3.64 beats/min using the proposed method. PMID:25014930

  15. Support-vector-machine tree-based domain knowledge learning toward automated sports video classification

    NASA Astrophysics Data System (ADS)

    Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin

    2010-12-01

    We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of the SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents a classified output type. Such an SVM tree presents a number of advantages, including: 1. low computing cost; 2. integrated learning and classification while preserving each individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc., can be added as individual nodes. Experiments support that the proposed SVM tree achieves good performance in sports video classification.

  16. Solar Flare Predictions Using Time Series of SDO/HMI Observations and Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Ilonidis, Stathis; Bobra, Monica; Couvidat, Sebastien

    2015-08-01

    Solar active regions are dynamic systems that can rapidly evolve in time and produce flare eruptions. The temporal evolution of an active region can provide important information about its potential to produce major flares. In this study, we build a flare forecasting model using supervised machine learning methods and time series of SDO/HMI data for all the flaring regions with magnitude M1.0 or higher that have been observed with HMI and several thousand non-flaring regions. We define and compute hundreds of features that characterize the temporal evolution of physical properties related to the size, non-potentiality, and complexity of the active region, as well as its flaring history, for several days before the flare eruption. Using these features, we implement and test the performance of several machine learning algorithms, including support vector machines, neural networks, decision trees, discriminant analysis, and others. We also apply feature selection algorithms that aim to discard features with low predictive power and improve the performance of the machine learning methods. Our results show that support vector machines provide the best forecasts for the next 24 hours, achieving a True Skill Statistic of 0.923, an accuracy of 0.985, and a Heidke skill score of 0.861, which improve the scores obtained by Bobra and Couvidat (2015). The results of this study contribute to the development of a more reliable and fully automated data-driven flare forecasting system.
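
    The forecast scores quoted above are simple functions of the binary confusion matrix. A small sketch follows, with hypothetical flare/no-flare counts in place of the study's verification data.

    ```python
    # Forecast skill scores from a binary confusion matrix.
    def true_skill_statistic(tp, fn, fp, tn):
        # TSS = hit rate - false alarm rate
        return tp / (tp + fn) - fp / (fp + tn)

    def heidke_skill_score(tp, fn, fp, tn):
        # accuracy improvement over the chance-expected number of correct forecasts
        n = tp + fn + fp + tn
        expected = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / n
        return ((tp + tn) - expected) / (n - expected)

    tp, fn, fp, tn = 90, 10, 40, 3860   # hypothetical counts
    print(f"TSS = {true_skill_statistic(tp, fn, fp, tn):.3f}")
    print(f"HSS = {heidke_skill_score(tp, fn, fp, tn):.3f}")
    print(f"acc = {(tp + tn) / (tp + fn + fp + tn):.3f}")
    ```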

  17. Simulated annealing and stochastic learning in optical neural nets: An optical Boltzmann machine

    SciTech Connect

    Shae, Zonyin.

    1989-01-01

    This dissertation deals with the study of stochastic learning and neural computation in opto-electronic hardware. It presents the first demonstration of a fully operational optical learning machine. Learning in the machine is stochastic, taking place in a self-organized multi-layered opto-electronic neural net with plastic connectivity weights that are formed in a programmable non-volatile spatial light modulator. Operation of the machine is made possible by two developments in this work: (a) Fast annealing by optically induced tremors in the energy landscape of the net. The objective of this scheme is to exploit the parallelism of the optical noise pattern so as to speed up the simulated annealing process. The procedure can be viewed as generating controlled, gradually decreasing deformations or tremors in the energy landscape of the net that prevent entrapment in a local minimum energy state. Both the random drawing of neurons and the state update of the net are now done in parallel at the same time, without having to compute explicitly the change in the energy of the net and the associated Boltzmann factor, as required ordinarily in the Metropolis-Kirkpatrick simulated annealing algorithm. This leads to significant acceleration of the annealing process. (b) Stochastic learning with binary weights. Learning in opto-electronic neural nets can be simplified greatly if binary weights can be used. A third development, the development of schemes for driving and enhancing the frame rate of magneto-optic spatial light modulators, can make the machine's learning speed potentially fast. Details of these developments together with the principle, architecture, structure, and performance evaluation of this machine are given.
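
    A software sketch of the simulated annealing principle behind the optical "tremor" scheme: a random walk on a toy multi-minimum energy landscape where uphill moves are accepted with a Boltzmann probability that shrinks as the temperature is lowered. This illustrates the algorithm only, not the optical hardware.

    ```python
    # Simulated annealing on a toy energy landscape.
    import math
    import random

    random.seed(0)

    def energy(x):
        return x * x + 10 * math.sin(3 * x)      # landscape with several minima

    x, T = 4.0, 5.0
    while T > 1e-3:
        candidate = x + random.uniform(-0.5, 0.5)
        dE = energy(candidate) - energy(x)
        # accept downhill moves always, uphill moves with Boltzmann probability
        if dE < 0 or random.random() < math.exp(-dE / T):
            x = candidate
        T *= 0.999                               # gradual cooling schedule
    print(f"found x={x:.3f}, energy={energy(x):.3f}")
    ```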

  1. A new machine learning algorithm for removal of salt and pepper noise

    NASA Astrophysics Data System (ADS)

    Wang, Yi; Adhami, Reza; Fu, Jian

    2015-07-01

    Supervised machine learning algorithms have been extensively studied and applied to different fields of image processing in past decades. This paper proposes a new machine learning algorithm, called margin setting (MS), for restoring images that are corrupted by salt and pepper impulse noise. Margin setting generates a decision surface to classify the noise pixels and non-noise pixels. After the noise pixels are detected, a modified ranked order mean (ROM) filter is used to replace the corrupted pixels for image reconstruction. The margin setting algorithm is tested with grayscale and color images for different noise densities. The experimental results are compared with those of the support vector machine (SVM) and standard median filter (SMF). The results show that margin setting outperforms these methods with a higher Peak Signal-to-Noise Ratio (PSNR), lower mean square error (MSE), higher image enhancement factor (IEF) and higher Structural Similarity Index (SSIM).
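
    The two-stage restoration described above can be sketched as detection followed by a mean over uncorrupted neighbours. In the sketch below, simple extreme-value thresholding stands in for the margin-setting classifier, and the modified ROM filter is reduced to a plain clean-neighbour mean.

    ```python
    # Detect salt-and-pepper pixels, then replace each with the mean of
    # its uncorrupted 3x3 neighbours.
    import numpy as np

    def restore(img):
        noisy = (img == 0) | (img == 255)            # impulse-noise candidates
        out = img.astype(float).copy()
        padded = np.pad(img, 1, mode="edge")
        pad_noisy = np.pad(noisy, 1, mode="edge")
        for i, j in zip(*np.where(noisy)):
            window = padded[i:i + 3, j:j + 3].ravel()
            good = window[~pad_noisy[i:i + 3, j:j + 3].ravel()]
            if good.size:                            # mean of clean neighbours
                out[i, j] = good.mean()
        return out.astype(np.uint8)

    rng = np.random.default_rng(0)
    img = rng.integers(30, 220, size=(64, 64)).astype(np.uint8)
    corrupted = img.copy()
    mask = rng.random(img.shape) < 0.1               # 10% impulse noise
    corrupted[mask] = rng.choice([0, 255], size=mask.sum())
    mse = ((restore(corrupted).astype(float) - img) ** 2).mean()
    print(f"MSE after restoration: {mse:.1f}")
    ```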

  2. Machine learning approaches to diagnosis and laterality effects in semantic dementia discourse

    PubMed Central

    Garrard, Peter; Rentoumi, Vassiliki; Gesierich, Benno; Miller, Bruce; Gorno-Tempini, Maria Luisa

    2014-01-01

    Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can ‘learn’ from data: for instance a ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative to each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%), but this was achieved by both NBM and NBG when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low frequency content words, generic terms and components of metanarrative statements. For the right versus left task the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA) may shed further light on this little understood distinction. PMID:23876449
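
    A rough sketch of the ingredients named above: bag-of-words vocabulary features, an information-gain-style ranking (mutual information), and a multinomial naive Bayes classifier. The two repeated "transcripts" are hypothetical placeholders for the picture descriptions.

    ```python
    # Vocabulary features -> information-gain selection -> naive Bayes.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = [
        "the boy is taking a cookie from the jar",   # placeholder control sample
        "the thing there is doing something stuff",  # placeholder patient sample
    ] * 10
    labels = [0, 1] * 10

    pipe = make_pipeline(
        CountVectorizer(),
        SelectKBest(mutual_info_classif, k=5),       # keep most informative words
        MultinomialNB(),
    )
    pipe.fit(texts, labels)
    print(pipe.predict(["he is doing something with the thing"]))
    ```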

  3. Drought monitoring using downscaled soil moisture through machine learning approaches over North and South Korea

    NASA Astrophysics Data System (ADS)

    Park, S.; Im, J.; Rhee, J.; Park, S.

    2015-12-01

    Soil moisture is one of the most important variables for drought monitoring. It reflects hydrological and agricultural processes, because soil moisture is a function of precipitation and energy flux and crop yield is highly related to soil moisture. Many satellites, including the Advanced Microwave Scanning Radiometer on the Earth Observing System (AMSR-E), the Soil Moisture and Ocean Salinity sensor (SMOS), and Soil Moisture Active Passive (SMAP), provide global scale soil moisture products through microwave sensors. However, as the spatial resolution of soil moisture products is typically tens of kilometers, it is difficult to monitor drought using soil moisture at local or regional scale. In this study, AMSR-E and AMSR2 soil moisture were downscaled to 1 km spatial resolution using Moderate Resolution Imaging Spectroradiometer (MODIS) data—evapotranspiration, land surface temperature, leaf area index, Normalized Difference Vegetation Index, Enhanced Vegetation Index and albedo—through machine learning approaches over the Korean peninsula. To monitor drought from 2003 to 2014, each pixel of the downscaled soil moisture was scaled from 0 to 1 (1 is the wettest and 0 is the driest). The soil moisture based drought maps were validated using the Standardized Precipitation Index (SPI) and crop yield data. The spatial distribution of drought status was also compared with other drought indices such as the Scaled Drought Condition Index (SDCI). The machine learning approaches performed well (R=0.905) for downscaling. Downscaled soil moisture was validated using in situ Asia flux data. The Root Mean Square Error (RMSE) improved from 0.172 (25 km AMSR2) to 0.065 (downscaled soil moisture), and the correlation coefficient improved from 0.201 (25 km AMSR2) to 0.341 (downscaled soil moisture). The soil moisture based drought maps and SDCI showed similar spatial distributions that captured both extreme drought and no drought. Since the proposed drought monitoring approach based on the downscaled

  4. A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.

    PubMed

    Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner. PMID:23304414
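
    The abstract does not spell out which intermediary results the trusted server integrates, so the sketch below only illustrates the general flavor of privacy-preserving distributed learning, not the DPP-SVM protocol itself: each hypothetical site shares aggregated hinge-loss subgradients for a linear SVM rather than raw patient records; all data and parameters are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    # Three hypothetical "institutions", each holding private (X, y), y in {-1, +1}.
    true_w = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
    sites = []
    for _ in range(3):
        X = rng.standard_normal((100, 5))
        y = np.where(X @ true_w + 0.1 * rng.standard_normal(100) >= 0, 1.0, -1.0)
        sites.append((X, y))

    w, lam = np.zeros(5), 1e-2
    n_total = sum(len(y) for _, y in sites)
    for t in range(200):
        grads = []
        for X, y in sites:
            viol = y * (X @ w) < 1                  # hinge-loss violators
            # Each site sends only this aggregated subgradient, never its records.
            grads.append(-(y[viol, None] * X[viol]).sum(axis=0))
        # The central server combines the summaries and updates the global model.
        w -= 0.1 / (1 + t) * (lam * w + sum(grads) / n_total)

    print("global SVM weights:", np.round(w, 2))
    ```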

  6. Unsupervised nonlinear dimensionality reduction machine learning methods applied to multiparametric MRI in cerebral ischemia: preliminary results

    NASA Astrophysics Data System (ADS)

    Parekh, Vishwa S.; Jacobs, Jeremy R.; Jacobs, Michael A.

    2014-03-01

    The evaluation and treatment of acute cerebral ischemia requires a technique that can determine the total area of tissue at risk for infarction using diagnostic magnetic resonance imaging (MRI) sequences. Typical MRI data sets consist of T1- and T2-weighted imaging (T1WI, T2WI) along with the advanced MRI parameters of diffusion-weighted imaging (DWI) and perfusion-weighted imaging (PWI). Each of these parameters has a distinct radiological-pathological meaning. For example, DWI interrogates the movement of water in the tissue and PWI gives an estimate of the blood flow; both are critical measures during the evolution of stroke. To integrate these data and estimate the tissue that is damaged or at risk, we have developed advanced machine learning methods based on unsupervised nonlinear dimensionality reduction (NLDR) techniques. NLDR methods are a class of algorithms that use mathematically defined manifolds for statistical sampling of multidimensional classes to generate a discrimination rule of guaranteed statistical accuracy. They can generate a two- or three-dimensional map that represents the prominent structure of the data and provides an embedded image of the meaningful low-dimensional structures hidden in the high-dimensional observations. In this manuscript, we apply NLDR methods to high-dimensional MRI data sets from preclinical animals and clinical patients with stroke. On analyzing the performance of these methods, we observed a high degree of similarity between the multiparametric embedded images from NLDR methods and the ADC and perfusion maps. We also observed that the embedded scattergram of abnormal (infarcted or at-risk) tissue can be visualized and provides a mechanism for automatic methods to delineate potential stroke volumes and early tissue at risk.
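
    A minimal sketch of the general approach, assuming scikit-learn's Isomap as one example of an unsupervised NLDR method and random arrays in place of co-registered T1WI/T2WI/DWI/PWI maps; the manuscript itself does not specify this implementation.

    ```python
    import numpy as np
    from sklearn.manifold import Isomap

    rng = np.random.default_rng(3)
    # Toy stand-ins for four co-registered parameter maps (e.g. T1WI, T2WI,
    # ADC from DWI, blood flow from PWI) over a 32x32 slice.
    maps = rng.random((4, 32, 32))
    voxels = maps.reshape(4, -1).T            # (1024 voxels, 4 parameters)

    # Unsupervised NLDR: embed each voxel's 4-D signature into 2-D. Clusters in
    # this "scattergram" can be traced back to voxel coordinates to delineate
    # candidate infarcted or at-risk tissue on the original slice.
    embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(voxels)
    print(embedding.shape)                    # (1024, 2)
    ```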

  7. Using Machine-Learned Detectors to Assess and Predict Students' Inquiry Performance

    ERIC Educational Resources Information Center

    Gobert, Janice D.; Baker, Ryan; Pedro, Michael Sao

    2011-01-01

    The authors present work towards automatically assessing data collection behaviors as middle school students engage in inquiry within a physics microworld. In this study, the authors used machine learned models that can detect when students test their articulated hypotheses, design controlled experiments, and engage in planning behaviors using…

  8. Learning Activity Packets for Grinding Machines. Unit II--Surface Grinding.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) is one of three that accompany the curriculum guide on grinding machines. It outlines the study activities and performance tasks for the second unit of this curriculum guide. Its purpose is to aid the student in attaining a working knowledge of this area of training and in achieving a skilled or moderately…

  9. An Investigation of "Teaching Machine" Variables Using Learning Programs in Symbolic Logic.

    ERIC Educational Resources Information Center

    Evans, James Lee; And Others

    The purposes of this study were to explore the feasibility of symbolic logic as an experimental task to be presented using programed instruction on teaching machines, to develop a standard program and reliable criterion measures of its content, and to investigate effects of response mode, feedback, and program construction on learning rate and on…

  10. e-Learning Application for Machine Maintenance Process using Iterative Method in XYZ Company

    NASA Astrophysics Data System (ADS)

    Nurunisa, Suaidah; Kurniawati, Amelia; Pramuditya Soesanto, Rayinda; Yunan Kurnia Septo Hediyanto, Umar

    2016-02-01

    XYZ Company manufactures parts for airplanes; one of the machines categorized as a key facility in the company is the Millac 5H6P. As a key facility, the machine must be kept working well and in peak condition, so periodic maintenance is needed. Data gathering revealed that maintenance staff lack the competency to maintain machine types other than those assigned to them by the supervisor, indicating that the knowledge possessed by the maintenance staff is uneven. The purpose of this research is to create a knowledge-based e-learning application as a realization of the externalization step of the knowledge-transfer process for machine maintenance. The application's features are tailored to maintenance using an e-learning framework for the maintenance process, and its content supports multimedia for learning. Quality function deployment (QFD) is used in this research to understand user needs. The application is built on Moodle, using an iterative method for the software development cycle and UML diagrams. The result of this research is an e-learning application that serves as a knowledge-sharing medium for the company's maintenance staff. Testing showed that the application makes it easy for maintenance staff to understand the required competencies.

  11. Exploring Machine Learning Techniques Using Patient Interactions in Online Health Forums to Classify Drug Safety

    ERIC Educational Resources Information Center

    Chee, Brant Wah Kwong

    2011-01-01

    This dissertation explores the use of personal health messages collected from online message forums to predict drug safety using natural language processing and machine learning techniques. Drug safety is defined as any drug with an active safety alert from the US Food and Drug Administration (FDA). It is believed that this is the first…

  12. Transforming Biology Assessment with Machine Learning: Automated Scoring of Written Evolutionary Explanations

    ERIC Educational Resources Information Center

    Nehm, Ross H.; Ha, Minsu; Mayfield, Elijah

    2012-01-01

    This study explored the use of machine learning to automatically evaluate the accuracy of students' written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate…

  13. Introduction to the JASIST Special Topic Issue on Web Retrieval and Mining: A Machine Learning Perspective.

    ERIC Educational Resources Information Center

    Chen, Hsinchun

    2003-01-01

    Discusses information retrieval techniques used on the World Wide Web. Topics include machine learning in information extraction; relevance feedback; information filtering and recommendation; text classification and text clustering; Web mining, based on data mining techniques; hyperlink structure; and Web size. (LRW)

  14. Use of Machine Learning to Identify Children with Autism and Their Motor Abnormalities

    ERIC Educational Resources Information Center

    Crippa, Alessandro; Salvatore, Christian; Perego, Paolo; Forti, Sara; Nobile, Maria; Molteni, Massimo; Castiglioni, Isabella

    2015-01-01

    In the present work, we have undertaken a proof-of-concept study to determine whether a simple upper-limb movement could be useful to accurately classify low-functioning children with autism spectrum disorder (ASD) aged 2-4. To answer this question, we developed a supervised machine-learning method to correctly discriminate 15 preschool children…

  15. Games and Machine Learning: A Powerful Combination in an Artificial Intelligence Course

    ERIC Educational Resources Information Center

    Wallace, Scott A.; McCartney, Robert; Russell, Ingrid

    2010-01-01

    Project MLeXAI [Machine Learning eXperiences in Artificial Intelligence (AI)] seeks to build a set of reusable course curriculum and hands on laboratory projects for the artificial intelligence classroom. In this article, we describe two game-based projects from the second phase of project MLeXAI: Robot Defense--a simple real-time strategy game…

  16. Learning Control: Sense-Making, CNC Machines, and Changes in Vocational Training for Industrial Work

    ERIC Educational Resources Information Center

    Berner, Boel

    2009-01-01

    The paper explores how novices in school-based vocational training make sense of computerized numerical control (CNC) machines. Based on two ethnographic studies in Swedish schools, one from the early 1980s and one from 2006, it analyses change and continuity in the cognitive, social, and emotional processes of learning how to become a machine…

  17. Drought Forecasting Based on Machine Learning of Remote Sensing and Long-Range Forecast Data

    NASA Astrophysics Data System (ADS)

    Rhee, J.; Im, J.; Park, S.

    2016-06-01

    The reduction of drought impacts may be achieved through sustainable drought management and proactive measures against drought disaster; accurate and timely provision of drought information is essential. In this study, drought forecasting models were developed to provide high-resolution drought information, based on drought indicators, for ungauged areas. The models predict two drought indices: the 6-month Standardized Precipitation Index (SPI6) and the 6-month Standardized Precipitation Evapotranspiration Index (SPEI6). A multiquadric spline interpolation method was tested alongside three machine learning models (Decision Tree, Random Forest, and Extremely Randomized Trees), which were used to enhance the provision of drought initial conditions from remote sensing data, since initial conditions are among the most important factors for drought forecasting. The machine learning-based methods performed better than interpolation for both classification and regression, and the methods using climatology data outperformed those using the long-range forecast. Overall, the model combining climatological data with machine learning performed best.
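
    A minimal sketch of the regression setup, assuming scikit-learn's ExtraTreesRegressor for the Extremely Randomized Trees model and synthetic predictor/target arrays in place of the remote sensing data and station-derived SPI6 values.

    ```python
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(4)
    # Toy stand-ins: remote-sensing predictors at gauged cells, with the target
    # drought index (e.g. SPI6) computed from station precipitation records.
    beta = rng.standard_normal(8)
    X_train = rng.random((300, 8))
    y_train = X_train @ beta + 0.1 * rng.standard_normal(300)
    X_test = rng.random((100, 8))                    # "ungauged" cells
    y_test = X_test @ beta + 0.1 * rng.standard_normal(100)

    model = ExtraTreesRegressor(n_estimators=300, random_state=0)
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"Extremely Randomized Trees RMSE at ungauged cells: {rmse:.3f}")
    ```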

  18. Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning

    PubMed Central

    Cohen, Kevin Bretonnel; Glass, Benjamin; Greiner, Hansel M.; Holland-Bouley, Katherine; Standridge, Shannon; Arya, Ravindra; Faist, Robert; Morita, Diego; Mangano, Francesco; Connolly, Brian; Glauser, Tracy; Pestian, John

    2016-01-01

    Objective: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data comprise free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient’s status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do, and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and to reducing the lag time to surgery referral. PMID:27257386
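
    The paper evaluates many feature sets and algorithms systematically; as a generic, hypothetical baseline of the same shape (clinical free text in, candidate flag out, F-measure reported), here is a TF-IDF plus logistic regression pipeline on invented toy notes, not the study's EHR data.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    # Invented toy corpus standing in for EHR notes; 1 marks surgery candidates.
    notes = [
        "seizures persist despite levetiracetam and lamotrigine",
        "well controlled on monotherapy, no events this year",
    ] * 20
    labels = [1, 0] * 20

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    f1 = cross_val_score(clf, notes, labels, cv=5, scoring="f1").mean()
    print(f"cross-validated F-measure: {f1:.2f}")
    ```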

  19. Machine learning approach for the outcome prediction of temporal lobe epilepsy surgery.

    PubMed

    Armañanzas, Rubén; Alonso-Nanclares, Lidia; Defelipe-Oroquieta, Jesús; Kastanauskaite, Asta; de Sola, Rafael G; Defelipe, Javier; Bielza, Concha; Larrañaga, Pedro

    2013-01-01

    Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE). Nevertheless, a significant proportion of these patients continue suffering seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose to include these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery. PMID:23646148

  20. Detection of dispersed radio pulses: a machine learning approach to candidate identification and classification

    NASA Astrophysics Data System (ADS)

    Devine, Thomas Ryan; Goseva-Popstojanova, Katerina; McLaughlin, Maura

    2016-06-01

    Searching for extraterrestrial, transient signals in astronomical data sets is an active area of current research. However, machine learning techniques for single-pulse detection are lacking in the literature. This paper presents a new, two-stage approach for identifying and classifying dispersed pulse groups (DPGs) in single-pulse search output. The first stage identifies DPGs and extracts features to characterize them, using a new peak identification algorithm that tracks sloping tendencies around local maxima in plots of signal-to-noise ratio versus dispersion measure. The second stage uses supervised machine learning to classify DPGs. We created four benchmark data sets: one unbalanced and three balanced versions using three different imbalance treatments. We empirically evaluated 48 classifiers by training and testing binary and multiclass versions of six machine learning algorithms on each of the four benchmark versions. While each classifier had advantages and disadvantages, all classifiers with imbalance treatments had higher recall values than those trained on unbalanced data, regardless of the machine learning algorithm used. Based on the benchmarking results, we selected a subset of classifiers to classify the full, unlabelled data set of over 1.5 million DPGs identified in 42 405 observations made by the Green Bank Telescope. Overall, the classifiers using a multiclass ensemble tree learner in combination with two oversampling imbalance treatments were the most efficient; they identified additional known pulsars not in the benchmark data set and provided six potential discoveries, with significantly fewer false positives than the other classifiers.
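
    A minimal sketch of one imbalance treatment mentioned here, random oversampling of the minority class before training, compared with training on the unbalanced data; the DPG features, classifier choice, and sizes are illustrative assumptions rather than the paper's setup.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import recall_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(5)
    # Imbalanced toy DPG features: 1000 noise candidates vs 50 pulsar-like ones.
    X = np.vstack([rng.normal(0.0, 1.0, (1000, 6)), rng.normal(1.5, 1.0, (50, 6))])
    y = np.array([0] * 1000 + [1] * 50)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    def recall_of(X_fit, y_fit):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        return recall_score(y_te, clf.fit(X_fit, y_fit).predict(X_te))

    # Imbalance treatment: randomly oversample the minority class to parity.
    minority = np.flatnonzero(y_tr == 1)
    extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size)
    X_bal = np.vstack([X_tr, X_tr[extra]])
    y_bal = np.concatenate([y_tr, y_tr[extra]])

    print(f"recall, unbalanced:  {recall_of(X_tr, y_tr):.2f}")
    print(f"recall, oversampled: {recall_of(X_bal, y_bal):.2f}")
    ```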

  3. Machine Translation in Foreign Language Learning: Language Learners' and Tutors' Perceptions of Its Advantages and Disadvantages

    ERIC Educational Resources Information Center

    Nino, Ana

    2009-01-01

    This paper presents a snapshot of what has been investigated in terms of the relationship between machine translation (MT) and foreign language (FL) teaching and learning. For this purpose four different roles of MT in the language class have been identified: MT as a bad model, MT as a good model, MT as a vocational training tool (especially in…

  4. Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality

    PubMed Central

    2016-01-01

    Background: Suicide is one of the leading causes of death in the United States (US), and new methods of assessment are needed to track its risk in real time. Objective: Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population. Methods: Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk. Results: Our findings show that people at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which correctly identified clinically significant suicidality in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%). Conclusions: Machine learning algorithms are efficient in differentiating people who are at suicidal risk from those who are not, and evidence of suicidality can be measured in nonclinical populations using social media data. PMID:27185366
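
    The reported metrics follow directly from the confusion matrix; below is a small generic helper (a sketch, not the study's code) that computes accuracy, sensitivity, specificity, PPV and NPV from binary labels, demonstrated on invented noisy predictions.

    ```python
    import numpy as np

    def diagnostic_metrics(y_true, y_pred):
        """Accuracy, sensitivity, specificity, PPV and NPV from binary labels."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_true == 1) & (y_pred == 1))
        tn = np.sum((y_true == 0) & (y_pred == 0))
        fp = np.sum((y_true == 0) & (y_pred == 1))
        fn = np.sum((y_true == 1) & (y_pred == 0))
        return dict(
            accuracy=(tp + tn) / len(y_true),
            sensitivity=tp / (tp + fn),     # true positive rate
            specificity=tn / (tn + fp),     # true negative rate
            ppv=tp / (tp + fp),             # positive predictive value
            npv=tn / (tn + fn),             # negative predictive value
        )

    # e.g. classifier output vs validated self-report risk for 135 participants
    rng = np.random.default_rng(6)
    y_true = rng.integers(0, 2, 135)
    y_pred = np.where(rng.random(135) < 0.8, y_true, 1 - y_true)  # noisy guesses
    print(diagnostic_metrics(y_true, y_pred))
    ```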

  5. Machine learning method for knowledge discovery experimented with otoneurological data.

    PubMed

    Varpa, Kirsi; Iltanen, Kati; Juhola, Martti

    2008-08-01

    We have been interested in developing an otoneurological decision support system that supports the diagnostics of vertigo diseases. In this study, we concentrate on testing its inference mechanism and knowledge discovery method. Knowledge is presented as patterns of classes; each pattern includes attributes with weight and fitness values concerning the class. With the knowledge discovery method it is possible to form fitness values from data, with knowledge formation based on frequency distributions of attributes. Knowledge formed by the knowledge discovery method is tested with two vertigo data sets and compared to experts' knowledge. The experts' and machine-learnt knowledge are also combined in various ways in order to examine the effects of weights on classification accuracy. The classification accuracy of the knowledge discovery method is compared to the 1- and 5-nearest-neighbour methods and the naive Bayes classifier. The results showed that knowledge bases combining machine-learnt knowledge with the experts' knowledge yielded the best classification accuracies. Further, attribute weighting had an important effect on the classification capability of the system. When considering the different diseases in the data sets used, the performance of the knowledge discovery method and the inference method is comparable to that of the other methods employed in this study.

  6. Recent CESAR (Center for Engineering Systems Advanced Research) research activities in sensor based reasoning for autonomous machines

    SciTech Connect

    Pin, F.G.; de Saussure, G.; Spelt, P.F.; Killough, S.M.; Weisbin, C.R.

    1988-01-01

    This paper describes recent research activities at the Center for Engineering Systems Advanced Research (CESAR) in the area of sensor-based reasoning, with emphasis on their application and implementation on our HERMIES-IIB autonomous mobile vehicle. These activities, including navigation and exploration in a priori unknown and dynamic environments, goal recognition, vision-guided manipulation and sensor-driven machine learning, are discussed within the framework of a scenario in which an autonomous robot is asked to navigate through an unknown dynamic environment; explore; find and dock at a process control panel; read and understand the status of the panel's meters and dials; learn the functioning of the panel; and successfully manipulate its control devices to solve a maintenance emergency problem. A demonstration of the successful implementation of the algorithms on our HERMIES-IIB autonomous robot for resolution of this scenario is presented. Conclusions are drawn concerning the applicability of the methodologies to more general classes of problems, and implications for future work on sensor-driven reasoning for autonomous robots are discussed. 8 refs., 3 figs.

  7. Coordinated machine learning and decision support for situation awareness.

    PubMed

    Brannon, N G; Seiffertt, J E; Draelos, T J; Wunsch, D C

    2009-04-01

    Domains such as force protection require an effective decision maker to maintain a high level of situation awareness. A system that combines humans with neural networks is a desirable approach. Furthermore, it is advantageous for the calculation engine to operate in three learning modes: supervised for initial training and known updating, reinforcement for online operational improvement, and unsupervised in the absence of all external signaling. An Adaptive Resonance Theory based architecture capable of seamlessly switching among the three types of learning is discussed that can be used to help optimize the decision making of a human operator in such a scenario. This is followed by a situation assessment module.

  8. The influence of negative training set size on machine learning-based virtual screening

    PubMed Central

    2014-01-01

    Background: The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. Results: The impact of this rather neglected aspect of applying machine learning methods was examined for sets containing a fixed number of positive examples and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed, in conjunction with some decrease in hit recall. The analysis of the dynamics of those variations allowed us to recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. Conclusions: The ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it can significantly influence the performance of a particular classifier. Moreover, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening. PMID:24976867
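
    A minimal sketch of the core experiment: holding the positives fixed, varying the number of random negatives, and tracking precision, recall and MCC on a common test set. The data, classifier (a random forest standing in for the paper's Weka learners), and ratios are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import matthews_corrcoef, precision_score, recall_score

    rng = np.random.default_rng(7)
    pos = rng.normal(1.0, 1.0, (200, 16))        # fixed pool of "actives"
    neg = rng.normal(0.0, 1.0, (4000, 16))       # large decoy pool (cf. ZINC)

    # Common, realistically imbalanced test set for all training ratios.
    X_te = np.vstack([rng.normal(1.0, 1.0, (100, 16)),
                      rng.normal(0.0, 1.0, (900, 16))])
    y_te = np.array([1] * 100 + [0] * 900)

    for ratio in (1, 5, 20):                     # negatives per positive
        X = np.vstack([pos, neg[: 200 * ratio]])
        y = np.array([1] * 200 + [0] * 200 * ratio)
        p = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y).predict(X_te)
        print(f"1:{ratio:<3} precision={precision_score(y_te, p):.2f} "
              f"recall={recall_score(y_te, p):.2f} MCC={matthews_corrcoef(y_te, p):.2f}")
    ```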

  9. Machine learning methods in the computational biology of cancer.

    PubMed

    Vidyasagar, M

    2014-07-01

    The objectives of this Perspective paper are to review some recent advances in sparse feature selection for regression and classification, as well as compressed sensing, and to discuss how these might be used to develop tools to advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented and is applied to predict the time to tumour recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is presented, and its validation in endometrial cancer is briefly discussed. Some open problems are also presented. PMID:25002826

  11. Dynamical analysis of contrastive divergence learning: Restricted Boltzmann machines with Gaussian visible units.

    PubMed

    Karakida, Ryo; Okada, Masato; Amari, Shun-Ichi

    2016-07-01

    The restricted Boltzmann machine (RBM) is an essential constituent of deep learning, but it is hard to train by maximum likelihood (ML) learning, which minimizes the Kullback-Leibler (KL) divergence. Instead, contrastive divergence (CD) learning has been developed as an approximation of ML learning and is widely used in practice. To clarify the performance of CD learning, in this paper, we analytically derive the fixed points where the ML and CDn learning rules converge in two types of RBMs: one with Gaussian visible and Gaussian hidden units, and the other with Gaussian visible and Bernoulli hidden units. In addition, we analyze the stability of the fixed points. As a result, we find that the stable points of the CDn learning rule coincide with those of the ML learning rule in a Gaussian-Gaussian RBM. We also reveal that larger principal components of the input data are extracted at the stable points. Moreover, in a Gaussian-Bernoulli RBM, we find that both ML and CDn learning can extract independent components at one of the stable points. Our analysis demonstrates that the same feature components as those extracted by ML learning are extracted simply by performing CD1 learning. Expanding this study should elucidate the specific solutions obtained by CD learning in other types of RBMs or in deep networks. PMID:27131468
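
    For reference, here is a minimal CD1 update loop for an RBM with Gaussian visible units (unit variance) and Bernoulli hidden units, written from the standard textbook formulation rather than from the paper; the learning rate, sizes and data are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    n_vis, n_hid, lr = 8, 4, 0.01
    W = 0.1 * rng.standard_normal((n_vis, n_hid))
    b, c = np.zeros(n_vis), np.zeros(n_hid)      # visible / hidden biases

    data = rng.standard_normal((500, n_vis))     # Gaussian visible training data

    for epoch in range(50):
        for v0 in data:
            # positive phase: Bernoulli hidden activations given the data
            ph0 = sigmoid(v0 @ W + c)
            h0 = (rng.random(n_hid) < ph0).astype(float)
            # one Gibbs step: reconstruct the Gaussian visibles (their mean),
            # then re-infer the hidden probabilities
            v1 = h0 @ W.T + b
            ph1 = sigmoid(v1 @ W + c)
            # CD1 update: data statistics minus one-step reconstruction statistics
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            b += lr * (v0 - v1)
            c += lr * (ph0 - ph1)
    ```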

  12. Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification.

    PubMed

    Mirza, Bilal; Lin, Zhiping

    2016-08-01

    In this paper, a meta-cognitive online sequential extreme learning machine (MOS-ELM) is proposed for class imbalance and concept drift learning. In MOS-ELM, meta-cognition is used to self-regulate learning by selecting suitable learning strategies for class imbalance and concept drift problems. MOS-ELM is the first sequential learning method to alleviate the imbalance problem for both binary-class and multi-class data streams with concept drift. In MOS-ELM, a new adaptive window approach is proposed for concept drift learning. A single output update equation is also proposed, which unifies various application-specific OS-ELM methods. The performance of MOS-ELM is evaluated under different conditions and compared with methods each specific to some of the conditions. On most of the datasets in the comparison, MOS-ELM outperforms the competing methods.

  13. Machine Methods for Acquiring, Learning, and Applying Knowledge.

    ERIC Educational Resources Information Center

    Hayes-Roth, Frederick; And Others

    A research plan for identifying and acting upon constraints that impede the development of knowledge-based intelligent systems is described. The two primary problems identified are knowledge programming, the task of which is to create an intelligent system that does what an expert says it should, and learning, the problem requiring the criticizing…

  14. Construction and Analysis of Educational Tests Using Abductive Machine Learning

    ERIC Educational Resources Information Center

    El-Alfy, El-Sayed M.; Abdel-Aal, Radwan E.

    2008-01-01

    Recent advances in educational technologies and the wide-spread use of computers in schools have fueled innovations in test construction and analysis. As the measurement accuracy of a test depends on the quality of the items it includes, item selection procedures play a central role in this process. Mathematical programming and the item response…

  15. How to represent crystal structures for machine learning: Towards fast prediction of electronic properties

    NASA Astrophysics Data System (ADS)

    Schütt, K. T.; Glawe, H.; Brockherde, F.; Sanna, A.; Müller, K. R.; Gross, E. K. U.

    2014-05-01

    High-throughput density functional calculations of solids are highly time-consuming. As an alternative, we propose a machine learning approach for the fast prediction of solid-state properties. To achieve this, local spin-density approximation calculations are used as a training set. We focus on predicting the value of the density of electronic states at the Fermi energy. We find that conventional representations of the input data, such as the Coulomb matrix, are not suitable for the training of learning machines in the case of periodic solids. We propose a novel crystal structure representation for which learning and competitive prediction accuracies become possible within an unrestricted class of spd systems of arbitrary unit-cell size.
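
    For context, the conventional Coulomb-matrix representation mentioned here (which the authors find unsuitable for periodic solids) has the standard form for a finite molecule: M_ii = 0.5 Z_i^2.4 and M_ij = Z_i Z_j / |R_i - R_j|. A short sketch of that baseline descriptor, with a water molecule as an illustrative input:

    ```python
    import numpy as np

    def coulomb_matrix(Z, R):
        """Conventional Coulomb-matrix descriptor for a finite molecule:
        M_ii = 0.5 * Z_i**2.4,  M_ij = Z_i * Z_j / |R_i - R_j|."""
        Z, R = np.asarray(Z, float), np.asarray(R, float)
        d = np.linalg.norm(R[:, None] - R[None, :], axis=-1)  # pairwise distances
        with np.errstate(divide="ignore"):
            M = np.outer(Z, Z) / d               # off-diagonal Coulomb repulsion
        np.fill_diagonal(M, 0.5 * Z ** 2.4)      # diagonal self-interaction term
        return M

    # Water: O at the origin, two H atoms (coordinates in angstroms).
    Z = [8, 1, 1]
    R = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
    print(coulomb_matrix(Z, R))
    ```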

  16. Contextual quick-learning and generalization by humans and machines.

    PubMed

    Bernasconi, J; Gustafson, K

    1998-02-01

    In a previous study (1994 Network: Comput. Neural Syst. 5 203-27) we compared human quick-learning and generalization (quick modelling) with that of neural nets (feedforward architectures), symbolic algorithms (decision tree procedures), and pattern classifiers (truth-set descriptors). Those studies raised the question of the role of context in the nature and rapidity of human learning. Here we address that issue in the setting of the same basic experiment (Quinlan classification problem) used for the previous studies. A major implication of our findings is that humans overwhelmingly seek, create, or imagine context in order to provide meaning when presented with abstract or apparently incomplete or contradictory or otherwise untenable situations.

  17. Advanced man-machine interface systems and advanced information management systems programs

    SciTech Connect

    Naser, J.; Gray, S.; Machiels, A.

    1997-12-01

    The Advanced Light Water Reactor (ALWR) Program started in the early 1980s. This work involves the development and NRC review of the ALWR Utility Requirements Documents, the development and design certification of ALWR designs, the analysis of the Early Site Permit process, and the First-of-a-Kind Engineering for two of the ALWR plant designs. ALWRs will embody modern proven technology. However, technologies expected to be used in these plants are changing very rapidly, so additional capabilities will become available that will be beneficial for future plants. To remain competitive on a life-cycle basis in the future, the ALWR must take advantage of the best and most modern technologies available. 1 ref.

  18. Machine learning using a higher order correlation network

    SciTech Connect

    Lee, Y.C.; Doolen, G.; Chen, H.H.; Sun, G.Z.; Maxwell, T.; Lee, H.Y.

    1986-01-01

    A high-order correlation tensor formalism for neural networks is described. The model can simulate autoassociative, heteroassociative, as well as multiassociative memory. For the autoassociative model, simulation results show a drastic increase in memory capacity and speed over that of the standard Hopfield-like correlation matrix methods. The possibility of using multiassociative memory for a learning universal inference network is also discussed. 9 refs., 5 figs.

  19. Peak intensity prediction in MALDI-TOF mass spectrometry: A machine learning study to support quantitative proteomics

    PubMed Central

    Timm, Wiebke; Scherbart, Alexandra; Böcker, Sebastian; Kohlbacher, Oliver; Nattkemper, Tim W

    2008-01-01

    Background Mass spectrometry is a key technique in proteomics and can be used to analyze complex samples quickly. One key problem with the mass spectrometric analysis of peptides and proteins, however, is the fact that absolute quantification is severely hampered by the unclear relationship between the observed peak intensity and the peptide concentration in the sample. While there are numerous approaches to circumvent this problem experimentally (e.g. labeling techniques), reliable prediction of the peak intensities from peptide sequences could provide a peptide-specific correction factor. Thus, it would be a valuable tool towards label-free absolute quantification. Results In this work we present machine learning techniques for peak intensity prediction for MALDI mass spectra. Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. A feature subset was obtained from multiple forward feature selections on the extracted features. Based on these features, two advanced machine learning methods (support vector regression and local linear maps) are shown to yield good results for this problem (Pearson correlation of 0.68 in a ten-fold cross validation). Conclusion The techniques presented here are a useful first step going beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions in turn will turn out to be beneficial for mass spectrometry-based quantitative proteomics. PMID:18937839

  20. The application of machine learning in multi sensor data fusion for activity recognition in mobile device space

    NASA Astrophysics Data System (ADS)

    Marhoubi, Asmaa H.; Saravi, Sara; Edirisinghe, Eran A.

    2015-05-01

    The present generation of mobile handheld devices comes equipped with a large number of sensors. The key sensors include the Ambient Light Sensor, Proximity Sensor, Gyroscope, Compass and Accelerometer. Many mobile applications are driven by the readings obtained from one or two of these sensors. However, the presence of multiple sensors enables the determination of more detailed activities carried out by the user of a mobile device, and thus the development of smarter mobile applications that respond more appropriately to user behavior and device usage. In the proposed research we use recent advances in machine learning to fuse together the data obtained from all key sensors of a mobile device. We investigate the possible use of single- and ensemble-classifier-based approaches to identify a mobile device's behavior in the space in which it is present. Feature selection algorithms are used to remove non-discriminant features that often lead to poor classifier performance. As the sensor readings are noisy and include a significant proportion of missing values and outliers, we use machine learning-based approaches to clean the raw sensor data before use. Based on selected practical case studies, we demonstrate the ability to accurately recognize device behavior from multi-sensor data fusion.
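
    A minimal sketch of the pipeline this abstract describes (cleaning, feature selection, ensemble classification), assuming scikit-learn components and synthetic sensor features; the actual feature extraction and case studies are not reproduced.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(9)
    # Toy fused feature vectors: readings derived from light, proximity,
    # gyroscope, compass and accelerometer channels, with gaps as in raw logs.
    X = rng.standard_normal((400, 12))
    X[rng.random(X.shape) < 0.05] = np.nan       # ~5% missing sensor readings
    y = rng.integers(0, 4, 400)                  # four device-usage activities

    pipeline = make_pipeline(
        SimpleImputer(strategy="median"),        # clean missing values
        SelectKBest(f_classif, k=8),             # drop non-discriminant features
        RandomForestClassifier(n_estimators=200, random_state=0),  # ensemble
    )
    print(cross_val_score(pipeline, X, y, cv=5).mean())
    ```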