Science.gov

Sample records for advanced machine learning

  1. Recent Advances in Predictive (Machine) Learning

    SciTech Connect

    Friedman, J

    2004-01-24

    Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In prediction (machine) learning the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning were originated many years ago at the dawn of the computer age. Recently two new techniques have emerged that have revitalized the field. These are support vector machines and boosted decision trees. This paper provides an introduction to these two new methods tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.
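The boosted decision trees the abstract mentions can be illustrated in their simplest form, boosted decision stumps, with a toy sketch (the data and code below are illustrative only, not from the paper):

```python
# AdaBoost with depth-1 trees (decision stumps) on a toy 1-D dataset.
import math

# Toy dataset: label is +1 for x >= 3, -1 otherwise.
X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [-1, -1, -1, 1, 1, 1]

def stump_predict(threshold, polarity, x):
    """A decision stump: predict `polarity` when x >= threshold, else its negation."""
    return polarity if x >= threshold else -polarity

def fit_adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n           # example weights, initially uniform
    ensemble = []               # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        # Find the stump with the lowest weighted error.
        best = None
        for t in X:
            for pol in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict(t, pol, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = max(err, 1e-10)   # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Re-weight: boost the examples this stump got wrong.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, pol, xi))
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(t, p, x) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

model = fit_adaboost(X, y)
print([predict(model, xi) for xi in X])  # → [-1, -1, -1, 1, 1, 1]
```

Each round fits a weak learner to a re-weighted version of the data, which is the core mechanism behind the boosted-tree methods the paper surveys.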

  2. Advances in Machine Learning and Data Mining for Astronomy

    NASA Astrophysics Data System (ADS)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  3. Advancing Research in Second Language Writing through Computational Tools and Machine Learning Techniques: A Research Agenda

    ERIC Educational Resources Information Center

    Crossley, Scott A.

    2013-01-01

    This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…

  4. Machine Learning in Medicine.

    PubMed

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning their attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  5. Advances in Climate Informatics: Accelerating Discovery in Climate Science with Machine Learning

    NASA Astrophysics Data System (ADS)

    Monteleoni, C.

    2015-12-01

    Despite the scientific consensus on climate change, drastic uncertainties remain. The climate system is characterized by complex phenomena that are imperfectly observed and even more imperfectly simulated. Climate data is Big Data, yet the magnitude of data and climate model output increasingly overwhelms the tools currently used to analyze them. Computational innovation is therefore needed. Machine learning is a cutting-edge research area at the intersection of computer science and statistics, focused on developing algorithms for big data analytics. Machine learning has revolutionized scientific discovery (e.g. Bioinformatics), and spawned new technologies (e.g. Web search). The impact of machine learning on climate science promises to be similarly profound. The goal of the novel interdisciplinary field of Climate Informatics is to accelerate discovery in climate science with machine learning, in order to shed light on urgent questions about climate change. In this talk, I will survey my research group's progress in the emerging field of climate informatics. Our work includes algorithms to improve the combined predictions of the IPCC multi-model ensemble, applications to seasonal and subseasonal prediction, and a data-driven technique to detect and define extreme events.

  6. Introduction to machine learning.

    PubMed

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers to make successful predictions using past experiences, has developed impressively in recent years, helped by the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning, such as feature assessment, unsupervised versus supervised learning, and types of classification. Then, we point out the main issues in designing machine learning experiments and evaluating their performance. Finally, we introduce some supervised learning methods. PMID:24272434
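The supervised-learning workflow the chapter outlines, fit on past experience and evaluate on held-out data, can be sketched with a minimal 1-nearest-neighbour classifier (the data and split below are made up for illustration):

```python
# Minimal supervised-learning loop: train/test split and accuracy evaluation.

def nn_classify(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

# (feature, label) pairs; label 'A' for small values, 'B' for large.
data = [(0.1, 'A'), (0.4, 'A'), (0.5, 'A'), (2.0, 'B'), (2.2, 'B'), (2.5, 'B')]
train, test = data[::2], data[1::2]   # a crude fixed split for illustration

correct = sum(nn_classify(train, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")  # 1.00 on this toy data
```

Evaluating only on examples withheld from training is the basic performance-estimation discipline the chapter's experimental-design discussion builds on.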

  7. Advances in Patient Classification for Traditional Chinese Medicine: A Machine Learning Perspective

    PubMed Central

    Zhao, Changbo; Li, Guo-Zheng; Wang, Chengjun; Niu, Jinling

    2015-01-01

    As a complementary and alternative medicine, traditional Chinese medicine (TCM) has drawn great attention both domestically and overseas. In practice, TCM offers a methodology for patient diagnosis and treatment quite distinct from that of Western medicine (WM). A syndrome (ZHENG, or pattern) is differentiated from a set of symptoms and signs examined in an individual by the four main diagnostic methods: inspection; auscultation and olfaction; interrogation; and palpation. These reflect the pathological and physiological changes of disease occurrence and development. Patient classification divides patients into classes according to different criteria. In this paper, we survey the patient classification problem from the machine learning perspective across three major aspects of TCM: sign classification, syndrome differentiation, and disease classification. Considering the different diagnostic data analyzed by different computational methods, we present an overview of four subfields of TCM diagnosis. For each subfield, we provide a rectangular reference table with applications along the horizontal axis and machine learning algorithms along the vertical axis. Based on the current development of objective TCM diagnosis for patient classification, we discuss the research issues around machine learning techniques applied to TCM diagnosis, to facilitate further research on TCM patient classification. PMID:26246834

  8. Machine Learning and Radiology

    PubMed Central

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focus on six categories of applications in radiology: medical image segmentation, registration, computer-aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  9. Leveraging advanced data analytics, machine learning, and metrology models to enable critical dimension metrology solutions for advanced integrated circuit nodes

    NASA Astrophysics Data System (ADS)

    Rana, Narender; Zhang, Yunlin; Kagalwala, Taher; Bailey, Todd

    2014-10-01

    the useful and more accurate CD and profile information of the structures. This paper presents the optimization of scatterometry and MBIR model calibration and the feasibility to extrapolate not only in design and process space but also from one process step to a previous process step. A well-calibrated scatterometry model or patterning simulation model can be used to accurately extrapolate and interpolate in the design and process space for lithography patterning where AFM is not capable of accurately measuring sub-40 nm trenches. The uncertainty associated with extrapolation can be large and needs to be minimized. We have made use of measurements from CD-SEM and CD-AFM, along with the patterning and scatterometry simulation models to estimate the uncertainty associated with extrapolation and the methods to reduce it. For the first time, we have reported the application of machine learning (artificial neural networks) to the resist shrinkage systematic phenomenon to accurately predict the preshrink CD based on supervised learning using the CD-AFM data. The study lays out various basic concepts, approaches, and protocols of multiple source data processing and integration for a hybrid metrology approach. Impacts of this study include more accurate metrology, patterning models, and better process controls for advanced IC nodes.

  10. Paradigms for machine learning

    NASA Technical Reports Server (NTRS)

    Schlimmer, Jeffrey C.; Langley, Pat

    1991-01-01

    Five paradigms are described for machine learning: connectionist (neural network) methods, genetic algorithms and classifier systems, empirical methods for inducing rules and decision trees, analytic learning methods, and case-based approaches. Some dimensions along which these paradigms vary in their approach to learning are considered, and the basic methods used within each framework are reviewed, together with open research issues. It is argued that the similarities among the paradigms are more important than their differences, and that future work should attempt to bridge the existing boundaries. Finally, some recent developments in the field of machine learning are discussed, and their impact on both research and applications is examined.

  11. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances.

    PubMed

    Abut, Fatih; Akay, Mehmet Fatih

    2015-01-01

    Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume per minute in a state of intense exercise. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating a person's disease risk. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite its high accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment and trained staff, have led to the development of various regression models for predicting VO2max. Consequently, many studies have been conducted in recent years to predict the VO2max of various target populations, ranging from soccer athletes, nonexpert swimmers, and cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview of the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of the various VO2max prediction models reported in the related literature in terms of two well-known metrics, namely, the multiple correlation coefficient (R) and the standard error of estimate. The survey reveals that, among the regression methods used to develop prediction models, the support vector machine generally shows the best performance, whereas multiple linear regression exhibits the worst. PMID:26346869
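The baseline model class in such comparisons, multiple linear regression, and the two metrics the survey uses (R and SEE) can be sketched as follows; the data here are synthetic and the coefficients invented for illustration, not taken from any cited study:

```python
# Multiple linear regression on synthetic "VO2max" data, reporting the
# multiple correlation coefficient (R) and standard error of estimate (SEE).
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 60, n)
hr_rest = rng.uniform(50, 90, n)              # resting heart rate
# Invented generating model: VO2max declines with age and resting HR.
vo2max = 60 - 0.3 * age - 0.2 * hr_rest + rng.normal(0, 2.0, n)

X = np.column_stack([np.ones(n), age, hr_rest])   # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, vo2max, rcond=None)
pred = X @ beta

residuals = vo2max - pred
sse = np.sum(residuals ** 2)
sst = np.sum((vo2max - vo2max.mean()) ** 2)
R = np.sqrt(1 - sse / sst)                   # multiple correlation coefficient
see = np.sqrt(sse / (n - X.shape[1]))        # standard error of estimate
print(f"R = {R:.3f}, SEE = {see:.2f} ml/kg/min")
```

Swapping the least-squares fit for a support vector regressor, the method the survey finds strongest, leaves the evaluation code unchanged, which is why R and SEE make convenient common metrics.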

  12. Remediating radium contaminated legacy sites: Advances made through machine learning in routine monitoring of "hot" particles.

    PubMed

    Varley, Adam; Tyler, Andrew; Smith, Leslie; Dale, Paul; Davies, Mike

    2015-07-15

    The extensive use of radium during the 20th century for industrial, military and pharmaceutical purposes has led to a large number of contaminated legacy sites across Europe and North America. Sites that pose a high risk to the general public can present expensive and long-term remediation projects. Often the most pragmatic remediation approach is through routine monitoring operating gamma-ray detectors to identify, in real-time, the signal from the most hazardous heterogeneous contamination (hot particles); thus facilitating their removal and safe disposal. However, current detection systems do not fully utilise all spectral information resulting in low detection rates and ultimately an increased risk to the human health. The aim of this study was to establish an optimised detector-algorithm combination. To achieve this, field data was collected using two handheld detectors (sodium iodide and lanthanum bromide) and a number of Monte Carlo simulated hot particles were randomly injected into the field data. This allowed for the detection rate of conventional deterministic (gross counts) and machine learning (neural networks and support vector machines) algorithms to be assessed. The results demonstrated that a Neural Network operated on a sodium iodide detector provided the best detection capability. Compared to deterministic approaches, this optimised detection system could detect a hot particle on average 10cm deeper into the soil column or with half of the activity at the same depth. It was also found that noise presented by internal contamination restricted lanthanum bromide for this application. PMID:25847171

  13. Probabilistic machine learning and artificial intelligence.

    PubMed

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery. PMID:26017444
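The core move of the probabilistic framework, representing uncertainty about a model as a distribution and updating it with data via Bayes' rule, can be shown with a deliberately tiny example (a coin's unknown bias, inferred on a grid; entirely illustrative):

```python
# Bayesian inference by grid approximation: posterior over a coin's bias.

# Discretise the bias parameter theta on a grid.
grid = [i / 100 for i in range(101)]
prior = [1.0 / len(grid)] * len(grid)        # uniform prior over theta

data = [1, 1, 0, 1, 1, 1, 0, 1]              # observed flips (1 = heads)

# Posterior is proportional to prior times likelihood.
posterior = prior[:]
for flip in data:
    posterior = [p * (th if flip else 1 - th) for p, th in zip(posterior, grid)]
z = sum(posterior)
posterior = [p / z for p in posterior]       # normalise

mean = sum(th * p for th, p in zip(grid, posterior))
print(f"posterior mean bias = {mean:.2f}")   # near 0.7 for 6 heads in 8 flips
```

The posterior is a full distribution, not a point estimate, which is the sense in which the framework "represents and manipulates uncertainty about models and predictions".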

  14. Probabilistic machine learning and artificial intelligence

    NASA Astrophysics Data System (ADS)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  15. Multistrategy machine-learning vision system

    NASA Astrophysics Data System (ADS)

    Roberts, Barry A.

    1993-04-01

    Advances in the field of machine learning technology have yielded learning techniques with solid theoretical foundations that are applicable to the problems being encountered by object recognition systems. At Honeywell an object recognition system that works with high-level, symbolic, object features is under development. This system, named object recognition accomplished through combined learning expertise (ORACLE), employs both an inductive learning technique (i.e., conceptual clustering, CC) and a deductive technique (i.e., explanation-based learning, EBL) that are combined in a synergistic manner. This paper provides an overview of the ORACLE system, describes the machine learning mechanisms (EBL and CC) that it employs, and provides example results of system operation. The paper emphasizes the beneficial effect of integrating machine learning into object recognition systems.

  16. Stacked Extreme Learning Machines.

    PubMed

    Zhou, Hongming; Huang, Guang-Bin; Lin, Zhiping; Wang, Han; Soh, Yeng Chai

    2015-09-01

    Extreme learning machine (ELM) has recently attracted many researchers' interest due to its very fast learning speed, good generalization ability, and ease of implementation. It provides a unified solution that can be used directly to solve regression, binary, and multiclass classification problems. In this paper, we propose stacked ELMs (S-ELMs), an architecture specially designed for solving large and complex data problems. The S-ELMs architecture divides a single large ELM network into multiple stacked small ELMs that are serially connected, and can approximate a very large ELM network with a small memory requirement. To further improve testing accuracy on big data problems, an ELM autoencoder can be applied during each iteration of the S-ELMs algorithm. The simulation results show that the S-ELMs, even with random hidden nodes, can achieve testing accuracy similar to that of a support vector machine (SVM) while having low memory requirements. With the help of the ELM autoencoder, the S-ELMs can achieve much better testing accuracy than SVM and slightly better accuracy than a deep belief network (DBN), with much faster training. PMID:25361517
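The building block being stacked here, a single basic ELM, is simple enough to sketch: a random, untrained hidden layer followed by a least-squares output layer. This shows the idea only; the stacking and autoencoder details are in the paper itself.

```python
# A single basic ELM fit to a toy regression problem (learn y = sin(x)).
import numpy as np

rng = np.random.default_rng(1)

X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()

n_hidden = 50
W = rng.normal(size=(1, n_hidden))           # random input weights (never trained)
b = rng.normal(size=n_hidden)                # random biases (never trained)

H = np.tanh(X @ W + b)                       # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None) # output weights by least squares

pred = H @ beta
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Because only the output layer is solved for, training reduces to one linear least-squares problem, which is the source of the "very fast learning speed" the abstract cites.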

  17. A Machine Learning Based Framework for Adaptive Mobile Learning

    NASA Astrophysics Data System (ADS)

    Al-Hmouz, Ahmed; Shen, Jun; Yan, Jun

    Advances in wireless technology and handheld devices have created significant interest in mobile learning (m-learning) in recent years. Students nowadays are able to learn anywhere and at any time. Mobile learning environments must also cater for different user preferences and for various devices with limited capability, where not all of the information is relevant and critical to each learning environment. To address this issue, this paper presents a framework that depicts the process of adapting learning content to satisfy individual learner characteristics by taking his/her learning style into consideration. We use a machine learning based algorithm for acquiring, representing, storing, reasoning about, and updating each learner's profile.

  18. Machine learning: An artificial intelligence approach. Vol. II

    SciTech Connect

    Michalski, R.S.; Carbonell, J.G.; Mitchell, T.M.

    1986-01-01

    This book reflects the expansion of machine learning research through presentation of recent advances in the field. The book provides an account of current research directions. Major topics covered include the following: learning concepts and rules from examples; cognitive aspects of learning; learning by analogy; learning by observation and discovery; and an exploration of general aspects of learning.

  19. Model-based machine learning

    PubMed Central

    Bishop, Christopher M.

    2013-01-01

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications. PMID:23277612

  20. Model-based machine learning.

    PubMed

    Bishop, Christopher M

    2013-02-13

    Several decades of research in the field of machine learning have resulted in a multitude of different algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this study, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning do not have to learn about the huge range of traditional methods, but instead can focus their attention on understanding a single modelling environment. In this study, we show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications. PMID:23277612

  1. Machine Learning in Systems Biology

    PubMed Central

    d'Alché-Buc, Florence; Wehenkel, Louis

    2008-01-01

    This supplement contains extended versions of a selected subset of papers presented at the workshop MLSB 2007, Machine Learning in Systems Biology, Evry, France, from September 24 to 25, 2007. PMID:19091048

  2. Machine learning in systems biology.

    PubMed

    d'Alché-Buc, Florence; Wehenkel, Louis

    2008-01-01

    This supplement contains extended versions of a selected subset of papers presented at the workshop MLSB 2007, Machine Learning in Systems Biology, Evry, France, from September 24 to 25, 2007. PMID:19091048

  3. Web Mining: Machine Learning for Web Applications.

    ERIC Educational Resources Information Center

    Chen, Hsinchun; Chau, Michael

    2004-01-01

    Presents an overview of machine learning research and reviews methods used for evaluating machine learning systems. Ways that machine-learning algorithms were used in traditional information retrieval systems in the "pre-Web" era are described, and the field of Web mining and how machine learning has been used in different Web mining applications…

  4. Machine Shop. Student Learning Guide.

    ERIC Educational Resources Information Center

    Palm Beach County Board of Public Instruction, West Palm Beach, FL.

    This student learning guide contains eight modules for completing a course in machine shop. It is designed especially for use in Palm Beach County, Florida. Each module covers one task, and consists of a purpose, performance objective, enabling objectives, learning activities and resources, information sheets, student self-check with answer key,…

  5. Gaussian processes for machine learning.

    PubMed

    Seeger, Matthias

    2004-04-01

    Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countable or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of their properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level, with special emphasis on characteristics relevant to machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to draw precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided. PMID:15112367
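The basic GP regression setting the tutorial covers can be sketched in a few lines of linear algebra (a minimal illustration with a squared-exponential kernel, not a production implementation; hyperparameters are chosen by hand here rather than by the model-selection procedures the paper describes):

```python
# GP regression: posterior mean and variance under an RBF kernel.
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Noisy observations of sin(x).
rng = np.random.default_rng(2)
X = np.linspace(-3, 3, 15)
y = np.sin(X) + rng.normal(0, 0.1, X.size)

noise = 0.1 ** 2
K = rbf(X, X) + noise * np.eye(X.size)       # kernel matrix plus noise

Xs = np.linspace(-3, 3, 7)                   # test inputs
Ks = rbf(Xs, X)

# Standard GP posterior equations.
alpha = np.linalg.solve(K, y)
mean = Ks @ alpha
var = rbf(Xs, Xs).diagonal() - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
print(np.round(mean, 2))                     # close to sin at the test inputs
```

The `var` term is what makes GPs attractive in the Bayesian framework the abstract describes: the model reports calibrated uncertainty alongside each prediction.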

  6. Advanced Learning

    ERIC Educational Resources Information Center

    Hijon-Neira, Raquel, Ed.

    2009-01-01

    The education industry has obviously been influenced by the Internet revolution. Teaching and learning methods have changed significantly since the coming of the Web, and they are very likely to keep evolving for many years to come. A good example of this changing reality is the spectacular development of e-Learning. In a more…

  7. Game-powered machine learning

    PubMed Central

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-01-01

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786

  8. Game-powered machine learning.

    PubMed

    Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

    2012-04-24

    Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786

  9. Machine learning methods in chemoinformatics

    PubMed Central

    Mitchell, John B O

    2014-01-01

    Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. How to cite this article: WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183 PMID:25285160
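The supervised learners singled out above (Random Forest, Support Vector Machine, k-Nearest Neighbors, naïve Bayes) can be compared side by side with scikit-learn. This is a hedged sketch on a synthetic stand-in for molecular descriptor data; no real chemoinformatics dataset or descriptor set is implied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# 500 hypothetical "molecules" described by 20 numeric descriptors,
# with a binary activity label (all values synthetic)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(kernel="rbf"),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
}
# 5-fold cross-validated accuracy for each learner
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.3f}")
```

On real QSAR data the ranking would of course depend on the descriptors and the endpoint; the point is only that all four methods share the same fit/score interface.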

  10. Photonic Neurocomputers And Learning Machines

    NASA Astrophysics Data System (ADS)

    Farhat, Nabil H.

    1990-05-01

    The study of complex multidimensional nonlinear dynamical systems and the modeling and emulation of cognitive brain-like processing of sensory information (neural network research), including the study of chaos and its role in such systems, would benefit immensely from the development of a new generation of programmable analog computers capable of carrying out collective, nonlinear and iterative computations at very high speed. The massive interconnectivity and nonlinearity needed in such analog computing structures indicate that a mix of optics and electronics, mediated by a judicious choice of device physics, offers benefits for realizing networks with the following desirable properties: (a) large scale nets, i.e., nets with a high number of decision-making elements (neurons), (b) modifiable structure, i.e., the ability to partition the net into any desired number of layers of prescribed size (number of neurons per layer) with any prescribed pattern of communication between them (e.g., feed-forward or feedback (recurrent)), (c) programmable and/or adaptive connectivity weights between the neurons for self-organization and learning, (d) both synchronous and asynchronous update rules, (e) high-speed update, i.e., neurons with µsec response time to enable rapid iteration and convergence, (f) usability in the study and evaluation of a variety of adaptive learning algorithms, (g) usability in the rapid solution, by fast simulated annealing, of complex optimization problems of the kind encountered in adaptive learning, pattern recognition, and image processing. The aim of this paper is to describe recent efforts and progress made towards achieving these desirable attributes in analog photonic (optoelectronic and/or electron optical) hardware that utilizes primarily incoherent light. A specific example, the hardware implementation of a stochastic Boltzmann learning machine, is used as a vehicle for identifying generic issues and clarifying research and development areas for further

  11. A Machine Learning System for Recognizing Subclasses (Demo)

    SciTech Connect

    Vatsavai, Raju

    2012-01-01

    Thematic information extraction from remote sensing images is a complex task. In this demonstration, we present the *Miner machine learning system. In particular, we demonstrate an advanced subclass recognition algorithm that is specifically designed to extract finer classes from aggregate classes.

  12. Applications of Machine Learning in Information Retrieval.

    ERIC Educational Resources Information Center

    Cunningham, Sally Jo; Witten, Ian H.; Littin, James

    1999-01-01

    Introduces the basic ideas that underpin applications of machine learning to information retrieval. Describes applications of machine learning to text categorization. Considers how machine learning can be applied to the query-formulation process. Examines methods of document filtering, where the user specifies a query that is to be applied to an…

  13. Machine learning phases of matter

    NASA Astrophysics Data System (ADS)

    Carrasquilla, Juan; Stoudenmire, Miles; Melko, Roger

    We show how the technology that allows automatic teller machines to read hand-written digits on cheques can be used to encode and recognize phases of matter and phase transitions in many-body systems. In particular, we analyze the (quasi-)order-disorder transitions in the classical Ising and XY models. Furthermore, we successfully use machine learning to study classical Z2 gauge theories that have important technological applications in the coming wave of quantum information technologies and whose phase transitions have no conventional order parameter.
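As a toy illustration of the idea, a classifier can be trained to separate "ordered" from "disordered" spin configurations. The sketch below uses crude stand-ins (nearly aligned vs. fully random spins) rather than proper Monte Carlo samples near the transition, and a hand-crafted |magnetization| feature so that a linear model suffices; the paper's approach learns from raw configurations and is far more general.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N = 100  # spins per configuration (a flattened 10x10 lattice)

def ordered(n):
    # low-temperature proxy: all-up or all-down, with 5% of spins flipped
    base = rng.choice([-1.0, 1.0], size=(n, 1)) * np.ones((n, N))
    flips = rng.random((n, N)) < 0.05
    return np.where(flips, -base, base)

def disordered(n):
    # high-temperature proxy: independent random spins
    return rng.choice([-1.0, 1.0], size=(n, N))

X = np.vstack([ordered(200), disordered(200)])
y = np.array([1] * 200 + [0] * 200)        # 1 = ordered, 0 = disordered

# |magnetization| distinguishes the phases regardless of the sign of the order
feat = np.abs(X.mean(axis=1)).reshape(-1, 1)
clf = LogisticRegression().fit(feat, y)
print(clf.score(feat, y))                   # near-perfect on these extremes
```

Note that a linear model on the raw spins would fail here, since both all-up and all-down configurations belong to the ordered class; this is exactly why either a nonlinear model or a symmetry-aware feature is needed.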

  14. Learning Extended Finite State Machines

    NASA Technical Reports Server (NTRS)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  15. Learning Machine Learning: A Case Study

    ERIC Educational Resources Information Center

    Lavesson, N.

    2010-01-01

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning…

  16. The Higgs Machine Learning Challenge

    NASA Astrophysics Data System (ADS)

    Adam-Bourdarios, C.; Cowan, G.; Germain-Renaud, C.; Guyon, I.; Kégl, B.; Rousseau, D.

    2015-12-01

    The Higgs Machine Learning Challenge was an open data analysis competition that took place between May and September 2014. Samples of simulated data from the ATLAS Experiment at the LHC corresponding to signal events with Higgs bosons decaying to τ+τ- together with background events were made available to the public through the website of the data science organization Kaggle (kaggle.com). Participants attempted to identify the search region in a space of 30 kinematic variables that would maximize the expected discovery significance of the signal process. One of the primary goals of the Challenge was to promote communication of new ideas between the Machine Learning (ML) and HEP communities. In this regard it was a resounding success, with almost 2,000 participants from HEP, ML and other areas. The process of understanding and integrating the new ideas, particularly from ML into HEP, is currently underway.
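The "expected discovery significance" optimized by participants was the approximate median significance (AMS). A minimal sketch, assuming the commonly quoted form with a regularization term b_reg (the Challenge used b_reg = 10):

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate median significance, as used in the Higgs ML Challenge.
    s, b: expected signal and background counts in the selected search region;
    b_reg regularizes the estimate for small background counts."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))

# For b >> s the AMS approaches the familiar s / sqrt(b)
print(ams(10.0, 1000.0))          # ≈ 0.314
print(10.0 / math.sqrt(1000.0))   # ≈ 0.316
```

A participant's selection (a region in the 30-variable space) was scored by the AMS of the signal and background events it captured, so maximizing AMS, not raw classification accuracy, was the actual objective.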

  17. Finding new perovskite halides via machine learning

    DOE PAGES Beta

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-26

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach toward rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning, henceforth referred to as ML) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br, or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 185 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor, and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. As a result, the trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.
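The two geometric descriptors the study identifies, the Goldschmidt tolerance factor t = (r_A + r_X) / (√2 (r_B + r_X)) and the octahedral factor μ = r_B / r_X, are straightforward to compute, and an SVM classifier over them can be sketched as follows. The ionic radii and the formability rule below are invented for illustration, standing in for the paper's 185 experimental compounds.

```python
import numpy as np
from sklearn.svm import SVC

def tolerance_factor(r_a, r_b, r_x):
    # Goldschmidt tolerance factor
    return (r_a + r_x) / (np.sqrt(2.0) * (r_b + r_x))

def octahedral_factor(r_b, r_x):
    return r_b / r_x

# Invented ionic radii (Angstrom) for 200 hypothetical ABX3 compositions
rng = np.random.default_rng(1)
r_a = rng.uniform(1.0, 2.0, 200)
r_b = rng.uniform(0.5, 1.2, 200)
r_x = rng.uniform(1.2, 2.2, 200)
t = tolerance_factor(r_a, r_b, r_x)
mu = octahedral_factor(r_b, r_x)
X = np.column_stack([t, mu])

# Toy rule standing in for experimental formability labels: formable when
# t and mu fall in ranges commonly quoted for perovskite stability
y = ((t > 0.8) & (t < 1.1) & (mu > 0.41)).astype(int)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))
```

In the actual study the labels come from experiment and the model is validated on held-out compounds; the sketch only shows how the two descriptors feed an SVM.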

  18. Finding New Perovskite Halides via Machine learning

    NASA Astrophysics Data System (ADS)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  19. Application of advanced machine learning methods on resting-state fMRI network for identification of mild cognitive impairment and Alzheimer's disease.

    PubMed

    Khazaee, Ali; Ebrahimzadeh, Ata; Babajani-Feremi, Abbas

    2016-09-01

    The study of brain networks by resting-state functional magnetic resonance imaging (rs-fMRI) is a promising method for identifying patients with dementia from healthy controls (HC). Using graph theory, different aspects of the brain network can be efficiently characterized by calculating measures of integration and segregation. In this study, we combined a graph theoretical approach with advanced machine learning methods to study the brain network in 89 patients with mild cognitive impairment (MCI), 34 patients with Alzheimer's disease (AD), and 45 age-matched HC. The rs-fMRI connectivity matrix was constructed using a brain parcellation based on 264 putative functional areas. Using the optimal features extracted from the graph measures, we were able to accurately classify the three groups (i.e., HC, MCI, and AD) with an accuracy of 88.4%. We also investigated the performance of our proposed method for binary classification of one group (e.g., MCI) from the two other groups (e.g., HC and AD). The classification accuracies for identifying HC from AD and MCI, AD from HC and MCI, and MCI from HC and AD were 87.3%, 97.5%, and 72.0%, respectively. In addition, results based on the parcellation of 264 regions were compared to those based on the automated anatomical labeling (AAL) atlas, consisting of 90 regions. The accuracy of classification of the three groups using AAL degraded to 83.2%. Our results show that combining graph measures with a machine learning approach, on the basis of rs-fMRI connectivity analysis, may assist in the diagnosis of AD and MCI. PMID:26363784
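The pipeline above — graph measures computed from a connectivity matrix, then a classifier — can be sketched in miniature. Everything below is synthetic: the "connectivity matrices" are random stand-ins, node strength replaces the paper's full set of integration/segregation measures, and the group difference is injected by hand.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_nodes = 30  # stand-in for the 264-region parcellation

def synth_group(n_subjects, coupling):
    # one symmetric, zero-diagonal "connectivity matrix" per subject
    mats = []
    for _ in range(n_subjects):
        a = rng.normal(coupling, 0.1, size=(n_nodes, n_nodes))
        a = (a + a.T) / 2
        np.fill_diagonal(a, 0.0)
        mats.append(a)
    return np.array(mats)

controls = synth_group(40, coupling=0.30)
patients = synth_group(40, coupling=0.25)   # slightly weaker connectivity

# node strength (sum of each node's connection weights) as the graph feature
X = np.vstack([m.sum(axis=1) for m in np.concatenate([controls, patients])])
y = np.array([0] * 40 + [1] * 40)

score = cross_val_score(SVC(), X, y, cv=5).mean()
print(score)
```

Real rs-fMRI analyses add preprocessing, many more graph measures, and feature selection; the sketch only shows the shape of the "graph measures in, diagnosis out" classification step.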

  20. Introducing Machine Learning Concepts with WEKA.

    PubMed

    Smith, Tony C; Frank, Eibe

    2016-01-01

    This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information. PMID:27008023

  1. Application of advanced materials to rotating machines

    NASA Technical Reports Server (NTRS)

    Triner, J. E.

    1983-01-01

    In discussing the application of advanced materials to rotating machinery, the following topics are covered: the torque speed characteristics of ac and dc machines, motor and transformer losses, the factors affecting core loss in motors, advanced magnetic materials and conductors, and design tradeoffs for samarium cobalt motors.

  2. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex

  3. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Astronomy Data Centre, Canadian

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
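The outlier-finding step mentioned above (nearest neighbours plus the local outlier factor) can be sketched with scikit-learn's LocalOutlierFactor, which scores each object by comparing its local density to that of its k nearest neighbours. The two-dimensional "catalog" below is synthetic; the real pipeline ran over hundreds of millions of 2MASS sources.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.1, size=(500, 2))    # the bulk population
outliers = rng.uniform(2.0, 3.0, size=(5, 2))    # rare, distant objects
X = np.vstack([cluster, outliers])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                       # -1 = outlier, 1 = inlier
print(int((labels == -1).sum()))                  # flags the scattered sources
```

At survey scale the cost is dominated by the nearest-neighbour search itself, which is exactly where the linear-scalability claims of the system described above matter.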

  4. Data Mining and Machine Learning in Astronomy

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  5. Applying Machine Learning to Star Cluster Classification

    NASA Astrophysics Data System (ADS)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time consuming, thus it creates a bottleneck, and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) through applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data is HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results, and suggest several directions for further improvement.

  6. Machine learning in sedimentation modelling.

    PubMed

    Bhattacharya, B; Solomatine, D P

    2006-03-01

    The paper presents machine learning (ML) models that predict sedimentation in the harbour basin of the Port of Rotterdam. The important factors affecting the sedimentation process such as waves, wind, tides, surge, river discharge, etc. are studied, the corresponding time series data is analysed, missing values are estimated and the most important variables behind the process are chosen as the inputs. Two ML methods are used: MLP ANN and M5 model tree. The latter is a collection of piece-wise linear regression models, each being an expert for a particular region of the input space. The models are trained on the data collected during 1992-1998 and tested by the data of 1999-2000. The predictive accuracy of the models is found to be adequate for the potential use in the operational decision making. PMID:16530383
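The M5 model tree mentioned above has no scikit-learn implementation, but its core idea — a tree partitions the input space and a separate linear regression acts as the "expert" in each leaf — can be sketched with a simplified stand-in. The data below is a synthetic piecewise-linear signal, not real sedimentation measurements.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 1))
# piecewise-linear target with a kink at x = 5, plus noise
y = np.where(X[:, 0] < 5, 2 * X[:, 0], 20 - 1.5 * X[:, 0]) + rng.normal(0, 0.1, 400)

# a shallow tree defines the regions; a linear model is fit inside each leaf
tree = DecisionTreeRegressor(max_leaf_nodes=4, random_state=0).fit(X, y)
leaves = tree.apply(X)
experts = {}
for leaf in np.unique(leaves):
    mask = leaves == leaf
    experts[leaf] = LinearRegression().fit(X[mask], y[mask])

def predict(X_new):
    leaf_ids = tree.apply(X_new)
    out = np.empty(len(X_new))
    for leaf, model in experts.items():
        mask = leaf_ids == leaf
        if mask.any():
            out[mask] = model.predict(X_new[mask])
    return out

mse = np.mean((predict(X) - y) ** 2)
print(mse)   # small training error
```

A true M5 tree also smooths predictions across leaf boundaries and prunes leaves; this sketch omits both, keeping only the "local linear experts" structure the abstract describes.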

  7. Machine learning in motion control

    NASA Technical Reports Server (NTRS)

    Su, Renjeng; Kermiche, Noureddine

    1989-01-01

    The existing methodologies for robot programming originate primarily from robotic applications to manufacturing, where uncertainties of the robots and their task environment may be minimized by repeated off-line modeling and identification. In space applications of robots, however, a higher degree of automation is required for robot programming because of the desire to minimize human intervention. We discuss a new paradigm of robot programming which is based on the concept of machine learning. The goal is to let robots practice tasks by themselves, with the operational data used to automatically improve their motion performance. The underlying mathematical problem is to solve the dynamical inverse problem by iterative methods. One of the key questions is how to ensure the convergence of the iterative process. There have been a few small steps taken into this important approach to robot programming. We give a representative result on the convergence problem.
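The "practice the task, use the data to improve" scheme described above is the iterative-learning idea: each trial's tracking error corrects the next trial's input. A minimal sketch, assuming an invented static-gain plant (real robot dynamics are far richer); for a scalar gain P and learning gain γ, the error contracts by |1 − γP| per trial, which is the convergence condition the abstract alludes to.

```python
import numpy as np

plant_gain = 0.8                                     # toy plant: y = 0.8 * u
reference = np.sin(np.linspace(0, 2 * np.pi, 50))    # desired trajectory
u = np.zeros_like(reference)                         # initial input: do nothing
gamma = 0.5                                          # learning gain

for trial in range(30):
    y = plant_gain * u            # execute the trial
    error = reference - y         # measure the tracking error
    u = u + gamma * error         # correct next trial's input

final_error = np.max(np.abs(reference - plant_gain * u))
print(final_error)                # near zero after 30 trials
```

Here |1 − γP| = 0.6, so the worst-case error shrinks by 40% every trial regardless of the reference shape; proving an analogous contraction for nonlinear robot dynamics is the hard part.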

  8. Defect classification using machine learning

    NASA Astrophysics Data System (ADS)

    Carr, Adra; Kegelmeyer, L.; Liao, Z. M.; Abdulla, G.; Cross, D.; Kegelmeyer, W. P.; Ravizza, F.; Carr, C. W.

    2008-10-01

    Laser-induced damage growth on the surface of fused silica optics has been extensively studied and has been found to depend on a number of factors, including fluence and the surface on which the damage site resides. It has been demonstrated that damage sites as small as a few tens of microns can be detected and tracked on optics installed on a fusion-class laser; however, determining the surface of an optic on which a damage site resides in situ can be a significant challenge. In this work we demonstrate that a machine-learning algorithm can successfully predict the surface location of the damage site using an expanded set of characteristics for each damage site, some of which are not historically associated with growth rate.

  9. Defect Classification Using Machine Learning

    SciTech Connect

    Carr, A; Kegelmeyer, L; Liao, Z M; Abdulla, G; Cross, D; Kegelmeyer, W P; Raviza, F; Carr, C W

    2008-10-24

    Laser-induced damage growth on the surface of fused silica optics has been extensively studied and has been found to depend on a number of factors, including fluence and the surface on which the damage site resides. It has been demonstrated that damage sites as small as a few tens of microns can be detected and tracked on optics installed on a fusion-class laser; however, determining the surface of an optic on which a damage site resides in situ can be a significant challenge. In this work we demonstrate that a machine-learning algorithm can successfully predict the surface location of the damage site using an expanded set of characteristics for each damage site, some of which are not historically associated with growth rate.

  10. Adaptive Learning Systems: Beyond Teaching Machines

    ERIC Educational Resources Information Center

    Kara, Nuri; Sevim, Nese

    2013-01-01

    Since the 1950s, teaching machines have changed a lot. Today, we have different ideas about how people learn and what instructors should do to help students during their learning process. We have adaptive learning technologies that can create much more student-oriented learning environments. The purpose of this article is to present these changes and its…

  11. Machine learning for medical images analysis.

    PubMed

    Criminisi, A

    2016-10-01

    This article discusses the application of machine learning for the analysis of medical images. Specifically: (i) We show how a special type of learning models can be thought of as automatically optimized, hierarchically-structured, rule-based algorithms, and (ii) We discuss how the issue of collecting large labelled datasets applies to both conventional algorithms as well as machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods. PMID:27374127

  12. Machine vision systems using machine learning for industrial product inspection

    NASA Astrophysics Data System (ADS)

    Lu, Yi; Chen, Tie Q.; Chen, Jie; Zhang, Jian; Tisler, Anthony

    2002-02-01

    Machine vision inspection requires efficient processing time and accurate results. In this paper, we present a machine vision inspection architecture, SMV (Smart Machine Vision). SMV decomposes a machine vision inspection problem into two stages: Learning Inspection Features (LIF) and On-Line Inspection (OLI). The LIF stage is designed to learn visual inspection features from design data and/or from inspected products. During the OLI stage, the inspection system uses the knowledge learnt by the LIF component to inspect the visual features of products. In this paper we present two machine vision inspection systems developed under the SMV architecture for two different types of products: Printed Circuit Board (PCB) and Vacuum Fluorescent Display (VFD) boards. In the VFD board inspection system, the LIF component learns inspection features from a VFD board and its display patterns. In the PCB board inspection system, the LIF learns the inspection features from the CAD file of a PCB board. In both systems, the LIF component also incorporates interactive learning to make the inspection system more powerful and efficient. The VFD system has been deployed successfully in three different manufacturing companies, and the PCB inspection system is in the process of being deployed in a manufacturing plant.

  13. Advanced mean-field theory of the restricted Boltzmann machine

    NASA Astrophysics Data System (ADS)

    Huang, Haiping; Toyoizumi, Taro

    2015-05-01

    Learning in a restricted Boltzmann machine is typically hard due to the computation of the gradients of the log-likelihood function. To describe the network state statistics of the restricted Boltzmann machine, we develop an advanced mean-field theory based on the Bethe approximation. Our theory provides an efficient message-passing-based method that evaluates not only the partition function (free energy) but also its gradients, without requiring statistical sampling. The results are compared with those obtained by the computationally expensive sampling-based method.

  14. Trends in extreme learning machines: a review.

    PubMed

    Huang, Gao; Huang, Guang-Bin; Song, Shiji; You, Keyou

    2015-01-01

    Extreme learning machine (ELM) has gained increasing interest from various research fields recently. In this review, we aim to report the current state of the theoretical research and practical advances on this subject. We first give an overview of ELM from the theoretical perspective, including the interpolation theory, universal approximation capability, and generalization ability. Then we focus on the various improvements made to ELM which further improve its stability, sparsity, and accuracy under general or specific conditions. Apart from classification and regression, ELM has recently been extended for clustering, feature selection, representational learning, and many other learning tasks. These newly emerging algorithms greatly expand the applications of ELM. From the implementation aspect, hardware implementation and parallel computation techniques have substantially sped up the training of ELM, making it feasible for big data processing and real-time reasoning. Due to its remarkable efficiency, simplicity, and impressive generalization performance, ELM has been applied in a variety of domains, such as biomedical engineering, computer vision, system identification, and control and robotics. In this review, we try to provide a comprehensive view of these advances in ELM together with its future perspectives. PMID:25462632
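The core ELM recipe — random, fixed hidden-layer weights with output weights obtained in a single least-squares solve — fits in a few lines. A hedged sketch on an invented 1-D regression task (sizes, activation, and target are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases (never trained)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                  # output weights: one least-squares solve
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

X = rng.uniform(-1, 1, size=(300, 1))
y = np.sin(3 * X[:, 0])                           # toy regression target
model = elm_fit(X, y)
mse = np.mean((elm_predict(model, X) - y) ** 2)
print(mse)                                        # small training error
```

This single non-iterative solve is the source of the training-speed claims above: there is no gradient descent at all, only a pseudoinverse.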

  15. Natural Language Processing and Machine Learning (NLP/ML): Applying Advances in Biomedicine to the Earth Sciences

    NASA Astrophysics Data System (ADS)

    Duerr, R.; Myers, S.; Palmer, M.; Jenkins, C. J.; Thessen, A.; Martin, J.

    2015-12-01

    Semantics underlie many of the tools and services available from and on the web. From improving search results to enabling data mashups and other forms of interoperability, semantic technologies have proven themselves. But creating semantic resources, especially re-usable semantic resources, is extremely time consuming and labor intensive. Why? Because it is not just a matter of technology but also of obtaining rough consensus if not full agreement amongst community members on the meaning and order of things. One way to develop these resources in a more automated way would be to use NLP/ML techniques to extract the required resources from large corpora of subject-specific text such as peer-reviewed papers where presumably a rough consensus has been achieved at least about the basics of the particular discipline involved. While not generally applied to Earth Sciences, considerable resources have been spent in other fields such as medicine on these types of techniques with some success. The NSF-funded ClearEarth project is applying the techniques developed for biomedicine to the cryosphere, geology, and biology in order to spur faster development of the semantic resources needed in these fields. The first area being addressed by the project is the cryosphere, specifically sea ice nomenclature where an existing set of sea ice ontologies are being used as the "Gold Standard" against which to test and validate the NLP/ML techniques. The processes being used, lessons learned and early results will be described.

  16. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology

    PubMed Central

    Ju, Ying

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics. PMID:27478823
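The n-gram sequence features mentioned above can be sketched as normalized k-mer counts feeding an ensemble classifier. Everything below is invented: the sequences are random stand-ins (with one residue artificially enriched in the positive class so the classes differ), not real cancerlectins.

```python
import numpy as np
from itertools import product
from sklearn.ensemble import RandomForestClassifier

AMINO = "ACDEFGHIKLMNPQRSTVWY"
rng = np.random.default_rng(0)
BIGRAMS = ["".join(p) for p in product(AMINO, repeat=2)]  # 400 possible 2-grams
INDEX = {bg: i for i, bg in enumerate(BIGRAMS)}

def bigram_features(seq):
    # normalized bigram (2-gram) frequency vector for one protein sequence
    v = np.zeros(len(BIGRAMS))
    for i in range(len(seq) - 1):
        v[INDEX[seq[i:i + 2]]] += 1
    return v / max(len(seq) - 1, 1)

def random_seq(bias=None, n=80):
    probs = np.full(20, 1 / 20)
    if bias is not None:          # enrich one residue in the positive class
        probs[bias] += 0.2
        probs /= probs.sum()
    return "".join(rng.choice(list(AMINO), size=n, p=probs))

pos = [random_seq(bias=3) for _ in range(60)]   # invented "cancerlectin-like" class
neg = [random_seq() for _ in range(60)]
X = np.array([bigram_features(s) for s in pos + neg])
y = np.array([1] * 60 + [0] * 60)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))
```

The study combines several such fingerprint feature sets with ensemble learners; the sketch shows only the simplest building block, one n-gram representation into one ensemble.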

  17. Machine learning research 1989-90

    NASA Technical Reports Server (NTRS)

    Porter, Bruce W.; Souther, Arthur

    1990-01-01

    Multifunctional knowledge bases offer a significant advance in artificial intelligence because they can support numerous expert tasks within a domain. As a result they amortize the costs of building a knowledge base over multiple expert systems and they reduce the brittleness of each system. Due to the inevitable size and complexity of multifunctional knowledge bases, their construction and maintenance require knowledge engineering and acquisition tools that can automatically identify interactions between new and existing knowledge. Furthermore, their use requires software for accessing those portions of the knowledge base that coherently answer questions. Considerable progress was made in developing software for building and accessing multifunctional knowledge bases. A language was developed for representing knowledge, along with software tools for editing and displaying knowledge, a machine learning program for integrating new information into existing knowledge, and a question answering system for accessing the knowledge base.

  18. Machine Learning and Cosmological Simulations

    NASA Astrophysics Data System (ADS)

    Kamdar, Harshil; Turk, Matthew; Brunner, Robert

    2016-01-01

    We explore the application of machine learning (ML) to the problem of galaxy formation and evolution in a hierarchical universe. Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively evaluating the extent of the influence of dark matter halo properties on small-scale structure formation. For our analyses, we use both semi-analytical models (Millennium simulation) and N-body + hydrodynamical simulations (Illustris simulation). The ML algorithms are trained on important dark matter halo properties (inputs) and galaxy properties (outputs). The trained models are able to robustly predict the gas mass, stellar mass, black hole mass, star formation rate, $g-r$ color, and stellar metallicity. Moreover, the ML simulated galaxies obey fundamental observational constraints implying that the population of ML predicted galaxies is physically and statistically robust. Next, ML algorithms are trained on an N-body + hydrodynamical simulation and applied to an N-body only simulation (Dark Sky simulation, Illustris Dark), populating this new simulation with galaxies. We can examine how structure formation changes with different cosmological parameters and are able to mimic a full-blown hydrodynamical simulation in a computation time that is orders of magnitude smaller. We find that the set of ML simulated galaxies in Dark Sky obey the same observational constraints, further solidifying ML's place as an intriguing and promising technique in future galaxy formation studies and rapid mock galaxy catalog creation.
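
    A minimal sketch of the core idea above: learn a mapping from dark matter halo properties (inputs) to galaxy properties (outputs) on one simulation, then apply it to halos from an N-body-only run. A toy 1-nearest-neighbour regressor stands in for the paper's ML algorithms; the halo and galaxy numbers are made up for illustration.

```python
# Toy sketch: populate halos with galaxy properties learned from a
# (hypothetical) training simulation, using 1-nearest-neighbour lookup.

def nn_predict(train_X, train_y, x):
    """Predict the galaxy property of the most similar training halo."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist2(train_X[i], x))
    return train_y[best]

# Toy training set: (halo mass, spin) -> stellar mass, arbitrary units.
halos = [(1.0, 0.03), (5.0, 0.05), (20.0, 0.02)]
stellar_mass = [0.01, 0.2, 1.5]

# "Populate" a new halo drawn from an N-body-only run.
print(nn_predict(halos, stellar_mass, (4.5, 0.04)))  # 0.2
```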

  19. Memristor models for machine learning.

    PubMed

    Carbajal, Juan Pablo; Dambre, Joni; Hermans, Michiel; Schrauwen, Benjamin

    2015-03-01

    In the quest for alternatives to traditional complementary metal-oxide-semiconductor, it is being suggested that digital computing efficiency and power can be improved by matching the precision to the application. Many applications do not need the high precision that is being used today. In particular, large gains in area and power efficiency could be achieved by dedicated analog realizations of approximate computing engines. In this work we explore the use of memristor networks for analog approximate computation, based on a machine learning framework called reservoir computing. Most experimental investigations on the dynamics of memristors focus on their nonvolatile behavior. Hence, the volatility that is present in the developed technologies is usually unwanted and is not included in simulation models. In contrast, in reservoir computing, volatility is not only desirable but necessary. Therefore, in this work, we propose two different ways to incorporate it into memristor simulation models. The first is an extension of Strukov's model, and the second is an equivalent Wiener model approximation. We analyze and compare the dynamical properties of these models and discuss their implications for the memory and the nonlinear processing capacity of memristor networks. Our results indicate that device variability, increasingly causing problems in traditional computer design, is an asset in the context of reservoir computing. We conclude that although both models could lead to useful memristor-based reservoir computing systems, their computational performance will differ. Therefore, experimental modeling research is required for the development of accurate volatile memristor models. PMID:25602769

  20. Machine Translation-Assisted Language Learning: Writing for Beginners

    ERIC Educational Resources Information Center

    Garcia, Ignacio; Pena, Maria Isabel

    2011-01-01

    The few studies that deal with machine translation (MT) as a language learning tool focus on its use by advanced learners, never by beginners. Yet, freely available MT engines (i.e. Google Translate) and MT-related web initiatives (i.e. Gabble-on.com) position themselves to cater precisely to the needs of learners with a limited command of a…

  1. Alternating minimization and Boltzmann machine learning.

    PubMed

    Byrne, W

    1992-01-01

    Training a Boltzmann machine with hidden units is appropriately treated in information geometry using the information divergence and the technique of alternating minimization. The resulting algorithm is shown to be closely related to gradient descent Boltzmann machine learning rules, and the close relationship of both to the EM algorithm is described. An iterative proportional fitting procedure for training machines without hidden units is described and incorporated into the alternating minimization algorithm. PMID:18276461

  2. Machine learning applications in genetics and genomics.

    PubMed

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets. PMID:25948244

  3. Man-machine interface requirements - advanced technology

    NASA Technical Reports Server (NTRS)

    Remington, R. W.; Wiener, E. L.

    1984-01-01

    Research issues and areas are identified where increased understanding of the human operator and the interaction between the operator and the avionics could lead to improvements in the performance of current and proposed helicopters. Both current and advanced helicopter systems and avionics are considered. Areas critical to man-machine interface requirements include: (1) artificial intelligence; (2) visual displays; (3) voice technology; (4) cockpit integration; and (5) pilot work loads and performance.

  4. An introduction to quantum machine learning

    NASA Astrophysics Data System (ADS)

    Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

    2015-04-01

    Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

  5. Machine Learning for Biomedical Literature Triage

    PubMed Central

    Almeida, Hayda; Meurs, Marie-Jean; Kosseim, Leila; Butler, Greg; Tsang, Adrian

    2014-01-01

    This paper presents a machine learning system for supporting the first task of the biological literature manual curation process, called triage. We compare the performance of various classification models, by experimenting with dataset sampling factors and a set of features, as well as three different machine learning algorithms (Naive Bayes, Support Vector Machine and Logistic Model Trees). The results show that the most fitting model to handle the imbalanced datasets of the triage classification task is obtained by using domain relevant features, an under-sampling technique, and the Logistic Model Trees algorithm. PMID:25551575
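
    The under-sampling step the study found helpful for imbalanced triage data can be sketched as follows: randomly discard majority-class examples until the classes are balanced, before training any classifier. The labels and data below are toy placeholders.

```python
# Sketch of random under-sampling for an imbalanced binary dataset.
import random

def undersample(examples, labels, seed=0):
    """Balance a binary dataset by down-sampling the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    major, minor = (pos, neg) if len(pos) > len(neg) else (neg, pos)
    keep = sorted(minor + rng.sample(major, len(minor)))
    return [examples[i] for i in keep], [labels[i] for i in keep]

X = [[i] for i in range(10)]
y = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]   # 2 relevant vs 8 irrelevant papers
Xb, yb = undersample(X, y)
print(sum(yb), len(yb) - sum(yb))    # 2 2: classes now balanced
```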

  6. Machine learning for Big Data analytics in plants.

    PubMed

    Ma, Chuang; Zhang, Hao Helen; Wang, Xiangfeng

    2014-12-01

    Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community not only needs to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences. PMID:25223304

  7. Machine learning in cell biology - teaching computers to recognize phenotypes.

    PubMed

    Sommer, Christoph; Gerlich, Daniel W

    2013-12-15

    Recent advances in microscope automation provide new opportunities for high-throughput cell biology, such as image-based screening. Highly complex image analysis tasks often make the implementation of static, predefined processing rules a cumbersome effort. Machine-learning methods, instead, seek to use intrinsic data structure, as well as the expert annotations of biologists, to infer models that can be used to solve versatile data analysis tasks. Here, we explain how machine-learning methods work and what needs to be considered for their successful application in cell biology. We outline how microscopy images can be converted into a data representation suitable for machine learning, and then introduce various state-of-the-art machine-learning algorithms, highlighting recent applications in image-based screening. Our Commentary aims to provide the biologist with a guide to the application of machine learning to microscopy assays and we therefore include extensive discussion on how to optimize experimental workflow as well as the data analysis pipeline. PMID:24259662

  8. Machine learning for real time remote detection

    NASA Astrophysics Data System (ADS)

    Labbé, Benjamin; Fournier, Jérôme; Henaff, Gilles; Bascle, Bénédicte; Canu, Stéphane

    2010-10-01

    Infrared systems are key to providing enhanced capability to military forces such as automatic control of threats and prevention from air, naval and ground attacks. Key requirements for such a system to produce operational benefits are real-time processing as well as high efficiency in terms of detection and false alarm rate. These are serious issues since the system must deal with a large number of objects and categories to be recognized (small vehicles, armored vehicles, planes, buildings, etc.). Statistical learning based algorithms are promising candidates to meet these requirements when using selected discriminant features and real-time implementation. This paper proposes a new decision architecture benefiting from recent advances in machine learning by using an effective method for level set estimation. While building the decision function, the proposed approach performs variable selection based on a discriminative criterion. Moreover, the use of level sets makes it possible to manage rejection of unknown or ambiguous objects, thus preserving the false alarm rate. Experimental evidence reported on real-world infrared images demonstrates the validity of our approach.

  9. Machine learning: Trends, perspectives, and prospects.

    PubMed

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. PMID:26185243

  10. Advanced geometric camera calibration for machine vision

    NASA Astrophysics Data System (ADS)

    Vo, Minh; Wang, Zhaoyang; Luu, Long; Ma, Jun

    2011-11-01

    In many machine vision applications, a crucial step is to accurately determine the relation between the image of the object and its physical dimension by performing a calibration process. Over time, various calibration techniques have been developed. Nevertheless, the existing methods cannot satisfy the ever-increasing demands for higher accuracy performance. In this letter, an advanced geometric camera calibration technique which employs a frontal image concept and a hyper-precise control point detection scheme with digital image correlation is presented. Simulation and real experimental results have successfully demonstrated the superiority of the proposed technique.
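
    Not the authors' method, but a generic sketch of the core calibration task: recovering the mapping between physical plane coordinates and image coordinates from point correspondences. Here a planar homography is estimated with the standard direct linear transform (DLT) on synthetic data.

```python
# DLT estimation of a 3x3 homography H (up to scale) from
# world-plane / image point correspondences, on synthetic data.
import numpy as np

def estimate_homography(world, image):
    """Solve the DLT system; each correspondence contributes two rows."""
    rows = []
    for (X, Y), (u, v) in zip(world, image):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, Vt = np.linalg.svd(np.array(rows, float))
    H = Vt[-1].reshape(3, 3)          # null-space vector = flattened H
    return H / H[2, 2]                # fix the arbitrary scale

# Synthetic ground truth: a scale-and-shift homography on a planar grid.
H_true = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, -1.0], [0.0, 0.0, 1.0]])
world = [(x, y) for x in range(3) for y in range(3)]
image = []
for X, Y in world:
    p = H_true @ np.array([X, Y, 1.0])
    image.append((p[0] / p[2], p[1] / p[2]))

H = estimate_homography(world, image)
print(np.allclose(H, H_true, atol=1e-6))  # True on noise-free data
```

    A full camera calibration adds lens distortion and intrinsic/extrinsic decomposition on top of this kind of linear estimate.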

  11. Machine Learning for Biological Trajectory Classification Applications

    NASA Technical Reports Server (NTRS)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms are compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.
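
    An illustrative sketch only: classifying 2D trajectories by a simple motion feature (mean step length) with a nearest-centroid rule, a much-reduced stand-in for the clustering/SVM/HMM methods in the paper. The trajectory and class centroids are invented.

```python
# Toy trajectory classification: feature extraction + nearest centroid.
import math

def mean_step(traj):
    """Average distance between consecutive trajectory points."""
    steps = [math.dist(a, b) for a, b in zip(traj, traj[1:])]
    return sum(steps) / len(steps)

def nearest_centroid(centroids, value):
    """Return the label of the closest class centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Toy classes: "persistent" cells take big steps, "stationary" ones do not.
centroids = {"persistent": 1.0, "stationary": 0.1}
traj = [(0, 0), (0.9, 0), (1.8, 0.1), (2.7, 0.1)]
print(nearest_centroid(centroids, mean_step(traj)))  # persistent
```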

  12. Extreme Learning Machines for spatial environmental data

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2015-12-01

    The use of machine learning algorithms has increased in a wide variety of domains (from finance to biocomputing and astronomy), and nowadays has a significant impact on the geoscience community. In most real cases geoscience data modelling problems are multivariate, high dimensional, variable at several spatial scales, and are generated by non-linear processes. For such complex data, the spatial prediction of continuous (or categorical) variables is a challenging task. The aim of this paper is to investigate the potential of the recently developed Extreme Learning Machine (ELM) for environmental data analysis, modelling and spatial prediction purposes. An important contribution of this study deals with an application of a generic self-consistent methodology for environmental data driven modelling based on Extreme Learning Machine. Both real and simulated data are used to demonstrate applicability of ELM at different stages of the study to understand and justify the results.
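
    The defining trick of the Extreme Learning Machine can be sketched compactly: hidden-layer weights are drawn at random and only the output weights are fit, by least squares. The toy 1D regression below is for illustration and is not the paper's environmental setup.

```python
# Minimal ELM: random hidden layer + least-squares readout.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit output weights only
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

X = np.linspace(-1, 1, 40).reshape(-1, 1)
y = np.sin(3 * X[:, 0])
model = elm_fit(X, y)
err = np.max(np.abs(elm_predict(model, X) - y))
print(float(err))
```

    Because only a linear system is solved, training is orders of magnitude faster than backpropagation, which is what makes ELM attractive for large spatial datasets.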

  13. Introduction to machine learning for brain imaging.

    PubMed

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in the past years developed to become a workhorse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally, machine learning techniques would be usable by any non-expert; unfortunately, they typically are not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. PMID:21172442

  14. Learning in brains and machines.

    PubMed

    Poggio, T; Shelton, C R

    2000-01-01

    The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial. In this paper we sketch some of our work over the last ten years in the area of supervised learning, focusing on three interlinked directions of research: theory, engineering applications (that is, making intelligent software) and neuroscience (that is, understanding the brain's mechanisms of learning). PMID:11198239

  15. Geological Mapping Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Harvey, A. S.; Fotopoulos, G.

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared in order to assess their performance for correctly identifying geological rocktypes in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. Percent of correct classifications was used as the indicator of performance. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to a ground validation map are visualized. Low performance may be the result of poor spectral imaging of bare rock, which can be covered by vegetation or water. The distribution of calibration clusters and MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases required computational effort and time. With the achievable performance levels in this study, the technique is useful in identifying regions of interest and identifying general rocktype trends. In particular, phase I geological site investigations will benefit from this approach and lead to the selection of sites for advanced surveys.
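
    The evaluation metric used above, percent of correct classifications over a validation map, is simple to compute. The sketch below compares two hypothetical classifiers against a toy set of rock-type labels; neither the labels nor the classifiers come from the study.

```python
# Percent-correct evaluation of two toy rock-type classifiers.

def percent_correct(predicted, actual):
    """Share of validation pixels whose predicted label matches."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

actual = ["granite", "basalt", "granite", "gneiss", "basalt"]
majority = ["granite"] * len(actual)              # baseline: always granite
model = ["granite", "basalt", "granite", "basalt", "basalt"]

print(percent_correct(majority, actual))  # 40.0
print(percent_correct(model, actual))     # 80.0
```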

  16. Distributed fuzzy learning using the MULTISOFT machine.

    PubMed

    Russo, M

    2001-01-01

    Describes PARGEFREX, a distributed approach to genetic-neuro-fuzzy learning which has been implemented using the MULTISOFT machine, a low-cost farm of personal computers built at the University of Messina. The performance of the serial version is greatly enhanced with the simple parallelization scheme described in the paper. Once a learning dataset is fixed, there is a very high superlinear speedup in the average time needed to reach a prefixed learning error, i.e., if the number of personal computers increases by n times, the mean learning time falls to less than 1/n of its original value. PMID:18249882

  17. Machine Learning Toolkit for Extreme Scale

    SciTech Connect

    2014-03-31

    The Support Vector Machine (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, including commodity multi-core machines, tightly connected supercomputers, and cloud computing systems. Several techniques are proposed for improved speed and memory usage, including adaptive and aggressive elimination of samples for faster convergence, and sparse-format representation of data samples. Several heuristics, ranging from earliest-possible to lazy elimination of non-contributing samples, are considered in MaTEx. In many cases, where an early sample elimination might result in a false positive, low-overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.

  19. Using Simple Machines to Leverage Learning

    ERIC Educational Resources Information Center

    Dotger, Sharon

    2008-01-01

    What would your students say if you told them they could lift you off the ground using a block and a board? Using a simple machine, they'll find out they can, and they'll learn about work, energy, and motion in the process! In addition, this integrated lesson gives students the opportunity to investigate variables while practicing measurement…

  20. Vitrification: Machines learn to recognize glasses

    NASA Astrophysics Data System (ADS)

    Ceriotti, Michele; Vitelli, Vincenzo

    2016-05-01

    The dynamics of a viscous liquid undergo a dramatic slowdown when it is cooled to form a solid glass. Recognizing the structural changes across such a transition remains a major challenge. Machine-learning methods, similar to those Facebook uses to recognize groups of friends, have now been applied to this problem.

  1. Machine learning in soil classification.

    PubMed

    Bhattacharya, B; Solomatine, D P

    2006-03-01

    In a number of engineering problems, e.g. in geotechnics and petroleum engineering, intervals of measured series data (signals) must be assigned a class while maintaining the constraint of contiguity, and standard classification methods can be inadequate. Classification in this case requires an expert who observes the magnitude and trends of the signals in addition to any a priori information that might be available. In this paper, an approach for automating this classification procedure is presented. First, a segmentation algorithm is developed and applied to segment the measured signals. Second, the salient features of these segments are extracted using the boundary energy method. Classifiers are then built to assign classes to the segments based on the measured data and extracted features; they employ decision trees, ANNs and support vector machines. The methodology was tested in classifying sub-surface soil using measured data from cone penetration testing, and satisfactory results were obtained. PMID:16530382
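
    Not the paper's boundary-energy method, but a minimal illustration of its first step: segmenting a measured signal into contiguous intervals, here simply by splitting wherever consecutive samples jump more than a threshold. The readings are invented CPT-like values.

```python
# Toy contiguous segmentation of a 1D measured signal.

def segment(signal, jump=1.0):
    """Split a signal at indices where |x[i+1] - x[i]| > jump."""
    segments, current = [], [signal[0]]
    for prev, nxt in zip(signal, signal[1:]):
        if abs(nxt - prev) > jump:
            segments.append(current)
            current = []
        current.append(nxt)
    segments.append(current)
    return segments

signal = [0.1, 0.2, 0.1, 3.0, 3.1, 2.9, 0.0, 0.1]
segs = segment(signal)
print(len(segs))   # 3 contiguous segments
```

    Each segment would then be summarized by features and passed to a classifier, preserving the contiguity constraint that makes per-sample classification inadequate here.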

  2. Machine Learning of Maritime Fog Forecast Rules.

    NASA Astrophysics Data System (ADS)

    Tag, Paul M.; Peak, James E.

    1996-05-01

    In recent years, the field of artificial intelligence has contributed significantly to the science of meteorology, most notably in the now familiar form of expert systems. Expert systems have focused on rules or heuristics by establishing, in computer code, the reasoning process of a weather forecaster predicting, for example, thunderstorms or fog. Beyond the years of effort that go into developing such a knowledge base, there is the time-consuming task of extracting that knowledge and experience from experts. In this paper, the induction of rules directly from meteorological data is explored, a process called machine learning. A commercial machine learning program, C4.5, is applied to a meteorological problem, forecasting maritime fog, for which a reliable expert system has previously been developed. Two datasets are used: 1) weather ship observations originally used for testing and evaluating the expert system, and 2) buoy measurements taken off the coast of California. For both datasets, the rules produced by C4.5 are reasonable and make physical sense, thus demonstrating that an objective induction approach can reveal physical processes directly from data. For the ship database, the machine-generated rules are not as accurate as those from the expert system but are still significantly better than persistence forecasts. For the buoy data, the forecast accuracies are very high, but only slightly superior to persistence. The results indicate that the machine learning approach is a viable tool for developing meteorological expertise, but only when applied to reliable data with sufficient cases of known outcome. In those instances when such databases are available, the use of machine learning can provide useful insight that otherwise might take considerable human analysis to produce.
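
    C4.5 is a decision-tree learner built around information gain. As a hedged illustration of that core idea (not C4.5 itself), the sketch below picks the single most informative attribute for a toy fog dataset; the attributes and observations are hypothetical.

```python
# Rule induction in miniature: choose the attribute with the highest
# information gain on a toy maritime-fog dataset.
import math

def entropy(labels):
    n = len(labels)
    counts = {v: labels.count(v) for v in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, labels, attr):
    """Entropy reduction from splitting the data on one attribute."""
    n = len(labels)
    rem = 0.0
    for value in set(r[attr] for r in rows):
        sub = [labels[i] for i, r in enumerate(rows) if r[attr] == value]
        rem += len(sub) / n * entropy(sub)
    return entropy(labels) - rem

# Toy observations: (air-sea temperature difference, wind) -> fog or clear.
rows = [
    {"dT": "warm", "wind": "low"},
    {"dT": "warm", "wind": "high"},
    {"dT": "cold", "wind": "low"},
    {"dT": "cold", "wind": "high"},
]
labels = ["fog", "fog", "clear", "clear"]

best = max(rows[0], key=lambda a: info_gain(rows, labels, a))
print(best)   # dT: it perfectly separates fog from clear here
```

    C4.5 applies this test recursively (with gain-ratio refinements and pruning) to grow a full tree, which is then read off as forecast rules.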

  3. Protein function in precision medicine: deep understanding with machine learning.

    PubMed

    Rost, Burkhard; Radivojac, Predrag; Bromberg, Yana

    2016-08-01

    Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both. PMID:27423136

  4. Paradigms for Realizing Machine Learning Algorithms.

    PubMed

    Agneeswaran, Vijay Srinivas; Tonpay, Pranay; Tiwary, Jayati

    2013-12-01

    The article explains the three generations of machine learning algorithms, all of which aim to operate on big data. The first-generation tools are SAS, SPSS, etc., while second-generation realizations include Mahout and RapidMiner (which work over Hadoop), and third-generation paradigms include Spark and GraphLab, among others. The essence of the article is that, for a number of machine learning algorithms, it is important to look beyond Hadoop's Map-Reduce paradigm in order to make them work on big data. A number of promising contenders have emerged in the third generation that can be exploited to realize deep analytics on big data. PMID:27447253

  5. Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

    PubMed

    Howard, Rebecca; Rattray, Magnus; Prosperi, Mattia; Custovic, Adnan

    2015-07-01

    Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which are caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as 'asthma endotypes'. The discovery of different asthma subtypes has moved from subjective approaches in which putative phenotypes are assigned by experts to data-driven ones which incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique-latent class analysis-and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies. PMID:26143394

  6. Topics in Machine Learning for Astronomers

    NASA Astrophysics Data System (ADS)

    Cisewski, Jessi

    2016-01-01

    As astronomical datasets continue to increase in size and complexity, innovative statistical and machine learning tools are required to address the scientific questions of interest in a computationally efficient manner. I will introduce some tools that astronomers can employ for such problems with a focus on clustering and classification techniques. I will introduce standard methods, but also get into more recent developments that may be of use to the astronomical community.

  7. Machine Learning and Geometric Technique for SLAM

    NASA Astrophysics Data System (ADS)

    Bernal-Marin, Miguel; Bayro-Corrochano, Eduardo

    This paper describes a new approach for building 3D geometric maps using a laser rangefinder, a stereo camera system, and the mathematical framework of Conformal Geometric Algebra. The use of known visual landmarks in the map helps to achieve good localization of the robot. A machine learning technique is used for recognition of objects in the environment. These landmarks are found using the Viola-Jones algorithm and are represented by their position in the 3D virtual map.

  8. Man-machine cooperation in advanced teleoperation

    NASA Technical Reports Server (NTRS)

    Fiorini, Paolo; Das, Hari; Lee, Sukhan

    1993-01-01

    Teleoperation experiments at JPL have shown that advanced features in a telerobotic system are a necessary condition for good results, but that they are not sufficient to assure consistently good performance by the operators. Two or three operators are normally used during training and experiments to maintain the desired performance. An alternative to this multi-operator control station is a man-machine interface embedding computer programs that can perform some of the operator's functions. In this paper we present our first experiments with these concepts, in which we focused on the areas of real-time task monitoring and interactive path planning. In the first case, when performing a known task, the operator has an automatic aid for setting control parameters and camera views. In the second case, an interactive path planner ranks different path alternatives so that the operator can make the correct control decision. The monitoring function has been implemented with a neural network doing the real-time task segmentation. The interactive path planner was implemented for redundant manipulators to specify arm configurations across the desired path and satisfy geometric, task, and performance constraints.

  9. Prototype-based models in machine learning.

    PubMed

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of potentially high-dimensional, complex datasets. We discuss basic schemes of competitive vector quantization as well as the so-called neural gas approach and Kohonen's topology-preserving self-organizing map. Supervised learning in prototype systems is exemplified in terms of learning vector quantization. Most frequently, the familiar Euclidean distance serves as a dissimilarity measure. We present extensions of the framework to nonstandard measures and give an introduction to the use of adaptive distances in relevance learning. PMID:26800334
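    The supervised prototype scheme named above, learning vector quantization, can be sketched in a few lines. The following is a minimal LVQ1 sketch on toy Gaussian data; the data, prototype count, learning rate, and epoch count are all illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes in 2-D.
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
               rng.normal(2.0, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# One prototype per class, initialised between the class means.
protos = np.array([[0.5, 0.5], [1.5, 1.5]])
labels = np.array([0, 1])

def lvq1_step(x, c, protos, labels, lr=0.05):
    """Move the winning prototype toward x if the labels match, away otherwise."""
    w = np.argmin(((protos - x) ** 2).sum(axis=1))  # nearest prototype
    sign = 1.0 if labels[w] == c else -1.0
    protos[w] += sign * lr * (x - protos[w])
    return protos

for epoch in range(20):
    for i in rng.permutation(len(X)):
        protos = lvq1_step(X[i], y[i], protos, labels)

# Classify by nearest prototype (Euclidean dissimilarity, as in the review).
pred = labels[np.argmin(((X[:, None, :] - protos[None]) ** 2).sum(-1), axis=1)]
accuracy = (pred == y).mean()
print(round(accuracy, 2))
```

    Relevance learning, mentioned at the end of the abstract, would replace the fixed Euclidean distance here with an adaptive, learned metric.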

  10. Scaling up: Distributed machine learning with cooperation

    SciTech Connect

    Provost, F.J.; Hennessy, D.N.

    1996-12-31

    Machine-learning methods are becoming increasingly popular for automated data analysis. However, standard methods do not scale up to massive scientific and business data sets without expensive hardware. This paper investigates a practical alternative for scaling up: the use of distributed processing to take advantage of the often dormant PCs and workstations available on local networks. Each workstation runs a common rule-learning program on a subset of the data. We first show that for commonly used rule-evaluation criteria, a simple form of cooperation can guarantee that a rule will look good to the set of cooperating learners if and only if it would look good to a single learner operating with the entire data set. We then show how such a system can further capitalize on different perspectives by sharing learned knowledge for significant reduction in search effort. We demonstrate the power of the method by learning from a massive data set taken from the domain of cellular fraud detection. Finally, we provide an overview of other methods for scaling up machine learning.
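    One direction of the cooperation guarantee above is easy to see concretely: for a threshold-style rule-evaluation criterion, a rule that clears the threshold on every partition clears it globally, because the global accuracy is a size-weighted average of the local accuracies. A toy sketch (the dataset, rule, and threshold are illustrative assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: feature x, noisy label y; candidate rule "predict 1 if x > 0".
x = rng.normal(size=1000)
y = (x + rng.normal(scale=0.3, size=1000) > 0).astype(int)

def rule(xs):
    return (xs > 0).astype(int)

def accuracy(xs, ys):
    return (rule(xs) == ys).mean()

# Split the data across four "workstations".
parts = np.array_split(rng.permutation(1000), 4)
local_acc = [accuracy(x[p], y[p]) for p in parts]
global_acc = accuracy(x, y)

theta = 0.8
looks_good_locally = all(a >= theta for a in local_acc)
looks_good_globally = global_acc >= theta
print(looks_good_locally, looks_good_globally)
```

    The paper's actual protocol is stronger (an if-and-only-if guarantee for its criteria, plus knowledge sharing), which this sketch does not reproduce.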

  11. Dimension Reduction With Extreme Learning Machine.

    PubMed

    Kasun, Liyanaarachchi Lekamalage Chamara; Yang, Yan; Huang, Guang-Bin; Zhang, Zhengyou

    2016-08-01

    Data may often contain noise or irrelevant information, which negatively affect the generalization capability of machine learning algorithms. The objective of dimension reduction algorithms, such as principal component analysis (PCA), non-negative matrix factorization (NMF), random projection (RP), and auto-encoder (AE), is to reduce the noise or irrelevant information of the data. The features of PCA (eigenvectors) and linear AE are not able to represent data as parts (e.g., the nose in a face image). On the other hand, NMF and non-linear AE are hampered by slow learning speed, and RP only represents a subspace of the original data. This paper introduces a dimension reduction framework which to some extent represents data as parts, has fast learning speed, and learns the between-class scatter subspace. To this end, this paper investigates a linear and non-linear dimension reduction framework referred to as extreme learning machine AE (ELM-AE) and sparse ELM-AE (SELM-AE). In contrast to tied-weight AE, the hidden neurons in ELM-AE and SELM-AE need not be tuned, and their parameters (e.g., input weights in additive neurons) are initialized using orthogonal and sparse random weights, respectively. Experimental results on the USPS handwritten digit recognition, CIFAR-10 object recognition, and NORB object recognition data sets show the efficacy of linear and non-linear ELM-AE and SELM-AE in terms of discriminative capability, sparsity, training time, and normalized mean square error. PMID:27214902
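    The core ELM-AE recipe described above can be sketched compactly: random orthogonal input weights are never tuned, the output weights are solved in closed form, and the data are embedded through those output weights. A minimal sketch on synthetic data; the dimensions, ridge term, and tanh activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

X = rng.normal(size=(500, 20))          # toy data: 500 samples, 20 features
d = 5                                   # target dimension

# Random orthogonal input weights and biases (not tuned, as in ELM-AE).
W = np.linalg.qr(rng.normal(size=(20, d)))[0]   # 20 x d, orthonormal columns
b = rng.normal(size=d)

H = np.tanh(X @ W + b)                  # random hidden representation

# Output weights solved analytically: minimise ||H B - X||^2 with a ridge term.
lam = 1e-3
B = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ X)   # d x 20

# ELM-AE embeds the data through the learned *output* weights.
X_low = X @ B.T                          # 500 x 5
print(X_low.shape)
```

    SELM-AE would replace the orthogonal weights with sparse random weights; the rest of the recipe is unchanged.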

  12. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment

    PubMed Central

    2011-01-01

    Background Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In upholding the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. Conclusions AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of

  13. Machine learning: how to get more out of HEP data and the Higgs Boson Machine Learning Challenge

    NASA Astrophysics Data System (ADS)

    Wolter, Marcin

    2015-09-01

    Multivariate techniques using machine learning algorithms have become an integral part of many High Energy Physics (HEP) data analyses. The article shows the gain in physics reach of the physics experiments due to the adoption of machine learning techniques. Rapid development in the field of machine learning in recent years is a challenge for the HEP community. The open competition for machine learning experts, the "Higgs Boson Machine Learning Challenge", shows that modern techniques developed outside HEP can significantly improve the analysis of data from HEP experiments and improve the sensitivity of searches for new particles and processes.

  14. Discriminative clustering via extreme learning machine.

    PubMed

    Huang, Gao; Liu, Tianchi; Yang, Yan; Lin, Zhiping; Song, Shiji; Wu, Cheng

    2015-10-01

    Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithms. In this paper, we propose three discriminative clustering approaches based on Extreme Learning Machine (ELM). The first algorithm iteratively trains weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA); but one approach adopts alternative optimization, while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented, and yield competitive clustering accuracy on real world data sets compared to state-of-the-art clustering methods. PMID:26143036

  15. Machine learning methods for predictive proteomics.

    PubMed

    Barla, Annalisa; Jurman, Giuseppe; Riccadonna, Samantha; Merler, Stefano; Chierici, Marco; Furlanello, Cesare

    2008-03-01

    The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data needs the same caution as the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only do potential features easily outnumber samples by 10^3 times, but it is easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers (e.g. Support Vector Machine (SVM)) or feature ranking methods (recursive feature elimination or I-Relief). A procedure for assessing stability and predictive value of the resulting biomarkers' list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies. PMID:18310105
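    The selection-bias point above, that feature ranking must happen inside each cross-validation fold, can be illustrated on pure-noise data, where an unbiased protocol should report chance-level accuracy. This is a toy sketch of the principle, not the authors' DAP; the data sizes, feature ranker, and classifier are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Pure-noise data: 40 samples, 1000 "peaks", random labels.
X = rng.normal(size=(40, 1000))
y = rng.integers(0, 2, 40)

def top_k_features(Xtr, ytr, k=10):
    """Rank peaks by absolute class-mean difference on the TRAINING fold only."""
    diff = np.abs(Xtr[ytr == 0].mean(0) - Xtr[ytr == 1].mean(0))
    return np.argsort(diff)[-k:]

def nearest_mean_predict(Xtr, ytr, Xte):
    m0, m1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    d0 = ((Xte - m0) ** 2).sum(1)
    d1 = ((Xte - m1) ** 2).sum(1)
    return (d1 < d0).astype(int)

folds = np.array_split(rng.permutation(40), 5)
accs = []
for te in folds:
    tr = np.setdiff1d(np.arange(40), te)
    feats = top_k_features(X[tr], y[tr])       # selection INSIDE the fold
    accs.append((nearest_mean_predict(X[tr][:, feats], y[tr],
                                      X[te][:, feats]) == y[te]).mean())
acc = float(np.mean(accs))
print(round(acc, 2))
```

    Selecting the top features on the full dataset before cross-validating would leak test labels into the selection and report optimistically inflated accuracy on the same noise.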

  16. Entanglement-Based Machine Learning on a Quantum Computer

    NASA Astrophysics Data System (ADS)

    Cai, X.-D.; Wu, D.; Su, Z.-E.; Chen, M.-C.; Wang, X.-L.; Li, Li; Liu, N.-L.; Lu, C.-Y.; Pan, J.-W.

    2015-03-01

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, which is ubiquitous in various fields such as computer sciences, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv.1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.

  17. Entanglement-based machine learning on a quantum computer.

    PubMed

    Cai, X-D; Wu, D; Su, Z-E; Chen, M-C; Wang, X-L; Li, Li; Liu, N-L; Lu, C-Y; Pan, J-W

    2015-03-20

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, which is ubiquitous in various fields such as computer sciences, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv.1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning. PMID:25839250

  18. Extreme Learning Machine for Multilayer Perceptron.

    PubMed

    Tang, Jiexiong; Deng, Chenwei; Huang, Guang-Bin

    2016-04-01

    Extreme learning machine (ELM) is an emerging learning algorithm for the generalized single hidden layer feedforward neural networks, of which the hidden node parameters are randomly generated and the output weights are analytically computed. However, due to its shallow architecture, feature learning using ELM may not be effective for natural signals (e.g., images/videos), even with a large number of hidden nodes. To address this issue, in this paper, a new ELM-based hierarchical learning framework is proposed for multilayer perceptron. The proposed architecture is divided into two main components: 1) self-taught feature extraction followed by supervised feature classification and 2) they are bridged by random initialized hidden weights. The novelties of this paper are as follows: 1) unsupervised multilayer encoding is conducted for feature extraction, and an ELM-based sparse autoencoder is developed via l1 constraint. By doing so, it achieves more compact and meaningful feature representations than the original ELM; 2) by exploiting the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to a better generalization with faster learning speed; and 3) unlike the greedy layerwise training of deep learning (DL), the hidden layers of the proposed framework are trained in a forward manner. Once the previous layer is established, the weights of the current layer are fixed without fine-tuning. Therefore, it has much better learning efficiency than the DL. Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods. Furthermore, multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme. PMID:25966483
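    The basic ELM recipe summarized in the first sentence, random hidden-node parameters and analytically computed output weights, fits in a few lines. A minimal sketch on toy two-class data; the data, hidden-layer size, activation, and ridge term are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy two-class problem.
X = np.vstack([rng.normal(-1, 0.6, (150, 2)), rng.normal(1, 0.6, (150, 2))])
y = np.array([0] * 150 + [1] * 150)
T = np.eye(2)[y]                         # one-hot targets

L = 100                                  # number of hidden nodes
W = rng.normal(size=(2, L))              # random input weights, never tuned
b = rng.normal(size=L)                   # random biases, never tuned

H = np.tanh(X @ W + b)                   # random feature map

# Output weights in closed form (ridge-regularised least squares).
lam = 1e-2
beta = np.linalg.solve(H.T @ H + lam * np.eye(L), H.T @ T)

pred = np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
train_acc = (pred == y).mean()
print(round(train_acc, 2))
```

    The paper's hierarchical framework stacks ELM-based sparse autoencoders in front of a final stage like this one, training each layer forward without fine-tuning.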

  19. Advanced machine tools, loading systems viewed

    NASA Astrophysics Data System (ADS)

    Kharkov, V. I.

    1986-03-01

    The machine-tooling complex built from a revolving lathe and a two-armed robot designed to machine short revolving bodies including parts with curvilinear and threaded surfaces from piece blanks in either small-series or series multiitem production is described. The complex consists of: (1) a model 1V340F30 revolving lathe with a vertical axis of rotation, 8-position revolving head on a cross carriage and an Elektronika NTs-31 on-line control system; (2) a gantry-style two-armed M20-Ts robot with a 20-kilogram (20 x 2) load capacity; and (3) an 8-position indexable blank table, one of whose positions is for initial unloading of finished parts. Subsequently, machined parts are set onto the position from which all of the blanks have been unloaded. The complex's enclosure allows adjustment and process correction during maintenance and convenient observation of the machining process.

  20. Weka-A Machine Learning Workbench for Data Mining

    NASA Astrophysics Data System (ADS)

    Frank, Eibe; Hall, Mark; Holmes, Geoffrey; Kirkby, Richard; Pfahringer, Bernhard; Witten, Ian H.; Trigg, Len

    The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, convenient interactive graphical user interfaces are provided for data exploration, for setting up large-scale experiments on distributed computing platforms, and for designing configurations for streamed data processing. These interfaces constitute an advanced environment for experimental data mining. The system is written in Java and distributed under the terms of the GNU General Public License.

  1. [Advances in biomolecular machine: methane monooxygenases].

    PubMed

    Lu, Jixue; Wang, Shizhen; Fang, Baishan

    2015-07-01

    Methane monooxygenases (MMO), regarded as "an amazing biomolecular machine", catalyze the oxidation of methane to methanol under aerobic conditions. MMO catalyze this oxidation elaborately, offering a novel route from methane to methanol. Furthermore, MMO can inspire biomolecular machine design. In this review, we introduce MMO, including their structure, genes and catalytic mechanism. The history and the taxonomy of MMO are also introduced. PMID:26647577

  2. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    ERIC Educational Resources Information Center

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  3. Modeling quantum physics with machine learning

    NASA Astrophysics Data System (ADS)

    Lopez-Bezanilla, Alejandro; Arsenault, Louis-Francois; Millis, Andrew; Littlewood, Peter; von Lilienfeld, Anatole

    2014-03-01

    Machine Learning (ML) is a systematic way of inferring new results from sparse information. It directly allows for the resolution of computationally expensive sets of equations by making sense of accumulated knowledge and it is therefore an attractive method for providing computationally inexpensive 'solvers' for some of the important systems of condensed matter physics. In this talk a non-linear regression statistical model is introduced to demonstrate the utility of ML methods in solving quantum-physics-related problems, and is applied to the calculation of electronic transport in 1D channels. DOE contract number DE-AC02-06CH11357.

  4. Patient-centered yes/no prognosis using learning machines

    PubMed Central

    König, I.R.; Malley, J.D.; Pajevic, S.; Weimar, C.; Diener, H-C.

    2009-01-01

    In the last 15 years several machine learning approaches have been developed for classification and regression. In an intuitive manner we introduce the main ideas of classification and regression trees, support vector machines, bagging, boosting and random forests. We discuss differences in the use of machine learning in the biomedical community and the computer sciences. We propose methods for comparing machines on a sound statistical basis. Data from the German Stroke Study Collaboration is used for illustration. We compare the results from learning machines to those obtained by a published logistic regression and discuss similarities and differences. PMID:19216340

  5. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    PubMed Central

    Subbulakshmi, C. V.; Deepa, S. N.

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm. This paradigm integrates the successful exploration mechanism called the self-regulated learning capability of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden layer feedforward neural network (FFNN), proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is evaluated on five benchmarked datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers. PMID:26491713
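    The global-best PSO component described above can be sketched in isolation. In the paper's setting the objective would be the ELM's validation error as a function of its parameters; here a simple sphere function stands in for it, and all the swarm constants are conventional illustrative choices, not the authors':

```python
import numpy as np

rng = np.random.default_rng(5)

def objective(p):
    """Stand-in objective; in the paper this would be ELM validation error."""
    return (p ** 2).sum(axis=1)

n, dim = 30, 4
pos = rng.uniform(-5, 5, (n, dim))       # particle positions
vel = np.zeros((n, dim))                 # particle velocities
pbest = pos.copy()                       # personal bests
pbest_val = objective(pos)
gbest = pbest[pbest_val.argmin()].copy() # global best

w, c1, c2 = 0.7, 1.5, 1.5                # inertia and acceleration constants
for _ in range(200):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = objective(pos)
    better = val < pbest_val
    pbest[better], pbest_val[better] = pos[better], val[better]
    gbest = pbest[pbest_val.argmin()].copy()

best = float(pbest_val.min())
print(best)
```

    Plugging an ELM train-then-validate routine in as the objective, with positions encoding its parameters, gives the hybrid the abstract describes.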

  6. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier.

    PubMed

    Subbulakshmi, C V; Deepa, S N

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on a machine learning paradigm. This paradigm integrates the successful exploration mechanism called the self-regulated learning capability of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden layer feedforward neural network (FFNN), proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is evaluated on five benchmarked datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers. PMID:26491713

  7. Machine learning of user profiles: Representational issues

    SciTech Connect

    Bloedorn, E.; Mani, I.; MacMillan, T.R.

    1996-12-31

    As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research described here focuses on the importance of a suitable generalization hierarchy and representation for learning profiles which are predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors as well as subject features corresponding to categories which could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem.

  8. Mining the Kepler Data using Machine Learning

    NASA Astrophysics Data System (ADS)

    Walkowicz, Lucianne; Howe, A. R.; Nayar, R.; Turner, E. L.; Scargle, J.; Meadows, V.; Zee, A.

    2014-01-01

    Kepler's high cadence and incredible precision has provided an unprecedented view into stars and their planetary companions, revealing both expected and novel phenomena and systems. Due to the large number of Kepler lightcurves, the discovery of novel phenomena in particular has often been serendipitous in the course of searching for known forms of variability (for example, the discovery of the doubly pulsating elliptical binary KOI-54, originally identified by the transiting planet search pipeline). In this talk, we discuss progress on mining the Kepler data through both supervised and unsupervised machine learning, intended to both systematically search the Kepler lightcurves for rare or anomalous variability, and to create a variability catalog for community use. Mining the dataset in this way also allows for a quantitative identification of anomalous variability, and so may also be used as a signal-agnostic form of optical SETI. As the Kepler data are exceptionally rich, they provide an interesting counterpoint to machine learning efforts typically performed on sparser and/or noisier survey data, and will inform similar characterization carried out on future survey datasets.

  9. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on Big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear functions accurately under the condition of support vectors sufficiency. Experimental results on a wide variety of real world small instance size and large instance size applications in the context of binary classification, multi-class problem and regression are then reported to show that RKELM can perform at competitive level of generalized performance as the SVM/LS-SVM at only a fraction of the computational effort incurred. PMID:26829605
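    The central trick above, randomly selecting a small subset of samples as support vectors so the kernel matrix is n x m rather than n x n, then solving the output weights in closed form, can be sketched on a toy regression problem. The RBF kernel, subset size, and ridge term are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy regression: y = sin(x) + noise.
X = rng.uniform(-3, 3, (400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=400)

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Randomly pick a small subset of the samples as "support vectors",
# instead of identifying them iteratively as SVM/LS-SVM would.
m = 20
sv = X[rng.choice(400, m, replace=False)]

# Reduced kernel matrix: 400 x 20 instead of 400 x 400.
K = rbf(X, sv)

# Output weights in closed form (ridge-regularised least squares).
lam = 1e-2
beta = np.linalg.solve(K.T @ K + lam * np.eye(m), K.T @ y)

mse = float(((K @ beta - y) ** 2).mean())
print(round(mse, 3))
```

    The cost saving is the point: the solve involves only an m x m system, so the training cost grows gently with m rather than with the full dataset size.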

  10. Measure Transformer Semantics for Bayesian Machine Learning

    NASA Astrophysics Data System (ADS)

    Borgström, Johannes; Gordon, Andrew D.; Greenberg, Michael; Margetson, James; van Gael, Jurgen

    The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.

  11. Galaxy morphology - An unsupervised machine learning approach

    NASA Astrophysics Data System (ADS)

    Schutter, A.; Shamir, L.

    2015-09-01

    Structural properties provide valuable information about the formation and evolution of galaxies, and are important for understanding the past, present, and future universe. Here we use unsupervised machine learning methodology to analyze a network of similarities between galaxy morphological types, and automatically deduce a morphological sequence of galaxies. Application of the method to the EFIGI catalog shows that the morphological scheme produced by the algorithm is largely in agreement with the De Vaucouleurs system, demonstrating the ability of computer vision and machine learning methods to automatically profile galaxy morphological sequences. The unsupervised analysis method is based on comprehensive computer vision techniques that compute the visual similarities between the different morphological types. Rather than relying on human cognition, the proposed system deduces the similarities between sets of galaxy images in an automatic manner, and is therefore not limited by the number of galaxies being analyzed. The source code of the method is publicly available, and the protocol of the experiment is included in the paper so that the experiment can be replicated, and the method can be used to analyze user-defined datasets of galaxy images.

  12. Photometric Supernova Classification with Machine Learning

    NASA Astrophysics Data System (ADS)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
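    The two-stage pipeline described above, feature extraction from light curves followed by a classifier scored with AUC, can be sketched end to end on toy data. Everything here is an illustrative stand-in: a simple transient model instead of real DES light curves, three hand-picked features instead of SALT2 or wavelet features, and a nearest-mean scorer instead of BDTs:

```python
import numpy as np

rng = np.random.default_rng(7)

t = np.linspace(0, 10, 50)

def light_curve(decay):
    """Toy transient: fast rise, exponential decay, plus noise."""
    f = t * np.exp(-t / decay)
    return f + rng.normal(scale=0.05, size=t.size)

# Two toy "supernova types" differing only in decay time-scale.
curves = [light_curve(d) for d in np.r_[rng.uniform(1.5, 2.5, 100),
                                        rng.uniform(3.0, 4.0, 100)]]
y = np.array([0] * 100 + [1] * 100)

# Stage 1: extract descriptive features (peak height, time of peak,
# late-time flux).
F = np.array([[c.max(), t[c.argmax()], c[-10:].mean()] for c in curves])

# Stage 2: score each object on standardised features (distance to the
# class-0 mean minus distance to the class-1 mean).
Z = (F - F.mean(0)) / F.std(0)
m0, m1 = Z[y == 0].mean(0), Z[y == 1].mean(0)
score = ((Z - m0) ** 2).sum(1) - ((Z - m1) ** 2).sum(1)

# AUC via the rank-sum (Mann-Whitney) identity.
order = score.argsort()
ranks = np.empty(len(score))
ranks[order] = np.arange(1, len(score) + 1)
auc = (ranks[y == 1].sum() - 100 * 101 / 2) / (100 * 100)
print(round(float(auc), 2))
```

    A real pipeline would compute the AUC on a held-out test set; the abstract's point about needing a representative training set applies exactly at that split.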

  13. Online Sequential Extreme Learning Machine With Kernels.

    PubMed

    Scardapane, Simone; Comminiello, Danilo; Scarpiniti, Michele; Uncini, Aurelio

    2015-09-01

    The extreme learning machine (ELM) was recently proposed as a unifying framework for different families of learning algorithms. The classical ELM model consists of a linear combination of a fixed number of nonlinear expansions of the input vector. Learning in ELM is hence equivalent to finding the optimal weights that minimize the error on a dataset. The update works in batch mode, either with explicit feature mappings or with implicit mappings defined by kernels. Although an online version has been proposed for the former, no work has been done up to this point for the latter, and whether an efficient learning algorithm for online kernel-based ELM exists remains an open problem. By explicating some connections between nonlinear adaptive filtering and ELM theory, in this brief, we present an algorithm for this task. In particular, we propose a straightforward extension of the well-known kernel recursive least-squares, belonging to the kernel adaptive filtering (KAF) family, to the ELM framework. We call the resulting algorithm the kernel online sequential ELM (KOS-ELM). Moreover, we consider two different criteria used in the KAF field to obtain sparse filters and extend them to our context. We show that KOS-ELM, with these criteria integrated, can be highly efficient in terms of both generalization error and training time. Empirical evaluations demonstrate promising results on several benchmark datasets. PMID:25561597
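The functional form learned by a kernel-based model of this kind, f(x) = Σᵢ αᵢ k(xᵢ, x), can be sketched with a deliberately naive online variant that refits the kernel ridge solution at every step. The actual KOS-ELM gains its efficiency from recursive updates and dictionary sparsification, which are omitted here; the kernel choice and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def gauss_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class NaiveOnlineKRR:
    """Online kernel regression: f(x) = sum_i alpha_i k(x_i, x).

    For clarity this refits alpha = (K + lam*I)^-1 y at every step,
    which costs O(t^3) per sample; KOS-ELM instead updates the solution
    recursively and sparsifies the dictionary, which is what makes it
    efficient. This sketch only shows the functional form.
    """
    def __init__(self, lam=1e-2, gamma=1.0):
        self.lam, self.gamma = lam, gamma
        self.X, self.y, self.alpha = None, None, None

    def update(self, x, y):
        x = np.atleast_2d(x)
        self.X = x if self.X is None else np.vstack([self.X, x])
        self.y = np.atleast_1d(y) if self.y is None else np.append(self.y, y)
        K = gauss_kernel(self.X, self.X, self.gamma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(K)), self.y)

    def predict(self, x):
        return gauss_kernel(np.atleast_2d(x), self.X, self.gamma) @ self.alpha

# Stream samples of a nonlinear target one at a time.
rng = np.random.default_rng(1)
model = NaiveOnlineKRR(gamma=5.0)
for _ in range(100):
    x = rng.uniform(-1, 1, 1)
    model.update(x, np.sin(3 * x[0]))
err = abs(model.predict([0.5])[0] - np.sin(1.5))
print(err)
```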

  14. Advances in learning for intelligent mobile robots

    NASA Astrophysics Data System (ADS)

    Hall, Ernest L.; Ghaffari, Masoud; Liao, Xiaoqun S.; Alhaj Ali, Souma M.

    2004-10-01

    Intelligent mobile robots must often operate in an unstructured environment cluttered with obstacles and with many possible action paths to accomplish a variety of tasks. Such machines have many potentially useful applications in medicine, defense, industry, and even the home, so the design of such machines is a challenge with great potential rewards. Even though intelligent systems may have symbiotic closure that permits them to make a decision or take an action without external inputs, sensors such as vision permit sensing of the environment and allow precise adaptation to changes. Sensing and adaptation define a reactive system. However, in many applications some form of learning is also desirable or perhaps even required. A further level of intelligence, called understanding, may involve not only sensing, adaptation, and learning but also creative, perceptual solutions involving models of not only the eyes and brain but also the mind. The purpose of this paper is to present a discussion of recent technical advances in learning for intelligent mobile robots with examples of adaptive, creative, and perceptual learning. The significance of this work is in providing a greater understanding of the applications of learning to mobile robots that could lead to important beneficial applications.

  15. Learning to Control Advanced Life Support Systems

    NASA Technical Reports Server (NTRS)

    Subramanian, Devika

    2004-01-01

    Advanced life support systems have many interacting processes and limited resources. Controlling and optimizing advanced life support systems presents unique challenges. In particular, advanced life support systems are nonlinear coupled dynamical systems, and it is difficult for humans to take all interactions into account to design an effective control strategy. In this project, we developed several reinforcement learning controllers that actively explore the space of possible control strategies, guided by rewards from a user-specified long-term objective function. We evaluated these controllers using a discrete event simulation of an advanced life support system. This simulation, called BioSim, designed by NASA scientists David Kortenkamp and Scott Bell, has multiple interacting life support modules including crew, food production, air revitalization, water recovery, solid waste incineration, and power. They are implemented in a consumer/producer relationship in which certain modules produce resources that are consumed by other modules. Stores hold resources between modules. Control of this simulation is via adjusting flows of resources between modules and into/out of stores. We developed adaptive algorithms that control the flow of resources in BioSim. Our learning algorithms discovered several ingenious strategies for maximizing mission length by controlling the air and water recycling systems as well as crop planting schedules. By exploiting non-linearities in the overall system dynamics, the learned controllers easily outperformed controllers written by human experts. In sum, we accomplished three goals. We (1) developed foundations for learning models of coupled dynamical systems by active exploration of the state space, (2) developed and tested algorithms that learn to efficiently control air and water recycling processes as well as crop scheduling in BioSim, and (3) developed an understanding of the role of machine learning in designing control systems for…

  16. Machine Shop I. Learning Activity Packets (LAPs). Section D--Power Saws and Drilling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This document contains two learning activity packets (LAPs) for the "power saws and drilling machines" instructional area of a Machine Shop I course. The two LAPs cover the following topics: power saws and drill press. Each LAP contains a cover sheet that describes its purpose, an introduction, and the tasks included in the LAP; learning steps…

  17. Learning Activity Packets for Milling Machines. Unit I--Introduction to Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to identify parts and attachments of vertical and horizontal milling machines, identify work-holding devices, state safety rules, and…

  18. Teacher Leaders: Advancing Mathematics Learning

    ERIC Educational Resources Information Center

    Kinzer, Cathy J.; Rincón, Mari; Ward, Jana; Rincón, Ricardo; Gomez, Lesli

    2014-01-01

    Four elementary school instructors offer insights into their classrooms, their unique professional roles, and their leadership approaches as they reflect on their journey to advance teacher and student mathematics learning. They note a "teacher leader" serves as an example to other educators and strives to impact student learning;…

  19. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  20. Machine learning and genome annotation: a match meant to be?

    PubMed Central

    2013-01-01

    By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE. PMID:23731483

  1. Large-Scale Machine Learning for Classification and Search

    ERIC Educational Resources Information Center

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  2. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises

    ERIC Educational Resources Information Center

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2015-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead…

  3. Machine Learning for Dynamical Mean Field Theory

    NASA Astrophysics Data System (ADS)

    Arsenault, Louis-Francois; Lopez-Bezanilla, Alejandro; von Lilienfeld, O. Anatole; Littlewood, P. B.; Millis, Andy

    2014-03-01

    Machine learning (ML), an approach that infers new results from accumulated knowledge, is in use for a variety of tasks ranging from face and voice recognition to internet searching, and has recently been gaining importance in chemistry and physics. In this talk, we investigate the possibility of using ML to solve the equations of dynamical mean field theory, which otherwise requires the (numerically very expensive) solution of a quantum impurity model. Our ML scheme learns the relation between two functions: the hybridization function describing the bare (local) electronic structure of a material and the self-energy describing the many-body physics. We discuss the parameterization of the two functions for the exact diagonalization solver and present examples, beginning with the Anderson impurity model with a fixed bath density of states, demonstrating the advantages and the pitfalls of the method. DOE contract DE-AC02-06CH11357.

  4. On machine learning classification of otoneurological data.

    PubMed

    Juhola, Martti

    2008-01-01

    A dataset including cases of six otoneurological diseases was analysed using machine learning methods to investigate the classification problem of these diseases and to compare the effectiveness of different methods on this data. Linear discriminant analysis was the best method, followed by multilayer perceptron neural networks, provided that the data were input into the network in the form of principal components. Nearest neighbour searching, k-means clustering, and Kohonen neural networks achieved almost as good results as the former, but decision trees fared slightly worse. Thus, these methods performed well, but the naïve Bayes rule could not be used, since some data matrices were singular. Otoneurological cases of the six given diseases can be reliably distinguished. PMID:18487733

  5. Tracking medical genetic literature through machine learning.

    PubMed

    Bornstein, Aaron T; McLoughlin, Matthew H; Aguilar, Jesus; Wong, Wendy S W; Solomon, Benjamin D

    2016-08-01

    There has been remarkable progress in identifying the causes of genetic conditions as well as understanding how changes in specific genes cause disease. Though difficult (and often superficial) to parse, an interesting tension involves emphasis on basic research aimed to dissect normal and abnormal biology versus more clearly clinical and therapeutic investigations. To examine one facet of this question and to better understand progress in Mendelian-related research, we developed an algorithm that classifies medical literature into three categories (Basic, Clinical, and Management) and conducted a retrospective analysis. We built a supervised machine learning classification model using the Azure Machine Learning (ML) Platform and analyzed the literature (1970-2014) from NCBI's Entrez Gene2Pubmed Database (http://www.ncbi.nlm.nih.gov/gene) using genes from the NHGRI's Clinical Genomics Database (http://research.nhgri.nih.gov/CGD/). We applied our model to 376,738 articles: 288,639 (76.6%) were classified as Basic, 54,178 (14.4%) as Clinical, and 24,569 (6.5%) as Management. The average classification accuracy was 92.2%. The rate of Clinical publication was significantly higher than Basic or Management. The rate of publication of article types differed significantly when divided into key eras: Human Genome Project (HGP) planning phase (1984-1990); HGP launch (1990) to publication (2001); following HGP completion to the "Next Generation" advent (2009); the era following 2009. In conclusion, in addition to the findings regarding the pace and focus of genetic progress, our algorithm produced a database that can be used in a variety of contexts including automating the identification of management-related literature. PMID:27268407
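A minimal sketch of the kind of three-way literature classification described above, assuming a multinomial naive Bayes bag-of-words model and a made-up six-abstract corpus; the study itself used a supervised model built on the Azure ML Platform, so this stands in only for the general idea.

```python
from collections import Counter, defaultdict
import math

# Hypothetical mini-corpus; the study classified Gene2Pubmed articles
# into Basic, Clinical, and Management categories.
train = [
    ("knockout mouse model reveals gene function in pathway", "Basic"),
    ("protein structure and molecular mechanism of variant", "Basic"),
    ("patients presented with phenotype and clinical features", "Clinical"),
    ("case report of syndrome diagnosis in patients", "Clinical"),
    ("treatment and management guidelines for therapy", "Management"),
    ("drug therapy improves outcomes management of disease", "Management"),
]

# Multinomial naive Bayes with Laplace smoothing.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())
vocab = {w for c in word_counts.values() for w in c}

def classify(text):
    """Return the class with the highest log-posterior for the text."""
    best, best_lp = None, -math.inf
    for label in class_counts:
        lp = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for w in text.split():
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(classify("clinical features of patients with syndrome"))
```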

  6. Quantum learning and universal quantum matching machine

    NASA Astrophysics Data System (ADS)

    Sasaki, Masahide; Carlini, Alberto

    2002-08-01

    Suppose that three kinds of quantum systems are given in some unknown states |f>⊗N, |g1>⊗K, and |g2>⊗K, and we want to decide which template state |g1> or |g2>, each representing the feature of the pattern class C1 or C2, respectively, is closest to the input feature state |f>. This is an extension of the pattern matching problem into the quantum domain. Assuming that these states are known a priori to belong to a certain parametric family of pure qubit systems, we derive two kinds of matching strategies. The first one is a semiclassical strategy that is obtained by the natural extension of conventional matching strategies and consists of a two-stage procedure: identification (estimation) of the unknown template states to design the classifier (learning process to train the classifier) and classification of the input system into the appropriate pattern class based on the estimated results. The other is a fully quantum strategy without any intermediate measurement, which we might call the universal quantum matching machine. We present the Bayes optimal solutions for both strategies in the case of K=1, showing that there certainly exists a fully quantum matching procedure that is strictly superior to the straightforward semiclassical extension of the conventional matching strategy based on the learning process.

  7. Optimizing transition states via kernel-based machine learning.

    PubMed

    Pozun, Zachary D; Hansen, Katja; Sheppard, Daniel; Rupp, Matthias; Müller, Klaus-Robert; Henkelman, Graeme

    2012-05-01

    We present a method for optimizing transition state theory dividing surfaces with support vector machines. The resulting dividing surfaces require no a priori information or intuition about reaction mechanisms. To generate optimal dividing surfaces, we apply a cycle of machine-learning and refinement of the surface by molecular dynamics sampling. We demonstrate that the machine-learned surfaces contain the relevant low-energy saddle points. The mechanisms of reactions may be extracted from the machine-learned surfaces in order to identify unexpected chemically relevant processes. Furthermore, we show that the machine-learned surfaces significantly increase the transmission coefficient for an adatom exchange involving many coupled degrees of freedom on a (100) surface when compared to a distance-based dividing surface. PMID:22583204

  8. DREAM: diabetic retinopathy analysis using machine learning.

    PubMed

    Roychowdhury, Sohini; Koozekanani, Dara D; Parhi, Keshab K

    2014-09-01

    This paper presents a computer-aided screening system (DREAM) that analyzes fundus images with varying illumination and fields of view, and generates a severity grade for diabetic retinopathy (DR) using machine learning. Classifiers such as the Gaussian Mixture model (GMM), k-nearest neighbor (kNN), support vector machine (SVM), and AdaBoost are analyzed for classifying retinopathy lesions from nonlesions. GMM and kNN classifiers are found to be the best classifiers for bright and red lesion classification, respectively. A main contribution of this paper is the reduction in the number of features used for lesion classification by feature ranking using Adaboost where 30 top features are selected out of 78. A novel two-step hierarchical classification approach is proposed where the nonlesions or false positives are rejected in the first step. In the second step, the bright lesions are classified as hard exudates and cotton wool spots, and the red lesions are classified as hemorrhages and micro-aneurysms. This lesion classification problem deals with unbalanced datasets and SVM or combination classifiers derived from SVM using the Dempster-Shafer theory are found to incur more classification error than the GMM and kNN classifiers due to the data imbalance. The DR severity grading system is tested on 1200 images from the publicly available MESSIDOR dataset. The DREAM system achieves 100% sensitivity, 53.16% specificity, and 0.904 AUC, compared to the best reported 96% sensitivity, 51% specificity, and 0.875 AUC, for classifying images as with or without DR. The feature reduction further reduces the average computation time for DR severity per image from 59.54 to 3.46 s. PMID:25192577
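The AdaBoost-based feature ranking mentioned above can be sketched with a toy boosted-stump implementation that credits each feature with the total weight (alpha) of the stumps that selected it. The data and round count here are invented (the real system ranked 78 lesion features and kept the top 30):

```python
import numpy as np

def stump_predict(X, feat, thresh, polarity):
    """Decision stump: +1/-1 depending on a single thresholded feature."""
    return np.where(polarity * X[:, feat] > polarity * thresh, 1.0, -1.0)

def adaboost_rank(X, y, n_rounds=20):
    """Toy AdaBoost with stumps; rank features by accumulated alpha."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    importance = np.zeros(d)
    for _ in range(n_rounds):
        best = (0, 0.0, 1, np.inf)   # (feat, thresh, polarity, weighted err)
        for feat in range(d):
            for thresh in np.unique(X[:, feat]):
                for pol in (1, -1):
                    err = w[stump_predict(X, feat, thresh, pol) != y].sum()
                    if err < best[3]:
                        best = (feat, thresh, pol, err)
        feat, thresh, pol, err = best
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, feat, thresh, pol)
        w *= np.exp(-alpha * y * pred)   # upweight misclassified samples
        w /= w.sum()
        importance[feat] += alpha
    return importance

# Synthetic data: only feature 0 is informative, features 1-3 are noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1.0, -1.0)
imp = adaboost_rank(X, y)
print(np.argmax(imp))
```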

  9. ATCA for Machines-- Advanced Telecommunications Computing Architecture

    SciTech Connect

    Larsen, R.S.; /SLAC

    2008-04-22

    The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R&D including application of HA principles to power electronics systems.

  10. Studying depression using imaging and machine learning methods

    PubMed Central

    Patel, Meenal J.; Khalaf, Alexander; Aizenstein, Howard J.

    2015-01-01

    Depression is a complex clinical entity that can pose challenges for clinicians regarding both accurate diagnosis and effective timely treatment. These challenges have prompted the development of multiple machine learning methods to help improve the management of this disease. These methods utilize anatomical and physiological data acquired from neuroimaging to create models that can identify depressed patients vs. non-depressed patients and predict treatment outcomes. This article (1) presents a background on depression, imaging, and machine learning methodologies; (2) reviews methodologies of past studies that have used imaging and machine learning to study depression; and (3) suggests directions for future depression-related studies. PMID:26759786

  11. Precision Machining and Technology; Machine Shop Work--Advanced: 9557.04.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL.

    The course outline has been prepared as a guide to assist the instructor in systematically planning and presenting a variety of meaningful lessons to facilitate the necessary training for the machine shop student. The material is designed to enable the student to learn the manipulative skills and related knowledge necessary to understand the jig…

  12. Abrasives and Grinding Machines; Machine Shop Work--Advanced: 9557.02.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL.

    The course outline has been prepared as a guide to assist the instructor in systematically planning and presenting a variety of meaningful lessons to facilitate the necessary training for the machine shop student. The material contained in the outline is designed to enable the student to learn the manipulative skills and related knowledge…

  13. Machine Tool Advanced Skills Technology Program (MAST). Overview and Methodology.

    ERIC Educational Resources Information Center

    Texas State Technical Coll., Waco.

    The Machine Tool Advanced Skills Technology Program (MAST) is a geographical partnership of six of the nation's best two-year colleges located in the six states that have about one-third of the density of metals-related industries in the United States. The purpose of the MAST grant is to develop and implement a national training model to overcome…

  14. Sensors, controls, and man-machine interface for advanced teleoperation

    NASA Technical Reports Server (NTRS)

    Bejczy, A. K.

    1980-01-01

    Some advances in teleoperator technology (i.e., mechanical tasks performed by mechanical devices at a remote site under remote control) are reviewed, achieved through the introduction of sensors, computers, automation, and new man-machine interface devices and techniques for remote manipulator control. The state of the art is summarized and some basic problems and challenging developments are examined.

  15. Machine Learning: Quality Control of HST Grism Spectra

    NASA Astrophysics Data System (ADS)

    Stoehr, F.; Walsh, J.; Kuntschner, H.; Rosati, P.; Fosbury, R.; Kümmel, M.; Haase, J.; Hook, R.; Lombardi, M.; Nilsson, K.; Rosa, M.

    2011-07-01

    The Pipeline for Hubble Legacy Archive Grism data (PHLAG) has been used to extract more than 70,000 wavelength- and flux-calibrated 1D spectra. They were obtained from 153 fields observed in G800L grism spectroscopy mode with the Advanced Camera for Surveys on the Hubble Space Telescope. This number of spectra is far too large to allow detailed visual inspection for quality control on reasonable time-scales. As a solution, we use machine learning techniques to classify spectra into "good" and "bad" based on a careful visual inspection of only about 3% of the full sample. A final visual skim through the set of "good" spectra was made to remove catastrophic failures. The remaining 47,919 spectra form the largest set of slitless high-level spectroscopic data products publicly released to date.

  16. Machine learning optimization of cross docking accuracy.

    PubMed

    Bjerrum, Esben J

    2016-06-01

    Performance of small molecule automated docking programs has conceptually been divided into docking, scoring, ranking, and screening power, which focus on the crystal pose prediction, affinity prediction, ligand ranking, and database screening capabilities of the docking program, respectively. Benchmarks show that different docking programs can excel in individual benchmarks, which suggests that the scoring function employed by a program can be optimized for a particular task. Here the scoring function of Smina is re-optimized to enhance docking power using a supervised machine learning approach and a manually curated database of ligands and cross-docking receptor pairs. The optimization method does not need associated binding data for the receptor-ligand examples in the data set and works with small training sets. The re-optimization of the weights of the scoring function results in similar docking performance with regard to docking power on a cross-docking test set. A ligand-decoy-based benchmark indicates better discrimination between poses with high and low RMSD. The reported parameters for Smina are compatible with AutoDock Vina and represent ready-to-use alternative parameters for researchers who aim at pose prediction rather than affinity prediction. PMID:27179709
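The core idea, re-weighting a linear scoring function from good-pose/decoy pairs alone, without binding data, can be sketched as a pairwise margin problem. The per-pose interaction terms, margin, and learning rate below are illustrative assumptions, not Smina's actual terms or training procedure:

```python
import numpy as np

# Hypothetical per-pose interaction terms (stand-ins for scoring-function
# terms such as steric, hydrophobic, and hydrogen-bond contributions).
rng = np.random.default_rng(3)
n_pairs, d = 100, 4
good = rng.normal(-1.0, 0.5, (n_pairs, d))   # terms for low-RMSD poses
bad  = rng.normal( 0.0, 0.5, (n_pairs, d))   # terms for decoy poses

# Learn weights so that score(good) < score(bad) by a margin, using
# subgradient descent on a pairwise hinge loss. Note no affinities are
# needed -- only matched good/decoy pose pairs.
w = np.zeros(d)
lr, margin = 0.05, 1.0
for _ in range(200):
    viol = (good @ w) - (bad @ w) + margin > 0   # pairs violating the margin
    if not viol.any():
        break
    w -= lr * (good[viol] - bad[viol]).mean(axis=0)

frac_correct = np.mean(good @ w < bad @ w)   # good pose scored better
print(frac_correct)
```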

  17. Many-body physics via machine learning

    NASA Astrophysics Data System (ADS)

    Arsenault, Louis-Francois; von Lilienfeld, O. Anatole; Millis, Andrew J.

    We demonstrate a method for using machine learning (ML) to solve the equations of many-body physics, which are functional equations linking a bare to an interacting Green's function (or self-energy), offering transferable predictive power for physical quantities in both the forward and the reverse engineering of materials. Functions are represented by coefficients in an orthogonal polynomial expansion, and kernel ridge regression is used. The method is demonstrated using as an example a database built from dynamical mean field theory (DMFT) calculations on the three-dimensional Hubbard model. We discuss the extension to a database for real materials. We also discuss a new area of investigation concerning high-throughput predictions for real materials, offering a perspective on how our scheme is general enough to apply to other problems involving the inversion of integral equations from integrated knowledge, such as the analytical continuation of the Green's function and the reconstruction of lattice structures from X-ray spectra. Office of Science of the U.S. Department of Energy under SubContract DOE No. 3F-3138 and FG-ER04169.
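The representation-and-regression scheme described above (functions encoded as orthogonal polynomial coefficients, mapped by kernel ridge regression) can be sketched on a made-up smooth functional standing in for the bare-to-interacting map; the kernel, regularization, and "physics" here are all invented for illustration:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(4)
grid = np.linspace(-1, 1, 64)

def target_map(f_vals):
    """Fake nonlinear functional standing in for the many-body map."""
    return np.tanh(f_vals) + 0.1 * f_vals ** 2

def cheb_coeffs(vals, deg=6):
    """Represent a function on the grid by its Chebyshev coefficients."""
    return C.chebfit(grid, vals, deg)

# Build a database of (input coefficients, output coefficients) pairs.
Xc, Yc = [], []
for _ in range(80):
    f = C.chebval(grid, rng.normal(0, 0.5, 4))   # random smooth input
    Xc.append(cheb_coeffs(f))
    Yc.append(cheb_coeffs(target_map(f)))
Xc, Yc = np.array(Xc), np.array(Yc)

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel ridge regression: alpha = (K + lam I)^-1 Y, one column per
# output coefficient.
lam = 1e-4
alpha = np.linalg.solve(rbf(Xc, Xc) + lam * np.eye(len(Xc)), Yc)

# Predict the output function for a previously unseen input.
f_new = C.chebval(grid, np.array([0.2, -0.3, 0.1, 0.0]))
pred_c = rbf(cheb_coeffs(f_new)[None], Xc) @ alpha
err = np.max(np.abs(C.chebval(grid, pred_c[0]) - target_map(f_new)))
print(err)
```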

  18. Predicting increased blood pressure using machine learning.

    PubMed

    Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Gomes, Cristiano Mauro Assis; Soares, Telma de Jesus; Dos Reis, Luciana Araujo; Santos, Joselito

    2014-01-01

    The present study investigates the prediction of increased blood pressure from body mass index (BMI), waist circumference (WC), hip circumference (HC), and waist-hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) from 16 to 63 years old. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The results show that for women BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42) and misclassification rate (.19) and the highest pseudo-R² (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men, BMI, WC, HC, and WHR showed the best prediction, with the lowest deviance (57.25) and misclassification rate (.16) and the highest pseudo-R² (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power. PMID:24669313
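The sensitivity and specificity figures quoted above follow directly from a confusion matrix. The sketch below computes them for a hypothetical one-split tree (a BMI threshold stump) on synthetic data; the threshold, noise level, and labels are all illustrative, not the study's data:

```python
import numpy as np

def sens_spec(y_true, y_pred):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Toy stand-in for a one-split classification tree: flag increased blood
# pressure when BMI exceeds a threshold (all numbers invented).
rng = np.random.default_rng(5)
bmi = rng.normal(25, 4, 300)
y_true = (bmi + rng.normal(0, 3, 300) > 27).astype(int)   # synthetic labels
y_pred = (bmi > 27).astype(int)                            # stump prediction

sens, spec = sens_spec(y_true, y_pred)
print(round(sens, 2), round(spec, 2))
```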

  19. Learning Machine, Vietnamese Based Human-Computer Interface.

    ERIC Educational Resources Information Center

    Northwest Regional Educational Lab., Portland, OR.

    The sixth session of IT@EDU98 consisted of seven papers on the topic of the learning machine--Vietnamese based human-computer interface, and was chaired by Phan Viet Hoang (Informatics College, Singapore). "Knowledge Based Approach for English Vietnamese Machine Translation" (Hoang Kiem, Dinh Dien) presents the knowledge base approach, which…

  20. Learn about Physical Science: Simple Machines. [CD-ROM].

    ERIC Educational Resources Information Center

    2000

    This CD-ROM, designed for students in grades K-2, explores the world of simple machines. It allows students to delve into the mechanical world and learn the ways in which simple machines make work easier. Animated demonstrations are provided of the lever, pulley, wheel, screw, wedge, and inclined plane. Activities include practical matching and…

  1. Machine learning challenges in Mars rover traverse science

    NASA Technical Reports Server (NTRS)

    Castano, R.; Judd, M.; Anderson, R. C.; Estlin, T.

    2003-01-01

    The successful implementation of machine learning in autonomous rover traverse science requires addressing challenges that range from the analytical technical realm to the fuzzy, philosophical domain of entrenched belief systems among scientists and mission managers.

  2. Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and promises

    PubMed Central

    Bone, Daniel; Goodwin, Matthew S.; Black, Matthew P.; Lee, Chi-Chun; Audhkhasi, Kartik; Narayanan, Shrikanth

    2014-01-01

    Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead to misinformed conclusions. To illustrate this concern, the current paper critically evaluates and attempts to reproduce results from two studies (Wall et al., 2012a; Wall et al., 2012b) that claim to drastically reduce time to diagnose autism using machine learning. Our failure to generate comparable findings to those reported by Wall and colleagues using larger and more balanced data underscores several conceptual and methodological problems associated with these studies. We conclude with proposed best-practices when using machine learning in autism research, and highlight some especially promising areas for collaborative work at the intersection of computational and behavioral science. PMID:25294649

  3. Shedding Light on Synergistic Chemical Genetic Connections with Machine Learning.

    PubMed

    Ekins, Sean; Siqueira-Neto, Jair Lage

    2015-12-23

    Machine learning can be used to predict compounds acting synergistically, and this could greatly expand the universe of available potential treatments for diseases that are currently hidden in the dark chemical matter. PMID:27136350

  4. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics

    PubMed Central

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-01-01

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well. PMID:26876223
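The fingerprint-plus-genetic-algorithm loop described above can be sketched with a toy surrogate standing in for the machine-learned property predictor; the fingerprint length, fitness function, and GA settings are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical surrogate: maps an 8-block polymer "fingerprint" (block
# fractions in [0, 1]) to a property score. In the real scheme this role
# is played by the model learned from first-principles data.
w_secret = rng.normal(size=8)
def predicted_property(fp):
    return fp @ w_secret - 0.5 * (fp ** 2).sum()   # toy objective only

def evolve(pop_size=40, n_gen=60, mut=0.1):
    """Minimal genetic algorithm over fingerprint vectors in [0, 1]^8."""
    pop = rng.uniform(0, 1, (pop_size, 8))
    for _ in range(n_gen):
        fit = np.array([predicted_property(p) for p in pop])
        parents = pop[np.argsort(fit)[-pop_size // 2:]]   # keep best half
        # Crossover: average random parent pairs, then mutate and clip.
        pairs = rng.integers(0, len(parents), (pop_size, 2))
        pop = (parents[pairs[:, 0]] + parents[pairs[:, 1]]) / 2
        pop += rng.normal(0, mut, pop.shape)
        pop = np.clip(pop, 0, 1)
    fit = np.array([predicted_property(p) for p in pop])
    return pop[np.argmax(fit)], fit.max()

best_fp, best_val = evolve()
print(best_val)
```

Because the surrogate is cheap to evaluate, the GA can screen thousands of candidate fingerprints per second, which is the source of the acceleration the abstract describes.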

  5. Machine learning strategy for accelerated design of polymer dielectrics

    DOE PAGESBeta

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-02-15

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well.

  6. Machine Learning Strategy for Accelerated Design of Polymer Dielectrics

    NASA Astrophysics Data System (ADS)

    Mannodi-Kanakkithodi, Arun; Pilania, Ghanshyam; Huan, Tran Doan; Lookman, Turab; Ramprasad, Rampi

    2016-02-01

    The ability to efficiently design new and advanced dielectric polymers is hampered by the lack of sufficient, reliable data on wide polymer chemical spaces, and the difficulty of generating such data given time and computational/experimental constraints. Here, we address the issue of accelerating polymer dielectrics design by extracting learning models from data generated by accurate state-of-the-art first principles computations for polymers occupying an important part of the chemical subspace. The polymers are ‘fingerprinted’ as simple, easily attainable numerical representations, which are mapped to the properties of interest using a machine learning algorithm to develop an on-demand property prediction model. Further, a genetic algorithm is utilised to optimise polymer constituent blocks in an evolutionary manner, thus directly leading to the design of polymers with given target properties. While this philosophy of learning to make instant predictions and design is demonstrated here for the example of polymer dielectrics, it is equally applicable to other classes of materials as well.
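    The fingerprint-then-learn-then-evolve pipeline described in this abstract can be illustrated with a toy sketch. Everything below is hypothetical: the block names, the per-block property contributions (standing in for a model trained on first-principles data), and the GA settings are invented for illustration, not taken from the paper.

```python
import random

# Hypothetical building blocks and their assumed per-block contributions
# to a dielectric-constant-like property (illustrative values only).
BLOCKS = ["CH2", "NH", "CO", "O"]
CONTRIB = {"CH2": 0.5, "NH": 1.1, "CO": 1.8, "O": 0.9}

def fingerprint(polymer):
    """Simple count-based fingerprint: fraction of each block type."""
    n = len(polymer)
    return [polymer.count(b) / n for b in BLOCKS]

def surrogate_property(polymer):
    """Stand-in for the learned property model: linear in the fingerprint.
    In the paper this role is played by a model trained on DFT data."""
    fp = fingerprint(polymer)
    return sum(f * CONTRIB[b] for f, b in zip(fp, BLOCKS)) * len(polymer)

def evolve(target, length=8, pop_size=40, generations=60, seed=0):
    """Genetic algorithm: evolve block sequences toward a target property."""
    rng = random.Random(seed)
    pop = [[rng.choice(BLOCKS) for _ in range(length)] for _ in range(pop_size)]

    def fitness(p):
        return -abs(surrogate_property(p) - target)  # higher is better

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]         # elitist selection
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)       # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:               # point mutation
                child[rng.randrange(length)] = rng.choice(BLOCKS)
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return best, surrogate_property(best)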

  7. Machine learning on Parkinson's disease? Let's translate into clinical practice.

    PubMed

    Cerasa, Antonio

    2016-06-15

    Machine learning techniques represent the third generation of clinical neuroimaging studies, in which the principal interest is not to describe the anatomical changes of a neurological disorder, but to evaluate whether a multivariate approach can use these abnormalities to predict the correct classification of a previously unseen clinical cohort. In the next few years, machine learning will revolutionize the clinical practice of Parkinson's disease, but enthusiasm should be tempered until some important barriers are removed. PMID:26743974

  8. Protocol for secure quantum machine learning at a distant place

    NASA Astrophysics Data System (ADS)

    Bang, Jeongho; Lee, Seung-Woo; Jeong, Hyunseok

    2015-10-01

    The application of machine learning to quantum information processing has recently attracted keen interest, particularly for the optimization of control parameters in quantum tasks without any pre-programmed knowledge. By adapting machine learning techniques, we present a novel protocol in which an arbitrarily initialized device at a learner's location is taught by a provider located at a distant place. The protocol is designed such that any external learner who attempts to participate in or disrupt the learning process can be blocked or detected. We numerically demonstrate that our protocol works faithfully for single-qubit operation devices. A trade-off between inaccuracy and learning time is also analyzed.

  9. Development of E-Learning Materials for Machining Safety Education

    NASA Astrophysics Data System (ADS)

    Nakazawa, Tsuyoshi; Mita, Sumiyoshi; Matsubara, Masaaki; Takashima, Takeo; Tanaka, Koichi; Izawa, Satoru; Kawamura, Takashi

    We developed two types of e-learning materials for Manufacturing Practice safety education: movie learning materials and hazard-detection learning materials. With the movie learning materials, which combine video and sound, students can learn how to operate machines safely, raising the effectiveness of preparation and review for Manufacturing Practice. With the hazard-detection learning materials, students can apply knowledge learned in lectures to the detection of hazards and practice methods for detecting hazards during machine operation. In particular, the hazard-detection learning materials raise students' safety consciousness and increase their comprehension both of the lecture material and of the operations performed during Manufacturing Practice.

  10. Machine Learning Assessments of Soil Drying

    NASA Astrophysics Data System (ADS)

    Coopersmith, E. J.; Minsker, B. S.; Wenzel, C.; Gilmore, B. J.

    2011-12-01

    Agricultural activities require the use of heavy equipment and vehicles on unpaved farmlands. When soil conditions are wet, equipment can cause substantial damage, leaving deep ruts. In extreme cases, implements can sink and become mired, causing considerable delays and expense to extricate the equipment. Farm managers, who are often located remotely, cannot assess sites before allocating equipment, making it difficult to assess the conditions of countless sites with any reliability or frequency. For example, farmers often trace serpentine paths of over one hundred miles each day to assess the overall status of various tracts of land spanning thirty, forty, or fifty miles in each direction. One means of assessing the moisture content of a field lies in the strategic positioning of remotely-monitored in situ sensors. Unfortunately, land owners are often reluctant to place sensors across their properties due to the significant monetary cost and complexity. This work aspires to overcome these limitations by modeling the process of wetting and drying statistically, remotely assessing field readiness using only information that is publicly accessible. Such data include Nexrad radar and state climate network sensors, as well as Twitter-based reports of field conditions for validation. Three algorithms, classification trees, k-nearest-neighbors, and boosted perceptrons, are deployed to deliver statistical field-readiness assessments of an agricultural site located in Urbana, IL. Two of the three algorithms performed with 92-94% accuracy, with the majority of misclassifications falling within the calculated margins of error. This demonstrates the feasibility of using a machine learning framework with only public data, knowledge of system memory from previous conditions, and statistical tools to assess "readiness" without the need for real-time, on-site physical observation. Future efforts will produce a workflow assimilating Nexrad, climate network
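    Of the three algorithms named in the abstract, k-nearest-neighbors is the simplest to sketch. The features and training values below are invented for illustration; the study's actual inputs (Nexrad radar, climate-network sensors) are far richer.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """k-nearest-neighbours majority vote. `train` is a list of
    (feature_vector, label) pairs; the features here are illustrative:
    (days since last rain, rainfall in the last 72 h in mm)."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical values, not from the study's data).
train = [
    ((0.5, 40.0), "wet"), ((1.0, 25.0), "wet"), ((2.0, 30.0), "wet"),
    ((5.0, 2.0), "ready"), ((7.0, 0.0), "ready"), ((4.0, 5.0), "ready"),
]
```

    For example, a field six days past a light rain is classified "ready", while one a day after heavy rain comes back "wet".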

  11. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), aiming to prevent delay and misdiagnosis of patients, using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods applied to PD diagnosis include sparse multinomial logistic regression, rotation forest ensembles with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by a Tabu search algorithm as classifier and Haar wavelets as projection filter, is used for relevant feature selection and ranking. The highest accuracy, obtained by linear logistic regression and sparse multinomial logistic regression, is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All experiments are conducted at the 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in the software reliability and quality of the computer-aided diagnosis system and experimentally shows the best results with supportive statistical inference.

  12. Man-machine interface builders at the Advanced Photon Source

    SciTech Connect

    Anderson, M.D.

    1991-12-31

    Argonne National Laboratory is constructing a 7-GeV Advanced Photon Source for use as a synchrotron radiation source in basic and applied research. The controls and computing environment for this accelerator complex includes graphical operator interfaces to the machine based on Motif, X11, and PHIGS/PEX. Construction and operation of the control system for this accelerator rely upon interactive interface-builder and diagram/editor tools, as well as a run-time environment for the constructed displays, which communicate with the physical machine via network connections. This paper discusses our experience with several commercial GUI builders, the inadequacies found in these, the motivation for the development of an application-specific builder, and the design and implementation strategies employed in the development of our own Man-Machine Interface builder. 5 refs.

  14. Learning Activity Packets for Milling Machines. Unit III--Vertical Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to set up and operate a vertical mill. Tasks addressed in the LAP include mounting and removing cutters and cutter holders for vertical…

  15. Learning Activity Packets for Milling Machines. Unit II--Horizontal Milling Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) outlines the study activities and performance tasks covered in a related curriculum guide on milling machines. The course of study in this LAP is intended to help students learn to set up and operate a horizontal mill. Tasks addressed in the LAP include mounting style "A" or "B" arbors and adjusting arbor…

  16. In silico machine learning methods in drug development.

    PubMed

    Dobchev, Dimitar A; Pillai, Girinath G; Karelson, Mati

    2014-01-01

    Machine learning (ML) computational methods for predicting compounds with pharmacological activity, specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties are being increasingly applied in drug discovery and evaluation. Recently, machine learning techniques such as artificial neural networks, support vector machines and genetic programming have been explored for predicting inhibitors, antagonists, blockers, agonists, activators and substrates of proteins related to specific therapeutic targets. These methods are particularly useful for screening compound libraries of diverse chemical structures and "noisy" high-dimensional data to complement QSAR methods, and in cases where the receptor 3D structure is unavailable, to complement structure-based methods. A variety of studies have demonstrated the potential of machine-learning methods for predicting compounds as potential drug candidates. The present review is intended to give an overview of the strategies and current progress in using machine learning methods for drug design and the potential of the respective model development tools. We also review a number of applications of machine learning algorithms to common classes of diseases. PMID:25262800

  17. Stellar classification from single-band imaging using machine learning

    NASA Astrophysics Data System (ADS)

    Kuntzer, T.; Tewes, M.; Courbin, F.

    2016-06-01

    Information on the spectral types of stars is of great interest in view of the exploitation of space-based imaging surveys. In this article, we investigate the classification of stars into spectral types using only the shape of their diffraction pattern in a single broad-band image. We propose a supervised machine learning approach to this endeavour, based on principal component analysis (PCA) for dimensionality reduction, followed by artificial neural networks (ANNs) estimating the spectral type. Our analysis is performed with image simulations mimicking the Hubble Space Telescope (HST) Advanced Camera for Surveys (ACS) in the F606W and F814W bands, as well as the Euclid VIS imager. We first demonstrate this classification in a simple context, assuming perfect knowledge of the point spread function (PSF) model and the possibility of accurately generating mock training data for the machine learning. We then analyse its performance in a fully data-driven situation, in which the training would be performed with a limited subset of bright stars from a survey, and an unknown PSF with spatial variations across the detector. We use simulations of main-sequence stars with flat distributions in spectral type and in signal-to-noise ratio, and classify these stars into 13 spectral subclasses, from O5 to M5. Under these conditions, the algorithm achieves a high success rate both for Euclid and HST images, with typical errors of half a spectral class. Although more detailed simulations would be needed to assess the performance of the algorithm on a specific survey, this shows that stellar classification from single-band images is indeed feasible.
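    The dimensionality-reduction step described above can be sketched in miniature. The snippet below computes the first principal component of 2-D data via the closed-form eigenvector of the 2x2 covariance matrix; it is a minimal stand-in for the full PCA applied to star images, with made-up data, not the paper's simulations.

```python
import math

def leading_pc(points):
    """First principal component of 2-D data: the leading eigenvector
    of the 2x2 covariance matrix, computed in closed form."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = 0.5 * (sxx + syy + math.hypot(sxx - syy, 2 * sxy))
    v = (lam - syy, sxy)              # unnormalised eigenvector
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)

def project(points, axis):
    """Reduce each 2-D point to its 1-D coordinate along `axis`."""
    return [p[0] * axis[0] + p[1] * axis[1] for p in points]
```

    In the paper the projected coordinates, rather than raw pixels, are what feed the neural network classifier; here they are just dot products onto the dominant axis.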

  18. Learning Processes in Man, Machine and Society

    ERIC Educational Resources Information Center

    Malita, Mircea

    1977-01-01

    Deciphering the learning mechanism which exists in man remains to be solved. This article examines the learning process with respect to association and cybernetics. It is recommended that research should focus on the transdisciplinary processes of learning which could become the next key concept in the science of man. (Author/MA)

  19. Building Artificial Vision Systems with Machine Learning

    SciTech Connect

    LeCun, Yann

    2011-02-23

    Three questions pose the next challenge for Artificial Intelligence (AI), robotics, and neuroscience. How do we learn perception (e.g. vision)? How do we learn representations of the perceptual world? How do we learn visual categories from just a few examples?

  20. Data Triage of Astronomical Transients: A Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Rebbapragada, U.

    This talk presents real-time machine learning systems for triage of big data streams generated by photometric and image-differencing pipelines. Our first system is a transient event detection system in development for the Palomar Transient Factory (PTF), a fully-automated synoptic sky survey that has demonstrated real-time discovery of optical transient events. The system is tasked with discriminating between real astronomical objects and bogus objects, which are usually artifacts of the image differencing pipeline. We performed a machine learning forensics investigation on PTF’s initial system that led to training data improvements that decreased both false positive and negative rates. The second machine learning system is a real-time classification engine of transients and variables in development for the Australian Square Kilometre Array Pathfinder (ASKAP), an upcoming wide-field radio survey with unprecedented ability to investigate the radio transient sky. The goal of our system is to classify light curves into known classes with as few observations as possible in order to trigger follow-up on costlier assets. We discuss the violation of standard machine learning assumptions incurred by this task, and propose the use of ensemble and hierarchical machine learning classifiers that make predictions most robustly.
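    The ensemble idea mentioned at the end of this abstract (combining several classifiers for a more robust real/bogus decision) can be sketched with a simple majority vote. The threshold rules and feature names below are invented for illustration and are not the PTF pipeline's actual features.

```python
def majority_vote(classifiers, x):
    """Ensemble prediction by simple majority vote. Each classifier is
    any callable returning 'real' or 'bogus' for a candidate detection."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

# Three illustrative threshold rules on hypothetical detection features
# (signal-to-noise ratio, ellipticity, number of bad pixels).
clf_snr   = lambda x: "real" if x["snr"] > 5.0 else "bogus"
clf_shape = lambda x: "real" if x["ellipticity"] < 0.3 else "bogus"
clf_pix   = lambda x: "real" if x["bad_pixels"] == 0 else "bogus"
```

    A candidate that passes two of the three rules is kept; a real pipeline would use trained models (e.g. random forests) in place of hand-set thresholds.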

  1. Application of Learning Machines and Combinatorial Algorithms in Water Resources Management and Hydrologic Sciences

    SciTech Connect

    Khalil, Abedalrazq F.; Kaheil, Yasir H.; Gill, Kashif; Mckee, Mac

    2010-01-01

    Contemporary water resources engineering and management rely increasingly on pattern recognition techniques that can capitalize on the unrelenting accumulation of data made possible by modern information technology and remote sensing methods. In response to the growing information needs of modern water systems, advanced computational models and tools have been devised to identify and extract relevant information from the mass of data that is now available. This chapter presents innovative applications from computational learning science within the fields of hydrology, hydrogeology, hydroclimatology, and water management. The success of machine learning is evident from the growing number of studies involving the application of Artificial Neural Networks (ANN), Support Vector Machines (SVM), Relevance Vector Machines (RVM), and Locally Weighted Projection Regression (LWPR) to address various issues in the hydrologic sciences. The applications discussed within the chapter employ the abovementioned machine learning techniques for intelligent modeling of reservoir operations, temporal downscaling of precipitation, spatial downscaling of soil moisture and evapotranspiration, comparison of various techniques for groundwater quality modeling, and forecasting of chaotic time series behavior. Combinatorial algorithms are developed to capture the intrinsic complexities in the modeled phenomena and to overcome disparate scales; for example, learning machines have been coupled with geostatistical techniques, non-homogeneous hidden Markov models, wavelets, and evolutionary computing techniques. This chapter does not intend to be exhaustive; it reviews the progress that has been made over the past decade in the use of learning machines in the applied hydrologic sciences and presents a summary of future needs and challenges for further advancement of these methods.

  2. Machine learning and predictive data analytics enabling metrology and process control in IC fabrication

    NASA Astrophysics Data System (ADS)

    Rana, Narender; Zhang, Yunlin; Wall, Donald; Dirahoui, Bachir; Bailey, Todd C.

    2015-03-01

    Integrated circuit (IC) technology is going through multiple changes in terms of patterning techniques (multiple patterning, EUV and DSA), device architectures (FinFET, nanowire, graphene) and patterning scale (a few nanometers). These changes require tight controls on processes and measurements to achieve the required device performance, and challenge metrology and process control in terms of capability and quality. Multivariate data with complex nonlinear trends and correlations generally cannot be described well by mathematical or parametric models, but can be relatively easily learned by computing machines and used to predict or extrapolate. This paper introduces the predictive metrology approach, which has been applied to three different applications. Machine learning and predictive analytics have been leveraged to accurately predict dimensions of EUV resist patterns down to 18 nm half pitch from resist shrinkage patterns; these patterns could not be directly and accurately measured due to metrology tool limitations. Machine learning has also been applied to predict electrical performance early in the process pipeline for deep trench capacitance and metal line resistance. As a wafer goes through various processes its associated cost multiplies, and it may take days to weeks to get the electrical performance readout. Predicting the electrical performance early on can be very valuable in enabling timely actionable decisions such as rework, scrap, or feeding predicted information (or information derived from it) forward or back to improve or monitor processes. This paper provides a general overview of machine learning and advanced analytics applications in advanced semiconductor development and manufacturing.
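    The core pattern, learning a mapping from an indirect measurement to the quantity of interest, can be sketched with ordinary least squares. The calibration numbers below are hypothetical (chosen to lie exactly on a line), standing in for the paper's models relating shrunken-resist measurements to true pattern dimensions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b: a minimal stand-in for
    the predictive models in the paper."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical calibration: measured post-shrink CD -> true pattern CD (nm).
measured = [15.0, 16.0, 17.0, 18.0]
true_cd  = [17.5, 18.6, 19.7, 20.8]   # lies exactly on y = 1.1*x + 1.0
a, b = fit_line(measured, true_cd)
```

    A new post-shrink measurement of 20.0 nm would then be predicted as `a * 20.0 + b` (23.0 nm here); the production systems described above replace this one-feature line with multivariate, nonlinear learners.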

  3. Recognition of printed Arabic text using machine learning

    NASA Astrophysics Data System (ADS)

    Amin, Adnan

    1998-04-01

    Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress has been achieved towards the automatic recognition of Arabic characters, either on-line or off-line. This is a result of the lack of adequate support in terms of funding and other resources, such as Arabic text databases and dictionaries, and of course of the cursive nature of Arabic writing. The main theme of this paper is the automatic recognition of printed Arabic text using the machine learning algorithm C4.5. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors, each including a label that identifies the class to which the example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalization from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature-vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and its symbolic representation is easy to understand. The technique can be divided into three major steps. The first step is preprocessing, in which the original image is transformed into a binary image using a 300 dpi scanner and the connected components are formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of complementary characters. Finally, C4.5 is used for character classification, generating a decision tree.
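    The heart of a C4.5-style learner is choosing the feature whose split most reduces label entropy. The sketch below shows that selection step (plain information gain; C4.5 proper additionally normalises by the split's own entropy). The two-feature toy vectors and class names are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    """Entropy reduction from splitting on one feature index."""
    base = entropy(labels)
    n = len(labels)
    by_value = {}
    for x, y in zip(examples, labels):
        by_value.setdefault(x[feature], []).append(y)
    remainder = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    return base - remainder

# Toy feature vectors: (number of subwords, has complementary mark).
X = [(1, 0), (1, 1), (2, 0), (2, 1)]
y = ["alef", "alef", "beh", "beh"]
best = max(range(2), key=lambda f: information_gain(X, y, f))
```

    Feature 0 separates the two classes perfectly (gain of 1 bit) while feature 1 carries no information, so the tree would split on the subword count first.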

  4. Acceleration of saddle-point searches with machine learning.

    PubMed

    Peterson, Andrew A

    2016-08-21

    In atomistic simulations, the location of the saddle point on the potential-energy surface (PES) gives important information on transitions between local minima, for example, via transition-state theory. However, the search for saddle points often involves hundreds or thousands of ab initio force calls, which are typically all done at full accuracy. This results in the vast majority of the computational effort being spent calculating the electronic structure of states not important to the researcher, and very little time performing the calculation of the saddle point state itself. In this work, we describe how machine learning (ML) can reduce the number of intermediate ab initio calculations needed to locate saddle points. Since machine-learning models can learn from, and thus mimic, atomistic simulations, the saddle-point search can be conducted rapidly in the machine-learning representation. The saddle-point prediction can then be verified by an ab initio calculation; if it is incorrect, this strategically has identified regions of the PES where the machine-learning representation has insufficient training data. When these training data are used to improve the machine-learning model, the estimates greatly improve. This approach can be systematized, and in two simple example problems we demonstrate a dramatic reduction in the number of ab initio force calls. We expect that this approach and future refinements will greatly accelerate searches for saddle points, as well as other searches on the potential energy surface, as machine-learning methods see greater adoption by the atomistics community. PMID:27544086
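    The surrogate loop described above (search cheaply in the learned representation, verify with one expensive call, fold the result back into the training data) can be sketched in one dimension. Everything here is a toy under stated assumptions: the "expensive" function is a cheap analytic stand-in for an ab initio evaluation, the surrogate is a parabola through the three best known points rather than a real ML potential, and the search targets a minimum rather than a saddle point.

```python
def expensive_energy(x):
    """Stand-in for a full-accuracy ab initio evaluation (an analytic
    function with its minimum at x = 1.3)."""
    return (x - 1.3) ** 2 * (1 + 0.1 * x * x)

def parabola_min(p1, p2, p3):
    """Vertex of the parabola through three (x, f) samples: the cheap
    optimisation step taken in the surrogate representation."""
    (x1, f1), (x2, f2), (x3, f3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (f2 - f3) - (x2 - x3) ** 2 * (f2 - f1)
    den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
    return None if den == 0 else x2 - 0.5 * num / den

def surrogate_search(f, starts, tol=1e-6, max_calls=30):
    """Fit the surrogate to the data so far, jump to its predicted
    optimum, verify with ONE expensive call, and add that point to
    the training data; repeat until the step size falls below tol."""
    data = [(x, f(x)) for x in starts]
    calls = len(starts)
    while calls < max_calls:
        best3 = sorted(sorted(data, key=lambda p: p[1])[:3])
        x_new = parabola_min(*best3)
        if x_new is None:               # degenerate fit: stop
            break
        x_best = min(data, key=lambda p: p[1])[0]
        f_new = f(x_new)                # the single verification call
        calls += 1
        data.append((x_new, f_new))
        if abs(x_new - x_best) < tol:
            break
    return min(data, key=lambda p: p[1]), calls
```

    Starting from three coarse samples, the loop homes in on the optimum in a handful of "expensive" calls, which is the qualitative point of the paper: most iterations happen in the surrogate, not in the ab initio code.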

  5. Can Machine Learning Methods Predict Extubation Outcome in Premature Infants as well as Clinicians?

    PubMed Central

    Mueller, Martina; Almeida, Jonas S.; Stanislaus, Romesh; Wagner, Carol L.

    2014-01-01

    Rationale: Though treatment of the prematurely born infant breathing with the assistance of a mechanical ventilator has much advanced in the past decades, predicting extubation outcome at a given point in time remains challenging. Numerous studies have been conducted to identify predictors of extubation outcome; however, the rate of infants failing extubation attempts has not declined. Objective: To develop a decision-support tool for the prediction of extubation outcome in premature infants using a set of machine learning algorithms. Methods: A dataset assembled from 486 premature infants on mechanical ventilation was used to develop predictive models using machine learning algorithms such as artificial neural networks (ANN), support vector machines (SVM), the naïve Bayesian classifier (NBC), boosted decision trees (BDT), and multivariable logistic regression (MLR). Performance of all models was evaluated using the area under the curve (AUC). Results: For some of the models (ANN, MLR and NBC) results were satisfactory (AUC: 0.63–0.76); however, two algorithms (SVM and BDT) showed poor performance, with AUCs of ~0.5. Conclusion: Clinicians' predictions still outperform machine learning due to the complexity of the data and contextual information that may not be captured in the clinical data used as input for the development of the machine learning algorithms. Inclusion of preprocessing steps in future studies may improve the performance of prediction models. PMID:25419493
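    The evaluation metric used throughout this study, the area under the ROC curve, has a simple rank-based definition that is worth sketching: it is the probability that a randomly chosen positive case scores above a randomly chosen negative one (ties count half). The scores below are made-up values, not the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Wilcoxon-Mann-Whitney pair
    statistic. `labels` are 1 (positive) / 0 (negative)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))
```

    A model that ranks every positive above every negative scores 1.0, a reversed ranking scores 0.0, and an uninformative model sits near 0.5, which is why the ~0.5 AUCs reported for SVM and BDT indicate chance-level performance.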

  6. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations.

    PubMed

    Torkzaban, Bahareh; Kayvanjoo, Amir Hossein; Ardalan, Arman; Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is increasingly a bottleneck for making effective use of large biological datasets. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been less frequently used with empirical population genetics data. In this study, we developed a new combined approach to data analysis, applying machine learning algorithms to microsatellite marker data from our previous studies of olive populations. Herein, 267 olive accessions of various origins, including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties, were investigated using a finely selected panel of 11 microsatellite markers. We organized the data into two experiments, '4-targeted' and '16-targeted'. A strategy of assaying different machine-based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles for representing the population and the geography of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our method for clustering olive accessions by their regions of origin. A highlight of this study was the discovery of the combination of markers that best differentiates populations via machine learning models, which can be exploited to distinguish among other biological populations. PMID:26599001

  7. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations

    PubMed Central

    Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is increasingly a bottleneck for making effective use of large biological datasets. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been less frequently used with empirical population genetics data. In this study, we developed a new combined approach to data analysis, applying machine learning algorithms to microsatellite marker data from our previous studies of olive populations. Herein, 267 olive accessions of various origins, including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties, were investigated using a finely selected panel of 11 microsatellite markers. We organized the data into two experiments, ‘4-targeted’ and ‘16-targeted’. A strategy of assaying different machine-based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles for representing the population and the geography of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our method for clustering olive accessions by their regions of origin. A highlight of this study was the discovery of the combination of markers that best differentiates populations via machine learning models, which can be exploited to distinguish among other biological populations. PMID:26599001

  8. The cerebellum: a neuronal learning machine?

    NASA Technical Reports Server (NTRS)

    Raymond, J. L.; Lisberger, S. G.; Mauk, M. D.

    1996-01-01

    Comparison of two seemingly quite different behaviors yields a surprisingly consistent picture of the role of the cerebellum in motor learning. Behavioral and physiological data about classical conditioning of the eyelid response and motor learning in the vestibulo-ocular reflex suggests that (i) plasticity is distributed between the cerebellar cortex and the deep cerebellar nuclei; (ii) the cerebellar cortex plays a special role in learning the timing of movement; and (iii) the cerebellar cortex guides learning in the deep nuclei, which may allow learning to be transferred from the cortex to the deep nuclei. Because many of the similarities in the data from the two systems typify general features of cerebellar organization, the cerebellar mechanisms of learning in these two systems may represent principles that apply to many motor systems.

  9. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    PubMed

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data of the US stock market from the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model such as the I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving prediction performance. PMID:26926235

  10. Predicting Market Impact Costs Using Nonparametric Machine Learning Models

    PubMed Central

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs; reducing it can lower the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural networks, Gaussian processes, and support vector regression, to predict market impact cost accurately and to provide a predictive model that is flexible in the number of variables. We collected a large amount of real single-transaction data for the US stock market from a Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, on four error measures. Although these models have certain difficulties in separating the permanent and temporary costs directly, nonparametric machine learning models can be good alternatives for reducing transaction costs through considerably improved prediction performance. PMID:26926235

  11. Machine Learning Search for Gamma-Ray Burst Afterglows in Optical Images

    NASA Astrophysics Data System (ADS)

    Topinka, M.

    2016-06-01

    Thanks to advances in robotic telescopes, time-domain astronomy yields a large number of transient events detected in images every night. Data mining and machine learning tools used for object classification are presented. The goal is to automatically classify transient events, both for further follow-up by a larger telescope and for statistical studies of transient events. Special attention is given to the identification of gamma-ray burst afterglows. Machine learning techniques are used to identify GROND gamma-ray burst afterglows among the astrophysical objects present in the SDSS archival images, based on the g'-r', r'-i' and i'-z' color indices. The performance of the support vector machine, random forest and neural network algorithms is compared. A joint meta-classifier, built on top of the individual classifiers, can identify GRB afterglows with an overall accuracy of ≳ 90%.
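    The meta-classifier idea, combining several individual classifiers by vote, can be sketched as follows. Synthetic two-colour data and three simple numpy base learners stand in for the colour indices and for the SVM, random forest and neural network used in the paper; everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic two-class points standing in for (g'-r', r'-i') colour indices:
# class 1 ~ "afterglow-like", class 0 ~ "other object".
X = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(1.5, 0.5, (200, 2))])
y = np.r_[np.zeros(200), np.ones(200)]
idx = rng.permutation(400)
tr, te = idx[:300], idx[300:]

def nearest_centroid(Xtr, ytr, Xte):
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(float)

def one_nn(Xtr, ytr, Xte):
    d = np.linalg.norm(Xte[:, None] - Xtr[None, :], axis=2)
    return ytr[d.argmin(axis=1)]

def colour_stump(Xtr, ytr, Xte):
    thr = Xtr[:, 0].mean()               # threshold on a single colour index
    return (Xte[:, 0] > thr).astype(float)

# Meta-classifier: majority vote over the individual predictions
votes = sum(clf(X[tr], y[tr], X[te]) for clf in (nearest_centroid, one_nn, colour_stump))
meta = (votes >= 2).astype(float)
accuracy = (meta == y[te]).mean()
print(accuracy)
```

    The vote lets the committee outvote any single classifier's mistakes, which is the mechanism behind the joint classifier's higher overall accuracy.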

  12. Risk prediction with machine learning and regression methods.

    PubMed

    Steyerberg, Ewout W; van der Ploeg, Tjeerd; Van Calster, Ben

    2014-07-01

    This is a discussion of issues in risk prediction based on the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler. PMID:24615859

  13. RECONCILE: a machine-learning coreference resolution system

    2007-12-10

    RECONCILE is a noun phrase coreference resolution system: it identifies noun phrases in a text document and determines which subsets refer to each real-world entity referenced in the text. The heart of the system is a combination of supervised and unsupervised machine learning systems. It uses a machine learning algorithm (chosen from an extensive suite, including Weka) for training noun phrase coreference classifier models and implements a variety of clustering algorithms to coordinate the pairwise classifications. A number of features have been implemented, including all of the features employed in Ng & Cardie [2002].

  14. 3D Visualization of Machine Learning Algorithms with Astronomical Data

    NASA Astrophysics Data System (ADS)

    Kent, Brian R.

    2016-01-01

    We present innovative machine learning (ML) methods using unsupervised clustering with minimum spanning trees (MSTs) to study 3D astronomical catalogs. Utilizing Python code to build trees based on galaxy catalogs, we can render the results with the visualization suite Blender to produce interactive 360 degree panoramic videos. The catalogs and their ML results can be explored in a 3D space using mobile devices, tablets or desktop browsers. We compare the statistics of the MST results to a number of machine learning methods relating to optimization and efficiency.
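    The MST-based clustering step can be sketched with scipy: build a minimum spanning tree over the 3D point cloud, cut edges longer than a separation threshold, and read off the connected components as clusters. The mock "galaxy groups" and the cut threshold are illustrative, and the Blender rendering stage is omitted.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial import distance_matrix

rng = np.random.default_rng(2)
# Two mock "galaxy groups" in 3D, well separated
pts = np.vstack([rng.normal(0.0, 0.2, (50, 3)), rng.normal(3.0, 0.2, (50, 3))])

# Build the MST over pairwise distances, then cut edges longer than a threshold
mst = minimum_spanning_tree(csr_matrix(distance_matrix(pts, pts))).toarray()
mst[mst > 1.0] = 0.0

# Each remaining connected component is one cluster
n_clusters, labels = connected_components(csr_matrix(mst), directed=False)
print(n_clusters)
```

    Because the MST contains exactly one bridge between well-separated groups, removing long edges splits the tree cleanly into the underlying clusters.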

  15. Energy landscapes for a machine learning application to series data

    NASA Astrophysics Data System (ADS)

    Ballard, Andrew J.; Stevenson, Jacob D.; Das, Ritankar; Wales, David J.

    2016-03-01

    Methods developed to explore and characterise potential energy landscapes are applied to the corresponding landscapes obtained from optimisation of a cost function in machine learning. We consider neural network predictions for the outcome of local geometry optimisation in a triatomic cluster, where four distinct local minima exist. The accuracy of the predictions is compared for fits using data from single and multiple points in the series of atomic configurations resulting from local geometry optimisation and for alternative neural networks. The machine learning solution landscapes are visualised using disconnectivity graphs, and signatures in the effective heat capacity are analysed in terms of distributions of local minima and their properties.

  16. Task 8.6 -- Advanced man machine interface (MMI)

    SciTech Connect

    1997-12-31

    The Solar/DOE ATS engine program seeks to improve the utilization of turbomachinery resources through the development of an Advanced Man Machine Interface (MMI). The program goals include timely and succinct feedback to the operations personnel to enhance their decision making process. As part of the Solar ATS Phase 2 technology development program, enabling technologies, including graphics environments, communications technology, and operating systems were explored to determine their viability to support the overall MMI requirements. This report discusses the research and prototyping effort, as well as the conclusions reached.

  17. Learning Activity Packets for Grinding Machines. Unit I--Grinding Machines.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) is one of three that accompany the curriculum guide on grinding machines. It outlines the study activities and performance tasks for the first unit of this curriculum guide. Its purpose is to aid the student in attaining a working knowledge of this area of training and in achieving a skilled or moderately…

  18. Refining fuzzy logic controllers with machine learning

    NASA Technical Reports Server (NTRS)

    Berenji, Hamid R.

    1994-01-01

    In this paper, we describe the GARIC (Generalized Approximate Reasoning-Based Intelligent Control) architecture, which learns from its past performance and modifies the labels in the fuzzy rules to improve performance. It uses fuzzy reinforcement learning which is a hybrid method of fuzzy logic and reinforcement learning. This technology can simplify and automate the application of fuzzy logic control to a variety of systems. GARIC has been applied in simulation studies of the Space Shuttle rendezvous and docking experiments. It has the potential of being applied in other aerospace systems as well as in consumer products such as appliances, cameras, and cars.

  19. Outsmarting neural networks: an alternative paradigm for machine learning

    SciTech Connect

    Protopopescu, V.; Rao, N.S.V.

    1996-10-01

    We address three problems in machine learning, namely: (i) function learning, (ii) regression estimation, and (iii) sensor fusion, in the Probably and Approximately Correct (PAC) framework. We show that, under certain conditions, one can reduce the three problems above to the regression estimation. The latter is usually tackled with artificial neural networks (ANNs) that satisfy the PAC criteria, but have high computational complexity. We propose several computationally efficient PAC alternatives to ANNs to solve the regression estimation. Thereby we also provide efficient PAC solutions to the function learning and sensor fusion problems. The approach is based on cross-fertilizing concepts and methods from statistical estimation, nonlinear algorithms, and the theory of computational complexity, and is designed as part of a new, coherent paradigm for machine learning.

  20. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution

    PubMed Central

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis. PMID:26958271
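    The gap between the ATE and the ITE can be illustrated with a minimal synthetic example: a linear "T-learner" (one outcome model per treatment arm) on simulated randomized data. The covariate, effect sizes, and models are made up and much simpler than the paper's statin/MI setup.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)                 # a single risk covariate (illustrative)
t = rng.integers(0, 2, size=n)         # randomized treatment indicator
true_ite = -0.5 - 0.3 * x              # effect varies across individuals
y = 1.0 + 0.8 * x + t * true_ite + 0.1 * rng.normal(size=n)

def linfit(features, target):
    """Ordinary least squares with an intercept."""
    A = np.c_[np.ones_like(features), features]
    return np.linalg.lstsq(A, target, rcond=None)[0]

# "T-learner": one outcome model per arm; ITE = difference of predictions
b1 = linfit(x[t == 1], y[t == 1])
b0 = linfit(x[t == 0], y[t == 0])
est_ite = (b1[0] - b0[0]) + (b1[1] - b0[1]) * x

ate = y[t == 1].mean() - y[t == 0].mean()     # single population-level number
mae_ite = np.abs(est_ite - true_ite).mean()
mae_ate = np.abs(ate - true_ite).mean()
print(mae_ite, mae_ate)
```

    When the treatment effect actually varies with a covariate, the single ATE number is badly calibrated for individuals, while the per-arm models recover the individualized effect.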

  1. Probability and Statistics in Astronomical Machine Learning and Data Mining

    NASA Astrophysics Data System (ADS)

    Scargle, Jeffrey

    2012-03-01

    Statistical issues peculiar to astronomy have implications for machine learning and data mining. It should be obvious that statistics lies at the heart of machine learning and data mining. Further it should be no surprise that the passive observational nature of astronomy, the concomitant lack of sampling control, and the uniqueness of its realm (the whole universe!) lead to some special statistical issues and problems. As described in the Introduction to this volume, data analysis technology is largely keeping up with major advances in astrophysics and cosmology, even driving many of them. And I realize that there are many scientists with good statistical knowledge and instincts, especially in the modern era I like to call the Age of Digital Astronomy. Nevertheless, old impediments still lurk, and the aim of this chapter is to elucidate some of them. Many experiences with smart people doing not-so-smart things (cf. the anecdotes collected in the Appendix here) have convinced me that the cautions given here need to be emphasized. Consider these four points: 1. Data analysis often involves searches of many cases, for example, outcomes of a repeated experiment, for a feature of the data. 2. The feature comprising the goal of such searches may not be defined unambiguously until the search is carried out, or perhaps vaguely even then. 3. The human visual system is very good at recognizing patterns in noisy contexts. 4. People are much easier to convince of something they want to believe, or already believe, as opposed to unpleasant or surprising facts. One can argue that all four are good things during the initial, exploratory phases of most data analysis. They represent the curiosity and creativity of the scientific process, especially during the exploration of data collections from new observational programs such as all-sky surveys in wavelengths not accessed before or sets of images of a planetary surface not yet explored. On the other hand, confirmatory scientific

  2. PDT: Photometric DeTrending Algorithm Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Kim, Dae-Won

    2016-05-01

    PDT removes systematic trends in light curves. It finds clusters of light curves that are highly correlated using machine learning, constructs one master trend per cluster and detrends an individual light curve using the constructed master trends by minimizing residuals while constraining coefficients to be positive.
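    The pipeline this entry describes (find a cluster of correlated light curves, average them into a master trend, and subtract the master trend with a nonnegative coefficient) can be sketched as follows. All signals are simulated, and the correlation-clustering step is replaced by a ready-made simulated cluster; scipy's `nnls` supplies the positivity constraint on the coefficient.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)
n = 500
trend = np.sin(np.linspace(0, 6, n))           # shared systematic trend
signal = 0.3 * np.cos(np.linspace(0, 40, n))   # intrinsic variability to preserve
lc = signal + 0.8 * trend + 0.05 * rng.normal(size=n)

# Master trend: average of a cluster of correlated light curves
# (here the cluster is simulated; PDT finds it via correlation clustering)
cluster = np.stack([0.8 * trend + 0.05 * rng.normal(size=n) for _ in range(10)])
master = cluster.mean(axis=0)

# Fit a nonnegative coefficient for the master trend, then subtract it
coef, _ = nnls(master[:, None], lc)
detrended = lc - coef[0] * master

print(np.corrcoef(lc, trend)[0, 1], np.corrcoef(detrended, trend)[0, 1])
```

    The raw light curve is strongly correlated with the systematic trend; after subtraction that correlation largely disappears while the intrinsic oscillation survives.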

  3. Machine learning techniques for fault isolation and sensor placement

    NASA Technical Reports Server (NTRS)

    Carnes, James R.; Fisher, Douglas H.

    1993-01-01

    Fault isolation and sensor placement are vital for monitoring and diagnosis. A sensor conveys information about a system's state that guides troubleshooting if problems arise. We are using machine learning methods to uncover behavioral patterns over snapshots of system simulations that will aid fault isolation and sensor placement, with an eye towards minimality, fault coverage, and noise tolerance.

  4. Machine learning of fault characteristics from rocket engine simulation data

    NASA Technical Reports Server (NTRS)

    Ke, Min; Ali, Moonis

    1990-01-01

    Transformation of data into knowledge through conceptual induction has been the focus of our research described in this paper. We have developed a Machine Learning System (MLS) to analyze the rocket engine simulation data. MLS can provide to its users fault analysis, characteristics, and conceptual descriptions of faults, and the relationships of attributes and sensors. All the results are critically important in identifying faults.

  5. Acquiring Software Design Schemas: A Machine Learning Perspective

    NASA Technical Reports Server (NTRS)

    Harandi, Mehdi T.; Lee, Hing-Yan

    1991-01-01

    In this paper, we describe an approach based on machine learning that acquires software design schemas from design cases of existing applications. An overview of the technique, design representation, and acquisition system are presented. the paper also addresses issues associated with generalizing common features such as biases. The generalization process is illustrated using an example.

  6. Relative optical navigation around small bodies via Extreme Learning Machine

    NASA Astrophysics Data System (ADS)

    Law, Andrew M.

    To perform close-proximity operations in a low-gravity environment, relative and absolute positions are vital information for the maneuver; navigation is therefore inseparable from space travel. Extreme Learning Machine (ELM) is presented as an optical navigation method around small celestial bodies. Optical navigation uses visual observation instruments such as a camera to acquire useful data and determine spacecraft position. The required input data for operation are merely a single image strip and a nadir image. ELM is a single-hidden-layer feedforward network (SLFN), a type of neural network (NN). The algorithm is built on the premise that input weights and biases can be randomly assigned and do not require back-propagation; the learned model is the set of output-layer weights, which are used to calculate a prediction. Together, Extreme Learning Machine Optical Navigation (ELM OpNav) uses optical images and the ELM algorithm to train the machine to navigate around a target body. In this thesis the asteroid Vesta is the designated celestial body. The trained ELMs estimate the position of the spacecraft during operation from a single data set. The results show the approach is promising and potentially suitable for on-board navigation.
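    The ELM recipe the thesis relies on (random, untrained input weights plus a least-squares output layer) can be sketched in a few lines. The toy regression target is illustrative and stands in for the image-to-position mapping; it is not the OpNav imagery pipeline.

```python
import numpy as np

rng = np.random.default_rng(5)

def elm_train(X, Y, n_hidden=100):
    """ELM: input weights and biases are random and never trained."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                 # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y           # output weights by least squares
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy regression target standing in for the image-to-position mapping
X = rng.uniform(-1, 1, size=(1000, 2))
Y = np.sin(X[:, 0]) + X[:, 1] ** 2
model = elm_train(X, Y)
rmse = np.sqrt(np.mean((elm_predict(model, X) - Y) ** 2))
print(rmse)
```

    Because only the output layer is solved for, training reduces to a single linear least-squares problem, which is why ELM is attractive for resource-limited, on-board settings.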

  7. Testing and Validating Machine Learning Classifiers by Metamorphic Testing☆

    PubMed Central

    Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2011-01-01

    Machine learning algorithms provide core functionality to many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because often there is no “test oracle” to verify the correctness of the computed outputs. To help address software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms that support such applications. Our approach is based on the technique of “metamorphic testing”, which has been shown to be effective in alleviating the oracle problem. Also presented are a case study on a real-world machine learning application framework and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method is highly effective at killing mutants, and that observing the expected cross-validation result alone is not sufficient to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program. PMID:21532969
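    A metamorphic relation test can be sketched for a simple 1-NN classifier (the classifier and the two relations below are illustrative choices, not the framework studied in the paper). Permuting the training set and translating the whole feature space should leave predictions unchanged, so any change reveals a fault without needing a test oracle.

```python
import numpy as np

def one_nn_predict(Xtr, ytr, Xte):
    """A deliberately simple 1-NN classifier under test."""
    d = np.linalg.norm(Xte[:, None] - Xtr[None, :], axis=2)
    return ytr[d.argmin(axis=1)]

rng = np.random.default_rng(6)
Xtr = rng.normal(size=(100, 3))
ytr = rng.integers(0, 2, size=100)
Xte = rng.normal(size=(20, 3))
base = one_nn_predict(Xtr, ytr, Xte)

# MR1: permuting the training examples must not change any prediction
perm = rng.permutation(100)
mr1 = np.array_equal(base, one_nn_predict(Xtr[perm], ytr[perm], Xte))

# MR2: translating train and test features together must not change predictions
shift = np.array([5.0, -2.0, 1.0])
mr2 = np.array_equal(base, one_nn_predict(Xtr + shift, ytr, Xte + shift))

consistent = mr1 and mr2
print(consistent)
```

    A buggy implementation (say, one that indexes labels in the wrong order after sorting distances) would typically violate MR1 even though no single prediction can be checked against ground truth.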

  8. AstroML: Python-powered Machine Learning for Astronomy

    NASA Astrophysics Data System (ADS)

    Vander Plas, Jake; Connolly, A. J.; Ivezic, Z.

    2014-01-01

    As astronomical data sets grow in size and complexity, automated machine learning and data mining methods are becoming an increasingly fundamental component of research in the field. The astroML project (http://astroML.org) provides a common repository for practical examples of the data mining and machine learning tools used and developed by astronomical researchers, written in Python. The astroML module contains a host of general-purpose data analysis and machine learning routines, loaders for openly-available astronomical datasets, and fast implementations of specific computational methods often used in astronomy and astrophysics. The associated website features hundreds of examples of these routines being used for analysis of real astronomical datasets, while the associated textbook provides a curriculum resource for graduate-level courses focusing on practical statistics, machine learning, and data mining approaches within Astronomical research. This poster will highlight several of the more powerful and unique examples of analysis performed with astroML, all of which can be reproduced in their entirety on any computer with the proper packages installed.

  9. Effects of Plasma Transfusion on Perioperative Bleeding Complications: A Machine Learning Approach

    PubMed Central

    Ngufor, Che; Murphree, Dennis; Upadhyaya, Sudhindra; Madde, Nageswar; Kor, Daryl; Pathak, Jyotishman

    2016-01-01

    Perioperative bleeding (PB) is associated with increased patient morbidity and mortality, and results in substantial health care resource utilization. To assess bleeding risk, a routine practice in most centers is to use indicators such as elevated values of the International Normalized Ratio (INR). For patients with elevated INR, the routine therapy option is plasma transfusion. However, the predictive accuracy of INR and the value of plasma transfusion still remain unclear. Accurate methods are therefore needed to identify early the patients with increased risk of bleeding. The goal of this work is to apply advanced machine learning methods to study the relationship between preoperative plasma transfusion (PPT) and PB in patients with elevated INR undergoing noncardiac surgery. The problem is cast under the framework of causal inference, where robust, meaningful measures to quantify the effect of PPT on PB are estimated. Results show that both machine learning and standard statistical methods generally agree that PPT negatively impacts PB and other important patient outcomes. However, machine learning methods show significant results, and machine learning boosting methods are found to make fewer errors in predicting PB. PMID:26262146

  10. Health Informatics via Machine Learning for the Clinical Management of Patients

    PubMed Central

    Niehaus, K. E.; Charlton, P.; Colopy, G. W.

    2015-01-01

    Summary Objectives To review how health informatics systems based on machine learning methods have impacted the clinical management of patients, by affecting clinical practice. Methods We reviewed literature from 2010-2015 from databases such as Pubmed, IEEE xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify those leading examples of health informatics that have advanced the methodology of machine learning. While individual methods may have further examples that might be added, we have chosen some of the most representative, informative exemplars in each case. Results Our survey highlights that, while much research is taking place in this high-profile field, examples of those that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. Conclusions Health informatics systems based on machine learning are in their infancy and the translation of such systems into clinical management has yet to be performed at scale. PMID:26293849

  11. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines

    PubMed Central

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-01-01

    The impact-echo (IE) method is a popular non-destructive testing (NDT) technique widely used for measuring the thickness of plate-like structures and for detecting certain defects inside concrete elements or structures. However, the IE method is not effective for full condition assessment (i.e., defect detection, defect diagnosis, defect sizing and location), because the simple frequency spectrum analysis involved in the existing IE method is not sufficient to capture the IE signal patterns associated with different conditions. In this paper, we attempt to enhance the IE technique and enable it for full condition assessment of concrete elements by introducing advanced machine learning techniques for performing comprehensive analysis and pattern recognition of IE signals. Specifically, we use wavelet decomposition for extracting signatures or features out of the raw IE signals and apply extreme learning machine, one of the recently developed machine learning techniques, as classification models for full condition assessment. To validate the capabilities of the proposed method, we build a number of specimens with various types, sizes, and locations of defects and perform IE testing on these specimens in a lab environment. Based on analysis of the collected IE signals using the proposed machine learning based IE method, we demonstrate that the proposed method is effective in performing full condition assessment of concrete elements or structures. PMID:27023563
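    The feature-extraction step (wavelet decomposition of the raw IE signal into per-band energies that a classifier such as an ELM can consume) can be sketched with a hand-rolled Haar transform. The signals and "defect" resonance frequencies are simulated, and the Haar basis is an illustrative stand-in for whatever wavelet family the paper uses.

```python
import numpy as np

def haar_level(x):
    """One level of the orthonormal Haar transform: approximation + detail."""
    x = x[: len(x) // 2 * 2]
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def wavelet_energy_features(signal, levels=4):
    """Relative energy per band: a compact feature vector for a classifier."""
    feats, a = [], signal
    for _ in range(levels):
        a, d = haar_level(a)
        feats.append(np.sum(d ** 2))       # detail energy at this scale
    feats.append(np.sum(a ** 2))           # remaining low-frequency energy
    feats = np.array(feats)
    return feats / feats.sum()

t = np.arange(1024)
intact = np.sin(2 * np.pi * t / 64)   # low-frequency "thickness" resonance
flawed = np.sin(2 * np.pi * t / 8)    # higher resonance, as from a shallow defect
f_intact = wavelet_energy_features(intact)
f_flawed = wavelet_energy_features(flawed)
print(f_intact.round(3), f_flawed.round(3))
```

    The two conditions concentrate their energy in different bands of the feature vector, which is exactly the kind of signature a frequency-spectrum peak alone can blur together.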

  12. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines.

    PubMed

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-01-01

    The impact-echo (IE) method is a popular non-destructive testing (NDT) technique widely used for measuring the thickness of plate-like structures and for detecting certain defects inside concrete elements or structures. However, the IE method is not effective for full condition assessment (i.e., defect detection, defect diagnosis, defect sizing and location), because the simple frequency spectrum analysis involved in the existing IE method is not sufficient to capture the IE signal patterns associated with different conditions. In this paper, we attempt to enhance the IE technique and enable it for full condition assessment of concrete elements by introducing advanced machine learning techniques for performing comprehensive analysis and pattern recognition of IE signals. Specifically, we use wavelet decomposition for extracting signatures or features out of the raw IE signals and apply extreme learning machine, one of the recently developed machine learning techniques, as classification models for full condition assessment. To validate the capabilities of the proposed method, we build a number of specimens with various types, sizes, and locations of defects and perform IE testing on these specimens in a lab environment. Based on analysis of the collected IE signals using the proposed machine learning based IE method, we demonstrate that the proposed method is effective in performing full condition assessment of concrete elements or structures. PMID:27023563

  13. A 128-Channel Extreme Learning Machine-Based Neural Decoder for Brain Machine Interfaces.

    PubMed

    Chen, Yi; Yao, Enyi; Basu, Arindam

    2016-06-01

    Currently, state-of-the-art motor intention decoding algorithms in brain-machine interfaces are mostly implemented on a PC and consume a significant amount of power. A machine learning coprocessor in 0.35-μm CMOS for motor intention decoding in brain-machine interfaces is presented in this paper. Using the Extreme Learning Machine algorithm and low-power analog processing, it achieves an energy efficiency of 3.45 pJ/MAC at a classification rate of 50 Hz. The learning in the second stage and the corresponding digitally stored coefficients are used to increase the robustness of the core analog processor. The chip is verified with neural data recorded in a monkey finger-movement experiment, achieving a decoding accuracy of 99.3% for movement type. The same coprocessor is also used to decode the time of movement from asynchronous neural spikes. With time-delayed feature dimension enhancement, the classification accuracy can be increased by 5% with a limited number of input channels. Further, a sparsity-promoting training scheme enables a reduction of the number of programmable weights by ≈ 2X. PMID:26672048

  14. Efficiently Ranking Hypotheses in Machine Learning

    NASA Technical Reports Server (NTRS)

    Chien, Steve

    1997-01-01

    This paper considers the problem of learning the ranking of a set of alternatives based upon incomplete information (e.g. a limited number of observations). At each decision cycle, the system can output a complete ordering on the hypotheses or decide to gather additional information (e.g. observation) at some cost.

  15. Committee of machine learning predictors of hydrological models uncertainty

    NASA Astrophysics Data System (ADS)

    Kayastha, Nagendra; Solomatine, Dimitri

    2014-05-01

    In prediction of uncertainty based on machine learning methods, the results of various sampling schemes, namely Monte Carlo sampling (MCS), generalized likelihood uncertainty estimation (GLUE), Markov chain Monte Carlo (MCMC), the shuffled complex evolution metropolis algorithm (SCEMUA), differential evolution adaptive metropolis (DREAM), particle swarm optimization (PSO) and adaptive cluster covering (ACCO) [1], are used to build predictive models. These models predict the uncertainty (quantiles of the pdf) of a deterministic output from a hydrological model [2]. Inputs to these models are specially identified representative variables (precipitation and flows from past events). The trained machine learning models are then employed to predict the model output uncertainty specific to the new input data. For each sampling scheme, three machine learning methods, namely artificial neural networks, model trees, and locally weighted regression, are applied to predict output uncertainties. The problem here is that different sampling algorithms result in different data sets used to train different machine learning models, which leads to many models (21 predictive uncertainty models in total). There is no clear evidence which model is the best, since there is no basis for comparison. A solution is to form a committee of all models and to use a dynamic averaging scheme to generate the final output [3]. This approach is applied to estimate the uncertainty of streamflow simulations from a conceptual hydrological model (HBV) in the Nzoia catchment in Kenya. [1] N. Kayastha, D. L. Shrestha and D. P. Solomatine. Experiments with several methods of parameter uncertainty estimation in hydrological modeling. Proc. 9th Intern. Conf. on Hydroinformatics, Tianjin, China, September 2010. [2] D. L. Shrestha, N. Kayastha, D. P. Solomatine and R. Price. Encapsulation of parametric uncertainty statistics by various predictive machine learning models: MLUE method, Journal of Hydroinformatics, in press.
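    The committee idea (weight each trained predictor by its recent performance instead of picking a single winner) can be sketched as follows. Three simulated predictors stand in for the 21 trained uncertainty models, and the sliding window and inverse-error weighting are illustrative choices, not necessarily the paper's exact averaging scheme.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
truth = np.sin(np.linspace(0, 10, n))        # stand-in for the target quantile series
# Three imperfect "uncertainty models": accurate, biased, and noisy
preds = np.stack([
    truth + 0.05 * rng.normal(size=n),
    truth + 0.40,
    truth + 0.50 * rng.normal(size=n),
])

# Dynamic committee: weight each model by inverse MAE over a sliding window
window, eps = 30, 1e-9
combined = np.empty(n)
for i in range(n):
    if i == 0:
        w = np.full(3, 1 / 3)
    else:
        lo = max(0, i - window)
        err = np.abs(preds[:, lo:i] - truth[lo:i]).mean(axis=1)
        w = 1.0 / (err + eps)
        w /= w.sum()
    combined[i] = w @ preds[:, i]

mae_committee = np.abs(combined - truth).mean()
mae_equal = np.abs(preds.mean(axis=0) - truth).mean()
print(mae_committee, mae_equal)
```

    The dynamic weights shift toward whichever model has tracked the truth best recently, so the committee beats a naive equal-weight average without having to commit to one model in advance.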

  16. Stacking for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-08-01

    We present an analysis of a general machine learning technique called `stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organizing maps (SOMs), and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, such as the number of layers and the number of base learners per layer. Finally we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9 per cent and 21 per cent on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost) the ratio of improvement shrinks, but still remains positive and is between 0.4 per cent and 2.5 per cent for the explored metrics and comes at almost no additional computational cost.
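    The stacking loop (feed the base algorithm's redshift estimate back in as an extra input feature for a second learning round) can be sketched with a simple k-NN base learner on simulated photometry. The features, target function, and k are illustrative; this is not the paper's SOM, decision-tree, or AdaBoost setup.

```python
import numpy as np

def knn_regress(Xtr, ytr, Xte, k=10):
    """k-nearest-neighbour regression: mean target of the k closest points."""
    d = np.linalg.norm(Xte[:, None] - Xtr[None, :], axis=2)
    return ytr[np.argsort(d, axis=1)[:, :k]].mean(axis=1)

rng = np.random.default_rng(9)
n = 2000
X = rng.uniform(-1, 1, size=(n, 3))                    # mock colour features
z = 0.5 + 0.3 * np.sin(3 * X[:, 0]) + 0.1 * X[:, 1]    # mock redshift
tr, te = np.arange(0, 1500), np.arange(1500, n)

# Layer 1: base estimate from the raw features
z1_tr = knn_regress(X[tr], z[tr], X[tr])
z1_te = knn_regress(X[tr], z[tr], X[te])

# Layer 2: same algorithm, with the layer-1 estimate appended as a feature
X2_tr = np.c_[X[tr], z1_tr]
X2_te = np.c_[X[te], z1_te]
z2_te = knn_regress(X2_tr, z[tr], X2_te)

rmse1 = np.sqrt(np.mean((z1_te - z[te]) ** 2))
rmse2 = np.sqrt(np.mean((z2_te - z[te]) ** 2))
print(rmse1, rmse2)
```

    In a real pipeline the layer-1 training estimates would come from cross-validation to avoid leakage, and whether the extra layer helps depends on the base learner and data, matching the paper's finding that the gains are largest for weak learners.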

  17. Stacking for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Zitlau, Roman; Hoyle, Ben; Paech, Kerstin; Weller, Jochen; Rau, Markus Michael; Seitz, Stella

    2016-08-01

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show how all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organising maps (SOMs), and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, such as the number of layers and the number of base learners per layer. Finally we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9% and 21% on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost) the ratio of improvement shrinks, but still remains positive and is between 0.4% and 2.5% for the explored metrics and comes at almost no additional computational cost.

  18. Protein secondary structure prediction using logic-based machine learning.

    PubMed

    Muggleton, S; King, R D; Sternberg, M J

    1992-10-01

    Many attempts have been made to solve the problem of predicting protein secondary structure from the primary sequence but the best performance results are still disappointing. In this paper, the use of a machine learning algorithm which allows relational descriptions is shown to lead to improved performance. The Inductive Logic Programming computer program, Golem, was applied to learning secondary structure prediction rules for alpha/alpha domain type proteins. The input to the program consisted of 12 non-homologous proteins (1612 residues) of known structure, together with a background knowledge describing the chemical and physical properties of the residues. Golem learned a small set of rules that predict which residues are part of the alpha-helices--based on their positional relationships and chemical and physical properties. The rules were tested on four independent non-homologous proteins (416 residues) giving an accuracy of 81% (+/- 2%). This is an improvement, on identical data, over the previously reported result of 73% by King and Sternberg (1990, J. Mol. Biol., 216, 441-457) using the machine learning program PROMIS, and of 72% using the standard Garnier-Osguthorpe-Robson method. The best previously reported result in the literature for the alpha/alpha domain type is 76%, achieved using a neural net approach. Machine learning also has the advantage over neural network and statistical methods in producing more understandable results. PMID:1480619

  19. Combining data mining and machine learning for effective user profiling

    SciTech Connect

    Fawcett, T.; Provost, F.

    1996-12-31

    This paper describes the automatic design of methods for detecting fraudulent behavior. Much of the design is accomplished using a series of machine learning methods. In particular, we combine data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior. Specifically, we use a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls. These indicators are used to create profilers, which then serve as features to a system that combines evidence from multiple profilers to generate high-confidence alarms. Experiments indicate that this automatic approach performs nearly as well as the best hand-tuned methods for detecting fraud.

  20. Research on knowledge representation, machine learning, and knowledge acquisition

    NASA Technical Reports Server (NTRS)

    Buchanan, Bruce G.

    1987-01-01

    Research in knowledge representation, machine learning, and knowledge acquisition performed at the Knowledge Systems Lab is summarized. The major goal of the research was to develop flexible, effective methods for representing the qualitative knowledge necessary for solving large problems that require symbolic reasoning as well as numerical computation. The research focused on integrating different representation methods to describe different kinds of knowledge more effectively than any one method can alone. In particular, emphasis was placed on representing and using spatial information about three-dimensional objects and constraints on the arrangement of these objects in space. Another major theme is the development of robust machine learning programs that can be integrated with a variety of intelligent systems. To achieve this goal, learning methods were designed, implemented, and evaluated within several different problem-solving environments.

  1. Machine learning bandgaps of double perovskites

    PubMed Central

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-01

    The ability to make rapid and accurate predictions on bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance. PMID:26783247

  2. Machine learning bandgaps of double perovskites

    NASA Astrophysics Data System (ADS)

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-01

    The ability to make rapid and accurate predictions on bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance.

  3. Machine learning bandgaps of double perovskites

    NASA Astrophysics Data System (ADS)

    Pilania, Ghanshyam; Mannodi-Kanakkithodi, Arun; Uberuaga, Blas; Ramprasad, Rampi; Gubernatis, James; Lookman, Turab

    The ability to make rapid and accurate predictions of bandgaps for double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps for double perovskites. After evaluating a set of nearly 1.2 million features, we identify several elemental features of the constituent atomic species as the most crucial and relevant predictors. The developed models are validated and tested using the best practices of data science (on a dataset of more than 1300 double perovskite bandgaps) and further analyzed to rationalize their prediction performance. Los Alamos National Laboratory LDRD program and the U.S. Department of Energy, Office of Science, Basic Energy Sciences.

  4. Applying machine learning to electronic form filling

    NASA Astrophysics Data System (ADS)

    Hermens, Leonard A.; Schlimmer, Jeffrey C.

    1993-03-01

    Forms of all types are used in businesses and government agencies and most of them are filled in by hand. Yet much time and effort has been expended to automate form-filling by programming specific systems on computers. The high cost of programmers and other resources prohibits many organizations from benefitting from efficient office automation. A learning apprentice can be used for such repetitious form-filling tasks. In this paper, we establish the need for learning apprentices, describe a framework for such a system, explain the difficulties of form-filling, and present empirical results of a form-filling system used in our department from September 1991 to April 1992. The form-filling apprentice saves up to 84% in keystroke effort and correctly predicts nearly 90% of the values on the form.

  5. Machine learning bandgaps of double perovskites

    DOE PAGES Beta

    Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T.

    2016-01-19

    The ability to make rapid and accurate predictions on bandgaps of double perovskites is of much practical interest for a range of applications. While quantum mechanical computations for high-fidelity bandgaps are enormously computation-time intensive and thus impractical in high throughput studies, informatics-based statistical learning approaches can be a promising alternative. Here we demonstrate a systematic feature-engineering approach and a robust learning framework for efficient and accurate predictions of electronic bandgaps of double perovskites. After evaluating a set of more than 1.2 million features, we identify lowest occupied Kohn-Sham levels and elemental electronegativities of the constituent atomic species as the most crucial and relevant predictors. As a result, the developed models are validated and tested using the best practices of data science and further analyzed to rationalize their prediction performance.

  6. Intracortical Brain-Machine Interfaces Advance Sensorimotor Neuroscience

    PubMed Central

    Schroeder, Karen E.; Chestek, Cynthia A.

    2016-01-01

    Brain-machine interfaces (BMIs) decode brain activity to control external devices. Over the past two decades, the BMI community has grown tremendously and reached some impressive milestones, including the first human clinical trials using chronically implanted intracortical electrodes. It has also contributed experimental paradigms and important findings to basic neuroscience. In this review, we discuss neuroscience achievements stemming from BMI research, specifically that based upon upper limb prosthetic control with intracortical microelectrodes. We will focus on three main areas: first, we discuss progress in neural coding of reaches in motor cortex, describing recent results linking high dimensional representations of cortical activity to muscle activation. Next, we describe recent findings on learning and plasticity in motor cortex on various time scales. Finally, we discuss how bidirectional BMIs have led to better understanding of somatosensation in and related to motor cortex. PMID:27445663

  7. Advanced Training Technologies and Learning Environments

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K. (Compiler); Malone, John B. (Compiler)

    1999-01-01

    This document contains the proceedings of the Workshop on Advanced Training Technologies and Learning Environments held at NASA Langley Research Center, Hampton, Virginia, March 9-10, 1999. The workshop was jointly sponsored by the University of Virginia's Center for Advanced Computational Technology and NASA. Workshop attendees were from NASA, other government agencies, industry, and universities. The objective of the workshop was to assess the status and effectiveness of different advanced training technologies and learning environments.

  8. Revisiting Warfarin Dosing Using Machine Learning Techniques

    PubMed Central

    Sharabiani, Ashkan; Bress, Adam; Douzali, Elnaz; Darabi, Houshang

    2015-01-01

    Determining the appropriate dosage of warfarin is an important yet challenging task. Several prediction models have been proposed to estimate a therapeutic dose for patients. The models are either clinical models which contain clinical and demographic variables or pharmacogenetic models which additionally contain the genetic variables. In this paper, a new methodology for warfarin dosing is proposed. The patients are initially classified into two classes. The first class contains patients who require doses of >30 mg/wk and the second class contains patients who require doses of ≤30 mg/wk. This phase is performed using relevance vector machines. In the second phase, the optimal dose for each patient is predicted by two clinical regression models that are customized for each class of patients. The prediction accuracy of the model was 11.6 in terms of root mean squared error (RMSE) and 8.4 in terms of mean absolute error (MAE). This was 15% and 5% lower, in terms of RMSE, than the IWPC and Gage models, respectively (the most widely used models in practice). In addition, the proposed model was compared with the fixed-dose approach of 35 mg/wk and with the model proposed by Sharabiani et al., and its superior performance was demonstrated in terms of both MAE and RMSE. PMID:26146514

  9. Learning by Design: Good Video Games as Learning Machines

    ERIC Educational Resources Information Center

    Gee, James Paul

    2005-01-01

    This article asks how good video and computer game designers manage to get new players to learn long, complex and difficult games. The short answer is that designers of good games have hit on excellent methods for getting people to learn and to enjoy learning. The longer answer is more complex. Integral to this answer are the good principles of…

  10. Machine learning strategies for systems with invariance properties

    SciTech Connect

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    2016-01-01

    Here, in many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.
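    The two strategies compared in this abstract can be illustrated on a toy rotation-invariant system. This is a sketch under stated assumptions: a 1-nearest-neighbour regressor stands in for the paper's random forests and neural networks, and the invariant "basis" is simply the squared norm of a 2-D input.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy system: the response depends only on the magnitude of a 2-D input,
# so it is invariant under rotations of the input vector.
X = rng.normal(size=(300, 2))
y = np.linalg.norm(X, axis=1) ** 2

def rotate(X, theta):
    c, s = np.cos(theta), np.sin(theta)
    return X @ np.array([[c, -s], [s, c]])

def nn_predict(X_tr, y_tr, X):
    d = np.linalg.norm(X[:, None, :] - X_tr[None, :, :], axis=2)
    return y_tr[np.argmin(d, axis=1)]

# Method 1: embed the invariance by training on an invariant input basis
# (the squared norm, which is unchanged by any rotation).
phi = lambda X: (X ** 2).sum(axis=1, keepdims=True)

# Method 2: train on the raw inputs plus randomly rotated copies, so the
# model has to *learn* the invariance from the enlarged data set.
thetas = rng.uniform(0, 2 * np.pi, 5)
X_aug = np.vstack([X] + [rotate(X, t) for t in thetas])
y_aug = np.tile(y, 6)

# The invariant-basis model gives identical predictions for rotated inputs
# by construction; the augmented model only approximates this.
X_query = rng.normal(size=(10, 2))
p1 = nn_predict(phi(X), y, phi(X_query))
p2 = nn_predict(phi(X), y, phi(rotate(X_query, 0.7)))
```

    Note that method 1 trains on the original 300 points while method 2 trains on 1800, which is the computational-cost gap the abstract refers to.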

  11. Machine learning strategies for systems with invariance properties

    NASA Astrophysics Data System (ADS)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.

  12. Machine learning strategies for systems with invariance properties

    DOE PAGES Beta

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    2016-05-06

    Here, in many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.

  13. Developing a PLC-friendly state machine model: lessons learned

    NASA Astrophysics Data System (ADS)

    Pessemier, Wim; Deconinck, Geert; Raskin, Gert; Saey, Philippe; Van Winckel, Hans

    2014-07-01

    Modern Programmable Logic Controllers (PLCs) have become an attractive platform for controlling real-time aspects of astronomical telescopes and instruments due to their increased versatility, performance and standardization. Likewise, vendor-neutral middleware technologies such as OPC Unified Architecture (OPC UA) have recently demonstrated that they can greatly facilitate the integration of these industrial platforms into the overall control system. Many practical questions arise, however, when building multi-tiered control systems that consist of PLCs for low level control, and conventional software and platforms for higher level control. How should the PLC software be structured, so that it can rely on well-known programming paradigms on the one hand, and be mapped to a well-organized OPC UA interface on the other hand? Which programming languages of the IEC 61131-3 standard closely match the problem domains of the abstraction levels within this structure? How can the recent additions to the standard (such as the support for namespaces and object-oriented extensions) facilitate a model based development approach? To what degree can our applications already take advantage of the more advanced parts of the OPC UA standard, such as the high expressiveness of the semantic modeling language that it defines, or the support for events, aggregation of data, automatic discovery, ... ? What are the timing and concurrency problems to be expected for the higher level tiers of the control system due to the cyclic execution of control and communication tasks by the PLCs? We try to answer these questions by demonstrating a semantic state machine model that can readily be implemented using IEC 61131 and OPC UA. One that does not aim to capture all possible states of a system, but rather one that attempts to organize the coarse-grained structure and behaviour of a system. In this paper we focus on the intricacies of this seemingly simple task, and on the lessons that we

  14. Advanced coordinate measuring machine at Sandia National Laboratories/California

    SciTech Connect

    Pilkey, R.D.; Klevgard, P.A.

    1993-03-01

    Sandia National Laboratories/California has acquired a new Moore M-48V CNC five-axis universal coordinate measuring machine (CMM). Site preparation, acceptance testing, and initial performance results are discussed. Unique features of the machine include a ceramic ram and vacuum evacuated laser pathways (VELPS). The implementation of a VELPS system on the machine imposed certain design requirements and entailed certain start-up problems. The machine's projected capabilities, workload, and research possibilities are outlined.

  15. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework. PMID:25807571
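    The classification task described above, labelling each measurement vector as secure or attacked, can be sketched with a deliberately simple supervised detector. Everything here is synthetic and illustrative: the measurement model, the sparse injection vector, and the nearest-centroid classifier (a stand-in for the batch supervised learners the paper evaluates on IEEE test systems).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy measurement vectors: "secure" samples around a nominal operating
# point, "attacked" samples shifted by a sparse false-data injection.
n, d = 200, 8
secure = rng.normal(0.0, 0.1, (n, d))
attack_dir = np.zeros(d)
attack_dir[:2] = 1.0                           # injection touches 2 sensors
attacked = rng.normal(0.0, 0.1, (n, d)) + attack_dir

X = np.vstack([secure, attacked])
y = np.hstack([np.zeros(n), np.ones(n)])       # 0 = secure, 1 = attacked

# Minimal supervised detector: nearest class centroid in measurement space.
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def classify(z):
    return int(np.linalg.norm(z - mu1) < np.linalg.norm(z - mu0))

test_secure = rng.normal(0.0, 0.1, d)
test_attacked = rng.normal(0.0, 0.1, d) + attack_dir
```

    The paper's point is that such learned decision rules can outperform detectors built on state-vector estimation, including against unobservable attacks; this sketch only shows the shape of the learning formulation.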

  16. Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines.

    PubMed

    Neftci, Emre O; Pedroni, Bruno U; Joshi, Siddharth; Al-Shedivat, Maruan; Cauwenberghs, Gert

    2016-01-01

    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines (S2Ms), a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. S2Ms perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate and fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based S2Ms outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware. PMID:27445650
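    The core mechanism above, a random blank-out mask over the connections on every presentation, can be sketched in a few lines. This is a minimal DropConnect-style illustration, not the S2M model itself (no spiking neurons or plasticity rule); the layer sizes and transmission probability are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# A single layer with unreliable synapses: on every presentation, each
# connection transmits independently with probability p (a random mask
# over the weights, as in DropConnect).
n_in, n_out, p = 16, 4, 0.5
W = rng.normal(size=(n_in, n_out))
x = rng.normal(size=n_in)

def stochastic_forward(x, W, p):
    mask = rng.random(W.shape) < p          # each synapse transmits or not
    return x @ (mask * W) / p               # rescale to preserve mean drive

# Averaging many stochastic passes approaches the deterministic response,
# which is what makes the unreliability usable for Monte Carlo sampling.
samples = np.stack([stochastic_forward(x, W, p) for _ in range(5000)])
deterministic = x @ W
```

    In the S2M this per-presentation stochasticity doubles as a regularizer during learning, which is why heavy synapse pruning costs so little accuracy.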

  17. Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines

    PubMed Central

    Neftci, Emre O.; Pedroni, Bruno U.; Joshi, Siddharth; Al-Shedivat, Maruan; Cauwenberghs, Gert

    2016-01-01

    Recent studies have shown that synaptic unreliability is a robust and sufficient mechanism for inducing the stochasticity observed in cortex. Here, we introduce Synaptic Sampling Machines (S2Ms), a class of neural network models that uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised learning. Similar to the original formulation of Boltzmann machines, these models can be viewed as a stochastic counterpart of Hopfield networks, but where stochasticity is induced by a random mask over the connections. Synaptic stochasticity plays the dual role of an efficient mechanism for sampling, and a regularizer during learning akin to DropConnect. A local synaptic plasticity rule implementing an event-driven form of contrastive divergence enables the learning of generative models in an on-line fashion. S2Ms perform equally well using discrete-timed artificial units (as in Hopfield networks) or continuous-timed leaky integrate and fire neurons. The learned representations are remarkably sparse and robust to reductions in bit precision and synapse pruning: removal of more than 75% of the weakest connections followed by cursory re-learning causes a negligible performance loss on benchmark classification tasks. The spiking neuron-based S2Ms outperform existing spike-based unsupervised learners, while potentially offering substantial advantages in terms of power and complexity, and are thus promising models for on-line learning in brain-inspired hardware. PMID:27445650

  18. Multivariate Mapping of Environmental Data Using Extreme Learning Machines

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2014-05-01

    In most real cases environmental data are multivariate, highly variable at several spatio-temporal scales, and are generated by nonlinear and complex phenomena. Mapping, i.e. spatial prediction of such data, is a challenging problem. Machine learning algorithms, being universal nonlinear tools, have demonstrated their efficiency in modelling environmental spatial and space-time data (Kanevski et al. 2009). Recently, a new machine learning approach, the Extreme Learning Machine (ELM), has gained great popularity. ELM is a fast and powerful learning algorithm. Developed by G.-B. Huang et al. (2006), it follows the structure of a multilayer perceptron (MLP) with a single hidden layer, i.e. a single-hidden-layer feedforward neural network (SLFN). The learning step of classical artificial neural networks, like MLP, deals with the optimization of weights and biases using a gradient-based learning algorithm (e.g. the back-propagation algorithm). As opposed to this optimization phase, which can fall into local minima, ELM generates the weights between the input layer and the hidden layer, as well as the biases in the hidden layer, randomly. Given this initialization, it optimizes only the weight vector between the hidden layer and the output layer, in a single step. The main advantage of this algorithm is the speed of the learning step. In theory, by growing the number of hidden nodes, the algorithm can learn any set of training data with zero error. To avoid overfitting, cross-validation or "true validation" (randomly splitting the data into training, validation and testing subsets) is recommended in order to find an optimal number of neurons. With its universal approximation property and solid theoretical basis, ELM is a good machine learning algorithm which can push the field forward. 
The present research deals with an extension of ELM to multivariate output modelling and application of ELM to the real data case study - pollution of the sediments in
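    The ELM training recipe described in this abstract (random, fixed input-to-hidden weights; a single least-squares solve for the output weights) can be sketched directly. The regression task, layer width, and tanh activation are illustrative choices, not the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy regression task standing in for an environmental mapping problem.
X = rng.uniform(-1, 1, (300, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2

# ELM step 1: the input-to-hidden weights and biases are drawn at random
# and never trained.
n_hidden = 100
W = rng.normal(size=(2, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                       # hidden-layer activations

# ELM step 2: only the hidden-to-output weights are fitted, in a single
# least-squares step (no gradient descent, hence the speed).
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

y_hat = H @ beta
train_mse = np.mean((y_hat - y) ** 2)
```

    As the abstract notes, `n_hidden` is the knob to tune by cross-validation: with enough hidden nodes the training error can be driven to zero, at the risk of overfitting.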

  19. A comparative analysis of support vector machines and extreme learning machines.

    PubMed

    Liu, Xueyi; Gao, Chuanhou; Li, Ping

    2012-09-01

    The theory of extreme learning machines (ELMs) has recently become increasingly popular. As a new learning algorithm for single-hidden-layer feed-forward neural networks, an ELM offers the advantages of low computational cost, good generalization ability, and ease of implementation. Hence the comparison and model selection between ELMs and other kinds of state-of-the-art machine learning approaches has become significant and has attracted many research efforts. This paper performs a comparative analysis of the basic ELMs and support vector machines (SVMs) from two viewpoints that are different from previous works: one is the Vapnik-Chervonenkis (VC) dimension, and the other is their performance under different training sample sizes. It is shown that the VC dimension of an ELM is equal to the number of hidden nodes of the ELM with probability one. Additionally, their generalization ability and computational complexity are exhibited with changing training sample size. ELMs have weaker generalization ability than SVMs for small samples but can generalize as well as SVMs for large samples. Remarkably, ELMs show great superiority in computational speed, especially for large-scale sample problems. The results obtained can provide insight into the essential relationship between them, and can also serve as complementary knowledge for their past experimental and theoretical comparisons. PMID:22572469

  20. Predicting Coronal Mass Ejections Using Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Bobra, M. G.; Ilonidis, S.

    2016-04-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections (CMEs). Usually, solar active regions that produce large flares will also produce a CME, but this is not always true. Despite advances in numerical modeling, it is still unclear which circumstances will produce a CME. Therefore, it is worthwhile to empirically determine which features distinguish flares associated with CMEs from flares that are not. At this time, no extensive study has used physically meaningful features of active regions to distinguish between these two populations. As such, we attempt to do so by using features derived from (1) photospheric vector magnetic field data taken by the Solar Dynamics Observatory’s Helioseismic and Magnetic Imager instrument and (2) X-ray flux data from the Geostationary Operational Environmental Satellite’s X-ray Flux instrument. We build a catalog of active regions that either produced both a flare and a CME (the positive class) or simply a flare (the negative class). We then use machine-learning algorithms to (1) determine which features distinguish these two populations, and (2) forecast whether an active region that produces an M- or X-class flare will also produce a CME. We compute the True Skill Statistic, a forecast verification metric, and find that it is a relatively high value of ∼0.8 ± 0.2. We conclude that a combination of six parameters, which are all intensive in nature, will capture most of the relevant information contained in the photospheric magnetic field.
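    The forecast verification metric quoted above, the True Skill Statistic, has a simple closed form over the 2x2 contingency table of forecasts versus outcomes. The counts below are purely hypothetical illustrative numbers, not the paper's results.

```python
# True Skill Statistic (TSS) from a 2x2 forecast contingency table:
#   TSS = hit rate - false alarm rate = TP/(TP+FN) - FP/(FP+TN),
# ranging from -1 to +1, with 0 meaning no skill over random forecasts.
def true_skill_statistic(tp, fn, fp, tn):
    return tp / (tp + fn) - fp / (fp + tn)

# Hypothetical counts: 40 flaring regions correctly forecast to produce a
# CME, 10 missed, 5 false alarms, 45 correct rejections.
tss = true_skill_statistic(tp=40, fn=10, fp=5, tn=45)  # 0.8 - 0.1 = 0.7
```

    Unlike accuracy, TSS is insensitive to the class imbalance between CME-producing and CME-quiet flares, which is why it is the standard choice in flare and CME forecasting studies.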

  1. Icing detection from geostationary satellite data using machine learning approaches

    NASA Astrophysics Data System (ADS)

    Lee, J.; Ha, S.; Sim, S.; Im, J.

    2015-12-01

    Icing can cause significant structural damage to aircraft during flight, resulting in various aviation accidents. Icing studies have typically been performed using two approaches: one is a numerical model-based approach and the other is a remote sensing-based approach. The model-based approach diagnoses aircraft icing using numerical atmospheric parameters such as temperature, relative humidity, and vertical thermodynamic structure. This approach tends to over-estimate icing according to the literature. The remote sensing-based approach typically uses meteorological satellite/ground sensor data such as Geostationary Operational Environmental Satellite (GOES) and Dual-Polarization radar data. This approach detects icing areas by applying thresholds to parameters such as liquid water path and cloud optical thickness derived from remote sensing data. In this study, we propose an aircraft icing detection approach which optimizes thresholds for L1B bands and/or Cloud Optical Thickness (COT) from the Communication, Ocean and Meteorological Satellite-Meteorological Imager (COMS MI) and the newly launched Himawari-8 Advanced Himawari Imager (AHI) over East Asia. The proposed approach uses machine learning algorithms including decision trees (DT) and random forest (RF) for optimizing thresholds of L1B data and/or COT. Pilot Reports (PIREPs) from South Korea and Japan were used as icing reference data. Results show that RF produced a lower false alarm rate (1.5%) and a higher overall accuracy (98.8%) than DT (8.5% and 75.3%), respectively. The RF-based approach was also compared with the existing COMS MI and GOES-R icing mask algorithms. The agreements of the proposed approach with the existing two algorithms were 89.2% and 45.5%, respectively. The lower agreement with the GOES-R algorithm was possibly due to the high uncertainty of the cloud phase product from COMS MI.

  2. The ligand binding mechanism to purine nucleoside phosphorylase elucidated via molecular dynamics and machine learning

    PubMed Central

    Decherchi, Sergio; Berteotti, Anna; Bottegoni, Giovanni; Rocchia, Walter; Cavalli, Andrea

    2015-01-01

    The study of biomolecular interactions between a drug and its biological target is of paramount importance for the design of novel bioactive compounds. In this paper, we report on the use of molecular dynamics (MD) simulations and machine learning to study the binding mechanism of a transition state analogue (DADMe–immucillin-H) to the purine nucleoside phosphorylase (PNP) enzyme. Microsecond-long MD simulations allow us to observe several binding events, following different dynamical routes and reaching diverse binding configurations. These simulations are used to estimate kinetic and thermodynamic quantities, such as kon and binding free energy, obtaining a good agreement with available experimental data. In addition, we advance a hypothesis for the slow-onset inhibition mechanism of DADMe–immucillin-H against PNP. Combining extensive MD simulations with machine learning algorithms could therefore be a fruitful approach for capturing key aspects of drug–target recognition and binding. PMID:25625196

  3. Robust airway extraction based on machine learning and minimum spanning tree

    NASA Astrophysics Data System (ADS)

    Inoue, Tsutomu; Kitamura, Yoshiro; Li, Yuanzhong; Ito, Wataru

    2013-02-01

Recent advances in MDCT have improved the quality of 3D images. Virtual Bronchoscopy has been used before and during bronchoscopic examinations for biopsy. However, Virtual Bronchoscopy has become widely used only for the examination of proximal airway diseases, because conventional airway extraction methods often fail to extract peripheral airways with low image contrast. In this paper, we propose a machine learning based method which improves extraction robustness remarkably. The method consists of four steps. In the first step, we use Hessian analysis to detect as many airway candidates as possible. In the second, false positives are reduced effectively by introducing a machine learning method. In the third, an airway tree is constructed from the airway candidates by utilizing a minimum spanning tree algorithm. In the fourth, we extract airway regions by using Graph cuts. Experimental results evaluated by a standardized evaluation framework show that our method can extract peripheral airways very well.
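The tree-construction step (step three) can be sketched with SciPy's minimum spanning tree on a handful of hypothetical airway-candidate coordinates. This is a toy illustration of the idea, not the authors' implementation; the points and distance weighting are assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

# Hypothetical 3-D coordinates of airway candidate points that survived
# the false-positive reduction step (values are illustrative only).
candidates = np.array([
    [0.0, 0.0, 0.0],    # trachea seed
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 2.0],
    [-0.5, 0.0, 2.0],
    [1.0, 0.0, 3.0],
])

# Build a dense graph weighted by Euclidean distance between candidates,
# then extract the minimum spanning tree connecting all of them.
weights = cdist(candidates, candidates)
tree = minimum_spanning_tree(weights)

# An MST over n connected nodes has n - 1 edges; these edges approximate
# the airway tree's branching topology.
edges = np.transpose(tree.nonzero())
print(len(edges))  # 4 edges for 5 candidate points
```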

  4. Machine learning classification of SDSS transient survey images

    NASA Astrophysics Data System (ADS)

    du Buisson, L.; Sivanandam, N.; Bassett, Bruce A.; Smith, M.

    2015-12-01

    We show that multiple machine learning algorithms can match human performance in classifying transient imaging data from the Sloan Digital Sky Survey (SDSS) supernova survey into real objects and artefacts. This is a first step in any transient science pipeline and is currently still done by humans, but future surveys such as the Large Synoptic Survey Telescope (LSST) will necessitate fully machine-enabled solutions. Using features trained from eigenimage analysis (principal component analysis, PCA) of single-epoch g, r and i difference images, we can reach a completeness (recall) of 96 per cent, while only incorrectly classifying at most 18 per cent of artefacts as real objects, corresponding to a precision (purity) of 84 per cent. In general, random forests performed best, followed by the k-nearest neighbour and the SkyNet artificial neural net algorithms, compared to other methods such as naive Bayes and kernel support vector machine. Our results show that PCA-based machine learning can match human success levels and can naturally be extended by including multiple epochs of data, transient colours and host galaxy information which should allow for significant further improvements, especially at low signal-to-noise.
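The eigenimage-plus-random-forest pipeline can be sketched with scikit-learn on synthetic cutouts. The blob model, image size, and all parameters below are illustrative assumptions, not the SDSS data or the paper's feature set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for single-epoch difference-image cutouts:
# "real" objects carry a centred Gaussian blob, artefacts are pure noise.
n, side = 400, 16
yy, xx = np.mgrid[:side, :side]
blob = np.exp(-((yy - side / 2) ** 2 + (xx - side / 2) ** 2) / 8.0)
real = rng.normal(0, 1, (n // 2, side * side)) + 4 * blob.ravel()
fake = rng.normal(0, 1, (n // 2, side * side))
X = np.vstack([real, fake])
y = np.r_[np.ones(n // 2), np.zeros(n // 2)]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Eigenimage features: project each cutout onto the leading principal
# components, then classify real vs artefact with a random forest.
pca = PCA(n_components=10).fit(X_tr)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(pca.transform(X_tr), y_tr)
accuracy = clf.score(pca.transform(X_te), y_te)
print(round(accuracy, 2))
```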

  5. Predicting Methylphenidate Response in ADHD Using Machine Learning Approaches

    PubMed Central

    Kim, Jae-Won; Sharma, Vinod

    2015-01-01

    Background: There are no objective, biological markers that can robustly predict methylphenidate response in attention deficit hyperactivity disorder. This study aimed to examine whether applying machine learning approaches to pretreatment demographic, clinical questionnaire, environmental, neuropsychological, neuroimaging, and genetic information can predict therapeutic response following methylphenidate administration. Methods: The present study included 83 attention deficit hyperactivity disorder youth. At baseline, parents completed the ADHD Rating Scale-IV and Disruptive Behavior Disorder rating scale, and participants undertook the continuous performance test, Stroop color word test, and resting-state functional MRI scans. The dopamine transporter gene, dopamine D4 receptor gene, alpha-2A adrenergic receptor gene (ADRA2A) and norepinephrine transporter gene polymorphisms, and blood lead and urine cotinine levels were also measured. The participants were enrolled in an 8-week, open-label trial of methylphenidate. Four different machine learning algorithms were used for data analysis. Results: Support vector machine classification accuracy was 84.6% (area under receiver operating characteristic curve 0.84) for predicting methylphenidate response. The age, weight, ADRA2A MspI and DraI polymorphisms, lead level, Stroop color word test performance, and oppositional symptoms of Disruptive Behavior Disorder rating scale were identified as the most differentiating subset of features. Conclusions: Our results provide preliminary support to the translational development of support vector machine as an informative method that can assist in predicting treatment response in attention deficit hyperactivity disorder, though further work is required to provide enhanced levels of classification performance. PMID:25964505

  6. Optimizing extreme learning machine for hyperspectral image classification

    NASA Astrophysics Data System (ADS)

    Li, Jiaojiao; Du, Qian; Li, Wei; Li, Yunsong

    2015-01-01

Extreme learning machine (ELM) is of great interest to the machine learning community due to its extremely simple training step. Its performance sensitivity to the number of hidden neurons is studied in the context of hyperspectral remote sensing image classification. An empirical linear relationship between the number of training samples and the number of hidden neurons is proposed. Such a relationship can be easily estimated with two small training sets and extended to large training sets to greatly reduce computational cost. The kernel version of ELM (KELM) is also implemented with the radial basis function kernel, and the linear relationship still holds. The experimental results demonstrate that when the number of hidden neurons is appropriate, the performance of ELM may be slightly lower than that of the linear SVM, but the performance of KELM can be comparable to that of the kernel version of SVM (KSVM). The computational cost of ELM and KELM is much lower than that of the linear SVM and KSVM, respectively.
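The "extremely simple training step" referred to here is that only the output weights are learned, in a single least-squares solve. A basic ELM of this kind can be sketched in a few lines of NumPy; the toy problem and hidden-layer size are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y, n_hidden):
    """Train a basic extreme learning machine: random input weights and
    biases, sigmoid hidden layer, output weights via least squares."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # single analytic step
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy two-class problem: separate points by the sign of x0 + x1.
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W, b, beta = elm_train(X, y, n_hidden=40)
accuracy = np.mean((elm_predict(X, W, b, beta) > 0.5) == y)
print(round(accuracy, 3))
```

The number of hidden neurons (`n_hidden`) is the sensitivity knob the abstract discusses: too few underfits, too many wastes computation.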

  7. Survey of Machine Learning Methods for Database Security

    NASA Astrophysics Data System (ADS)

    Kamra, Ashish; Ber, Elisa

    Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.

  8. Introduction to machine learning: k-nearest neighbors

    PubMed Central

    2016-01-01

Machine learning techniques have been widely used in many scientific fields, but their use in the medical literature is limited, partly because of technical difficulties. k-nearest neighbors (kNN) is a simple method of machine learning. The article introduces some basic ideas underlying the kNN algorithm and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After prediction of the outcome with the kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the most widely used statistic for assessing the performance of the kNN algorithm. Factors such as the k value, the distance calculation, and the choice of appropriate predictors all have a significant impact on model performance. PMID:27386492
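The article works in R with knn(); the same distance-and-majority-vote idea can be sketched in Python (toy data; the helper function name is hypothetical):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points
    (Euclidean distance), the core of the kNN algorithm."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters labelled 0 and 1.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.1, 0.1])))  # 0
print(knn_predict(X_train, y_train, np.array([3.0, 3.1])))  # 1
```

The choice of k and of the distance metric, which the article flags as influential, corresponds here to the `k` argument and the norm used in `dists`.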

  9. Closure modeling using field inversion and machine learning

    NASA Astrophysics Data System (ADS)

    Duraisamy, Karthik

    2015-11-01

    The recent acceleration in computational power and measurement resolution has made possible the availability of extreme scale simulations and data sets. In this work, a modeling paradigm that seeks to comprehensively harness large scale data is introduced, with the aim of improving closure models. Full-field inversion (in contrast to parameter estimation) is used to obtain corrective, spatially distributed functional terms, offering a route to directly address model-form errors. Once the inference has been performed over a number of problems that are representative of the deficient physics in the closure model, machine learning techniques are used to reconstruct the model corrections in terms of variables that appear in the closure model. These machine-learned functional forms are then used to augment the closure model in predictive computations. The approach is demonstrated to be able to successfully reconstruct functional corrections and yield predictions with quantified uncertainties in a range of turbulent flows.

  10. Robust Extreme Learning Machine With its Application to Indoor Positioning.

    PubMed

    Lu, Xiaoxuan; Zou, Han; Zhou, Hongming; Xie, Lihua; Huang, Guang-Bin

    2016-01-01

The increasing demands of location-based services have spurred the rapid development of indoor positioning systems (IPSs), also known as indoor localization systems. However, the performance of IPSs suffers from noisy measurements. In this paper, two kinds of robust extreme learning machines (RELMs), corresponding to the close-to-mean constraint and the small-residual constraint, have been proposed to address the issue of noisy measurements in IPSs. Based on whether the feature mapping in the extreme learning machine is explicit, we provide random-hidden-nodes and kernelized formulations of RELMs, respectively, via second-order cone programming. Furthermore, the computation of the covariance in feature space is discussed. Simulations and real-world indoor localization experiments are extensively carried out, and the results demonstrate that the proposed algorithms can not only improve the accuracy and repeatability, but also reduce the deviation and worst-case error of IPSs compared with other baseline algorithms. PMID:26684258

  11. Introduction to machine learning: k-nearest neighbors.

    PubMed

    Zhang, Zhongheng

    2016-06-01

Machine learning techniques have been widely used in many scientific fields, but their use in the medical literature is limited, partly because of technical difficulties. k-nearest neighbors (kNN) is a simple method of machine learning. The article introduces some basic ideas underlying the kNN algorithm and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After prediction of the outcome with the kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the most widely used statistic for assessing the performance of the kNN algorithm. Factors such as the k value, the distance calculation, and the choice of appropriate predictors all have a significant impact on model performance. PMID:27386492

  12. Stochastic Local Interaction (SLI) model: Bridging machine learning and geostatistics

    NASA Astrophysics Data System (ADS)

    Hristopulos, Dionissios T.

    2015-12-01

    Machine learning and geostatistics are powerful mathematical frameworks for modeling spatial data. Both approaches, however, suffer from poor scaling of the required computational resources for large data applications. We present the Stochastic Local Interaction (SLI) model, which employs a local representation to improve computational efficiency. SLI combines geostatistics and machine learning with ideas from statistical physics and computational geometry. It is based on a joint probability density function defined by an energy functional which involves local interactions implemented by means of kernel functions with adaptive local kernel bandwidths. SLI is expressed in terms of an explicit, typically sparse, precision (inverse covariance) matrix. This representation leads to a semi-analytical expression for interpolation (prediction), which is valid in any number of dimensions and avoids the computationally costly covariance matrix inversion.

  13. Explanatory approach for evaluation of machine learning-induced knowledge.

    PubMed

    Zorman, Milan; Verlic, M

    2009-01-01

    Progress in biomedical research has resulted in an explosive growth of data. Use of the world wide web for sharing data has opened up possibilities for exhaustive data mining analysis. Symbolic machine learning approaches used in data mining, especially ensemble approaches, produce large sets of patterns that need to be evaluated. Manual evaluation of all patterns by a human expert is almost impossible. We propose a new approach to the evaluation of machine learning-induced knowledge by introducing a pre-evaluation step. Pre-evaluation is the automatic evaluation of patterns obtained from the data mining phase, using text mining techniques and sentiment analysis. It is used as a filter for patterns according to the support found in online resources, such as publicly-available repositories of scientific papers and reports related to the problem. The domain expert can then more easily distinguish between patterns or rules that are potential candidates for new knowledge. PMID:19930862

  14. Prototype Vector Machine for Large Scale Semi-Supervised Learning

    SciTech Connect

    Zhang, Kai; Kwok, James T.; Parvin, Bahram

    2009-04-29

Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.

  15. Application of machine learning to structural molecular biology.

    PubMed

    Sternberg, M J; King, R D; Lewis, R A; Muggleton, S

    1994-06-29

A technique of machine learning, inductive logic programming implemented in the program GOLEM, has been applied to three problems in structural molecular biology. These problems are: the prediction of protein secondary structure; the identification of rules governing the arrangement of beta-sheet strands in the tertiary folding of proteins; and the modelling of a quantitative structure-activity relationship (QSAR) of a series of drugs. For secondary structure prediction and the QSAR, GOLEM yielded predictions comparable with contemporary approaches including neural networks. Rules for beta-strand arrangement are derived, and it is planned to contrast their accuracy with those obtained by human inspection. In all three studies GOLEM discovered rules that provided insight into the stereochemistry of the system. We conclude that machine learning, used together with human intervention, will provide a powerful tool to discover patterns in biological sequences and structures. PMID:7800706

  16. Smarter Instruments, Smarter Archives: Machine Learning for Tactical Science

    NASA Astrophysics Data System (ADS)

    Thompson, D. R.; Kiran, R.; Allwood, A.; Altinok, A.; Estlin, T.; Flannery, D.

    2014-12-01

    There has been a growing interest by Earth and Planetary Sciences in machine learning, visualization and cyberinfrastructure to interpret ever-increasing volumes of instrument data. Such tools are commonly used to analyze archival datasets, but they can also play a valuable real-time role during missions. Here we discuss ways that machine learning can benefit tactical science decisions during Earth and Planetary Exploration. Machine learning's potential begins at the instrument itself. Smart instruments endowed with pattern recognition can immediately recognize science features of interest. This allows robotic explorers to optimize their limited communications bandwidth, triaging science products and prioritizing the most relevant data. Smart instruments can also target their data collection on the fly, using principles of experimental design to reduce redundancy and generally improve sampling efficiency for time-limited operations. Moreover, smart instruments can respond immediately to transient or unexpected phenomena. Examples include detections of cometary plumes, terrestrial floods, or volcanism. We show recent examples of smart instruments from 2014 tests including: aircraft and spacecraft remote sensing instruments that recognize cloud contamination, field tests of a "smart camera" for robotic surface geology, and adaptive data collection by X-Ray fluorescence spectrometers. Machine learning can also assist human operators when tactical decision making is required. Terrestrial scenarios include airborne remote sensing, where the decision to re-fly a transect must be made immediately. Planetary scenarios include deep space encounters or planetary surface exploration, where the number of command cycles is limited and operators make rapid daily decisions about where next to collect measurements. Visualization and modeling can reveal trends, clusters, and outliers in new data. 
This can help operators recognize instrument artifacts or spot anomalies in real time.

  17. Machine-learning-assisted materials discovery using failed experiments

    NASA Astrophysics Data System (ADS)

    Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.; Falk, Casey; Wenny, Malia B.; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A.; Schrier, Joshua; Norquist, Alexander J.

    2016-05-01

    Inorganic–organic hybrid materials such as organically templated metal oxides, metal–organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure–property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully

  18. Machine-learning-assisted materials discovery using failed experiments.

    PubMed

    Raccuglia, Paul; Elbert, Katherine C; Adler, Philip D F; Falk, Casey; Wenny, Malia B; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A; Schrier, Joshua; Norquist, Alexander J

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on 'dark' reactions--failed or unsuccessful hydrothermal syntheses--collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. 
When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions

  19. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W.

    1992-01-01

We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.

  20. Machine Learning for Flood Prediction in Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.

    2015-12-01

    With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct at significant financial cost. In addition, desktop modeling software and limited local server storage can impose restraints on the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable in this platform. Our project pushes these limits by testing the performance of two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, at predicting flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created using the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of deploying unbalanced training data sets based on rare extreme events.

  1. Anomaly detection for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, Ben; Rau, Markus Michael; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen

    2015-10-01

We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million `clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 `anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed `anomaly-removed' sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.
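The Elliptical Envelope preprocessing step can be sketched with scikit-learn's EllipticEnvelope on synthetic stand-in features. The data, dimensionality, and contamination fraction below are illustrative assumptions, not the SDSS sample.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)

# Stand-ins for photometric training features: a clean Gaussian core
# plus a small fraction of contaminating outliers (the analogue of
# galaxies with unreliable redshifts).
clean = rng.normal(0, 1, (500, 4))
outliers = rng.uniform(8, 12, (25, 4))
X = np.vstack([clean, outliers])

# Fit an elliptical envelope and flag points outside it as anomalies;
# `contamination` is the assumed outlier fraction.
detector = EllipticEnvelope(contamination=0.05, random_state=0)
labels = detector.fit_predict(X)   # +1 inlier, -1 anomaly

# Redshift training would then proceed on the anomaly-removed sample.
X_clean = X[labels == 1]
print(X_clean.shape[0])  # roughly 500 of the 525 points survive
```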

  2. Machine Shop Suggested Job and Task Sheets. Part II. 21 Advanced Jobs.

    ERIC Educational Resources Information Center

    Texas A and M Univ., College Station. Vocational Instructional Services.

    This volume consists of advanced job and task sheets adaptable for use in the regular vocational industrial education programs for the training of machinists and machine shop operators. Twenty-one advanced machine shop job sheets are included. Some or all of this material is provided for each job: an introductory sheet with aim, checking…

  3. Machine Learning Techniques in Optimal Design

    NASA Technical Reports Server (NTRS)

    Cerbone, Giuseppe

    1992-01-01

    to the problem, is then obtained by solving in parallel each of the sub-problems in the set and computing the one with the minimum cost. In addition to speeding up the optimization process, our use of learning methods also relieves the expert from the burden of identifying rules that exactly pinpoint optimal candidate sub-problems. In real engineering tasks it is usually too costly to the engineers to derive such rules. Therefore, this paper also contributes to a further step towards the solution of the knowledge acquisition bottleneck [Feigenbaum, 1977] which has somewhat impaired the construction of rulebased expert systems.

  4. Some Principles of Learning and Learning with the Aid of Machines.

    ERIC Educational Resources Information Center

    Dolyatovskii, V. A.; Sotnikov, E. M.

    A translated Soviet document describes some theories of learning, and the practical problems of developing a teaching machine--as taught in an Industrial Electronics course (in the automation and telemechanics curriculum). The point is stressed that the growing number of students at institutions of higher learning in the Soviet Union, up forty…

  5. Advanced coordinate measuring machine at Sandia National Laboratories/California

    SciTech Connect

    Pilkey, R.D.; Klevgard, P.A.

    1993-03-01

    Sandia National Laboratories/California has acquired a new Moore M-48V CNC five-axis universal coordinate measuring machine (CMM). Site preparation, acceptance testing, and initial performance results are discussed. Unique features of the machine include a ceramic ram and vacuum evacuated laser pathways (VELPS). The implementation of a VELPS system on the machine imposed certain design requirements and entailed certain start-up problems. The machine's projected capabilities, workload, and research possibilities are outlined.

  6. Security Aspects of Smart Cards vs. Embedded Security in Machine-to-Machine (M2M) Advanced Mobile Network Applications

    NASA Astrophysics Data System (ADS)

    Meyerstein, Mike; Cha, Inhyok; Shah, Yogendra

    The Third Generation Partnership Project (3GPP) standardisation group currently discusses advanced applications of mobile networks such as Machine-to-Machine (M2M) communication. Several security issues arise in these contexts which warrant a fresh look at mobile networks’ security foundations, resting on smart cards. This paper contributes a security/efficiency analysis to this discussion and highlights the role of trusted platform technology to approach these issues.

  7. Machine-z: rapid machine-learned redshift indicator for Swift gamma-ray bursts

    NASA Astrophysics Data System (ADS)

    Ukwatta, T. N.; Woźniak, P. R.; Gehrels, N.

    2016-06-01

Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce `machine-z', a redshift prediction algorithm and a `high-z' classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With 40 per cent false positive rate the classifier can achieve ~100 per cent recall. The most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.
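The regression-plus-classification setup can be sketched with scikit-learn random forests on synthetic stand-in features. The feature model, the placement of the high-z cut at z > 3, and the sample size are illustrative assumptions, not the Swift sample or the machine-z feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical prompt-emission features and redshifts: the true z here
# is a noisy function of the features, standing in for the GRB sample.
n = 300
features = rng.normal(size=(n, 5))
z = np.abs(2.0 + features[:, 0] + 0.5 * features[:, 1]
           + rng.normal(0, 0.3, n))

# Regression: cross-validated "machine-z" style predictions.
reg = RandomForestRegressor(n_estimators=100, random_state=0)
z_pred = cross_val_predict(reg, features, z, cv=5)
corr = np.corrcoef(z, z_pred)[0, 1]

# Classification: a high-z flag (here, z > 3) trained the same way;
# recall measures the fraction of true high-z events recovered.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
high_z = (z > 3).astype(int)
flag = cross_val_predict(clf, features, high_z, cv=5)
recall = flag[high_z == 1].mean()
print(round(corr, 2), round(recall, 2))
```

Combining the regressor's continuous prediction with the classifier's flag, as the entry describes, lets the two views of the same features cross-check each other.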

  8. Knowledge discovery via machine learning for neurodegenerative disease researchers.

    PubMed

    Ozyurt, I Burak; Brown, Gregory G

    2009-01-01

    The ever-increasing size of the biomedical literature makes more precise information retrieval and tapping into implicit knowledge in scientific literature a necessity. In this chapter, first, three new variants of the expectation-maximization (EM) method for semisupervised document classification (Machine Learning 39:103-134, 2000) are introduced to refine biomedical literature meta-searches. The retrieval performance of a multi-mixture per class EM variant with Agglomerative Information Bottleneck clustering (Slonim and Tishby (1999) Agglomerative information bottleneck. In Proceedings of NIPS-12) using the Davies-Bouldin cluster validity index (IEEE Transactions on Pattern Analysis and Machine Intelligence 1:224-227, 1979) rivaled the state-of-the-art transductive support vector machines (TSVM) (Joachims (1999) Transductive inference for text classification using support vector machines. In Proceedings of the International Conference on Machine Learning (ICML)). Moreover, the multi-mixture per class EM variant refined search results more quickly, with more than one order of magnitude improvement in execution time compared with TSVM. A second tool, CRFNER, uses conditional random fields (Lafferty et al. (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML-2001) to recognize 15 types of named entities from schizophrenia abstracts, outperforming ABNER (Settles (2004) Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of COLING 2004 International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA)) in biological named entity recognition and reaching F1 performance of 82.5% on the second set of named entities. PMID:19623491
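
    The semisupervised EM idea (fit a classifier on the few labelled documents, then alternate between labelling the unlabelled ones and refitting) can be sketched with naive Bayes as the mixture model. This is a generic hard-label variant on toy documents, not the chapter's multi-mixture per class method.

```python
# EM-style semisupervised text classification: E-step labels the unlabelled
# documents, M-step refits naive Bayes on everything. Toy documents only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

labelled = ["dopamine receptor binding", "striatal dopamine neurons",
            "glucose insulin metabolism", "insulin resistance diabetes"]
y = np.array([0, 0, 1, 1])                        # 0 = neuro, 1 = metabolic
unlabelled = ["dopamine transporter imaging", "insulin secretion assay"]

vec = CountVectorizer()
Xall = vec.fit_transform(labelled + unlabelled)   # shared vocabulary
X_lab, X_unl = Xall[:4], Xall[4:]

clf = MultinomialNB().fit(X_lab, y)               # initialise on labelled data
for _ in range(5):                                # a few EM iterations
    y_unl = clf.predict_proba(X_unl).argmax(axis=1)          # E-step
    clf = MultinomialNB().fit(Xall, np.concatenate([y, y_unl]))  # M-step
pred = clf.predict(X_unl)
```

    Full EM would weight the unlabelled documents by their posterior probabilities rather than committing to hard labels each round.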

  9. Mining the Galaxy Zoo Database: Machine Learning Applications

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.

    2010-01-01

    The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we are also aiming to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunteer-provided classifications). Our second case study will focus on similar data mining analyses and machine learning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years -- the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.
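
    The first case study, learning which catalog features correlate most strongly with the crowd-sourced morphology label, can be sketched with random-forest feature importances. The feature names below are hypothetical placeholders, not actual SDSS column names, and the labels are synthetic.

```python
# Rank which catalog features best predict a crowd-sourced morphology label,
# using random-forest feature importances. Data and names are stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(1)
features = ["concentration", "color_g_r", "petro_radius", "axis_ratio"]
X = rng.normal(size=(2000, len(features)))
# Synthetic "spiral vs elliptical" labels driven mainly by concentration.
y = (X[:, 0] + 0.3 * rng.normal(size=2000) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
ranked = sorted(zip(clf.feature_importances_, features), reverse=True)
```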

  10. Automatic pathology classification using a single feature machine learning - support vector machines

    NASA Astrophysics Data System (ADS)

    Yepes-Calderon, Fernando; Pedregosa, Fabian; Thirion, Bertrand; Wang, Yalin; Lepore, Natasha

    2014-03-01

    Magnetic Resonance Imaging (MRI) has been gaining popularity in the clinic in recent years as a safe in-vivo imaging technique. As a result, large troves of data are being gathered and stored daily that may be used as clinical training sets in hospitals. While numerous machine learning (ML) algorithms have been implemented for Alzheimer's disease classification, their outputs are usually difficult to interpret in the clinical setting. Here, we propose a simple method of rapid diagnostic classification for the clinic using Support Vector Machines (SVM) and easy to obtain geometrical measurements that, together with a cortical and sub-cortical brain parcellation, create a robust framework capable of automatic diagnosis with high accuracy. On a large imaging dataset consisting of over 800 subjects taken from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, classification-success indexes of up to 99.2% are reached with a single measurement.
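
    The single-measurement idea, an SVM fed one easy-to-obtain geometric quantity per subject, can be sketched as follows. The measurement and group means are invented for illustration; they are not ADNI values.

```python
# Single-feature SVM diagnosis: one geometric measurement per subject
# (e.g. a regional volume). Group distributions are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.RandomState(2)
controls = rng.normal(3.2, 0.3, 400)      # hypothetical volume, controls
patients = rng.normal(2.6, 0.3, 400)      # hypothetical volume, patients
X = np.concatenate([controls, patients]).reshape(-1, 1)  # a single feature
y = np.array([0] * 400 + [1] * 400)

acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
```

    With one feature, the SVM reduces to learning a threshold, which is exactly why the output is easy to interpret clinically.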

  11. Equivalence between learning in noisy perceptrons and tree committee machines

    NASA Astrophysics Data System (ADS)

    Copelli, Mauro; Kinouchi, Osame; Caticha, Nestor

    1996-06-01

    We study learning from single presentation of examples (on-line learning) in single-layer perceptrons and tree committee machines (TCMs). Lower bounds for the perceptron generalization error as a function of the noise level ɛ in the teacher output are calculated. We find that local learning in a TCM with K hidden units is simply related to learning in a simple perceptron with a corresponding noise level ɛ(K). For a large number of examples and finite K the generalization error decays as α_CM^(-1), where α_CM is the number of examples per adjustable weight in the TCM. We also show that on-line learning is possible even in the K → ∞ limit, but with the generalization error decaying as α_CM^(-1/2). The simple Hebb rule can also be applied to the TCM, but now the error decays as α_CM^(-1/2) for finite K and α_CM^(-1/4) for K → ∞. Exponential decay of the generalization error in both the noisy perceptron learning and in the TCM is obtained by using the learning-by-queries strategy.

  12. Geological applications of machine learning on hyperspectral remote sensing data

    NASA Astrophysics Data System (ADS)

    Tse, C. H.; Li, Yi-liang; Lam, Edmund Y.

    2015-02-01

    The CRISM imaging spectrometer orbiting Mars has been producing a vast amount of data in the visible to infrared wavelengths in the form of hyperspectral data cubes. These data, compared with those obtained from previous remote sensing techniques, yield an unprecedented level of detailed spectral resolution in addition to an ever increasing level of spatial information. A major challenge brought about by the data is the burden of processing and interpreting these datasets and extracting the relevant information from them. This research aims at approaching the challenge by exploring machine learning methods, especially unsupervised learning, to achieve cluster density estimation and classification, and ultimately devising an efficient means leading to identification of minerals. A set of software tools has been constructed in Python to access and experiment with CRISM hyperspectral cubes selected from two specific Mars locations. A machine learning pipeline is proposed and unsupervised learning methods were implemented on pre-processed datasets. The resulting data clusters are compared with the published ASTER spectral library and browse data products from the Planetary Data System (PDS). The result demonstrated that this approach is capable of processing the huge amount of hyperspectral data and potentially providing guidance to scientists for more detailed studies.
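
    The unsupervised step can be sketched by flattening the cube to one spectrum per pixel and clustering with k-means; cluster centres then serve as candidate spectra to match against a library. The cube dimensions and number of clusters below are arbitrary, and the data are random stand-ins for CRISM spectra.

```python
# Cluster pixel spectra of a hyperspectral cube with k-means; the cluster
# centres are candidate endmember spectra. Cube contents are random.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(3)
rows, cols, bands = 20, 20, 50
cube = rng.rand(rows, cols, bands)        # stand-in hyperspectral cube
pixels = cube.reshape(-1, bands)          # one spectrum per pixel

km = KMeans(n_clusters=5, n_init=10, random_state=3).fit(pixels)
centres = km.cluster_centers_             # candidate mineral spectra
labels = km.labels_.reshape(rows, cols)   # cluster map of the scene
```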

  13. Machine Learning of Protein Interactions in Fungal Secretory Pathways.

    PubMed

    Kludas, Jana; Arvas, Mikko; Castillo, Sandra; Pakula, Tiina; Oja, Merja; Brouard, Céline; Jäntti, Jussi; Penttilä, Merja; Rousu, Juho

    2016-01-01

    In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely, multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, input-output kernel regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways in fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities. PMID:27441920

  15. A new machine learning classifier for high dimensional healthcare data.

    PubMed

    Padman, Rema; Bai, Xue; Airoldi, Edoardo M

    2007-01-01

    Data sets with many discrete variables and relatively few cases arise in health care, commerce, information security, and many other domains. Learning effective and efficient prediction models from such data sets is a challenging task. In this paper, we propose a new approach that combines Metaheuristic search and Bayesian Networks to learn a graphical Markov Blanket-based classifier from data. The Tabu Search enhanced Markov Blanket (TS/MB) procedure is based on the use of restricted neighborhoods in a general Bayesian Network constrained by the Markov condition, called Markov Blanket Neighborhoods. Computational results from two real world healthcare data sets indicate that the TS/MB procedure converges fast and is able to find a parsimonious model with substantially fewer predictor variables than in the full data set. Furthermore, it has comparable or better prediction performance when compared against several machine learning methods, and provides insight into possible causal relations among the variables. PMID:17911800

  16. Identifying hosts of families of viruses: a machine learning approach.

    PubMed

    Raj, Anil; Dewar, Michael; Palacios, Gustavo; Rabadan, Raul; Wiggins, Christopher H

    2011-01-01

    Identifying emerging viral pathogens and characterizing their transmission is essential to developing effective public health measures in response to an epidemic. Phylogenetics, though currently the most popular tool used to characterize the likely host of a virus, can be ambiguous when studying species very distant to known species and when there is very little reliable sequence information available in the early stages of the outbreak of disease. Motivated by an existing framework for representing biological sequence information, we learn sparse, tree-structured models, built from decision rules based on subsequences, to predict viral hosts from protein sequence data using popular discriminative machine learning tools. Furthermore, the predictive motifs robustly selected by the learning algorithm are found to show strong host-specificity and occur in highly conserved regions of the viral proteome. PMID:22174744
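
    The subsequence-based framework can be sketched by turning protein sequences into k-mer counts and fitting a small decision tree whose splits are readable motif rules. The sequences and host labels below are toy examples, not real viral data.

```python
# Tree-structured host prediction from subsequence (k-mer) features.
# Character 3-mers play the role of candidate motifs; data are toy examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

seqs = ["MKVLAAGG", "MKVLTTGG", "RRQPLAAG", "RRQPLTTG"]  # toy proteins
hosts = ["human", "human", "avian", "avian"]             # toy host labels

vec = CountVectorizer(analyzer="char", ngram_range=(3, 3))
X = vec.fit_transform(seqs)                  # one count per 3-mer per sequence
tree = DecisionTreeClassifier(random_state=0).fit(X, hosts)

# A new sequence sharing the "human" motifs is routed down the same branch.
pred = tree.predict(vec.transform(["MKVLGGGG"]))
```

    Inspecting the fitted tree's split features recovers the discriminative motifs, which is what gives this family of models its interpretability.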

  17. Mapping of Estimations and Prediction Intervals Using Extreme Learning Machines

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2015-04-01

    Due to the large amount and complexity of data available nowadays in environmental sciences, we face the need to apply more robust methodology allowing analyses and understanding of the phenomena under study. One particular but very important aspect of this understanding is the reliability of generated prediction models. From the data collection to the prediction map, several sources of error can occur and affect the final result. These sources are mainly identified as uncertainty in data (data noise), and uncertainty in the model. Their combination leads to the so-called prediction interval. Quantifying these two categories of uncertainty allows a finer understanding of phenomena under study and a better assessment of the prediction accuracy. The present research deals with a methodology combining a machine learning algorithm (ELM - Extreme Learning Machine) with a bootstrap-based procedure. Developed by G.-B. Huang et al. (2006), ELM is an artificial neural network following the structure of a multilayer perceptron (MLP) with one single hidden layer. Compared to a classical MLP, ELM has the ability to learn faster without loss of accuracy, and needs only one hyper-parameter to be fitted (the number of nodes in the hidden layer). The key steps of the proposed method are as follows: sample from the original data a variety of subsets using bootstrapping; from these subsets, train and validate ELM models; and compute residuals. Then, the same procedure is performed a second time with only the squared training residuals. Finally, taking into account the two modeling levels allows developing the mean prediction map, the model uncertainty variance, and the data noise variance. The proposed approach is illustrated using geospatial data. References Efron B., and Tibshirani R. 1986, Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy, Statistical Science, vol. 1: 54-75. Huang G.-B., Zhu Q.-Y., and Siew C.-K. 2006
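
    The two core pieces, an ELM (random input weights, least-squares output weights) and a bootstrap loop for the model-uncertainty part of the prediction interval, can be sketched as below. Sizes, the 1-D toy function, and the number of resamples are illustrative assumptions, not the study's settings.

```python
# Minimal ELM: the hidden layer is random and never trained; only the output
# weights are fitted by least squares. A bootstrap loop over refits gives the
# model-uncertainty variance of the prediction map. Data are synthetic.
import numpy as np

rng = np.random.RandomState(4)

def elm_fit_predict(Xtr, ytr, Xte, n_hidden=50, rs=0):
    r = np.random.RandomState(rs)
    W = r.normal(size=(Xtr.shape[1], n_hidden))   # random input weights
    b = r.normal(size=n_hidden)                   # random biases
    Htr = np.tanh(Xtr @ W + b)                    # hidden activations
    beta, *_ = np.linalg.lstsq(Htr, ytr, rcond=None)  # only fitted part
    return np.tanh(Xte @ W + b) @ beta

X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
Xgrid = np.linspace(-3, 3, 50).reshape(-1, 1)

preds = []
for i in range(30):                               # bootstrap resamples
    idx = rng.randint(0, 200, 200)
    preds.append(elm_fit_predict(X[idx], y[idx], Xgrid, rs=i))
preds = np.array(preds)
mean_map = preds.mean(axis=0)                     # mean prediction map
model_var = preds.var(axis=0)                     # model-uncertainty variance
```

    A second bootstrap pass on the squared training residuals, as the abstract describes, would supply the data-noise variance that completes the prediction interval.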

  18. Predicting outcome in clinically isolated syndrome using machine learning

    PubMed Central

    Wottschel, V.; Alexander, D.C.; Kwok, P.P.; Chard, D.T.; Stromillo, M.L.; De Stefano, N.; Thompson, A.J.; Miller, D.H.; Ciccarelli, O.

    2014-01-01

    We aim to determine if machine learning techniques, such as support vector machines (SVMs), can predict the occurrence of a second clinical attack, which leads to the diagnosis of clinically-definite Multiple Sclerosis (CDMS) in patients with a clinically isolated syndrome (CIS), on the basis of a single patient's lesion features and clinical/demographic characteristics. Seventy-four patients at onset of CIS were scanned and clinically reviewed after one and three years. CDMS was used as the gold standard against which SVM classification accuracy was tested. Radiological features related to lesional characteristics on conventional MRI were defined a priori and used in combination with clinical/demographic features in an SVM. Forward recursive feature elimination with 100 bootstraps and a leave-one-out cross-validation was used to find the most predictive feature combinations. 30 % and 44 % of patients developed CDMS within one and three years, respectively. The SVMs correctly predicted the presence (or the absence) of CDMS in 71.4 % of patients (sensitivity/specificity: 77 %/66 %) at 1 year, and in 68 % (60 %/76 %) at 3 years on average over all bootstraps. Combinations of features consistently gave a higher accuracy in predicting outcome than any single feature. Machine-learning-based classifications can be used to provide an “individualised” prediction of conversion to MS from subjects' baseline scans and clinical characteristics, with potential to be incorporated into routine clinical practice. PMID:25610791
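
    The validation scheme, an SVM with recursive feature elimination scored by leave-one-out cross-validation, can be sketched as follows. The ten features are synthetic stand-ins for the lesion and clinical/demographic variables, and the bootstrap repetition is omitted for brevity.

```python
# SVM + recursive feature elimination, scored by leave-one-out CV.
# 74 synthetic "patients" mirror the cohort size; features are stand-ins.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(5)
X = rng.normal(size=(74, 10))                  # 74 patients, 10 features
y = (X[:, 0] + X[:, 1] + rng.normal(size=74) > 0).astype(int)  # "CDMS"

# RFE keeps the 3 features the linear SVM weights most heavily.
model = make_pipeline(RFE(SVC(kernel="linear"), n_features_to_select=3),
                      SVC(kernel="linear"))
acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
```

    Running the feature elimination inside every cross-validation fold, as here, is what keeps the reported accuracy honest about the selection step.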

  19. Galaxy Image Processing and Morphological Classification Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Kates-Harbeck, Julian

    2012-03-01

    This work uses data from the Sloan Digital Sky Survey (SDSS) and the Galaxy Zoo Project for classification of galaxy morphologies via machine learning. SDSS imaging data together with reliable human classifications from Galaxy Zoo provide the training set and test set for the machine learning architectures. Classification is performed with hand-picked, pre-computed features from SDSS as well as with the raw imaging data from SDSS that was available to humans in the Galaxy Zoo project. With the hand-picked features and a logistic regression classifier, 95.21% classification accuracy and an area under the ROC curve of 0.986 are attained. In the case of the raw imaging data, the images are first processed to remove background noise, image artifacts, and celestial objects other than the galaxy of interest. They are then rotated onto their principal axis of variance to guarantee rotational invariance. The processed images are used to compute color information, up to 4th-order central normalized moments, and radial intensity profiles. These features are used to train a support vector machine with a 3rd-degree polynomial kernel, which achieves a classification accuracy of 95.89% with an ROC area of 0.943.
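
    The moment-feature step can be sketched as below: central normalized moments of a pre-processed image, fed to a polynomial-kernel SVM. The toy "galaxies" are Gaussian blobs, round versus elongated, standing in for the real imaging data.

```python
# Scale-normalized central image moments as shape features for a
# 3rd-degree polynomial-kernel SVM. Images are synthetic Gaussian blobs.
import numpy as np
from sklearn.svm import SVC

def central_moments(img, order=2):
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m = img.sum()
    cx, cy = (xs * img).sum() / m, (ys * img).sum() / m   # centroid
    feats = []
    for p in range(order + 1):
        for q in range(order + 1 - p):
            if p + q >= 2:                                # skip trivial moments
                mu = ((xs - cx) ** p * (ys - cy) ** q * img).sum()
                feats.append(mu / m ** (1 + (p + q) / 2)) # scale-normalized
    return feats

rng = np.random.RandomState(6)
def blob(sx, sy):                                         # toy "galaxy" image
    ys, xs = np.mgrid[:32, :32]
    return np.exp(-((xs - 16) ** 2 / (2 * sx**2) + (ys - 16) ** 2 / (2 * sy**2)))

X = [central_moments(blob(4 + rng.rand(), 4 + rng.rand())) for _ in range(50)]
X += [central_moments(blob(8 + rng.rand(), 2 + rng.rand())) for _ in range(50)]
y = [0] * 50 + [1] * 50                                   # round vs elongated
clf = SVC(kernel="poly", degree=3).fit(X, y)
```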

  20. Machine Learning Approaches: From Theory to Application in Schizophrenia

    PubMed Central

    Veronese, Elisa; Castellani, Umberto; Peruzzo, Denis; Bellani, Marcella; Brambilla, Paolo

    2013-01-01

    In recent years, machine learning approaches have been successfully applied for analysis of neuroimaging data, to help in the context of disease diagnosis. We provide, in this paper, an overview of recent support vector machine-based methods developed and applied in psychiatric neuroimaging for the investigation of schizophrenia. In particular, we focus on the algorithms implemented by our group, which have been applied to classify subjects affected by schizophrenia and healthy controls, comparing them in terms of accuracy results with other recently published studies. First we give a description of the basic terminology used in pattern recognition and machine learning. Then we separately summarize and explain each study, highlighting the main features that characterize each method. Finally, as an outcome of the comparison of the results obtained applying the described different techniques, conclusions are drawn in order to understand how much automatic classification approaches can be considered a useful tool in understanding the biological underpinnings of schizophrenia. We then conclude by discussing the main implications achievable by the application of these methods into clinical practice. PMID:24489603

  1. Big Data and Machine Learning in Plastic Surgery: A New Frontier in Surgical Innovation.

    PubMed

    Kanevsky, Jonathan; Corban, Jason; Gaster, Richard; Kanevsky, Ari; Lin, Samuel; Gilardino, Mirko

    2016-05-01

    Medical decision-making is increasingly based on quantifiable data. From the moment patients come into contact with the health care system, their entire medical history is recorded electronically. Whether a patient is in the operating room or on the hospital ward, technological advancement has facilitated the expedient and reliable measurement of clinically relevant health metrics, all in an effort to guide care and ensure the best possible clinical outcomes. However, as the volume and complexity of biomedical data grow, it becomes challenging to effectively process "big data" using conventional techniques. Physicians and scientists must be prepared to look beyond classic methods of data processing to extract clinically relevant information. The purpose of this article is to introduce the modern plastic surgeon to machine learning and computational interpretation of large data sets. What is machine learning? Machine learning, a subfield of artificial intelligence, can address clinically relevant problems in several domains of plastic surgery, including burn surgery; microsurgery; and craniofacial, peripheral nerve, and aesthetic surgery. This article provides a brief introduction to current research and suggests future projects that will allow plastic surgeons to explore this new frontier of surgical science. PMID:27119951

  2. Machine Learning Assisted Design of Highly Active Peptides for Drug Discovery

    PubMed Central

    Giguère, Sébastien; Laviolette, François; Marchand, Mario; Tremblay, Denise; Moineau, Sylvain; Liang, Xinxia; Biron, Éric; Corbeil, Jacques

    2015-01-01

    The discovery of peptides possessing high biological activity is very challenging due to the enormous diversity for which only a minority have the desired properties. To lower cost and reduce the time to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor with existing data or with a smaller amount of data generation. Unfortunately, once the model is learned, selecting peptides having the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory, that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code freely available at http://graal.ift.ulaval.ca/peptide-design/. PMID:25849257

  3. Classifying Structures in the ISM with Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Beaumont, Christopher; Goodman, A. A.; Williams, J. P.

    2011-01-01

    The processes which govern molecular cloud evolution and star formation often sculpt structures in the ISM: filaments, pillars, shells, outflows, etc. Because of their morphological complexity, these objects are often identified manually. Manual classification has several disadvantages; the process is subjective, not easily reproducible, and does not scale well to handle increasingly large datasets. We have explored to what extent machine learning algorithms can be trained to autonomously identify specific morphological features in molecular cloud datasets. We show that the Support Vector Machine algorithm can successfully locate filaments and outflows blended with other emission structures. When the objects of interest are morphologically distinct from the surrounding emission, this autonomous classification achieves >90% accuracy. We have developed a set of IDL-based tools to apply this technique to other datasets.

  4. Teaching an Old Log New Tricks with Machine Learning.

    PubMed

    Schnell, Krista; Puri, Colin; Mahler, Paul; Dukatz, Carl

    2014-03-01

    To most people, the log file would not be considered an exciting area in technology today. However, these relatively benign, slowly growing data sources can drive large business transformations when combined with modern-day analytics. Accenture Technology Labs has built a new framework that helps to expand existing vendor solutions to create new methods of gaining insights from these benevolent information springs. This framework provides a systematic and effective machine-learning mechanism to understand, analyze, and visualize heterogeneous log files. These techniques enable an automated approach to analyzing log content in real time, learning relevant behaviors, and creating actionable insights applicable in traditionally reactive situations. Using this approach, companies can now tap into a wealth of knowledge residing in log file data that is currently being collected but underutilized because of its overwhelming variety and volume. By using log files as an important data input into the larger enterprise data supply chain, businesses have the opportunity to enhance their current operational log management solution and generate entirely new business insights: no longer limited to the realm of reactive IT management, but extending from proactive product improvement to defense from attacks. As we will discuss, this solution has immediate relevance in the telecommunications and security industries. However, the most forward-looking companies can take it even further. How? By thinking beyond the log file and applying the same machine-learning framework to other log file use cases (including logistics, social media, and consumer behavior) and any other transactional data source. PMID:27447306

  5. Effective feature selection for image steganalysis using extreme learning machine

    NASA Astrophysics Data System (ADS)

    Feng, Guorui; Zhang, Haiyan; Zhang, Xinpeng

    2014-11-01

    Image steganography delivers secret data by slight modifications of the cover. To detect these data, steganalysis tries to create some features to embody the discrepancy between the cover and steganographic images. Therefore, the urgent problem is how to design an effective classification architecture for given feature vectors extracted from the images. We propose an approach to automatically select effective features based on the well-known JPEG steganographic methods. This approach, referred to as extreme learning machine revisited feature selection (ELM-RFS), can tune input weights in terms of the importance of input features. This idea is derived from cross-validation learning and one-dimensional (1-D) search. While updating input weights, we seek the energy decreasing direction using the leave-one-out (LOO) selection. Furthermore, we optimize the 1-D energy function instead of directly discarding the least significant feature. Since recent Liu features can gain considerable low detection errors compared to a previous JPEG steganalysis, the experimental results demonstrate that the new approach results in less classification error than other classifiers such as SVM, the Kodovsky ensemble classifier, direct ELM-LOO learning, kernel ELM, and conventional ELM on the Liu features. Furthermore, ELM-RFS achieves performance similar to that of a deep Boltzmann machine while using less training time.

  6. Machine learning approach for objective inpainting quality assessment

    NASA Astrophysics Data System (ADS)

    Frantc, V. A.; Voronin, V. V.; Marchuk, V. I.; Sherstobitov, A. I.; Agaian, S.; Egiazarian, K.

    2014-05-01

    This paper focuses on a machine learning approach for objective inpainting quality assessment. Inpainting has received a lot of attention in recent years, and quality assessment is an important task in evaluating different image reconstruction approaches. Quantitative metrics for successful image inpainting currently do not exist; researchers instead rely upon qualitative human comparisons in order to evaluate their methodologies and techniques. We present an approach for objective inpainting quality assessment based on natural image statistics and machine learning techniques. Our method is based on the observation that when images are properly normalized or transferred to a transform domain, local descriptors can be modeled by some parametric distributions. The shapes of these distributions are different for non-inpainted and inpainted images. This approach yields a feature vector strongly correlated with subjective image perception by the human visual system. Next, we use a support vector regression model, trained on images assessed by human observers, to predict the perceived quality of inpainted images. We demonstrate that our predicted quality value reliably correlates with qualitative opinion in a human observer study.
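
    The final step, support vector regression mapping a statistics-based feature vector to a subjective quality score, can be sketched as below. The features and opinion scores are random stand-ins for the paper's natural-scene-statistics descriptors and observer-study data.

```python
# Support vector regression from per-image feature vectors to subjective
# quality scores. Features and scores are synthetic stand-ins.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(7)
feats = rng.normal(size=(160, 5))                   # per-image NSS-style features
mos = 3.0 + feats[:, 0] + 0.2 * rng.normal(size=160)  # synthetic opinion scores

model = SVR(kernel="rbf", C=10.0).fit(feats[:120], mos[:120])
pred = model.predict(feats[120:])
corr = np.corrcoef(pred, mos[120:])[0, 1]           # predicted vs subjective
```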

  7. A machine learning approach to computer-aided molecular design.

    PubMed

    Bolis, G; Di Pace, L; Fabrocini, F

    1991-12-01

    Preliminary results of a machine learning application concerning computer-aided molecular design applied to drug discovery are presented. The artificial intelligence techniques of machine learning use a sample of active and inactive compounds, which is viewed as a set of positive and negative examples, to allow the induction of a molecular model characterizing the interaction between the compounds and a target molecule. The algorithm is based on a twofold phase. In the first, the specialization step, the program identifies a number of active/inactive pairs of compounds which appear to be the most useful in order to make the learning process as effective as possible and generates a dictionary of molecular fragments, deemed to be responsible for the activity of the compounds. In the second phase, the generalization step, the fragments thus generated are combined and generalized in order to select the most plausible hypothesis with respect to the sample of compounds. A knowledge base concerning physical and chemical properties is utilized during the inductive process. PMID:1818094

  8. Intra-and-Inter Species Biomass Prediction in a Plantation Forest: Testing the Utility of High Spatial Resolution Spaceborne Multispectral RapidEye Sensor and Advanced Machine Learning Algorithms

    PubMed Central

    Dube, Timothy; Mutanga, Onisimo; Adam, Elhadi; Ismail, Riyad

    2014-01-01

    The quantification of aboveground biomass using remote sensing is critical for better understanding the role of forests in carbon sequestration and for informed sustainable management. Although remote sensing techniques have been proven useful in assessing forest biomass in general, more is required to investigate their capabilities in predicting intra-and-inter species biomass, which is mainly characterised by non-linear relationships. In this study, we tested two machine learning algorithms, Stochastic Gradient Boosting (SGB) and Random Forest (RF) regression trees, to predict intra-and-inter species biomass using high resolution RapidEye reflectance bands as well as the derived vegetation indices in a commercial plantation. The results showed that the SGB algorithm yielded the best performance for intra-and-inter species biomass prediction, both when using all the predictor variables and when using only the most important selected variables. For example, using the most important variables the algorithm produced an R2 of 0.80 and RMSE of 16.93 t·ha−1 for E. grandis; R2 of 0.79, RMSE of 17.27 t·ha−1 for P. taeda and R2 of 0.61, RMSE of 43.39 t·ha−1 for the combined species data sets. Comparatively, RF yielded plausible results only for E. dunii (R2 of 0.79; RMSE of 7.18 t·ha−1). We demonstrated that although the two statistical methods were able to predict biomass accurately, RF produced weaker results than SGB when applied to the combined species dataset. The result underscores the relevance of stochastic models in predicting biomass drawn from different species and genera using the new generation high resolution RapidEye sensor with strategically positioned bands. PMID:25140631
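
The two regressors compared in the study are both available in scikit-learn; the sketch below reproduces the comparison workflow on synthetic predictors and mock biomass values (the RapidEye bands, vegetation indices, and field data are not reproduced):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, (400, 5))        # mock band reflectances / indices
y = 50 * X[:, 0] + 30 * np.sin(3 * X[:, 1]) + rng.normal(0, 2, 400)  # mock biomass

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# subsample < 1.0 is what makes gradient boosting "stochastic" (SGB).
sgb = GradientBoostingRegressor(subsample=0.5, random_state=0).fit(Xtr, ytr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)

for name, model in [("SGB", sgb), ("RF", rf)]:
    p = model.predict(Xte)
    print(name, "R2 = %.2f" % r2_score(yte, p),
          "RMSE = %.2f" % mean_squared_error(yte, p) ** 0.5)
```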

  9. Machine learning for the New York City power grid.

    PubMed

    Rudin, Cynthia; Waltz, David; Anderson, Roger N; Boulanger, Albert; Salleb-Aouissi, Ansaf; Chow, Maggie; Dutta, Haimonti; Gross, Philip N; Huang, Bert; Ierome, Steve; Isaac, Delfina F; Kressner, Arthur; Passonneau, Rebecca J; Radeva, Axinia; Wu, Leon

    2012-02-01

    Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint, terminator, and transformer rankings, 3) feeder Mean Time Between Failure (MTBF) estimates, and 4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time; incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF); and includes an evaluation of results via cross-validation and blind tests. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City’s electrical grid. PMID:21576741

  10. Kernel-based machine learning techniques for infrasound signal classification

    NASA Astrophysics Data System (ADS)

    Tuma, Matthias; Igel, Christian; Mialle, Pierrick

    2014-05-01

    Infrasound monitoring is one of four remote sensing technologies continuously employed by the CTBTO Preparatory Commission. The CTBTO's infrasound network is designed to monitor the Earth for potential evidence of atmospheric or shallow underground nuclear explosions. Upon completion, it will comprise 60 infrasound array stations distributed around the globe, of which 47 were certified in January 2014. Three stages can be identified in CTBTO infrasound data processing: automated processing at the level of single array stations, automated processing at the level of the overall global network, and interactive review by human analysts. At station level, the cross correlation-based PMCC algorithm is used for initial detection of coherent wavefronts. It produces estimates for trace velocity and azimuth of incoming wavefronts, as well as other descriptive features characterizing a signal. Detected arrivals are then categorized into potentially treaty-relevant versus noise-type signals by a rule-based expert system. This corresponds to a binary classification task at the level of station processing. In addition, incoming signals may be grouped according to their travel path in the atmosphere. The present work investigates automatic classification of infrasound arrivals by kernel-based pattern recognition methods. It aims to explore the potential of state-of-the-art machine learning methods vis-a-vis the current rule-based and task-tailored expert system. To this purpose, we first address the compilation of a representative, labeled reference benchmark dataset as a prerequisite for both classifier training and evaluation. Data representation is based on features extracted by the CTBTO's PMCC algorithm. As classifiers, we employ support vector machines (SVMs) in a supervised learning setting. Different SVM kernel functions are used and adapted through different hyperparameter optimization routines. The resulting performance is compared to several baseline classifiers. 
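
The station-level classification step can be prototyped with a kernel SVM whose hyperparameters are tuned by grid search, as the abstract describes; the features below are random stand-ins for PMCC-derived attributes such as azimuth and trace velocity:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Random stand-ins for PMCC-style features of "interesting" vs "noise" arrivals.
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(2, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10],
                           "svc__gamma": ["scale", 0.1]}, cv=5)
grid.fit(X, y)
print("best:", grid.best_params_, "CV accuracy: %.2f" % grid.best_score_)
```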

  11. Machine learning approaches in medical image analysis: From detection to diagnosis.

    PubMed

    de Bruijne, Marleen

    2016-10-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols, learning from weak labels, and interpretation and evaluation of results. PMID:27481324

  12. Applying machine learning classification techniques to automate sky object cataloguing

    NASA Astrophysics Data System (ADS)

    Fayyad, Usama M.; Doyle, Richard J.; Weir, W. Nick; Djorgovski, Stanislav

    1993-08-01

    We describe the application of Artificial Intelligence machine learning techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Mt. Palomar Northern Sky Survey is nearly completed. This survey provides comprehensive coverage of the northern celestial hemisphere in the form of photographic plates. The plates are being transformed into digitized images whose quality will probably not be surpassed in the next ten to twenty years. The images are expected to contain on the order of 10^7 galaxies and 10^8 stars. Astronomers wish to determine which of these sky objects belong to various classes of galaxies and stars. Unfortunately, the size of this data set precludes analysis in an exclusively manual fashion. Our approach is to develop a software system which integrates the functions of independently developed techniques for image processing and data classification. Digitized sky images are passed through image processing routines to identify sky objects and to extract a set of features for each object. These routines are used to help select a useful set of attributes for classifying sky objects. Then GID3 (Generalized ID3) and O-B Tree, two inductive learning techniques, learn classification decision trees from examples. These classifiers will then be applied to new data. This development process is highly interactive, with astronomer input playing a vital role. Astronomers refine the feature set used to construct sky object descriptions, and evaluate the performance of the automated classification technique on new data. This paper gives an overview of the machine learning techniques with an emphasis on their general applicability, describes the details of our specific application, and reports the initial encouraging results. The results indicate that our machine learning approach is well-suited to the problem. The primary benefit of the approach is increased data reduction throughput. Another benefit is
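
GID3 and O-B Tree are not publicly packaged, but the induce-a-decision-tree-from-examples workflow can be sketched with a CART tree from scikit-learn on mock image features (the feature names here are illustrative, not the survey's actual attribute set):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 300
# Illustrative features: stars are compact and round, galaxies extended.
area = np.concatenate([rng.normal(5, 1, n), rng.normal(12, 2, n)])
ellipticity = np.concatenate([rng.normal(0.05, 0.02, n), rng.normal(0.3, 0.1, n)])
X = np.column_stack([area, ellipticity])
y = np.array([0] * n + [1] * n)          # 0 = star, 1 = galaxy

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(tree, X, y, cv=5)
print("cross-validated accuracy: %.3f" % scores.mean())
```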

  13. Coordinated machine learning and decision support for situation awareness.

    SciTech Connect

    Draelos, Timothy John; Zhang, Peng-Chu.; Wunsch, Donald C.; Seiffertt, John; Conrad, Gregory N.; Brannon, Nathan Gregory

    2007-09-01

    For applications such as force protection, an effective decision maker needs to maintain an unambiguous grasp of the environment. Opportunities exist to leverage computational mechanisms for the adaptive fusion of diverse information sources. The current research employs neural networks and Markov chains to process information from sources including sensors, weather data, and law enforcement. Furthermore, the system operator's input is used as a point of reference for the machine learning algorithms. More detailed features of the approach are provided, along with an example force protection scenario.

  14. A machine learning classification broker for the LSST transient database

    NASA Astrophysics Data System (ADS)

    Borne, K. D.

    2008-03-01

    We describe the largest data-producing astronomy project in the coming decade - the LSST (Large Synoptic Survey Telescope). The enormous data output, database contents, knowledge discovery, and community science expected from this project will impose massive data challenges on the astronomical research community. One of these challenge areas is the rapid machine learning, data mining, and classification of all novel astronomical events from each 3-gigapixel (6-GB) image obtained every 20 seconds throughout every night for the project duration of 10 years. We describe these challenges and a particular implementation of a classification broker for this data fire hose.

  15. Software Development and Testing for Machine Learning Studies

    NASA Astrophysics Data System (ADS)

    Makino, Takaki; Aihara, Kazuyuki

    It is not easy to test software used in machine learning studies within statistical frameworks. In particular, software for randomized algorithms such as Monte Carlo methods complicates the testing process. Combined with an underestimation of the importance of software testing in academic fields, this means that many software programs are being used without appropriate validation, causing problems. In this article, we discuss the importance of writing test code for software used in research and present a practical approach to testing, focusing on programs that use Monte Carlo methods.
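
Two practical patterns in this spirit can be illustrated on a toy Monte Carlo estimator: fix the seed so results are reproducible, and assert that the estimate falls within a statistical tolerance of a known answer (the pi estimator below is a stand-in, not the authors' code):

```python
import math
import random

def mc_pi(n, seed):
    """Estimate pi by sampling points in the unit square."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / n

# (1) Determinism: the same seed must give the same result.
assert mc_pi(10_000, seed=123) == mc_pi(10_000, seed=123)

# (2) Statistical check: the error shrinks like 1/sqrt(n); allow a
#     generous multiple of the standard error so the test rarely
#     fails by chance.
n = 100_000
est = mc_pi(n, seed=0)
stderr = math.sqrt(est * (4 - est) / n)   # binomial std. error of the estimator
assert abs(est - math.pi) < 10 * stderr
print("pi estimate:", est)
```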

  16. Machine Learning and the Starship - A Match Made in Heaven

    NASA Astrophysics Data System (ADS)

    Galea, P.

    The computer control system of an unmanned interstellar craft must deal with a variety of complex problems. For example, upon reaching the destination star, the computer may need to make assessments of the planets and other objects to prioritize the most `interesting', and assign appropriate probes to each. These decisions would normally be regarded as intelligent if they were made by humans. This paper looks at machine learning technologies currently deployed in non-aerospace contexts, such as book recommendation systems, dating websites and social network analysis, and investigates the ways in which they can be adapted for applications in the starship. This paper is a submission of the Project Icarus Study Group.

  17. Machine-z: Rapid machine-learned redshift indicator for Swift gamma-ray bursts

    DOE PAGES

    Ukwatta, T. N.; Wozniak, P. R.; Gehrels, N.

    2016-06-01

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce ‘machine-z’, a redshift prediction algorithm and a ‘high-z’ classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With a 40 per cent false positive rate the classifier can achieve ~100 per cent recall. As a result, the most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.
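
A miniature version of the machine-z setup, with one random forest regressing redshift and another flagging the high-z tail; the features and redshifts below are synthetic stand-ins for the prompt-emission data (only the 284-burst sample size is taken from the abstract):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(3)
n = 284                                    # same sample size as the paper
X = rng.uniform(0, 1, (n, 6))              # mock prompt-emission features
z = 8 * X[:, 0] * X[:, 1] + rng.normal(0, 0.5, n)   # mock redshifts

reg = RandomForestRegressor(n_estimators=200, random_state=0)
z_pred = cross_val_predict(reg, X, z, cv=5)
print("corr(pred, true) = %.2f" % np.corrcoef(z_pred, z)[0, 1])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
high = (z > 4).astype(int)                  # "high-z" label at a mock threshold
high_pred = cross_val_predict(clf, X, high, cv=5)
print("high-z recall = %.2f" % (high_pred[high == 1] == 1).mean())
```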

  18. Semi-supervised and unsupervised extreme learning machines.

    PubMed

    Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

    2014-12-01

    Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency. PMID:25415946
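
The supervised ELM at the core of these extensions is compact enough to sketch directly: a random hidden layer followed by a regularized least-squares solve for the output weights. The semi-supervised and unsupervised variants in the paper add a graph-Laplacian (manifold) penalty to this same structure, which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, T, n_hidden=50, reg=1e-2):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden-layer outputs
    # Output weights via the ridge solution beta = (H'H + reg*I)^-1 H'T
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: learn sin on [0, pi].
X = np.linspace(0, np.pi, 200)[:, None]
T = np.sin(X).ravel()
W, b, beta = elm_fit(X, T)
pred = elm_predict(X, W, b, beta)
print("max abs error:", np.abs(pred - T).max())
```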

  19. GeneRIF indexing: sentence selection based on machine learning

    PubMed Central

    2013-01-01

    Background A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE®; citations and the sentences describing a novel function. Results We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. Conclusions The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species. PMID:23725347
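
The sentence-selection step with the Naive Bayes classifier the paper found effective can be sketched as a small text-classification pipeline; the sentences and labels below are toy placeholders for the MEDLINE data, and the paper's discourse and location features are omitted:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = [
    "BRCA1 regulates DNA repair in breast tissue",      # describes gene function
    "TP53 induces apoptosis after DNA damage",
    "GeneX modulates signaling in neurons",
    "Patients were recruited from three hospitals",     # no gene function
    "Statistical analysis used a t-test",
    "Samples were stored at -80 degrees",
]
labels = [1, 1, 1, 0, 0, 0]   # 1 = select for GeneRIF indexing

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(sentences, labels)
print(model.predict(["GeneY regulates apoptosis in DNA repair"]))
```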

  20. A survey of machine learning methods for secondary and supersecondary protein structure prediction.

    PubMed

    Ho, Hui Kian; Zhang, Lei; Ramamohanarao, Kotagiri; Martin, Shawn

    2013-01-01

    In this chapter we provide a survey of protein secondary and supersecondary structure prediction using methods from machine learning. Our focus is on machine learning methods applicable to β-hairpin and β-sheet prediction, but we also discuss methods for more general supersecondary structure prediction. We provide background on the secondary and supersecondary structures that we discuss, the features used to describe them, and the basic theory behind the machine learning methods used. We survey the machine learning methods available for secondary and supersecondary structure prediction and compare them where possible. PMID:22987348

  1. A quantum speedup in machine learning: finding an N-bit Boolean function for a classification

    NASA Astrophysics Data System (ADS)

    Yoo, Seokwon; Bang, Jeongho; Lee, Changhyoup; Lee, Jinhyoung

    2014-10-01

    We compare quantum and classical machines designed for learning an N-bit Boolean function in order to address how a quantum system improves the machine learning behavior. The machines of the two types consist of the same number of operations and control parameters, but only the quantum machines utilize the quantum coherence naturally induced by unitary operators. We show that quantum superposition enables quantum learning that is faster than classical learning by expanding the approximate solution regions, i.e., the acceptable regions. This is also demonstrated by means of numerical simulations with a standard feedback model, namely random search, and a practical model, namely differential evolution.

  2. Estimation of alpine skier posture using machine learning techniques.

    PubMed

    Nemec, Bojan; Petrič, Tadej; Babič, Jan; Supej, Matej

    2014-01-01

    High precision Global Navigation Satellite System (GNSS) measurements are becoming more and more popular in alpine skiing due to the relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements that are defined with the antenna placed typically behind the skier's neck. A key issue is how to estimate other more relevant parameters of the skier's body, like the center of mass (COM) and ski trajectories. Previously, these parameters were estimated by modeling the skier's body with an inverted-pendulum model that oversimplified the skier's body. In this study, we propose two machine learning methods that overcome this shortcoming and estimate COM and skis trajectories based on a more faithful approximation of the skier's body with nine degrees-of-freedom. The first method utilizes a well-established approach of artificial neural networks, while the second method is based on a state-of-the-art statistical generalization method. Both methods were evaluated using the reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. Our results outperform the results of commonly used inverted-pendulum methods and demonstrate the applicability of machine learning techniques in biomechanical measurements of alpine skiing. PMID:25313492

  3. Machine Learning Based Road Detection from High Resolution Imagery

    NASA Astrophysics Data System (ADS)

    Lv, Ye; Wang, Guofeng; Hu, Xiangyun

    2016-06-01

    At present, remote sensing technology is the best weapon to get information from the earth surface, and it is very useful in geo- information updating and related applications. Extracting road from remote sensing images is one of the biggest demand of rapid city development, therefore, it becomes a hot issue. Roads in high-resolution images are more complex, patterns of roads vary a lot, which becomes obstacles for road extraction. In this paper, a machine learning based strategy is presented. The strategy overall uses the geometry features, radiation features, topology features and texture features. In high resolution remote sensing images, the images cover a great scale of landscape, thus, the speed of extracting roads is slow. So, roads' ROIs are firstly detected by using Houghline detection and buffering method to narrow down the detecting area. As roads in high resolution images are normally in ribbon shape, mean-shift and watershed segmentation methods are used to extract road segments. Then, Real Adaboost supervised machine learning algorithm is used to pick out segments that contain roads' pattern. At last, geometric shape analysis and morphology methods are used to prune and restore the whole roads' area and to detect the centerline of roads.

  4. Machine Learning for Quantum Metrology and Quantum Control

    NASA Astrophysics Data System (ADS)

    Sanders, Barry; Zahedinejad, Ehsan; Palittapongarnpim, Pantita

    Generating quantum metrological procedures and quantum gate designs, subject to constraints such as temporal or particle-number bounds or limits on the number of control parameters, are typically hard computationally. Although greedy machine learning algorithms are ubiquitous for tackling these problems, the severe constraints listed above limit the efficacy of such approaches. Our aim is to devise heuristic machine learning techniques to generate tractable procedures for adaptive quantum metrology and quantum gate design. In particular we have modified differential evolution to generate adaptive interferometric-phase quantum metrology procedures for up to 100 photons including loss and noise, and we have generated policies for designing single-shot high-fidelity three-qubit gates in superconducting circuits by avoided level crossings. Although quantum metrology and quantum control are regarded as disparate, we have developed a unified framework for these two subjects, and this unification enables us to transfer insights and breakthroughs from one of the topics to the other. Thanks to NSERC, AITF and 1000 Talent Plan.
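
Stock differential evolution, the optimizer the authors modified, is available in SciPy; here is a sketch on a stand-in objective, tuning two mock "policy" parameters to maximize a toy fidelity (none of this is the authors' control problem):

```python
import numpy as np
from scipy.optimize import differential_evolution

def neg_fidelity(params):
    # Toy stand-in for a quantum-control cost: a single peak at (0.3, -0.7).
    x, y = params
    return -np.exp(-10 * (x - 0.3) ** 2 - 10 * (y + 0.7) ** 2)

result = differential_evolution(neg_fidelity, bounds=[(-1, 1), (-1, 1)],
                                seed=0, tol=1e-8)
print("optimum:", result.x, "fidelity:", -result.fun)
```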

  5. Overlay improvements using a real time machine learning algorithm

    NASA Astrophysics Data System (ADS)

    Schmitt-Weaver, Emil; Kubis, Michael; Henke, Wolfgang; Slotboom, Daan; Hoogenboom, Tom; Mulkens, Jan; Coogans, Martyn; ten Berge, Peter; Verkleij, Dick; van de Mast, Frank

    2014-04-01

    While semiconductor manufacturing is moving towards the 14nm node using immersion lithography, the overlay requirements are tightened to below 5nm. Next to improvements in the immersion scanner platform, enhancements in the overlay optimization and process control are needed to enable these low overlay numbers. Whereas conventional overlay control methods address wafer and lot variation autonomously with wafer pre exposure alignment metrology and post exposure overlay metrology, we see a need to reduce these variations by correlating more of the TWINSCAN system's sensor data directly to the post exposure YieldStar metrology in time. In this paper we will present the results of a study on applying a real time control algorithm based on machine learning technology. Machine learning methods use context and TWINSCAN system sensor data paired with post exposure YieldStar metrology to recognize generic behavior and train the control system to anticipate on this generic behavior. Specific for this study, the data concerns immersion scanner context, sensor data and on-wafer measured overlay data. By making the link between the scanner data and the wafer data we are able to establish a real time relationship. The result is an inline controller that accounts for small changes in scanner hardware performance in time while picking up subtle lot to lot and wafer to wafer deviations introduced by wafer processing.

  6. Machine Learning Approaches to Rare Events Sampling and Estimation

    NASA Astrophysics Data System (ADS)

    Elsheikh, A. H.

    2014-12-01

    Given the severe impacts of rare events, we try to quantitatively answer the following two questions: How can we estimate the probability of a rare event? And what are the factors affecting these probabilities? We utilize machine learning classification methods to define the failure boundary (in the stochastic space) corresponding to a specific threshold of a rare event. The training samples for the classification algorithm are obtained using multilevel splitting and Monte Carlo (MC) simulations. Once the training of the classifier is performed, a full MC simulation can be performed efficiently using the classifier as a reduced order model replacing the full physics simulator. We apply the proposed method on a standard benchmark for CO2 leakage through an abandoned well. In this idealized test case, CO2 is injected into a deep aquifer and then spreads within the aquifer and, upon reaching an abandoned well, rises to a shallower aquifer. In the current study, we try to evaluate the probability of leakage of a pre-defined amount of the injected CO2 given a heavy tailed distribution of the leaky well permeability. We show that machine learning based approaches significantly outperform direct MC and multilevel splitting methods in terms of efficiency and precision. The proposed algorithm's efficiency and reliability enabled us to perform a sensitivity analysis of the different modeling assumptions, including the different prior distributions on the probability of CO2 leakage.
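
The surrogate idea can be sketched end to end: train a classifier on a handful of "expensive" simulator runs, then estimate the failure probability by cheap Monte Carlo through the classifier. The analytic "simulator" and lognormal permeability prior below are illustrative assumptions, not the benchmark's physics:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def simulator_fails(perm):
    # Toy stand-in for the full physics model: leakage exceeds the
    # threshold when the leaky-well permeability is high.
    return perm > 2.5

# Heavy-tailed (lognormal) prior on the leaky-well permeability.
train_perm = rng.lognormal(0.0, 1.0, 500)        # 500 "expensive" runs
train_fail = simulator_fails(train_perm).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(train_perm[:, None], train_fail)

# Full Monte Carlo through the cheap surrogate.
mc_perm = rng.lognormal(0.0, 1.0, 50_000)
p_fail = clf.predict(mc_perm[:, None]).mean()
print("estimated failure probability: %.3f" % p_fail)
```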

  7. Machine learning of molecular electronic properties in chemical compound space

    NASA Astrophysics Data System (ADS)

    Montavon, Grégoire; Rupp, Matthias; Gobre, Vivekanand; Vazquez-Mayagoitia, Alvaro; Hansen, Katja; Tkatchenko, Alexandre; Müller, Klaus-Robert; Anatole von Lilienfeld, O.

    2013-09-01

    The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies. The machine learning model is based on a deep multi-task artificial neural network, exploiting the underlying correlations between various molecular properties. The input is identical to ab initio methods, i.e. nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a ‘quantum machine’ is similar, and sometimes superior, to modern quantum-chemical methods—at negligible computational cost.

  8. Estimation of Alpine Skier Posture Using Machine Learning Techniques

    PubMed Central

    Nemec, Bojan; Petrič, Tadej; Babič, Jan; Supej, Matej

    2014-01-01

    High precision Global Navigation Satellite System (GNSS) measurements are becoming more and more popular in alpine skiing due to the relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements that are defined with the antenna placed typically behind the skier's neck. A key issue is how to estimate other more relevant parameters of the skier's body, like the center of mass (COM) and ski trajectories. Previously, these parameters were estimated by modeling the skier's body with an inverted-pendulum model that oversimplified the skier's body. In this study, we propose two machine learning methods that overcome this shortcoming and estimate COM and skis trajectories based on a more faithful approximation of the skier's body with nine degrees-of-freedom. The first method utilizes a well-established approach of artificial neural networks, while the second method is based on a state-of-the-art statistical generalization method. Both methods were evaluated using the reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. Our results outperform the results of commonly used inverted-pendulum methods and demonstrate the applicability of machine learning techniques in biomechanical measurements of alpine skiing. PMID:25313492

  9. Analyzing angle crashes at unsignalized intersections using machine learning techniques.

    PubMed

    Abdel-Aty, Mohamed; Haleem, Kirolos

    2011-01-01

    A recently developed machine learning technique, multivariate adaptive regression splines (MARS), is introduced in this study to predict vehicles' angle crashes. MARS has promising prediction power and does not suffer from interpretation complexity. Negative Binomial (NB) and MARS models were fitted and compared using extensive data collected on unsignalized intersections in Florida. Two models were estimated for angle crash frequency at 3- and 4-legged unsignalized intersections. Treating crash frequency as a continuous response variable for fitting a MARS model was also examined by considering the natural logarithm of the crash frequency. Finally, combining MARS with another machine learning technique (random forest) was explored and discussed. The fitted NB angle crash models showed several significant factors that contribute to angle crash occurrence at unsignalized intersections, such as traffic volume on the major road, the upstream distance to the nearest signalized intersection, the distance between successive unsignalized intersections, median type on the major approach, percentage of trucks on the major approach, size of the intersection and the geographic location within the state. Based on the mean square prediction error (MSPE) assessment criterion, MARS outperformed the corresponding NB models. Also, using MARS for predicting continuous response variables yielded more favorable results than predicting discrete response variables. The generated MARS models showed the most promising results after screening the covariates using random forest. Based on the results of this study, MARS is recommended as an efficient technique for predicting crashes at unsignalized intersections (angle crashes in this study). PMID:21094345

  10. Effective and efficient optics inspection approach using machine learning algorithms

    SciTech Connect

    Abdulla, G; Kegelmeyer, L; Liao, Z; Carr, W

    2010-11-02

    The Final Optics Damage Inspection (FODI) system automatically acquires and utilizes the Optics Inspection (OI) system to analyze images of the final optics at the National Ignition Facility (NIF). During each inspection cycle up to 1000 images acquired by FODI are examined by OI to identify and track damage sites on the optics. The process of tracking growing damage sites on the surface of an optic can be made more effective by identifying and removing signals associated with debris or reflections. The manual process to filter these false sites is daunting and time consuming. In this paper we discuss the use of machine learning tools and data mining techniques to help with this task. We describe the process to prepare a data set that can be used for training and identifying hardware reflections in the image data. In order to collect training data, the images are first automatically acquired and analyzed with existing software and then relevant features such as spatial, physical and luminosity measures are extracted for each site. A subset of these sites is 'truthed' or manually assigned a class to create training data. A supervised classification algorithm is used to test if the features can predict the class membership of new sites. A suite of self-configuring machine learning tools called 'Avatar Tools' is applied to classify all sites. To verify, we used 10-fold cross correlation and found the accuracy was above 99%. This substantially reduces the number of false alarms that would otherwise be sent for more extensive investigation.
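
The truth-then-classify loop can be sketched with scikit-learn in place of the (non-public) Avatar Tools: extract per-site features, hand-label a subset, train a supervised classifier, and confirm accuracy by 10-fold cross-validation. The feature names and distributions below are mock data, not NIF measurements:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 500
# Mock per-site features: size, brightness, circularity.
damage = np.column_stack([rng.normal(8, 2, n), rng.normal(200, 40, n),
                          rng.normal(0.9, 0.05, n)])
reflection = np.column_stack([rng.normal(3, 1, n), rng.normal(800, 100, n),
                              rng.normal(0.4, 0.1, n)])
X = np.vstack([damage, reflection])
y = np.array([0] * n + [1] * n)   # 1 = hardware reflection (false site)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
acc = cross_val_score(clf, X, y, cv=10).mean()
print("10-fold accuracy: %.3f" % acc)
```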

  12. Forecasting daily streamflow using online sequential extreme learning machines

    NASA Astrophysics Data System (ADS)

    Lima, Aranildo R.; Cannon, Alex J.; Hsieh, William W.

    2016-06-01

    While nonlinear machine learning methods have been widely used in environmental forecasting, in situations where new data arrive continually, the need to make frequent model updates can become cumbersome and computationally costly. To alleviate this problem, an online sequential learning algorithm for single hidden layer feedforward neural networks, the online sequential extreme learning machine (OSELM), can be updated inexpensively as new data arrive (and the new data can then be discarded). OSELM was applied to forecast daily streamflow at two small watersheds in British Columbia, Canada, at lead times of 1-3 days. Predictors used were weather forecast data generated by the NOAA Global Ensemble Forecasting System (GEFS) and local hydro-meteorological observations. OSELM forecasts were tested with daily, monthly or yearly model updates. More frequent updating gave smaller forecast errors, including errors for data above the 90th percentile. Larger datasets used in the initial training of OSELM helped to find better parameters (number of hidden nodes) for the model, yielding better predictions. With the online sequential multiple linear regression (OSMLR) as a benchmark, we concluded that OSELM is an attractive approach, as it easily outperformed OSMLR in forecast accuracy.
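    The OS-ELM update is a recursive least-squares step on a fixed random hidden layer: after an initial batch solve, each new chunk of data updates the output weights without revisiting old data. A minimal sketch on a synthetic regression task (the hydrological predictors are replaced by toy inputs, and the network size is arbitrary):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def hidden(X, W, b):
        return np.tanh(X @ W + b)                     # fixed random-feature hidden layer

    def make_batch(n):
        # Toy streaming regression target: y = sin(x1) + 0.5*x2 + noise.
        X = rng.uniform(-2, 2, size=(n, 2))
        y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, n)
        return X, y[:, None]

    n_hidden = 40
    W = rng.normal(size=(2, n_hidden)); b = rng.normal(size=n_hidden)

    # Initial batch: ordinary (regularized) ELM solve.
    X0, y0 = make_batch(200)
    H0 = hidden(X0, W, b)
    P = np.linalg.inv(H0.T @ H0 + 1e-6 * np.eye(n_hidden))
    beta = P @ H0.T @ y0

    # Sequential updates as new chunks arrive; old data can be discarded.
    for _ in range(20):
        Xk, yk = make_batch(20)
        Hk = hidden(Xk, W, b)
        K = np.linalg.inv(np.eye(len(Xk)) + Hk @ P @ Hk.T)
        P = P - P @ Hk.T @ K @ Hk @ P                 # covariance update
        beta = beta + P @ Hk.T @ (yk - Hk @ beta)     # weight update

    Xt, yt = make_batch(500)
    err = np.sqrt(np.mean((hidden(Xt, W, b) @ beta - yt) ** 2))
    print("test RMSE:", round(err, 3))
    ```

    The per-chunk cost depends only on the chunk and hidden-layer sizes, which is why frequent (even daily) updating stays cheap.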

  13. Calibrating Building Energy Models Using Supercomputer Trained Machine Learning Agents

    SciTech Connect

    Sanyal, Jibonananda; New, Joshua Ryan; Edwards, Richard; Parker, Lynne Edwards

    2014-01-01

    Building Energy Modeling (BEM) is an approach to model the energy usage in buildings for design and retrofit purposes. EnergyPlus is the flagship Department of Energy software that performs BEM for different types of buildings. The input to EnergyPlus often comprises a few thousand parameters, which have to be calibrated manually by an expert for realistic energy modeling. This makes calibration challenging and expensive, rendering building energy modeling infeasible for smaller projects. In this paper, we describe the Autotune research, which employs machine learning algorithms to generate agents for the different kinds of standard reference buildings in the U.S. building stock. The parametric space and the variety of building locations and types make this a challenging computational problem necessitating the use of supercomputers. Millions of EnergyPlus simulations are run on supercomputers and subsequently used to train machine learning algorithms to generate agents. These agents, once created, can run in a fraction of the time, thereby allowing cost-effective calibration of building models.

  14. Prediction of brain tumor progression using a machine learning technique

    NASA Astrophysics Data System (ADS)

    Shen, Yuzhong; Banerjee, Debrup; Li, Jiang; Chandler, Adam; Shen, Yufei; McKenzie, Frederic D.; Wang, Jihong

    2010-03-01

    A machine learning technique is presented for assessing brain tumor progression by exploring six patients' complete MRI records scanned during their visits in the past two years. There are ten MRI series, including the diffusion tensor image (DTI), for each visit. After registering all series to the corresponding DTI scan at the first visit, annotated normal and tumor regions were overlaid. The intensity values of each pixel inside the annotated regions were then extracted across all ten MRI series to compose a 10-dimensional feature vector. Each feature vector falls into one of three categories: normal, tumor, and normal but progressed to tumor at a later time. In this preliminary study, we focused on the trend of brain tumor progression during three consecutive visits, i.e., visits A, B, and C. A machine learning algorithm was trained using data containing information from visit A to visit B, and the trained model was used to predict tumor progression from visit A to visit C. Preliminary results showed that prediction of brain tumor progression is feasible. An average of 80.9% pixel-wise accuracy was achieved for tumor progression prediction at visit C.

  15. Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal; Fromenteau, Sebastien; Poczos, Barnabas; Schneider, Jeff

    2016-01-01

    Galaxy clusters are a rich source of information for examining fundamental astrophysical processes and cosmological parameters; however, employing clusters as cosmological probes requires accurate mass measurements derived from cluster observables. We study dynamical mass measurements of galaxy clusters contaminated by interlopers, and show that a modern machine learning (ML) algorithm can predict masses better than a factor of two more accurately than a standard scaling relation approach. We create a mock catalog from Multidark's publicly available N-body MDPL1 simulation, where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power law scaling relation to infer cluster mass from galaxy line-of-sight (LOS) velocity dispersion. The presence of interlopers in the catalog produces a wide, flat fractional mass error distribution, with width = 2.13. We employ the Support Distribution Machine (SDM) class of algorithms, which learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (width = 0.67). Remarkably, SDM applied to contaminated clusters is better able to recover masses than even a scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
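    The core idea, regressing a single mass from the distribution of member-galaxy observables, can be sketched by summarizing each cluster's LOS-velocity distribution as a histogram feature vector and regressing log-mass on it. Everything below (the toy mass-dispersion relation, the interloper model, and a random forest standing in for SDM) is an illustrative assumption, not the paper's pipeline:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)

    def make_cluster():
        # Toy scaling relation between log10 mass and LOS velocity dispersion.
        log_m = rng.uniform(14.0, 15.5)
        sigma = 500.0 * 10 ** ((log_m - 14.5) / 3)
        v = rng.normal(0, sigma, 200)                          # member galaxies
        v = np.concatenate([v, rng.uniform(-3000, 3000, 20)])  # interlopers
        hist, _ = np.histogram(v, bins=20, range=(-3000, 3000))
        return hist / hist.sum(), log_m                        # distribution as features

    clusters = [make_cluster() for _ in range(400)]
    X = np.array([h for h, _ in clusters])
    y = np.array([m for _, m in clusters])

    # Regress a single value (log mass) from each distribution's summary features.
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:300], y[:300])
    mae = np.abs(model.predict(X[300:]) - y[300:]).mean()
    print("mean |Δ log10 M|:", round(mae, 3))
    ```

    Because the regressor sees the whole velocity distribution rather than a single dispersion value, interlopers distort the features less than they distort a fitted dispersion, which is the intuition behind the distribution-based approach.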

  16. Predicting submicron air pollution indicators: a machine learning approach.

    PubMed

    Pandey, Gaurav; Zhang, Bin; Jian, Le

    2013-05-01

    The regulation of air pollutant levels is rapidly becoming one of the most important tasks for the governments of developing countries, especially China. Submicron particles, such as ultrafine particles (UFP, aerodynamic diameter ≤ 100 nm) and particulate matter ≤ 1.0 micrometers (PM1.0), are an unregulated emerging health threat to humans, but the relationships between the concentration of these particles and meteorological and traffic factors are poorly understood. To shed some light on these connections, we employed a range of machine learning techniques to predict UFP and PM1.0 levels based on a dataset consisting of observations of weather and traffic variables recorded at a busy roadside in Hangzhou, China. Based upon a thorough examination of over twenty-five classifiers used for this task, we find that it is possible to predict PM1.0 and UFP levels reasonably accurately and that tree-based classification models (Alternating Decision Tree and Random Forests) perform the best for both these particles. In addition, weather variables show a stronger relationship with PM1.0 and UFP levels, and thus cannot be ignored for predicting submicron particle levels. Overall, this study has demonstrated the potential application value of systematically collecting and analysing datasets using machine learning techniques for the prediction of submicron sized ambient air pollutants. PMID:23535697
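    A quick way to see the relative pull of weather versus traffic variables is to fit a tree ensemble and inspect its feature importances. The variables and the generating rule below are invented for illustration; the study's actual Hangzhou dataset is not reproduced here:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic weather and traffic predictors for a PM1.0-like target (toy rule).
    rng = np.random.default_rng(7)
    n = 1000
    temp = rng.normal(20, 5, n)
    humidity = rng.uniform(30, 90, n)
    wind = rng.uniform(0, 10, n)
    traffic = rng.uniform(100, 2000, n)
    pm = 30 + 0.4 * humidity - 2.5 * wind + 0.004 * traffic + rng.normal(0, 2, n)

    X = np.column_stack([temp, humidity, wind, traffic])
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, pm)
    for name, imp in zip(["temp", "humidity", "wind", "traffic"], model.feature_importances_):
        print(f"{name}: {imp:.2f}")
    ```

    In this toy setup the weather variables (humidity, wind) dominate the importances, mirroring the abstract's finding that weather cannot be ignored.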

  17. Edge detection in grayscale imagery using machine learning

    SciTech Connect

    Glocer, K. A.; Perkins, S. J.

    2004-01-01

    Edge detection can be formulated as a binary classification problem at the pixel level, with the goal of identifying individual pixels as either on-edge or off-edge. To solve this classification problem we use both fixed and adaptive feature selection in conjunction with a support vector machine. This approach provides a direct data-driven solution and does not require the intermediate step of learning a distribution to perform a likelihood-based classification. Furthermore, the approach can readily be adapted for other image processing tasks. The algorithm was tested on a data set of 50 object images, each associated with a hand-drawn 'ground truth' image. We computed ROC curves to evaluate the performance of the general feature extraction and machine learning approach, and compared it to the standard Canny edge detector and to recent work on statistical edge detection. Using a direct pixel-by-pixel error metric enabled us to compare against the statistical edge detection approach, and our algorithm compared favorably. Using a more 'natural' metric enabled comparison with work by the authors of the image data set, and our algorithm performed comparably to the suite of state-of-the-art edge detectors in that study.

  18. Machine Learning Estimates of Natural Product Conformational Energies

    PubMed Central

    Rupp, Matthias; Bauer, Matthias R.; Wilcken, Rainer; Lange, Andreas; Reutlinger, Michael; Boeckler, Frank M.; Schneider, Gisbert

    2014-01-01

    Machine learning has been used for estimation of potential energy surfaces to speed up molecular dynamics simulations of small systems. We demonstrate that this approach is feasible for significantly larger, structurally complex molecules, taking the natural product Archazolid A, a potent inhibitor of vacuolar-type ATPase, from the myxobacterium Archangium gephyra as an example. Our model estimates energies of new conformations by exploiting information from previous calculations via Gaussian process regression. Predictive variance is used to assess whether a conformation is in the interpolation region, allowing a controlled trade-off between prediction accuracy and computational speed-up. For energies of relaxed conformations at the density functional level of theory (implicit solvent, DFT/BLYP-disp3/def2-TZVP), mean absolute errors of less than 1 kcal/mol were achieved. The study demonstrates that predictive machine learning models can be developed for structurally complex, pharmaceutically relevant compounds, potentially enabling considerable speed-ups in simulations of larger molecular structures. PMID:24453952
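    The variance-gated surrogate idea, using the Gaussian process's predictive standard deviation to decide when to trust the ML estimate and when to fall back to an explicit (e.g. DFT) calculation, can be sketched in a few lines. The 1-D toy "energy surface" and the 0.1 gating threshold are illustrative assumptions; real inputs would be conformation descriptors:

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X_train = rng.uniform(-3, 3, size=(40, 1))
    y_train = np.sin(X_train).ravel() + 0.3 * X_train.ravel() ** 2   # toy energy

    gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-4), normalize_y=True)
    gp.fit(X_train, y_train)

    X_new = np.array([[0.5], [8.0]])           # second point lies far outside the data
    mean, std = gp.predict(X_new, return_std=True)

    # Gate on predictive variance: trust the surrogate only inside the interpolation region.
    for x, m, s in zip(X_new.ravel(), mean, std):
        tag = "use ML estimate" if s < 0.1 else "fall back to explicit calculation"
        print(f"x={x:+.1f}  pred={m:+.2f}  std={s:.3f}  -> {tag}")
    ```

    The controlled accuracy/speed trade-off in the abstract corresponds to moving this threshold: a looser gate accepts more surrogate predictions at the cost of occasionally larger errors.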

  19. Parsimonious kernel extreme learning machine in primal via Cholesky factorization.

    PubMed

    Zhao, Yong-Ping

    2016-08-01

    Recently, the extreme learning machine (ELM) has become a popular topic in the machine learning community. By replacing the so-called ELM feature mappings with the nonlinear mappings induced by kernel functions, two kernel ELMs, i.e., P-KELM and D-KELM, are obtained from the primal and dual perspectives, respectively. Unfortunately, both P-KELM and D-KELM possess dense solutions whose size grows in direct proportion to the number of training data. To this end, a constructive algorithm for P-KELM (CCP-KELM) is first proposed by virtue of Cholesky factorization, in which the training data incurring the largest reductions of the objective function are recruited as significant vectors. To reduce its training cost further, PCCP-KELM is then obtained by applying a probabilistic speedup scheme to CCP-KELM. Corresponding to CCP-KELM, a destructive P-KELM (CDP-KELM) is presented using a partial Cholesky factorization strategy, where the training data incurring the smallest reductions of the objective function after their removal are pruned from the current set of significant vectors. Finally, to verify the efficacy and feasibility of the proposed algorithms, experiments on both small and large benchmark data sets are investigated. PMID:27203553
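    The dense solution the paper sets out to prune comes from the basic kernel ELM (ridge-style) solve, which assigns one coefficient per training point. A minimal sketch on synthetic data (the regularization constant C and the RBF width are arbitrary illustrative choices, not values from the paper):

    ```python
    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    rng = np.random.default_rng(3)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel()

    # Dense kernel ELM solve: alpha has one entry per training point, which is
    # exactly the O(n) solution size the constructive/destructive variants prune.
    C = 100.0                                    # regularization parameter
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + np.eye(len(X)) / C, y)

    X_test = np.linspace(-3, 3, 50)[:, None]
    pred = rbf_kernel(X_test, X) @ alpha
    max_err = np.abs(pred - np.sin(X_test).ravel()).max()
    print("max error:", round(max_err, 4))
    ```

    CCP-KELM's contribution is to build a sparse approximation of `alpha` by greedily recruiting only the training points that reduce the objective most.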

  20. Machine Learning for Knowledge Extraction from PHR Big Data.

    PubMed

    Poulymenopoulou, Michaela; Malamateniou, Flora; Vassilacopoulos, George

    2014-01-01

    Cloud computing, Internet of Things (IoT) and NoSQL database technologies can support a new generation of cloud-based PHR services that contain heterogeneous (unstructured, semi-structured and structured) patient data (health, social and lifestyle) from various sources, including data transmitted automatically from Internet-connected devices in the patient's living space (e.g. medical devices connected to patients at home care). The patient data stored in such PHR systems constitute big data, whose analysis with appropriate machine learning algorithms is expected to improve diagnosis and treatment accuracy, to cut healthcare costs and, hence, to improve the overall quality and efficiency of healthcare provided. This paper describes a health data analytics engine which uses machine learning algorithms for analyzing cloud-based PHR big data towards knowledge extraction, to support better healthcare delivery as regards disease diagnosis and prognosis. This engine comprises data preparation, model generation and data analysis modules and runs on the cloud, taking advantage of the map/reduce paradigm provided by Apache Hadoop. PMID:25000009

  1. A Review for Detecting Gene-Gene Interactions Using Machine Learning Methods in Genetic Epidemiology

    PubMed Central

    Koo, Ching Lee; Liew, Mei Jing; Mohamad, Mohd Saberi; Salleh, Abdul Hakim Mohamed

    2013-01-01

    Recently, the greatest statistical computational challenge in genetic epidemiology has been to identify and characterize the genes that interact with other genes and environmental factors to influence complex multifactorial disease. These gene-gene interactions, also termed epistasis, cannot be detected by traditional statistical methods because of the high dimensionality of the data and the presence of multiple polymorphisms. Hence, several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have been applied to identify susceptibility genes in such common, multifactorial diseases. This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Lastly, it discusses the strengths and weaknesses of each machine learning method in detecting gene-gene interactions in complex human disease. PMID:24228248
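    Why epistasis defeats single-locus methods but not interaction-capable learners can be shown with a toy XOR phenotype: neither SNP has a marginal effect, so a main-effects model sees nothing, while a random forest recovers the interaction. The simulated genotypes below are an illustrative assumption:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Toy epistasis: disease status follows an XOR pattern of two SNPs, so neither
    # locus shows a marginal effect -- the classic case that defeats single-locus tests.
    rng = np.random.default_rng(0)
    n = 2000
    snp1 = rng.integers(0, 2, n)
    snp2 = rng.integers(0, 2, n)
    noise_snps = rng.integers(0, 2, size=(n, 8))     # irrelevant loci
    y = np.logical_xor(snp1, snp2).astype(int)
    X = np.column_stack([snp1, snp2, noise_snps])

    rf = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5)
    lr = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(f"random forest: {rf.mean():.2f}   logistic (main effects only): {lr.mean():.2f}")
    ```

    The forest reaches near-perfect accuracy by splitting on one SNP and then the other within each branch, which is exactly the interaction structure a main-effects model cannot represent.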

  3. Application of machine learning using support vector machines for crater detection from Martian digital topography data

    NASA Astrophysics Data System (ADS)

    Salamunićcar, Goran; Lončarić, Sven

    In our previous work, in order to extend the GT-57633 catalogue [PSS, 56 (15), 1992-2008] with still-uncatalogued impact craters, the following was done [GRS, 48 (5), in press, doi:10.1109/TGRS.2009.2037750]: (1) a crater detection algorithm (CDA) based on the digital elevation model (DEM) was developed; (2) using 1/128° MOLA data, this CDA proposed 414631 crater-candidates; (3) each crater-candidate was analyzed manually; and (4) 57592 were confirmed as correct detections. The resulting GT-115225 catalog is the significant result of this effort. However, checking such a large number of crater-candidates manually was a demanding task. This was the main motivation for work on improving the CDA to provide better classification of craters as true and false detections. To achieve this, we extended the CDA with machine learning capability, using support vector machines (SVM). In the first step, the CDA (re)calculates numerous terrain morphometric attributes from the DEM. For this purpose, existing modules of the CDA from our previous work were reused to compute these attributes. In addition, new attributes were introduced, such as ellipse eccentricity and tilt. For machine learning purposes, the CDA is additionally extended to provide a 2-D topography profile and a 3-D shape for each crater-candidate. The latter two pose a performance problem because of the large number of crater-candidates in combination with the large number of attributes. As a solution, we developed a CDA architecture in which it is possible to combine an SVM with a radial basis function (RBF) or any other kernel (for the initial set of attributes) with an SVM with a linear kernel (for the cases when 2-D and 3-D data are included as well). Another challenge is that, in addition to the diversity of possible crater types, there are numerous morphological differences between the smallest (mostly very circular bowl-shaped craters) and the largest (multi-ring) impact

  4. Analysis of Pollution Patterns Using Unsupervised Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Kanevski, M.; Timonin, V.; Pozdnoukhov, A.; Maignan, M.

    2009-04-01

    The research presents an application of machine learning algorithms, mainly unsupervised learning techniques like self-organising Kohonen maps (SOM), to study spatial patterns in multivariate environmental spatial data. SOM are well-known neural networks widely used for high-dimensional data analysis, modelling (clustering and classification), and visualization. Self-organising maps belong to the unsupervised machine learning algorithms, providing solutions to clustering, classification or density modelling problems using unlabeled data. SOM are efficiently used for dimensionality reduction and for the visualisation of high-dimensional data (projection into a two-dimensional space). Unlabeled data are points/vectors in a high-dimensional feature space that have some attributes (or coordinates) but no target values, neither continuous (as in a regression problem) nor discrete labels (as in a classification problem). The main task of SOM is to "group" or "range" these input vectors in some manner, and to catch regularities (find patterns) in the data by preserving topological structure and by using well-defined similarity measures. The generic methodology presented in this study consists of detailed spatial exploratory data analysis using statistical and geostatistical tools, analysis and modelling of anisotropic spatial (cross-)correlation structures, and application of SOM as a nonlinear modelling and visualisation tool. The case study considers multivariate data on sediment contamination by heavy metals (eight spatially distributed pollutants) in Geneva Lake. The most important modelling task is formulated as the problem of revealing structures or coherent clusters in this multivariate data set that would shed some light on the underlying phenomena of the contamination. Three major clusters, clearly spatially separated, were detected and explained using the SOM technique.
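    The core SOM training loop, finding the best-matching unit for each sample and pulling it and its grid neighbours toward that sample with a shrinking neighbourhood, fits in a few lines. The toy "sediment" data, grid size, and decay schedules below are all illustrative assumptions:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "sediment samples": two contamination clusters in a 3-pollutant space.
    data = np.vstack([rng.normal(0, 0.3, (100, 3)), rng.normal(3, 0.3, (100, 3))])

    grid = 5
    W = rng.normal(1.5, 1.0, (grid, grid, 3))        # prototype vectors on a 2-D grid
    coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij"), -1)

    for t in range(2000):
        x = data[rng.integers(len(data))]
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), (grid, grid))
        sigma = 2.0 * np.exp(-t / 1000)              # shrinking neighbourhood radius
        lr = 0.5 * np.exp(-t / 1000)                 # decaying learning rate
        h = np.exp(-((coords - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))
        W += lr * h[..., None] * (x - W)             # pull BMU and neighbours toward x

    # After training, each sample maps to its best-matching unit; the two
    # contamination clusters occupy separate regions of the 2-D map.
    bmus = [np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), (grid, grid)) for x in data]
    print("distinct units used:", len(set(bmus)))
    ```

    The 2-D grid of BMU assignments is exactly the projection the abstract describes: high-dimensional samples become positions on a map whose topology reflects similarity in the original space.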

  5. Machine Learning Techniques for Combining Multi-Model Climate Projections (Invited)

    NASA Astrophysics Data System (ADS)

    Monteleoni, C.

    2013-12-01

    The threat of climate change is one of the greatest challenges currently facing society. Given the profound impact machine learning has made on the natural sciences to which it has been applied, such as the field of bioinformatics, machine learning is poised to accelerate discovery in climate science. Recent advances in the fledgling field of climate informatics have demonstrated the promise of machine learning techniques for problems in climate science. A key problem in climate science is how to combine the projections of the multi-model ensemble of global climate models that inform the Intergovernmental Panel on Climate Change (IPCC). I will present three approaches to this problem. Our Tracking Climate Models (TCM) work demonstrated the promise of an algorithm for online learning with expert advice, for this task. Given temperature projections and hindcasts from 20 IPCC global climate models, and over 100 years of historical temperature data, TCM generated predictions that tracked the changing sequence of which model currently predicts best. On historical data, at both annual and monthly time-scales, and in future simulations, TCM consistently outperformed the average over climate models, the existing benchmark in climate science, at both global and continental scales. We then extended TCM to take into account climate model projections at higher spatial resolutions, and to model geospatial neighborhood influence between regions. Our second algorithm enables neighborhood influence by modifying the transition dynamics of the Hidden Markov Model from which TCM is derived, allowing the performance of spatial neighbors to influence the temporal switching probabilities for the best climate model at a given location. We recently applied a third technique, sparse matrix completion, in which we create a sparse (incomplete) matrix from climate model projections/hindcasts and observed temperature data, and apply a matrix completion algorithm to recover it, yielding
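    The expert-tracking flavour of TCM can be sketched with a fixed-share forecaster over a pool of "climate model" experts; weights concentrate on whichever expert currently predicts best, and a small share rate lets the best expert switch over time. The scenario below (which expert is best when, the learning and share rates) is invented for illustration and is not the paper's Hidden-Markov-Model formulation:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T, n = 300, 3
    truth = np.zeros(T)                              # observed quantity (toy)
    preds = rng.normal(0, 1, (T, n))                 # expert (model) predictions
    preds[:150, 2] = rng.normal(0, 0.1, 150)         # expert 2 best in first half
    preds[150:, 0] = rng.normal(0, 0.1, 150)         # expert 0 best in second half

    eta, alpha = 2.0, 0.05                           # learning rate, share rate
    w = np.full(n, 1 / n)
    loss_alg = 0.0
    for t in range(T):
        yhat = w @ preds[t]                          # weighted forecast
        loss_alg += (yhat - truth[t]) ** 2
        losses = (preds[t] - truth[t]) ** 2
        w = w * np.exp(-eta * losses)                # multiplicative weight update
        w = w / w.sum()
        w = (1 - alpha) * w + alpha / n              # share step: enables switching

    best_fixed = ((preds - truth[:, None]) ** 2).sum(0).min()
    print(f"tracking loss: {loss_alg:.1f}   best single expert: {best_fixed:.1f}")
    ```

    Because the best expert changes halfway through, the tracking forecaster beats every fixed expert, which is the same behaviour TCM exhibits against the multi-model average benchmark.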

  6. Of Genes and Machines: Application of a Combination of Machine Learning Tools to Astronomy Data Sets

    NASA Astrophysics Data System (ADS)

    Heinis, S.; Kumar, S.; Gezari, S.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Waters, C.

    2016-04-01

    We apply a combination of genetic algorithm (GA) and support vector machine (SVM) machine learning algorithms to solve two important problems faced by the astronomical community: star–galaxy separation and photometric redshift estimation of galaxies in survey catalogs. We use the GA to select the relevant features in the first step, followed by optimization of SVM parameters in the second step to obtain an optimal set of parameters to classify or regress, in the process of which we avoid overfitting. We apply our method to star–galaxy separation in Pan-STARRS1 data. We show that our method correctly classifies 98% of objects down to i_P1 = 24.5, with a completeness (or true positive rate) of 99% for galaxies and 88% for stars. By combining colors with morphology, our star–galaxy separation method yields better results than the new SExtractor classifier spread_model, in particular at the faint end (i_P1 > 22). We also use our method to derive photometric redshifts for galaxies in the COSMOS bright multiwavelength data set down to an error in (1+z) of σ = 0.013, which compares well with estimates from spectral energy distribution fitting on the same data (σ = 0.007) while making a significantly smaller number of assumptions.
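    A GA wrapper for feature selection ahead of an SVM can be sketched as below; the population size, mutation and selection schemes, generation count, and toy dataset are illustrative choices, not the paper's settings:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                               n_redundant=0, random_state=0)

    def fitness(mask):
        # Fitness of a feature subset = cross-validated SVM accuracy.
        if mask.sum() == 0:
            return 0.0
        return cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=3).mean()

    pop = rng.integers(0, 2, size=(20, 15))          # population of binary feature masks
    for gen in range(10):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[::-1][:10]]     # truncation selection
        children = []
        for _ in range(10):
            a, b = parents[rng.integers(10)], parents[rng.integers(10)]
            cut = rng.integers(1, 15)
            child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
            child[rng.random(15) < 0.05] ^= 1            # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])

    best = pop[np.argmax([fitness(m) for m in pop])]
    best_acc = fitness(best)
    print("selected features:", np.flatnonzero(best), " cv accuracy: %.3f" % best_acc)
    ```

    Scoring candidates with cross-validated accuracy inside the GA loop is what guards against overfitting during selection, before the SVM hyperparameters are tuned in the second step.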

  8. Seminar for High School Students “Practice on Manufacturing Technology by Advanced Machine Tools”

    NASA Astrophysics Data System (ADS)

    Marui, Etsuo; Yamawaki, Masao; Taga, Yuken; Omoto, Ken'ichi; Miyaji, Reiji; Ogura, Takahiro; Tsubata, Yoko; Sakai, Toshimasa

    The seminar ‘Practice on Manufacturing Technology by Advanced Machine Tools’ for high school students was held at the supporting center for technology education of Gifu University, under the sponsorship of the Japan Society of Mechanical Engineers. The seminar was held in the hope that the hands-on experience would spark many students' interest in manufacturing. Operating a CNC milling machine and a CNC wire-cut electric discharge machine, the participants made original nameplates, writing the programs to control the CNC machine tools themselves. In this report, some valuable results obtained through this experience are explained.

  9. ALPS: Advanced Learning Packages, 1978-1979.

    ERIC Educational Resources Information Center

    San Juan Unified School District, Carmichael, CA.

    The document describes the ALPS (Advanced Learning Packages) program for teaching gifted students. Introductory materials provide information on teacher requirements, school requirements, ALPS teacher orientation responsibilities, orientation week, field trip procedures, gifted money available, ALPS costs, ALPS evaluations, the Structure of…

  10. Advancing Research on Undergraduate Science Learning

    ERIC Educational Resources Information Center

    Singer, Susan Rundell

    2013-01-01

    This special issue of "Journal of Research in Science Teaching" reflects conclusions and recommendations in the "Discipline-Based Education Research" (DBER) report and makes a substantial contribution to advancing the field. Research on undergraduate science learning is currently a loose affiliation of related fields. The…

  11. Statistical and Machine-Learning Classifier Framework to Improve Pulse Shape Discrimination System Design

    SciTech Connect

    Wurtz, R.; Kaplan, A.

    2015-10-28

    Pulse shape discrimination (PSD) is a type of statistical classifier. Fully-realized statistical classifiers rely on a comprehensive set of tools for designing, building, and implementing. Advances in PSD rely on improvements to the implemented algorithm, which can draw on conventional statistical classification or machine learning methods. This paper provides the reader with a glossary of classifier-building elements and their functions in a fully-designed and operational classifier framework that can be used to discover opportunities for improving PSD classifier projects. This paper recommends reporting the PSD classifier's receiver operating characteristic (ROC) curve and its behavior at a gamma rejection rate (GRR) relevant for realistic applications.
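    Reporting behaviour at a given GRR amounts to reading the ROC curve at a fixed gamma acceptance (false positive) rate, since GRR = 1 − FPR. A sketch with synthetic PSD scores (the Gaussian score model and the 0.999 GRR target are illustrative assumptions):

    ```python
    import numpy as np
    from sklearn.metrics import roc_curve

    # Toy PSD scores: gammas and neutrons drawn from overlapping distributions
    # (stand-ins for a charge-integration PSD parameter).
    rng = np.random.default_rng(0)
    gamma_scores = rng.normal(0.0, 1.0, 20000)      # class 0: gammas
    neutron_scores = rng.normal(3.0, 1.0, 2000)     # class 1: neutrons
    scores = np.concatenate([gamma_scores, neutron_scores])
    labels = np.concatenate([np.zeros(20000), np.ones(2000)])

    fpr, tpr, thresh = roc_curve(labels, scores)

    # Gamma rejection rate = 1 - false positive rate; report neutron efficiency there.
    grr_target = 0.999
    i = np.searchsorted(fpr, 1 - grr_target, side="right") - 1
    print(f"at GRR={grr_target}: neutron efficiency={tpr[i]:.3f}, threshold={thresh[i]:.2f}")
    ```

    Quoting the full ROC curve plus the operating point at a realistic GRR, as the paper recommends, conveys far more than a single accuracy figure.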

  12. Visual Tracking Based on Extreme Learning Machine and Sparse Representation

    PubMed Central

    Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

    2015-01-01

    Existing sparse representation-based visual trackers mostly suffer from high computational cost and poor robustness. To address these issues, a novel tracking method is presented that combines sparse representation with an emerging learning technique, namely the extreme learning machine (ELM). Specifically, visual tracking can be divided into two consecutive processes. Firstly, ELM is utilized to find the optimal separating hyperplane between the target observations and background ones. Thus, the trained ELM classification function is able to efficiently remove most of the candidate samples related to background content, thereby reducing the total computational cost of the subsequent sparse representation. Secondly, to further combine ELM and sparse representation, the resultant confidence values (i.e., probabilities of being a target) of samples under the ELM classification function are used to construct a new manifold learning constraint term for the sparse representation framework, which tends to achieve more robust results. Moreover, the accelerated proximal gradient method is used for deriving the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix-form solution allows the candidate samples to be calculated in parallel, leading to higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359

  13. Image quality assessment with manifold and machine learning

    NASA Astrophysics Data System (ADS)

    Charrier, Christophe; Lebrun, Gilles; Lezoray, Olivier

    2009-01-01

    A crucial step in image compression is the evaluation of its performance, and more precisely of the available ways to measure the final quality of the compressed image. In this paper, a machine learning expert providing a final class number is designed. The quality measure is based on a learned classification process intended to mirror that of human observers. Instead of computing a final score, our method classifies quality using the quality scale recommended by the ITU, which contains 5 ranks ordered from 1 (the worst quality) to 5 (the best quality). This was done by constructing a vector of many visual attributes; the final feature vector contains more than 40 attributes. However, no study of the interactions between these visual attributes has been done. A feature selection algorithm could be of interest, but the selection is highly dependent on the classifier used afterwards. We therefore perform dimensionality reduction instead of feature selection. Manifold learning methods are used to provide a new low-dimensional representation from the initial high-dimensional feature space, and the classification process is performed on this new low-dimensional representation of the images. The results obtained are compared with those obtained without the dimension-reduction step to judge the efficiency of the method.
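
    The reduce-then-classify pipeline can be sketched with PCA standing in for the dimensionality-reduction step (the paper uses nonlinear manifold-learning methods); the data, dimensions, and class structure below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 40-dimensional feature vectors for images in 5 quality classes.
n, d = 250, 40
labels = rng.integers(1, 6, size=n)
latent = np.zeros((n, 3))
latent[np.arange(n), labels % 3] = labels        # class-dependent 3-D latent point
embed = rng.normal(size=(3, d))
X = latent @ embed + 0.05 * rng.normal(size=(n, d))   # embed into 40 dims + noise

# Dimensionality reduction: project onto the top 3 principal directions.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_low = Xc @ Vt[:3].T

# Classify in the reduced space: leave-one-out 1-nearest-neighbour.
dists = np.linalg.norm(X_low[:, None, :] - X_low[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)
pred = labels[dists.argmin(axis=1)]
accuracy = (pred == labels).mean()
```

    The point of the reduction is that the classifier now operates on 3 coordinates instead of 40+ interacting attributes, sidestepping classifier-specific feature selection.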

  14. Automatic programming of binary morphological machines by PAC learning

    NASA Astrophysics Data System (ADS)

    Barrera, Junior; Tomita, Nina S.; Correa da Silva, Flavio S.; Terada, Routo

    1995-08-01

    Binary image analysis problems can be solved by set operators implemented as programs for a binary morphological machine (BMM). This is a very general and powerful approach to solve this type of problem. However, the design of these programs is not a task manageable by nonexperts on mathematical morphology. In order to overcome this difficulty we have worked on tools that help users describe their goals at higher levels of abstraction and translate them into BMM programs. Some of these tools are based on the representation of the goals of the user as a collection of input-output pairs of images and the estimation of the target operator from these data. PAC learning is a well-suited methodology for this task, since in this theory 'concepts' are represented as Boolean functions that are equivalent to set operators. In order to apply this technique in practice we must have efficient learning algorithms. In this paper we introduce two PAC learning algorithms, both based on the minimal representation of Boolean functions, which has a straightforward translation to the canonical decomposition of set operators. The first algorithm is based on the classical Quine-McCluskey algorithm for the simplification of Boolean functions, and the second one is based on a new idea for the construction of Boolean functions: the incremental splitting of intervals. We also present a comparative complexity analysis of the two algorithms. Finally, we give some application examples.
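
    The first phase of the classical Quine-McCluskey procedure, on which the first algorithm builds, repeatedly merges pairs of implicants that differ in exactly one literal, replacing that position with a don't-care. A self-contained sketch on an invented three-variable function (not an example from the paper):

```python
def merge_once(implicants):
    """One Quine-McCluskey pass: merge implicants differing in a single bit."""
    merged, used = set(), set()
    items = sorted(implicants)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            diff = [k for k in range(len(a)) if a[k] != b[k]]
            if len(diff) == 1 and a[diff[0]] != '-' and b[diff[0]] != '-':
                merged.add(a[:diff[0]] + '-' + a[diff[0] + 1:])
                used.update((a, b))
    # Keep unmergeable implicants: they are prime at this level.
    return merged | (set(items) - used)

# f(x2, x1, x0) with minterms {0, 1, 2, 3, 7}, written as 3-bit strings.
terms = {'000', '001', '010', '011', '111'}
step1 = merge_once(terms)
step2 = merge_once(step1)   # prime implicants: {'0--', '-11'}, i.e. ~x2 OR (x1 AND x0)
```

    Each implicant string translates directly to an interval of the Boolean lattice, which is what gives the canonical decomposition of the corresponding set operator.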

  15. Visual tracking based on extreme learning machine and sparse representation.

    PubMed

    Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen

    2015-01-01

    Existing sparse representation-based visual trackers mostly suffer from being time consuming and from poor robustness. To address these issues, a novel tracking method is presented that combines sparse representation with an emerging learning technique, the extreme learning machine (ELM). Specifically, visual tracking is divided into two consecutive processes. First, ELM is used to find the optimal separating hyperplane between the target observations and background ones. The trained ELM classification function can thus efficiently remove most candidate samples related to background content, reducing the total computational cost of the subsequent sparse representation. Second, to further combine ELM and sparse representation, the resulting confidence values (i.e., probabilities of being the target) of samples under the ELM classification function are used to construct a new manifold-learning constraint term in the sparse representation framework, which tends to yield more robust results. Moreover, the accelerated proximal gradient method is used to derive the optimal solution (in matrix form) of the constrained sparse tracking model. The matrix-form solution also allows candidate samples to be evaluated in parallel, leading to higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359

  16. Advances Towards Synthetic Machines at the Molecular and Nanoscale Level

    PubMed Central

    Konstas, Kristina; Langford, Steven J.; Latter, Melissa J.

    2010-01-01

    The fabrication of increasingly smaller machines to the nanometer scale can be achieved by either a “top-down” or “bottom-up” approach. While the former is reaching its limits of resolution, the latter is showing promise for the assembly of molecular components, in a comparable approach to natural systems, to produce functioning ensembles in a controlled and predetermined manner. In this review we focus on recent progress in molecular systems that act as molecular machine prototypes such as switches, motors, vehicles and logic operators. PMID:20640163

  17. An iterative learning control method with application for CNC machine tools

    SciTech Connect

    Kim, D.I.; Kim, S.

    1996-01-01

    A proportional, integral, and derivative (PID) type iterative learning controller is proposed for precise tracking control of industrial robots and computer numerical controller (CNC) machine tools performing repetitive tasks. The convergence of the output error by the proposed learning controller is guaranteed under a certain condition even when the system parameters are not known exactly and unknown external disturbances exist. As the proposed learning controller is repeatedly applied to the industrial robot or the CNC machine tool with the path-dependent repetitive task, the distance difference between the desired path and the actual tracked or machined path, which is one of the most significant factors in the evaluation of control performance, is progressively reduced. The experimental results demonstrate that the proposed learning controller can improve machining accuracy when the CNC machine tool performs repetitive machining tasks.
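
    The core of iterative learning control is a trial-to-trial update: the input for the next repetition is the previous input plus a correction computed from the previous tracking error. A toy sketch with a P-type update (the paper proposes a full PID-type law); the plant, gain, and reference trajectory below are invented:

```python
import math

T = 60
ref = [math.sin(2 * math.pi * (t + 1) / T) for t in range(T)]   # repetitive desired path

def run_trial(u):
    """One repetition of a toy plant: y[t+1] = 0.3*y[t] + u[t], y[0] = 0."""
    y, out = 0.0, []
    for t in range(T):
        y = 0.3 * y + u[t]
        out.append(y)
    return out

gamma = 1.0                  # learning gain (illustrative)
u = [0.0] * T
peak_errors = []
for trial in range(30):      # repeated executions of the same task
    y = run_trial(u)
    e = [r - yi for r, yi in zip(ref, y)]
    peak_errors.append(max(abs(v) for v in e))
    u = [ui + gamma * ei for ui, ei in zip(u, e)]   # P-type ILC update law
```

    As in the paper, the distance between the desired and actual trajectory shrinks with each repetition of the task, without requiring an exact plant model.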

  18. A Sustainable Model for Integrating Current Topics in Machine Learning Research into the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Georgiopoulos, M.; DeMara, R. F.; Gonzalez, A. J.; Wu, A. S.; Mollaghasemi, M.; Gelenbe, E.; Kysilka, M.; Secretan, J.; Sharma, C. A.; Alnsour, A. J.

    2009-01-01

    This paper presents an integrated research and teaching model that has resulted from an NSF-funded effort to introduce results of current Machine Learning research into the engineering and computer science curriculum at the University of Central Florida (UCF). While in-depth exposure to current topics in Machine Learning has traditionally occurred…

  19. Enhancement of plant metabolite fingerprinting by machine learning.

    PubMed

    Scott, Ian M; Vermeer, Cornelia P; Liakata, Maria; Corol, Delia I; Ward, Jane L; Lin, Wanchang; Johnson, Helen E; Whitehead, Lynne; Kular, Baldeep; Baker, John M; Walsh, Sean; Dave, Anuja; Larson, Tony R; Graham, Ian A; Wang, Trevor L; King, Ross D; Draper, John; Beale, Michael H

    2010-08-01

    Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by (1)H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, (1)H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted.
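
    Importance scores of the kind random forests provide can be illustrated with permutation importance: shuffle one feature at a time and measure how much classification accuracy drops. The sketch below uses a much simpler nearest-centroid classifier as a stand-in for RF, on synthetic "fingerprints" where only one feature is informative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy fingerprints: 120 samples x 6 features; only feature 0 separates
# the two phenotypes (all data invented for illustration).
n = 120
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 6))
X[:, 0] += 3.0 * y                       # the single informative feature

def centroid_accuracy(X, y):
    """Nearest-class-centroid classifier, evaluated on the training set."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

base = centroid_accuracy(X, y)
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j's link to the labels
    importance.append(base - centroid_accuracy(Xp, y))

top_feature = int(np.argmax(importance))   # the feature whose loss hurts the most
```

    In the paper's setting, such scores pick out the regions of an NMR or MS fingerprint that actually discriminate mutant phenotypes.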

  20. Detecting falls with wearable sensors using machine learning techniques.

    PubMed

    Özdemir, Ahmet Turan; Barshan, Billur

    2014-01-01

    Falls are a serious public health problem and possibly life threatening for people in fall risk groups. We develop an automated fall detection system with wearable motion sensor units fitted to the subjects' body at six different positions. Each unit comprises three tri-axial devices (accelerometer, gyroscope, and magnetometer/compass). Fourteen volunteers perform a standardized set of movements including 20 voluntary falls and 16 activities of daily living (ADLs), resulting in a large dataset with 2520 trials. To reduce the computational complexity of training and testing the classifiers, we focus on the raw data for each sensor in a 4 s time window around the point of peak total acceleration of the waist sensor, and then perform feature extraction and reduction. Most earlier studies on fall detection employ rule-based approaches that rely on simple thresholding of the sensor outputs. We successfully distinguish falls from ADLs using six machine learning techniques (classifiers): the k-nearest neighbor (k-NN) classifier, least squares method (LSM), support vector machines (SVM), Bayesian decision making (BDM), dynamic time warping (DTW), and artificial neural networks (ANNs). We compare the performance and the computational complexity of the classifiers and achieve the best results with the k-NN classifier and LSM, with sensitivity, specificity, and accuracy all above 99%. These classifiers also have acceptable computational requirements for training and testing. Our approach would be applicable in real-world scenarios where data records of indeterminate length, containing multiple activities in sequence, are recorded. PMID:24945676
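
    The window-around-the-peak feature extraction followed by a simple classifier can be sketched as follows; the signals, window length, and features are invented, and the paper's best-performing k-NN is reduced to leave-one-out 1-NN:

```python
import numpy as np

rng = np.random.default_rng(3)

def make_trial(fall, n=100):
    # Synthetic total-acceleration magnitude (illustrative, not the paper's data):
    # falls show a large impact spike; ADLs stay near 1 g with mild motion.
    sig = 1.0 + 0.1 * rng.normal(size=n)
    if fall:
        sig[n // 2] += 3.0 + rng.normal()     # impact peak
    return sig

def features(sig, half=20):
    # Feature extraction in a window centred on the peak total acceleration.
    p = int(np.argmax(sig))
    w = sig[max(0, p - half):p + half]
    return np.array([w.max(), w.std(), w.mean()])

X = np.array([features(make_trial(fall=i % 2 == 1)) for i in range(80)])
y = np.array([i % 2 for i in range(80)])

# Leave-one-out 1-nearest-neighbour classification of fall vs. ADL.
d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
accuracy = (y[d.argmin(axis=1)] == y).mean()
```

    Unlike the simple threshold rules of earlier work, the classifier learns the boundary between falls and ADLs from labelled trials, at a computational cost still small enough for wearable hardware.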

  1. Classification of ROTSE Variable Stars using Machine Learning

    NASA Astrophysics Data System (ADS)

    Wozniak, P. R.; Akerlof, C.; Amrose, S.; Brumby, S.; Casperson, D.; Gisler, G.; Kehoe, R.; Lee, B.; Marshall, S.; McGowan, K. E.; McKay, T.; Perkins, S.; Priedhorsky, W.; Rykoff, E.; Smith, D. A.; Theiler, J.; Vestrand, W. T.; Wren, J.; ROTSE Collaboration

    2001-12-01

    We evaluate several Machine Learning algorithms as potential tools for automated classification of variable stars. Using the ROTSE sample of ~1800 variables from a pilot study of 5% of the whole sky, we compare the effectiveness of a supervised technique (Support Vector Machines, SVM) versus unsupervised methods (K-means and Autoclass). There are 8 types of variables in the sample: RR Lyr AB, RR Lyr C, Delta Scuti, Cepheids, detached eclipsing binaries, contact binaries, Miras and LPVs. Preliminary results suggest a very high (~95%) efficiency of SVM in isolating a few best defined classes against the rest of the sample, and good accuracy (~70-75%) for all classes considered simultaneously. This includes some degeneracies, irreducible with the information at hand. Supervised methods naturally outperform unsupervised methods in terms of final error rate, but unsupervised methods offer many advantages for large sets of unlabeled data. Therefore, both types of methods should be considered as promising tools for mining vast variability surveys. We project that there are more than 30,000 periodic variables in the ROTSE-I database covering the entire local sky between V=10 and 15.5 mag. This sample size is already stretching the time capabilities of human analysts.

  2. Ventricular fibrillation and tachycardia classification using a machine learning approach.

    PubMed

    Li, Qiao; Rajagopalan, Cadathur; Clifford, Gari D

    2014-06-01

    Correct detection and classification of ventricular fibrillation (VF) and rapid ventricular tachycardia (VT) is of pivotal importance for an automatic external defibrillator and patient monitoring. In this paper, a VF/VT classification algorithm using a machine learning method, a support vector machine, is proposed. A total of 14 metrics were extracted from a specific window length of the electrocardiogram (ECG). A genetic algorithm was then used to select the optimal variable combinations. Three annotated public domain ECG databases (the American Heart Association Database, the Creighton University Ventricular Tachyarrhythmia Database, and the MIT-BIH Malignant Ventricular Arrhythmia Database) were used as training, test, and validation datasets. Different window sizes, varying from 1 to 10 s, were tested. An accuracy (Ac) of 98.1%, sensitivity (Se) of 98.4%, and specificity (Sp) of 98.0% were obtained on the in-sample training data with a 5-s window size and two selected metrics. On the out-of-sample validation data, an Ac of 96.3% ± 3.4%, Se of 96.2% ± 2.7%, and Sp of 96.2% ± 4.6% were obtained by fivefold cross validation. The results surpass those of current reported methods. PMID:23899591
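
    Genetic-algorithm feature selection of the kind used here can be sketched with bitmask chromosomes, tournament selection, uniform crossover, and bit-flip mutation. Everything below is invented for illustration (synthetic "metrics", and a nearest-centroid classifier standing in for the SVM fitness evaluation):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-in: 14 candidate metrics, of which only indices 0 and 1
# actually separate the two rhythm classes.
n, d = 200, 14
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y

def fitness(mask):
    # Accuracy of a nearest-centroid classifier on the selected metrics,
    # with a small penalty per metric to favour compact combinations.
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
    return (pred == y).mean() - 0.01 * mask.sum()

pop = rng.random((30, d)) < 0.5            # population of feature bitmasks
for _ in range(40):
    scores = np.array([fitness(m) for m in pop])
    new = [pop[scores.argmax()].copy()]    # elitism: keep the best mask
    while len(new) < len(pop):
        i, j = rng.integers(len(pop), size=2)
        a = pop[i] if scores[i] >= scores[j] else pop[j]   # tournament pick 1
        i, j = rng.integers(len(pop), size=2)
        b = pop[i] if scores[i] >= scores[j] else pop[j]   # tournament pick 2
        child = np.where(rng.random(d) < 0.5, a, b)        # uniform crossover
        child ^= rng.random(d) < 0.05                      # bit-flip mutation
        new.append(child)
    pop = np.array(new)

best_mask = pop[np.array([fitness(m) for m in pop]).argmax()]
```

    The GA converges on a small subset of informative metrics, mirroring the paper's finding that two selected metrics sufficed for high accuracy.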

  3. Application of Machine Learning to the Prediction of Vegetation Health

    NASA Astrophysics Data System (ADS)

    Burchfield, Emily; Nay, John J.; Gilligan, Jonathan

    2016-06-01

    This project applies machine learning techniques to remotely sensed imagery to train and validate predictive models of vegetation health in Bangladesh and Sri Lanka. For both locations, we downloaded and processed eleven years of imagery from multiple MODIS datasets, which were combined and transformed into two-dimensional matrices. We applied a gradient boosted machines model to the lagged dataset values to forecast future values of the Enhanced Vegetation Index (EVI). The predictive power of raw spectral data and derived MODIS products was compared across time periods and land use categories. Our models have significantly more predictive power on held-out datasets than a baseline. Though the tool was built to increase capacity to monitor vegetation health in data-scarce regions like South Asia, users may include ancillary spatiotemporal datasets relevant to their region of interest to increase predictive power and to facilitate interpretation of model results. The tool can automatically update predictions as new MODIS data are made available by NASA. The tool is particularly well-suited for decision makers interested in understanding and predicting vegetation health dynamics in countries in which environmental data are scarce and cloud cover is a significant concern.
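
    Forecasting from lagged values with gradient boosting can be sketched from scratch: build lagged predictors from a series, then fit regression stumps to the residuals of an additive model. The series, lags, and hyperparameters below are invented, and the hand-rolled booster is a minimal stand-in for a gradient boosted machines library:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy EVI-like series: the next value depends on the previous two values.
n = 400
s = np.zeros(n)
for t in range(2, n):
    s[t] = 0.6 * s[t - 1] + 0.3 * s[t - 2] + 0.1 * rng.normal()

X = np.column_stack([s[1:-1], s[:-2]])   # lagged predictors (lag 1, lag 2)
y = s[2:]                                # value to forecast

def fit_stump(X, r):
    """Best single-split regression stump on residuals r."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= thr
            if left.all() or not left.any():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    j, thr, lv, rv = stump
    return np.where(X[:, j] <= thr, lv, rv)

# Gradient boosting for squared loss: each stump fits the current residuals.
pred = np.full(len(y), y.mean())
lr = 0.1
for _ in range(200):
    stump = fit_stump(X, y - pred)
    pred += lr * predict_stump(stump, X)

baseline_mse = ((y - y.mean()) ** 2).mean()   # predict-the-mean baseline
boosted_mse = ((y - pred) ** 2).mean()
```

    The boosted model recovers most of the lag structure, which is the same mechanism the project exploits to forecast EVI from its own history.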

  4. Modeling the Swift BAT Trigger Algorithm with Machine Learning

    NASA Astrophysics Data System (ADS)

    Graff, Philip B.; Lien, Amy Y.; Baker, John G.; Sakamoto, Takanori

    2016-02-01

    To draw inferences about gamma-ray burst (GRB) source populations based on Swift observations, it is essential to understand the detection efficiency of the Swift burst alert telescope (BAT). This study considers the problem of modeling the Swift/BAT triggering algorithm for long GRBs, a computationally expensive procedure, and models it using machine learning algorithms. A large sample of simulated GRBs from Lien et al. is used to train various models: random forests, boosted decision trees (with AdaBoost), support vector machines, and artificial neural networks. The best models have accuracies of ≳97% (≲3% error), which is a significant improvement on a cut in GRB flux, which has an accuracy of 89.6% (10.4% error). These models are then used to measure the detection efficiency of Swift as a function of redshift z, which is used to perform Bayesian parameter estimation on the GRB rate distribution. We find a local GRB rate density of n_0 ~ 0.48 (+0.41/-0.23) Gpc^-3 yr^-1, with power-law indices of n_1 ~ 1.7 (+0.6/-0.5) and n_2 ~ -5.9 (+5.7/-0.1) for GRBs above and below a break point of z_1 ~ 6.8 (+2.8/-3.2). This methodology is able to improve upon earlier studies by more accurately modeling Swift detection and using this for fully Bayesian model fitting.

  5. Structure classification of AB solids via machine learning

    NASA Astrophysics Data System (ADS)

    Gubernatis, J. E.; Pilania, G.; Lookman, T.

    2015-03-01

    We explored the use of machine learning methods, specifically support vector machines and various forms of cross-validation, for the task of classifying the crystal structures of the octet AB solids. We partitioned a set of 75 solids into rocksalt and non-rocksalt structures and thus performed a binary classification task. We found that using the standard indices (rσ, rπ), suggested by St. John and Bloch several decades ago, enabled an average success in classification of 92%. Our main new result is our finding that using just rσ and the excess Born effective charge ΔZA of the A atom, computed by DFT, enabled an average success of 98%, prompting us to propose (rσ, ΔZA) as a replacement for the St. John-Bloch pair. In general, we found that adding one or two other features to the St. John-Bloch pair, unless they include the excess Born effective charge, generally decreases the average success rate. Supported by the Department of Energy.

  6. Combining satellite imagery and machine learning to predict poverty.

    PubMed

    Jean, Neal; Burke, Marshall; Xie, Michael; Davis, W Matthew; Lobell, David B; Ermon, Stefano

    2016-08-19

    Reliable data on economic livelihoods remain scarce in the developing world, hampering efforts to study these outcomes and to design policies that improve them. Here we demonstrate an accurate, inexpensive, and scalable method for estimating consumption expenditure and asset wealth from high-resolution satellite imagery. Using survey and satellite data from five African countries--Nigeria, Tanzania, Uganda, Malawi, and Rwanda--we show how a convolutional neural network can be trained to identify image features that can explain up to 75% of the variation in local-level economic outcomes. Our method, which requires only publicly available data, could transform efforts to track and target poverty in developing countries. It also demonstrates how powerful machine learning techniques can be applied in a setting with limited training data, suggesting broad potential application across many scientific domains. PMID:27540167

  7. A global prediction of seafloor sediment porosity using machine learning

    NASA Astrophysics Data System (ADS)

    Martin, Kylara M.; Wood, Warren T.; Becker, Joseph J.

    2015-12-01

    Porosity (void ratio) is a critical parameter in models of acoustic propagation, bearing strength, and many other seafloor phenomena. However, like many seafloor phenomena, direct measurements are expensive and sparse. We show here how porosity everywhere at the seafloor can be estimated using a machine learning technique (specifically, Random Forests). Such techniques use sparsely acquired direct samples and dense grids of other parameters to produce a statistically optimal estimate where direct measurements are lacking. Our porosity estimate is both qualitatively more consistent with geologic principles than the results produced by interpolation and quantitatively more accurate than results produced by interpolation or regression methods. We present here a seafloor porosity estimate on a 5 arc min, pixel registered grid, produced using widely available, densely sampled grids of other seafloor properties. These techniques represent the only practical means of estimating seafloor properties in inaccessible regions of the seafloor (e.g., the Arctic).

  8. Monitoring frog communities: An application of machine learning

    SciTech Connect

    Taylor, A.; Watson, G.; Grigg, G.; McCallum, H.

    1996-12-31

    Automatic recognition of animal vocalizations would be a valuable tool for a variety of biological research and environmental monitoring applications. We report the development of a software system which can recognize the vocalizations of 22 species of frogs which occur in an area of northern Australia. This software system will be used in unattended operation to monitor the effect on frog populations of the introduced Cane Toad. The system is based around classification of local peaks in the spectrogram of the audio signal using Quinlan's machine learning system, C4.5. Unreliable identifications of peaks are aggregated together using a hierarchical structure of segments based on the species' typical temporal vocalization patterns. This produces robust system performance.

  9. Liver vessel segmentation based on extreme learning machine.

    PubMed

    Zeng, Ye Zhan; Zhao, Yu Qian; Liao, Miao; Zou, Bei Ji; Wang, Xiao Fang; Wang, Wei

    2016-05-01

    Liver-vessel segmentation plays an important role in vessel structure analysis for liver surgical planning. This paper presents a liver-vessel segmentation method based on extreme learning machine (ELM). Firstly, an anisotropic filter is used to remove noise while preserving vessel boundaries from the original computed tomography (CT) images. Then, based on the knowledge of prior shapes and geometrical structures, three classical vessel filters including Sato, Frangi and offset medialness filters together with the strain energy filter are used to extract vessel structure features. Finally, the ELM is applied to segment liver vessels from background voxels. Experimental results show that the proposed method can effectively segment liver vessels from abdominal CT images, and achieves good accuracy, sensitivity and specificity. PMID:27132031

  10. Transferable Atomic Multipole Machine Learning Models for Small Organic Molecules.

    PubMed

    Bereau, Tristan; Andrienko, Denis; von Lilienfeld, O Anatole

    2015-07-14

    Accurate representation of the molecular electrostatic potential, which is often expanded in distributed multipole moments, is crucial for an efficient evaluation of intermolecular interactions. Here we introduce a machine learning model for multipole coefficients of atom types H, C, O, N, S, F, and Cl in any molecular conformation. The model is trained on quantum-chemical results for atoms in varying chemical environments drawn from thousands of organic molecules. Multipoles in systems with neutral, cationic, and anionic molecular charge states are treated with individual models. The models' predictive accuracy and applicability are illustrated by evaluating intermolecular interaction energies of nearly 1,000 dimers and the cohesive energy of the benzene crystal. PMID:26575759
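
    A common pattern for atomic-property models of this kind is kernel ridge regression on an environment descriptor; the sketch below uses it as an illustrative stand-in (the descriptor, property map, and kernel width are all invented, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(6)

# Learn a scalar atomic property (a stand-in for one multipole coefficient)
# from a 2-component environment descriptor.
n = 150
X = rng.normal(size=(n, 2))
prop = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2      # synthetic structure-property map

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Train on 100 "atoms"; test transferability on 50 unseen environments.
Xtr, Xte, ytr, yte = X[:100], X[100:], prop[:100], prop[100:]
K = gaussian_kernel(Xtr, Xtr)
alpha = np.linalg.solve(K + 1e-3 * np.eye(100), ytr)   # ridge-regularized dual weights
pred = gaussian_kernel(Xte, Xtr) @ alpha

rmse = float(np.sqrt(((pred - yte) ** 2).mean()))
```

    Transferability, the property emphasized in the title, corresponds here to low error on environments the model never saw during training.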

  11. Liquid intake monitoring through breathing signal using machine learning

    NASA Astrophysics Data System (ADS)

    Dong, Bo; Biswas, Subir

    2013-05-01

    This paper presents the design, system structure and performance of a wireless and wearable diet monitoring system. Food and drink intake can be detected by detecting a person's swallow events. The system works based on the key observation that a person's otherwise continuous breathing process is interrupted by a short apnea when she or he swallows as part of a solid or liquid intake process. We detect the swallows through the difference between a normal breathing cycle and a breathing cycle with swallows using a wearable chest-belt. Three popular machine learning algorithms have been applied on both time and frequency domain features. The discrimination power of features is then analyzed for applications where only a small number of features is allowed. It is shown that high detection performance can be achieved with only a few features.

  12. Calibration transfer via an extreme learning machine auto-encoder.

    PubMed

    Chen, Wo-Ruo; Bin, Jun; Lu, Hong-Mei; Zhang, Zhi-Min; Liang, Yi-Zeng

    2016-03-21

    In order to solve the spectra standardization problem in near-infrared (NIR) spectroscopy, a Transfer via Extreme learning machine Auto-encoder Method (TEAM) has been proposed in this study. A comparative study among TEAM, piecewise direct standardization (PDS), generalized least squares (GLS) and calibration transfer methods based on canonical correlation analysis (CCA) was conducted, and the performances of these algorithms were benchmarked with three spectral datasets: corn, tobacco and pharmaceutical tablet spectra. The results show that TEAM is a stable method and can significantly reduce prediction errors compared with PDS, GLS and CCA. TEAM can also achieve the best RMSEPs in most cases with a small number of calibration sets. TEAM is implemented in Python language and available as an open source package at https://github.com/zmzhang/TEAM. PMID:26846329

  13. Machine-Learning Techniques Applied to Antibacterial Drug Discovery

    PubMed Central

    Durrant, Jacob D.; Amaro, Rommie E.

    2014-01-01

    The emergence of drug-resistant bacteria threatens to catapult humanity back to the pre-antibiotic era. Even now, multi-drug-resistant bacterial infections annually result in millions of hospital days, billions in healthcare costs, and, most importantly, tens of thousands of lives lost. As many pharmaceutical companies have abandoned antibiotic development in search of more lucrative therapeutics, academic researchers are uniquely positioned to fill the resulting vacuum. Traditional high-throughput screens and lead-optimization efforts are expensive and labor intensive. Computer-aided drug discovery techniques, which are cheaper and faster, can accelerate the identification of novel antibiotics in an academic setting, leading to improved hit rates and faster transitions to pre-clinical and clinical testing. The current review describes two machine-learning techniques, neural networks and decision trees, that have been used to identify experimentally validated antibiotics. We conclude by describing the future directions of this exciting field. PMID:25521642

  14. Machine-learning techniques applied to antibacterial drug discovery.

    PubMed

    Durrant, Jacob D; Amaro, Rommie E

    2015-01-01

    The emergence of drug-resistant bacteria threatens to revert humanity to the preantibiotic era. Even now, multidrug-resistant bacterial infections annually result in millions of hospital days, billions in healthcare costs, and, most importantly, tens of thousands of lives lost. As many pharmaceutical companies have abandoned antibiotic development in search of more lucrative therapeutics, academic researchers are uniquely positioned to fill the pipeline. Traditional high-throughput screens and lead-optimization efforts are expensive and labor intensive. Computer-aided drug-discovery techniques, which are cheaper and faster, can accelerate the identification of novel antibiotics, leading to improved hit rates and faster transitions to preclinical and clinical testing. The current review describes two machine-learning techniques, neural networks and decision trees, that have been used to identify experimentally validated antibiotics. We conclude by describing the future directions of this exciting field. PMID:25521642

  15. Extreme Learning Machine for the Predictions of Length of Day

    NASA Astrophysics Data System (ADS)

    Yu, Lei; Zhao, Danning; Cai, Hongbing

    2015-03-01

    This work presents short- and medium-term predictions of length of day (LOD) up to 500 days by means of extreme learning machine (ELM). The EOP C04 time-series with daily values from the International Earth Rotation and Reference Systems Service (IERS) serve as the data basis. The influences of the solid Earth and ocean tides and seasonal atmospheric variations are removed from the C04 series. The residuals are used for training of the ELM. The results of the prediction are compared with those from other prediction methods. The accuracy of the prediction is equal to or even better than that by other approaches. The most striking advantages of employing ELM instead of other algorithms are its noticeably reduced complexity and high computational efficiency.

  16. Hematocrit estimation using online sequential extreme learning machine.

    PubMed

    Huynh, Hieu Trung; Won, Yonggwan; Kim, Jinsul

    2015-01-01

    Hematocrit is a blood test that is defined as the volume percentage of red blood cells in the whole blood. It is one of the important indicators for clinical decision making and the most effective factor in glucose measurement using handheld devices. In this paper, a method for hematocrit estimation based upon the transduced current curve and a neural network is presented. The salient points of this method are that (1) the neural network is trained by the online sequential extreme learning machine (OS-ELM), so that devices can still be trained with new samples while in use, and (2) extended features are used to reduce the number of current points, which saves device battery power and speeds up the measurement process. PMID:26405979

  17. Modelling the structure and function of enzymes by machine learning.

    PubMed

    Sternberg, M J; Lewis, R A; King, R D; Muggleton, S

    1992-01-01

    A machine learning program, GOLEM, has been applied to two problems: (1) the prediction of protein secondary structure from sequence and (2) modelling a quantitative structure-activity relationship in drug design. GOLEM takes as input observations and combines them with background knowledge of chemistry to yield rules expressed as stereochemical principles for prediction. The secondary structure prediction was explored on the alpha/alpha class of proteins; on an unrelated test set it yielded 81% accuracy. The rules from GOLEM defined patterns of residues forming alpha-helices. The system studied for drug design was the activities of trimethoprim analogues binding to E. coli dihydrofolate reductase. The GOLEM rules were a better model than standard regression approaches. More importantly, these rules described the chemical properties of the enzyme-binding site that were in broad agreement with the crystallographic structure. PMID:1290938

  18. Nonlinear machine learning and design of reconfigurable digital colloids.

    PubMed

    Long, Andrew W.; Phillips, Carolyn L.; Jankowski, Eric; Ferguson, Andrew L.

    2016-09-14

    Digital colloids, a cluster of freely rotating "halo" particles tethered to the surface of a central particle, were recently proposed as ultra-high density memory elements for information storage. Rational design of these digital colloids for memory storage applications requires a quantitative understanding of the thermodynamic and kinetic stability of the configurational states within which information is stored. We apply nonlinear machine learning to Brownian dynamics simulations of these digital colloids to extract the low-dimensional intrinsic manifold governing digital colloid morphology, thermodynamics, and kinetics. By modulating the relative size ratio between halo particles and central particles, we investigate the size-dependent configurational stability and transition kinetics for the 2-state tetrahedral (N = 4) and 30-state octahedral (N = 6) digital colloids. We demonstrate the use of this framework to guide the rational design of a memory storage element to hold a block of text that trades off the competing design criteria of memory addressability and volatility. PMID:27498992

  19. Bicriteria single machine scheduling with setup times and learning effects

    NASA Astrophysics Data System (ADS)

    Soroush, H. M.

    2012-11-01

    We study a bicriteria single machine scheduling problem with job-dependent and past-sequence-dependent (psd) setup time and job-dependent learning effects. The goal is to find the optimal sequence that minimizes a linear combination of a pair of performance criteria consisting of the makespan, the total completion time, and the total absolute differences in completion times. We show that special cases of the resulting three problems are solvable polynomially. However, the general cases cannot be solved in polynomial time; thus, branch-and-bound (B&B) methods are proposed to derive optimal sequences. Computational results demonstrate that the B&B methods solve relatively large problem instances in reasonable amounts of time.
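
    A brute-force stand-in for the B&B search can make the objective concrete. The psd setup rule (setup proportional to the sum of prior actual processing times), the position-based learning model p_j * r^a_j, and all numbers below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: single-machine sequencing with past-sequence-dependent (psd)
# setups and job-dependent learning effects. Exhaustive search stands in
# for the paper's branch-and-bound; the model details are assumptions.
from itertools import permutations

def evaluate(seq, p, a, gamma=0.1, w=(0.5, 0.5)):
    """w[0]*makespan + w[1]*total completion time for job order `seq`."""
    t = total = elapsed = 0.0
    for r, j in enumerate(seq, start=1):
        actual = p[j] * r ** a[j]        # learning: faster at later positions
        t += gamma * elapsed + actual    # psd setup time, then processing
        elapsed += actual
        total += t
    return w[0] * t + w[1] * total

def best_sequence(p, a):
    jobs = range(len(p))
    return min(permutations(jobs), key=lambda s: evaluate(s, p, a))

print(best_sequence([4.0, 2.0, 3.0], [-0.2, -0.1, -0.3]))
```

    Exhaustive enumeration is only viable for small instances; this is exactly why the general cases call for B&B methods.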

  20. Gene discovery for facioscapulohumeral muscular dystrophy by machine learning techniques.

    PubMed

    González-Navarro, Félix F; Belanche-Muñoz, Lluís A; Gámez-Moreno, María G; Flores-Ríos, Brenda L; Ibarra-Esquer, Jorge E; López-Morteo, Gabriel A

    2016-04-28

    Facioscapulohumeral muscular dystrophy (FSHD) is a neuromuscular disorder that shows a preference for the facial, shoulder and upper arm muscles. FSHD affects about one in 20,000-400,000 people, and no effective therapeutic strategies are known to halt disease progression or reverse muscle weakness or atrophy. Many genes may be incorrectly regulated in affected muscle tissue, but the mechanisms responsible for the progressive muscle weakness remain largely unknown. Although machine learning (ML) has made significant inroads in biomedical disciplines such as cancer research, no reports have yet addressed FSHD analysis using ML techniques. This study explores a specific FSHD data set from an ML perspective. We report results showing a very promising small group of genes that clearly separates FSHD samples from healthy samples. In addition to numerical prediction figures, we show data visualizations and biological evidence illustrating the potential usefulness of these results. PMID:26960968

  1. Machine Learning Techniques for Arterial Pressure Waveform Analysis

    PubMed Central

    Almeida, Vânia G.; Vieira, João; Santos, Pedro; Pereira, Tânia; Pereira, H. Catarina; Correia, Carlos; Pego, Mariano; Cardoso, João

    2013-01-01

    The Arterial Pressure Waveform (APW) can provide essential information about arterial wall integrity and arterial stiffness. Most APW analysis frameworks individually process each hemodynamic parameter and do not evaluate inter-dependencies in the overall pulse morphology. The key contribution of this work is the use of machine learning algorithms to deal with vectorized features extracted from APW. With this purpose, we follow a five-step evaluation methodology: (1) a custom-designed, non-invasive, electromechanical device was used in the data collection from 50 subjects; (2) the acquired position and amplitude of onset, Systolic Peak (SP), Point of Inflection (Pi) and Dicrotic Wave (DW) were used for the computation of some morphological attributes; (3) pre-processing work on the datasets was performed in order to reduce the number of input features and increase the model accuracy by selecting the most relevant ones; (4) classification of the dataset was carried out using four different machine learning algorithms: Random Forest, BayesNet (probabilistic), J48 (decision tree) and RIPPER (rule-based induction); and (5) we evaluated the trained models, using the majority-voting system, against the respective calculated Augmentation Index (AIx). The classification algorithms proved to be efficient; in particular, Random Forest showed good accuracy (96.95%) and a high area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve (0.961). Finally, during validation tests, a correlation between high risk labels, retrieved from the multi-parametric approach, and positive AIx values was verified. This approach allows the design of new hemodynamic morphology vectors and techniques for multiple APW analysis, thus improving the understanding of the arterial pulse, especially when compared to traditional single-parameter analysis, where failure in measuring one parameter, such as Pi, can jeopardize the whole evaluation.
PMID
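
    The final majority-voting step over the four classifiers can be sketched as follows. Training is omitted; the risk labels below are assumed for illustration only.

```python
# Majority vote over the four trained classifiers (Random Forest,
# BayesNet, J48, RIPPER). Each model is represented here only by its
# predicted risk label, which is an illustrative assumption.
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most classifiers (ties: first seen)."""
    return Counter(predictions).most_common(1)[0][0]

votes = {"RandomForest": "high_risk", "BayesNet": "high_risk",
         "J48": "low_risk", "RIPPER": "high_risk"}
print(majority_vote(list(votes.values())))  # -> high_risk
```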

  2. Identifying Energy-Efficient Concurrency Levels using Machine Learning

    SciTech Connect

    Curtis-Maury, M; Singh, K; Blagojevic, F; Nikolopoulos, D S; de Supinski, B R; Schulz, M; McKee, S A

    2007-07-23

    Multicore microprocessors have been largely motivated by the diminishing returns in performance and the increased power consumption of single-threaded ILP microprocessors. With the industry already shifting from multicore to many-core microprocessors, software developers must extract more thread-level parallelism from applications. Unfortunately, low power-efficiency and diminishing returns in performance remain major obstacles with many cores. Poor interaction between software and hardware, and bottlenecks in shared hardware structures often prevent scaling to many cores, even in applications where a high degree of parallelism is potentially available. In some cases, throwing additional cores at a problem may actually harm performance and increase power consumption. Better use of otherwise limitedly beneficial cores by software components such as hypervisors and operating systems can improve system-wide performance and reliability, even in cases where power consumption is not a main concern. In response to these observations, we evaluate an approach to throttle concurrency in parallel programs dynamically. We throttle concurrency to levels with higher predicted efficiency from both performance and energy standpoints, and we do so via machine learning, specifically artificial neural networks (ANNs). One advantage of using ANNs over similar techniques previously explored is that the training phase is greatly simplified, thereby reducing the burden on the end user. Using machine learning in the context of concurrency throttling is novel. We show that ANNs are effective for identifying energy-efficient concurrency levels in multithreaded scientific applications, and we do so using physical experimentation on a state-of-the-art quad-core Xeon platform.
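
    The throttling decision reduces to "predict an efficiency score for each candidate concurrency level, then run at the best one." The analytic predictor below is a made-up stand-in for the trained ANN; the contention model and power model are assumptions for illustration.

```python
# Minimal sketch of concurrency throttling: a predictor (standing in for
# the paper's ANN) scores each candidate thread count, and the runtime
# picks the most energy-efficient level. The toy model assumes speedup
# saturates with contention while power grows linearly with threads.
def predicted_efficiency(threads, hw_events):
    speedup = threads / (1.0 + hw_events["contention"] * (threads - 1))
    power = 1.0 + 0.4 * threads
    return speedup / power   # performance per watt

def throttle(max_threads, hw_events):
    return max(range(1, max_threads + 1),
               key=lambda n: predicted_efficiency(n, hw_events))

print(throttle(16, {"contention": 0.15}))  # prints 4
```

    With moderate contention the sketch settles on fewer threads than the machine offers, mirroring the observation that throwing additional cores at a problem may harm both performance and power.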

  3. Active machine learning-driven experimentation to determine compound effects on protein patterns

    PubMed Central

    Naik, Armaghan W; Kangas, Joshua D; Sullivan, Devin P; Murphy, Robert F

    2016-01-01

    High throughput screening determines the effects of many conditions on a given biological target. Currently, to estimate the effects of those conditions on other targets requires either strong modeling assumptions (e.g. similarities among targets) or separate screens. Ideally, data-driven experimentation could be used to learn accurate models for many conditions and targets without doing all possible experiments. We have previously described an active machine learning algorithm that can iteratively choose small sets of experiments to learn models of multiple effects. We now show that, with no prior knowledge and with liquid handling robotics and automated microscopy under its control, this learner accurately learned the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments. The results represent the first practical demonstration of the utility of active learning-driven biological experimentation in which the set of possible phenotypes is unknown in advance. DOI: http://dx.doi.org/10.7554/eLife.10047.001 PMID:26840049

  4. Advancement in Productivity of Arabic into English Machine Translation Systems from 2008 to 2013

    ERIC Educational Resources Information Center

    Abu-Al-Sha'r, Awatif M.; AbuSeileek, Ali F.

    2013-01-01

    This paper attempts to compare between the advancements in the productivity of Arabic into English Machine Translation Systems between two years, 2008 and 2013. It also aims to evaluate the progress achieved by various systems of Arabic into English electronic translation between the two years. For tracing such advancement, a comparative analysis…

  5. Gravity Spy - Integrating LIGO detector characterization, citizen science, and machine learning

    NASA Astrophysics Data System (ADS)

    Zevin, Michael; Gravity Spy

    2016-06-01

    On September 14th 2015, the Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO) made the first direct observation of gravitational waves and opened a new field of observational astronomy. However, being the most complicated and sensitive experiment ever undertaken in gravitational physics, aLIGO is susceptible to various sources of environmental and instrumental noise that hinder the search for more gravitational waves. Of particular concern are transient, non-Gaussian noise features known as glitches. Glitches can mimic true astrophysical gravitational waves, occur at a high enough frequency to be coherent between the two detectors, and generally worsen aLIGO's detection capabilities. The proper classification and characterization of glitches is paramount in optimizing aLIGO's ability to detect gravitational waves. However, teaching computers to identify and morphologically classify these artifacts is exceedingly difficult. Human intuition has proven to be a useful tool in classification problems such as this. Gravity Spy is an innovative, interdisciplinary project hosted by Zooniverse that combines aLIGO detector characterization, citizen science, machine learning, and social science. In this project, citizen scientists and computers will work together in a symbiotic relationship that leverages human pattern recognition and the ability of machine learning to process large amounts of data systematically: volunteers classify triggers from the aLIGO data stream that are constantly updated as aLIGO takes in new data, and these classifications are used to train machine learning algorithms which proceed to classify the bulk of aLIGO data and feed questionable glitches back to the users. In this talk, I will discuss the workflow and initial results of the Gravity Spy project with regard to aLIGO's future observing runs and highlight the potential of such citizen science projects in promoting nascent fields such as gravitational wave astrophysics.

  6. Cost-Benefit Analysis of Computer Resources for Machine Learning

    USGS Publications Warehouse

    Champion, Richard A.

    2007-01-01

    Machine learning describes pattern-recognition algorithms - in this case, probabilistic neural networks (PNNs). These can be computationally intensive, in part because of the nonlinear optimizer, a numerical process that calibrates the PNN by minimizing a sum of squared errors. This report suggests efficiencies that are expressed as cost and benefit. The cost is computer time needed to calibrate the PNN, and the benefit is goodness-of-fit, how well the PNN learns the pattern in the data. There may be a point of diminishing returns where a further expenditure of computer resources does not produce additional benefits. Sampling is suggested as a cost-reduction strategy. One consideration is how many points to select for calibration and another is the geometric distribution of the points. The data points may be nonuniformly distributed across space, so that sampling at some locations provides additional benefit while sampling at other locations does not. A stratified sampling strategy can be designed to select more points in regions where they reduce the calibration error and fewer points in regions where they do not. Goodness-of-fit tests ensure that the sampling does not introduce bias. This approach is illustrated by statistical experiments for computing correlations between measures of roadless area and population density for the San Francisco Bay Area. The alternative to training efficiencies is to rely on high-performance computer systems. These may require specialized programming and algorithms that are optimized for parallel performance.
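
    The stratified sampling idea can be sketched as a budget allocation: regions where the calibration residual is high receive proportionally more sample points. The region names, error values, and proportional-allocation rule below are illustrative assumptions.

```python
# Sketch of the stratified cost-reduction strategy: allocate more
# calibration points to strata with high residual error and fewer where
# the PNN already fits well. Strata and errors are made up for illustration.
def stratified_budget(errors, total_budget):
    """Allocate sample counts to strata proportionally to their error."""
    s = sum(errors.values())
    return {region: round(total_budget * e / s) for region, e in errors.items()}

region_error = {"urban": 0.6, "suburban": 0.3, "rural": 0.1}
print(stratified_budget(region_error, 100))  # {'urban': 60, 'suburban': 30, 'rural': 10}
```

    A goodness-of-fit check on the resulting sample, as the report suggests, would guard against the allocation introducing bias.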

  7. Extreme learning machine and adaptive sparse representation for image classification.

    PubMed

    Cao, Jiuwen; Zhang, Kai; Luo, Minxia; Yin, Chun; Lai, Xiaoping

    2016-09-01

    Recent research has shown the speed advantage of extreme learning machine (ELM) and the accuracy advantage of sparse representation classification (SRC) in the area of image classification. Those two methods, however, have their respective drawbacks, e.g., in general, ELM is known to be less robust to noise while SRC is known to be time-consuming. Consequently, ELM and SRC complement each other in computational complexity and classification accuracy. In order to unify such mutual complementarity and thus further enhance the classification performance, we propose an efficient hybrid classifier to exploit the advantages of ELM and SRC in this paper. More precisely, the proposed classifier consists of two stages: first, an ELM network is trained by supervised learning. Second, a discriminative criterion about the reliability of the obtained ELM output is adopted to decide whether the query image can be correctly classified or not. If the output is reliable, the classification will be performed by ELM; otherwise the query image will be fed to SRC. Meanwhile, in the stage of SRC, a sub-dictionary that is adaptive to the query image instead of the entire dictionary is extracted via the ELM output. The computational burden of SRC thus can be reduced. Extensive experiments on handwritten digit classification, landmark recognition and face recognition demonstrate that the proposed hybrid classifier outperforms ELM and SRC in classification accuracy with outstanding computational efficiency. PMID:27389571
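
    The two-stage decision rule can be sketched as follows. The margin-based reliability criterion, the threshold value, and the stand-in scorers are assumptions for illustration; the paper's actual discriminative criterion may differ.

```python
# Sketch of the hybrid ELM/SRC rule: trust the fast ELM output when its
# top-two score margin is large; otherwise fall back to the slower SRC,
# restricted to a sub-dictionary of the ELM's top-k candidate classes.
def hybrid_classify(x, elm_scores, src_classify, margin=0.2, k=3):
    scores = elm_scores(x)                      # class -> ELM output score
    ranked = sorted(scores, key=scores.get, reverse=True)
    if scores[ranked[0]] - scores[ranked[1]] >= margin:
        return ranked[0]                        # reliable: answer with ELM
    return src_classify(x, ranked[:k])          # adaptive sub-dictionary SRC

confident = lambda x: {"cat": 0.90, "dog": 0.10, "car": 0.00}
ambiguous = lambda x: {"cat": 0.45, "dog": 0.40, "car": 0.15}
src = lambda x, classes: "SRC:" + classes[0]    # placeholder SRC stage
print(hybrid_classify(None, confident, src))    # ELM trusted -> cat
print(hybrid_classify(None, ambiguous, src))    # falls through to SRC
```

    Restricting SRC to a small candidate sub-dictionary is what keeps the slow stage cheap, matching the reduced computational burden the abstract describes.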

  8. Using machine learning techniques to automate sky survey catalog generation

    NASA Technical Reports Server (NTRS)

    Fayyad, Usama M.; Roden, J. C.; Doyle, R. J.; Weir, Nicholas; Djorgovski, S. G.

    1993-01-01

    We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10^7 galaxies and 10^8 stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.

  9. Skull-Stripping with Machine Learning Deformable Organisms

    PubMed Central

    Prasad, Gautam; Joshi, Anand A.; Feng, Albert; Toga, Arthur W.; Thompson, Paul M.; Terzopoulos, Demetri

    2014-01-01

    Background Segmentation methods for medical images may not generalize well to new data sets or new tasks, hampering their utility. We attempt to remedy these issues using deformable organisms to create an easily customizable segmentation plan. We validate our framework by creating a plan to locate the brain in 3D magnetic resonance images of the head (skull-stripping). New Method Our method borrows ideas from artificial life to govern a set of deformable models. We use control processes such as sensing, proactive planning, reactive behavior, and knowledge representation to segment an image. The image may have landmarks and features specific to that dataset; these may be easily incorporated into the plan. In addition, we use a machine learning method to make our segmentation more accurate. Results Our method had the least Hausdorff distance error, but included slightly less brain voxels (false negatives). It also had the lowest false positive error and performed on par to skull-stripping specific method on other metrics. Comparison with Existing Method(s) We tested our method on 838 T1-weighted images, evaluating results using distance and overlap error metrics based on expert gold standard segmentations. We evaluated the results before and after the learning step to quantify its benefit; we also compare our results to three other widely used methods: BSE, BET, and the Hybrid Watershed algorithm. Conclusions Our framework captures diverse categories of information needed for brain segmentation and will provide a foundation for tackling a wealth of segmentation problems. PMID:25124851

  10. Advancing the Relationship between Business School Ranking and Student Learning

    ERIC Educational Resources Information Center

    Elbeck, Matt

    2009-01-01

    This commentary advances a positive relationship between a business school's ranking in the popular press and student learning by advocating market-oriented measures of student learning. A framework for student learning is based on the Assurance of Learning mandated by the Association to Advance Collegiate Schools of Business International,…

  11. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

    Architectures of planetary systems are observable snapshots in time that can indicate formation and dynamic evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, then Keplerian Motion is defined as P^2 = a^3 where P is the orbital period in units of years and a is the semi-major axis in units of Astronomical Units (AU). Keplerian motion works on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze Keplerian motion of systems based on stellar age to seek if Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods such as probabilistic, linear, and proximity based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression analysis based techniques for detecting outliers. Proximity based models use distance based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density based algorithms such as kernel density estimation. In this work, we will use unsupervised learning algorithms with only the proximity based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is that the ratio of planetary mass to stellar mass is less than 0.001. In this work, we present our statistical analysis of the outliers thus detected.
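
    A minimal proximity-based scorer of the kind described (k-nearest-neighbour distance) can be sketched in a few lines; the sample points and choice of k are illustrative, not the paper's data.

```python
# k-NN distance outlier scoring: a point's score is the mean distance to
# its k nearest neighbours; isolated points receive high scores.
import math

def knn_outlier_scores(points, k=2):
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

pts = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.1), (8.0, 8.0)]
scores = knn_outlier_scores(pts)
print(scores.index(max(scores)))  # prints 4, the index of the isolated point
```

    In practice the flagged points would then be validated against the planetary-to-stellar mass ratio criterion mentioned above.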

  12. Machine learning techniques for energy optimization in mobile embedded systems

    NASA Astrophysics Data System (ADS)

    Donohoo, Brad Kyoshi

    Mobile smartphones and other portable battery operated embedded systems (PDAs, tablets) are pervasive computing devices that have emerged in recent years as essential instruments for communication, business, and social interactions. While performance, capabilities, and design are all important considerations when purchasing a mobile device, a long battery lifetime is one of the most desirable attributes. Battery technology and capacity has improved over the years, but it still cannot keep pace with the power consumption demands of today's mobile devices. This key limiter has led to a strong research emphasis on extending battery lifetime by minimizing energy consumption, primarily using software optimizations. This thesis presents two strategies that attempt to optimize mobile device energy consumption with negligible impact on user perception and quality of service (QoS). The first strategy proposes an application and user interaction aware middleware framework that takes advantage of user idle time between interaction events of the foreground application to optimize CPU and screen backlight energy consumption. The framework dynamically classifies mobile device applications based on their received interaction patterns, then invokes a number of different power management algorithms to adjust processor frequency and screen backlight levels accordingly. The second strategy proposes the usage of machine learning techniques to learn a user's mobile device usage pattern pertaining to spatiotemporal and device contexts, and then predict energy-optimal data and location interface configurations. By learning where and when a mobile device user uses certain power-hungry interfaces (3G, WiFi, and GPS), the techniques, which include variants of linear discriminant analysis, linear logistic regression, non-linear logistic regression, and k-nearest neighbor, are able to dynamically turn off unnecessary interfaces at runtime in order to save energy.

  13. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage labeling method of large biomedical datasets through a parallel approach in a single GPU. Diagnostic methods, structures volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, to provide an automatic and interactive method to label or to tag different structures contained into input data becomes imperative. Several approaches to label or segment biomedical datasets has been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to easily analyze biomedical datasets by a non-expert user. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, allowing to apply parallel programming paradigms in conventional personal computers. Adaboost classifier is one of the most widely applied methods for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and
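
    The Adaboost testing stage that the paper parallelizes is, per sample, just a weighted vote of single-feature decision stumps; each stump can be evaluated independently, which is what makes the GPU mapping natural. The stump thresholds, polarities, and weights below are illustrative, and the sketch is sequential rather than OpenCL.

```python
# Sketch of the Adaboost *testing* stage: the strong classifier is a
# weighted combination of weak classifiers, each a simple decision
# function (stump) on a single feature value.
def stump(feature, threshold, polarity):
    return lambda x: polarity if x[feature] > threshold else -polarity

def adaboost_predict(x, weighted_stumps):
    total = sum(alpha * h(x) for alpha, h in weighted_stumps)
    return 1 if total >= 0 else -1

classifier = [(0.7, stump(0, 0.5, 1)),    # votes +0.7 when x[0] > 0.5
              (0.5, stump(1, 1.0, -1)),   # votes +0.5 when x[1] <= 1.0
              (0.3, stump(0, 2.0, 1))]    # votes -0.3 when x[0] <= 2.0
print(adaboost_predict([1.2, 0.4], classifier))  # prints 1
```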

  14. The identification of cis-regulatory elements: A review from a machine learning perspective.

    PubMed

    Li, Yifeng; Chen, Chih-Yu; Kaye, Alice M; Wasserman, Wyeth W

    2015-12-01

    The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have unveiled that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, insulators, etc. These regulatory elements can play crucial roles in controlling gene expressions in specific cell types, conditions, and developmental stages. Disruption to these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency for detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular, machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided in order to facilitate testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field. PMID:26499213

  15. Rapid and Accurate Machine Learning Recognition of High Performing Metal Organic Frameworks for CO2 Capture.

    PubMed

    Fernandez, Michael; Boyd, Peter G; Daff, Thomas D; Aghaji, Mohammad Zein; Woo, Tom K

    2014-09-01

    In this work, we have developed quantitative structure-property relationship (QSPR) models using advanced machine learning algorithms that can rapidly and accurately recognize high-performing metal organic framework (MOF) materials for CO2 capture. More specifically, QSPR classifiers have been developed that can, in a fraction of a second, identify candidate MOFs with enhanced CO2 adsorption capacity (>1 mmol/g at 0.15 bar and >4 mmol/g at 1 bar). The models were tested on a large set of 292 050 MOFs that were not part of the training set. The QSPR classifier could recover 945 of the top 1000 MOFs in the test set while flagging only 10% of the whole library for compute intensive screening. Thus, using the machine learning classifiers as part of a high-throughput screening protocol would result in an order of magnitude reduction in compute time and allow intractably large structure libraries and search spaces to be screened. PMID:26278259

  16. Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text

    PubMed Central

    Bravo, Àlex; Li, Tong Shu; Su, Andrew I.; Good, Benjamin M.; Furlong, Laura I.

    2016-01-01

    Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on drug adverse reactions. We present a new system for identification of drug side effects from the literature that combines three approaches: machine learning, rule- and knowledge-based approaches. This system has been developed to address the Task 3.B of Biocreative V challenge (BC5) dealing with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence-level, while the knowledge-based approach is applied both at sentence and abstract levels. The machine learning method is based on the BeFree system using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest Recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassifications and areas for improving of our system, and highlighted the need of consistent gold standard data sets for advancing the state of the art in text mining of drug side effects. Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V PMID:27307137

  17. Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text.

    PubMed

    Bravo, Àlex; Li, Tong Shu; Su, Andrew I; Good, Benjamin M; Furlong, Laura I

    2016-01-01

    Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on drug adverse reactions. We present a new system for identification of drug side effects from the literature that combines three approaches: machine learning, rule- and knowledge-based approaches. This system has been developed to address the Task 3.B of Biocreative V challenge (BC5) dealing with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence-level, while the knowledge-based approach is applied both at sentence and abstract levels. The machine learning method is based on the BeFree system using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest Recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassifications and areas for improving of our system, and highlighted the need of consistent gold standard data sets for advancing the state of the art in text mining of drug side effects. Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V. PMID:27307137

  18. Structure classification and melting temperature prediction in octet AB solids via machine learning

    NASA Astrophysics Data System (ADS)

    Pilania, G.; Gubernatis, J. E.; Lookman, T.

    2015-06-01

    resulting model to be 11% of the mean melting temperature of the data, but we note that if the accuracy of this predicted error is itself measured, our estimated fitting error itself has a root-mean-square error of 50%. In short, what we illustrate is that classification and regression predictions can vary significantly, depending on the details of how machine learning methods are applied to small data sets. This variation makes it important, if not essential, to average the predictions and compute confidence intervals about these averages to report results meaningfully. However, when properly used, these statistical methods can advance our understanding and improve predictions of material properties even for small data sets.

  19. Modeling the Swift BAT Trigger Algorithm with Machine Learning

    NASA Technical Reports Server (NTRS)

    Graff, Philip B.; Lien, Amy Y.; Baker, John G.; Sakamoto, Takanori

    2015-01-01

    To draw inferences about gamma-ray burst (GRB) source populations based on Swift observations, it is essential to understand the detection efficiency of the Swift burst alert telescope (BAT). This study considers the problem of modeling the Swift BAT triggering algorithm for long GRBs, a computationally expensive procedure, and models it using machine learning algorithms. A large sample of simulated GRBs from Lien et al. (2014) is used to train various models: random forests, boosted decision trees (with AdaBoost), support vector machines, and artificial neural networks. The best models have accuracies of greater than approximately 97% (less than approximately 3% error), which is a significant improvement on a cut in GRB flux, which has an accuracy of 89.6% (10.4% error). These models are then used to measure the detection efficiency of Swift as a function of redshift z, which is used to perform Bayesian parameter estimation on the GRB rate distribution. We find a local GRB rate density of eta_0 approximately 0.48 (+0.41/-0.23) Gpc^-3 yr^-1 with power-law indices of eta_1 approximately 1.7 (+0.6/-0.5) and eta_2 approximately -5.9 (+5.7/-0.1) for GRBs above and below a break point of z_1 approximately 6.8 (+2.8/-3.2). This methodology is able to improve upon earlier studies by more accurately modeling Swift detection and using this for fully Bayesian model fitting. The code used in this analysis is publicly available online.
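
    The fitted GRB rate distribution is a broken power law in redshift; plugging in the quoted best-fit values gives a quick sketch. The piecewise functional form with continuity at the break is an assumption about how the quoted indices combine, not taken from the abstract.

```python
# Sketch of a broken power-law GRB rate density using the quoted best-fit
# values; the continuity-at-break construction is an assumption.
def grb_rate_density(z, eta0=0.48, eta1=1.7, eta2=-5.9, z1=6.8):
    if z <= z1:
        return eta0 * (1 + z) ** eta1
    norm = eta0 * (1 + z1) ** (eta1 - eta2)   # match the two branches at z1
    return norm * (1 + z) ** eta2

print(round(grb_rate_density(0.0), 2))  # 0.48, the local rate density
```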

  20. Machine learning applications in cancer prognosis and prediction.

    PubMed

    Kourou, Konstantina; Exarchos, Themis P; Exarchos, Konstantinos P; Karamouzis, Michalis V; Fotiadis, Dimitrios I

    2015-01-01

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) methods. These techniques have therefore been utilized with the aim of modeling the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features in complex datasets reveals their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs), have been widely applied in cancer research for the development of predictive models, resulting in effective and accurate decision making. Even though it is evident that the use of ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed before these methods can be considered in everyday clinical practice. In this work, we present a review of recent ML approaches employed in the modeling of cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend in the application of ML methods in cancer research, we present here the most recent publications that employ these techniques to model cancer risk or patient outcomes. PMID:25750696

  2. A cross-validation scheme for machine learning algorithms in shotgun proteomics

    PubMed Central

    2012-01-01

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting. PMID:23176259
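    The failure mode cross-validation is meant to expose can be demonstrated in a few lines. This is a generic illustration, not the proteomics pipeline above: with random labels carrying no signal, the resubstitution (training) accuracy looks excellent while the cross-validated accuracy collapses to chance.

```python
# Sketch: detecting overfitting with cross-validation on signal-free data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))          # features with no real signal
y = rng.integers(0, 2, size=200)        # random labels

clf = RandomForestClassifier(n_estimators=50, random_state=0)
train_acc = clf.fit(X, y).score(X, y)               # resubstitution estimate
cv_acc = cross_val_score(clf, X, y, cv=5).mean()    # 5-fold CV estimate
print(f"training accuracy {train_acc:.2f}, cross-validated accuracy {cv_acc:.2f}")
```

    The large gap between the two numbers is the signature of overfitting that a proper cross-validation scheme makes visible.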

  3. Application of machine learning for the evaluation of turfgrass plots using aerial images

    NASA Astrophysics Data System (ADS)

    Ding, Ke; Raheja, Amar; Bhandari, Subodh; Green, Robert L.

    2016-05-01

    Historically, investigation of turfgrass characteristics has been limited to visual ratings. Although relevant information may result from such evaluations, final inferences may be questionable because of the subjective nature in which the data are collected. Recent advances in computer vision techniques allow researchers to objectively measure turfgrass characteristics such as percent ground cover, turf color, and turf quality from digital images. This paper focuses on developing a methodology for automated assessment of turfgrass quality from aerial images. Images of several turfgrass plots of varying quality were gathered using a camera mounted on an unmanned aerial vehicle. The quality of these plots was also evaluated based on visual ratings. The goal was to use the aerial images to generate quality evaluations on a regular basis for the optimization of water treatment. The aerial images are used to train a neural network, a nonlinear classifier commonly used in machine learning, so that appropriate features such as intensity, color, and texture of the turfgrass are extracted from these images. The output of the trained neural network model is a rating of the grass, which is compared to the visual ratings. Currently, the quality and the color of turfgrass, measured as the greenness of the grass, are evaluated. The textures are calculated using the Gabor filter and the co-occurrence matrix. Other classifiers such as support vector machines, as well as simpler linear regression models such as Ridge regression and LARS regression, are also used. The performance of each model is compared. The results show encouraging potential for using machine learning techniques for the evaluation of turfgrass quality and color.
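    The Gabor texture feature mentioned above can be sketched directly in NumPy. The kernel size, wavelength, and the synthetic "striped versus noisy" test images are all illustrative assumptions; the paper's actual filter bank and parameters are not specified here.

```python
# Sketch: a single real-valued Gabor filter and its response energy, used
# as a texture feature. A striped image matched to the filter's wavelength
# yields far more energy than unstructured noise.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0):
    """Real part of a Gabor filter: a sinusoid modulated by a Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def texture_energy(image, **kwargs):
    """Mean squared 'valid' correlation of the image with the Gabor kernel."""
    k = gabor_kernel(**kwargs)
    windows = sliding_window_view(image, k.shape)
    response = np.einsum('ijkl,kl->ij', windows, k)
    return float((response**2).mean())

rng = np.random.default_rng(0)
stripes = np.sin(np.arange(32) * 2 * np.pi / 4.0)[None, :] * np.ones((32, 1))
noise = rng.normal(size=(32, 32))
print(texture_energy(stripes) > texture_energy(noise))
```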

  4. Is extreme learning machine feasible? A theoretical assessment (part I).

    PubMed

    Liu, Xia; Lin, Shaobo; Fang, Jian; Xu, Zongben

    2015-01-01

    An extreme learning machine (ELM) is a feedforward neural network (FNN)-like learning system whose connections with output neurons are adjustable, while the connections with and within hidden neurons are randomly fixed. Numerous applications have demonstrated the feasibility and high efficiency of ELM-like systems. It has remained an open question, however, whether this holds for general applications. In this two-part paper, we conduct a comprehensive feasibility analysis of ELM. In Part I, we provide an answer to the question by theoretically justifying the following: 1) for some suitable activation functions, such as polynomials, Nadaraya-Watson and sigmoid functions, the ELM-like systems can attain the theoretical generalization bound of the FNNs with all connections adjusted, i.e., they do not degrade the generalization capability of the FNNs even when the connections with and within hidden neurons are randomly fixed; 2) the number of hidden neurons needed for an ELM-like system to achieve the theoretical bound can be estimated; and 3) whenever the activation function is taken as a polynomial, the deduced hidden layer output matrix is of full column rank, therefore the generalized inverse technique can be efficiently applied to yield the solution of an ELM-like system, and, furthermore, for the nonpolynomial case, Tikhonov regularization can be applied to guarantee weak regularity while not sacrificing the generalization capability. In Part II, however, we reveal a different aspect of the feasibility of ELM: there also exist activation functions that cause the corresponding ELM to degrade the generalization capability. The obtained results underlie the feasibility and efficiency of ELM-like systems, and yield various generalizations and improvements of the systems as well. PMID:25069126
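    The setup the paper analyzes — random, fixed hidden connections with only the output weights fitted via the generalized (pseudo)inverse of the hidden-layer output matrix — can be written in a few lines of NumPy. The target function, hidden-layer size, and tanh activation below are illustrative choices, not the paper's experiments.

```python
# Sketch of an ELM-style regressor: hidden weights W, b are random and
# frozen; output weights beta solve a linear least-squares problem.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, then fixed
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                  # generalized-inverse solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])                           # smooth toy target
W, b, beta = elm_fit(X, y)
err = np.abs(elm_predict(X, W, b, beta) - y).max()
print(f"max training error {err:.4f}")
```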

  5. The Value Simulation-Based Learning Added to Machining Technology in Singapore

    ERIC Educational Resources Information Center

    Fang, Linda; Tan, Hock Soon; Thwin, Mya Mya; Tan, Kim Cheng; Koh, Caroline

    2011-01-01

    This study seeks to understand the value simulation-based learning (SBL) added to the learning of Machining Technology in a 15-week core subject course offered to university students. The research questions were: (1) How did SBL enhance classroom learning? (2) How did SBL help participants in their test? (3) How did SBL prepare participants for…

  6. Some remarks on prediction of protein-protein interaction with machine learning.

    PubMed

    Zhang, Shao-Wu; Wei, Ze-Gang

    2015-01-01

    Protein-protein interactions (PPIs) play a key role in many cellular processes. Uncovering PPIs and their function within the cell is a challenge of post-genomic biology and will improve our understanding of disease and help in the development of novel methods for disease diagnosis and forensics. The experimental methods currently used to identify PPIs are both time-consuming and expensive, and high-throughput experimental results contain high rates of both false positive and false negative protein interactions. These obstacles could be overcome by developing computational approaches to predict PPIs and validate the obtained experimental results. In this work, we describe the recent advances in predicting protein-protein interactions from the following aspects: i) benchmark dataset construction, ii) sequence representation approaches, iii) common machine learning algorithms, and iv) cross-validation test methods and assessment metrics. PMID:25548927
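    The assessment metrics in aspect iv) are standard confusion-matrix quantities. The counts below are purely illustrative, not taken from any PPI benchmark.

```python
# Sketch: precision, recall, F1, and Matthews correlation coefficient (MCC)
# from a toy confusion matrix (tp/fp/tn/fn values are illustrative only).
import math

tp, fp, tn, fn = 80, 20, 90, 10
precision = tp / (tp + fp)
recall = tp / (tp + fn)                       # also called sensitivity
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(f"precision {precision:.3f}, recall {recall:.3f}, "
      f"F1 {f1:.3f}, MCC {mcc:.3f}")
```

    MCC is often preferred on imbalanced interaction data because, unlike accuracy, it only rewards classifiers that do well on both classes.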

  7. Detection of blue-white veil areas in dermoscopy images using machine learning techniques

    NASA Astrophysics Data System (ADS)

    Celebi, M. E.; Kingravi, Hassan A.; Aslandogan, Y. A.; Stoecker, William V.

    2006-03-01

    As a result of the advances in skin imaging technology and the development of suitable image processing techniques, during the last decade, there has been a significant increase of interest in the computer-aided diagnosis of skin cancer. Dermoscopy is a non-invasive skin imaging technique which permits visualization of features of pigmented melanocytic neoplasms that are not discernable by examination with the naked eye. One of the useful features in dermoscopic diagnosis is the blue-white veil (irregular, structureless areas of confluent blue pigmentation with an overlying white "ground-glass" film) which is mostly associated with invasive melanoma. In this preliminary study, a machine learning approach to the detection of blue-white veil areas in dermoscopy images is presented. The method involves pixel classification based on relative and absolute color features using a decision tree classifier. Promising results were obtained on a set of 224 dermoscopy images.

  8. Identifying relatively high-risk group of coronary artery calcification based on progression rate: statistical and machine learning methods.

    PubMed

    Kim, Ha-Young; Yoo, Sanghyun; Lee, Jihyun; Kam, Hye Jin; Woo, Kyoung-Gu; Choi, Yoon-Ho; Sung, Jidong; Kang, Mira

    2012-01-01

    The coronary artery calcification (CAC) score is an important predictor of coronary artery disease (CAD), which is the primary cause of death in advanced countries. Early prediction of a high risk of CAC based on progression rate enables people to prevent CAD from developing into severe symptoms and diseases. In this study, we developed various classifiers to identify patients at high risk of CAC using statistical and machine learning methods, and compared their predictive performance. For the statistical approaches, a linear regression-based classifier and a logistic regression model were developed. For the machine learning approaches, we suggest three kinds of ensemble-based classifiers (best, top-k, and voting) to deal with the imbalanced distribution of our data set. The ensemble voting method outperformed all other methods, including the regression methods, with an AUC of 0.781. PMID:23366360
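    The voting-ensemble idea can be sketched with scikit-learn's `VotingClassifier`. The base learners, the synthetic imbalanced data, and all parameters below are stand-in assumptions, not the study's actual models or cohort.

```python
# Sketch: a soft-voting ensemble of three heterogeneous classifiers on a
# synthetic, class-imbalanced data set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.85, 0.15],   # imbalanced classes
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

vote = VotingClassifier([
    ('lr', LogisticRegression(max_iter=1000)),
    ('dt', DecisionTreeClassifier(max_depth=5, random_state=0)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
], voting='soft')                                  # average predicted probabilities
vote.fit(Xtr, ytr)
acc = vote.score(Xte, yte)
print(f"held-out accuracy {acc:.2f}")
```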

  9. A Distributed Support Vector Machine Learning Over Wireless Sensor Networks.

    PubMed

    Kim, Woojin; Stanković, Milos S; Johansson, Karl H; Kim, H Jin

    2015-11-01

    This paper is about fully-distributed support vector machine (SVM) learning over wireless sensor networks. With the concept of the geometric SVM, we propose to gossip the set of extreme points of the convex hull of local data set with neighboring nodes. It has the advantages of a simple communication mechanism and finite-time convergence to a common global solution. Furthermore, we analyze the scalability with respect to the amount of exchanged information and convergence time, with a specific emphasis on the small-world phenomenon. First, with the proposed naive convex hull algorithm, the message length remains bounded as the number of nodes increases. Second, by utilizing a small-world network, we have an opportunity to drastically improve the convergence performance with only a small increase in power consumption. These properties offer a great advantage when dealing with a large-scale network. Simulation and experimental results support the feasibility and effectiveness of the proposed gossip-based process and the analysis. PMID:26470063
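    The geometric ingredient described above — each node gossiping only the extreme points of its local data — can be illustrated with SciPy. The data are synthetic and the networking/gossip protocol itself is omitted; this only shows the data reduction at a single node.

```python
# Sketch: the convex hull vertices (extreme points) of a node's local data
# are typically a tiny subset, which keeps gossip messages short.
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
local_data = rng.normal(size=(500, 2))     # one node's local samples
hull = ConvexHull(local_data)
extremes = local_data[hull.vertices]       # the only points to exchange
print(f"{len(local_data)} points reduced to {len(extremes)} extreme points")
```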

  10. Parsimonious extreme learning machine using recursive orthogonal least squares.

    PubMed

    Wang, Ning; Er, Meng Joo; Han, Min

    2014-10-01

    Novel constructive and destructive parsimonious extreme learning machines (CP- and DP-ELM) are proposed in this paper. By virtue of the proposed ELMs, parsimonious structure and excellent generalization of multi-input multi-output single hidden-layer feedforward networks (SLFNs) are obtained. The proposed ELMs are developed by an innovative decomposition of the recursive orthogonal least squares procedure into sequential partial orthogonalization (SPO). The salient features of the proposed approaches are as follows: 1) initial hidden nodes are randomly generated by the ELM methodology and recursively orthogonalized into an upper triangular matrix with dramatic reduction in matrix size; 2) the constructive SPO in the CP-ELM focuses on the partial matrix with the subcolumn of the selected regressor including nonzeros as the first column, while the destructive SPO in the DP-ELM operates on the partial matrix including elements determined by the removed regressor; 3) termination criteria for CP- and DP-ELM are simplified by the additional residual error reduction method; and 4) the output weights of the SLFN need not be solved during the model selection procedure and are derived from the final upper triangular equation by backward substitution. Both single- and multi-output real-world regression data sets are used to verify the effectiveness and superiority of the CP- and DP-ELM in terms of parsimonious architecture and generalization accuracy. Innovative applications to nonlinear time-series modeling demonstrate superior identification results. PMID:25291736

  11. Floor-Fractured Craters through Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Thorey, C.

    2015-12-01

    Floor-fractured craters are impact craters that have undergone post-impact deformations. They are characterized by shallow floors with a plate-like or convex appearance, wide floor moats, and radial, concentric, and polygonal floor-fractures. While the origin of these deformations has long been debated, it is now generally accepted that they are the result of the emplacement of shallow magmatic intrusions below the crater floor. These craters thus constitute an efficient tool to probe the importance of intrusive magmatism from the lunar surface. The most recent catalog of lunar floor-fractured craters references about 200 of them, mainly located around the lunar maria. Herein, we discuss the possibility of using machine learning algorithms to detect new floor-fractured craters on the Moon among the 60000 craters referenced in the most recent catalogs. In particular, we use the gravity field provided by the Gravity Recovery and Interior Laboratory (GRAIL) mission, and the topographic dataset obtained from the Lunar Orbiter Laser Altimeter (LOLA) instrument, to design a set of representative features for each crater. We then discuss the design of a binary supervised classifier, based on these features, to discriminate between the presence and absence of a crater-centered intrusion below a specific crater. First predictions from different classifiers, in terms of their accuracy and uncertainty, will be presented.

  12. Detecting abbreviations in discharge summaries using machine learning methods.

    PubMed

    Wu, Yonghui; Rosenbloom, S Trent; Denny, Joshua C; Miller, Randolph A; Mani, Subramani; Giuse, Dario A; Xu, Hua

    2011-01-01

    Recognition and identification of abbreviations is an important, challenging task in clinical natural language processing (NLP). A comprehensive lexical resource comprised of all common, useful clinical abbreviations would have great applicability. The authors present a corpus-based method to create a lexical resource of clinical abbreviations using machine-learning (ML) methods, and tested its ability to automatically detect abbreviations from hospital discharge summaries. Domain experts manually annotated abbreviations in seventy discharge summaries, which were randomly broken into a training set (40 documents) and a test set (30 documents). We implemented and evaluated several ML algorithms using the training set and a list of pre-defined features. The subsequent evaluation using the test set showed that the Random Forest classifier had the highest F-measure of 94.8% (precision 98.8% and recall of 91.2%). When a voting scheme was used to combine output from various ML classifiers, the system achieved the highest F-measure of 95.7%. PMID:22195219

  13. Automated mapping of building facades by machine learning

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-08-01

    Facades of buildings contain various types of objects which have to be recorded for information systems. The article describes a solution for this task, focussing on automated classification by means of machine learning techniques. Stereo pairs of oblique images are used to derive 3D point clouds of buildings. The planes of the buildings are automatically detected. The derived planes are supplemented with a regular grid of points, for which the colour values are found in the images. For each grid point of the façade, additional attributes are derived from image and object data. This "intelligent" point cloud is analysed by a decision tree, which is derived from a small training set. The derived decision tree is then used to classify the complete point cloud. To each point of the regular façade grid a class is assigned, and a façade plan is mapped with a colour palette representing the different objects. Some image processing methods are applied to improve the appearance of the interpreted façade plot and to extract additional information. The proposed method is tested on the facades of a church. Accuracy measures were derived from 140 independent checkpoints, which were randomly selected. When selecting four classes ("window", "stone work", "painted wall", and "vegetation"), the overall accuracy is assessed at 80% (95% confidence interval: 71%-88%). The user accuracy of the class "stone work" was assessed at 90% (95% CI: 80%-97%). The proposed methodology has a high potential for automation and fast processing.
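    Confidence intervals like those quoted above come from treating checkpoint accuracy as a binomial proportion. The sketch below uses the Wilson score interval; the paper does not state which interval method it used, so the numbers may differ slightly from the published 71%-88%.

```python
# Sketch: Wilson score 95% confidence interval for a classification
# accuracy of 112/140 = 80% correct checkpoints.
import math

def wilson_interval(successes, n, z=1.96):
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

lo, hi = wilson_interval(112, 140)
print(f"80% accuracy on 140 checkpoints -> 95% CI [{lo:.2f}, {hi:.2f}]")
```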

  14. Phase discontinuity predictions using a machine-learning trained kernel.

    PubMed

    Sawaf, Firas; Groves, Roger M

    2014-08-20

    Phase unwrapping is one of the key steps of interferogram analysis, and its accuracy relies primarily on the correct identification of phase discontinuities. This can be especially challenging for inherently noisy phase fields, such as those produced through shearography and other speckle-based interferometry techniques. We showed in a recent work how a relatively small 10×10 pixel kernel was trained, through machine learning methods, to predict the locations of phase discontinuities within noisy wrapped phase maps. We describe here how this kernel can be applied in a sliding-window fashion, such that each pixel undergoes 100 phase-discontinuity examinations: one test for each of its possible positions relative to its neighbors within the kernel's extent. We explore how the resulting predictions can be accumulated and aggregated through a voting system, and demonstrate that the reliability of this method outperforms processing the image by segmenting it into more conventional 10×10 nonoverlapping tiles. When used in this way, our 10×10 pixel kernel is large enough for effective processing of full-field interferograms, thus avoiding the need for the substantially greater computational resources that would otherwise have been necessary to train a kernel of a significantly larger size. PMID:25321117
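    The sliding-window voting scheme can be sketched generically: each pixel accumulates one vote from every placement of a 10×10 window that covers it, so interior pixels are examined 100 times. The trained kernel itself is replaced here by a trivial stand-in predictor, and the "image" is random data.

```python
# Sketch: accumulate per-pixel votes from every overlapping k-by-k window
# placement, then normalize by the number of examinations per pixel.
import numpy as np

k = 10
image = np.random.default_rng(0).normal(size=(64, 64))
votes = np.zeros_like(image)
counts = np.zeros_like(image)

for i in range(image.shape[0] - k + 1):
    for j in range(image.shape[1] - k + 1):
        window = image[i:i + k, j:j + k]
        # stand-in for the trained kernel: flag pixels above the window mean
        prediction = (window > window.mean()).astype(float)
        votes[i:i + k, j:j + k] += prediction
        counts[i:i + k, j:j + k] += 1

consensus = votes / counts      # fraction of positive votes per pixel
print(f"interior pixels receive {counts.max():.0f} examinations each")
```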

  15. Machine learning assembly landscapes from particle tracking data.

    PubMed

    Long, Andrew W; Zhang, Jie; Granick, Steve; Ferguson, Andrew L

    2015-11-01

    Bottom-up self-assembly offers a powerful route for the fabrication of novel structural and functional materials. Rational engineering of self-assembling systems requires understanding of the accessible aggregation states and the structural assembly pathways. In this work, we apply nonlinear machine learning to experimental particle tracking data to infer low-dimensional assembly landscapes mapping the morphology, stability, and assembly pathways of accessible aggregates as a function of experimental conditions. To the best of our knowledge, this represents the first time that collective order parameters and assembly landscapes have been inferred directly from experimental data. We apply this technique to the nonequilibrium self-assembly of metallodielectric Janus colloids in an oscillating electric field, and quantify the impact of field strength, oscillation frequency, and salt concentration on the dominant assembly pathways and terminal aggregates. This combined computational and experimental framework furnishes new understanding of self-assembling systems, and quantitatively informs rational engineering of experimental conditions to drive assembly along desired aggregation pathways. PMID:26338295

  16. Machine learning methods for quantitative analysis of Raman spectroscopy data

    NASA Astrophysics Data System (ADS)

    Madden, Michael G.; Ryder, Alan G.

    2003-03-01

    The automated identification and quantification of illicit materials using Raman spectroscopy is of significant importance for law enforcement agencies. This paper explores the use of Machine Learning (ML) methods in comparison with standard statistical regression techniques for developing automated identification methods. In this work, the ML task is broken into two sub-tasks, data reduction and prediction. In well-conditioned data, the number of samples should be much larger than the number of attributes per sample, to limit the degrees of freedom in predictive models. In this spectroscopy data, the opposite is normally true. Predictive models based on such data have a high number of degrees of freedom, which increases the risk of models over-fitting to the sample data and having poor predictive power. In the work described here, an approach to data reduction based on Genetic Algorithms is described. For the prediction sub-task, the objective is to estimate the concentration of a component in a mixture, based on its Raman spectrum and the known concentrations of previously seen mixtures. Here, Neural Networks and k-Nearest Neighbours are used for prediction. Preliminary results are presented for the problem of estimating the concentration of cocaine in solid mixtures, and compared with previously published results in which statistical analysis of the same dataset was performed. Finally, this paper demonstrates how more accurate results may be achieved by using an ensemble of prediction techniques.
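    The prediction sub-task — estimating a component's concentration from a spectrum given previously seen mixtures — can be sketched with k-Nearest Neighbours regression. The two-component synthetic spectra below are illustrative assumptions, not the paper's Raman data.

```python
# Sketch: k-NN regression of concentration from synthetic two-component
# mixture spectra (linear mixing of two Gaussian bands plus noise).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
grid = np.linspace(0, 10, 64)                # spectral axis (arbitrary units)
comp_a = np.exp(-(grid - 3) ** 2)            # band of the target component
comp_b = np.exp(-(grid - 7) ** 2)            # band of the diluent
c = rng.uniform(0, 1, size=120)              # true concentrations
spectra = np.outer(c, comp_a) + np.outer(1 - c, comp_b)
spectra += 0.01 * rng.normal(size=spectra.shape)

knn = KNeighborsRegressor(n_neighbors=5).fit(spectra[:100], c[:100])
err = np.abs(knn.predict(spectra[100:]) - c[100:]).mean()
print(f"mean absolute error of predicted concentration: {err:.3f}")
```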

  17. A machine learning approach to crater detection from topographic data

    NASA Astrophysics Data System (ADS)

    Di, Kaichang; Li, Wei; Yue, Zongyu; Sun, Yiwei; Liu, Yiliang

    2014-12-01

    Craters are distinctive features on the surfaces of most terrestrial planets. Craters reveal the relative ages of surface units and provide information on surface geology. Extracting craters is one of the fundamental tasks in planetary research. Although many automated crater detection algorithms have been developed to extract craters from image or topographic data, most of them are applicable only in particular regions, and only a few can be widely used, especially in complex surface settings. In this study, we present a machine learning approach to crater detection from topographic data. This approach includes two steps: detecting square regions which contain one crater with the use of a boosting algorithm, and delineating the rims of the crater in each square region by local terrain analysis and circular Hough transform. A new variant of Haar-like features (scaled Haar-like features) is proposed and combined with traditional Haar-like features and local binary pattern features to enhance the performance of the classifier. Experimental results with the use of Mars topographic data demonstrate that the developed approach can significantly decrease the false positive detection rate while maintaining a relatively high true positive detection rate even in challenging sites.
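    The rim-delineation step relies on the circular Hough transform, which can be written directly in NumPy: every edge pixel votes for all circle centres consistent with each candidate radius, and the accumulator peak gives the detected circle. The synthetic ring of edge pixels below stands in for a real edge map derived from terrain analysis.

```python
# Sketch: a minimal circular Hough transform on a synthetic ring of edge
# pixels; the accumulator peak recovers the circle's centre and radius.
import numpy as np

size, true_r = 64, 15
cy = cx = size // 2
angles = np.linspace(0, 2 * np.pi, 200, endpoint=False)
edge_y = np.round(cy + true_r * np.sin(angles)).astype(int)
edge_x = np.round(cx + true_r * np.cos(angles)).astype(int)

radii = np.arange(10, 21)
acc = np.zeros((len(radii), size, size))      # (radius, centre_y, centre_x)
for ey, ex in zip(edge_y, edge_x):
    for ri, r in enumerate(radii):
        # each edge pixel votes for every centre at distance r from it
        vy = np.round(ey - r * np.sin(angles)).astype(int)
        vx = np.round(ex - r * np.cos(angles)).astype(int)
        ok = (vy >= 0) & (vy < size) & (vx >= 0) & (vx < size)
        np.add.at(acc[ri], (vy[ok], vx[ok]), 1)

ri, py, px = np.unravel_index(acc.argmax(), acc.shape)
print(f"detected centre ({py}, {px}), radius {radii[ri]} "
      f"(true: ({cy}, {cx}), {true_r})")
```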

  18. A machine learning approach for detecting cell phone usage

    NASA Astrophysics Data System (ADS)

    Xu, Beilei; Loce, Robert P.

    2015-03-01

    Cell phone usage while driving is common, but widely considered dangerous due to distraction of the driver. Because of the high number of accidents related to cell phone usage while driving, several states have enacted regulations that prohibit driver cell phone usage while driving. However, to enforce the regulation, current practice requires dispatching law enforcement officers at the roadside to visually examine incoming cars, or having human operators manually examine image/video records to identify violators. Both of these practices are expensive, difficult, and ultimately ineffective. Therefore, there is a need for a semi-automatic or automatic solution to detect driver cell phone usage. In this paper, we propose a machine-learning-based method for detecting driver cell phone usage using a camera system directed at the vehicle's front windshield. The developed method consists of two stages: first, the frontal windshield region is localized using the deformable part model (DPM); next, a Fisher vector (FV) representation is used to classify the driver's side of the windshield into cell phone usage violation and non-violation classes. The proposed method achieved about 95% accuracy on a data set of more than 100 images with drivers in a variety of challenging poses, with or without cell phones.

  19. Characterization of decohering quantum systems: Machine learning approach

    NASA Astrophysics Data System (ADS)

    Stenberg, Markku P. V.; Köhn, Oliver; Wilhelm, Frank K.

    2016-01-01

    Adaptive data collection and analysis, where data are being fed back to update the measurement settings, can greatly increase speed, precision, and reliability of the characterization of quantum systems. However, decoherence tends to make adaptive characterization difficult. As an example, we consider two coupled discrete quantum systems. When one of the systems can be controlled and measured, the standard method to characterize another, with an unknown frequency ωr, is swap spectroscopy. Here, adapting measurements can provide estimates whose error decreases exponentially in the number of measurement shots rather than as a power law in conventional swap spectroscopy. However, when the decoherence time is so short that an excitation oscillating between the two systems can only undergo less than a few tens of vacuum Rabi oscillations, this approach can be marred by a severe limit on accuracy unless carefully designed. We adopt machine learning techniques to search for efficient policies for the characterization of decohering quantum systems. We find, for instance, that when the system undergoes more than 2 Rabi oscillations during its relaxation time T1, O (103) measurement shots are sufficient to reduce the squared error of the Bayesian initial prior of the unknown frequency ωr by a factor O (104) or larger. We also develop policies optimized for extreme initial parameter uncertainty and for the presence of imperfections in the readout.
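    The Bayesian frequency estimation underlying the above can be sketched with a simple grid-based update. The oscillatory likelihood, the decoherence-free dynamics, and the random (non-adaptive) choice of evolution times are all stand-in assumptions; the paper's swap-spectroscopy model and learned policies are not reproduced here.

```python
# Sketch: grid-based Bayesian updates of a prior over an unknown
# frequency, from binary measurement outcomes with a cosine-squared
# outcome probability.
import numpy as np

rng = np.random.default_rng(0)
true_w = 0.63                                # unknown frequency (arb. units)
grid = np.linspace(0, 1, 1000)               # candidate frequencies
post = np.ones_like(grid) / grid.size        # flat initial prior

for _ in range(300):                         # measurement shots
    t = rng.uniform(0, 10)                   # evolution time for this shot
    p1 = np.cos(true_w * t / 2) ** 2         # excitation probability
    outcome = rng.random() < p1
    like = np.cos(grid * t / 2) ** 2
    post *= like if outcome else (1 - like)  # Bayes rule on the grid
    post /= post.sum()                       # renormalize

est = grid[post.argmax()]
print(f"posterior peak at {est:.3f} (true value {true_w})")
```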

  20. Machine learning and cosmological simulations - I. Semi-analytical models

    NASA Astrophysics Data System (ADS)

    Kamdar, Harshil M.; Turk, Matthew J.; Brunner, Robert J.

    2016-01-01

    We present a new exploratory framework to model galaxy formation and evolution in a hierarchical Universe by using machine learning (ML). Our motivations are two-fold: (1) presenting a new, promising technique to study galaxy formation, and (2) quantitatively analysing the extent of the influence of dark matter halo properties on galaxies in the backdrop of semi-analytical models (SAMs). We use the influential Millennium Simulation and the corresponding Munich SAM to train and test various sophisticated ML algorithms (k-Nearest Neighbours, decision trees, random forests, and extremely randomized trees). By using only essential dark matter halo physical properties for haloes of M > 10^12 M⊙ and a partial merger tree, our model predicts the hot gas mass, cold gas mass, bulge mass, total stellar mass, black hole mass and cooling radius at z = 0 for each central galaxy in a dark matter halo for the Millennium run. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon SAMs, and demonstrably place ML as a promising and computationally efficient tool to study small-scale structure formation.
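    The halo-to-galaxy mapping described above can be sketched as a supervised regression from halo properties to a galaxy property. The synthetic stellar-mass relation, the chosen halo features, and all coefficients below are illustrative assumptions, not the Millennium/Munich SAM catalogues.

```python
# Sketch: a random forest learns a (synthetic) halo-to-galaxy mapping from
# halo mass, spin, and formation redshift to log stellar mass.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
log_mhalo = rng.uniform(12, 15, n)               # log halo mass
spin = rng.uniform(0.01, 0.1, n)                 # halo spin parameter
z_form = rng.uniform(0.5, 4.0, n)                # formation redshift
# synthetic stellar-mass relation with intrinsic scatter
log_mstar = 0.6 * log_mhalo + 0.3 * z_form + rng.normal(0, 0.1, n)

X = np.column_stack([log_mhalo, spin, z_form])
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[:1500], log_mstar[:1500])
r2 = rf.score(X[1500:], log_mstar[1500:])        # held-out R^2
print(f"held-out R^2 for log stellar mass: {r2:.2f}")
```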

  1. Machine learning and cosmological simulations - II. Hydrodynamical simulations

    NASA Astrophysics Data System (ADS)

    Kamdar, Harshil M.; Turk, Matthew J.; Brunner, Robert J.

    2016-04-01

    We extend a machine learning (ML) framework presented previously to model galaxy formation and evolution in a hierarchical universe using N-body + hydrodynamical simulations. In this work, we show that ML is a promising technique to study galaxy formation in the backdrop of a hydrodynamical simulation. We use the Illustris simulation to train and test various sophisticated ML algorithms. By using only essential dark matter halo physical properties and no merger history, our model predicts the gas mass, stellar mass, black hole mass, star formation rate, g − r colour, and stellar metallicity fairly robustly. Our results provide a unique and powerful phenomenological framework to explore the galaxy-halo connection that is built upon a solid hydrodynamical simulation. The promising reproduction of the listed galaxy properties demonstrably places ML as a promising and significantly more computationally efficient tool to study small-scale structure formation. We find that ML mimics a full-blown hydrodynamical simulation surprisingly well, in a computation time of mere minutes. The population of galaxies simulated by ML, while not numerically identical to Illustris, is statistically robust and physically consistent with Illustris galaxies and follows the same fundamental observational constraints. ML offers an intriguing and promising technique to create quick mock galaxy catalogues in the future.

  2. Detecting Abbreviations in Discharge Summaries using Machine Learning Methods

    PubMed Central

    Wu, Yonghui; Rosenbloom, S. Trent; Denny, Joshua C.; Miller, Randolph A.; Mani, Subramani; Giuse, Dario A.; Xu, Hua

    2011-01-01

    Recognition and identification of abbreviations is an important, challenging task in clinical natural language processing (NLP). A comprehensive lexical resource comprising all common, useful clinical abbreviations would have great applicability. The authors present a corpus-based method to create a lexical resource of clinical abbreviations using machine-learning (ML) methods, and tested its ability to automatically detect abbreviations in hospital discharge summaries. Domain experts manually annotated abbreviations in seventy discharge summaries, which were randomly split into a training set (40 documents) and a test set (30 documents). We implemented and evaluated several ML algorithms using the training set and a list of pre-defined features. The subsequent evaluation on the test set showed that the Random Forest classifier achieved the highest F-measure among the individual classifiers, 94.8% (precision 98.8%, recall 91.2%). When a voting scheme was used to combine the outputs of the various ML classifiers, the system's F-measure rose to 95.7%. PMID:22195219
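
    The voting scheme mentioned above can be sketched with a generic majority-vote ensemble. The member classifiers and the synthetic features below are assumptions for illustration; the paper's actual pre-defined token features are not reproduced:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic stand-in for token-level features (word shape, length,
# dictionary membership, ...).
X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Hard (majority) voting over heterogeneous classifiers; the exact
# member classifiers used in the paper are an assumption here.
vote = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
vote.fit(X_tr, y_tr)
print(f1_score(y_te, vote.predict(X_te)))
```

    Combining diverse classifiers this way often edges out the best single member, which matches the small F-measure gain reported.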

  3. Machine learning of parameters for accurate semiempirical quantum chemical calculations

    DOE PAGESBeta

    Dral, Pavlo O.; von Lilienfeld, O. Anatole; Thiel, Walter

    2015-04-14

    We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C7H10O2, for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules.

  4. Electronic spectra from TDDFT and machine learning in chemical space

    SciTech Connect

    Ramakrishnan, Raghunathan; Hartmann, Mia; Tapavicza, Enrico; Lilienfeld, O. Anatole von

    2015-08-28

    Due to its favorable computational efficiency, time-dependent (TD) density functional theory (DFT) enables the prediction of electronic spectra in a high-throughput manner across chemical space. Its predictions, however, can be quite inaccurate. We resolve this issue with machine learning models trained on deviations of reference second-order approximate coupled-cluster (CC2) singles and doubles spectra from their TDDFT counterparts, or even from the DFT gap. We applied this approach to low-lying singlet-singlet vertical electronic spectra of over 20,000 synthetically feasible small organic molecules with up to eight CONF atoms. The prediction errors decay monotonically as a function of training set size. For a training set of 10,000 molecules, CC2 excitation energies can be reproduced to within ±0.1 eV for the remaining molecules. Analysis of our spectral database via chromophore counting suggests that even higher accuracies can be achieved. Based on the evidence collected, we discuss open challenges associated with data-driven modeling of high-lying spectra and transition intensities.
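
    The core idea here, learning the deviation of an accurate reference from a cheap method and then adding the learned correction back, can be sketched on synthetic one-dimensional data. The functions, kernel, and hyperparameters below are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)

# Toy setting: a cheap method is systematically off from an accurate
# reference; learn the correction delta = e_ref - e_cheap from a
# one-dimensional molecular descriptor x (all quantities synthetic).
x = rng.uniform(-2.0, 2.0, (400, 1))
e_ref = np.sin(x[:, 0]) + 0.5 * x[:, 0]     # "accurate" reference values
e_cheap = 0.5 * x[:, 0] + 0.3               # misses the sin term, biased
delta = e_ref - e_cheap

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0)
model.fit(x[:300], delta[:300])

# Corrected prediction = cheap value + learned correction.
corrected = e_cheap[300:] + model.predict(x[300:])
mae_before = np.abs(e_ref[300:] - e_cheap[300:]).mean()
mae_after = np.abs(e_ref[300:] - corrected).mean()
print(mae_before, mae_after)  # the correction shrinks the error
```

    Because the correction is usually smoother than the full property, it can be learned from far fewer reference calculations than a direct model would need.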

  5. Machine learning of parameters for accurate semiempirical quantum chemical calculations

    SciTech Connect

    Dral, Pavlo O.; von Lilienfeld, O. Anatole; Thiel, Walter

    2015-04-14

    We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C7H10O2, for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules.

  6. MODIS Aerosol Optical Depth Bias Adjustment Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Albayrak, A.; Wei, J. C.; Petrenko, M.; Lary, D. J.; Leptoukh, G. G.

    2011-12-01

    Over the past decade, global aerosol observations have been conducted by space-borne sensors, airborne instruments, and ground-based network measurements. Unfortunately, we quite often encounter differences between aerosol measurements made by different well-calibrated instruments, even with careful collocation in time and space. The differences can be substantial and need to be better understood and accounted for when merging data from many sensors. Possible causes of these differences include instrumental bias, different satellite viewing geometries, calibration issues, dynamically changing atmospheric and surface conditions, and other "regressors", resulting in random and systematic errors in the final aerosol products. In this study, we concentrate on removing biases and systematic errors from the MODIS (both Terra and Aqua) aerosol products using Machine Learning algorithms. While we assess the regressors in our system when comparing global aerosol products, the Aerosol Robotic Network of sun-photometers (AERONET) is used as a baseline for evaluating the MODIS aerosol products (Dark Target for land and ocean, and Deep Blue retrieval algorithms). The results of the bias adjustment for MODIS Terra and Aqua are planned to be incorporated into the AeroStat Giovanni as part of the NASA ACCESS-funded AeroStat project.

  7. Machine Learning Estimation of Atom Condensed Fukui Functions.

    PubMed

    Zhang, Qingyou; Zheng, Fangfang; Zhao, Tanfeng; Qu, Xiaohui; Aires-de-Sousa, João

    2016-02-01

    To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as high/low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94% accuracy) of the atom with the higher Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50% of the cases and, on average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R² = 0.68-0.69, improving on the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was achieved with RF with a sensitivity of 55-61% and a specificity of 94-95%. PMID:27491791
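
    A minimal Bradley-Terry fit can be sketched as follows: each atom type gets a strength parameter, and the probability that type i outranks type j is exp(beta_i) / (exp(beta_i) + exp(beta_j)). The toy data and the gradient-ascent fitting loop below are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Bradley-Terry: each "atom type" i has a strength beta_i, and
# P(i outranks j) = exp(beta_i) / (exp(beta_i) + exp(beta_j)).
true_beta = np.array([0.0, 0.5, 1.0, 2.0])
n_types = len(true_beta)

# Simulate pairwise comparisons between random distinct types.
i_idx = rng.integers(0, n_types, 2000)
j_idx = rng.integers(0, n_types, 2000)
keep = i_idx != j_idx
i_idx, j_idx = i_idx[keep], j_idx[keep]
p_true = 1.0 / (1.0 + np.exp(true_beta[j_idx] - true_beta[i_idx]))
wins = (rng.random(i_idx.size) < p_true).astype(float)  # 1 if i "won"

# Fit by gradient ascent on the log-likelihood, fixing beta_0 = 0.
beta = np.zeros(n_types)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(beta[j_idx] - beta[i_idx]))
    g = wins - p
    grad = (np.bincount(i_idx, weights=g, minlength=n_types)
            - np.bincount(j_idx, weights=g, minlength=n_types))
    beta += 0.001 * grad
    beta -= beta[0]  # gauge fix: strengths are only defined up to a shift

print(np.round(beta, 2))  # recovered strengths follow the true ordering
```

    With the fitted strengths, ranking the atoms of a molecule reduces to sorting their types by beta, which is how the BT coefficients are used for pairwise identification above.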

  8. A Machine Learning Approach for Accurate Annotation of Noncoding RNAs

    PubMed Central

    Liu, Chunmei; Wang, Zhi

    2016-01-01

    Searching genomes to locate noncoding RNA genes with known secondary structure is an important problem in bioinformatics. In general, the secondary structure of a searched noncoding RNA is defined with a structure model constructed from the structural alignment of a set of sequences from its family. Computing the optimal alignment between a sequence and a structure model is the core part of an algorithm that can search genomes for noncoding RNAs. In practice, a single structure model may not be sufficient to capture all crucial features important for a noncoding RNA family. In this paper, we develop a novel machine learning approach that can efficiently search genomes for noncoding RNAs with high accuracy. During the search procedure, a sequence segment in the searched genome sequence is processed and a feature vector is extracted to represent it. Based on the feature vector, a classifier is used to determine whether the sequence segment is the searched ncRNA or not. Our testing results show that this approach is able to efficiently capture crucial features of a noncoding RNA family. Compared with existing search tools, it significantly improves the accuracy of genome annotation. PMID:26357266

  9. Drug repositioning: a machine-learning approach through data integration.

    PubMed

    Napolitano, Francesco; Zhao, Yan; Moreira, Vânia M; Tagliaferri, Roberto; Kere, Juha; D'Amato, Mauro; Greco, Dario

    2013-01-01

    Existing computational methods for drug repositioning either rely only on the gene expression response of cell lines after treatment, or on drug-to-disease relationships, merging several information levels. However, the noisy nature of gene expression data and the scarcity of genomic data for many diseases are important limitations to such approaches. Here we focused on a drug-centered approach by predicting the therapeutic class of FDA-approved compounds, without considering data concerning the diseases. We propose a novel computational approach to predict drug repositioning based on state-of-the-art machine-learning algorithms. We have integrated multiple layers of information: i) the distances between drugs based on how similar their chemical structures are, ii) how close their targets are within the protein-protein interaction network, and iii) how correlated the gene expression patterns are after treatment. Our classifier reaches high accuracy levels (78%), allowing us to re-interpret the top misclassifications as re-classifications, after rigorous statistical evaluation. Efficient drug repurposing has the potential to significantly impact the whole field of drug development. The results presented here can significantly accelerate the translation into the clinic of known compounds for novel therapeutic uses. PMID:23800010
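
    The three information layers can be pictured as three normalised drug-drug distance matrices combined into one before classification. The matrices and combination weights below are random stand-ins for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_distance_matrix(n, rng):
    """Symmetric, zero-diagonal distance matrix normalised to [0, 1]."""
    d = rng.random((n, n))
    d = (d + d.T) / 2.0
    np.fill_diagonal(d, 0.0)
    return d / d.max()

# Three layers: chemical-structure distance, target proximity in the
# protein-protein interaction network, and expression correlation
# (random stand-ins for six hypothetical drugs).
n_drugs = 6
d_chem = random_distance_matrix(n_drugs, rng)
d_ppi = random_distance_matrix(n_drugs, rng)
d_expr = random_distance_matrix(n_drugs, rng)

weights = [0.4, 0.3, 0.3]  # illustrative combination weights
d_combined = weights[0] * d_chem + weights[1] * d_ppi + weights[2] * d_expr
print(d_combined.shape)
```

    A k-nearest-neighbour or kernel classifier operating on d_combined would then predict the therapeutic class, which is the step the abstract's classifier performs.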

  10. Drug repositioning: a machine-learning approach through data integration

    PubMed Central

    2013-01-01

    Existing computational methods for drug repositioning either rely only on the gene expression response of cell lines after treatment, or on drug-to-disease relationships, merging several information levels. However, the noisy nature of gene expression data and the scarcity of genomic data for many diseases are important limitations to such approaches. Here we focused on a drug-centered approach by predicting the therapeutic class of FDA-approved compounds, without considering data concerning the diseases. We propose a novel computational approach to predict drug repositioning based on state-of-the-art machine-learning algorithms. We have integrated multiple layers of information: i) the distances between drugs based on how similar their chemical structures are, ii) how close their targets are within the protein-protein interaction network, and iii) how correlated the gene expression patterns are after treatment. Our classifier reaches high accuracy levels (78%), allowing us to re-interpret the top misclassifications as re-classifications, after rigorous statistical evaluation. Efficient drug repurposing has the potential to significantly impact the whole field of drug development. The results presented here can significantly accelerate the translation into the clinic of known compounds for novel therapeutic uses. PMID:23800010

  11. INSIGHTS FROM MACHINE-LEARNED DIET SUCCESS PREDICTION.

    PubMed

    Weber, Ingmar; Achananuparp, Palakorn

    2016-01-01

    To support people trying to lose weight and stay healthy, more and more fitness apps have sprung up, including the ability to track both calorie intake and expenditure. Users of such apps are part of a wider "quantified self" movement, and many opt in to publicly share their logged data. In this paper, we use public food diaries of more than 4,000 long-term active MyFitnessPal users to study the characteristics of an (un)successful diet. Concretely, we train a machine learning model to predict repeatedly being over or under self-set daily calorie goals and then look at which features contribute to the model's prediction. Our findings include expected results, such as the token "mcdonalds" or the category "dessert" being indicative of exceeding the calorie goal, as well as less obvious ones, such as the difference between pork and poultry with respect to dieting success, or use of the "quick added calories" functionality being indicative of over-shooting calorie-wise. This study also hints at the feasibility of using such data for more in-depth data mining, e.g., looking at the interaction between consumed foods, such as mixing protein- and carbohydrate-rich foods. To the best of our knowledge, this is the first systematic study of public food diaries. PMID:26776216

  12. Neural Network Machine Learning and Dimension Reduction for Data Visualization

    NASA Technical Reports Server (NTRS)

    Liles, Charles A.

    2014-01-01

    Neural network machine learning in computer science is a continuously developing field of study. Although neural network models have been developed which can accurately predict a numeric value or nominal classification, a general-purpose method for constructing neural network architecture has yet to be developed. Computer scientists are often forced to rely on a trial-and-error process of developing and improving accurate neural network models. In many cases, models are constructed from a large number of input parameters. Determining which input parameters have the greatest impact on the model's prediction is often difficult, especially when the number of input variables is very high. This challenge is often labeled the "curse of dimensionality" in scientific fields. However, techniques exist for reducing the dimensionality of problems to just two dimensions. Once a problem's dimensions have been mapped to two dimensions, it can be easily plotted and understood by humans. The ability to visualize a multi-dimensional dataset can provide a means of identifying which input variables have the highest effect on determining a nominal or numeric output. Identifying these variables can provide a better means of training neural network models; models can be more easily and quickly trained using only input variables which appear to affect the outcome variable. The purpose of this project is to explore varying means of training neural networks and to utilize dimensional reduction for visualizing and understanding complex datasets.
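
    Reducing a many-input dataset to two dimensions for plotting can be sketched with principal component analysis, one of several applicable techniques; the data below are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# High-dimensional data with two dominant directions of variation,
# standing in for a many-input neural-network training set.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 20))

# Project to two dimensions for plotting; the explained-variance ratio
# indicates how faithful the 2-D view is.
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)
print(X2.shape, pca.explained_variance_ratio_.sum())
```

    Colouring the resulting scatter plot by the outcome variable is one simple way to see which regions of input space drive the prediction.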

  13. A novel virtual viewpoint merging method based on machine learning

    NASA Astrophysics Data System (ADS)

    Zheng, Di; Peng, Zongju; Wang, Hui; Jiang, Gangyi; Chen, Fen

    2014-11-01

    In a multi-view video system, multiple video plus depth is the main data format for 3D scene representation. Continuous virtual views can be generated by using the depth image based rendering (DIBR) technique. The DIBR process includes geometric mapping, hole filling, and merging. Unique weights, inversely proportional to the distance between the virtual and real cameras, are used to merge the virtual views. However, these weights might not be the optimal ones in terms of virtual view quality. In this paper, a novel virtual view merging algorithm is proposed. In the proposed algorithm, a machine learning method is utilized to establish an optimal weight model. In the model, color, depth, color gradient, and sequence parameters are taken into consideration. Firstly, we render the same virtual view from the left and right views, and select the training samples by using a threshold. Then, the eigenvalues of the samples are extracted and the optimal merging weights are calculated as training labels. Finally, a support vector classifier (SVC) is adopted to establish the model, which is used to guide virtual view rendering. Experimental results show that the proposed method can improve the quality of virtual views for most sequences. In particular, it is effective in the case of a large distance between the virtual and real cameras. Compared to the original method of virtual view synthesis, the proposed method can obtain more than 0.1 dB gain for some sequences.
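
    The learning step can be sketched as a support vector classifier mapping per-pixel features to a discretised merging-weight class. The features, the labelling rule, and the three weight classes below are illustrative assumptions, not the paper's trained model:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Per-pixel features: colour difference between the two warped views,
# depth, and colour gradient (synthetic stand-ins for the paper's
# eigenvalues). The label is a discretised "optimal weight" class.
n = 3000
colour_diff = rng.random(n)
depth = rng.random(n)
gradient = rng.random(n)
X = np.column_stack([colour_diff, depth, gradient])

# Toy rule: the colour difference alone determines which of three
# weight classes to use when blending the left and right renderings.
y = np.digitize(colour_diff, [0.33, 0.66])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf")
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out classification accuracy
```

    At render time, each pixel's predicted class selects the blending weight, replacing the fixed distance-based weighting.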

  14. Protein sequence classification with improved extreme learning machine algorithms.

    PubMed

    Cao, Jiuwen; Xiong, Lianglin

    2014-01-01

    Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare an unseen sequence with all the identified protein sequences and return the category of the protein with the highest similarity score, are usually time-consuming. It is therefore urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent, efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimally pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensemble members. For each ensemble member, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
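
    The basic ELM is simple enough to sketch directly: hidden-layer weights are drawn at random and never trained, and only the output weights are solved, in closed form, by least squares. The toy classification problem below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal extreme learning machine (ELM): random hidden weights,
# output weights solved in closed form by least squares.
def elm_fit(X, Y, n_hidden, rng):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden activations
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)  # output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy two-class problem (one-hot targets, argmax at prediction time).
X = rng.normal(size=(400, 5))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[labels]

W, b, beta = elm_fit(X[:300], Y[:300], n_hidden=50, rng=rng)
pred = elm_predict(X[300:], W, b, beta).argmax(axis=1)
print((pred == labels[300:]).mean())  # held-out accuracy
```

    Because training is a single least-squares solve rather than iterative backpropagation, ELMs train orders of magnitude faster than conventionally trained SLFNs, which is what makes them attractive for large sequence databases.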

  15. Electronic spectra from TDDFT and machine learning in chemical space

    NASA Astrophysics Data System (ADS)

    Ramakrishnan, Raghunathan; Hartmann, Mia; Tapavicza, Enrico; von Lilienfeld, O. Anatole

    2015-08-01

    Due to its favorable computational efficiency, time-dependent (TD) density functional theory (DFT) enables the prediction of electronic spectra in a high-throughput manner across chemical space. Its predictions, however, can be quite inaccurate. We resolve this issue with machine learning models trained on deviations of reference second-order approximate coupled-cluster (CC2) singles and doubles spectra from their TDDFT counterparts, or even from the DFT gap. We applied this approach to low-lying singlet-singlet vertical electronic spectra of over 20,000 synthetically feasible small organic molecules with up to eight CONF atoms. The prediction errors decay monotonically as a function of training set size. For a training set of 10,000 molecules, CC2 excitation energies can be reproduced to within ±0.1 eV for the remaining molecules. Analysis of our spectral database via chromophore counting suggests that even higher accuracies can be achieved. Based on the evidence collected, we discuss open challenges associated with data-driven modeling of high-lying spectra and transition intensities.

  16. Electronic spectra from TDDFT and machine learning in chemical space.

    PubMed

    Ramakrishnan, Raghunathan; Hartmann, Mia; Tapavicza, Enrico; von Lilienfeld, O Anatole

    2015-08-28

    Due to its favorable computational efficiency, time-dependent (TD) density functional theory (DFT) enables the prediction of electronic spectra in a high-throughput manner across chemical space. Its predictions, however, can be quite inaccurate. We resolve this issue with machine learning models trained on deviations of reference second-order approximate coupled-cluster (CC2) singles and doubles spectra from their TDDFT counterparts, or even from the DFT gap. We applied this approach to low-lying singlet-singlet vertical electronic spectra of over 20,000 synthetically feasible small organic molecules with up to eight CONF atoms. The prediction errors decay monotonically as a function of training set size. For a training set of 10,000 molecules, CC2 excitation energies can be reproduced to within ±0.1 eV for the remaining molecules. Analysis of our spectral database via chromophore counting suggests that even higher accuracies can be achieved. Based on the evidence collected, we discuss open challenges associated with data-driven modeling of high-lying spectra and transition intensities. PMID:26328822

  17. A novel extreme learning machine for hypoglycemia detection.

    PubMed

    San, Phyo Phyo; Ling, Sai Ho; Soe, Ni Ni; Nguyen, Hung T

    2014-01-01

    Hypoglycemia is a common side-effect of insulin therapy for patients with type 1 diabetes mellitus (T1DM) and is the major limiting factor in maintaining tight glycemic control. A deficiency in glucose counter-regulation may even lead to severe hypoglycemia. Hypoglycemia threatens the well-being of patients with T1DM, since severe episodes lead to seizures or loss of consciousness and, under certain circumstances, the possible development of permanent brain dysfunction. Thus, accurate early detection of hypoglycemia is an important research topic. Using newly emerging technology, an extreme learning machine (ELM) based hypoglycemia detection system is developed to recognize the presence of hypoglycemic episodes. In a clinical study of 16 children with T1DM, the natural occurrence of nocturnal hypoglycemic episodes was associated with increased heart rates (p < 0.06) and increased corrected QT intervals (p < 0.001). The overall data were organized into a training set of 8 patients (320 data points) and a testing set of 8 patients (269 data points). Using the ELM-trained feed-forward neural network (ELM-FFNN), the testing sensitivity (true positive rate) and specificity (true negative rate) for the detection of hypoglycemia are 78% and 60%, respectively. PMID:25569957
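
    The two figures of merit quoted above, sensitivity and specificity, can be computed directly from the confusion-matrix counts; the labels below are made up for illustration:

```python
import numpy as np

# Sensitivity (true-positive rate) and specificity (true-negative rate),
# the two figures of merit quoted for the hypoglycemia detector.
def sensitivity_specificity(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)      # episode present, detected
    tn = np.sum(~y_true & ~y_pred)    # no episode, correctly ignored
    fn = np.sum(y_true & ~y_pred)     # episode missed
    fp = np.sum(~y_true & y_pred)     # false alarm
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels: 1 = hypoglycemic episode present.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 and about 0.667
```

    For a safety-critical alarm like this, sensitivity (not missing real episodes) is usually weighted more heavily than specificity, consistent with the 78%/60% trade-off reported.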

  18. Distinguishing meanders of the Kuroshio using machine learning

    NASA Astrophysics Data System (ADS)

    Plotkin, David A.; Weare, Jonathan; Abbot, Dorian S.

    2014-10-01

    The Kuroshio south of Japan is often described as being bimodal, with abrupt transitions between a straight path state that stays near the coast (small meander) and a meandering state that deviates from the coast (large meander). Despite evidence of the existence of two or more states of the Kuroshio, previous data-driven studies have shown only high variability of the current; they have not, however, demonstrated bimodality in the sense of two states of relatively high probability separated by a region of relatively low probability. We use singular value decomposition (SVD), a standard time series analysis method for characterizing variability, and diffusion maps and spectral clustering (DMSC), a machine learning algorithm that seeks multimodality, to investigate Kuroshio reanalysis output. By applying these methods to a time series of velocity fields, we find that (1) the Kuroshio is bimodal, with high inflow and low path variability in the small meander and low inflow and high path variability in the large meander, (2) the state of the system correlates highly with the location of the recirculation gyre south of Japan, and (3) the meanders are better characterized by path variability than by mean path. Because these results are consistent with satellite sea surface height data, they are not an artifact of the model used for reanalysis. Further, our results provide evidence for a previously proposed transition mechanism based on the strengthening, migration, and weakening of the recirculation gyre south of Japan and can therefore help direct future modeling studies.
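
    The two-step analysis described above, SVD to characterise variability followed by a clustering step that looks for two states, can be sketched on a synthetic bimodal time series. The fields, block structure, and noise level are assumptions, not the reanalysis data:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Toy "reanalysis" series: 200 snapshots of a flattened field that
# alternates between two mean states (stand-ins for the small- and
# large-meander paths), plus noise.
state = (np.arange(200) // 50) % 2            # blocks of 50 snapshots
patterns = rng.normal(size=(2, 30))
fields = patterns[state] + 0.3 * rng.normal(size=(200, 30))

# SVD characterises the variability; the leading components feed the
# clustering step that looks for two high-probability states.
U, s, Vt = np.linalg.svd(fields - fields.mean(axis=0), full_matrices=False)
pcs = U[:, :3] * s[:3]

labels = SpectralClustering(n_clusters=2, random_state=0).fit_predict(pcs)
agreement = max((labels == state).mean(), (labels != state).mean())
print(agreement)  # fraction of snapshots assigned to the correct regime
```

    On real data the cluster labels would be checked against the two states' probability density, since bimodality requires two high-probability regions separated by a low-probability one, not just high variability.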

  19. Improved Automated Seismic Event Extraction Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Mackey, L.; Kleiner, A.; Jordan, M. I.

    2009-12-01

    Like many organizations engaged in seismic monitoring, the Preparatory Commission for the Comprehensive Test Ban Treaty Organization collects and processes seismic data from a large network of sensors. This data is continuously transmitted to a central data center, and bulletins of seismic events are automatically extracted. However, as for many such automated systems at present, the inaccuracy of this extraction necessitates substantial human analyst review effort. A significant opportunity for improvement thus lies in the fact that these systems currently fail to fully utilize the valuable repository of historical data provided by prior analyst reviews. In this work, we present the results of the application of machine learning approaches to several fundamental sub-tasks in seismic event extraction. These methods share as a common theme the use of historical analyst-reviewed bulletins as ground truth from which they extract relevant patterns to accomplish the desired goals. For instance, we demonstrate the effectiveness of classification and ranking methods for the identification of false events -- that is, those which will be invalidated and discarded by analysts -- in automated bulletins. We also show gains in the accuracy of seismic phase identification via the use of classification techniques to automatically assign seismic phase labels to station detections. Furthermore, we examine the potential of historical association data to inform the direct association of new signal detections with their corresponding seismic events. Empirical results are based upon parametric historical seismic detection and event data received from the Preparatory Commission for the Comprehensive Test Ban Treaty Organization.

  20. Using EHRs and Machine Learning for Heart Failure Survival Analysis

    PubMed Central

    Panahiazar, Maryam; Taslimitehrani, Vahid; Pereira, Naveen; Pathak, Jyotishman

    2016-01-01

    “Heart failure (HF) is a frequent health problem with high morbidity and mortality, increasing prevalence and escalating healthcare costs” [1]. By calculating a HF survival risk score based on patient-specific characteristics from Electronic Health Records (EHRs), we can identify high-risk patients and apply individualized treatment and healthy living choices to potentially reduce their mortality risk. The Seattle Heart Failure Model (SHFM) is one of the most popular models for calculating HF survival risk; it uses multiple clinical variables to predict HF prognosis and also incorporates the impact of HF therapy on patient outcomes. Although the SHFM has been validated across multiple cohorts [1–5], these studies were primarily done using clinical trials databases that do not reflect routine clinical care in the community. Further, the impact of contemporary therapeutic interventions, such as beta-blockers or defibrillators, was incorporated in the SHFM by extrapolation from external trials. In this study, we assessed the performance of the SHFM using EHRs at Mayo Clinic, and sought to develop a risk prediction model using machine learning techniques applied to routine clinical care data. Our results show that the models built using EHR data are more accurate (11% improvement in AUC), with the convenience of being more readily applicable in routine clinical care. Furthermore, we demonstrate that new predictive markers (such as co-morbidities), when incorporated into our models, improve prognostic performance significantly (8% improvement in AUC). PMID:26262006
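
    The AUC comparison described above, a baseline feature set versus one enriched with additional markers, can be sketched on synthetic data; the feature split and coefficients below are assumptions, not the Mayo Clinic EHR variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic cohort: the outcome depends on eight risk factors. A
# "baseline" model sees only the first four; the "enriched" model sees
# all eight (standing in for added co-morbidity markers).
X = rng.normal(size=(3000, 8))
logit = X @ np.array([1.0, 1.0, 1.0, 1.0, 0.8, 0.8, 0.8, 0.8])
y = (rng.random(3000) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = LogisticRegression(max_iter=1000).fit(X_tr[:, :4], y_tr)
auc_base = roc_auc_score(y_te, base.predict_proba(X_te[:, :4])[:, 1])

full = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc_full = roc_auc_score(y_te, full.predict_proba(X_te)[:, 1])
print(auc_base, auc_full)  # the richer feature set raises the AUC
```

    Reporting the AUC gain on a held-out split, as done here, mirrors how the 11% and 8% improvements in the abstract would be measured.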

  1. Machine Learning in the Big Data Era: Are We There Yet?

    SciTech Connect

    Sukumar, Sreenivas Rangan

    2014-01-01

    In this paper, we discuss the machine learning challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are machine learning algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across the domains of national security and healthcare to suggest that our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data retrieval); (ii) the science-of-data challenge - the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge - the ability to construct, learn, and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.

  2. Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm

    PubMed Central

    Azé, Jérôme; Sola, Christophe; Zhang, Jian; Lafosse-Marin, Florian; Yasmin, Memona; Siddiqui, Rubina; Kremer, Kristin; van Soolingen, Dick; Refrégier, Guislaine

    2015-01-01

    Infra-species taxonomy is a prerequisite to compare features such as virulence in different pathogen lineages. Mycobacterium tuberculosis complex taxonomy has rapidly evolved in the last 20 years through intensive clinical isolation, advances in sequencing, and the description of fast-evolving loci (CRISPR and MIRU-VNTR). On-line tools to describe new isolates have been set up based on known diversity either in CRISPRs (also known as spoligotypes) or in MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an on-line tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacer spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignations were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignations of all the isolates from the database were subsequently implemented into an ensemble learning approach based on the Machine Learning tool Weka to derive a classification scheme. All assignations were reproduced with very good sensitivities and specificities. When applied to independent datasets, it was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, efficient on 15-MIRU, 24-VNTR and spoligotype data, is available on the web interface “TBminer.” Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance. Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first

  3. Recent advances in computer camera methods for machine vision

    NASA Astrophysics Data System (ADS)

    Olson, Gaylord G.; Walker, Jo N.

    1998-10-01

    During the past year, several new computer camera methods (hardware and software) have been developed which have applications in machine vision. These are described below, along with some test results. The improvements are generally in the direction of higher speed and greater parallelism. A PCI interface card has been designed which is adaptable to multiple CCD types, both color and monochrome. A newly designed A/D converter allows for a choice of 8 or 10-bit conversion resolution and a choice of two different analog inputs. Thus, by using four of these converters feeding the 32-bit PCI data bus, up to 8 camera heads can be used with a single PCI card, and four camera heads can be operated in parallel. The card has been designed so that any of 8 different CCD types can be used with it (6 monochrome and 2 color CCDs) ranging in resolution from 192 by 165 pixels up to 1134 by 972 pixels. In the area of software, a method has been developed to better utilize the decision-making capability of the computer along with the sub-array scan capabilities of many CCDs. Specifically, it is shown below how to achieve a dual scan mode camera system wherein one scan mode is a low density, high speed scan of a complete image area, and a higher density sub-array scan is used in those areas where changes have been observed. The name given to this technique is adaptive sub-array scanning.
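    The dual scan mode described above can be sketched in a few lines: a low-density coarse scan of the whole frame, followed by high-density sub-array reads wherever a change is detected. The array sizes, change threshold, and sub-array size below are illustrative assumptions, not the parameters of the actual hardware.

```python
import numpy as np

def coarse_scan(frame, step=4):
    """Low-density, high-speed scan: sample every `step`-th pixel."""
    return frame[::step, ::step]

def changed_blocks(prev_coarse, curr_coarse, thresh=10):
    """Coarse-grid coordinates where the scene changed between scans."""
    diff = np.abs(curr_coarse.astype(int) - prev_coarse.astype(int))
    return np.argwhere(diff > thresh)

def fine_scan(frame, block_rc, step=4, size=8):
    """High-density sub-array read-out around a changed coarse pixel."""
    r, c = block_rc * step
    return frame[r:r + size, c:c + size]

# Two synthetic 64x64 frames; the second contains a bright 8x8 object.
prev = np.zeros((64, 64), dtype=np.uint8)
curr = prev.copy()
curr[16:24, 32:40] = 200

blocks = changed_blocks(coarse_scan(prev), coarse_scan(curr))
patches = [fine_scan(curr, rc) for rc in blocks]
```

In a real camera the fine scan would be issued as a sub-array read-out command to the CCD rather than a slice of an already-captured full frame.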

  4. Phishtest: Measuring the Impact of Email Headers on the Predictive Accuracy of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Tout, Hicham

    2013-01-01

    The majority of documented phishing attacks have been carried by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that the inclusion of a limited subset of email headers as features in training machine learning…

  5. Position Paper: Applying Machine Learning to Software Analysis to Achieve Trusted, Repeatable Scientific Computing

    SciTech Connect

    Prowell, Stacy J; Symons, Christopher T

    2015-01-01

    Producing trusted results from high-performance codes is essential for policy and has significant economic impact. We propose combining rigorous analytical methods with machine learning techniques to achieve the goal of repeatable, trustworthy scientific computing.

  6. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for predicting unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ universal machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function networks, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time, and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the untyped SNPs in parent-offspring trios. The results show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods, while TotalBoost performed slightly worse than the rest. Running times differed between methods: ELM was consistently the fastest algorithm, whereas RBF requires long imputation times as the sample size increases. The methods tested in this research can be an alternative for imputation of untyped SNPs when the rate of missing data is low. However, it is recommended that other machine learning methods also be evaluated for imputation. PMID:27049046

  7. Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare.

    PubMed

    Mozaffari-Kermani, Mehran; Sur-Kolay, Susmita; Raghunathan, Anand; Jha, Niraj K

    2015-11-01

    Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, and thus, any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. Hindering a diagnosis may have life-threatening consequences and could cause distrust in the system; conversely, a false diagnosis may prompt users to distrust the machine-learning algorithm and even abandon the entire system, and such a false positive classification may also cause patient distress. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increase the likelihood of classification into a specific class), or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at a lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks that are based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness. PMID
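    A crude, hedged illustration of the poisoning idea: flip a fraction of training labels and measure the effect on held-out accuracy. This is far simpler than the paper's targeted attack procedure, and on well-separated data random flips may degrade accuracy only mildly; targeted perturbations near the decision boundary are what make the attacks in the paper effective.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two well-separated classes: a model trained on clean labels is perfect.
X, y = make_blobs(n_samples=400, centers=[[-5, -5], [5, 5]],
                  cluster_std=1.0, random_state=0)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

acc_clean = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)

# "Poison" the training set by flipping 40% of its labels at random.
rng = np.random.default_rng(42)
flip = rng.choice(len(y_train), size=120, replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip] = 1 - y_poisoned[flip]

acc_poisoned = LogisticRegression().fit(X_train, y_poisoned).score(X_test, y_test)
```

Tracking the gap between `acc_clean` and `acc_poisoned` over time is, in miniature, the kind of accuracy-deviation monitoring the paper proposes as a countermeasure.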

  8. Automated Prediction of CMEs Using Machine Learning of CME - Flare Associations

    NASA Astrophysics Data System (ADS)

    Qahwaji, R.; Colak, T.; Al-Omari, M.; Ipson, S.

    2008-04-01

    Machine-learning algorithms are applied to explore the relation between significant flares and their associated CMEs. The NGDC flares catalogue and the SOHO/LASCO CME catalogue are processed to associate X and M-class flares with CMEs based on timing information. Automated systems are created to process and associate years of flare and CME data, which are later arranged into numerical training vectors and fed to machine-learning algorithms to extract the embedded knowledge and provide learning rules that can be used for the automated prediction of CMEs. Properties representing the intensity, duration, decline time, and rise time of each flare are extracted from all the associated (A) and not-associated (NA) flares and converted to a numerical format that is suitable for machine-learning use. The machine-learning algorithms Cascade Correlation Neural Networks (CCNN) and Support Vector Machines (SVM) are used and compared in our work. The machine-learning systems predict, from the input of a flare’s properties, if the flare is likely to initiate a CME. Intensive experiments using Jack-knife techniques are carried out and the relationships between flare properties and CMEs are investigated using the results. The predictive performance of SVM and CCNN is analysed and recommendations for enhancing the performance are provided.
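    The final prediction step can be sketched as an SVM trained on flare feature vectors. The feature definitions, units, and class separation below are invented for the example; they are not the catalogue-derived properties used in the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Hypothetical flare features: [log peak flux, total duration, rise time,
# decay time] in arbitrary units; label 1 = CME-associated, 0 = not.
n = 200
X_assoc = rng.normal(loc=[1.5, 40.0, 10.0, 30.0],
                     scale=[0.3, 8.0, 3.0, 6.0], size=(n, 4))
X_not = rng.normal(loc=[0.8, 20.0, 6.0, 14.0],
                   scale=[0.3, 8.0, 3.0, 6.0], size=(n, 4))
X = np.vstack([X_assoc, X_not])
y = np.array([1] * n + [0] * n)

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Query: is a bright, long-duration flare likely to launch a CME?
new_flare = np.array([[1.6, 45.0, 11.0, 32.0]])
prediction = clf.predict(new_flare)
```

The paper's Jack-knife evaluation corresponds to repeatedly refitting `clf` with one association held out and scoring the held-out case.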

  9. Short-term wind speed predictions with machine learning techniques

    NASA Astrophysics Data System (ADS)

    Ghorbani, M. A.; Khatibi, R.; FazeliFard, M. H.; Naghipour, L.; Makarynskyy, O.

    2016-02-01

    Hourly wind speed forecasting is presented by a modeling study with possible applications to practical problems including wind-energy farming, aircraft safety and airport operations. Modeling techniques employed in this paper for such short-term predictions are based on the machine learning techniques of artificial neural networks (ANNs) and genetic expression programming (GEP). Recorded values of wind speed were used, which comprised 8 years of collected data at the Kersey site, Colorado, USA. The January data over the first 7 years (2005-2011) were used for model training; and the January data for 2012 were used for model testing. A number of model structures were investigated for the validation of the robustness of these two techniques. The prediction results were compared with those of a multiple linear regression (MLR) method and with the Persistence method developed for the data. The model performances were evaluated using the correlation coefficient, root mean square error, Nash-Sutcliffe efficiency coefficient and Akaike information criterion. The results indicate that forecasting wind speed is feasible using past records of wind speed alone, but the maximum lead time for the data was found to be 14 h. The results show that different techniques lead to different results, making the choice between them difficult; decision making should therefore be informed by these modeling results and by an understanding of the inherent uncertainties. The results show that both GEP and ANN are equally credible selections and even MLR should not be dismissed, as it has its uses.
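    The MLR and Persistence baselines mentioned above can be sketched on a synthetic hourly series. The series, lag count, and hold-out length here are illustrative assumptions, not the Kersey data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic hourly wind-speed series: a diurnal cycle plus noise (m/s).
t = np.arange(24 * 60)
speed = 6.0 + 2.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 0.5, t.size)

def lag_matrix(series, n_lags):
    """Each row holds the n_lags previous hours; the target is the next hour."""
    X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
    return X, series[n_lags:]

X, y = lag_matrix(speed, n_lags=6)
split = len(y) - 24                      # hold out the final day for testing

# Multiple linear regression (MLR) fit by ordinary least squares.
A = np.column_stack([np.ones(split), X[:split]])
coef, *_ = np.linalg.lstsq(A, y[:split], rcond=None)
pred_mlr = np.column_stack([np.ones(len(y) - split), X[split:]]) @ coef

# Persistence baseline: the forecast is simply the most recent observation.
pred_persist = X[split:, -1]

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

rmse_mlr = rmse(pred_mlr, y[split:])
rmse_persist = rmse(pred_persist, y[split:])
```

On a series with predictable structure the lag-based MLR beats persistence; on nearly random series the two converge, which is one way the "maximum lead time" limit shows up.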

  10. Using machine learning for discovery in synoptic survey imaging data

    NASA Astrophysics Data System (ADS)

    Brink, Henrik; Richards, Joseph W.; Poznanski, Dovi; Bloom, Joshua S.; Rice, John; Negahban, Sahand; Wainwright, Martin

    2013-10-01

    Modern time-domain surveys continuously monitor large swaths of the sky to look for astronomical variability. Astrophysical discovery in such data sets is complicated by the fact that detections of real transient and variable sources are highly outnumbered by `bogus' detections caused by imperfect subtractions, atmospheric effects and detector artefacts. In this work, we present a machine-learning (ML) framework for discovery of variability in time-domain imaging surveys. Our ML methods provide probabilistic statements, in near real time, about the degree to which each newly observed source is an astrophysically relevant source of variable brightness. We provide details about each of the analysis steps involved, including compilation of the training and testing sets, construction of descriptive image-based and contextual features, and optimization of the feature subset and model tuning parameters. Using a validation set of nearly 30 000 objects from the Palomar Transient Factory, we demonstrate a missed detection rate of at most 7.7 per cent at our chosen false-positive rate of 1 per cent for an optimized ML classifier of 23 features, selected to avoid feature correlation and overfitting from an initial library of 42 attributes. Importantly, we show that our classification methodology is insensitive to mislabelled training data up to a contamination of nearly 10 per cent, making it easier to compile sufficient training sets for accurate performance in future surveys. This ML framework, if so adopted, should enable the maximization of scientific gain from future synoptic survey and enable fast follow-up decisions on the vast amounts of streaming data produced by such experiments.
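    The 1 per cent false-positive operating point described above can be illustrated as follows. The synthetic features and classifier settings are stand-ins, not the paper's 23-feature model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve

# Synthetic stand-in for real/bogus detections with 23 image-based and
# contextual features (the real feature definitions are in the paper).
X, y = make_classification(n_samples=2000, n_features=23, n_informative=10,
                           random_state=3)
X_train, y_train = X[:1500], y[:1500]
X_val, y_val = X[1500:], y[1500:]

clf = RandomForestClassifier(n_estimators=200, random_state=3)
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_val)[:, 1]   # probability of being a real source

# Choose the score threshold whose false-positive rate stays at or below 1%.
fpr, tpr, thresholds = roc_curve(y_val, scores)
idx = int(np.searchsorted(fpr, 0.01, side="right")) - 1
threshold = thresholds[idx]
missed_rate = 1.0 - tpr[idx]              # missed-detection rate at that point
```

The paper's headline numbers (7.7 per cent missed detections at 1 per cent false positives) are exactly this kind of operating-point read-off from a validation ROC curve.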

  11. Machine Learning to Assess Grassland Productivity in Southeastern Arizona

    NASA Astrophysics Data System (ADS)

    Ponce-Campos, G. E.; Heilman, P.; Armendariz, G.; Moser, E.; Archer, V.; Vaughan, R.

    2015-12-01

    We present preliminary results of machine learning (ML) techniques modeling the combined effects of climate, management, and inherent potential on the productivity of grazed semi-arid grasslands in southeastern Arizona. Our goal is to help public land managers determine whether agency management policies are meeting objectives and where to focus attention. Monitoring in the field is becoming increasingly limited in space and time. Remotely sensed data cover entire allotments and extend back in time, but do not capture the key issue of species composition. By estimating expected vegetative production as a function of site potential and climatic inputs, management skill can be assessed through time, across individual allotments, and between allotments. Here we present the use of Random Forest (RF) as the main ML technique, in this case for the purpose of regression. Our response variable is the maximum annual NDVI, a surrogate for grassland productivity, as generated by the Google Earth Engine cloud computing platform based on Landsat 5, 7, and 8 datasets. PRISM 33-year normal precipitation (1980-2013) was resampled to the Landsat scale. In addition, the GRIDMET climate dataset was the source for the calculation of the annual SPEI (Standardized Precipitation Evapotranspiration Index), a drought index. We also included information about landscape position, aspect, streams, ponds, roads and fire disturbances as part of the modeling process. Our results show that, in terms of variable importance, the 33-year normal precipitation and SPEI are the most important features affecting grassland productivity within the study area. The RF approach was compared to a linear regression model with the same variables. The linear model resulted in an r2 = 0.41, whereas RF showed a significant improvement with an r2 = 0.79. We continue refining the model by comparison with aerial photography and by including grazing intensity and infrastructure from units/allotments to assess the
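    The RF-versus-linear-regression comparison can be sketched on synthetic data. The predictors, the nonlinear response, and the resulting r2 values below are illustrative, not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)

# Hypothetical per-pixel predictors: normal precipitation (mm), annual SPEI,
# and a landscape-position index; peak annual NDVI responds to moisture in a
# nonlinear, threshold-like way.
n = 1500
precip = rng.uniform(200, 600, n)
spei = rng.normal(0, 1, n)
slope_pos = rng.uniform(0, 1, n)
ndvi = (0.2 + 0.3 * np.tanh((precip - 380) / 40)
        + 0.05 * spei + 0.03 * slope_pos + rng.normal(0, 0.03, n))

X = np.column_stack([precip, spei, slope_pos])
X_train, X_test = X[:1000], X[1000:]
y_train, y_test = ndvi[:1000], ndvi[1000:]

r2_lin = r2_score(y_test, LinearRegression().fit(X_train, y_train).predict(X_test))
rf = RandomForestRegressor(n_estimators=200, random_state=5).fit(X_train, y_train)
r2_rf = r2_score(y_test, rf.predict(X_test))
```

Because the response saturates with moisture, the linear model underfits while the forest captures the threshold, mirroring the kind of gap (0.41 vs 0.79) reported above.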

  12. Machine Learning Helps Identify CHRONO as a Circadian Clock Component

    PubMed Central

    Venkataraman, Anand; Ramanathan, Chidambaram; Kavakli, Ibrahim H.; Hughes, Michael E.; Baggs, Julie E.; Growe, Jacqueline; Liu, Andrew C.; Kim, Junhyong; Hogenesch, John B.

    2014-01-01

    Over the last decades, researchers have characterized a set of “clock genes” that drive daily rhythms in physiology and behavior. This arduous work has yielded results with far-reaching consequences in metabolic, psychiatric, and neoplastic disorders. Recent attempts to expand our understanding of circadian regulation have moved beyond the mutagenesis screens that identified the first clock components, employing higher throughput genomic and proteomic techniques. In order to further accelerate clock gene discovery, we utilized a computer-assisted approach to identify and prioritize candidate clock components. We used a simple form of probabilistic machine learning to integrate biologically relevant, genome-scale data and ranked genes on their similarity to known clock components. We then used a secondary experimental screen to characterize the top candidates. We found that several physically interact with known clock components in a mammalian two-hybrid screen and modulate in vitro cellular rhythms in an immortalized mouse fibroblast line (NIH 3T3). One candidate, Gene Model 129, interacts with BMAL1 and functionally represses the key driver of molecular rhythms, the BMAL1/CLOCK transcriptional complex. Given these results, we have renamed the gene CHRONO (computationally highlighted repressor of the network oscillator). Bi-molecular fluorescence complementation and co-immunoprecipitation demonstrate that CHRONO represses by abrogating the binding of BMAL1 to its transcriptional co-activator CBP. Most importantly, CHRONO knockout mice display a prolonged free-running circadian period similar to, or more drastic than, six other clock components. We conclude that CHRONO is a functional clock component providing a new layer of control on circadian molecular dynamics. PMID:24737000

  13. Machine Learning in Ionospheric Phenomena Detection Using Passive Radar

    NASA Astrophysics Data System (ADS)

    Pankratius, V.; Barari, S.; Lind, F. D.

    2015-12-01

    This work describes an approach to automate ionospheric feature detection in passive radar data using a tunable pipeline of Python-implemented algorithms for detection and classification. In particular, our detector is tuned to capture E-region irregularities and various other events such as meteors, aircraft, and ambiguities that result from poor transmission of signals or noise interference. The detection stage applies to passive radar images with pixels normalized to a defined value range. To separate the background, we apply a thresholding value and an area cutoff to keep regions with connected pixels of a minimum size; for each particular image, these parameters can be determined algorithmically in two ways through our ExplainedEntropy (EE) and MaximumRegionArea (MRA) techniques. EE identifies the smallest set of regions that explain the most entropy of the image. MRA sets the area threshold to be a function of the largest region size. The classification stage picks up on these detected areas and applies neural networks and random forests to the image feature space. This way we are able to categorize images based on their scientific content and make them searchable for scientists. A training set of real radar images was available to evaluate our approach and its adaptivity. Based on these labeled real images, we also evaluated the robustness of the detection with an enhanced set of perturbed images that were generated through a model-based simulator. The simulator also allowed for controlled experiments in the amount of perturbation and noise added, to precisely characterize the operation ranges of our machine learning algorithms. We will discuss the performance of the algorithms and potential scientific applications. Acknowledgements. We would like to acknowledge support from the NSF ACI-1442997 (PI V. Pankratius).
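    The thresholding and area-cutoff stage can be sketched with SciPy's connected-component labelling. The image, threshold, and cutoff values below are invented for illustration and are not the EE/MRA-derived parameters.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(9)

# Normalized "radar image": background noise plus two bright regions,
# one large (a real irregularity) and one tiny speck to be rejected.
img = rng.uniform(0.0, 0.3, size=(100, 100))
img[40:60, 20:50] += 0.6   # large event, 20 x 30 pixels
img[10:12, 80:82] += 0.6   # 2 x 2 speck

binary = img > 0.5                       # intensity threshold
labels, n_regions = ndimage.label(binary)
sizes = ndimage.sum(binary, labels, index=np.arange(1, n_regions + 1))

area_cutoff = 50                         # minimum connected-pixel count
kept = [i + 1 for i, s in enumerate(sizes) if s >= area_cutoff]
```

The kept region labels would then be handed to the classification stage; EE or MRA would replace the hand-picked `area_cutoff` with a per-image value.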

  14. Machine learning helps identify CHRONO as a circadian clock component.

    PubMed

    Anafi, Ron C; Lee, Yool; Sato, Trey K; Venkataraman, Anand; Ramanathan, Chidambaram; Kavakli, Ibrahim H; Hughes, Michael E; Baggs, Julie E; Growe, Jacqueline; Liu, Andrew C; Kim, Junhyong; Hogenesch, John B

    2014-04-01

    Over the last decades, researchers have characterized a set of "clock genes" that drive daily rhythms in physiology and behavior. This arduous work has yielded results with far-reaching consequences in metabolic, psychiatric, and neoplastic disorders. Recent attempts to expand our understanding of circadian regulation have moved beyond the mutagenesis screens that identified the first clock components, employing higher throughput genomic and proteomic techniques. In order to further accelerate clock gene discovery, we utilized a computer-assisted approach to identify and prioritize candidate clock components. We used a simple form of probabilistic machine learning to integrate biologically relevant, genome-scale data and ranked genes on their similarity to known clock components. We then used a secondary experimental screen to characterize the top candidates. We found that several physically interact with known clock components in a mammalian two-hybrid screen and modulate in vitro cellular rhythms in an immortalized mouse fibroblast line (NIH 3T3). One candidate, Gene Model 129, interacts with BMAL1 and functionally represses the key driver of molecular rhythms, the BMAL1/CLOCK transcriptional complex. Given these results, we have renamed the gene CHRONO (computationally highlighted repressor of the network oscillator). Bi-molecular fluorescence complementation and co-immunoprecipitation demonstrate that CHRONO represses by abrogating the binding of BMAL1 to its transcriptional co-activator CBP. Most importantly, CHRONO knockout mice display a prolonged free-running circadian period similar to, or more drastic than, six other clock components. We conclude that CHRONO is a functional clock component providing a new layer of control on circadian molecular dynamics. PMID:24737000

  15. Relevance vector machine learning for neonate pain intensity assessment using digital imaging.

    PubMed

    Gholami, Behnood; Haddad, Wassim M; Tannenbaum, Allen R

    2010-06-01

    Pain assessment in patients who are unable to verbally communicate is a challenging problem. The fundamental limitations in pain assessment in neonates stem from subjective assessment criteria, rather than quantifiable and measurable data. This often results in poor quality and inconsistent treatment of patient pain management. Recent advancements in pattern recognition techniques using relevance vector machine (RVM) learning techniques can assist medical staff in assessing pain by constantly monitoring the patient and providing the clinician with quantifiable data for pain management. The RVM classification technique is a Bayesian extension of the support vector machine (SVM) algorithm, which achieves comparable performance to SVM while providing posterior probabilities for class memberships and a sparser model. If classes represent "pure" facial expressions (i.e., extreme expressions that an observer can identify with a high degree of confidence), then the posterior probability of the membership of some intermediate facial expression to a class can provide an estimate of the intensity of such an expression. In this paper, we use the RVM classification technique to distinguish pain from nonpain in neonates as well as assess their pain intensity levels. We also correlate our results with the pain intensity assessed by expert and nonexpert human examiners. PMID:20172803

  16. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction

    PubMed Central

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in particular. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children’s social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a “mental model” of the robot, tailoring the tutoring to the robot’s performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot’s bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  17. Why Robots Should Be Social: Enhancing Machine Learning through Social Human-Robot Interaction.

    PubMed

    de Greeff, Joachim; Belpaeme, Tony

    2015-01-01

    Social learning is a powerful method for cultural propagation of knowledge and skills relying on a complex interplay of learning strategies, social ecology and the human propensity for both learning and tutoring. Social learning has the potential to be an equally potent learning strategy for artificial systems and robots in particular. However, given the complexity and unstructured nature of social learning, implementing social machine learning proves to be a challenging problem. We study one particular aspect of social machine learning: that of offering social cues during the learning interaction. Specifically, we study whether people are sensitive to social cues offered by a learning robot, in a similar way to children's social bids for tutoring. We use a child-like social robot and a task in which the robot has to learn the meaning of words. For this a simple turn-based interaction is used, based on language games. Two conditions are tested: one in which the robot uses social means to invite a human teacher to provide information based on what the robot requires to fill gaps in its knowledge (i.e. expression of a learning preference); the other in which the robot does not provide social cues to communicate a learning preference. We observe that conveying a learning preference through the use of social cues results in better and faster learning by the robot. People also seem to form a "mental model" of the robot, tailoring the tutoring to the robot's performance as opposed to using simply random teaching. In addition, the social learning shows a clear gender effect with female participants being responsive to the robot's bids, while male teachers appear to be less receptive. This work shows how additional social cues in social machine learning can result in people offering better quality learning input to artificial systems, resulting in improved learning performance. PMID:26422143

  18. Classification of BMI control commands from rat's neural signals using extreme learning machine.

    PubMed

    Lee, Youngbum; Lee, Hyunjoo; Kim, Jinkwon; Shin, Hyung-Cheul; Lee, Myoungho

    2009-01-01

    A recently developed machine learning algorithm referred to as Extreme Learning Machine (ELM) was used to classify machine control commands out of time series of spike trains of ensembles of CA1 hippocampus neurons (n = 34) of a rat, which was performing a target-to-goal task on a two-dimensional space through a brain-machine interface system. Performance of ELM was analyzed in terms of training time and classification accuracy. The results showed that some processes such as class code prefix, redundancy code suffix and smoothing effect of the classifiers' outputs could improve the accuracy of classification of robot control commands for a brain-machine interface system. PMID:19860924

  19. The use of machine learning and nonlinear statistical tools for ADME prediction.

    PubMed

    Sakiyama, Yojiro

    2009-02-01

    Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning as well as nonlinear statistical tools has been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it would be a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D-visualisation. We applied six new machine learning methods to four different data sets. The methods include Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machine displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future. PMID:19239395

  20. On-line Machine Learning and Event Detection in Petascale Data Streams

    NASA Astrophysics Data System (ADS)

    Thompson, David R.; Wagstaff, K. L.

    2012-01-01

    Traditional statistical data mining involves off-line analysis in which all data are available and equally accessible. However, petascale datasets have challenged this premise since it is often impossible to store, let alone analyze, the relevant observations. This has led the machine learning community to investigate adaptive processing chains where data mining is a continuous process. Here pattern recognition permits triage and followup decisions at multiple stages of a processing pipeline. Such techniques can also benefit new astronomical instruments such as the Large Synoptic Survey Telescope (LSST) and Square Kilometre Array (SKA) that will generate petascale data volumes. We summarize some machine learning perspectives on real time data mining, with representative cases of astronomical applications and event detection in high volume datastreams. The first is a "supervised classification" approach currently used for transient event detection at the Very Long Baseline Array (VLBA). It injects known signals of interest - faint single-pulse anomalies - and tunes system parameters to recover these events. This permits meaningful event detection for diverse instrument configurations and observing conditions whose noise cannot be well-characterized in advance. Second, "semi-supervised novelty detection" finds novel events based on statistical deviations from previous patterns. It detects outlier signals of interest while considering known examples of false alarm interference. Applied to data from the Parkes pulsar survey, the approach identifies anomalous "peryton" phenomena that do not match previous event models. Finally, we consider online light curve classification that can trigger adaptive followup measurements of candidate events. Classifier performance analyses suggest optimal survey strategies, and permit principled followup decisions from incomplete data. These examples trace a broad range of algorithm possibilities available for online astronomical data
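    A toy sketch of online event detection in a stream, using a running z-score rather than the semi-supervised method described above; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)

class RunningNoveltyDetector:
    """Flags samples far from the running mean of the stream seen so far."""
    def __init__(self, k=5.0, warmup=30):
        self.k, self.warmup = k, warmup
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        # Welford's online mean/variance update (single pass, O(1) memory).
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def is_novel(self, x):
        if self.n < self.warmup:
            return False
        std = (self.m2 / (self.n - 1)) ** 0.5
        return abs(x - self.mean) > self.k * std

det = RunningNoveltyDetector()
stream = rng.normal(0.0, 1.0, 1000).tolist()
stream[700] = 12.0                 # injected anomalous pulse
flags = []
for i, x in enumerate(stream):
    if det.is_novel(x):
        flags.append(i)            # flag for follow-up, don't absorb it
    else:
        det.update(x)
```

Refusing to absorb flagged samples into the background model is what keeps a single bright event from inflating the noise estimate, a concern the petascale pipelines above share.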

  1. Investigating machine learning techniques for MRI-based classification of brain neoplasms

    PubMed Central

    Kanas, Vasileios G.; Davatzikos, Christos

    2015-01-01

    Purpose Diagnosis and characterization of brain neoplasms is of utmost importance for therapeutic management. The emergence of imaging techniques, such as Magnetic Resonance (MR) imaging, gives insight into pathology, while the combination of several sequences from conventional and advanced protocols (such as perfusion imaging) increases the diagnostic information. To optimally combine the multiple sources and summarize the information into a distinctive set of variables however remains difficult. The purpose of this study is to investigate machine learning algorithms that automatically identify the relevant attributes and are optimal for brain tumor differentiation. Methods Different machine learning techniques are studied for brain tumor classification based on attributes extracted from conventional and perfusion MRI. The attributes, calculated from neoplastic, necrotic, and edematous regions of interest, include shape and intensity characteristics. Attribute subset selection is performed to remove redundant attributes using two filtering methods and a wrapper approach, in combination with three different search algorithms (Best First, Greedy Stepwise and Scatter). The classification frameworks are implemented using the WEKA software. Results The highest average classification accuracy assessed by leave-one-out (LOO) cross-validation on 101 brain neoplasms was achieved using the wrapper evaluator in combination with the Best First search algorithm and the KNN classifier and reached 96.9% when discriminating metastases from gliomas and 94.5% when discriminating high-grade from low-grade neoplasms. Conclusions A computer-assisted classification framework is developed and used for differential diagnosis of brain neoplasms based on MRI. The framework can achieve higher accuracy than most reported studies using MRI. PMID:21516321

  2. Environmental Monitoring Networks Optimization Using Advanced Active Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail; Volpi, Michele; Copa, Loris

    2010-05-01

    The problem of environmental monitoring networks optimization (MNO) is one of the basic and fundamental tasks in spatio-temporal data collection, analysis, and modeling. There are several approaches to this problem, which can be considered as a design or redesign of a monitoring network by applying some optimization criteria. The most developed and widespread methods are based on geostatistics (the family of kriging models, conditional stochastic simulations). In geostatistics the variance is mainly used as an optimization criterion, which has both advantages and drawbacks. In the present research we study the application of advanced techniques from statistical learning theory (SLT), namely support vector machines (SVM), and consider the optimization of monitoring networks for a classification problem (data are discrete values/classes: hydrogeological units, soil types, pollution decision levels, etc.). SVM is a universal nonlinear modeling tool for classification problems in high dimensional spaces. The SVM solution maximizes the decision boundary between classes and has good generalization properties for noisy data. The sparse solution of SVM is based on support vectors - data which contribute to the solution with nonzero weights. Fundamentally, MNO for classification problems can be considered as a task of selecting new measurement points which increase the quality of spatial classification and reduce the testing error (the error on new independent measurements). In SLT this is a typical problem of active learning - the selection of new unlabelled points which efficiently reduce the testing error. A classical approach to active learning (margin sampling) is to sample the points closest to the classification boundary. This solution is suboptimal when points (or generally the dataset) are redundant for the same class. 
In the present research we propose and study two new advanced methods of active learning adapted to the solution of
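The classical margin-sampling baseline mentioned in the abstract can be sketched as follows. The data are synthetic 2-D points; the paper's two new active learning methods are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
# Seed the labeled set with a few points from each class.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):  # query 20 new measurement points
    clf = SVC(kernel="rbf").fit(X[labeled], y[labeled])
    # Margin sampling: query the pool point closest to the decision boundary.
    margins = np.abs(clf.decision_function(X[pool]))
    labeled.append(pool.pop(int(np.argmin(margins))))

print("labeled set grew to", len(labeled), "points")
```

Each queried point is the one the current SVM is least certain about, which is exactly why redundant points from a single class can make this strategy suboptimal.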

  3. Machine Shop I. Learning Activity Packets (LAPs). Section A--Orientation.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This document contains two learning activity packets (LAPs) for the "orientation and safety" instructional area of a Machine Shop I course. The two LAPs cover the following topics: orientation and general shop safety. Each LAP contains a cover sheet that describes its purpose, an introduction, and the tasks included in the LAP; learning steps…

  4. Machine Shop I. Learning Activity Packets (LAPs). Section C--Hand and Bench Work.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This document contains two learning activity packets (LAPs) for the "hand and bench work" instructional area of a Machine Shop I course. The two LAPs cover the following topics: hand and bench work and pedestal grinder. Each LAP contains a cover sheet that describes its purpose, an introduction, and the tasks included in the LAP; learning steps…

  5. Collaborative Learning: Cognitive and Computational Approaches. Advances in Learning and Instruction Series.

    ERIC Educational Resources Information Center

    Dillenbourg, Pierre, Ed.

    Intended to illustrate the benefits of collaboration between scientists from psychology and computer science, namely machine learning, this book contains the following chapters, most of which are co-authored by scholars from both sides: (1) "Introduction: What Do You Mean by 'Collaborative Learning'?" (Pierre Dillenbourg); (2) "Learning Together:…

  6. Cross-person activity recognition using reduced kernel extreme learning machine.

    PubMed

    Deng, Wan-Yu; Zheng, Qing-Hua; Wang, Zhong-Min

    2014-05-01

    Activity recognition based on mobile embedded accelerometers is very important for developing human-centric pervasive applications such as healthcare and personalized recommendation. However, the distribution of accelerometer data is heavily affected by varying users; performance degrades when a model trained on one person is used for others. To solve this problem, we propose a fast and accurate cross-person activity recognition model, known as TransRKELM (Transfer learning Reduced Kernel Extreme Learning Machine), which uses RKELM (Reduced Kernel Extreme Learning Machine) to build the initial activity recognition model. In the online phase, OS-RKELM (Online Sequential Reduced Kernel Extreme Learning Machine) is applied to efficiently update the initial model and adapt it to new device users based on recognition results with a high confidence level. Experimental results show that the proposed model can adapt the classifier to new device users quickly and obtain good recognition performance. PMID:24513850
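A reduced kernel ELM can be approximated as kernel features computed against a random subset of the training points, with a ridge-regularized linear readout on one-hot labels. The sketch below is a generic illustration of that idea on synthetic data, not the paper's TransRKELM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
rng = np.random.default_rng(0)

m = 30                                   # "reduced": m << n training samples
centers = X[rng.choice(len(X), size=m, replace=False)]
K = rbf_kernel(X, centers)               # (n, m) kernel feature matrix
T = np.eye(2)[y]                         # one-hot targets
C = 1.0                                  # regularization strength
# Closed-form ridge solution for the output weights (no iterative training).
beta = np.linalg.solve(K.T @ K + np.eye(m) / C, K.T @ T)

pred = np.argmax(rbf_kernel(X, centers) @ beta, axis=1)
print("training accuracy:", (pred == y).mean())
```

The closed-form solve is what makes (R)KELM fast compared with iteratively trained networks; the reduced kernel keeps the linear system at size m rather than n.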

  7. Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems.

    PubMed

    Kuusisto, Finn; Dutra, Inês; Elezaby, Mai; Mendonça, Eneida A; Shavlik, Jude; Burnside, Elizabeth S

    2015-01-01

    While the use of machine learning methods in clinical decision support has great potential for improving patient care, acquiring standardized, complete, and sufficient training data presents a major challenge for methods relying exclusively on machine learning techniques. Domain experts possess knowledge that can address these challenges and guide model development. We present Advice-Based-Learning (ABLe), a framework for incorporating expert clinical knowledge into machine learning models, and show results for an example task: estimating the probability of malignancy following a non-definitive breast core needle biopsy. By applying ABLe to this task, we demonstrate a statistically significant improvement in specificity (24.0% with p=0.004) without missing a single malignancy. PMID:26306246

  8. Machine learning applications in proteomics research: how the past can boost the future.

    PubMed

    Kelchtermans, Pieter; Bittremieux, Wout; De Grave, Kurt; Degroeve, Sven; Ramon, Jan; Laukens, Kris; Valkenborg, Dirk; Barsnes, Harald; Martens, Lennart

    2014-03-01

    Machine learning is a subdiscipline of artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis. PMID:24323524

  9. Uncertainty "escalation" and use of machine learning to forecast residual and data model uncertainties

    NASA Astrophysics Data System (ADS)

    Solomatine, Dimitri

    2016-04-01

    When speaking about model uncertainty, many authors implicitly assume data uncertainty (mainly in parameters or inputs), which is probabilistically described by distributions. Often, however, it is useful to look into the residual uncertainty as well. It is hence reasonable to classify the main approaches to uncertainty analysis with respect to the two main types of model uncertainty that can be distinguished: A. The residual uncertainty of models. In this case the model parameters and/or model inputs are considered to be fixed (deterministic), i.e. the model is considered to be optimal (calibrated) and deterministic. Model error is considered as the manifestation of uncertainty. If there is enough past data about the model errors (i.e. its uncertainty), it is possible to build a statistical or machine learning model of uncertainty trained on this data. The following methods can be mentioned: (a) the quantile regression (QR) method by Koenker and Bassett, in which linear regression is used to build predictive models for distribution quantiles [1]; (b) a more recent approach that takes into account the input variables influencing such uncertainty and uses more advanced (non-linear) machine learning methods (neural networks, model trees, etc.) - the UNEEC method [2,3,7]; (c) the even more recent DUBRAUE method (Dynamic Uncertainty Model By Regression on Absolute Error), an autoregressive model of model residuals (it corrects the model residual first and then carries out the uncertainty prediction by an autoregressive statistical model) [5]. B. The data uncertainty (parametric and/or input) - in this case we study the propagation of uncertainty (presented typically probabilistically) from parameters or inputs to the model outputs. In the case of simple functions representing models, analytical approaches can be used, or approximation methods (e.g., the first-order second moment method). However, for real complex non-linear models implemented in software there is no other choice except using
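Approach (a), quantile regression for residual uncertainty, can be sketched as follows: one model per quantile yields a predictive band whose width reflects the residual uncertainty. The data are synthetic and heteroscedastic; this is a generic illustration, not the UNEEC or DUBRAUE implementations.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(500, 1))
# Heteroscedastic noise: uncertainty grows with x.
y = np.sin(x[:, 0]) + rng.normal(0, 0.2 + 0.05 * x[:, 0])

# One regressor per quantile yields a residual-uncertainty band.
lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(x, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(x, y)

x_new = np.linspace(0, 10, 50).reshape(-1, 1)
band = hi.predict(x_new) - lo.predict(x_new)
print("mean width of the 90% band:", band.mean().round(3))
```

Because the quantile models take the inputs as predictors, the band can widen or narrow with the input, capturing input-dependent uncertainty in the spirit of the non-linear methods described above.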

  10. Integrating data sources to improve hydraulic head predictions : a hierarchical machine learning approach.

    SciTech Connect

    Michael, W. J.; Minsker, B. S.; Tcheng, D.; Valocchi, A. J.; Quinn, J. J.; Environmental Assessment; Univ. of Illinois

    2005-03-26

    This study investigates how machine learning methods can be used to improve hydraulic head predictions by integrating different types of data, including data from numerical models, in a hierarchical approach. A suite of four machine learning methods (decision trees, instance-based weighting, inverse distance weighting, and neural networks) are tested in several hierarchical configurations with different types of data from the 317/319 area at Argonne National Laboratory-East. The best machine learning model had a mean predicted head error 50% smaller than an existing MODFLOW numerical flow model, and a standard deviation of predicted head error 67% lower than the MODFLOW model, computed across all sampled locations used for calibrating the MODFLOW model. These predictions were obtained using decision trees trained with all historical quarterly data; the hourly head measurements were not as useful for prediction, most likely because of their poor spatial coverage. The results show promise for using hierarchical machine learning approaches to improve predictions and to identify the most essential types of data to guide future sampling efforts. Decision trees were also combined with an existing MODFLOW model to test their capabilities for updating numerical models to improve predictions as new data are collected. The combined model had a mean error 50% lower than the MODFLOW model alone. These results demonstrate that hierarchical machine learning approaches can be used to improve predictive performance of existing numerical models in areas with good data coverage. Further research is needed to compare this approach with methods such as Kalman filtering.
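The idea of combining a calibrated physics model with a machine-learned corrector can be sketched in a few lines: a decision tree is trained on the numerical model's residuals and its prediction is added back to the model output. Everything below is a toy stand-in (the "numerical model" is a simple biased function, not MODFLOW).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(300, 2))            # x, y sample locations
true_head = 10 + 3 * coords[:, 0] + np.sin(4 * coords[:, 1])

def numerical_model(c):
    # Stand-in for a calibrated flow model with a systematic bias.
    return 10 + 3 * c[:, 0]

# Train the tree on the physics model's residuals at sampled locations.
residuals = true_head - numerical_model(coords)
corrector = DecisionTreeRegressor(max_depth=5).fit(coords, residuals)

# Combined prediction = physics model + learned correction.
combined = numerical_model(coords) + corrector.predict(coords)
mae_model = np.abs(true_head - numerical_model(coords)).mean()
mae_combined = np.abs(true_head - combined).mean()
print(f"MAE model alone: {mae_model:.3f}, combined: {mae_combined:.3f}")
```

As in the study, the combined model can only improve matters where data coverage is good enough for the tree to learn the residual structure.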

  11. Application of Machine Learning to Proteomics Data: Classification and Biomarker Identification in Postgenomics Biology

    PubMed Central

    Swan, Anna Louise; Mobasheri, Ali; Allaway, David; Liddell, Susan

    2013-01-01

    Abstract Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high-throughput abilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways: first, directly to mass spectral peaks, and second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also includes the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes. PMID:24116388

  12. Automatic Quality Inspection of Percussion Cap Mass Production by Means of 3D Machine Vision and Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Tellaeche, A.; Arana, R.; Ibarguren, A.; Martínez-Otzeta, J. M.

    Exhaustive quality control is becoming very important in the world's globalized market. One example where quality control becomes critical is percussion cap mass production. These elements must achieve a minimum tolerance deviation in their fabrication. This paper outlines a machine vision development using a 3D camera for the inspection of the whole production of percussion caps. This system presents multiple problems, such as metallic reflections in the percussion caps, high-speed movement of the system, and mechanical errors and irregularities in percussion cap placement. Because of these problems, the task cannot be solved by traditional image processing methods, and hence machine learning algorithms have been tested to provide a feasible classification of the possible errors present in the percussion caps.

  13. Gaussian Process Regression as a machine learning tool for predicting organic carbon from soil spectra - a machine learning comparison study

    NASA Astrophysics Data System (ADS)

    Schmidt, Andreas; Lausch, Angela; Vogel, Hans-Jörg

    2016-04-01

    Diffuse reflectance spectroscopy as a soil analytical tool is spreading more and more. Possible applications range from the point scale (e.g. simple soil samples, drill cores, vertical profile scans) through the field scale to the regional and even global scale (UAV, airborne and spaceborne instruments, soil reflectance databases). The basic idea is that the soil's reflectance spectrum holds information about its properties (like organic matter content or mineral composition). The relation between soil properties and the observable spectrum is usually not exactly known and is typically derived with statistical methods. Nowadays these methods are classified under the term machine learning, which comprises a vast pool of algorithms for learning the relationship between pairs of input-output data (the training data set). Within this pool of methods, Gaussian Process Regression (GPR) is a newly emerging method (originating from Bayesian statistics) which is increasingly applied in different fields. For example, it was successfully used to predict vegetation parameters from hyperspectral remote sensing data. In this study we apply GPR to predict soil organic carbon from soil spectroscopy data (400 - 2500 nm). We compare it to more traditional and widely used methods such as Partial Least Squares Regression (PLSR), Random Forest (RF) and Gradient Boosted Regression Trees (GBRT). All these methods share the ability to calculate a measure of variable (wavelength) importance. The main advantage of GPR is its ability to also predict the variance of the target parameter, which makes it easy to see whether a prediction is reliable or not. The ability to choose from various covariance functions makes GPR a flexible method, allowing different assumptions or a priori knowledge about the data to be included. For this study we use samples from three different locations to test the prediction accuracies. 
One
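GPR's key advantage noted above, a predictive standard deviation alongside the mean, can be shown in a few lines with scikit-learn. The "spectral" features and targets below are synthetic stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(80, 5))               # stand-in spectral features
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(0, 0.1, size=80)

# The covariance function (kernel) encodes assumptions about the data.
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), random_state=0)
gpr.fit(X, y)
mean, std = gpr.predict(X[:5], return_std=True)
print("predictions:", mean.round(2))
print("predictive std (reliability measure):", std.round(3))
```

A large predictive std flags samples whose spectra lie far from the training data, which is exactly the reliability signal the abstract highlights.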

  14. Demarcating Advanced Learning Approaches from Methodological and Technological Perspectives

    ERIC Educational Resources Information Center

    Horvath, Imre; Peck, David; Verlinden, Jouke

    2009-01-01

    In the field of design and engineering education, the fast and expansive evolution of information and communication technologies is steadily converting traditional learning approaches into more advanced ones. Facilitated by Broadband (high bandwidth) personal computers, distance learning has developed into web-hosted electronic learning. The…

  15. Lifelong Learning in Artistic Context Mediated by Advanced Technologies

    ERIC Educational Resources Information Center

    Ferrari, Mirella

    2016-01-01

    This research starts by analysing the current state of artistic heritage in Italy and studying some examples in Europe: we try to investigate the scope of non-formal learning in artistic context, mediated by advanced technology. The framework within which we have placed our investigation is that of lifelong learning and lifedeep learning. The…

  16. Collaborative Learning in Advanced Supply Systems: The KLASS Pilot Project.

    ERIC Educational Resources Information Center

    Rhodes, Ed; Carter, Ruth

    2003-01-01

    The Knowledge and Learning in Advanced Supply Systems (KLASS) project developed collaborative learning networks of suppliers in the British automotive and aerospace industries. Methods included face-to-face and distance learning, work toward National Vocational Qualifications, and diagnostic workshops for senior managers on improving quality,…

  17. TEACHING MACHINES AND PROGRAMMED LEARNING, A SOURCE BOOK.

    ERIC Educational Resources Information Center

    LUMSDAINE, A.A., ED.; GLASER, ROBERT, ED.

    BROUGHT TOGETHER HERE IS THE WIDELY-SCATTERED LITERATURE ON SELF-INSTRUCTIONAL PROGRAMS AND DEVICES BY LEADERS, PAST AND PRESENT, IN THEIR DEVELOPMENT. S.L. PRESSEY IN HIS ARTICLES DESCRIBES THE APPARATUS, METHODS, THEORY, AND RESULTS ATTENDANT UPON USE OF HIS TEST-SCORING DEVICES. B.F. SKINNER IN HIS ARTICLES DEVELOPS THEORY, DESCRIBES MACHINES,…

  18. Learning Simple Machines through Cross-Age Collaborations

    ERIC Educational Resources Information Center

    Lancor, Rachael; Schiebel, Amy

    2008-01-01

    In this project, introductory college physics students (noneducation majors) were asked to teach simple machines to a class of second graders. This nontraditional activity proved to be a successful way to encourage college students to think critically about physics and how it applied to their everyday lives. The noneducation majors benefited by…

  19. Slow Dynamics Due to Singularities of Hierarchical Learning Machines

    NASA Astrophysics Data System (ADS)

    Inoue, H. P. M.; Okada, M.

    Recently, slow dynamics in the learning of neural networks has been shown to be closely related to singularities, which exist in the parameter spaces of hierarchical learning models. To show the influence of singular structure on learning dynamics, we take statistical mechanical approaches and investigate online-learning dynamics under various learning scenarios with different relationships between the optimum and the singularities. From this investigation, we found a quasi-plateau phenomenon which differs from the well-known plateau. The quasi-plateau and plateau become extremely serious when an optimal point is in the neighborhood of a singularity. Both disappear in natural gradient learning, which takes singular structures into account and uses a Riemannian measure for the parameter space.

  20. Machine Learning for Power System Disturbance and Cyber-attack Discrimination

    SciTech Connect

    Borges, Raymond Charles; Beaver, Justin M; Buckner, Mark A; Morris, Thomas; Adhikari, Uttam; Pan, Shengyi

    2014-01-01

    Power system disturbances are inherently complex and can be attributed to a wide range of sources, including both natural and man-made events. Currently, power system operators are relied on heavily to make decisions regarding the causes of experienced disturbances and the appropriate course of action in response. In the case of cyber-attacks against a power system, human judgment is less certain, since there is an overt attempt to disguise the attack and deceive the operators as to the true state of the system. To support the human decision maker, we explore the viability of machine learning as a means for discriminating types of power system disturbances, focusing specifically on detecting cyber-attacks where deception is a core tenet of the event. We evaluate various machine learning methods as disturbance discriminators and discuss the practical implications of deploying machine learning systems as an enhancement to existing power system architectures.

  1. Recent progresses in the exploration of machine learning methods as in-silico ADME prediction tools.

    PubMed

    Tao, L; Zhang, P; Qin, C; Chen, S Y; Zhang, C; Chen, Z; Zhu, F; Yang, S Y; Wei, Y Q; Chen, Y Z

    2015-06-23

    In-silico methods have been explored as potential tools for assessing ADME and ADME regulatory properties, particularly in early drug discovery stages. Machine learning methods, with their ability to classify diverse structures and complex mechanisms, are well suited for predicting ADME and ADME regulatory properties. Recent efforts have been directed at broadening application scopes and improving predictive performance, with a particular focus on the coverage of ADME properties and the exploration of more diversified training data, appropriate molecular features, and consensus modeling. Moreover, several online machine learning ADME prediction servers have emerged. Here we review these progresses and discuss the performance, application prospects and challenges of machine learning methods as useful tools for predicting ADME and ADME regulatory properties. PMID:26037068

  2. Cross-platform normalization of microarray and RNA-seq data for machine learning applications

    PubMed Central

    Thompson, Jeffrey A.; Tan, Jie

    2016-01-01

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019
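Quantile normalization, one of the baselines compared with TDM above, can be sketched in plain NumPy: each sample's values are replaced by the mean quantile profile of a reference dataset, rank for rank. The data below are synthetic; this is not the TDM R package.

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(5, 2, size=(10, 100))   # e.g. legacy microarray samples
target = rng.exponential(3, size=(4, 100))     # e.g. new RNA-seq samples

# Reference quantile profile: mean of the sorted values across samples.
ref_profile = np.sort(reference, axis=1).mean(axis=0)

# Replace each target value by the reference quantile of the same rank.
ranks = np.argsort(np.argsort(target, axis=1), axis=1)
normalized = ref_profile[ranks]

print("target mean before/after:",
      target.mean().round(2), normalized.mean().round(2))
```

After this transform every target sample shares the reference distribution exactly, which is why models trained on the legacy platform can be applied to the new data.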

  3. Cross-platform normalization of microarray and RNA-seq data for machine learning applications.

    PubMed

    Thompson, Jeffrey A; Tan, Jie; Greene, Casey S

    2016-01-01

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019

  4. Combining Human and Machine Learning for Morphological Analysis of Galaxy Images

    NASA Astrophysics Data System (ADS)

    Kuminski, Evan; George, Joe; Wallin, John; Shamir, Lior

    2014-10-01

    The increasing importance of digital sky surveys collecting many millions of galaxy images has reinforced the need for robust methods that can perform morphological analysis of large galaxy image databases. Citizen science initiatives such as Galaxy Zoo showed that large data sets of galaxy images can be analyzed effectively by nonscientist volunteers, but since databases generated by robotic telescopes grow much faster than the processing power of any group of citizen scientists, it is clear that computer analysis is required. Here, we propose to use citizen science data to train machine learning systems, and present experimental results demonstrating that such training is effective. Our findings show that the performance of machine learning depends on the quality of the data, which can be improved by using samples that have a high degree of agreement between the citizen scientists. The source code of the method is publicly available.
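The data-quality idea above, keeping only training samples with high volunteer agreement, can be sketched as a simple filter. The vote counts below are entirely hypothetical, not Galaxy Zoo data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
votes_spiral = rng.integers(0, 40, size=n)     # hypothetical vote counts
votes_other = rng.integers(1, 40, size=n)
votes_total = votes_spiral + votes_other

# Fraction of volunteers agreeing with the majority label per galaxy.
agreement = np.maximum(votes_spiral, votes_other) / votes_total
keep = agreement >= 0.8                        # high-agreement training subset
labels = (votes_spiral > votes_other).astype(int)

print(f"kept {keep.sum()} of {n} galaxies with >= 80% vote agreement")
```

A machine learning classifier would then be trained only on the kept galaxies and their majority labels, trading training-set size for label quality.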

  5. Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

    PubMed Central

    2011-01-01

    Background Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is the result of a series of analysis steps, of which the most important are data normalization, gene selection and machine learning. Results In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes, and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross-validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well-performing individual methods and synergies between different methods. Conclusion Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures. PMID:21982277
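The winning combination reported above, a univariate T-test-style gene filter feeding an RBF-kernel SVM with double (nested) cross-validation, can be sketched with scikit-learn on synthetic "expression" data. Parameter grids and sizes are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=80, n_features=500, n_informative=20,
                           random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),      # univariate (T-test-like) filter
    ("svm", SVC(kernel="rbf")),
])
grid = {"select__k": [50, 100, 200], "svm__C": [1, 10]}
inner = GridSearchCV(pipe, grid, cv=3)       # inner loop: method selection
err = 1 - cross_val_score(inner, X, y, cv=5).mean()   # outer loop: error rate
print(f"double-CV error rate: {err:.3f}")
```

Keeping gene selection inside the pipeline (and hence inside both CV loops) is essential; selecting genes on the full data set first would bias the estimated error rate downward.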

  6. Machine-learning to characterise neonatal functional connectivity in the preterm brain

    PubMed Central

    Ball, G.; Aljabar, P.; Arichi, T.; Tusor, N.; Cox, D.; Merchant, N.; Nongena, P.; Hajnal, J.V.; Edwards, A.D.; Counsell, S.J.

    2016-01-01

    Brain development is adversely affected by preterm birth. Magnetic resonance image analysis has revealed a complex fusion of structural alterations across all tissue compartments that are apparent by term-equivalent age, persistent into adolescence and adulthood, and associated with wide-ranging neurodevelopmental disorders. Although functional MRI has revealed the relatively advanced organisational state of the neonatal brain, the full extent and nature of functional disruptions following preterm birth remain unclear. In this study, we apply machine-learning methods to compare whole-brain functional connectivity in preterm infants at term-equivalent age and healthy term-born neonates in order to test the hypothesis that preterm birth results in specific alterations to functional connectivity by term-equivalent age. Functional connectivity networks were estimated in 105 preterm infants and 26 term controls using group-independent component analysis and a graphical lasso model. A random forest-based feature selection method was used to identify discriminative edges within each network and a nonlinear support vector machine was used to classify subjects based on functional connectivity alone. We achieved 80% cross-validated classification accuracy informed by a small set of discriminative edges. These edges connected a number of functional nodes in subcortical and cortical grey matter, and most were stronger in term neonates compared to those born preterm. Half of the discriminative edges connected one or more nodes within the basal ganglia. These results demonstrate that functional connectivity in the preterm brain is significantly altered by term-equivalent age, confirming previous reports of altered connectivity between subcortical structures and higher-level association cortex following preterm birth. PMID:26341027

  7. Machine learning for many-body physics: The case of the Anderson impurity model

    NASA Astrophysics Data System (ADS)

    Arsenault, Louis-François; Lopez-Bezanilla, Alejandro; von Lilienfeld, O. Anatole; Millis, Andrew J.

    2014-10-01

    Machine learning methods are applied to finding the Green's function of the Anderson impurity model, a basic model system of quantum many-body condensed-matter physics. Different methods of parametrizing the Green's function are investigated; a representation in terms of Legendre polynomials is found to be superior due to its limited number of coefficients and its applicability to state of the art methods of solution. The dependence of the errors on the size of the training set is determined. The results indicate that a machine learning approach to dynamical mean-field theory may be feasible.
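The Legendre-polynomial parametrization idea above, compressing a smooth function into a handful of coefficients a learner can predict, can be illustrated with NumPy's Legendre module. The function below is a smooth stand-in, not an actual Anderson-model Green's function.

```python
import numpy as np
from numpy.polynomial import legendre

x = np.linspace(-1, 1, 200)
g = 1.0 / (x**2 + 2.0)        # smooth stand-in for a Green's function

# Least-squares fit of a degree-8 Legendre expansion: 9 coefficients.
coeffs = legendre.legfit(x, g, deg=8)
recon = legendre.legval(x, coeffs)

max_err = np.abs(g - recon).max()
print(f"9 coefficients, max reconstruction error: {max_err:.2e}")
```

For smooth functions the coefficients decay rapidly, which is why a limited number of them suffices as a compact learning target.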

  8. Machine Learning Approaches for High-resolution Urban Land Cover Classification: A Comparative Study

    SciTech Connect

    Vatsavai, Raju; Chandola, Varun; Cheriyadat, Anil M; Bright, Eddie A; Bhaduri, Budhendra L; Graesser, Jordan B

    2011-01-01

    The proliferation of machine learning approaches makes it difficult to identify a suitable classification technique for analyzing high-resolution remote sensing images. In this study, ten classification techniques from five broad machine learning categories were compared. Surprisingly, simple statistical classification schemes like maximum likelihood and logistic regression perform very close to complex and recent techniques. Given that these two classifiers require little input from the user, they should still be considered for most classification tasks. Multiple classifier systems are a good choice if resources permit.

  9. Uncertainty "escalation" and use of machine learning to forecast residual and data model uncertainties

    NASA Astrophysics Data System (ADS)

    Solomatine, Dimitri

    2016-04-01

    When speaking about model uncertainty, many authors implicitly assume data uncertainty (mainly in parameters or inputs), which is described probabilistically by distributions. Often, however, it is useful to look into the residual uncertainty as well. It is hence reasonable to classify the main approaches to uncertainty analysis with respect to the two main types of model uncertainty that can be distinguished: A. The residual uncertainty of models. In this case the model parameters and/or model inputs are considered to be fixed (deterministic), i.e. the model is considered to be optimal (calibrated) and deterministic, and model error is treated as the manifestation of uncertainty. If there is enough past data about the model errors (i.e. their uncertainty), it is possible to build a statistical or machine learning model of uncertainty trained on this data. The following methods can be mentioned: (a) the quantile regression (QR) method of Koenker and Bassett, in which linear regression is used to build predictive models for distribution quantiles [1]; (b) a more recent approach that takes into account the input variables influencing such uncertainty and uses more advanced non-linear machine learning methods (neural networks, model trees, etc.): the UNEEC method [2,3,7]; (c) the still more recent DUBRAUE method (Dynamic Uncertainty Model By Regression on Absolute Error), an autoregressive model of model residuals (it first corrects the model residual and then carries out the uncertainty prediction by an autoregressive statistical model) [5]. B. The data uncertainty (parametric and/or input). In this case we study the propagation of uncertainty (typically represented probabilistically) from parameters or inputs to the model outputs. For simple functions representing models, analytical approaches can be used, or approximation methods (e.g., the first-order second-moment method). However, for real complex non-linear models implemented in software there is no other choice except using
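    The residual-uncertainty idea in approach A can be sketched with quantile regression. Here gradient boosting with a quantile loss (a nonlinear stand-in for the QR/UNEEC family, not the authors' exact method) learns the 5% and 95% quantiles of a synthetic, heteroscedastic signal:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(600, 1))
# heteroscedastic "residuals": noise grows with X
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.05 * X[:, 0])

# one quantile model per bound of the 90% predictive band
lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

inside = np.mean((lo.predict(X) <= y) & (y <= hi.predict(X)))
print(f"fraction of points inside the 90% band: {inside:.2f}")
```

The learned band widens where the noise grows, which is exactly the input-dependent uncertainty that a single constant error bar cannot express.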

  10. Human and machine learning in non-Markovian decision making.

    PubMed

    Clarke, Aaron Michael; Friedrich, Johannes; Tartaglia, Elisa M; Marchesotti, Silvia; Senn, Walter; Herzog, Michael H

    2015-01-01

    Humans can learn under a wide variety of feedback conditions. Reinforcement learning (RL), where a series of rewarded decisions must be made, is a particularly important type of learning. Computational and behavioral studies of RL have focused mainly on Markovian decision processes, where the next state depends on only the current state and action. Little is known about non-Markovian decision making, where the next state depends on more than the current state and action. Learning is non-Markovian, for example, when there is no unique mapping between actions and feedback. We have produced a model based on spiking neurons that can handle these non-Markovian conditions by performing policy gradient descent [1]. Here, we examine the model's performance and compare it with human learning and a Bayes optimal reference, which provides an upper bound on performance. We find that, in all cases, our spiking-neuron population model describes human performance well. PMID:25898139

  11. Human and Machine Learning in Non-Markovian Decision Making

    PubMed Central

    Clarke, Aaron Michael; Friedrich, Johannes; Tartaglia, Elisa M.; Marchesotti, Silvia; Senn, Walter; Herzog, Michael H.

    2015-01-01

    Humans can learn under a wide variety of feedback conditions. Reinforcement learning (RL), where a series of rewarded decisions must be made, is a particularly important type of learning. Computational and behavioral studies of RL have focused mainly on Markovian decision processes, where the next state depends on only the current state and action. Little is known about non-Markovian decision making, where the next state depends on more than the current state and action. Learning is non-Markovian, for example, when there is no unique mapping between actions and feedback. We have produced a model based on spiking neurons that can handle these non-Markovian conditions by performing policy gradient descent [1]. Here, we examine the model's performance and compare it with human learning and a Bayes optimal reference, which provides an upper bound on performance. We find that, in all cases, our spiking-neuron population model describes human performance well. PMID:25898139

  12. DOE FreedomCAR and vehicle technologies program advanced power electronic and electrical machines annual review report

    SciTech Connect

    Olszewski, Mitch

    2006-10-11

    This report is a summary of the Review Panel at the FY06 DOE FreedomCAR and Vehicle Technologies (FCVT) Annual Review of Advanced Power Electronics and Electric Machine (APEEM) research activities held on August 15-17, 2006.

  13. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W. (Dept. of Computer Sciences); Noordewier, M.O. (Dept. of Computer Science)

    1992-01-01

    We are primarily developing a machine learning (ML) system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information, our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, our KBANN algorithm maps inference rules about a given recognition task into a neural network. Neural network training techniques then use the training examples to refine these inference rules. We call these rules a domain theory, following the convention in the machine learning community. We have been applying this approach to several problems in DNA sequence analysis. In addition, we have been extending the capabilities of our learning system along several dimensions. We have also been investigating parallel algorithms that perform sequence alignments in the presence of frameshift errors.

  14. Classification of hydration status using electrocardiogram and machine learning

    NASA Astrophysics Data System (ADS)

    Kaveh, Anthony; Chung, Wayne

    2013-10-01

    The electrocardiogram (ECG) has been used extensively in clinical practice for decades to non-invasively characterize the health of heart tissue; however, these techniques are limited to time-domain features. We propose a machine classification system using support vector machines (SVM) that uses temporal and spectral information to classify health state beyond cardiac arrhythmias. Our method uses single-lead ECG to classify volume depletion (or dehydration) without the lengthy and costly blood analysis tests traditionally used for detecting dehydration status. Our method builds on established clinical ECG criteria for identifying electrolyte imbalances and lends itself to an automated, computationally efficient implementation. The method was tested on the MIT-BIH PhysioNet database to validate this purely computational method for expedient disease-state classification. The results show high sensitivity, supporting use as a cost- and time-effective screening tool.
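    The core idea of combining temporal and spectral features before an SVM can be sketched on synthetic 1-D signals. The two "classes" below differ only in a spectral peak, loosely mimicking a physiology-driven change; none of this is the authors' actual ECG data or feature set:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
t = np.linspace(0, 1, 256, endpoint=False)

def make_signal(freq):
    # noisy sinusoid standing in for a physiological waveform
    return np.sin(2 * np.pi * freq * t) + 0.5 * rng.normal(size=t.size)

signals = [make_signal(5) for _ in range(40)] + [make_signal(9) for _ in range(40)]
labels = np.array([0] * 40 + [1] * 40)

def features(sig):
    spec = np.abs(np.fft.rfft(sig))[:20]          # low-frequency spectral content
    return np.concatenate(([sig.mean(), sig.std()], spec))  # temporal + spectral

X = np.array([features(s) for s in signals])
acc = cross_val_score(SVC(), X, labels, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```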

  15. Reduction of false positives by machine learning for computer-aided detection of colonic polyps

    NASA Astrophysics Data System (ADS)

    Zhao, Xin; Wang, Su; Zhu, Hongbin; Liang, Zhengrong

    2009-02-01

    With the development of computer-aided detection of polyps (CADpolyp), various features have been extracted to detect initial polyp candidates (IPCs). In this paper, three approaches were utilized to reduce the number of false positives (FPs): multiple linear regression (MLR) and two modified machine learning methods, i.e., a neural network (NN) and a support vector machine (SVM), based on their own characteristics and specific learning purposes. Compared to MLR, the two modified machine learning methods are much more sophisticated and well adapted to the data provided. To achieve optimal sensitivity and specificity, raw features were pre-processed by principal component analysis (PCA) in the hope of removing second-order statistical correlation prior to any learning actions. The gain from the use of PCA was evidenced by 26 collected patient studies, which included 32 colonic polyps confirmed by both optical colonoscopy (OC) and virtual colonoscopy (VC). The learning and testing results showed that the two modified machine learning methods reduced the number of FPs by 48.9% (or 7.2 FPs per patient) and 45.3% (or 7.7 FPs per patient) respectively, at 100% detection sensitivity, in comparison with the traditional MLR method. Because more features than necessary are generally stacked as input vectors to machine learning algorithms, dimensionality reduction toward a more compact feature combination, i.e., how to determine the retained dimensionality of the PCA linear transform, was also considered and discussed in this paper. In addition, we proposed a new PCA-scaled data pre-processing method that helps reduce the FPs significantly. Finally, fROC (free-response receiver operating characteristic) curves corresponding to the three FP-reduction approaches were acquired, and a comparative analysis was conducted.
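    The PCA-before-classifier step described above is easy to sketch as a pipeline. The data here are synthetic, with deliberately correlated features standing in for redundant polyp-candidate features; the comparison is illustrative, not the paper's result:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# 40 features, many of them redundant (second-order correlated)
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           n_redundant=20, random_state=0)

plain = make_pipeline(StandardScaler(), SVC())
pca_svm = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())

acc_plain = cross_val_score(plain, X, y, cv=5).mean()
acc_pca = cross_val_score(pca_svm, X, y, cv=5).mean()
print(f"SVM: {acc_plain:.2f}  PCA+SVM: {acc_pca:.2f}")
```

Choosing `n_components` is the "retained dimensionality" question the paper discusses; it is commonly set from the explained-variance spectrum.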

  16. Deep assessment of machine learning techniques using patient treatment in acute abdominal pain in children.

    PubMed

    Blazadonakis, M; Moustakis, V; Charissis, G

    1996-11-01

    Learning from patient records may aid knowledge acquisition and decision making. Existing inductive machine learning (ML) systems such as NewId, CN2, C4.5 and AQ15 learn from past case histories using symbolic and/or numeric values. These systems learn symbolic rules (IF ... THEN-like) which link an antecedent set of clinical factors to a consequent class or decision. This paper compares the learning performance of alternative ML systems with each other and with respect to a novel approach, called LML, that uses logic minimization to learn from data. Patient cases were taken from the archives of the Paediatric Surgery Clinic of the University Hospital of Crete, Heraklion, Greece. Comparison of ML system performance is based both on classification accuracy and on informal expert assessment of the learned knowledge. PMID:8985539

  17. Support-vector-machine tree-based domain knowledge learning toward automated sports video classification

    NASA Astrophysics Data System (ADS)

    Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin

    2010-12-01

    We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of the SVM's binary classification, where each internal node is a single SVM learning unit and each external node represents a classified output type. Such an SVM tree presents a number of advantages, including: 1. low computing cost; 2. integrated learning and classification while preserving each individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models, such as neural networks, AdaBoost, hidden Markov models, and dynamic Bayesian networks, can be added as individual nodes. Experiments show that the proposed SVM tree achieves good performance in sports video classification.
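    A minimal sketch of the binary SVM-tree idea: each internal node is one SVM deciding between two groups of classes, and leaves are final labels. The hand-fixed class grouping and blob data below are purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=400, centers=4, random_state=0)

# Root node: classes {0,1} vs {2,3}; child nodes: 0 vs 1 and 2 vs 3
root = SVC().fit(X, (y >= 2).astype(int))
left = SVC().fit(X[y < 2], y[y < 2])
right = SVC().fit(X[y >= 2], y[y >= 2])

def svm_tree_predict(samples):
    branch = root.predict(samples)          # route each sample down the tree
    out = np.empty(len(samples), dtype=int)
    out[branch == 0] = left.predict(samples[branch == 0])
    out[branch == 1] = right.predict(samples[branch == 1])
    return out

acc = np.mean(svm_tree_predict(X) == y)
print(f"training accuracy of the 3-node SVM tree: {acc:.2f}")
```

In the paper's design any node could be swapped for another learner (AdaBoost, an HMM, etc.) without changing the routing logic.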

  18. A machine learning approach to improve contactless heart rate monitoring using a webcam.

    PubMed

    Monkaresi, Hamed; Calvo, Rafael A; Yan, Hong

    2014-07-01

    Unobtrusive, contactless recordings of physiological signals are very important for many health and human-computer interaction applications. Most current systems require sensors which intrusively touch the user's skin. Recent advances in contact-free physiological signals open the door to many new types of applications. This technology promises to measure heart rate (HR) and respiration using video only. The effectiveness of this technology, its limitations, and ways of overcoming them deserves particular attention. In this paper, we evaluate this technique for measuring HR in a controlled situation, in a naturalistic computer interaction session, and in an exercise situation. For comparison, HR was measured simultaneously using an electrocardiography device during all sessions. The results replicated the published results in controlled situations, but show that they cannot yet be considered as a valid measure of HR in naturalistic human-computer interaction. We propose a machine learning approach to improve the accuracy of HR detection in naturalistic measurements. The results demonstrate that the root mean squared error is reduced from 43.76 to 3.64 beats/min using the proposed method. PMID:25014930

  19. An experimental result of estimating an application volume by machine learning techniques.

    PubMed

    Hasegawa, Tatsuhito; Koshino, Makoto; Kimura, Haruhiko

    2015-01-01

    In this study, we improved the usability of smartphones by automating a user's operations. We developed an intelligent system using machine learning techniques that periodically detects a user's context on a smartphone. We selected the Android operating system because it has the largest market share and highest flexibility of its development environment. In this paper, we describe an application that automatically adjusts application volume. Adjusting the volume can be easily forgotten because users need to push the volume buttons to alter the volume depending on the given situation. Therefore, we developed an application that automatically adjusts the volume based on learned user settings. Application volume can be set differently from ringtone volume on Android devices, and these volume settings are associated with each specific application including games. Our application records a user's location, the volume setting, the foreground application name and other such attributes as learning data, thereby estimating whether the volume should be adjusted using machine learning techniques via Weka. PMID:25713755

  20. Comparison of Two Machine Learning Regression Approaches (Multivariate Relevance Vector Machine and Artificial Neural Network) Coupled with Wavelet Decomposition to Forecast Monthly Streamflow in Peru

    NASA Astrophysics Data System (ADS)

    Ticlavilca, A. M.; Maslova, I.; McKee, M.

    2011-12-01

    This research presents a modeling approach that incorporates wavelet-based analysis techniques used in statistical signal processing and multivariate machine learning regression to forecast monthly streamflow in Peru. Two machine learning regression approaches, Multivariate Relevance Vector Machine and Artificial Neural Network, are compared in terms of performance and robustness. The inputs of the model utilize information of streamflow and Pacific sea surface temperature (SST). The monthly Pacific SST data (from 1950 to 2010) are obtained from the NOAA Climate Prediction Center website. The inputs are decomposed into meaningful components formulated in terms of wavelet multiresolution analysis (MRA). The outputs are the forecasts of streamflow two, four and six months ahead simultaneously. The proposed hybrid modeling approach of wavelet decomposition and machine learning regression can capture sufficient information at meaningful temporal scales to improve the performance of the streamflow forecasts in Peru. A bootstrap analysis is used to explore the robustness of the hybrid modeling approaches.
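    The hybrid scheme can be sketched end to end: decompose each input window into coarse and detail components (a one-level Haar analysis standing in for full wavelet MRA), then feed the components to a machine-learning regressor that forecasts ahead. The seasonal series, lag, and horizon below are invented for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
n = 400
series = np.sin(2 * np.pi * np.arange(n) / 12) + 0.2 * rng.normal(size=n)

def haar_features(window):
    pairs = window.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # coarse component
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # detail component
    return np.concatenate([approx, detail])

lag, horizon = 12, 2                                    # forecast 2 steps ahead
X = np.array([haar_features(series[i:i + lag]) for i in range(n - lag - horizon)])
y = series[lag + horizon:]

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=0).fit(X[:300], y[:300])
rmse = np.sqrt(np.mean((model.predict(X[300:]) - y[300:]) ** 2))
print(f"test RMSE: {rmse:.3f}")
```

The paper's models forecast two, four and six months simultaneously; that would correspond to a multi-output regressor over several horizons rather than the single horizon shown here.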

  1. Solar Flare Predictions Using Time Series of SDO/HMI Observations and Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Ilonidis, Stathis; Bobra, Monica; Couvidat, Sebastien

    2015-08-01

    Solar active regions are dynamic systems that can rapidly evolve in time and produce flare eruptions. The temporal evolution of an active region can provide important information about its potential to produce major flares. In this study, we build a flare forecasting model using supervised machine learning methods and time series of SDO/HMI data for all the flaring regions with magnitude M1.0 or higher that have been observed with HMI and several thousand non-flaring regions. We define and compute hundreds of features that characterize the temporal evolution of physical properties related to the size, non-potentiality, and complexity of the active region, as well as its flaring history, for several days before the flare eruption. Using these features, we implement and test the performance of several machine learning algorithms, including support vector machines, neural networks, decision trees, discriminant analysis, and others. We also apply feature selection algorithms that aim to discard features with low predictive power and improve the performance of the machine learning methods. Our results show that support vector machines provide the best forecasts for the next 24 hours, achieving a True Skill Statistic of 0.923, an accuracy of 0.985, and a Heidke skill score of 0.861, which improve the scores obtained by Bobra and Couvidat (2015). The results of this study contribute to the development of a more reliable and fully automated data-driven flare forecasting system.
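    The True Skill Statistic quoted above is recall minus false-alarm rate, TSS = TP/(TP+FN) - FP/(FP+TN). A small helper, with an invented confusion matrix purely to exercise the formula:

```python
def true_skill_statistic(tp, fn, fp, tn):
    # recall (probability of detection) minus false-alarm rate
    return tp / (tp + fn) - fp / (fp + tn)

# e.g. 90 flares caught, 10 missed, 30 false alarms, 970 correct rejections
tss = true_skill_statistic(tp=90, fn=10, fp=30, tn=970)
print(f"TSS = {tss:.3f}")   # → TSS = 0.870
```

Unlike plain accuracy, TSS is insensitive to the heavy class imbalance between flaring and non-flaring regions, which is why it is the preferred score here.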

  2. Machine learning algorithms for damage detection: Kernel-based approaches

    NASA Astrophysics Data System (ADS)

    Santos, Adam; Figueiredo, Eloi; Silva, M. F. M.; Sales, C. S.; Costa, J. C. W. A.

    2016-02-01

    This paper presents four kernel-based algorithms for damage detection under varying operational and environmental conditions, based respectively on the one-class support vector machine, support vector data description, kernel principal component analysis and greedy kernel principal component analysis. Acceleration time series from an array of accelerometers were obtained from a laboratory structure and used for performance comparison. The main contributions of this study are the applicability of the proposed algorithms for damage detection and the comparison of their classification performance against four other algorithms already considered reliable approaches in the literature. All the proposed algorithms proved to have better classification performance than the previously reported ones.
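    The first of the four algorithms can be sketched directly: a one-class SVM trained only on features from the undamaged condition flags departures from that baseline as damage. The Gaussian feature clouds below are synthetic stand-ins for accelerometer-derived features:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
baseline = rng.normal(0, 1, size=(200, 4))    # undamaged-condition features
damaged = rng.normal(3, 1, size=(50, 4))      # shifted cloud mimicking damage

model = OneClassSVM(nu=0.05, gamma="scale").fit(baseline)
flags = model.predict(damaged)                # -1 = outlier, i.e. flagged as damage
detection_rate = np.mean(flags == -1)
print(f"damage detection rate: {detection_rate:.2f}")
```

The `nu` parameter bounds the fraction of training (undamaged) data allowed outside the boundary, trading false alarms against sensitivity.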

  3. Active extreme learning machines for quad-polarimetric SAR imagery classification

    NASA Astrophysics Data System (ADS)

    Samat, Alim; Gamba, Paolo; Du, Peijun; Luo, Jieqiong

    2015-03-01

    Supervised classification of quad-polarimetric SAR images is often constrained by the availability of reliable training samples. Active learning (AL) provides a unique capability at selecting samples with high representation quality and low redundancy. The most important part of AL is the criterion for selecting the most informative candidates (pixels) by ranking. In this paper, class supports based on the posterior probability function are approximated by ensemble learning and majority voting. This approximation is statistically meaningful when a large enough classifier ensemble is exploited. In this work, we propose to use extreme learning machines and apply AL to quad-polarimetric SAR image classification. Extreme learning machines are ideal because of their fast operation, straightforward solution and strong generalization. As inputs to the so-called active extreme learning machines, both polarimetric and spatial features (morphological profiles) are considered. In order to validate the proposed method, results and performance are compared with random sampling and state-of-the-art AL methods, such as margin sampling, normalized entropy query-by-bagging and multiclass level uncertainty. Experimental results for four quad-polarimetric SAR images collected by RADARSAT-2, AirSAR and EMISAR indicate that the proposed method achieves promising results in different scenarios. Moreover, the proposed method is faster than existing techniques in both the learning and the classification phases.
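    An extreme learning machine fits in a few lines, which is what the abstract means by fast operation and a straightforward solution: a random, untrained hidden layer followed by a closed-form least-squares output layer. The toy regression task and sizes below are illustrative:

```python
import numpy as np

rng = np.random.RandomState(0)

def elm_fit(X, y, n_hidden=100):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                  # closed-form least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy task: learn y = sin(x) from noisy samples
X = np.linspace(-3, 3, 300)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=300)
W, b, beta = elm_fit(X, y)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
print(f"training RMSE: {rmse:.3f}")
```

Because only `beta` is solved for, training cost is a single pseudo-inverse, which is why ELMs suit the repeated retraining that active learning requires.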

  4. Drought monitoring using downscaled soil moisture through machine learning approaches over North and South Korea

    NASA Astrophysics Data System (ADS)

    Park, S.; Im, J.; Rhee, J.; Park, S.

    2015-12-01

    Soil moisture is one of the most important key variables for drought monitoring. It reflects hydrological and agricultural processes because soil moisture is a function of precipitation and energy flux and crop yield is highly related to soil moisture. Many satellites including Advanced Microwave Scanning Radiometer on the Earth Observing System (AMSR-E), Soil Moisture and Ocean Salinity sensor (SMOS), and Soil Moisture Active Passive (SMAP) provide global scale soil moisture products through microwave sensors. However, as the spatial resolution of soil moisture products is typically tens of kilometers, it is difficult to monitor drought using soil moisture at local or regional scale. In this study, AMSR-E and AMSR2 soil moisture were downscaled up to 1 km spatial resolution using Moderate Resolution Imaging Spectroradiometer (MODIS) data—Evapotranspiration, Land Surface Temperature, Leaf Area Index, Normalized Difference Vegetation Index, Enhanced Vegetation Index and Albedo—through machine learning approaches over Korean peninsula. To monitor drought from 2003 to 2014, each pixel of the downscaled soil moisture was scaled from 0 to 1 (1 is the wettest and 0 is the driest). The soil moisture based drought maps were validated using Standardized Precipitation Index (SPI) and crop yield data. Spatial distribution of drought status was also compared with other drought indices such as Scaled Drought Condition Index (SDCI). Machine learning approaches were performed well (R=0.905) for downscaling. Downscaled soil moisture was validated using in situ Asia flux data. The Root Mean Square Errors (RMSE) improved from 0.172 (25 km AMSR2) to 0.065 (downscaled soil moisture). The correlation coefficients improved from 0.201 (25 km AMSR2) to 0.341 (downscaled soil moisture). The soil moisture based drought maps and SDCI showed similar spatial distribution that caught both extreme drought and no drought. Since the proposed drought monitoring approach based on the downscaled

  5. A new machine learning algorithm for removal of salt and pepper noise

    NASA Astrophysics Data System (ADS)

    Wang, Yi; Adhami, Reza; Fu, Jian

    2015-07-01

    Supervised machine learning algorithm has been extensively studied and applied to different fields of image processing in past decades. This paper proposes a new machine learning algorithm, called margin setting (MS), for restoring images that are corrupted by salt and pepper impulse noise. Margin setting generates decision surface to classify the noise pixels and non-noise pixels. After the noise pixels are detected, a modified ranked order mean (ROM) filter is used to replace the corrupted pixels for images reconstruction. Margin setting algorithm is tested with grayscale and color images for different noise densities. The experimental results are compared with those of the support vector machine (SVM) and standard median filter (SMF). The results show that margin setting outperforms these methods with higher Peak Signal-to-Noise Ratio (PSNR), lower mean square error (MSE), higher image enhancement factor (IEF) and higher Structural Similarity Index (SSIM).
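    The detect-then-restore pipeline can be sketched as follows: classify pixels as noise vs clean, then replace only the noisy pixels with a local ranked-order statistic. The naive extreme-value check below is a stand-in for the paper's margin-setting classifier, and the ramp image is synthetic:

```python
import numpy as np

rng = np.random.RandomState(0)
img = np.tile(np.linspace(50, 200, 32).astype(np.uint8), (32, 1))   # smooth ramp

noisy = img.copy()
mask = rng.rand(*img.shape) < 0.1                    # 10% impulse-noise density
noisy[mask] = rng.choice([0, 255], size=mask.sum())  # salt (255) and pepper (0)

# Step 1: noise-pixel classification (extreme-value check stands in for MS)
detected = (noisy == 0) | (noisy == 255)

# Step 2: replace only detected pixels with the median of their clean neighbours
restored = noisy.copy().astype(float)
padded = np.pad(noisy.astype(float), 1, mode="edge")
for i, j in zip(*np.nonzero(detected)):
    window = padded[i:i + 3, j:j + 3]
    restored[i, j] = np.median(window[(window != 0) & (window != 255)])

mse = np.mean((restored - img) ** 2)
print(f"MSE after restoration: {mse:.1f}")
```

Restoring only the classified pixels is what preserves uncorrupted detail, in contrast to a standard median filter that smooths every pixel.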

  6. A software framework for building biomedical machine learning classifiers through grid computing resources.

    PubMed

    Ramos-Pollán, Raúl; Guevara-López, Miguel Angel; Oliveira, Eugénio

    2012-08-01

    This paper describes the BiomedTK software framework, created to perform massive explorations of machine learning classifiers configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the complete classifier construction process and enables explorations of large parameter sweeps for training third party classifiers such as artificial neural networks and support vector machines, offering the capability to harness the vast amount of computing power serviced by Grid infrastructures. In addition, it includes classifiers modified by the authors for ROC optimization and functionality to build ensemble classifiers and manipulate datasets (import/export, extract and transform data, etc.). BiomedTK was experimentally validated by training thousands of classifier configurations for representative biomedical UCI datasets reaching in little time classification levels comparable to those reported in existing literature. The comprehensive method herewith presented represents an improvement to biomedical data analysis in both methodology and potential reach of machine learning based experimentation. PMID:21479625

  7. Machine learning classification of resting state functional connectivity predicts smoking status

    PubMed Central

    Pariyadath, Vani; Stein, Elliot A.; Ross, Thomas J.

    2014-01-01

    Machine learning-based approaches are now able to examine functional magnetic resonance imaging data in a multivariate manner and extract features predictive of group membership. We applied support vector machine (SVM)-based classification to resting state functional connectivity (rsFC) data from nicotine-dependent smokers and healthy controls to identify brain-based features predictive of nicotine dependence. By employing a network-centered approach, we observed that within-network functional connectivity measures offered maximal information for predicting smoking status, as opposed to between-network connectivity, or the representativeness of each individual node with respect to its parent network. Further, our analysis suggests that connectivity measures within the executive control and frontoparietal networks are particularly informative in predicting smoking status. Our findings suggest that machine learning-based approaches to classifying rsFC data offer a valuable alternative technique to understanding large-scale differences in addiction-related neurobiology. PMID:24982629

  8. Machine learning deconvolution filter kernels for image restoration

    NASA Astrophysics Data System (ADS)

    Mainali, Pradip; Wittebrood, Rimmert

    2015-03-01

    In this paper, we propose a novel algorithm to recover a sharp image from its corrupted form by deconvolution. The algorithm learns the deconvolution process: it learns deconvolution filter kernels for a set of learnt basic pixel patterns. The algorithm consists of offline learning and online filtering stages. In the one-time offline learning stage, the algorithm learns a dictionary of local pixel-patch characteristics as the basic pixel patterns from a large number of natural images in the training database. The deconvolution filter coefficients for each pixel pattern are then optimized using the source and corrupted image pairs in the training database. In the online stage, the algorithm only needs to find the nearest matching pixel pattern in the dictionary for each pixel and filter it using the filter optimized for the corresponding pixel pattern. Experimental results on natural images show that our method achieves state-of-the-art results on image deblurring. The proposed approach can be applied to recover a sharp image in applications such as cameras, HD/UHD TV, document scanning systems, etc.

  9. Unsupervised nonlinear dimensionality reduction machine learning methods applied to multiparametric MRI in cerebral ischemia: preliminary results

    NASA Astrophysics Data System (ADS)

    Parekh, Vishwa S.; Jacobs, Jeremy R.; Jacobs, Michael A.

    2014-03-01

    The evaluation and treatment of acute cerebral ischemia requires a technique that can determine the total area of tissue at risk for infarction using diagnostic magnetic resonance imaging (MRI) sequences. Typical MRI data sets consist of T1- and T2-weighted imaging (T1WI, T2WI) along with advanced MRI parameters of diffusion-weighted imaging (DWI) and perfusion-weighted imaging (PWI) methods. Each of these parameters has distinct radiological-pathological meaning. For example, DWI interrogates the movement of water in the tissue and PWI gives an estimate of the blood flow; both are critical measures during the evolution of stroke. In order to integrate these data and give an estimate of the tissue at risk or damaged, we have developed advanced machine learning methods based on unsupervised non-linear dimensionality reduction (NLDR) techniques. NLDR methods are a class of algorithms that use mathematically defined manifolds for statistical sampling of multidimensional classes to generate a discrimination rule of guaranteed statistical accuracy, and they can generate a two- or three-dimensional map which represents the prominent structures of the data and provides an embedded image of meaningful low-dimensional structures hidden in the high-dimensional observations. In this manuscript, we apply NLDR methods to high-dimensional MRI data sets of preclinical animals and clinical patients with stroke. On analyzing the performance of these methods, we observed a high degree of similarity between the multiparametric embedded images from NLDR methods and the ADC and perfusion maps. It was also observed that the embedded scattergram of abnormal (infarcted or at-risk) tissue can be visualized, providing a mechanism for automatic methods to delineate potential stroke volumes and early tissue at risk.

  10. Machine learning approach for the outcome prediction of temporal lobe epilepsy surgery.

    PubMed

    Armañanzas, Rubén; Alonso-Nanclares, Lidia; Defelipe-Oroquieta, Jesús; Kastanauskaite, Asta; de Sola, Rafael G; Defelipe, Javier; Bielza, Concha; Larrañaga, Pedro

    2013-01-01

    Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE). Nevertheless, a significant proportion of these patients continue suffering seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose to include these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery. PMID:23646148

  11. Machine Learning Approach for the Outcome Prediction of Temporal Lobe Epilepsy Surgery

    PubMed Central

    DeFelipe-Oroquieta, Jesús; Kastanauskaite, Asta; de Sola, Rafael G.; DeFelipe, Javier; Bielza, Concha; Larrañaga, Pedro

    2013-01-01

    Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE). Nevertheless, a significant proportion of these patients continue suffering seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose to include these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery. PMID:23646148

  12. Exploring Machine Learning Techniques Using Patient Interactions in Online Health Forums to Classify Drug Safety

    ERIC Educational Resources Information Center

    Chee, Brant Wah Kwong

    2011-01-01

    This dissertation explores the use of personal health messages collected from online message forums to predict drug safety using natural language processing and machine learning techniques. Drug safety is defined as any drug with an active safety alert from the US Food and Drug Administration (FDA). It is believed that this is the first…

  13. Detection of dispersed radio pulses: a machine learning approach to candidate identification and classification

    NASA Astrophysics Data System (ADS)

    Devine, Thomas Ryan; Goseva-Popstojanova, Katerina; McLaughlin, Maura

    2016-06-01

    Searching for extraterrestrial, transient signals in astronomical data sets is an active area of current research. However, machine learning techniques for single-pulse detection are lacking in the literature. This paper presents a new, two-stage approach for identifying and classifying dispersed pulse groups (DPGs) in single-pulse search output. The first stage identifies DPGs and extracts features to characterize them, using a new peak identification algorithm that tracks sloping tendencies around local maxima in plots of signal-to-noise ratio versus dispersion measure. The second stage uses supervised machine learning to classify DPGs. We created four benchmark data sets: one unbalanced and three balanced versions using three different imbalance treatments. We empirically evaluated 48 classifiers by training and testing binary and multiclass versions of six machine learning algorithms on each of the four benchmark versions. While each classifier had advantages and disadvantages, all classifiers with imbalance treatments had higher recall values than those with unbalanced data, regardless of the machine learning algorithm used. Based on the benchmarking results, we selected a subset of classifiers to classify the full, unlabelled data set of over 1.5 million DPGs identified in 42 405 observations made by the Green Bank Telescope. Overall, the classifiers using a multiclass ensemble tree learner in combination with two oversampling imbalance treatments were the most efficient; they identified additional known pulsars not in the benchmark data set and provided six potential discoveries, with significantly fewer false positives than the other classifiers.
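
    The first-stage idea of tracking slope changes around local maxima in a signal-to-noise versus dispersion-measure curve can be pictured with a toy sketch. The curve, thresholds, and pulse widths below are invented; this is not the paper's algorithm in detail.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic S/N-versus-DM curve: two pulse-like peaks plus mild noise.
dm = np.linspace(0, 100, 501)
snr = (8 * np.exp(-0.5 * ((dm - 30) / 2) ** 2)
       + 5 * np.exp(-0.5 * ((dm - 70) / 3) ** 2)
       + rng.normal(0, 0.1, dm.size))

def find_peaks(snr, threshold=3.0):
    """Indices of local maxima above threshold, via slope sign changes."""
    slope = np.sign(np.diff(snr))
    # a peak is a point where the slope turns from positive to negative
    idx = np.where((slope[:-1] > 0) & (slope[1:] < 0))[0] + 1
    return idx[snr[idx] > threshold]

peaks = find_peaks(snr)
print(dm[peaks])
```

    Nearby detections around the same underlying pulse would then be grouped into a single candidate before feature extraction and classification.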

  14. Learning Control: Sense-Making, CNC Machines, and Changes in Vocational Training for Industrial Work

    ERIC Educational Resources Information Center

    Berner, Boel

    2009-01-01

    The paper explores how novices in school-based vocational training make sense of computerized numerical control (CNC) machines. Based on two ethnographic studies in Swedish schools, one from the early 1980s and one from 2006, it analyses change and continuity in the cognitive, social, and emotional processes of learning how to become a machine…

  15. Machine Shop I. Learning Activity Packets (LAPs). Section B--Basic and Related Technology.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This document contains eight learning activity packets (LAPs) for the "basic and related technology" instructional area of a Machine Shop I course. The eight LAPs cover the following topics: basic mathematics, blueprints, rules, micrometer measuring tools, Vernier measuring tools, dial indicators, gaging and inspection tools, and materials and…

  16. Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality

    PubMed Central

    2016-01-01

    Background Suicide is one of the leading causes of death in the United States (US), and new assessment methods are needed to track suicide risk in real time. Objective Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population. Methods Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk. Results Our findings show that people at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which accurately identified clinically significant suicidality in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%). Conclusions Machine learning algorithms are efficient in differentiating people who are at suicidal risk from those who are not. Evidence for suicidality can be measured in nonclinical populations using social media data. PMID:27185366
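
    The four reported screening metrics all derive from a 2x2 confusion matrix. The counts below are a hypothetical reconstruction, chosen only because they are consistent with the reported percentages for 135 participants; they are not the study's actual table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard screening metrics from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts: 17 high-risk (9 detected), 118 not (115 cleared).
m = diagnostic_metrics(tp=9, fp=3, fn=8, tn=115)
print({k: round(v, 2) for k, v in m.items()})
```

    Note how a modest sensitivity (53%) can coexist with high overall accuracy (92%) when the positive class is small.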

  17. Introduction to the JASIST Special Topic Issue on Web Retrieval and Mining: A Machine Learning Perspective.

    ERIC Educational Resources Information Center

    Chen, Hsinchun

    2003-01-01

    Discusses information retrieval techniques used on the World Wide Web. Topics include machine learning in information extraction; relevance feedback; information filtering and recommendation; text classification and text clustering; Web mining, based on data mining techniques; hyperlink structure; and Web size. (LRW)

  18. Transforming Biology Assessment with Machine Learning: Automated Scoring of Written Evolutionary Explanations

    ERIC Educational Resources Information Center

    Nehm, Ross H.; Ha, Minsu; Mayfield, Elijah

    2012-01-01

    This study explored the use of machine learning to automatically evaluate the accuracy of students' written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate…

  19. Use of Machine Learning to Identify Children with Autism and Their Motor Abnormalities

    ERIC Educational Resources Information Center

    Crippa, Alessandro; Salvatore, Christian; Perego, Paolo; Forti, Sara; Nobile, Maria; Molteni, Massimo; Castiglioni, Isabella

    2015-01-01

    In the present work, we have undertaken a proof-of-concept study to determine whether a simple upper-limb movement could be useful to accurately classify low-functioning children with autism spectrum disorder (ASD) aged 2-4. To answer this question, we developed a supervised machine-learning method to correctly discriminate 15 preschool children…

  20. Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning

    PubMed Central

    Cohen, Kevin Bretonnel; Glass, Benjamin; Greiner, Hansel M.; Holland-Bouley, Katherine; Standridge, Shannon; Arya, Ravindra; Faist, Robert; Morita, Diego; Mangano, Francesco; Connolly, Brian; Glauser, Tracy; Pestian, John

    2016-01-01

    Objective: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data consist of free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient’s status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral. PMID:27257386
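
    A pipeline of this kind maps free-text notes to term vectors and trains a classifier on labeled examples. The deliberately tiny sketch below uses invented phrases (not real EHR text) and a nearest-centroid rule instead of the paper's classifiers, just to show the shape of the approach.

```python
import numpy as np

# Toy stand-in "clinical notes" (invented phrases); 1 = possible candidate.
notes = [
    ("seizures persist despite two medication trials", 1),
    ("refractory seizures weekly despite medication changes", 1),
    ("seizure free on current medication", 0),
    ("well controlled no seizures this year", 0),
]

vocab = sorted({w for text, _ in notes for w in text.split()})

def vectorize(text):
    """Bag-of-words count vector over the fixed vocabulary."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab.index(w)] += 1
    return v

X = np.array([vectorize(t) for t, _ in notes])
y = np.array([lab for _, lab in notes])

# Nearest-centroid classification: compare a note to each class mean.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(text):
    v = vectorize(text)
    return int(np.argmin(np.linalg.norm(centroids - v, axis=1)))

print(predict("seizures continue despite medication"))
```

    Real systems would use richer features (n-grams, negation handling, clinical concept tags) and calibrated classifiers, but the vectorize-train-predict skeleton is the same.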

  1. Drought Forecasting Based on Machine Learning of Remote Sensing and Long-Range Forecast Data

    NASA Astrophysics Data System (ADS)

    Rhee, J.; Im, J.; Park, S.

    2016-06-01

    The reduction of drought impacts may be achieved through sustainable drought management and proactive measures against drought disaster, so accurate and timely provision of drought information is essential. In this study, drought forecasting models were developed to provide high-resolution drought information, based on drought indicators, for ungauged areas. The models predict two drought indices: the 6-month Standardized Precipitation Index (SPI6) and the 6-month Standardized Precipitation Evapotranspiration Index (SPEI6). A multiquadric spline interpolation method was tested alongside three machine learning models, Decision Tree, Random Forest, and Extremely Randomized Trees, which were used to enhance the provision of initial drought conditions from remote sensing data, since initial conditions are among the most important factors for drought forecasting. The machine learning-based methods performed better than the interpolation method for both classification and regression, and the methods using climatology data outperformed those using long-range forecasts. Overall, the model combining climatology data with machine learning performed best.
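
    The SPI family of indices standardizes precipitation accumulated over a fixed window (six months for SPI6). The sketch below is a simplified stand-in: the real SPI fits a gamma distribution and maps through the normal quantile function, whereas here a plain z-score over the accumulation window is used, on an invented precipitation record.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical monthly precipitation record (mm), 30 years.
precip = rng.gamma(shape=2.0, scale=40.0, size=360)

def spi_like(precip, window=6):
    """Standardized index over rolling sums (z-score stand-in for SPI)."""
    sums = np.convolve(precip, np.ones(window), mode="valid")
    return (sums - sums.mean()) / sums.std()

spi6 = spi_like(precip, window=6)
print(spi6.min(), spi6.max())
```

    Negative values indicate drier-than-normal accumulation periods; a forecasting model such as Random Forest would then be trained to predict this index at ungauged locations from remote sensing predictors.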

  2. Learning Activity Packets for Grinding Machines. Unit II--Surface Grinding.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) is one of three that accompany the curriculum guide on grinding machines. It outlines the study activities and performance tasks for the second unit of this curriculum guide. Its purpose is to aid the student in attaining a working knowledge of this area of training and in achieving a skilled or moderately…

  3. Learning Activity Packets for Grinding Machines. Unit III--Cylindrical Grinding.

    ERIC Educational Resources Information Center

    Oklahoma State Board of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.

    This learning activity packet (LAP) is one of three that accompany the curriculum guide on grinding machines. It outlines the study activities and performance tasks for the third unit of this curriculum guide. Its purpose is to aid the student in attaining a working knowledge of this area of training and in achieving a skilled or moderately…

  4. Machine Translation in Foreign Language Learning: Language Learners' and Tutors' Perceptions of Its Advantages and Disadvantages

    ERIC Educational Resources Information Center

    Nino, Ana

    2009-01-01

    This paper presents a snapshot of what has been investigated in terms of the relationship between machine translation (MT) and foreign language (FL) teaching and learning. For this purpose four different roles of MT in the language class have been identified: MT as a bad model, MT as a good model, MT as a vocational training tool (especially in…

  5. e-Learning Application for Machine Maintenance Process using Iterative Method in XYZ Company

    NASA Astrophysics Data System (ADS)

    Nurunisa, Suaidah; Kurniawati, Amelia; Pramuditya Soesanto, Rayinda; Yunan Kurnia Septo Hediyanto, Umar

    2016-02-01

    XYZ Company manufactures parts for airplanes; one machine categorized as a key facility in the company is the Millac 5H6P. As a key facility, the machine must be kept working well and in peak condition, so periodic maintenance is needed. Data gathering revealed a lack of competency among maintenance staff in maintaining machine types not assigned to them by the supervisor, indicating that the knowledge possessed by the maintenance staff is uneven. The purpose of this research is to create a knowledge-based e-learning application as a realization of the externalization step of the knowledge-transfer process for machine maintenance. The application's features are tailored to maintenance using an e-learning framework for the maintenance process, and its content supports multimedia for learning purposes. Quality function deployment (QFD) is used in this research to understand user needs. The application is built with Moodle, using an iterative method for the software development cycle and UML diagrams. The result of this research is an e-learning application that serves as a knowledge-sharing medium for the company's maintenance staff. Testing showed that the application makes it easy for maintenance staff to understand the required competencies.

  6. Using Machine-Learned Detectors to Assess and Predict Students' Inquiry Performance

    ERIC Educational Resources Information Center

    Gobert, Janice D.; Baker, Ryan; Pedro, Michael Sao

    2011-01-01

    The authors present work towards automatically assessing data collection behaviors as middle school students engage in inquiry within a physics microworld. In this study, the authors used machine learned models that can detect when students test their articulated hypotheses, design controlled experiments, and engage in planning behaviors using…

  7. Rare variants detection with kernel machine learning based on likelihood ratio test.

    PubMed

    Zeng, Ping; Zhao, Yang; Zhang, Liwei; Huang, Shuiping; Chen, Feng

    2014-01-01

    This paper mainly utilizes likelihood-based tests to detect rare variants associated with a continuous phenotype under the framework of kernel machine learning. Both the likelihood ratio test (LRT) and the restricted likelihood ratio test (ReLRT) are investigated. The relationship between kernel machine learning and the mixed effects model is discussed. By using the eigenvalue representations of the LRT and ReLRT, their exact finite-sample distributions are obtained by simulation. Numerical studies are performed to evaluate the performance of the proposed approaches in the contexts of the standard mixed effects model and kernel machine learning. The results show that the LRT and ReLRT control the type I error correctly at the given α level. The LRT and ReLRT consistently outperform the SKAT, regardless of the sample size and the proportion of negative causal rare variants, and suffer smaller power reductions than the SKAT when both positive and negative effects of rare variants are present. The LRT and ReLRT performed in the context of kernel machine learning have slightly higher power than those performed in the context of the standard mixed effects model. We use the Genetic Analysis Workshop 17 exome sequencing SNP data as an illustrative example, from which several interesting results are observed, and close with a discussion. PMID:24675868
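
    The generic shape of such a test is: fit nested models, form the -2 log likelihood ratio, and calibrate it against a simulated finite-sample null rather than an asymptotic one. The hedged sketch below does this for ordinary Gaussian linear models on invented genotype data, using a permutation null instead of the paper's eigenvalue representation.

```python
import numpy as np

rng = np.random.default_rng(6)

def lrt_stat(X, y, k0):
    """-2 log likelihood ratio of nested Gaussian linear models:
    the null uses the first k0 columns, the alternative all columns."""
    def rss(A):
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r
    return len(y) * np.log(rss(X[:, :k0]) / rss(X))

# Toy data: intercept plus 3 binary "variant" columns, only the last matters.
n = 200
X = np.column_stack([np.ones(n), rng.integers(0, 2, size=(n, 3))])
y = 0.8 * X[:, 3] + rng.normal(size=n)

obs = lrt_stat(X, y, k0=1)

# Finite-sample null by permuting the phenotype (simulation-based
# calibration in spirit, though not the paper's eigenvalue method).
null = [lrt_stat(X, rng.permutation(y), k0=1) for _ in range(200)]
p = (1 + sum(s >= obs for s in null)) / (1 + len(null))
print(obs, p)
```

    The +1 in numerator and denominator gives the standard conservative permutation p-value, which keeps the type I error controlled at the nominal level.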

  8. Games and Machine Learning: A Powerful Combination in an Artificial Intelligence Course

    ERIC Educational Resources Information Center

    Wallace, Scott A.; McCartney, Robert; Russell, Ingrid

    2010-01-01

    Project MLeXAI [Machine Learning eXperiences in Artificial Intelligence (AI)] seeks to build a set of reusable course curriculum and hands on laboratory projects for the artificial intelligence classroom. In this article, we describe two game-based projects from the second phase of project MLeXAI: Robot Defense--a simple real-time strategy game…

  9. Coordinated machine learning and decision support for situation awareness.

    PubMed

    Brannon, N G; Seiffertt, J E; Draelos, T J; Wunsch, D C

    2009-04-01

    Domains such as force protection require an effective decision maker to maintain a high level of situation awareness. A system that combines humans with neural networks is a desirable approach. Furthermore, it is advantageous for the calculation engine to operate in three learning modes: supervised for initial training and known updating, reinforcement for online operational improvement, and unsupervised in the absence of all external signaling. An Adaptive Resonance Theory based architecture capable of seamlessly switching among the three types of learning is discussed that can be used to help optimize the decision making of a human operator in such a scenario. This is followed by a situation assessment module. PMID:19395234

  10. Dynamical analysis of contrastive divergence learning: Restricted Boltzmann machines with Gaussian visible units.

    PubMed

    Karakida, Ryo; Okada, Masato; Amari, Shun-Ichi

    2016-07-01

    The restricted Boltzmann machine (RBM) is an essential constituent of deep learning, but it is hard to train by using maximum likelihood (ML) learning, which minimizes the Kullback-Leibler (KL) divergence. Instead, contrastive divergence (CD) learning has been developed as an approximation of ML learning and is widely used in practice. To clarify the performance of CD learning, in this paper, we analytically derive the fixed points where the ML and CDn learning rules converge in two types of RBMs: one with Gaussian visible and Gaussian hidden units and the other with Gaussian visible and Bernoulli hidden units. In addition, we analyze the stability of the fixed points. As a result, we find that the stable points of the CDn learning rule coincide with those of the ML learning rule in a Gaussian-Gaussian RBM. We also reveal that larger principal components of the input data are extracted at the stable points. Moreover, in a Gaussian-Bernoulli RBM, we find that both ML and CDn learning can extract independent components at one of the stable points. Our analysis demonstrates that the same feature components as those extracted by ML learning are extracted simply by performing CD1 learning. Expanding this study should elucidate the specific solutions obtained by CD learning in other types of RBMs or in deep networks. PMID:27131468
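
    A CD-1 update can be written compactly as a data-minus-reconstruction correlation. The sketch below is a minimal Gaussian-Bernoulli RBM with tiny dimensions, biases and unit variances omitted for brevity; it illustrates the update rule, not the paper's analytical setting.

```python
import numpy as np

rng = np.random.default_rng(3)

# Tiny Gaussian-Bernoulli RBM: 4 visible (Gaussian), 2 hidden (Bernoulli).
nv, nh = 4, 2
W = 0.01 * rng.normal(size=(nv, nh))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, lr=0.01):
    """One CD-1 weight update (biases and variances omitted)."""
    h0 = sigmoid(v0 @ W)                        # hidden probabilities
    h_samp = (rng.random(h0.shape) < h0) * 1.0  # sampled hidden states
    v1 = h_samp @ W.T                           # Gaussian visible mean
    h1 = sigmoid(v1 @ W)
    # approximate gradient: data correlations minus reconstruction ones
    return W + lr * (v0.T @ h0 - v1.T @ h1) / len(v0)

X = rng.normal(size=(100, nv))
for _ in range(50):
    W = cd1_step(X, W)
print(W)
```

    ML learning would replace the one-step reconstruction term with an expectation under the model distribution; CD-1 truncates the Gibbs chain after a single step, which is the approximation whose fixed points the paper analyzes.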

  11. Recent CESAR (Center for Engineering Systems Advanced Research) research activities in sensor based reasoning for autonomous machines

    SciTech Connect

    Pin, F.G.; de Saussure, G.; Spelt, P.F.; Killough, S.M.; Weisbin, C.R.

    1988-01-01

    This paper describes recent research activities at the Center for Engineering Systems Advanced Research (CESAR) in the area of sensor based reasoning, with emphasis given to their application and implementation on our HERMIES-IIB autonomous mobile vehicle. These activities, including navigation and exploration in a priori unknown and dynamic environments, goal recognition, vision-guided manipulation, and sensor-driven machine learning, are discussed within the framework of a scenario in which an autonomous robot is asked to navigate through an unknown dynamic environment, explore, find and dock at a process control panel, read and understand the status of the panel's meters and dials, learn the functioning of the panel, and successfully manipulate its control devices to solve a maintenance emergency problem. A demonstration of the successful implementation of the algorithms on our HERMIES-IIB autonomous robot for resolution of this scenario is presented. Conclusions are drawn concerning the applicability of the methodologies to more general classes of problems, and implications for future work on sensor-driven reasoning for autonomous robots are discussed. 8 refs., 3 figs.

  12. Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification.

    PubMed

    Mirza, Bilal; Lin, Zhiping

    2016-08-01

    In this paper, a meta-cognitive online sequential extreme learning machine (MOS-ELM) is proposed for class imbalance and concept drift learning. In MOS-ELM, meta-cognition is used to self-regulate the learning by selecting suitable learning strategies for class imbalance and concept drift problems. MOS-ELM is the first sequential learning method to alleviate the imbalance problem for both binary-class and multi-class data streams with concept drift. In MOS-ELM, a new adaptive window approach is proposed for concept drift learning. A single output update equation is also proposed which unifies various application-specific OS-ELM methods. The performance of MOS-ELM is evaluated under different conditions and compared with methods each specific to some of the conditions. On most of the datasets in the comparison, MOS-ELM outperforms the competing methods. PMID:27187873
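
    OS-ELM-style methods keep a randomly initialized hidden layer fixed and update only the output weights by recursive least squares as data chunks arrive. The sketch below is a minimal plain OS-ELM regression on a toy stream (no meta-cognition, imbalance, or drift handling, so not MOS-ELM itself); chunk sizes and the ridge term are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

def hidden(X, Win, b):
    """Fixed random-feature hidden layer with tanh activation."""
    return np.tanh(X @ Win + b)

# Toy regression stream: y = sin(x), learned chunk by chunk.
n_hidden = 20
Win = rng.normal(size=(1, n_hidden))
b = rng.normal(size=n_hidden)

X0 = rng.uniform(-3, 3, size=(40, 1))
y0 = np.sin(X0[:, 0])
H0 = hidden(X0, Win, b)

# Initial batch solution (ridge-regularized pseudo-inverse).
P = np.linalg.inv(H0.T @ H0 + 1e-3 * np.eye(n_hidden))
beta = P @ H0.T @ y0

# Sequential updates: recursive least squares on each arriving chunk.
for _ in range(20):
    Xc = rng.uniform(-3, 3, size=(10, 1))
    yc = np.sin(Xc[:, 0])
    H = hidden(Xc, Win, b)
    K = P @ H.T @ np.linalg.inv(np.eye(len(yc)) + H @ P @ H.T)
    P = P - K @ H @ P
    beta = beta + K @ (yc - H @ beta)

Xt = np.linspace(-3, 3, 50).reshape(-1, 1)
err = np.max(np.abs(hidden(Xt, Win, b) @ beta - np.sin(Xt[:, 0])))
print(err)
```

    MOS-ELM layers its meta-cognitive strategy selection and adaptive windowing on top of exactly this kind of sequential output-weight update.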

  13. Machine learning methods in the computational biology of cancer

    PubMed Central

    Vidyasagar, M.

    2014-01-01

    The objectives of this Perspective paper are to review some recent advances in sparse feature selection for regression and classification, as well as compressed sensing, and to discuss how these might be used to develop tools to advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented and is applied to predict the time to tumour recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is presented, and its validation in endometrial cancer is briefly discussed. Some open problems are also presented. PMID:25002826
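
    Sparse feature selection for regression is commonly posed as the lasso, for which proximal gradient descent (ISTA) is the textbook solver. The sketch below runs it on synthetic data with three truly relevant features out of twenty; it illustrates the sparsity mechanism, not the paper's new algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Sparse regression via ISTA (proximal gradient for the lasso)."""
    L = np.linalg.norm(X, 2) ** 2     # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = soft_threshold(w - X.T @ (X @ w - y) / L, lam / L)
    return w

# 50 samples, 20 features, only the first 3 truly relevant.
X = rng.normal(size=(50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.normal(size=50)

w = lasso_ista(X, y, lam=5.0)
print(np.nonzero(w)[0], np.round(w[:3], 2))
```

    The l1 penalty zeroes out most irrelevant coefficients while shrinking the relevant ones slightly toward zero, which is the trade-off that makes such estimators useful when features vastly outnumber samples, as in genomic cancer data.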

  14. Machine Methods for Acquiring, Learning, and Applying Knowledge.

    ERIC Educational Resources Information Center

    Hayes-Roth, Frederick; And Others

    A research plan for identifying and acting upon constraints that impede the development of knowledge-based intelligent systems is described. The two primary problems identified are knowledge programming, the task of which is to create an intelligent system that does what an expert says it should, and learning, the problem requiring the criticizing…

  15. Virtual Learning Environment for Interactive Engagement with Advanced Quantum Mechanics

    NASA Astrophysics Data System (ADS)

    Pedersen, Mads Kock; Skyum, Birk; Heck, Robert; Müller, Romain; Bason, Mark; Lieberoth, Andreas; Sherson, Jacob F.

    2016-06-01

    A virtual learning environment can engage university students in the learning process in ways that the traditional lecture and lab formats cannot. We present our virtual learning environment StudentResearcher, which incorporates simulations, multiple-choice quizzes, video lectures, and gamification into a learning path for quantum mechanics at the advanced university level. StudentResearcher is built upon the experiences gathered from workshops with the citizen science game Quantum Moves at the high-school and university level, where the games were used extensively to illustrate the basic concepts of quantum mechanics. The first test of this new virtual learning environment was a 2014 course in advanced quantum mechanics at Aarhus University with 47 enrolled students. We found increased learning for the students who were more active on the platform, independent of their previous performance.

  16. Construction and Analysis of Educational Tests Using Abductive Machine Learning

    ERIC Educational Resources Information Center

    El-Alfy, El-Sayed M.; Abdel-Aal, Radwan E.

    2008-01-01

    Recent advances in educational technologies and the wide-spread use of computers in schools have fueled innovations in test construction and analysis. As the measurement accuracy of a test depends on the quality of the items it includes, item selection procedures play a central role in this process. Mathematical programming and the item response…

  17. Learning with Multiple Representations. Advances in Learning and Instruction Series.

    ERIC Educational Resources Information Center

    van Someren, Maarten W., Ed.; Reimann, Peter, Ed.; Boshuizen, Henny P. A., Ed.; de Jong, Ton, Ed.

    This book addresses questions of multiple representations in human reasoning and learning. Computational approaches to learning with multiple representations are introduced, and the role of multiple representations in teaching is discussed. The following chapters are included in Part I: Multiple Representations in Learning Concepts from Physics…

  18. A Multianalyzer Machine Learning Model for Marine Heterogeneous Data Schema Mapping

    PubMed Central

    Yan, Wang; Jiajin, Le; Yun, Zhang

    2014-01-01

    The main challenge that marine heterogeneous data integration faces is accurate schema mapping between heterogeneous data sources. To improve schema-mapping efficiency and obtain more accurate learning results, this paper proposes a heterogeneous data schema mapping method based on a multianalyzer machine learning model. The multianalyzer analyzes the learning results comprehensively, and a fuzzy comprehensive evaluation system is introduced to evaluate the output results and perform multi-factor quantitative judging. Finally, a data mapping comparison experiment on East China Sea observation data confirms the effectiveness of the model and shows the multianalyzer's clear reduction of the mapping error rate. PMID:25250372

  19. A multianalyzer machine learning model for marine heterogeneous data schema mapping.

    PubMed

    Yan, Wang; Jiajin, Le; Yun, Zhang

    2014-01-01

    The main challenge that marine heterogeneous data integration faces is accurate schema mapping between heterogeneous data sources. To improve schema-mapping efficiency and obtain more accurate learning results, this paper proposes a heterogeneous data schema mapping method based on a multianalyzer machine learning model. The multianalyzer analyzes the learning results comprehensively, and a fuzzy comprehensive evaluation system is introduced to evaluate the output results and perform multi-factor quantitative judging. Finally, a data mapping comparison experiment on East China Sea observation data confirms the effectiveness of the model and shows the multianalyzer's clear reduction of the mapping error rate. PMID:25250372
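
    A fuzzy comprehensive evaluation combines a vector of factor weights with a membership matrix over output grades. The numbers below are invented placeholders (weights, factors, and grades are not from the paper), shown only to illustrate the weighted-average evaluation operator.

```python
import numpy as np

# Factor weights for three hypothetical mapping criteria (sum to 1).
weights = np.array([0.5, 0.3, 0.2])

# Membership matrix R: rows = factors, columns = grades
# (e.g. "good match", "uncertain", "poor match").
R = np.array([
    [0.7, 0.2, 0.1],   # e.g. name similarity
    [0.4, 0.4, 0.2],   # e.g. type compatibility
    [0.6, 0.3, 0.1],   # e.g. value-distribution match
])

# Weighted-average composition operator: grade scores for the candidate.
scores = weights @ R
print(scores)
```

    The candidate mapping is assigned the grade with the highest composite score; here the first grade dominates, so the mapping would be accepted as a good match.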

  20. Advanced Learning Space as an Asset for Students with Disabilities

    ERIC Educational Resources Information Center

    Císarová, Klára; Lamr, Marián; Vitvarová, Jana

    2015-01-01

    The paper describes an e-learning system called Advanced Learning Space that was developed at the Technical University of Liberec. The system provides a personalized virtual work space and promotes communication among students and their teachers. The core of the system is a module that can be used to automatically record, store and playback…