Sample records for machine learning techniques

  1. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  2. Prostate Cancer Probability Prediction By Machine Learning Technique.

    PubMed

    Jović, Srđan; Miljković, Milica; Ivanović, Miljan; Šaranović, Milena; Arsić, Milena

    2017-11-26

    The main goal of the study was to explore the possibility of prostate cancer prediction by machine learning techniques. To improve the survival probability of prostate cancer patients, it is essential to build suitable prediction models of prostate cancer. If one makes a relevant prediction of prostate cancer, it is easy to create a suitable treatment based on the prediction results. Machine learning techniques are the most common techniques for creating predictive models. Therefore, in this study several machine learning techniques were applied and compared. The obtained results were analyzed and discussed. It was concluded that machine learning techniques could be used for the relevant prediction of prostate cancer.

  3. Interpreting Medical Information Using Machine Learning and Individual Conditional Expectation.

    PubMed

    Nohara, Yasunobu; Wakata, Yoshifumi; Nakashima, Naoki

    2015-01-01

    Recently, machine-learning techniques have spread to many fields. However, machine learning is still not popular in the medical research field because of the difficulty of interpretation. In this paper, we introduce a method of interpreting medical information using a machine learning technique. The method gave a new explanation of the partial dependence plot and the individual conditional expectation plot from the medical research field.

  4. The application of machine learning techniques in the clinical drug therapy.

    PubMed

    Meng, Huan-Yu; Jin, Wan-Lin; Yan, Cheng-Kai; Yang, Huan

    2018-05-25

    The development of a novel drug is an extremely complicated process that includes the target identification, design and manufacture, and proper therapy of the novel drug, as well as drug dose selection, drug efficacy evaluation, and adverse drug reaction control. Due to the limited resources, high costs, long duration, and low hit-to-lead ratio in the development of pharmacogenetics and computer technology, machine learning techniques have assisted novel drug development and have gradually received more attention from researchers. According to current research, machine learning techniques are widely applied in the discovery of new drugs and novel drug targets, decisions about proper therapy and drug dose, and the prediction of drug efficacy and adverse drug reactions. In this article, we discuss the history, workflow, and advantages and disadvantages of machine learning techniques in the processes mentioned above. Although the advantages of machine learning techniques are fairly obvious, their application is currently limited. With further research, the application of machine learning techniques in drug development could become much more widespread and could potentially be one of the major methods used in drug development.

  5. Machine Learning Techniques in Clinical Vision Sciences.

    PubMed

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques for diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at their earliest stages. With the recent developments in diagnostic devices, imaging and genomics, new sources of data for early disease detection and patients' management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve sensitivity and specificity of disease detection and monitoring, making the clinical decision-making process more objective. This manuscript presents a review of multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches will be presented. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allow creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure a good performance of the machine learning techniques in a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated and the data dimensionality (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques in ocular disease diagnosis and monitoring will be presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples will be presented in glaucoma, age-related macular degeneration, and diabetic retinopathy, these ocular pathologies being the major causes of irreversible visual impairment.

  6. Recent developments in machine learning applications in landslide susceptibility mapping

    NASA Astrophysics Data System (ADS)

    Lun, Na Kai; Liew, Mohd Shahir; Matori, Abdul Nasir; Zawawi, Noor Amila Wan Abdullah

    2017-11-01

    While the prediction of the spatial distribution of potential landslide occurrences is a primary interest in landslide hazard mitigation, it remains a challenging task. To overcome the scarcity of complete, sufficiently detailed geomorphological attributes and environmental conditions, various machine-learning techniques are increasingly applied to effectively map landslide susceptibility for large regions. Nevertheless, few review papers are devoted to this field, particularly to the various domain-specific applications of machine learning techniques. The available literature often reports relatively good predictive performance; however, papers discussing the limitations of each approach are quite uncommon. The foremost aim of this paper is to narrow these gaps in the literature and to review up-to-date machine learning and ensemble learning techniques applied in landslide susceptibility mapping. It provides new readers with an introductory understanding of the subject matter and researchers with a contemporary review of machine learning advancements alongside the future direction of these techniques in the landslide mitigation field.

  7. Automation of energy demand forecasting

    NASA Astrophysics Data System (ADS)

    Siddique, Sanzad

    Automation of energy demand forecasting saves time and effort by searching automatically for an appropriate model in a candidate model space without manual intervention. This thesis introduces a search-based approach that improves the performance of the model searching process for econometrics models. Further improvements in the accuracy of the energy demand forecasting are achieved by integrating nonlinear transformations within the models. This thesis introduces machine learning techniques that are capable of modeling such nonlinearity. Algorithms for learning domain knowledge from time series data using the machine learning methods are also presented. The novel search based approach and the machine learning models are tested with synthetic data as well as with natural gas and electricity demand signals. Experimental results show that the model searching technique is capable of finding an appropriate forecasting model. Further experimental results demonstrate an improved forecasting accuracy achieved by using the novel machine learning techniques introduced in this thesis. This thesis presents an analysis of how the machine learning techniques learn domain knowledge. The learned domain knowledge is used to improve the forecast accuracy.

  8. Machine learning in heart failure: ready for prime time.

    PubMed

    Awan, Saqib Ejaz; Sohel, Ferdous; Sanfilippo, Frank Mario; Bennamoun, Mohammed; Dwivedi, Girish

    2018-03-01

    The aim of this review is to present an up-to-date overview of the application of machine learning methods in heart failure including diagnosis, classification, readmissions and medication adherence. Recent studies have shown that the application of machine learning techniques may have the potential to improve heart failure outcomes and management, including cost savings by improving existing diagnostic and treatment support systems. Recently developed deep learning methods are expected to yield even better performance than traditional machine learning techniques in performing complex tasks by learning the intricate patterns hidden in big medical data. The review summarizes the recent developments in the application of machine and deep learning methods in heart failure management.

  9. Component Pin Recognition Using Algorithms Based on Machine Learning

    NASA Astrophysics Data System (ADS)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG) to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of a deep learning method known as a convolutional neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  10. Machine learning modelling for predicting soil liquefaction susceptibility

    NASA Astrophysics Data System (ADS)

    Samui, P.; Sitharam, T. G.

    2011-01-01

    This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique uses an Artificial Neural Network (ANN) based on multi-layer perceptrons (MLP) that are trained with the Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector Machine (SVM), which is firmly based on statistical learning theory and is used here as a classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT [(N1)60] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models, requiring only two parameters [(N1)60 and peak ground acceleration (amax/g)], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.

  11. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    NASA Astrophysics Data System (ADS)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Different drug-drug interactions can be examined through the drug synergy score. This needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to meet this requirement. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop an ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.
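
    The abstract above describes the ensembling step only as a biased weighted aggregation that gives more weight to models with a higher prediction score. A minimal sketch of one such scheme follows, assuming weights proportional to each model's score; the function name `weighted_ensemble`, the proportional rule, and the sample numbers are illustrative assumptions, not the authors' published method.

```python
def weighted_ensemble(predictions, scores):
    """Combine per-model predictions with weights proportional to model scores.

    predictions: one list of predicted values per model (all the same length).
    scores: one non-negative validation score per model (higher is better).
    Assumption: weight_i = score_i / sum(scores); the paper may weight differently.
    """
    if len(predictions) != len(scores):
        raise ValueError("need exactly one score per model")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    weights = [s / total for s in scores]
    n_samples = len(predictions[0])
    # Weighted sum across models, computed sample by sample.
    return [
        sum(w * model_preds[i] for w, model_preds in zip(weights, predictions))
        for i in range(n_samples)
    ]


# Hypothetical synergy-score predictions from three models for two drug pairs.
preds = [[10.0, 20.0], [12.0, 18.0], [14.0, 22.0]]
scores = [0.5, 0.3, 0.2]  # hypothetical per-model validation scores
combined = weighted_ensemble(preds, scores)
```

    With these made-up inputs, the model scored 0.5 contributes half of each combined prediction, so the result leans toward its outputs.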

  12. Application of machine learning techniques to lepton energy reconstruction in water Cherenkov detectors

    NASA Astrophysics Data System (ADS)

    Drakopoulou, E.; Cowan, G. A.; Needham, M. D.; Playfer, S.; Taani, M.

    2018-04-01

    The application of machine learning techniques to the reconstruction of lepton energies in water Cherenkov detectors is discussed and illustrated for TITUS, a proposed intermediate detector for the Hyper-Kamiokande experiment. It is found that applying these techniques leads to an improvement of more than 50% in the energy resolution for all lepton energies compared to an approach based upon lookup tables. Machine learning techniques can be easily applied to different detector configurations and the results are comparable to likelihood-function based techniques that are currently used.

  13. 2014 Bio-Acoustics Data Challenge for the International Community on Machine Learning and Bioacoustics

    DTIC Science & Technology

    2014-09-30

    This ONR grant promotes the development and application of advanced machine learning techniques for detection and classification of marine mammal...sounds. The objective is to engage a broad community of data scientists in the development and application of advanced machine learning techniques for detection and classification of marine mammal sounds.

  14. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    PubMed

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.

  15. Novel Breast Imaging and Machine Learning: Predicting Breast Lesion Malignancy at Cone-Beam CT Using Machine Learning Techniques.

    PubMed

    Uhlig, Johannes; Uhlig, Annemarie; Kunze, Meike; Beissbarth, Tim; Fischer, Uwe; Lotz, Joachim; Wienbeck, Susanne

    2018-05-24

    The purpose of this study is to evaluate the diagnostic performance of machine learning techniques for malignancy prediction at breast cone-beam CT (CBCT) and to compare them to human readers. Five machine learning techniques, including random forests, back propagation neural networks (BPN), extreme learning machines, support vector machines, and K-nearest neighbors, were used to train diagnostic models on a clinical breast CBCT dataset with internal validation by repeated 10-fold cross-validation. Two independent blinded human readers with profound experience in breast imaging and breast CBCT analyzed the same CBCT dataset. Diagnostic performance was compared using AUC, sensitivity, and specificity. The clinical dataset comprised 35 patients (American College of Radiology density type C and D breasts) with 81 suspicious breast lesions examined with contrast-enhanced breast CBCT. Forty-five lesions were histopathologically proven to be malignant. Among the machine learning techniques, BPNs provided the best diagnostic performance, with AUC of 0.91, sensitivity of 0.85, and specificity of 0.82. The diagnostic performance of the human readers was AUC of 0.84, sensitivity of 0.89, and specificity of 0.72 for reader 1 and AUC of 0.72, sensitivity of 0.71, and specificity of 0.67 for reader 2. AUC was significantly higher for BPN when compared with both reader 1 (p = 0.01) and reader 2 (p < 0.001). Machine learning techniques provide a high and robust diagnostic performance in the prediction of malignancy in breast lesions identified at CBCT. BPNs showed the best diagnostic performance, surpassing human readers in terms of AUC and specificity.
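
    The study above compares models and readers by AUC, sensitivity, and specificity. As a rough illustration of how these three quantities are computed from labels and scores, here is a small standard-library sketch; the labels, scores, and the 0.5 decision threshold are invented for the example and are unrelated to the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = malignant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented example: three malignant (1) and three benign (0) lesions.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

    Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason a model can lead on AUC yet trail a reader on sensitivity.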

  16. The impact of machine learning techniques in the study of bipolar disorder: A systematic review.

    PubMed

    Librenza-Garcia, Diego; Kotzian, Bruno Jaskulski; Yang, Jessica; Mwangi, Benson; Cao, Bo; Pereira Lima, Luiza Nunes; Bermudez, Mariane Bagatin; Boeira, Manuela Vianna; Kapczinski, Flávio; Passos, Ives Cavalcante

    2017-09-01

    Machine learning techniques provide new methods to predict diagnosis and clinical outcomes at an individual level. We aim to review the existing literature on the use of machine learning techniques in the assessment of subjects with bipolar disorder. We systematically searched PubMed, Embase and Web of Science for articles published in any language up to January 2017. We found 757 abstracts and included 51 studies in our review. Most of the included studies used multiple levels of biological data to distinguish the diagnosis of bipolar disorder from other psychiatric disorders or healthy controls. We also found studies that assessed the prediction of clinical outcomes and studies using unsupervised machine learning to build more consistent clinical phenotypes of bipolar disorder. We concluded that given the clinical heterogeneity of samples of patients with BD, machine learning techniques may provide clinicians and researchers with important insights in fields such as diagnosis, personalized treatment and prognosis orientation.

  17. Inverse Problems in Geodynamics Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R. N.

    2018-01-01

    During the past few decades numerical studies have been widely employed to explore the style of circulation and mixing in the mantle of Earth and other planets. However, in geodynamical studies there are many properties from mineral physics, geochemistry, and petrology in these numerical models. Machine learning, as a computational statistic-related technique and a subfield of artificial intelligence, has rapidly emerged recently in many fields of sciences and engineering. We focus here on the application of supervised machine learning (SML) algorithms in predictions of mantle flow processes. Specifically, we emphasize estimating mantle properties by employing machine learning techniques in solving an inverse problem. Using snapshots of numerical convection models as training samples, we enable machine learning models to determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at midmantle depths. Employing support vector machine algorithms, we show that SML techniques can successfully predict the magnitude of mantle density anomalies and can also be used in characterizing mantle flow patterns. The technique can be extended to more complex geodynamic problems in mantle dynamics by employing deep learning algorithms for putting constraints on properties such as viscosity, elastic parameters, and the nature of thermal and chemical anomalies.

  18. Contemporary machine learning: techniques for practitioners in the physical sciences

    NASA Astrophysics Data System (ADS)

    Spears, Brian

    2017-10-01

    Machine learning is the science of using computers to find relationships in data without explicitly knowing or programming those relationships in advance. Often without realizing it, we employ machine learning every day as we use our phones or drive our cars. Over the last few years, machine learning has found increasingly broad application in the physical sciences. This most often involves building a model relationship between a dependent, measurable output and an associated set of controllable, but complicated, independent inputs. The methods are applicable both to experimental observations and to databases of simulated output from large, detailed numerical simulations. In this tutorial, we will present an overview of current tools and techniques in machine learning - a jumping-off point for researchers interested in using machine learning to advance their work. We will discuss supervised learning techniques for modeling complicated functions, beginning with familiar regression schemes, then advancing to more sophisticated decision trees, modern neural networks, and deep learning methods. Next, we will cover unsupervised learning and techniques for reducing the dimensionality of input spaces and for clustering data. We'll show example applications from both magnetic and inertial confinement fusion. Along the way, we will describe methods for practitioners to help ensure that their models generalize from their training data to as-yet-unseen test data. We will finally point out some limitations to modern machine learning and speculate on some ways that practitioners from the physical sciences may be particularly suited to help. This work was performed by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  19. Exploring machine-learning-based control plane intrusion detection techniques in software defined optical networks

    NASA Astrophysics Data System (ADS)

    Zhang, Huibin; Wang, Yuqiao; Chen, Haoran; Zhao, Yongli; Zhang, Jie

    2017-12-01

    In software defined optical networks (SDON), the centralized control plane may encounter numerous intrusion threats which compromise the security level of provisioned services. In this paper, the issue of control plane security is studied and two machine-learning-based control plane intrusion detection techniques are proposed for SDON with properly selected features such as bandwidth, route length, etc. We validate the feasibility and efficiency of the proposed techniques by simulations. Results show that an accuracy of 83% for intrusion detection can be achieved with the proposed machine-learning-based control plane intrusion detection techniques.

  20. Relationships Between the External and Internal Training Load in Professional Soccer: What Can We Learn From Machine Learning?

    PubMed

    Jaspers, Arne; De Beéck, Tim Op; Brink, Michel S; Frencken, Wouter G P; Staes, Filip; Davis, Jesse J; Helsen, Werner F

    2018-05-01

    Machine learning may contribute to understanding the relationship between the external load and internal load in professional soccer. Therefore, the relationship between external load indicators (ELIs) and the rating of perceived exertion (RPE) was examined using machine learning techniques on a group and individual level. Training data were collected from 38 professional soccer players over 2 seasons. The external load was measured using global positioning system technology and accelerometry. The internal load was obtained using the RPE. Predictive models were constructed using 2 machine learning techniques, artificial neural networks and least absolute shrinkage and selection operator (LASSO) models, and 1 naive baseline method. The predictions were based on a large set of ELIs. Using each technique, 1 group model involving all players and 1 individual model for each player were constructed. These models' performance on predicting the reported RPE values for future training sessions was compared with the naive baseline's performance. Both the artificial neural network and LASSO models outperformed the baseline. In addition, the LASSO model made more accurate predictions for the RPE than did the artificial neural network model. Furthermore, decelerations were identified as important ELIs. Regardless of the applied machine learning technique, the group models resulted in equivalent or better predictions for the reported RPE values than the individual models. Machine learning techniques may have added value in predicting RPE for future sessions to optimize training design and evaluation. These techniques may also be used in conjunction with expert knowledge to select key ELIs for load monitoring.

  1. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data

    PubMed Central

    Hepworth, Philip J.; Nefedov, Alexey V.; Muchnik, Ilya B.; Morgan, Kenton L.

    2012-01-01

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide. PMID:22319115

  2. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    PubMed

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-07

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.

  3. Use of Advanced Machine-Learning Techniques for Non-Invasive Monitoring of Hemorrhage

    DTIC Science & Technology

    2010-04-01

    that state-of-the-art machine learning techniques when integrated with novel non-invasive monitoring technologies could detect subtle, physiological...decompensation. Continuous, non-invasively measured hemodynamic signals (e.g., ECG, blood pressures, stroke volume) were used for the development of machine ... learning algorithms. Accuracy estimates were obtained by building models using 27 subjects and testing on the 28th. This process was repeated 28 times
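
    The evaluation scheme mentioned in this excerpt (build on 27 subjects, test on the 28th, repeat 28 times) is leave-one-subject-out cross-validation. A minimal sketch of the splitting logic follows; the integer subject IDs and the helper name `leave_one_subject_out` are hypothetical, and no modeling step is shown because the excerpt does not describe one.

```python
def leave_one_subject_out(subject_ids):
    """Yield (train_subjects, held_out_subject), holding out each subject once."""
    for held_out in subject_ids:
        train = [s for s in subject_ids if s != held_out]
        yield train, held_out


subjects = list(range(1, 29))  # 28 subjects; the numbering is an assumption
folds = list(leave_one_subject_out(subjects))

# Each of the 28 folds trains on 27 subjects and tests on the remaining one.
for train, held_out in folds:
    assert held_out not in train
```

    Splitting by subject rather than by individual recording keeps every signal from the held-out person out of the training data, so the accuracy estimate reflects performance on a new subject.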

  4. Machine learning models in breast cancer survival prediction.

    PubMed

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers with a high mortality rate among women. With early diagnosis, breast cancer survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is the combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of breast cancer survival. We use a dataset with eight attributes that includes the records of 900 patients, of whom 876 (97.3%) were female and 24 (2.7%) were male. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-fold cross-validation were used with the proposed model for the prediction of breast cancer survival. The performance of the machine learning techniques was evaluated with accuracy, precision, sensitivity, specificity, and area under ROC curve. Out of 900 patients, 803 patients and 97 patients were alive and dead, respectively. In this study, the Trees Random Forest (TRF) technique showed better results in comparison to other techniques (NB, 1NN, AD, SVM, RBFN and MLP). The accuracy, sensitivity and the area under ROC curve of TRF are 96%, 96%, 93%, respectively. However, the 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under ROC curve 78%). This study demonstrates that the Trees Random Forest model (TRF), which is a rule-based classification model, was the best model with the highest level of accuracy. Therefore, this model is recommended as a useful tool for breast cancer survival prediction as well as medical decision making.
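
    The models in this record were assessed with 10-fold cross-validation on 900 patient records. The sketch below shows one plain way to generate such folds with the standard library; the shuffling seed, the function name `k_fold_indices`, and the use of unstratified folds are assumptions, since the abstract does not say how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k roughly equal folds.

    Indices are shuffled once with a fixed seed, then dealt into k folds.
    Assumption: folds are not stratified by outcome.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 900 records and 10 folds, matching the counts given in the abstract.
splits = list(k_fold_indices(900, 10))
```

    With 803 survivors and 97 deaths, the classes are imbalanced, so a stratified variant that preserves the class ratio in every fold would be a reasonable alternative to this plain split.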

  5. Quantum machine learning.

    PubMed

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-13

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  6. Quantum machine learning

    NASA Astrophysics Data System (ADS)

    Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

    2017-09-01

    Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.

  7. Machine Learning for the Knowledge Plane

    DTIC Science & Technology

    2006-06-01

    this idea is to combine techniques from machine learning with new architectural concepts in networking to make the internet self-aware and self...work on the machine learning portion of the Knowledge Plane. This consisted of three components: (a) we wrote a document formulating the various

  8. Phishtest: Measuring the Impact of Email Headers on the Predictive Accuracy of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Tout, Hicham

    2013-01-01

    The majority of documented phishing attacks have been carried by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that the inclusion of a limited subset of email headers as features in training machine learning…

  9. Robust Fault Diagnosis in Electric Drives Using Machine Learning

    DTIC Science & Technology

    2004-09-08

    detection of fault conditions of the inverter. A machine learning framework is developed to systematically select torque-speed domain operation points...were used to generate various fault condition data for machine learning. The technique is viable for accurate, reliable and fast fault detection in electric drives.

  10. Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension

    ERIC Educational Resources Information Center

    Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

    2017-01-01

    This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…

  11. Next-Generation Machine Learning for Biological Networks.

    PubMed

    Camacho, Diogo M; Collins, Katherine M; Powers, Rani K; Costello, James C; Collins, James J

    2018-06-14

    Machine learning, a collection of data-analytical techniques aimed at building predictive models from multi-dimensional datasets, is becoming integral to modern biological research. By enabling one to generate models that learn from large datasets and make predictions on likely outcomes, machine learning can be used to study complex cellular systems such as biological networks. Here, we provide a primer on machine learning for life scientists, including an introduction to deep learning. We discuss opportunities and challenges at the intersection of machine learning and network biology, which could impact disease biology, drug discovery, microbiome research, and synthetic biology. Copyright © 2018 Elsevier Inc. All rights reserved.

  12. Comparison between extreme learning machine and wavelet neural networks in data classification

    NASA Astrophysics Data System (ADS)

    Yahia, Siwar; Said, Salwa; Jemai, Olfa; Zaied, Mourad; Ben Amar, Chokri

    2017-03-01

    Extreme learning machine is a well-known learning algorithm in the field of machine learning. It is a feed-forward neural network with a single hidden layer, and it is an extremely fast learning algorithm with good generalization performance. In this paper, we aim to compare the extreme learning machine with wavelet neural networks, which are a widely used algorithm. We used six benchmark data sets to evaluate each technique: Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition and Iris Plant. Experimental results show that both the extreme learning machine and wavelet neural networks achieved good results.
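
    The reason an extreme learning machine trains quickly is that its hidden layer is drawn at random and left fixed, so only the output weights are fitted, and that is a linear least-squares problem. The sketch below illustrates this under stated assumptions: the class name TinyELM, the tanh activation, the weight range, the small ridge term and the XOR toy data are all hypothetical choices, not details taken from the paper.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class TinyELM:
    """Single-hidden-layer network: random fixed hidden layer, least-squares output layer."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-6, seed=0):
        rng = random.Random(seed)
        # Hidden weights and biases are drawn once and never trained.
        self.w = [[rng.uniform(-3, 3) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-3, 3) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        k = len(self.w)
        # Normal equations with a ridge term: (H^T H + ridge * I) beta = H^T y.
        gram = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                 for j in range(k)] for i in range(k)]
        rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve_linear(gram, rhs)
        return self

    def predict(self, x):
        return sum(bj * hj for bj, hj in zip(self.beta, self._hidden(x)))


# Toy data (hypothetical, not one of the paper's benchmarks): the XOR problem.
xs = [[0, 0], [0, 1], [1, 0], [1, 1]]
ys = [0, 1, 1, 0]
model = TinyELM(n_inputs=2, n_hidden=20).fit(xs, ys)
labels = [1 if model.predict(x) > 0.5 else 0 for x in xs]
print(labels)
```

    Training is a single linear solve with no iterative weight updates. The ridge term keeps that solve well-posed when there are more hidden units than training samples, as in this toy case.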

  13. Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model

    NASA Astrophysics Data System (ADS)

    Pathak, Jaideep; Wikner, Alexander; Fussell, Rebeckah; Chandra, Sarthak; Hunt, Brian R.; Girvan, Michelle; Ott, Edward

    2018-04-01

    A model-based approach to forecasting chaotic dynamical systems utilizes knowledge of the mechanistic processes governing the dynamics to build an approximate mathematical model of the system. In contrast, machine learning techniques have demonstrated promising results for forecasting chaotic systems purely from past time series measurements of system state variables (training data), without prior knowledge of the system dynamics. The motivation for this paper is the potential of machine learning for filling in the gaps in our underlying mechanistic knowledge that cause widely-used knowledge-based models to be inaccurate. Thus, we here propose a general method that leverages the advantages of these two approaches by combining a knowledge-based model and a machine learning technique to build a hybrid forecasting scheme. Potential applications for such an approach are numerous (e.g., improving weather forecasting). We demonstrate and test the utility of this approach using a particular illustrative version of a machine learning technique known as reservoir computing, and we apply the resulting hybrid forecaster to a low-dimensional chaotic system, as well as to a high-dimensional spatiotemporal chaotic system. These tests yield extremely promising results in that our hybrid technique is able to accurately predict for a much longer period of time than either its machine-learning component or its model-based component alone.
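
    The general idea of pairing an imperfect knowledge-based model with a data-driven correction can be sketched on a toy problem. This is not the paper's method: the paper uses reservoir computing, whereas the sketch swaps in plain least-squares regression on the model's one-step residual. The logistic map, both parameter values and all function names are assumptions chosen only to keep the example short.

```python
def true_step(x):
    """'Real' system (hypothetical): logistic map with r = 3.9."""
    return 3.9 * x * (1.0 - x)


def model_step(x):
    """Imperfect knowledge-based model: same form, wrong parameter r = 3.7."""
    return 3.7 * x * (1.0 - x)


def fit_residual(xs, residuals):
    """Least-squares fit of residual ~ a*x + b*x^2 via the 2x2 normal equations."""
    s11 = sum(x * x for x in xs)
    s12 = sum(x ** 3 for x in xs)
    s22 = sum(x ** 4 for x in xs)
    t1 = sum(x * r for x, r in zip(xs, residuals))
    t2 = sum(x * x * r for x, r in zip(xs, residuals))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Training data: a trajectory of the true system.
traj = [0.3]
for _ in range(200):
    traj.append(true_step(traj[-1]))
xs, targets = traj[:-1], traj[1:]

# The data-driven part learns only what the knowledge-based model gets wrong.
a, b = fit_residual(xs, [y - model_step(x) for x, y in zip(xs, targets)])


def hybrid_step(x):
    """Knowledge-based prediction plus the learned residual correction."""
    return model_step(x) + a * x + b * x * x


test_x = 0.618
model_err = abs(true_step(test_x) - model_step(test_x))
hybrid_err = abs(true_step(test_x) - hybrid_step(test_x))
print(model_err, hybrid_err)
```

    Here the model error happens to lie exactly in the span of the regression features, so the correction is almost perfect. Real systems are not that kind, which is one reason a more expressive learner such as a reservoir is used in the paper.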

  14. Approaches to Machine Learning.

    DTIC Science & Technology

    1984-02-16

    The field of machine learning strives to develop methods and techniques to automate the acquisition of new information, new skills, and new ways of organizing existing information. In this article, we review the major approaches to machine learning in symbolic domains, covering the tasks of learning concepts from examples, learning search methods, conceptual clustering, and language acquisition. We illustrate each of the basic approaches with paradigmatic examples. (Author)

  15. Current Developments in Machine Learning Techniques in Biological Data Mining.

    PubMed

    Dumancas, Gerard G; Adrianto, Indra; Bello, Ghalib; Dozmorov, Mikhail

    2017-01-01

    This supplement is intended to focus on the use of machine learning techniques to generate meaningful information on biological data. This supplement under Bioinformatics and Biology Insights aims to provide scientists and researchers working in this rapid and evolving field with online, open-access articles authored by leading international experts in this field. Advances in the field of biology have generated massive opportunities to allow the implementation of modern computational and statistical techniques. Machine learning methods in particular, a subfield of computer science, have evolved as an indispensable tool applied to a wide spectrum of bioinformatics applications. Thus, it is broadly used to investigate the underlying mechanisms leading to a specific disease, as well as the biomarker discovery process. With a growth in this specific area of science comes the need to access up-to-date, high-quality scholarly articles that will leverage the knowledge of scientists and researchers in the various applications of machine learning techniques in mining biological data.

  16. Agents Technology Research

    DTIC Science & Technology

    2010-02-01

    multi-agent reputation management. State abstraction is a technique used to allow machine learning technologies to cope with problems that have large...state abstraction process to enable reinforcement learning in domains with large state spaces. State abstraction is vital to machine learning...across a collective of independent platforms. These individual elements, often referred to as agents in the machine learning community, should exhibit both

  17. Large-Scale Machine Learning for Classification and Search

    ERIC Educational Resources Information Center

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  18. Machine Learning

    NASA Astrophysics Data System (ADS)

    Hoffmann, Achim; Mahidadia, Ashesh

    The purpose of this chapter is to present fundamental ideas and techniques of machine learning suitable for the field of this book, i.e., for automated scientific discovery. The chapter focuses on those symbolic machine learning methods that produce results suitable to be interpreted and understood by humans. This is particularly important in the context of automated scientific discovery as the scientific theories to be produced by machines are usually meant to be interpreted by humans. This chapter contains some of the most influential ideas and concepts in machine learning research to give the reader a basic insight into the field. After the introduction in Sect. 1, general ideas of how learning problems can be framed are given in Sect. 2. The section provides useful perspectives to better understand what learning algorithms actually do. Section 3 presents the version space model, which is an early learning algorithm as well as a conceptual framework that provides important insight into the general mechanisms behind most learning algorithms. In Sect. 4, a family of learning algorithms, the AQ family for learning classification rules, is presented. The AQ family belongs to the early approaches in machine learning. Next, Sect. 5 presents the basic principles of decision tree learners. Decision tree learners belong to the most influential class of inductive learning algorithms today. Finally, a more recent group of learning systems, which learn relational concepts within the framework of logic programming, is presented in Sect. 6. This is a particularly interesting group of learning systems since the framework also allows the incorporation of background knowledge, which may assist in generalisation. Section 7 discusses association rules, a technique that comes from the related field of data mining. Section 8 presents the basic idea of the Naive Bayesian Classifier.
While this is a very popular learning technique, the learning result is not well suited for human comprehension as it is essentially a large collection of probability values. In Sect. 9, we present a generic method for improving accuracy of a given learner by generating multiple classifiers using variations of the training data. While this works well in most cases, the resulting classifiers have significantly increased complexity and, hence, tend to destroy the human readability of the learning result that a single learner may produce. Section 10 contains a summary, briefly mentions other techniques not discussed in this chapter and presents an outlook on the potential of machine learning in the future.

  19. Concrete Condition Assessment Using Impact-Echo Method and Extreme Learning Machines

    PubMed Central

    Zhang, Jing-Kui; Yan, Weizhong; Cui, De-Mi

    2016-01-01

    The impact-echo (IE) method is a popular non-destructive testing (NDT) technique widely used for measuring the thickness of plate-like structures and for detecting certain defects inside concrete elements or structures. However, the IE method is not effective for full condition assessment (i.e., defect detection, defect diagnosis, defect sizing and location), because the simple frequency spectrum analysis involved in the existing IE method is not sufficient to capture the IE signal patterns associated with different conditions. In this paper, we attempt to enhance the IE technique and enable it for full condition assessment of concrete elements by introducing advanced machine learning techniques for performing comprehensive analysis and pattern recognition of IE signals. Specifically, we use wavelet decomposition for extracting signatures or features out of the raw IE signals and apply extreme learning machine, one of the recently developed machine learning techniques, as classification models for full condition assessment. To validate the capabilities of the proposed method, we build a number of specimens with various types, sizes, and locations of defects and perform IE testing on these specimens in a lab environment. Based on analysis of the collected IE signals using the proposed machine learning based IE method, we demonstrate that the proposed method is effective in performing full condition assessment of concrete elements or structures. PMID:27023563

  20. Introduction to the JASIST Special Topic Issue on Web Retrieval and Mining: A Machine Learning Perspective.

    ERIC Educational Resources Information Center

    Chen, Hsinchun

    2003-01-01

    Discusses information retrieval techniques used on the World Wide Web. Topics include machine learning in information extraction; relevance feedback; information filtering and recommendation; text classification and text clustering; Web mining, based on data mining techniques; hyperlink structure; and Web size. (LRW)

  1. The New Possibilities from "Big Data" to Overlooked Associations Between Diabetes, Biochemical Parameters, Glucose Control, and Osteoporosis.

    PubMed

    Kruse, Christian

    2018-06-01

    To review current practices and technologies within the scope of "Big Data" that can further our understanding of diabetes mellitus and osteoporosis from large volumes of data. "Big Data" techniques involving supervised machine learning, unsupervised machine learning, and deep learning image analysis are presented with examples of current literature. Supervised machine learning can allow us to better predict diabetes-induced osteoporosis and understand relative predictor importance of diabetes-affected bone tissue. Unsupervised machine learning can allow us to understand patterns in data between diabetic pathophysiology and altered bone metabolism. Image analysis using deep learning can allow us to be less dependent on surrogate predictors and use large volumes of images to classify diabetes-induced osteoporosis and predict future outcomes directly from images. "Big Data" techniques herald new possibilities to understand diabetes-induced osteoporosis and ascertain our current ability to classify, understand, and predict this condition.

  2. Enhancing understanding and improving prediction of severe weather through spatiotemporal relational learning.

    PubMed

    McGovern, Amy; Gagne, David J; Williams, John K; Brown, Rodger A; Basara, Jeffrey B

    Severe weather, including tornadoes, thunderstorms, wind, and hail, annually causes significant loss of life and property. We are developing spatiotemporal machine learning techniques that will enable meteorologists to improve the prediction of these events by improving their understanding of the fundamental causes of the phenomena and by building skillful empirical predictive models. In this paper, we present significant enhancements of our Spatiotemporal Relational Probability Trees that enable autonomous discovery of spatiotemporal relationships as well as learning with arbitrary shapes. We focus our evaluation on two real-world case studies using our technique: predicting tornadoes in Oklahoma and predicting aircraft turbulence in the United States. We also discuss how to evaluate success for a machine learning algorithm in the severe weather domain, which will enable new methods such as ours to transfer from research to operations, provide a set of lessons learned for embedded machine learning applications, and discuss how to field our technique.

  3. Machine learning for medical images analysis.

    PubMed

    Criminisi, A

    2016-10-01

    This article discusses the application of machine learning for the analysis of medical images. Specifically: (i) We show how a special type of learning models can be thought of as automatically optimized, hierarchically-structured, rule-based algorithms, and (ii) We discuss how the issue of collecting large labelled datasets applies to both conventional algorithms as well as machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods. Crown Copyright © 2016. Published by Elsevier B.V. All rights reserved.

  4. Imaging and machine learning techniques for diagnosis of Alzheimer's disease.

    PubMed

    Mirzaei, Golrokh; Adeli, Anahita; Adeli, Hojjat

    2016-12-01

    Alzheimer's disease (AD) is a common health problem in elderly people. There has been considerable research toward the diagnosis and early detection of this disease in the past decade. The sensitivity of biomarkers and the accuracy of the detection techniques have been defined to be the key to an accurate diagnosis. This paper presents a state-of-the-art review of the research performed on the diagnosis of AD based on imaging and machine learning techniques. Different segmentation and machine learning techniques used for the diagnosis of AD are reviewed including thresholding, supervised and unsupervised learning, probabilistic techniques, Atlas-based approaches, and fusion of different image modalities. More recent and powerful classification techniques such as the enhanced probabilistic neural network of Ahmadlou and Adeli should be investigated with the goal of improving the diagnosis accuracy. A combination of different image modalities can help improve the diagnosis accuracy rate. Research is needed on the combination of modalities to discover multi-modal biomarkers.

  5. Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques.

    PubMed

    Wang, Guanjin; Lam, Kin-Man; Deng, Zhaohong; Choi, Kup-Sze

    2015-08-01

    Bladder cancer is a common cancer in genitourinary malignancy. For muscle invasive bladder cancer, surgical removal of the bladder, i.e. radical cystectomy, is in general the definitive treatment which, unfortunately, carries significant morbidities and mortalities. Accurate prediction of the mortality of radical cystectomy is therefore needed. Statistical methods have conventionally been used for this purpose, despite the complex interactions of high-dimensional medical data. Machine learning has emerged as a promising technique for handling high-dimensional data, with increasing application in clinical decision support, e.g. cancer prediction and prognosis. Its ability to reveal the hidden nonlinear interactions and interpretable rules between dependent and independent variables is favorable for constructing models of effective generalization performance. In this paper, seven machine learning methods are utilized to predict the 5-year mortality of radical cystectomy, including back-propagation neural network (BPN), radial basis function (RBFN), extreme learning machine (ELM), regularized ELM (RELM), support vector machine (SVM), naive Bayes (NB) classifier and k-nearest neighbour (KNN), on a clinicopathological dataset of 117 patients of the urology unit of a hospital in Hong Kong. The experimental results indicate that RELM achieved the highest average prediction accuracy of 0.8 at a fast learning speed. The research findings demonstrate the potential of applying machine learning techniques to support clinical decision making. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Energy-free machine learning force field for aluminum.

    PubMed

    Kruglov, Ivan; Sergeev, Oleg; Yanilkin, Alexey; Oganov, Artem R

    2017-08-17

    We used the machine learning technique of Li et al. (PRL 114, 2015) for molecular dynamics simulations. Atomic configurations were described by feature matrix based on internal vectors, and linear regression was used as a learning technique. We implemented this approach in the LAMMPS code. The method was applied to crystalline and liquid aluminum and uranium at different temperatures and densities, and showed the highest accuracy among different published potentials. Phonon density of states, entropy and melting temperature of aluminum were calculated using this machine learning potential. The results are in excellent agreement with experimental data and results of full ab initio calculations.

  7. Survey of Machine Learning Methods for Database Security

    NASA Astrophysics Data System (ADS)

    Kamra, Ashish; Ber, Elisa

    Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.

  8. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies.

    PubMed

    Hansen, Katja; Montavon, Grégoire; Biegler, Franziska; Fazli, Siamac; Rupp, Matthias; Scheffler, Matthias; von Lilienfeld, O Anatole; Tkatchenko, Alexandre; Müller, Klaus-Robert

    2013-08-13

    The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given together with pitfalls and challenges when applying machine learning approaches to the prediction of quantum-mechanical observables.

  9. Classification of the Regional Ionospheric Disturbance Based on Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Terzi, Merve Begum; Arikan, Orhan; Karatay, Secil; Arikan, Feza; Gulyaeva, Tamara

    2016-08-01

    In this study, Total Electron Content (TEC) estimated from GPS receivers is used to model the regional and local variability that differs from global activity along with solar and geomagnetic indices. For the automated classification of regional disturbances, a classification technique based on a robust machine learning technique that has found widespread use, the Support Vector Machine (SVM), is proposed. The performance of the developed classification technique is demonstrated for the midlatitude ionosphere over Anatolia using TEC estimates generated from GPS data provided by the Turkish National Permanent GPS Network (TNPGN-Active) for the solar maximum year of 2011. By applying the developed classification technique to Global Ionospheric Map (GIM) TEC data, which is provided by the NASA Jet Propulsion Laboratory (JPL), it is shown that SVM can be a suitable learning method to detect anomalies in TEC variations.

  10. Cognitive learning: a machine learning approach for automatic process characterization from design

    NASA Astrophysics Data System (ADS)

    Foucher, J.; Baderot, J.; Martinez, S.; Dervilllé, A.; Bernard, G.

    2018-03-01

    Cutting edge innovation requires accurate and fast process-control to obtain fast learning rate and industry adoption. Current tools available for such task are mainly manual and user dependent. We present in this paper cognitive learning, which is a new machine learning based technique to facilitate and to speed up complex characterization by using the design as input, providing fast training and detection time. We will focus on the machine learning framework that allows object detection, defect traceability and automatic measurement tools.

  11. Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry Ford exercIse testing (FIT) project.

    PubMed

    Sakr, Sherif; Elshawi, Radwa; Ahmed, Amjad M; Qureshi, Waqas T; Brawner, Clinton A; Keteyian, Steven J; Blaha, Michael J; Al-Mallah, Mouaz H

    2017-12-19

    Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of this study is to present an evaluation and comparison of how machine learning techniques can be applied to medical records of cardiorespiratory fitness and how the various techniques differ in their capability to predict medical outcomes (e.g. mortality). We use data of 34,212 patients free of known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. Seven machine learning classification techniques were evaluated: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). To handle the imbalanced dataset, the Synthetic Minority Over-Sampling Technique (SMOTE) is used. Two sets of experiments were conducted, with and without the SMOTE sampling technique. On average over the different evaluation metrics, the SVM classifier showed the lowest performance, while other models such as BN, BC and DT performed better. The RF classifier showed the best performance (AUC = 0.97) among all models trained using SMOTE sampling. The results show that ML techniques can vary significantly in their performance on the different evaluation metrics. Moreover, a more complex ML model does not necessarily achieve higher prediction accuracy. The prediction performance of all models trained with SMOTE is much better than that of models trained without SMOTE. The study shows the potential of machine learning methods for predicting all-cause mortality using cardiorespiratory fitness data.
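
    The SMOTE step mentioned in this abstract creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified sketch of that idea, not the study's pipeline: it picks base samples at random instead of looping over every minority sample, and the function name, the value of k and the 2-D toy data are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating toward nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D training data: 4 minority points versus 12 majority points.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[float(i), float(i % 3)] for i in range(3, 15)]
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))
```

    Oversampling of this kind is normally applied to the training portion only. If synthetic points are generated before the data are split, interpolated near-copies of test records can end up in the training set and inflate the measured performance.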

  12. The Next Era: Deep Learning in Pharmaceutical Research.

    PubMed

    Ekins, Sean

    2016-11-01

    Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use from internet searches, voice recognition, social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but to predict a molecule's properties and behavior in future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernable edge in predictive performance. The time has come for a balanced review of this technique but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique.

  13. Machine learning in autistic spectrum disorder behavioral research: A review and ways forward.

    PubMed

    Thabtah, Fadi

    2018-02-13

    Autistic Spectrum Disorder (ASD) is a mental disorder that retards acquisition of linguistic, communication, cognitive, and social skills and abilities. Despite being diagnosed with ASD, some individuals exhibit outstanding scholastic, non-academic, and artistic capabilities, in such cases posing a challenging task for scientists to provide answers. In the last few years, ASD has been investigated by social and computational intelligence scientists utilizing advanced technologies such as machine learning to improve diagnostic timing, precision, and quality. Machine learning is a multidisciplinary research topic that employs intelligent techniques to discover useful concealed patterns, which are utilized in prediction to improve decision making. Machine learning techniques such as support vector machines, decision trees, logistic regressions, and others, have been applied to datasets related to autism in order to construct predictive models. These models claim to enhance the ability of clinicians to provide robust diagnoses and prognoses of ASD. However, studies concerning the use of machine learning in ASD diagnosis and treatment suffer from conceptual, implementation, and data issues such as the way diagnostic codes are used, the type of feature selection employed, the evaluation measures chosen, and class imbalances in data among others. A more serious claim in recent studies is the development of a new method for ASD diagnoses based on machine learning. This article critically analyses these recent investigative studies on autism, not only articulating the aforementioned issues in these studies but also recommending paths forward that enhance machine learning use in ASD with respect to conceptualization, implementation, and data. Future studies concerning machine learning in autism research are greatly benefitted by such proposals.

  14. Advancing Research in Second Language Writing through Computational Tools and Machine Learning Techniques: A Research Agenda

    ERIC Educational Resources Information Center

    Crossley, Scott A.

    2013-01-01

    This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…

  15. Machine Learning in the Presence of an Adversary: Attacking and Defending the SpamBayes Spam Filter

    DTIC Science & Technology

    2008-05-20

    Machine learning techniques are often used for decision making in security critical applications such as intrusion detection and spam filtering...filter. The defenses shown in this thesis are able to work against the attacks developed against SpamBayes and are sufficiently generic to be easily extended into other statistical machine learning algorithms.

  16. Advances in Machine Learning and Data Mining for Astronomy

    NASA Astrophysics Data System (ADS)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  17. Prediction of antiepileptic drug treatment outcomes using machine learning.

    PubMed

    Colic, Sinisa; Wither, Robert G; Lang, Min; Zhang, Liang; Eubanks, James H; Bardakjian, Berj L

    2017-02-01

    Antiepileptic drug (AED) treatments produce inconsistent outcomes, often necessitating patients to go through several drug trials until a successful treatment can be found. This study proposes the use of machine learning techniques to predict epilepsy treatment outcomes of commonly used AEDs. Machine learning algorithms were trained and evaluated using features obtained from intracranial electroencephalogram (iEEG) recordings of the epileptiform discharges observed in a Mecp2-deficient mouse model of Rett syndrome. Previous work has linked the presence of cross-frequency coupling (I_CFC) of the delta (2-5 Hz) rhythm with the fast ripple (400-600 Hz) rhythm in epileptiform discharges. Using the I_CFC to label post-treatment outcomes we compared support vector machines (SVMs) and random forest (RF) machine learning classifiers for providing likelihood scores of successful treatment outcomes. (a) There was heterogeneity in AED treatment outcomes, (b) machine learning techniques could be used to rank the efficacy of AEDs by estimating likelihood scores for successful treatment outcome, (c) I_CFC features yielded the most effective a priori identification of appropriate AED treatment, and (d) both classifiers performed comparably. Machine learning approaches yielded predictions of successful drug treatment outcomes which in turn could reduce the burdens of drug trials and lead to substantial improvements in patient quality of life.

  18. Prediction of antiepileptic drug treatment outcomes using machine learning

    NASA Astrophysics Data System (ADS)

    Colic, Sinisa; Wither, Robert G.; Lang, Min; Zhang, Liang; Eubanks, James H.; Bardakjian, Berj L.

    2017-02-01

    Objective. Antiepileptic drug (AED) treatments produce inconsistent outcomes, often necessitating patients to go through several drug trials until a successful treatment can be found. This study proposes the use of machine learning techniques to predict epilepsy treatment outcomes of commonly used AEDs. Approach. Machine learning algorithms were trained and evaluated using features obtained from intracranial electroencephalogram (iEEG) recordings of the epileptiform discharges observed in a Mecp2-deficient mouse model of Rett syndrome. Previous work has linked the presence of cross-frequency coupling (I_CFC) of the delta (2-5 Hz) rhythm with the fast ripple (400-600 Hz) rhythm in epileptiform discharges. Using the I_CFC to label post-treatment outcomes we compared support vector machines (SVMs) and random forest (RF) machine learning classifiers for providing likelihood scores of successful treatment outcomes. Main results. (a) There was heterogeneity in AED treatment outcomes, (b) machine learning techniques could be used to rank the efficacy of AEDs by estimating likelihood scores for successful treatment outcome, (c) I_CFC features yielded the most effective a priori identification of appropriate AED treatment, and (d) both classifiers performed comparably. Significance. Machine learning approaches yielded predictions of successful drug treatment outcomes which in turn could reduce the burdens of drug trials and lead to substantial improvements in patient quality of life.

  19. Implementing Machine Learning in Radiology Practice and Research.

    PubMed

    Kohli, Marc; Prevedello, Luciano M; Filice, Ross W; Geis, J Raymond

    2017-04-01

    The purposes of this article are to describe concepts that radiologists should understand to evaluate machine learning projects, including common algorithms, supervised as opposed to unsupervised techniques, statistical pitfalls, and data considerations for training and evaluation, and to briefly describe ethical dilemmas and legal risk. Machine learning includes a broad class of computer programs that improve with experience. The complexity of creating, training, and monitoring machine learning indicates that the success of the algorithms will require radiologist involvement for years to come, leading to engagement rather than replacement.

  20. Ryan King | NREL

    Science.gov Websites

    research focuses on optimization and machine learning applied to complex energy systems and turbulent flows techniques to improve wind plant design and controls and developed a new data-driven machine learning closure

  1. Machine learning and medicine: book review and commentary.

    PubMed

    Koprowski, Robert; Foster, Kenneth R

    2018-02-01

    This article is a review of the book "Master machine learning algorithms, discover how they work and implement them from scratch" (ISBN: not available, 37 USD, 163 pages) edited by Jason Brownlee published by the Author, edition, v1.10 http://MachineLearningMastery.com . An accompanying commentary discusses some of the issues involved in using machine learning and data mining techniques to develop predictive models for diagnosis or prognosis of disease, and calls attention to additional requirements for developing diagnostic and prognostic algorithms that are generally useful in medicine. An appendix provides examples that illustrate potential problems with machine learning that are not addressed in the reviewed book.

  2. Machine Learning and Radiology

    PubMed Central

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  3. The Next Era: Deep Learning in Pharmaceutical Research

    PubMed Central

    Ekins, Sean

    2016-01-01

    Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use from internet searches, voice recognition, social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but to predict a molecule’s properties and behavior in future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernable edge in predictive performance. The time has come for a balanced review of this technique but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique. PMID:27599991

  4. Comparing SVM and ANN based Machine Learning Methods for Species Identification of Food Contaminating Beetles.

    PubMed

    Bisgin, Halil; Bera, Tanmay; Ding, Hongjian; Semey, Howard G; Wu, Leihong; Liu, Zhichao; Barnes, Amy E; Langley, Darryl A; Pava-Ripoll, Monica; Vyas, Himansu J; Tong, Weida; Xu, Joshua

    2018-04-25

    Insect pests, such as pantry beetles, are often associated with food contaminations and public health risks. Machine learning has the potential to provide a more accurate and efficient solution in detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated such feasibility where Artificial Neural Network (ANN) based pattern recognition techniques could be implemented for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. Contrary to this, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus level identification, but SVM showed slightly better accuracy for most species. Highly accurate species level identification remains a challenge, especially in distinguishing between species from the same genus which may require improvements in both imaging and machine learning techniques. In summary, our work does illustrate a new SVM based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave a better way forward for the application of machine learning towards species identification and food safety.

  5. Classification of older adults with/without a fall history using machine learning methods.

    PubMed

    Zhang, Lin; Ma, Ou; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D

    2015-01-01

    Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.
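    The abstract above reports sensitivity and specificity for each classifier. As a minimal sketch of how those two measures are computed from predicted and true labels (the label vectors below are made up for illustration and are not data from the study):

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Sensitivity: fraction of true positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 1 = high falls risk, 0 = low falls risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

    Reporting both values, rather than accuracy alone, matters when the high-risk and low-risk groups differ in size.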

  6. Testing and Validating Machine Learning Classifiers by Metamorphic Testing☆

    PubMed Central

    Xie, Xiaoyuan; Ho, Joshua W. K.; Murphy, Christian; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2011-01-01

    Machine Learning algorithms have provided core functionality to many application domains - such as bioinformatics, computational linguistics, etc. However, it is difficult to detect faults in such applications because often there is no “test oracle” to verify the correctness of the computed outputs. To help address the software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms which support such applications. Our approach is based on the technique “metamorphic testing”, which has been shown to be effective in alleviating the oracle problem. We also present a case study on a real-world machine learning application framework and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method has high effectiveness in killing mutants, and that observing the expected cross-validation result alone is not sufficiently effective to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program. PMID:21532969
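    Metamorphic testing, as described in the abstract above, checks relations between the outputs of related runs instead of comparing against a known correct answer. The sketch below is illustrative only: the toy k-nearest-neighbour classifier, the data, and the three relations (reordering the training set, shifting all features by a constant, swapping attribute order) are assumptions chosen for the example, not necessarily the relations or programs studied in the paper.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Squared Euclidean distance on integer features keeps arithmetic exact,
    # and sorting (distance, label) pairs makes the result order-independent.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def predict_all(train_X, train_y, test_X, k=3):
    return [knn_predict(train_X, train_y, x, k) for x in test_X]


def shift(rows, delta=10):
    """Add the same constant to every feature."""
    return [tuple(v + delta for v in row) for row in rows]


def swap(rows):
    """Reverse the attribute order of every row."""
    return [tuple(reversed(row)) for row in rows]


# Hypothetical two-class data set.
train_X = [(1, 1), (2, 1), (1, 3), (6, 5), (7, 7), (6, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(2, 2), (6, 6), (4, 4)]

baseline = predict_all(train_X, train_y, test_X)

# MR1: reordering the training samples must not change the predictions.
order = [5, 3, 1, 4, 2, 0]
mr1 = predict_all([train_X[i] for i in order], [train_y[i] for i in order], test_X)

# MR2: shifting every feature by a constant preserves all distances.
mr2 = predict_all(shift(train_X), train_y, shift(test_X))

# MR3: permuting the attributes consistently preserves all distances.
mr3 = predict_all(swap(train_X), train_y, swap(test_X))

violations = [name for name, out in (("MR1", mr1), ("MR2", mr2), ("MR3", mr3))
              if out != baseline]
print("violated relations:", violations)
```

    No expected label is supplied for any test point; a fault is suspected only when a follow-up run disagrees with the baseline run.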

  7. A Hybrid Method for Opinion Finding Task (KUNLP at TREC 2008 Blog Track)

    DTIC Science & Technology

    2008-11-01

    retrieve relevant documents. For the Opinion Retrieval subtask, we propose a hybrid model of lexicon-based approach and machine learning approach for...estimating and ranking the opinionated documents. For the Polarized Opinion Retrieval subtask, we employ machine learning for predicting the polarity...and linear combination technique for ranking polar documents. The hybrid model which utilize both lexicon-based approach and machine learning approach

  8. Feasibility of Active Machine Learning for Multiclass Compound Classification.

    PubMed

    Lang, Tobias; Flachsenberg, Florian; von Luxburg, Ulrike; Rarey, Matthias

    2016-01-25

    A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
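    The iterative labelling loop described above can be sketched as follows. The abstract does not say how informative compounds are selected, so this example swaps in a common strategy, margin-based uncertainty sampling, with a toy nearest-centroid classifier. The 2-D points, the `oracle` function (standing in for the human expert or experiment) and all names are hypothetical, and the seed set is assumed to contain at least one example per class.

```python
import math


def centroids(labeled):
    """Mean feature vector per class from a {point: label} mapping."""
    sums, counts = {}, {}
    for (x, y), label in labeled.items():
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}


def margin(point, cents):
    """Gap between the two smallest centroid distances; small means uncertain."""
    d = sorted(math.dist(point, c) for c in cents.values())
    return d[1] - d[0]


def predict(point, cents):
    """Assign the class whose centroid is closest."""
    return min(cents, key=lambda c: math.dist(point, cents[c]))


def active_learn(pool, oracle, seed, budget):
    """Query the oracle for the most uncertain pool item, `budget` times."""
    labeled = {p: oracle(p) for p in seed}
    unlabeled = [p for p in pool if p not in labeled]
    for _ in range(budget):
        if not unlabeled:
            break
        cents = centroids(labeled)
        query = min(unlabeled, key=lambda p: margin(p, cents))
        labeled[query] = oracle(query)  # ask the expert for this label only
        unlabeled.remove(query)
    return labeled


def oracle(point):
    """Hypothetical expert: the true class depends on the first coordinate."""
    return "left" if point[0] < 5 else "right"


pool = [(0, 0), (1, 1), (2, 0), (4, 0), (6, 0), (8, 1), (9, 0), (10, 1)]
labeled = active_learn(pool, oracle, seed=[(0, 0), (10, 1)], budget=2)
final = centroids(labeled)
print("queried:", list(labeled)[2:])
print("all correct:", all(predict(p, final) == oracle(p) for p in pool))
```

    In this toy run the two queries fall on the points nearest the class boundary, so four labels out of eight suffice to classify the whole pool, which mirrors the label savings the abstract reports in spirit only.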

  9. Application of Machine Learning to Proteomics Data: Classification and Biomarker Identification in Postgenomics Biology

    PubMed Central

    Swan, Anna Louise; Mobasheri, Ali; Allaway, David; Liddell, Susan

    2013-01-01

    Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high throughput abilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways. First, directly to mass spectral peaks and second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also includes the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes. PMID:24116388

  10. Exploring Machine Learning Techniques Using Patient Interactions in Online Health Forums to Classify Drug Safety

    ERIC Educational Resources Information Center

    Chee, Brant Wah Kwong

    2011-01-01

    This dissertation explores the use of personal health messages collected from online message forums to predict drug safety using natural language processing and machine learning techniques. Drug safety is defined as any drug with an active safety alert from the US Food and Drug Administration (FDA). It is believed that this is the first…

  11. Application of Metamorphic Testing to Supervised Classifiers

    PubMed Central

    Xie, Xiaoyuan; Ho, Joshua; Kaiser, Gail; Xu, Baowen; Chen, Tsong Yueh

    2010-01-01

    Many applications in the field of scientific computing - such as computational biology, computational linguistics, and others - depend on Machine Learning algorithms to provide important core functionality to support solutions in the particular problem domains. However, it is difficult to test such applications because often there is no “test oracle” to indicate what the correct output should be for arbitrary input. To help address the quality of such software, in this paper we present a technique for testing the implementations of supervised machine learning classification algorithms on which such scientific computing software depends. Our technique is based on an approach called “metamorphic testing”, which has been shown to be effective in such cases. More importantly, we demonstrate that our technique not only serves the purpose of verification, but also can be applied in validation. In addition to presenting our technique, we describe a case study we performed on a real-world machine learning application framework, and discuss how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also discuss how our findings can be of use to other areas outside scientific computing, as well. PMID:21243103

  12. Machine-Learning Approach for Design of Nanomagnetic-Based Antennas

    NASA Astrophysics Data System (ADS)

    Gianfagna, Carmine; Yu, Huan; Swaminathan, Madhavan; Pulugurtha, Raj; Tummala, Rao; Antonini, Giulio

    2017-08-01

    We propose a machine-learning approach for design of planar inverted-F antennas with a magneto-dielectric nanocomposite substrate. It is shown that machine-learning techniques can be efficiently used to characterize nanomagnetic-based antennas by accurately mapping the particle radius and volume fraction of the nanomagnetic material to antenna parameters such as gain, bandwidth, radiation efficiency, and resonant frequency. A modified mixing rule model is also presented. In addition, the inverse problem is addressed through machine learning as well, where given the antenna parameters, the corresponding design space of possible material parameters is identified.

  13. Using Machine Learning for Behavior-Based Access Control: Scalable Anomaly Detection on TCP Connections and HTTP Requests

    DTIC Science & Technology

    2013-11-01

    machine learning techniques used in BBAC to make predictions about the intent of actors establishing TCP connections and issuing HTTP requests. We discuss pragmatic challenges and solutions we encountered in implementing and evaluating BBAC, discussing (a) the general concepts underlying BBAC, (b) challenges we have encountered in identifying suitable datasets, (c) mitigation strategies to cope...and describe current plans for transitioning BBAC capabilities into the Department of Defense together with lessons learned for the machine learning

  14. An experimental result of estimating an application volume by machine learning techniques.

    PubMed

    Hasegawa, Tatsuhito; Koshino, Makoto; Kimura, Haruhiko

    2015-01-01

    In this study, we improved the usability of smartphones by automating a user's operations. We developed an intelligent system using machine learning techniques that periodically detects a user's context on a smartphone. We selected the Android operating system because it has the largest market share and highest flexibility of its development environment. In this paper, we describe an application that automatically adjusts application volume. Adjusting the volume can be easily forgotten because users need to push the volume buttons to alter the volume depending on the given situation. Therefore, we developed an application that automatically adjusts the volume based on learned user settings. Application volume can be set differently from ringtone volume on Android devices, and these volume settings are associated with each specific application including games. Our application records a user's location, the volume setting, the foreground application name and other such attributes as learning data, thereby estimating whether the volume should be adjusted using machine learning techniques via Weka.

  15. Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review.

    PubMed

    Dallora, Ana Luiza; Eivazzadeh, Shahryar; Mendes, Emilia; Berglund, Johan; Anderberg, Peter

    2017-01-01

    Dementia is a complex disorder characterized by poor outcomes for the patients and high costs of care. After decades of research little is known about its mechanisms. Having prognostic estimates about dementia can help researchers, patients and public entities in dealing with this disorder. Thus, health data, machine learning and microsimulation techniques could be employed in developing prognostic estimates for dementia. The goal of this paper is to present evidence on the state of the art of studies investigating the prognosis of dementia using machine learning and microsimulation techniques. To achieve our goal we carried out a systematic literature review, in which three large databases (PubMed, Scopus and Web of Science) were searched to select studies that employed machine learning or microsimulation techniques for the prognosis of dementia. A single backward snowballing was done to identify further studies. A quality checklist was also employed to assess the quality of the evidence presented by the selected studies, and low quality studies were removed. Finally, data from the final set of studies were extracted in summary tables. In total 37 papers were included. The data summary results showed that the current research is focused on the investigation of the patients with mild cognitive impairment that will evolve to Alzheimer's disease, using machine learning techniques. Microsimulation studies were concerned with cost estimation and had a populational focus. Neuroimaging was the most commonly used variable. Prediction of conversion from MCI to AD is the dominant theme in the selected studies. Most studies used ML techniques on Neuroimaging data. Only a few data sources have been recruited by most studies and the ADNI database is the one most commonly used. Only two studies have investigated the prediction of epidemiological aspects of Dementia using either ML or MS techniques. Finally, care should be taken when interpreting the reported accuracy of ML techniques, given studies' different contexts.

  16. Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review

    PubMed Central

    Mendes, Emilia; Berglund, Johan; Anderberg, Peter

    2017-01-01

    Background. Dementia is a complex disorder characterized by poor outcomes for the patients and high costs of care. After decades of research little is known about its mechanisms. Having prognostic estimates about dementia can help researchers, patients and public entities in dealing with this disorder. Thus, health data, machine learning and microsimulation techniques could be employed in developing prognostic estimates for dementia. Objective. The goal of this paper is to present evidence on the state of the art of studies investigating the prognosis of dementia using machine learning and microsimulation techniques. Method. To achieve our goal we carried out a systematic literature review, in which three large databases (PubMed, Scopus and Web of Science) were searched to select studies that employed machine learning or microsimulation techniques for the prognosis of dementia. A single backward snowballing was done to identify further studies. A quality checklist was also employed to assess the quality of the evidence presented by the selected studies, and low quality studies were removed. Finally, data from the final set of studies were extracted in summary tables. Results. In total 37 papers were included. The data summary results showed that the current research is focused on the investigation of the patients with mild cognitive impairment that will evolve to Alzheimer’s disease, using machine learning techniques. Microsimulation studies were concerned with cost estimation and had a populational focus. Neuroimaging was the most commonly used variable. Conclusions. Prediction of conversion from MCI to AD is the dominant theme in the selected studies. Most studies used ML techniques on Neuroimaging data. Only a few data sources have been recruited by most studies and the ADNI database is the one most commonly used. Only two studies have investigated the prediction of epidemiological aspects of Dementia using either ML or MS techniques. Finally, care should be taken when interpreting the reported accuracy of ML techniques, given studies’ different contexts. PMID:28662070

  17. Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques

    NASA Astrophysics Data System (ADS)

    Karsi, Redouane; Zaim, Mounia; El Alami, Jamila

    2017-07-01

    Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it is becoming clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” has emerged to address the problem of automatically determining the polarity (positive, negative, neutral, …) of textual opinions. People expressing themselves in a particular domain often use domain-specific language expressions; thus, building a classifier that performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study three popular machine learning techniques, Support Vector Machines (SVM), Naive Bayes and K nearest neighbors (KNN), were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4.08.

  18. An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge

    ERIC Educational Resources Information Center

    Mivule, Kato

    2014-01-01

    The purpose of this investigation is to study and pursue a user-defined approach in preserving data privacy while maintaining an acceptable level of data utility using machine learning classification techniques as a gauge in the generation of synthetic data sets. This dissertation will deal with data privacy, data utility, machine learning…

  19. Machine learning molecular dynamics for the simulation of infrared spectra.

    PubMed

    Gastegger, Michael; Behler, Jörg; Marquetand, Philipp

    2017-10-01

    Machine learning has emerged as an invaluable tool in many research areas. In the present work, we harness this power to predict highly accurate molecular infrared spectra with unprecedented computational efficiency. To account for vibrational anharmonic and dynamical effects - typically neglected by conventional quantum chemistry approaches - we base our machine learning strategy on ab initio molecular dynamics simulations. While these simulations are usually extremely time consuming even for small molecules, we overcome these limitations by leveraging the power of a variety of machine learning techniques, not only accelerating simulations by several orders of magnitude, but also greatly extending the size of systems that can be treated. To this end, we develop a molecular dipole moment model based on environment dependent neural network charges and combine it with the neural network potential approach of Behler and Parrinello. Contrary to the prevalent big data philosophy, we are able to obtain very accurate machine learning models for the prediction of infrared spectra based on only a few hundred electronic structure reference points. This is made possible through the use of molecular forces during neural network potential training and the introduction of a fully automated sampling scheme. We demonstrate the power of our machine learning approach by applying it to model the infrared spectra of a methanol molecule, n-alkanes containing up to 200 atoms and the protonated alanine tripeptide, which at the same time represents the first application of machine learning techniques to simulate the dynamics of a peptide. In all of these case studies we find an excellent agreement between the infrared spectra predicted via machine learning models and the respective theoretical and experimental spectra.

  20. Towards large-scale FAME-based bacterial species identification using machine learning techniques.

    PubMed

    Slabbinck, Bram; De Baets, Bernard; Dawyndt, Peter; De Vos, Paul

    2009-05-01

    In the last decade, bacterial taxonomy witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests have resulted in sensitivity values of 0.847, 0.901 and 0.708, respectively. The random forests models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning proves very useful for FAME-based bacterial species identification. Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy.
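
    The sensitivity values reported in this abstract are per-class recall figures. As a minimal sketch of how such a value can be computed, assuming sensitivity = TP / (TP + FN) for each class, the function below counts correct predictions per true label. The labels are made up for illustration and are not data from the study.

```python
from collections import Counter

def per_class_sensitivity(true_labels, predicted_labels):
    """Return {label: TP / (TP + FN)} for every label present in true_labels."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Made-up genus labels, for illustration only
y_true = ["Bacillus", "Bacillus", "Pseudomonas", "Pseudomonas",
          "Pseudomonas", "Paenibacillus"]
y_pred = ["Bacillus", "Paenibacillus", "Pseudomonas", "Pseudomonas",
          "Bacillus", "Paenibacillus"]
sensitivity = per_class_sensitivity(y_true, y_pred)
# Bacillus: 1 of 2 correct; Pseudomonas: 2 of 3; Paenibacillus: 1 of 1
```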

  1. Machine Learning Techniques for Stellar Light Curve Classification

    NASA Astrophysics Data System (ADS)

    Hinners, Trisha A.; Tat, Kevin; Thorp, Rachel

    2018-07-01

    We apply machine learning techniques in an attempt to predict and classify stellar properties from noisy and sparse time-series data. We preprocessed over 94 GB of Kepler light curves from the Mikulski Archive for Space Telescopes (MAST) to classify according to 10 distinct physical properties using both representation learning and feature engineering approaches. Studies using machine learning in the field have been primarily done on simulated data, making our study one of the first to use real light-curve data for machine learning approaches. We tuned our data using previous work with simulated data as a template and achieved mixed results between the two approaches. Representation learning using a long short-term memory recurrent neural network produced no successful predictions, but our work with feature engineering was successful for both classification and regression. In particular, we were able to achieve values for stellar density, stellar radius, and effective temperature with low error (∼2%–4%) and good accuracy (∼75%) for classifying the number of transits for a given star. The results show promise for improvement for both approaches upon using larger data sets with a larger minority class. This work has the potential to provide a foundation for future tools and techniques to aid in the analysis of astrophysical data.

  2. Machine Learning and Inverse Problem in Geodynamics

    NASA Astrophysics Data System (ADS)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R.

    2017-12-01

    During the past few decades numerical modeling and traditional HPC have been widely deployed in many diverse fields for problem solutions. However, in recent years the rapid emergence of machine learning (ML), a subfield of artificial intelligence (AI), in many fields of science, engineering, and finance seems to mark a turning point in the replacement of traditional modeling procedures with artificial intelligence-based techniques. The study of the circulation in the interior of Earth relies on the study of high pressure mineral physics, geochemistry, and petrology where the number of the mantle parameters is large and the thermoelastic parameters are highly pressure- and temperature-dependent. More complexity arises from the fact that many of these parameters that are incorporated in the numerical models as input parameters are not yet well established. In such complex systems the application of machine learning algorithms can play a valuable role. Our focus in this study is the application of supervised machine learning (SML) algorithms in predicting mantle properties with the emphasis on SML techniques in solving the inverse problem. As a sample problem we focus on the spin transition in ferropericlase and perovskite that may cause slab and plume stagnation at mid-mantle depths. The degree of the stagnation depends on the degree of negative density anomaly at the spin transition zone. The training and testing samples for the machine learning models are produced by the numerical convection models with known magnitudes of density anomaly (as the class labels of the samples). The volume fractions of the stagnated slabs and plumes which can be considered as measures for the degree of stagnation are assigned as sample features. The machine learning models can determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at mid-mantle depths. Employing support vector machine (SVM) algorithms we show that SML techniques can successfully predict the magnitude of the mantle density anomalies and can also be used in characterizing mantle flow patterns. The technique can be extended to more complex problems in mantle dynamics by employing deep learning algorithms for estimation of mantle properties such as viscosity, elastic parameters, and thermal and chemical anomalies.

  3. Machine learning methods as a tool to analyse incomplete or irregularly sampled radon time series data.

    PubMed

    Janik, M; Bossew, P; Kurihara, O

    2018-07-15

    Machine learning is a class of statistical techniques which has proven to be a powerful tool for modelling the behaviour of complex systems, in which response quantities depend on assumed controls or predictors in a complicated way. In this paper, as our first purpose, we propose the application of machine learning to reconstruct incomplete or irregularly sampled data of time series indoor radon (²²²Rn). The physical assumption underlying the modelling is that Rn concentration in the air is controlled by environmental variables such as air temperature and pressure. The algorithms "learn" from complete sections of multivariate series, derive a dependence model and apply it to sections where the controls are available, but not the response (Rn), and in this way complete the Rn series. Three machine learning techniques are applied in this study, namely random forest, its extension called the gradient boosting machine and deep learning. For a comparison, we apply the classical multiple regression in a generalized linear model version. Performance of the models is evaluated through different metrics. The performance of the gradient boosting machine is found to be superior to that of the other techniques. By applying learning machines, we show, as our second purpose, that missing data or periods of Rn series data can be reconstructed and resampled on a regular grid reasonably, if data of appropriate physical controls are available. The techniques also identify to which degree the assumed controls contribute to imputing missing Rn values. Our third purpose, though no less important from the viewpoint of physics, is identifying to which degree physical, in this case environmental variables, are relevant as Rn predictors, or in other words, which predictors explain most of the temporal variability of Rn. We show that variables which contribute most to the Rn series reconstruction are temperature, relative humidity and day of the year. The first two are physical predictors, while "day of the year" is a statistical proxy or surrogate for missing or unknown predictors. Copyright © 2018 Elsevier B.V. All rights reserved.
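
    The reconstruction idea in this abstract (fit a dependence model on complete sections, then apply it where only the controls are known) can be sketched in a few lines. This sketch swaps in ordinary least squares on a single control in place of the random forest, gradient boosting and deep learning models the study uses; the function names and all numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def impute(controls, response):
    """Fill None entries in response using a model fitted on the complete rows."""
    complete = [(x, y) for x, y in zip(controls, response) if y is not None]
    slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [y if y is not None else slope * x + intercept
            for x, y in zip(controls, response)]

# Made-up temperature control and radon response with two gaps
temperature = [10.0, 12.0, 14.0, 16.0, 18.0]
radon = [50.0, None, 42.0, None, 34.0]
filled = impute(temperature, radon)
```

    Observed values are passed through unchanged; only the gaps are replaced by model predictions.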

  4. Machine learning and radiology.

    PubMed

    Wang, Shijun; Summers, Ronald M

    2012-07-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.

  5. Taxi-Out Time Prediction for Departures at Charlotte Airport Using Machine Learning Techniques

    NASA Technical Reports Server (NTRS)

    Lee, Hanbong; Malik, Waqar; Jung, Yoon C.

    2016-01-01

    Predicting the taxi-out times of departures accurately is important for improving airport efficiency and takeoff time predictability. In this paper, we attempt to apply machine learning techniques to actual traffic data at Charlotte Douglas International Airport for taxi-out time prediction. To find the key factors affecting aircraft taxi times, surface surveillance data is first analyzed. From this data analysis, several variables, including terminal concourse, spot, runway, departure fix and weight class, are selected for taxi time prediction. Then, various machine learning methods such as linear regression, support vector machines, k-nearest neighbors, random forest, and neural networks model are applied to actual flight data. Different traffic flow and weather conditions at Charlotte airport are also taken into account for more accurate prediction. The taxi-out time prediction results show that linear regression and random forest techniques can provide the most accurate prediction in terms of root-mean-square errors. We also discuss the operational complexity and uncertainties that make it difficult to predict the taxi times accurately.
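
    The model comparison in this abstract is made in terms of root-mean-square error. As a minimal sketch of that comparison, the snippet below scores two sets of predictions against the same observed values and picks the lower RMSE. The model names and all taxi-out times are made up for illustration.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up taxi-out times in minutes
actual = [12.0, 15.0, 20.0, 9.0]
predictions = {
    "model_a": [11.0, 16.0, 18.0, 10.0],
    "model_b": [14.0, 12.0, 25.0, 9.0],
}
scores = {name: rmse(actual, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # name of the model with the lowest RMSE
```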

  6. Exploration of Machine Learning Approaches to Predict Pavement Performance

    DOT National Transportation Integrated Search

    2018-03-23

    Machine learning (ML) techniques were used to model and predict pavement condition index (PCI) for various pavement types using a variety of input variables. The primary objective of this research was to develop and assess PCI predictive models for t...

  7. Position Paper: Applying Machine Learning to Software Analysis to Achieve Trusted, Repeatable Scientific Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prowell, Stacy J; Symons, Christopher T

    2015-01-01

    Producing trusted results from high-performance codes is essential for policy and has significant economic impact. We propose combining rigorous analytical methods with machine learning techniques to achieve the goal of repeatable, trustworthy scientific computing.

  8. Using deep learning for content-based medical image retrieval

    NASA Astrophysics Data System (ADS)

    Sun, Qinpei; Yang, Yuanyuan; Sun, Jianyong; Yang, Zhiming; Zhang, Jianguo

    2017-03-01

    Content-based medical image retrieval (CBMIR) has been a highly active research area for the past few years. The retrieval performance of a CBMIR system crucially depends on the feature representation, which has been extensively studied by researchers for decades. Although a variety of techniques have been proposed, it remains one of the most challenging problems in current CBMIR research, mainly due to the well-known "semantic gap" that exists between low-level image pixels captured by machines and high-level semantic concepts perceived by humans [1]. Recent years have witnessed some important advances in machine learning techniques. One important breakthrough technique is known as "deep learning". Unlike conventional machine learning methods, which often use "shallow" architectures, deep learning mimics the human brain, which is organized in a deep architecture and processes information through multiple stages of transformation and representation. This means that we do not need to spend enormous effort extracting features manually. In this presentation, we propose a novel framework that uses deep learning to retrieve medical images, improving the accuracy and speed of CBIR in an integrated RIS/PACS.

  9. Machine Learning for Medical Imaging

    PubMed Central

    Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L.

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. ©RSNA, 2017 PMID:28212054

  10. Machine Learning for Medical Imaging.

    PubMed

    Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L

    2017-01-01

    Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. © RSNA, 2017.

  11. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach

    PubMed Central

    Kudisthalert, Wasu

    2018-01-01

    Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network in a conjunction of 16 different similarity coefficients as activation function in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets–Maximum Unbiased Validation Dataset–which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6. PMID:29652912
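
    The similarity coefficients mentioned in this abstract operate on binary fingerprints. As a minimal sketch, the functions below compute the Tanimoto coefficient and one commonly cited form of the Sokal/Sneath(1) coefficient, a / (a + 2b + 2c), where a counts bits set in both fingerprints and b and c count bits set in only one. Whether the paper uses exactly this form is an assumption, and the fingerprints are made-up sets of "on" bit positions.

```python
def bit_counts(fp_a, fp_b):
    """Return (a, b, c): shared bits, bits only in fp_a, bits only in fp_b."""
    return len(fp_a & fp_b), len(fp_a - fp_b), len(fp_b - fp_a)

def tanimoto(fp_a, fp_b):
    a, b, c = bit_counts(fp_a, fp_b)
    total = a + b + c
    return a / total if total else 0.0

def sokal_sneath_1(fp_a, fp_b):
    # Assumed definition: a / (a + 2b + 2c); mismatches are weighted double
    a, b, c = bit_counts(fp_a, fp_b)
    total = a + 2 * b + 2 * c
    return a / total if total else 0.0

# Made-up fingerprints given as sets of "on" bit positions
query = {1, 4, 7, 9}
candidate = {1, 4, 8}
scores = (tanimoto(query, candidate), sokal_sneath_1(query, candidate))
```

    Because mismatched bits count twice, this form of Sokal/Sneath(1) never exceeds the Tanimoto value for the same pair.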

  12. Multiple-Swarm Ensembles: Improving the Predictive Power and Robustness of Predictive Models and Its Use in Computational Biology.

    PubMed

    Alves, Pedro; Liu, Shuang; Wang, Daifeng; Gerstein, Mark

    2018-01-01

    Machine learning is an integral part of computational biology, and has already shown its use in various applications, such as prognostic tests. In the last few years in the non-biological machine learning community, ensembling techniques have shown their power in data mining competitions such as the Netflix challenge; however, such methods have not found wide use in computational biology. In this work, we endeavor to show how ensembling techniques can be applied to practical problems, including problems in the field of bioinformatics, and how they often outperform other machine learning techniques in both predictive power and robustness. Furthermore, we develop a methodology of ensembling, Multi-Swarm Ensemble (MSWE) by using multiple particle swarm optimizations and demonstrate its ability to further enhance the performance of ensembles.

  13. On the Conditioning of Machine-Learning-Assisted Turbulence Modeling

    NASA Astrophysics Data System (ADS)

    Wu, Jinlong; Sun, Rui; Wang, Qiqi; Xiao, Heng

    2017-11-01

    Recently, several researchers have demonstrated that machine learning techniques can be used to improve the RANS modeled Reynolds stress by training on available databases of high fidelity simulations. However, obtaining an improved mean velocity field remains an unsolved challenge, restricting the predictive capability of current machine-learning-assisted turbulence modeling approaches. In this work we define a condition number to evaluate the model conditioning of data-driven turbulence modeling approaches, and propose a stability-oriented machine learning framework to model Reynolds stress. Two canonical flows, the flow in a square duct and the flow over periodic hills, are investigated to demonstrate the predictive capability of the proposed framework. The satisfactory prediction performance of the mean velocity field for both flows demonstrates the predictive capability of the proposed framework for machine-learning-assisted turbulence modeling. By showing the capability of improving the prediction of the mean flow field, the proposed stability-oriented machine learning framework bridges the gap between the existing machine-learning-assisted turbulence modeling approaches and the demand for predictive capability of turbulence models in real applications.

  14. Biomarkers for Musculoskeletal Pain Conditions: Use of Brain Imaging and Machine Learning.

    PubMed

    Boissoneault, Jeff; Sevel, Landrew; Letzen, Janelle; Robinson, Michael; Staud, Roland

    2017-01-01

    Chronic musculoskeletal pain conditions often show poor correlations between tissue abnormalities and clinical pain. Therefore, classification of pain conditions like chronic low back pain, osteoarthritis, and fibromyalgia depends mostly on self-report and less on objective findings like X-ray or magnetic resonance imaging (MRI) changes. However, recent advances in structural and functional brain imaging have identified brain abnormalities in chronic pain conditions that can be used for illness classification. Because the analysis of complex and multivariate brain imaging data is challenging, machine learning techniques have been increasingly utilized for this purpose. The goal of machine learning is to train specific classifiers to best identify variables of interest on brain MRIs (i.e., biomarkers). This report describes classification techniques capable of separating MRI-based brain biomarkers of chronic pain patients from healthy controls with high accuracy (70-92%) using machine learning, as well as critical scientific, practical, and ethical considerations related to their potential clinical application. Although self-report remains the gold standard for pain assessment, machine learning may aid in the classification of chronic pain disorders like chronic back pain and fibromyalgia as well as provide mechanistic information regarding their neural correlates.

  15. Logic Learning Machine and standard supervised methods for Hodgkin's lymphoma prognosis using gene expression data and clinical variables.

    PubMed

    Parodi, Stefano; Manneschi, Chiara; Verda, Damiano; Ferrari, Enrico; Muselli, Marco

    2018-03-01

    This study evaluates the performance of a set of machine learning techniques in predicting the prognosis of Hodgkin's lymphoma using clinical factors and gene expression data. Analysed samples from 130 Hodgkin's lymphoma patients included a small set of clinical variables and more than 54,000 gene features. Machine learning classifiers included three black-box algorithms (k-nearest neighbour, Artificial Neural Network, and Support Vector Machine) and two methods based on intelligible rules (Decision Tree and the innovative Logic Learning Machine method). Support Vector Machine clearly outperformed any of the other methods. Among the two rule-based algorithms, Logic Learning Machine performed better and identified a set of simple intelligible rules based on a combination of clinical variables and gene expressions. Decision Tree identified a non-coding gene (XIST) involved in the early phases of X chromosome inactivation that was overexpressed in females and in non-relapsed patients. XIST expression might be responsible for the better prognosis of female Hodgkin's lymphoma patients.

  16. Machine learning for autonomous crystal structure identification.

    PubMed

    Reinhart, Wesley F; Long, Andrew W; Howard, Michael P; Ferguson, Andrew L; Panagiotopoulos, Athanassios Z

    2017-07-21

    We present a machine learning technique to discover and distinguish relevant ordered structures from molecular simulation snapshots or particle tracking data. Unlike other popular methods for structural identification, our technique requires no a priori description of the target structures. Instead, we use nonlinear manifold learning to infer structural relationships between particles according to the topology of their local environment. This graph-based approach yields unbiased structural information which allows us to quantify the crystalline character of particles near defects, grain boundaries, and interfaces. We demonstrate the method by classifying particles in a simulation of colloidal crystallization, and show that our method identifies structural features that are missed by standard techniques.

  17. Machine learning and data science in soft materials engineering

    NASA Astrophysics Data System (ADS)

    Ferguson, Andrew L.

    2018-01-01

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by ‘de-jargonizing’ data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  18. Machine learning and data science in soft materials engineering.

    PubMed

    Ferguson, Andrew L

    2018-01-31

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by 'de-jargonizing' data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  19. Designing a holistic end-to-end intelligent network analysis and security platform

    NASA Astrophysics Data System (ADS)

    Alzahrani, M.

    2018-03-01

    A firewall protects a network from outside attacks; however, once an attack has entered the network, it is difficult to detect. Significant incidents have occurred recently: for example, millions of Yahoo email accounts were stolen, and crucial data from institutions has been held for ransom. For two years, Yahoo's system administrators were not aware that there were intruders inside the network. This happened due to the lack of intelligent tools to monitor user behaviour in the internal network. This paper discusses the design of an intelligent anomaly/malware detection system with proper proactive actions. The aim is to equip the system administrator with a proper tool to battle insider attackers. The proposed system adopts machine learning to analyse users' behaviour through the runtime behaviour of each node in the network. The machine learning techniques include deep learning, evolving machine learning perceptron, a hybrid of Neural Network and Fuzzy, as well as predictive memory techniques. The proposed system is expanded to deal with larger networks using agent techniques.

  20. Comparison of Automated and Manual Recording of Brief Episodes of Intracranial Hypertension and Cerebral Hypoperfusion and Their Association with Outcome After Severe Traumatic Brain Injury

    DTIC Science & Technology

    2017-03-01

    neuro ICP care beyond trauma care. 15. SUBJECT TERMS Advanced machine learning techniques, intracranial pressure, vital signs, monitoring...death and disability in combat casualties [1,2]. Approximately 2 million head injuries occur annually in the United States, resulting in more than...editor. Machine learning and data mining in pattern recognition. Proceedings of the 8th International Workshop on Machine Learning and Data Mining in

  1. Machine Learning for Biological Trajectory Classification Applications

    NASA Technical Reports Server (NTRS)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms are compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.
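
    The abstract does not say which signal-detection measures were used. One common choice is the sensitivity index d', the difference between the z-transformed hit rate and false-alarm rate, and the minimal sketch below assumes that measure. The rates are made up, and d_prime is a hypothetical helper name.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Made-up rates for one classifier and one human rater
classifier_d = d_prime(0.9, 0.1)
human_d = d_prime(0.5, 0.5)  # chance-level discrimination gives d' = 0
```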

  2. Multi-centre diagnostic classification of individual structural neuroimaging scans from patients with major depressive disorder.

    PubMed

    Mwangi, Benson; Ebmeier, Klaus P; Matthews, Keith; Steele, J Douglas

    2012-05-01

    Quantitative abnormalities of brain structure in patients with major depressive disorder have been reported at a group level for decades. However, these structural differences appear subtle in comparison with conventional radiologically defined abnormalities, with considerable inter-subject variability. Consequently, it has not been possible to readily identify scans from patients with major depressive disorder at an individual level. Recently, machine learning techniques such as relevance vector machines and support vector machines have been applied to predictive classification of individual scans with variable success. Here we describe a novel hybrid method, which combines machine learning with feature selection and characterization, with the latter aimed at maximizing the accuracy of machine learning prediction. The method was tested using a multi-centre dataset of T(1)-weighted 'structural' scans. A total of 62 patients with major depressive disorder and matched controls were recruited from referred secondary care clinical populations in Aberdeen and Edinburgh, UK. The generalization ability and predictive accuracy of the classifiers was tested using data left out of the training process. High prediction accuracy was achieved (~90%). While feature selection was important for maximizing high predictive accuracy with machine learning, feature characterization contributed only a modest improvement to relevance vector machine-based prediction (~5%). Notably, while the only information provided for training the classifiers was T(1)-weighted scans plus a categorical label (major depressive disorder versus controls), both relevance vector machine and support vector machine 'weighting factors' (used for making predictions) correlated strongly with subjective ratings of illness severity. These results indicate that machine learning techniques have the potential to inform clinical practice and research, as they can make accurate predictions about brain scan data from individual subjects. Furthermore, machine learning weighting factors may reflect an objective biomarker of major depressive disorder illness severity, based on abnormalities of brain structure.

  3. Energy landscapes for machine learning

    NASA Astrophysics Data System (ADS)

    Ballard, Andrew J.; Das, Ritankar; Martiniani, Stefano; Mehta, Dhagash; Sagun, Levent; Stevenson, Jacob D.; Wales, David J.

    Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to gain new insight into the solution space involved in training and the nature of the corresponding predictions. In particular, we can define quantities analogous to molecular structure, thermodynamics, and kinetics, and relate these emergent properties to the structure of the underlying landscape. This Perspective aims to describe these analogies with examples from recent applications, and suggest avenues for new interdisciplinary research.

  4. Machine Learning Toolkit for Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-03-31

    Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large scale systems, which include commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed for improved speed and memory space usage, including adaptive and aggressive elimination of samples for faster convergence, and sparse format representation of data samples. Several heuristics, ranging from earliest-possible to lazy elimination of non-contributing samples, are considered in MaTEx. In many cases, where an early sample elimination might result in a false positive, low overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.

  5. Using Machine Learning Techniques in the Analysis of Oceanographic Data

    NASA Astrophysics Data System (ADS)

    Falcinelli, K. E.; Abuomar, S.

    2017-12-01

    Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.
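    As an illustration of one technique named above, the following is a minimal sketch of the two alternating update steps of standard fuzzy c-means clustering. It is not the authors' code: the function names, the fuzzifier default m=2.0, and the toy points are assumptions made for this example, and only the Python standard library is used.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centres: share membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # Standard update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Update each centre as the mean of the points weighted by membership**m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


# Toy usage: three points on a line, two initial centres (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0)]
centers = [(0.0, 0.0), (1.0, 0.0)]
u = fcm_memberships(points, centers)
new_centers = fcm_centers(points, u)
```

    In practice the two steps are repeated until the centres stop moving, and the number of clusters is chosen with a validity measure such as those the abstract mentions.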

  6. Feature-Free Activity Classification of Inertial Sensor Data With Machine Vision Techniques: Method, Development, and Evaluation.

    PubMed

    Dominguez Veiga, Jose Juan; O'Reilly, Martin; Whelan, Darragh; Caulfield, Brian; Ward, Tomas E

    2017-08-04

    Inertial sensors are one of the most commonly used sources of data for human activity recognition (HAR) and exercise detection (ED) tasks. The time series produced by these sensors are generally analyzed through numerical methods. Machine learning techniques such as random forests or support vector machines are popular in this field for classification efforts, but they need to be supported through the isolation of a potentially large number of additionally crafted features derived from the raw data. This feature preprocessing step can involve nontrivial digital signal processing (DSP) techniques. However, in many cases, the researchers interested in this type of activity recognition problems do not possess the necessary technical background for this feature-set development. The study aimed to present a novel application of established machine vision methods to provide interested researchers with an easier entry path into the HAR and ED fields. This can be achieved by removing the need for deep DSP skills through the use of transfer learning. This can be done by using a pretrained convolutional neural network (CNN) developed for machine vision purposes for exercise classification effort. The new method should simply require researchers to generate plots of the signals that they would like to build classifiers with, store them as images, and then place them in folders according to their training label before retraining the network. We applied a CNN, an established machine vision technique, to the task of ED. Tensorflow, a high-level framework for machine learning, was used to facilitate infrastructure needs. Simple time series plots generated directly from accelerometer and gyroscope signals are used to retrain an openly available neural network (Inception), originally developed for machine vision tasks. Data from 82 healthy volunteers, performing 5 different exercises while wearing a lumbar-worn inertial measurement unit (IMU), was collected. 
The ability of the proposed method to automatically classify the exercise being completed was assessed using this dataset. For comparative purposes, classification using the same dataset was also performed using the more conventional approach of feature-extraction and classification using random forest classifiers. With the collected dataset and the proposed method, the different exercises could be recognized with a 95.89% (3827/3991) accuracy, which is competitive with current state-of-the-art techniques in ED. The high level of accuracy attained with the proposed approach indicates that the waveform morphologies in the time-series plots for each of the exercises is sufficiently distinct among the participants to allow the use of machine vision approaches. The use of high-level machine learning frameworks, coupled with the novel use of machine vision techniques instead of complex manually crafted features, may facilitate access to research in the HAR field for individuals without extensive digital signal processing or machine learning backgrounds. ©Jose Juan Dominguez Veiga, Martin O'Reilly, Darragh Whelan, Brian Caulfield, Tomas E Ward. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 04.08.2017.

  7. Feature-Free Activity Classification of Inertial Sensor Data With Machine Vision Techniques: Method, Development, and Evaluation

    PubMed Central

    O'Reilly, Martin; Whelan, Darragh; Caulfield, Brian; Ward, Tomas E

    2017-01-01

    Background Inertial sensors are one of the most commonly used sources of data for human activity recognition (HAR) and exercise detection (ED) tasks. The time series produced by these sensors are generally analyzed through numerical methods. Machine learning techniques such as random forests or support vector machines are popular in this field for classification efforts, but they need to be supported through the isolation of a potentially large number of additionally crafted features derived from the raw data. This feature preprocessing step can involve nontrivial digital signal processing (DSP) techniques. However, in many cases, the researchers interested in this type of activity recognition problems do not possess the necessary technical background for this feature-set development. Objective The study aimed to present a novel application of established machine vision methods to provide interested researchers with an easier entry path into the HAR and ED fields. This can be achieved by removing the need for deep DSP skills through the use of transfer learning. This can be done by using a pretrained convolutional neural network (CNN) developed for machine vision purposes for exercise classification effort. The new method should simply require researchers to generate plots of the signals that they would like to build classifiers with, store them as images, and then place them in folders according to their training label before retraining the network. Methods We applied a CNN, an established machine vision technique, to the task of ED. Tensorflow, a high-level framework for machine learning, was used to facilitate infrastructure needs. Simple time series plots generated directly from accelerometer and gyroscope signals are used to retrain an openly available neural network (Inception), originally developed for machine vision tasks. 
Data from 82 healthy volunteers, performing 5 different exercises while wearing a lumbar-worn inertial measurement unit (IMU), was collected. The ability of the proposed method to automatically classify the exercise being completed was assessed using this dataset. For comparative purposes, classification using the same dataset was also performed using the more conventional approach of feature-extraction and classification using random forest classifiers. Results With the collected dataset and the proposed method, the different exercises could be recognized with a 95.89% (3827/3991) accuracy, which is competitive with current state-of-the-art techniques in ED. Conclusions The high level of accuracy attained with the proposed approach indicates that the waveform morphologies in the time-series plots for each of the exercises is sufficiently distinct among the participants to allow the use of machine vision approaches. The use of high-level machine learning frameworks, coupled with the novel use of machine vision techniques instead of complex manually crafted features, may facilitate access to research in the HAR field for individuals without extensive digital signal processing or machine learning backgrounds. PMID:28778851

  8. Learning About Climate and Atmospheric Models Through Machine Learning

    NASA Astrophysics Data System (ADS)

    Lucas, D. D.

    2017-12-01

    From the analysis of ensemble variability to improving simulation performance, machine learning algorithms can play a powerful role in understanding the behavior of atmospheric and climate models. To learn about model behavior, we create training and testing data sets through ensemble techniques that sample different model configurations and values of input parameters, and then use supervised machine learning to map the relationships between the inputs and outputs. Following this procedure, we have used support vector machines, random forests, gradient boosting and other methods to investigate a variety of atmospheric and climate model phenomena. We have used machine learning to predict simulation crashes, estimate the probability density function of climate sensitivity, optimize simulations of the Madden Julian oscillation, assess the impacts of weather and emissions uncertainty on atmospheric dispersion, and quantify the effects of model resolution changes on precipitation. This presentation highlights recent examples of our applications of machine learning to improve the understanding of climate and atmospheric models. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  9. Linear- and Repetitive Feature Detection Within Remotely Sensed Imagery

    DTIC Science & Technology

    2017-04-01

    applicable to Python or other programming languages with image-processing capabilities. 4.1 Classification machine learning The first methodology uses...remotely sensed images that are in panchromatic or true-color formats. Image-processing techniques, including Hough transforms, machine learning, and...data fusion ... 6.3 Context-based processing

  10. An automatic taxonomy of galaxy morphology using unsupervised machine learning

    NASA Astrophysics Data System (ADS)

    Hocking, Alex; Geach, James E.; Sun, Yi; Davey, Neil

    2018-01-01

    We present an unsupervised machine learning technique that automatically segments and labels galaxies in astronomical imaging surveys using only pixel data. Distinct from previous unsupervised machine learning approaches used in astronomy we use no pre-selection or pre-filtering of target galaxy type to identify galaxies that are similar. We demonstrate the technique on the Hubble Space Telescope (HST) Frontier Fields. By training the algorithm using galaxies from one field (Abell 2744) and applying the result to another (MACS 0416.1-2403), we show how the algorithm can cleanly separate early and late type galaxies without any form of pre-directed training for what an 'early' or 'late' type galaxy is. We then apply the technique to the HST Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) fields, creating a catalogue of approximately 60 000 classifications. We show how the automatic classification groups galaxies of similar morphological (and photometric) type and make the classifications public via a catalogue, a visual catalogue and galaxy similarity search. We compare the CANDELS machine-based classifications to human-classifications from the Galaxy Zoo: CANDELS project. Although there is not a direct mapping between Galaxy Zoo and our hierarchical labelling, we demonstrate a good level of concordance between human and machine classifications. Finally, we show how the technique can be used to identify rarer objects and present lensed galaxy candidates from the CANDELS imaging.

  11. Machine learning in cardiovascular medicine: are we there yet?

    PubMed

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  12. Applications of Support Vector Machines In Chemo And Bioinformatics

    NASA Astrophysics Data System (ADS)

    Jayaraman, V. K.; Sundararajan, V.

    2010-10-01

    Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.

  13. Machine Learning Prediction of the Energy Gap of Graphene Nanoflakes Using Topological Autocorrelation Vectors.

    PubMed

    Fernandez, Michael; Abreu, Jose I; Shi, Hongqing; Barnard, Amanda S

    2016-11-14

    The possibility of band gap engineering in graphene opens countless new opportunities for application in nanoelectronics. In this work, the energy gaps of 622 computationally optimized graphene nanoflakes were mapped to topological autocorrelation vectors using machine learning techniques. Machine learning modeling revealed that the most relevant correlations appear at topological distances in the range of 1 to 42 with prediction accuracy higher than 80%. The data-driven model can statistically discriminate between graphene nanoflakes with different energy gaps on the basis of their molecular topology.

  14. Experimental Machine Learning of Quantum States

    NASA Astrophysics Data System (ADS)

    Gao, Jun; Qiao, Lu-Feng; Jiao, Zhi-Qiang; Ma, Yue-Chi; Hu, Cheng-Qiu; Ren, Ruo-Jing; Yang, Ai-Lin; Tang, Hao; Yung, Man-Hong; Jin, Xian-Min

    2018-06-01

    Quantum information technologies provide promising applications in communication and computation, while machine learning has become a powerful technique for extracting meaningful structures in "big data." A crossover between quantum information and machine learning represents a new interdisciplinary area stimulating progress in both fields. Traditionally, a quantum state is characterized by quantum-state tomography, which is a resource-consuming process when scaled up. Here we experimentally demonstrate a machine-learning approach to construct a quantum-state classifier for identifying the separability of quantum states. We show that it is possible to experimentally train an artificial neural network to efficiently learn and classify quantum states, without the need of obtaining the full information of the states. We also show how adding a hidden layer of neurons to the neural network can significantly boost the performance of the state classifier. These results shed new light on how classification of quantum states can be achieved with limited resources, and represent a step towards machine-learning-based applications in quantum information processing.

  15. An integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features.

    PubMed

    Nandi, Sutanu; Subramanian, Abhishek; Sarkar, Ram Rup

    2017-07-25

    Prediction of essential genes helps to identify a minimal set of genes that are absolutely required for the appropriate functioning and survival of a cell. The available machine learning techniques for essential gene prediction have inherent problems, like imbalanced provision of training datasets, biased choice of the best model for a given balanced dataset, choice of a complex machine learning algorithm, and data-based automated selection of biologically relevant features for classification. Here, we propose a simple support vector machine-based learning strategy for the prediction of essential genes in Escherichia coli K-12 MG1655 metabolism that integrates a non-conventional combination of an appropriate sample balanced training set, a unique organism-specific genotype, phenotype attributes that characterize essential genes, and optimal parameters of the learning algorithm to generate the best machine learning model (the model with the highest accuracy among all the models trained for different sample training sets). For the first time, we also introduce flux-coupled metabolic subnetwork-based features for enhancing the classification performance. Our strategy proves to be superior as compared to previous SVM-based strategies in obtaining a biologically relevant classification of genes with high sensitivity and specificity. This methodology was also trained with datasets of other recent supervised classification techniques for essential gene classification and tested using reported test datasets. The testing accuracy was always high as compared to the known techniques, proving that our method outperforms known methods. Observations from our study indicate that essential genes are conserved among homologous bacterial species, demonstrate high codon usage bias, GC content and gene expression, and predominantly possess a tendency to form physiological flux modules in metabolism.

  16. Applying machine learning classification techniques to automate sky object cataloguing

    NASA Astrophysics Data System (ADS)

    Fayyad, Usama M.; Doyle, Richard J.; Weir, W. Nick; Djorgovski, Stanislav

    1993-08-01

    We describe the application of Artificial Intelligence machine learning techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Mt. Palomar Northern Sky Survey is nearly completed. This survey provides comprehensive coverage of the northern celestial hemisphere in the form of photographic plates. The plates are being transformed into digitized images whose quality will probably not be surpassed in the next ten to twenty years. The images are expected to contain on the order of 10^7 galaxies and 10^8 stars. Astronomers wish to determine which of these sky objects belong to various classes of galaxies and stars. Unfortunately, the size of this data set precludes analysis in an exclusively manual fashion. Our approach is to develop a software system which integrates the functions of independently developed techniques for image processing and data classification. Digitized sky images are passed through image processing routines to identify sky objects and to extract a set of features for each object. These routines are used to help select a useful set of attributes for classifying sky objects. Then GID3 (Generalized ID3) and O-B Tree, two inductive learning techniques, learn classification decision trees from examples. These classifiers will then be applied to new data. The development process is highly interactive, with astronomer input playing a vital role. Astronomers refine the feature set used to construct sky object descriptions, and evaluate the performance of the automated classification technique on new data. This paper gives an overview of the machine learning techniques with an emphasis on their general applicability, describes the details of our specific application, and reports the initial encouraging results. The results indicate that our machine learning approach is well-suited to the problem. The primary benefit of the approach is increased data reduction throughput.
Another benefit is consistency of classification. The classification rules which are the product of the inductive learning techniques will form an objective, examinable basis for classifying sky objects. A final, not to be underestimated benefit is that astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems based on automatically catalogued data.

  17. Machine Learning Based Evaluation of Reading and Writing Difficulties.

    PubMed

    Iwabuchi, Mamoru; Hirabayashi, Rumi; Nakamura, Kenryu; Dim, Nem Khan

    2017-01-01

    The possibility of automatically evaluating reading and writing difficulties was investigated using a non-parametric machine learning (ML) regression technique on URAWSS (Understanding Reading and Writing Skills of Schoolchildren) [1] test data from 168 children in grades 1-9. The results showed that the ML approach predicted better than the ordinary rule-based decision.

  18. Acquiring Software Design Schemas: A Machine Learning Perspective

    NASA Technical Reports Server (NTRS)

    Harandi, Mehdi T.; Lee, Hing-Yan

    1991-01-01

    In this paper, we describe an approach based on machine learning that acquires software design schemas from design cases of existing applications. An overview of the technique, design representation, and acquisition system are presented. The paper also addresses issues associated with generalizing common features such as biases. The generalization process is illustrated using an example.

  19. Episode forecasting in bipolar disorder: Is energy better than mood?

    PubMed

    Ortiz, Abigail; Bradler, Kamil; Hintze, Arend

    2018-01-22

    Bipolar disorder is a severe mood disorder characterized by alternating episodes of mania and depression. Several interventions have been developed to decrease the high admission rates and high suicide rates associated with the illness, including psychoeducation and early episode detection, with mixed results. More recently, machine learning approaches have been used to aid clinical diagnosis or to detect a particular clinical state; however, contradictory results arise from confusion around which of the several automatically generated data are the most contributory and useful to detect a particular clinical state. Our aim for this study was to apply machine learning techniques and nonlinear analyses to a physiological time series dataset in order to find the best predictor for forecasting episodes in mood disorders. We employed three different techniques: entropy calculations and two different machine learning approaches (genetic programming and Markov Brains as classifiers) to determine whether mood, energy or sleep was the best predictor to forecast a mood episode in a physiological time series. Evening energy was the best predictor for both manic and depressive episodes in each of the three aforementioned techniques. This suggests that energy might be a better predictor than mood for forecasting mood episodes in bipolar disorder and that these particular machine learning approaches are valuable tools to be used clinically. Energy should be considered as an important factor for episode prediction. Machine learning approaches provide better tools to forecast episodes and to increase our understanding of the processes that underlie mood regulation. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  20. Cardiac imaging: working towards fully-automated machine analysis & interpretation.

    PubMed

    Slomka, Piotr J; Dey, Damini; Sitek, Arkadiusz; Motwani, Manish; Berman, Daniel S; Germano, Guido

    2017-03-01

    Non-invasive imaging plays a critical role in managing patients with cardiovascular disease. Although subjective visual interpretation remains the clinical mainstay, quantitative analysis facilitates objective, evidence-based management, and advances in clinical research. This has driven developments in computing and software tools aimed at achieving fully automated image processing and quantitative analysis. In parallel, machine learning techniques have been used to rapidly integrate large amounts of clinical and quantitative imaging data to provide highly personalized individual patient-based conclusions. Areas covered: This review summarizes recent advances in automated quantitative imaging in cardiology and describes the latest techniques which incorporate machine learning principles. The review focuses on the cardiac imaging techniques which are in wide clinical use. It also discusses key issues and obstacles for these tools to become utilized in mainstream clinical practice. Expert commentary: Fully-automated processing and high-level computer interpretation of cardiac imaging are becoming a reality. Application of machine learning to the vast amounts of quantitative data generated per scan and integration with clinical data also facilitates a move to more patient-specific interpretation. These developments are unlikely to replace interpreting physicians but will provide them with highly accurate tools to detect disease, risk-stratify, and optimize patient-specific treatment. However, with each technological advance, we move further from human dependence and closer to fully-automated machine interpretation.

  1. Image analysis and machine learning for detecting malaria.

    PubMed

    Poostchi, Mahdieh; Silamut, Kamolrat; Maude, Richard J; Jaeger, Stefan; Thoma, George

    2018-04-01

    Malaria remains a major burden on global health, with roughly 200 million cases worldwide and more than 400,000 deaths per year. Besides biomedical research and political efforts, modern information technology is playing a key role in many attempts at fighting the disease. One of the barriers toward a successful mortality reduction has been inadequate malaria diagnosis in particular. To improve diagnosis, image analysis software and machine learning methods have been used to quantify parasitemia in microscopic blood slides. This article gives an overview of these techniques and discusses the current developments in image analysis and machine learning for microscopic malaria diagnosis. We organize the different approaches published in the literature according to the techniques used for imaging, image preprocessing, parasite detection and cell segmentation, feature computation, and automatic cell classification. Readers will find the different techniques listed in tables, with the relevant articles cited next to them, for both thin and thick blood smear images. We also discussed the latest developments in sections devoted to deep learning and smartphone technology for future malaria diagnosis. Published by Elsevier Inc.

  2. Applications of Deep Learning and Reinforcement Learning to Biological Data.

    PubMed

    Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano

    2018-06-01

    Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promises to revolutionize the future of artificial intelligence. The growth in computational power, accompanied by faster and increased data storage and declining computing costs, has already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.

  3. On the Safety of Machine Learning: Cyber-Physical Systems, Decision Sciences, and Data Products.

    PubMed

    Varshney, Kush R; Alemzadeh, Homa

    2017-09-01

    Machine learning algorithms increasingly influence our decisions and interact with us in all parts of our daily lives. Therefore, just as we consider the safety of power plants, highways, and a variety of other engineered socio-technical systems, we must also take into account the safety of systems involving machine learning. Heretofore, the definition of safety has not been formalized in a machine learning context. In this article, we do so by defining machine learning safety in terms of risk, epistemic uncertainty, and the harm incurred by unwanted outcomes. We then use this definition to examine safety in all sorts of applications in cyber-physical systems, decision sciences, and data products. We find that the foundational principle of modern statistical machine learning, empirical risk minimization, is not always a sufficient objective. We discuss how four different categories of strategies for achieving safety in engineering, including inherently safe design, safety reserves, safe fail, and procedural safeguards can be mapped to a machine learning context. We then discuss example techniques that can be adopted in each category, such as considering interpretability and causality of predictive models, objective functions beyond expected prediction accuracy, human involvement for labeling difficult or rare examples, and user experience design of software and open data.

  4. Stochastic subset selection for learning with kernel machines.

    PubMed

    Rhinelander, Jason; Liu, Xiaoping P

    2012-06-01

    Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super linearly in computational effort with the number of training samples and is often used for the offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm works by using a stochastic indexing technique when selecting a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.
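
    The idea of evaluating a kernel expansion over a randomly chosen subset of support vectors can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the RBF kernel, the uniform sampling without replacement, the n/m rescaling, and all function and parameter names below are assumptions made purely for illustration.

```python
import math
import random


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_expansion(x, support_vectors, weights, subset_size=None, rng=None):
    """Evaluate f(x) = sum_i w_i * k(sv_i, x), optionally on a random subset.

    With subset_size=None every support vector is used. Otherwise
    subset_size indices are drawn uniformly without replacement and the
    partial sum is rescaled by n / subset_size (an illustrative choice),
    so each query costs O(subset_size) kernel evaluations instead of O(n).
    """
    n = len(support_vectors)
    if subset_size is None or subset_size >= n:
        indices = range(n)
        scale = 1.0
    else:
        rng = rng or random.Random()
        indices = rng.sample(range(n), subset_size)
        scale = n / subset_size
    return scale * sum(
        weights[i] * rbf_kernel(support_vectors[i], x) for i in indices
    )


# Hypothetical toy data: three one-dimensional support vectors.
svs = [(0.0,), (1.0,), (2.0,)]
w = [1.0, -0.5, 0.25]
full_value = kernel_expansion((0.5,), svs, w)
approx_value = kernel_expansion((0.5,), svs, w, subset_size=2, rng=random.Random(0))
```

    Because the subset is chosen independently of how the weights were trained, a selection step of this kind can in principle be paired with any online training rule, which is the separation the abstract emphasizes.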

  5. Machine learning of molecular properties: Locality and active learning

    NASA Astrophysics Data System (ADS)

    Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.

    2018-06-01

    In recent years, machine learning techniques have shown great potential in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on the other hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach chemical accuracy and also show large errors for the so-called outliers, the out-of-sample molecules that are not well represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.

  6. Machine-learning techniques for fast and accurate feature localization in holograms of colloidal particles

    NASA Astrophysics Data System (ADS)

    Hannel, Mark D.; Abdulali, Aidan; O'Brien, Michael; Grier, David G.

    2018-06-01

    Holograms of colloidal particles can be analyzed with the Lorenz-Mie theory of light scattering to measure individual particles' three-dimensional positions with nanometer precision while simultaneously estimating their sizes and refractive indexes. Extracting this wealth of information begins by detecting and localizing features of interest within individual holograms. Conventionally approached with heuristic algorithms, this image analysis problem can be solved faster and more generally with machine-learning techniques. We demonstrate that two popular machine-learning algorithms, cascade classifiers and deep convolutional neural networks (CNN), can solve the feature-localization problem orders of magnitude faster than current state-of-the-art techniques. Our CNN implementation localizes holographic features precisely enough to bootstrap more detailed analyses based on the Lorenz-Mie theory of light scattering. The wavelet-based Haar cascade proves to be less precise, but is so computationally efficient that it creates new opportunities for applications that emphasize speed and low cost. We demonstrate its use as a real-time targeting system for holographic optical trapping.

  7. Genome scale enzyme–metabolite and drug–target interaction predictions using the signature molecular descriptor

    DOE PAGES

    Faulon, Jean-Loup; Misra, Milind; Martin, Shawn; ...

    2007-11-23

    Motivation: Identifying protein enzymatic or pharmacological activities is an important area of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. Additionally, there is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein–chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformatics representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Lastly, such predictions cannot be made with current machine-learning techniques requiring binding information for individual reactions or individual targets.

  8. Imaging nanoscale lattice variations by machine learning of x-ray diffraction microscopy data

    DOE PAGES

    Laanait, Nouamane; Zhang, Zhan; Schlepütz, Christian M.

    2016-08-09

    In this paper, we present a novel methodology based on machine learning to extract lattice variations in crystalline materials, at the nanoscale, from an x-ray Bragg diffraction-based imaging technique. By employing a full-field microscopy setup, we capture real space images of materials, with imaging contrast determined solely by the x-ray diffracted signal. The data sets that emanate from this imaging technique are a hybrid of real space information (image spatial support) and reciprocal lattice space information (image contrast), and are intrinsically multidimensional (5D). By a judicious application of established unsupervised machine learning techniques and multivariate analysis to this multidimensional data cube, we show how to extract features that can be ascribed physical interpretations in terms of common structural distortions, such as lattice tilts and dislocation arrays. Finally, we demonstrate this 'big data' approach to x-ray diffraction microscopy by identifying structural defects present in an epitaxial ferroelectric thin-film of lead zirconate titanate.

  9. Imaging nanoscale lattice variations by machine learning of x-ray diffraction microscopy data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Laanait, Nouamane; Zhang, Zhan; Schlepütz, Christian M.

    In this paper, we present a novel methodology based on machine learning to extract lattice variations in crystalline materials, at the nanoscale, from an x-ray Bragg diffraction-based imaging technique. By employing a full-field microscopy setup, we capture real space images of materials, with imaging contrast determined solely by the x-ray diffracted signal. The data sets that emanate from this imaging technique are a hybrid of real space information (image spatial support) and reciprocal lattice space information (image contrast), and are intrinsically multidimensional (5D). By a judicious application of established unsupervised machine learning techniques and multivariate analysis to this multidimensional data cube, we show how to extract features that can be ascribed physical interpretations in terms of common structural distortions, such as lattice tilts and dislocation arrays. Finally, we demonstrate this 'big data' approach to x-ray diffraction microscopy by identifying structural defects present in an epitaxial ferroelectric thin-film of lead zirconate titanate.

  10. Machine-learned and codified synthesis parameters of oxide materials

    NASA Astrophysics Data System (ADS)

    Kim, Edward; Huang, Kevin; Tomala, Alex; Matthews, Sara; Strubell, Emma; Saunders, Adam; McCallum, Andrew; Olivetti, Elsa

    2017-09-01

    Predictive materials design has rapidly accelerated in recent years with the advent of large-scale resources, such as materials structure and property databases generated by ab initio computations. In the absence of analogous ab initio frameworks for materials synthesis, high-throughput and machine learning techniques have recently been harnessed to generate synthesis strategies for select materials of interest. Still, a community-accessible, autonomously-compiled synthesis planning resource which spans across materials systems has not yet been developed. In this work, we present a collection of aggregated synthesis parameters computed using the text contained within over 640,000 journal articles using state-of-the-art natural language processing and machine learning techniques. We provide a dataset of synthesis parameters, compiled autonomously across 30 different oxide systems, in a format optimized for planning novel syntheses of materials.

  11. Radio Frequency Interference Detection using Machine Learning.

    NASA Astrophysics Data System (ADS)

    Mosiane, Olorato; Oozeer, Nadeem; Aniyan, Arun; Bassett, Bruce A.

    2017-05-01

    Radio frequency interference (RFI) has plagued radio astronomy, and the problem may be as bad or worse by the time the Square Kilometre Array (SKA) comes online. RFI can be either internal (generated by instruments) or external, originating from intentional or unintentional radio emission generated by man. With the huge amount of data that will be available from upcoming radio telescopes, an automated approach will be required to detect RFI. In this paper, to try to automate this process, we present the results of applying machine learning techniques to cross match RFI from the Karoo Array Telescope (KAT-7) data. We found that not all the features selected to characterise RFI are always important. We further investigated 3 machine learning techniques and conclude that the random forest classifier performs with a 98% Area Under Curve and 91% recall in detecting RFI.

  12. Semantics of User Interface for Image Retrieval: Possibility Theory and Learning Techniques.

    ERIC Educational Resources Information Center

    Crehange, M.; And Others

    1989-01-01

    Discusses the need for a rich semantics for the user interface in interactive image retrieval and presents two methods for building such interfaces: possibility theory applied to fuzzy data retrieval, and a machine learning technique applied to learning the user's deep need. Prototypes developed using videodisks and knowledge-based software are…

  13. Signature Verification Using N-tuple Learning Machine.

    PubMed

    Maneechot, Thanin; Kitjaidure, Yuttana

    2005-01-01

    This research presents a new algorithm for signature verification using an N-tuple learning machine. The features are taken from handwritten signatures captured on-line on a digital tablet. The recognition algorithm extracts four features, namely horizontal and vertical pen tip position (x-y position), pen tip pressure, and pen altitude angles. Verification uses the N-tuple technique with Gaussian thresholding.

  14. Machine learning techniques for fault isolation and sensor placement

    NASA Technical Reports Server (NTRS)

    Carnes, James R.; Fisher, Douglas H.

    1993-01-01

    Fault isolation and sensor placement are vital for monitoring and diagnosis. A sensor conveys information about a system's state that guides troubleshooting if problems arise. We are using machine learning methods to uncover behavioral patterns over snapshots of system simulations that will aid fault isolation and sensor placement, with an eye towards minimality, fault coverage, and noise tolerance.

  15. A Machine Learning Concept for DTN Routing

    NASA Technical Reports Server (NTRS)

    Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos

    2017-01-01

    This paper discusses the concept and architecture of a machine learning based router for delay tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification is given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, initial simulation setup and results are given.
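
    For readers unfamiliar with Q-routing, the standard update rule can be sketched in a few lines of Python. Each node keeps an estimate of the delivery time to a destination via each neighbor and nudges it toward the observed queueing and transmission delay plus the neighbor's own best estimate. The table layout, node names, and learning rate below are illustrative assumptions and are not taken from the paper's router architecture.

```python
def q_routing_update(q_table, node, dest, neighbor,
                     queue_delay, link_delay, learning_rate=0.5):
    """Apply one Q-routing update and return the new estimate.

    q_table[node][dest][neighbor] holds the estimated time for a packet
    at `node` to reach `dest` when forwarded through `neighbor`.
    """
    if neighbor == dest:
        remaining = 0.0  # the packet arrives once it reaches the neighbor
    else:
        # The neighbor reports its own best estimate of the remaining time.
        remaining = min(q_table[neighbor][dest].values())
    target = queue_delay + link_delay + remaining
    old = q_table[node][dest][neighbor]
    q_table[node][dest][neighbor] = old + learning_rate * (target - old)
    return q_table[node][dest][neighbor]


def best_next_hop(q_table, node, dest):
    """Pick the neighbor with the smallest estimated delivery time."""
    estimates = q_table[node][dest]
    return min(estimates, key=estimates.get)


# Hypothetical four-node network: A can reach D through B or C.
table = {
    "A": {"D": {"B": 10.0, "C": 12.0}},
    "B": {"D": {"D": 4.0}},
    "C": {"D": {"D": 3.0}},
}
updated = q_routing_update(table, "A", "D", "B", queue_delay=1.0, link_delay=2.0)
```

    In this toy run the estimate for the route A to D via B moves from 10.0 halfway toward the observed target of 7.0.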

  16. Special Machines; Apparel Manufacturing: 9377.10.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL.

    This course allows students who are interested in careers in apparel manufacturing to learn the techniques for operating the various types of special machines used for finishing garments professionally and for specialty work. Course content includes goals, specific objectives, orientation, safety practices, special machines, assembling a child's…

  17. Space Weather in the Machine Learning Era: A Multidisciplinary Approach

    NASA Astrophysics Data System (ADS)

    Camporeale, E.; Wing, S.; Johnson, J.; Jackman, C. M.; McGranaghan, R.

    2018-01-01

    The workshop entitled Space Weather: A Multidisciplinary Approach took place at the Lorentz Center, University of Leiden, Netherlands, on 25-29 September 2017. The aim of this workshop was to bring together members of the Space Weather, Mathematics, Statistics, and Computer Science communities to address the use of advanced techniques such as Machine Learning, Information Theory, and Deep Learning, to better understand the Sun-Earth system and to improve space weather forecasting. Although individual efforts have been made toward this goal, the community consensus is that establishing interdisciplinary collaborations is the most promising strategy for fully utilizing the potential of these advanced techniques in solving Space Weather-related problems.

  18. Classification of highly unbalanced CYP450 data of drugs using cost sensitive machine learning techniques.

    PubMed

    Eitrich, T; Kless, A; Druska, C; Meyer, W; Grotendorst, J

    2007-01-01

    In this paper, we study the classifications of unbalanced data sets of drugs. As an example we chose a data set of 2D6 inhibitors of cytochrome P450. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs in the preclinical drug discovery process. We have collected a data set from annotated public data and calculated physicochemical properties with chemoinformatics methods. On top of this data, we have built classifiers based on machine learning methods. Data sets with different class distributions lead to the effect that conventional machine learning methods are biased toward the larger class. To overcome this problem and to obtain sensitive but also accurate classifiers we combine machine learning and feature selection methods with techniques addressing the problem of unbalanced classification, such as oversampling and threshold moving. We have used our own implementation of a support vector machine algorithm as well as the maximum entropy method. Our feature selection is based on the unsupervised McCabe method. The classification results from our test set are compared structurally with compounds from the training set. We show that the applied algorithms enable the effective high throughput in silico classification of potential drug candidates.
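
    The two imbalance remedies named in the abstract, oversampling and threshold moving, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the binary 0/1 labels, the toy scores, and all function names are assumptions for illustration only.

```python
import random


def random_oversample(samples, labels, minority_label, rng=None):
    """Duplicate randomly chosen minority samples until both classes match in size.

    Assumes a binary problem in which `minority_label` is the smaller class.
    """
    rng = rng or random.Random(0)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority_count = len(labels) - len(minority)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    return list(samples) + extra, list(labels) + [minority_label] * len(extra)


def predict_with_threshold(scores, threshold=0.5):
    """Threshold moving: lowering the threshold favours the positive class."""
    return [1 if s >= threshold else 0 for s in scores]


def recall(predictions, labels, positive=1):
    """Fraction of true positives that were predicted positive."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == positive and y == positive)
    fn = sum(1 for p, y in zip(predictions, labels) if p != positive and y == positive)
    return tp / (tp + fn)


# Hypothetical classifier scores for six majority (0) and two minority (1) compounds.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.35, 0.70]
recall_default = recall(predict_with_threshold(toy_scores, 0.5), toy_labels)
recall_moved = recall(predict_with_threshold(toy_scores, 0.3), toy_labels)
balanced_x, balanced_y = random_oversample([[0], [1], [2], [3], [4]], [0, 0, 0, 0, 1], 1)
```

    In the toy data, moving the threshold from 0.5 to 0.3 recovers the second minority example at the cost of two additional false positives, which is the sensitivity versus accuracy trade-off the abstract refers to.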

  19. Modeling Geomagnetic Variations using a Machine Learning Framework

    NASA Astrophysics Data System (ADS)

    Cheung, C. M. M.; Handmer, C.; Kosar, B.; Gerules, G.; Poduval, B.; Mackintosh, G.; Munoz-Jaramillo, A.; Bobra, M.; Hernandez, T.; McGranaghan, R. M.

    2017-12-01

    We present a framework for data-driven modeling of Heliophysics time series data. The Solar Terrestrial Interaction Neural net Generator (STING) is an open source python module built on top of state-of-the-art statistical learning frameworks (traditional machine learning methods as well as deep learning). To showcase the capability of STING, we deploy it for the problem of predicting the temporal variation of geomagnetic fields. The data used includes solar wind measurements from the OMNI database and geomagnetic field data taken by magnetometers at US Geological Survey observatories. We examine the predictive capability of different machine learning techniques (recurrent neural networks, support vector machines) for a range of forecasting times (minutes to 12 hours). STING is designed to be extensible to other types of data. We show how STING can be used on large sets of data from different sensors/observatories and adapted to tackle other problems in Heliophysics.

  20. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning.

    PubMed

    Formisano, Elia; De Martino, Federico; Valente, Giancarlo

    2008-09-01

    Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow detecting subtle, non-strictly localized effects that may remain invisible to the conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are respectively suited for the classification and regression of fMRI patterns. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.

  1. Applications of machine learning in cancer prediction and prognosis.

    PubMed

    Cruz, Joseph A; Wishart, David S

    2007-02-11

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to "learn" from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is also helping to improve our basic understanding of cancer development and progression.

  2. Live animal assessments of rump fat and muscle score in Angus cows and steers using 3-dimensional imaging.

    PubMed

    McPhee, M J; Walmsley, B J; Skinner, B; Littler, B; Siddell, J P; Cafe, L M; Wilkins, J F; Oddy, V H; Alempijevic, A

    2017-04-01

    The objective of this study was to develop a proof of concept for using off-the-shelf Red Green Blue-Depth (RGB-D) Microsoft Kinect cameras to objectively assess P8 rump fat (P8 fat; mm) and muscle score (MS) traits in Angus cows and steers. Data from low and high muscled cattle (156 cows and 79 steers) were collected at multiple locations and time points. The following steps were required for the 3-dimensional (3D) image data and subsequent machine learning techniques to learn the traits: 1) reduce the high dimensionality of the point cloud data by extracting features from the input signals to produce a compact and representative feature vector, 2) perform global optimization of the signatures using machine learning algorithms and a parallel genetic algorithm, and 3) train a sensor model using regression-supervised learning techniques on the ultrasound P8 fat and the classified learning techniques for the assessed MS for each animal in the data set. The correlation of estimating hip height (cm) between visually measured and assessed 3D data from RGB-D cameras on cows and steers was 0.75 and 0.90, respectively. The supervised machine learning and global optimization approach correctly classified MS (mean [SD]) 80% [4.7%] and 83% [6.6%] for cows and steers, respectively. Kappa tests of MS were 0.74 and 0.79 in cows and steers, respectively, indicating substantial agreement between visual assessment and the learning approaches of RGB-D camera images. A stratified 10-fold cross-validation for P8 fat did not find any differences in the mean bias ( = 0.62 and = 0.42 for cows and steers, respectively). The root mean square error of P8 fat was 1.54 and 1.00 mm for cows and steers, respectively. Additional data is required to strengthen the capacity of machine learning to estimate measured P8 fat and assessed MS. Data sets for and continental cattle are also required to broaden the use of 3D cameras to assess cattle. The results demonstrate the importance of capturing curvature as a form of representing body shape. A data-driven model from shape to trait has established a proof of concept using optimized machine learning techniques to assess P8 fat and MS in Angus cows and steers.
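
    The kappa values quoted above measure agreement beyond chance between two sets of categorical scores. A minimal Python sketch of Cohen's kappa follows, assuming that this is the variant of the statistic intended; the muscle-score labels in the example are made up for illustration and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same class independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical visual scores versus model-assigned scores for four animals.
visual = ["high", "high", "low", "low"]
model = ["high", "low", "low", "low"]
kappa = cohen_kappa(visual, model)
```

    Here the raters agree on three of four animals (0.75) while chance agreement is 0.5, giving a kappa of 0.5.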

  3. One-Class Classification-Based Real-Time Activity Error Detection in Smart Homes.

    PubMed

    Das, Barnan; Cook, Diane J; Krishnan, Narayanan C; Schmitter-Edgecombe, Maureen

    2016-08-01

    Caring for individuals with dementia is frequently associated with extreme physical and emotional stress, which often leads to depression. Smart home technology and advances in machine learning techniques can provide innovative solutions to reduce caregiver burden. One key service that caregivers provide is prompting individuals with memory limitations to initiate and complete daily activities. We hypothesize that sensor technologies combined with machine learning techniques can automate the process of providing reminder-based interventions. The first step towards automated interventions is to detect when an individual faces difficulty with activities. We propose machine learning approaches based on one-class classification that learn normal activity patterns. When we apply these classifiers to activity patterns that were not seen before, the classifiers are able to detect activity errors, which represent potential prompt situations. We validate our approaches on smart home sensor data obtained from older adult participants, some of whom faced difficulties performing routine activities and thus committed errors.
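
    The one-class idea, learning only from normal activity and flagging anything that deviates from it, can be illustrated with a deliberately simple Python sketch. The per-feature z-score rule stands in for the one-class classifiers used in the study and is an assumption, as are the two features (activity duration and number of sensor events) and the threshold.

```python
import statistics


class ZScoreOneClass:
    """Toy one-class detector: fit on normal samples only, flag large deviations."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.means = []
        self.stdevs = []

    def fit(self, normal_samples):
        """Learn the mean and standard deviation of each feature."""
        columns = list(zip(*normal_samples))
        self.means = [statistics.mean(c) for c in columns]
        # Fall back to 1.0 when a feature never varies, to avoid dividing by zero.
        self.stdevs = [statistics.stdev(c) or 1.0 for c in columns]
        return self

    def is_error(self, sample):
        """True if any feature lies more than `threshold` deviations from normal."""
        return any(
            abs(value - mean) / stdev > self.threshold
            for value, mean, stdev in zip(sample, self.means, self.stdevs)
        )


# Hypothetical normal executions: (duration in minutes, sensor events).
normal_runs = [(10.0, 5.0), (11.0, 6.0), (9.0, 5.0), (10.5, 6.0), (9.5, 5.0)]
detector = ZScoreOneClass(threshold=3.0).fit(normal_runs)
typical_flagged = detector.is_error((10.2, 5.0))
stalled_flagged = detector.is_error((25.0, 5.0))
```

    No erroneous examples are needed at training time, which matters when activity errors are rare and varied; a flagged sample would correspond to a potential prompt situation.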

  4. Cardiac imaging: working towards fully-automated machine analysis & interpretation

    PubMed Central

    Slomka, Piotr J; Dey, Damini; Sitek, Arkadiusz; Motwani, Manish; Berman, Daniel S; Germano, Guido

    2017-01-01

    Introduction: Non-invasive imaging plays a critical role in managing patients with cardiovascular disease. Although subjective visual interpretation remains the clinical mainstay, quantitative analysis facilitates objective, evidence-based management, and advances in clinical research. This has driven developments in computing and software tools aimed at achieving fully automated image processing and quantitative analysis. In parallel, machine learning techniques have been used to rapidly integrate large amounts of clinical and quantitative imaging data to provide highly personalized individual patient-based conclusions. Areas covered: This review summarizes recent advances in automated quantitative imaging in cardiology and describes the latest techniques which incorporate machine learning principles. The review focuses on the cardiac imaging techniques which are in wide clinical use. It also discusses key issues and obstacles for these tools to become utilized in mainstream clinical practice. Expert commentary: Fully-automated processing and high-level computer interpretation of cardiac imaging are becoming a reality. Application of machine learning to the vast amounts of quantitative data generated per scan and integration with clinical data also facilitates a move to more patient-specific interpretation. These developments are unlikely to replace interpreting physicians but will provide them with highly accurate tools to detect disease, risk-stratify, and optimize patient-specific treatment. However, with each technological advance, we move further from human dependence and closer to fully-automated machine interpretation. PMID:28277804

  5. Adaptive design of an X-ray magnetic circular dichroism spectroscopy experiment with Gaussian process modelling

    NASA Astrophysics Data System (ADS)

    Ueno, Tetsuro; Hino, Hideitsu; Hashimoto, Ai; Takeichi, Yasuo; Sawada, Masahiro; Ono, Kanta

    2018-01-01

    Spectroscopy is a widely used experimental technique, and enhancing its efficiency can have a strong impact on materials research. We propose an adaptive design for spectroscopy experiments that uses a machine learning technique to improve efficiency. We examined X-ray magnetic circular dichroism (XMCD) spectroscopy for the applicability of a machine learning technique to spectroscopy. An XMCD spectrum was predicted by Gaussian process modelling with learning of an experimental spectrum using a limited number of observed data points. Adaptive sampling of data points with maximum variance of the predicted spectrum successfully reduced the total data points for the evaluation of magnetic moments while providing the required accuracy. The present method reduces the time and cost for XMCD spectroscopy and has potential applicability to various spectroscopies.
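
    The adaptive sampling step, measuring next where the Gaussian process prediction is most uncertain, can be sketched in plain Python. The RBF kernel, its length scale, the noise term, and the energy grid are all assumptions for illustration; the study's actual kernel and hyperparameters are not given in the abstract.

```python
import math


def rbf(a, b, length_scale=2.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def posterior_variance(x_new, observed_x, noise=1e-6):
    """GP predictive variance: k(x*, x*) - k*^T (K + noise * I)^-1 k*."""
    gram = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(observed_x)]
        for i, a in enumerate(observed_x)
    ]
    k_star = [rbf(a, x_new) for a in observed_x]
    alpha = solve(gram, k_star)
    return rbf(x_new, x_new) - sum(k * a for k, a in zip(k_star, alpha))


def next_measurement(candidates, observed_x):
    """Choose the candidate point with the largest predictive variance."""
    return max(candidates, key=lambda x: posterior_variance(x, observed_x))


# Hypothetical energy grid with the two end points already measured.
grid = [float(i) for i in range(11)]
chosen = next_measurement(grid, [0.0, 10.0])
```

    With fixed kernel hyperparameters the predictive variance depends only on where measurements were taken, not on the measured values, so this selection rule spreads points into unexplored regions; in practice the hyperparameters would typically be re-fitted as data arrive.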

  6. Machine learning phases of matter

    NASA Astrophysics Data System (ADS)

    Carrasquilla, Juan; Melko, Roger G.

    2017-02-01

    Condensed-matter physics is the study of the collective behaviour of infinitely complex assemblies of electrons, nuclei, magnetic moments, atoms or qubits. This complexity is reflected in the size of the state space, which grows exponentially with the number of particles, reminiscent of the 'curse of dimensionality' commonly encountered in machine learning. Despite this curse, the machine learning community has developed techniques with remarkable abilities to recognize, classify, and characterize complex sets of data. Here, we show that modern machine learning architectures, such as fully connected and convolutional neural networks, can identify phases and phase transitions in a variety of condensed-matter Hamiltonians. Readily programmable through modern software libraries, neural networks can be trained to detect multiple types of order parameter, as well as highly non-trivial states with no conventional order, directly from raw state configurations sampled with Monte Carlo.

  7. Optimizing a machine learning based glioma grading system using multi-parametric MRI histogram and texture features

    PubMed Central

    Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin

    2017-01-01

    Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated. A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gamma, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization. PMID:28599282
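
    The leave-one-out cross validation (LOOCV) strategy used for evaluation holds out each patient in turn, trains on the rest, and averages the results. A minimal Python sketch follows, using a 1-nearest-neighbour classifier as a stand-in for the classifiers compared in the study; the two-dimensional feature vectors are made up and are not study data.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training sample closest to `query`."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical texture features for three low-grade and three high-grade cases.
toy_features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
toy_grades = ["LGG", "LGG", "LGG", "HGG", "HGG", "HGG"]
loocv_accuracy = leave_one_out_accuracy(toy_features, toy_grades)
```

    With 120 patients, LOOCV means 120 separate fits per classifier and parameter setting. Any attribute selection or over-sampling would have to be repeated inside each fold to keep the held-out case unseen, a general caveat rather than a statement about how the study was run.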

  8. Analysis of Machine Learning Techniques for Heart Failure Readmissions.

    PubMed

    Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M

    2016-11-01

    The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
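
    The relative improvements quoted above follow directly from the reported mean C statistics. The sketch below reproduces that arithmetic and also shows one common way to compute a C statistic as a pairwise concordance; the function names and the four-patient toy example are hypothetical and are not taken from the study.

```python
def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs in which the event received the higher
    predicted risk; tied pairs count as one half."""
    events = [r for r, o in zip(risks, outcomes) if o == 1]
    non_events = [r for r, o in zip(risks, outcomes) if o == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


def relative_improvement(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Mean C statistics reported in the abstract.
rf_vs_lr = relative_improvement(0.628, 0.533)     # 30-day all-cause readmission
boost_vs_lr = relative_improvement(0.678, 0.543)  # heart failure readmission

# Hypothetical toy example: predicted risks and observed readmissions (1 = readmitted).
toy_c = c_statistic([0.9, 0.3, 0.35, 0.2], [1, 1, 0, 0])
```

    Rounded to one decimal place, the two ratios give the 17.8% and 24.9% improvements stated in the abstract.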

  9. Optimizing a machine learning based glioma grading system using multi-parametric MRI histogram and texture features.

    PubMed

    Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin

    2017-07-18

    Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated. A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using a leave-one-out cross validation (LOOCV) strategy. In addition, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of the Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. In addition, the performances of the LibSVM, SMO and IBk classifiers were influenced by some key parameters such as kernel type, c, gamma, K, etc. SVM is a promising tool in developing an automated preoperative glioma grading system, especially when combined with the RFE strategy. Model parameters should be considered in glioma grading model optimization.
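
    The abstract names SMOTE without describing it. As a rough sketch of the usual idea, a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The function name, the parameters `k` and `seed`, and the toy points are assumptions made for illustration and do not reflect the study's configuration.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly chosen
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority, n_new=4)
```

    Because every synthetic point is a convex combination of two real minority points, it stays inside the region already occupied by the minority class.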

  10. COMPOSER: A Probabilistic Solution to the Utility Problem in Speed-up Learning.

    ERIC Educational Resources Information Center

    Gratch, Jonathan; DeJong, Gerald

    In machine learning there is considerable interest in techniques which improve planning ability. Initial investigations have identified a wide variety of techniques to address this issue. Progress has been hampered by the utility problem, a basic tradeoff between the benefit of learned knowledge and the cost to locate and apply relevant knowledge.…

  11. Differentially Private Empirical Risk Minimization

    PubMed Central

    Chaudhuri, Kamalika; Monteleoni, Claire; Sarwate, Anand D.

    2011-01-01

    Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006), to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance. PMID:21892342
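
    A minimal sketch of the output perturbation idea mentioned above: train a regularized classifier as usual, then add random noise to the learned weight vector before releasing it. The noise law used here (density proportional to exp(-beta * ||b||) with beta = n * lam * epsilon / 2) is an assumption about how the noise is typically calibrated and holds only under conditions on the loss and the data that the abstract does not spell out; the function name and the example values are hypothetical.

```python
import math
import random


def output_perturbation(weights, n, lam, epsilon, rng):
    """Return weights plus a noise vector b whose density is assumed to be
    proportional to exp(-beta * ||b||), with beta = n * lam * epsilon / 2."""
    d = len(weights)
    beta = n * lam * epsilon / 2.0
    # The norm of such a vector follows a Gamma law with shape d and scale 1/beta.
    norm = rng.gammavariate(d, 1.0 / beta)
    # A normalised Gaussian vector gives a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    length = math.sqrt(sum(g * g for g in direction))
    noise = [norm * g / length for g in direction]
    return [w + b for w, b in zip(weights, noise)]


# Hypothetical non-private weights from a regularized ERM fit on n = 100 records.
private_w = output_perturbation(
    [0.5, -0.2, 0.1], n=100, lam=0.1, epsilon=1.0, rng=random.Random(0)
)
```

    Under this assumed calibration a smaller epsilon or a smaller data set gives larger noise, which reflects the tradeoff between privacy and learning performance that the abstract discusses.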

  12. Confabulation Based Sentence Completion for Machine Reading

    DTIC Science & Technology

    2010-11-01

    ...making sentence completion an indispensable component of machine reading. Cogent confabulation is a bio-inspired computational model that mimics the... University Press, 1992. [2] H. Motoda and K. Yoshida, “Machine learning techniques to make computers easier to use,” Proceedings of the Fifteenth

  13. Machine learning based Intelligent cognitive network using fog computing

    NASA Astrophysics Data System (ADS)

    Lu, Jingyang; Li, Lun; Chen, Genshe; Shen, Dan; Pham, Khanh; Blasch, Erik

    2017-05-01

    In this paper, a Cognitive Radio Network (CRN) based on artificial intelligence is proposed to distribute the limited radio spectrum resources more efficiently. The CRN framework can analyze the time-sensitive signal data close to the signal source using fog computing with different types of machine learning techniques. Depending on the computational capabilities of the fog nodes, different features and machine learning techniques are chosen to optimize spectrum allocation. Also, the computing nodes send the periodic signal summary which is much smaller than the original signal to the cloud so that the overall system spectrum source allocation strategies are dynamically updated. Applying fog computing, the system is more adaptive to the local environment and robust to spectrum changes. As most of the signal data is processed at the fog level, it further strengthens the system security by reducing the communication burden of the communications network.

  14. Causal inference in economics and marketing.

    PubMed

    Varian, Hal R

    2016-07-05

    This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual: a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.
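
    One way to picture the counterfactual step is to fit a predictive model on pre-treatment data only, project it into the treatment period, and read the effect as observed minus predicted. The sketch below uses an ordinary least-squares line as a stand-in for whatever predictive model one prefers; the weekly sales figures and all names are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical weekly sales before and after a campaign (the "treatment").
pre_weeks = [1, 2, 3, 4, 5]
pre_sales = [10.0, 12.0, 14.0, 16.0, 18.0]
post_weeks = [6, 7, 8]
post_sales = [25.0, 27.0, 29.0]

# Train on pre-treatment data only, then predict what would have happened.
slope, intercept = fit_line(pre_weeks, pre_sales)
counterfactual = [slope * week + intercept for week in post_weeks]

# Estimated effect in each post-treatment week: observed minus counterfactual.
effects = [observed - predicted for observed, predicted in zip(post_sales, counterfactual)]
average_effect = sum(effects) / len(effects)
```

    The estimate is only as good as the assumption that the pre-treatment relationship would have continued unchanged without the treatment.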

  15. Causal inference in economics and marketing

    PubMed Central

    Varian, Hal R.

    2016-01-01

    This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual—a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference. PMID:27382144

  16. Exploiting the Dynamics of Soft Materials for Machine Learning

    PubMed Central

    Nakajima, Kohei; Hauser, Helmut; Li, Tao; Pfeifer, Rolf

    2018-01-01

    Soft materials are increasingly utilized for various purposes in many engineering applications. These materials have been shown to perform a number of functions that were previously difficult to implement using rigid materials. Here, we argue that the diverse dynamics generated by actuating soft materials can be effectively used for machine learning purposes. This is demonstrated using a soft silicone arm through a technique of multiplexing, which enables the rich transient dynamics of the soft materials to be fully exploited as a computational resource. The computational performance of the soft silicone arm is examined through two standard benchmark tasks. Results show that the soft arm compares well to or even outperforms conventional machine learning techniques under multiple conditions. We then demonstrate that this system can be used for the sensory time series prediction problem for the soft arm itself, which suggests its immediate applicability to a real-world machine learning problem. Our approach, on the one hand, represents a radical departure from traditional computational methods, whereas on the other hand, it fits nicely into a more general perspective of computation by way of exploiting the properties of physical materials in the real world. PMID:29708857

  17. Exploiting the Dynamics of Soft Materials for Machine Learning.

    PubMed

    Nakajima, Kohei; Hauser, Helmut; Li, Tao; Pfeifer, Rolf

    2018-06-01

    Soft materials are increasingly utilized for various purposes in many engineering applications. These materials have been shown to perform a number of functions that were previously difficult to implement using rigid materials. Here, we argue that the diverse dynamics generated by actuating soft materials can be effectively used for machine learning purposes. This is demonstrated using a soft silicone arm through a technique of multiplexing, which enables the rich transient dynamics of the soft materials to be fully exploited as a computational resource. The computational performance of the soft silicone arm is examined through two standard benchmark tasks. Results show that the soft arm compares well to or even outperforms conventional machine learning techniques under multiple conditions. We then demonstrate that this system can be used for the sensory time series prediction problem for the soft arm itself, which suggests its immediate applicability to a real-world machine learning problem. Our approach, on the one hand, represents a radical departure from traditional computational methods, whereas on the other hand, it fits nicely into a more general perspective of computation by way of exploiting the properties of physical materials in the real world.

  18. A review on machine learning principles for multi-view biological data integration.

    PubMed

    Li, Yifeng; Wu, Fang-Xiang; Ngom, Alioune

    2018-03-01

    Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in strong need of integrative machine learning models for better use of vast volumes of heterogeneous information in the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review of omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have the potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanisms of biological systems.

  19. Combining macula clinical signs and patient characteristics for age-related macular degeneration diagnosis: a machine learning approach.

    PubMed

    Fraccaro, Paolo; Nicolo, Massimo; Bonetto, Monica; Giacomini, Mauro; Weller, Peter; Traverso, Carlo Enrico; Prosperi, Mattia; OSullivan, Dympna

    2015-01-27

    To investigate machine learning methods, ranging from simpler interpretable techniques to complex (non-linear) "black-box" approaches, for automated diagnosis of Age-related Macular Degeneration (AMD). Data from healthy subjects and patients diagnosed with AMD or other retinal diseases were collected during routine visits via an Electronic Health Record (EHR) system. Patients' attributes included demographics and, for each eye, presence/absence of major AMD-related clinical signs (soft drusen, retinal pigment epithelium, defects/pigment mottling, depigmentation area, subretinal haemorrhage, subretinal fluid, macula thickness, macular scar, subretinal fibrosis). Interpretable techniques known as white box methods, including logistic regression and decision trees, as well as less interpretable techniques known as black box methods, such as support vector machines (SVM), random forests and AdaBoost, were used to develop models (trained and validated on unseen data) to diagnose AMD. The gold standard was confirmed diagnosis of AMD by physicians. Sensitivity, specificity and area under the receiver operating characteristic curve (AUC) were used to assess performance. The study population included 487 patients (912 eyes). In terms of AUC, random forests, logistic regression and AdaBoost showed a mean performance of 0.92, followed by SVM and decision trees (0.90). All machine learning models identified soft drusen and age as the most discriminating variables in clinicians' decision pathways to diagnose AMD. Both black-box and white box methods performed well in identifying diagnoses of AMD and their decision pathways. Machine learning models developed through the proposed approach, relying on clinical signs identified by retinal specialists, could be embedded into EHR to provide physicians with real time (interpretable) support.
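
    Two of the performance measures named above, sensitivity and specificity, come straight from the counts of a confusion matrix. The sketch below is a generic illustration; the ten labels and predictions are invented and have no relation to the study data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels are 1 for disease present and 0 for disease absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical gold-standard diagnoses and model predictions for ten eyes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

    Here three of four diseased eyes are detected and five of six healthy eyes are correctly cleared.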

  20. Epileptic seizure detection in EEG signal using machine learning techniques.

    PubMed

    Jaiswal, Abeg Kumar; Banka, Haider

    2018-03-01

    Epilepsy is a well-known nervous system disorder characterized by seizures. Electroencephalograms (EEGs), which capture brain neural activity, can detect epilepsy. Traditional methods for analyzing an EEG signal for epileptic seizure detection are time-consuming. Recently, several automated seizure detection frameworks using machine learning techniques have been proposed to replace these traditional methods. The two basic steps involved in machine learning are feature extraction and classification. Feature extraction reduces the input pattern space by keeping informative features, and the classifier assigns the appropriate class label. In this paper, we propose two effective approaches involving subpattern based PCA (SpPCA) and cross-subpattern correlation-based PCA (SubXPCA) with Support Vector Machine (SVM) for automated seizure detection in EEG signals. Feature extraction was performed using SpPCA and SubXPCA. Both techniques explore the subpattern correlation of EEG signals, which helps in the decision-making process. SVM is used for classification of seizure and non-seizure EEG signals. The SVM was trained with a radial basis kernel. All the experiments have been carried out on the benchmark epilepsy EEG dataset. The entire dataset consists of 500 EEG signals recorded under different scenarios. Seven different experimental cases for classification have been conducted. The classification accuracy was evaluated using tenfold cross validation. The classification results of the proposed approaches have been compared with the results of some existing techniques proposed in the literature to establish the claim.
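
    The tenfold cross validation used for evaluation can be sketched as an index-splitting routine: the 500 signals are shuffled, divided into ten folds, and each fold serves once as the test set while the other nine are used for training. The function name and the fixed seed are assumptions; the feature extraction and the SVM itself are left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 500 signals, as in the dataset described above.
splits = list(k_fold_indices(500, k=10))
```

    The reported accuracy would then be the average of the accuracies obtained on the ten held-out folds.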

  1. Machine learning techniques applied to the determination of road suitability for the transportation of dangerous substances.

    PubMed

    Matías, J M; Taboada, J; Ordóñez, C; Nieto, P G

    2007-08-17

    This article describes a methodology to model the degree of remedial action required to make short stretches of a roadway suitable for dangerous goods transport (DGT), particularly pollutant substances, using different variables associated with the characteristics of each segment. Thirty-one factors determining the impact of an accident on a particular stretch of road were identified and subdivided into two major groups: accident probability factors and accident severity factors. Given the number of factors determining the state of a particular road segment, the only viable statistical methods for implementing the model were machine learning techniques, such as multilayer perceptron networks (MLPs), classification trees (CARTs) and support vector machines (SVMs). The results produced by these techniques on a test sample were more favourable than those produced by traditional discriminant analysis, irrespective of whether dimensionality reduction techniques were applied. The best results were obtained using SVMs specifically adapted to ordinal data. This technique takes advantage of the ordinal information contained in the data without penalising the computational load. Furthermore, the technique permits the estimation of the utility function that is latent in expert knowledge.

  2. Applications of Machine Learning in Cancer Prediction and Prognosis

    PubMed Central

    Cruz, Joseph A.; Wishart, David S.

    2006-01-01

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15–25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression. PMID:19458758

  3. Chemically intuited, large-scale screening of MOFs by machine learning techniques

    NASA Astrophysics Data System (ADS)

    Borboudakis, Giorgos; Stergiannakos, Taxiarchis; Frysali, Maria; Klontzas, Emmanuel; Tsamardinos, Ioannis; Froudakis, George E.

    2017-10-01

    A novel computational methodology for large-scale screening of MOFs is applied to gas storage with the use of machine learning technologies. This approach is a promising trade-off between the accuracy of ab initio methods and the speed of classical approaches, strategically combined with chemical intuition. The results demonstrate that the chemical properties of MOFs are indeed predictable (stochastically, not deterministically) using machine learning methods and automated analysis protocols, with the accuracy of predictions increasing with sample size. Our initial results indicate that this methodology is promising to apply not only to gas storage in MOFs but in many other material science projects.

  4. Data mining in bioinformatics using Weka.

    PubMed

    Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H

    2004-10-12

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.

  5. Comparative evaluation of features and techniques for identifying activity type and estimating energy cost from accelerometer data

    PubMed Central

    Kate, Rohit J.; Swartz, Ann M.; Welch, Whitney A.; Strath, Scott J.

    2016-01-01

    Wearable accelerometers can be used to objectively assess physical activity. However, the accuracy of this assessment depends on the underlying method used to process the time series data obtained from accelerometers. Several methods have been proposed that use these data to identify the type of physical activity and estimate its energy cost. Most of the newer methods employ some machine learning technique along with suitable features to represent the time series data. This paper experimentally compares several of these techniques and features on a large dataset of 146 subjects doing eight different physical activities wearing an accelerometer on the hip. Besides features based on statistics, distance based features and simple discrete features straight from the time series were also evaluated. On the physical activity type identification task, the results show that using more features significantly improves results. Choice of machine learning technique was also found to be important. However, on the energy cost estimation task, choice of features and machine learning technique were found to be less influential. On that task, separate energy cost estimation models trained specifically for each type of physical activity were found to be more accurate than a single model trained for all types of physical activities. PMID:26862679
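
    As an illustration of what features based on statistics might look like, the sketch below cuts an accelerometer trace into fixed-size windows and summarizes each window with a few descriptive statistics. The choice of statistics, the window size and the toy signal are assumptions; the paper's actual feature sets are not given in the abstract.

```python
import statistics


def window_features(signal, window_size):
    """Summarize each non-overlapping window of the signal with simple statistics."""
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        features.append(
            {
                "mean": statistics.fmean(window),
                "std": statistics.pstdev(window),
                "min": min(window),
                "max": max(window),
            }
        )
    return features


# Hypothetical trace: a quiet segment followed by a vigorous one.
signal = [0.0, 0.1, -0.1, 0.0, 1.0, -1.0, 1.2, -1.2]
feats = window_features(signal, window_size=4)
```

    Each dictionary can then be turned into one row of the feature matrix handed to a classifier or regressor.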

  6. Integrating Machine Learning into Space Operations

    NASA Astrophysics Data System (ADS)

    Kelly, K. G.

    There are significant challenges with managing activities in space, which for the scope of this paper are primarily the identification of objects in orbit, maintaining accurate estimates of the orbits of those objects, detecting changes to those orbits, warning of possible collisions between objects and detection of anomalous behavior. The challenges come from the large amounts of data to be processed, which is often incomplete and noisy, limitations on the ability to influence objects in space and the overall strategic importance of space to national interests. The focus of this paper is on defining an approach to leverage the improved capabilities that are possible using state of the art machine learning in a way that empowers operations personnel without sacrificing the security and mission assurance associated with manual operations performed by trained personnel. There has been significant research in the development of algorithms and techniques for applying machine learning in this domain, but deploying new techniques into such a mission critical domain is difficult and time consuming. Establishing a common framework could improve the efficiency with which new techniques are integrated into operations and the overall effectiveness at providing improvements.

  7. Discrimination of plant root zone water status in greenhouse production based on phenotyping and machine learning techniques.

    PubMed

    Guo, Doudou; Juan, Jiaxiang; Chang, Liying; Zhang, Jingjin; Huang, Danfeng

    2017-08-15

    Plant-based sensing of water stress can provide a sensitive and direct reference for precision irrigation systems in greenhouses. However, plant information acquisition, interpretation, and systematic application remain insufficient. This study developed a discrimination method for plant root zone water status in greenhouses by integrating phenotyping and machine learning techniques. Pakchoi plants were treated with three root zone moisture levels: 40%, 60%, and 80% relative water content. Three classification models, Random Forest (RF), Neural Network (NN), and Support Vector Machine (SVM), were developed and validated in different scenarios, with overall accuracy over 90% for all. The SVM model had the highest accuracy, but it required the longest training time. All models had accuracy over 85% in all scenarios, and more stable performance was observed in the RF model. The simplified SVM model built from the five most contributing traits had the largest accuracy reduction, 29.5%, while the simplified RF and NN models still maintained approximately 80%. For real case application, factors such as operation cost, precision requirement, and system reaction time should be considered together in model selection. Our work shows that it is promising to discriminate plant root zone water status by implementing phenotyping and machine learning techniques for precision irrigation management.

  8. Application of Machine Learning to Rotorcraft Health Monitoring

    NASA Technical Reports Server (NTRS)

    Cody, Tyler; Dempsey, Paula J.

    2017-01-01

    Machine learning is a powerful tool for data exploration and model building with large data sets. This project aimed to use machine learning techniques to explore the inherent structure of data from rotorcraft gear tests and the relationships between features and damage states, and to build a system for predicting gear health for future rotorcraft transmission applications. Classical machine learning techniques are difficult, if not irresponsible, to apply to time series data because many assume independence between samples. To overcome this, Hidden Markov Models were used to create a binary classifier for identifying scuffing transitions and Recurrent Neural Networks were used to leverage long distance relationships in predicting discrete damage states. When combined in a workflow, where the binary classifier acted as a filter for the fatigue monitor, the system was able to demonstrate accuracy in damage state prediction and scuffing identification. The time dependent nature of the data restricted data exploration to collecting and analyzing data from the model selection process. The limited amount of available data was unable to give useful information, and the division of training and testing sets tended to heavily influence the scores of the models across combinations of features and hyper-parameters. This work built a framework for tracking scuffing and fatigue on streaming data and demonstrates that machine learning has much to offer rotorcraft health monitoring by using Bayesian learning and deep learning methods to capture the time dependent nature of the data. Suggested future work is to implement the framework developed in this project using a larger variety of data sets to test the generalization capabilities of the models and allow for data exploration.
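
    To make the Hidden Markov Model idea concrete, the sketch below decodes the most likely sequence of two hidden states from a short series of discretized vibration readings using the Viterbi algorithm. The abstract does not say how the project's models were built or decoded, so the two-state layout, every probability and the "low"/"high" observation alphabet are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence (computed in log space)."""
    best = [
        {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}
    ]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this time step.
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (
                best[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical model parameters.
states = ["normal", "scuffing"]
start_p = {"normal": 0.9, "scuffing": 0.1}
trans_p = {
    "normal": {"normal": 0.9, "scuffing": 0.1},
    "scuffing": {"normal": 0.1, "scuffing": 0.9},
}
emit_p = {
    "normal": {"low": 0.9, "high": 0.1},
    "scuffing": {"low": 0.2, "high": 0.8},
}

path = viterbi(["low", "low", "high", "high", "high"], states, start_p, trans_p, emit_p)
```

    Unlike a classifier that treats each reading independently, the decoded path depends on the transition probabilities, which is how the dependence between consecutive samples enters the model.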

  9. How much information is in a jet?

    NASA Astrophysics Data System (ADS)

    Datta, Kaustuv; Larkoski, Andrew

    2017-06-01

    Machine learning techniques are increasingly being applied toward data analyses at the Large Hadron Collider, especially with applications for discrimination of jets with different originating particles. Previous studies of the power of machine learning to jet physics have typically employed image recognition, natural language processing, or other algorithms that have been extensively developed in computer science. While these studies have demonstrated impressive discrimination power, often exceeding that of widely-used observables, they have been formulated in a non-constructive manner and it is not clear what additional information the machines are learning. In this paper, we study machine learning for jet physics constructively, expressing all of the information in a jet onto sets of observables that completely and minimally span N-body phase space. For concreteness, we study the application of machine learning for discrimination of boosted, hadronic decays of Z bosons from jets initiated by QCD processes. Our results demonstrate that the information in a jet that is useful for discrimination power of QCD jets from Z bosons is saturated by only considering observables that are sensitive to 4-body (8 dimensional) phase space.

  10. Introduction to machine learning for brain imaging.

    PubMed

    Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert

    2011-05-15

    Machine learning and pattern recognition algorithms have in the past years developed to become a workhorse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally, machine learning techniques should be usable by any non-expert; unfortunately, they typically are not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. Copyright © 2010 Elsevier Inc. All rights reserved.

  11. Supervised machine learning techniques to predict binding affinity. A study for cyclin-dependent kinase 2.

    PubMed

    de Ávila, Maurício Boff; Xavier, Mariana Morrone; Pintro, Val Oliveira; de Azevedo, Walter Filgueira

    2017-12-09

    Here we report the development of a machine-learning model to predict binding affinity based on the crystallographic structures of protein-ligand complexes. We used an ensemble of crystallographic structures (resolution better than 1.5 Å) for which half-maximal inhibitory concentration (IC50) data is available. Polynomial scoring functions were built using as explanatory variables the energy terms present in the MolDock and PLANTS scoring functions. Prediction performance was tested and the supervised machine learning models showed improvement in the prediction power, when compared with PLANTS and MolDock scoring functions. In addition, the machine-learning model was applied to predict binding affinity of CDK2, which showed a better performance when compared with AutoDock4, AutoDock Vina, MolDock, and PLANTS scores. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Generating a Spanish Affective Dictionary with Supervised Learning Techniques

    ERIC Educational Resources Information Center

    Bermudez-Gonzalez, Daniel; Miranda-Jiménez, Sabino; García-Moreno, Raúl-Ulises; Calderón-Nepamuceno, Dora

    2016-01-01

    Nowadays, machine learning techniques are being used in several Natural Language Processing (NLP) tasks such as Opinion Mining (OM). OM is used to analyse and determine the affective orientation of texts. Usually, OM approaches use affective dictionaries in order to conduct sentiment analysis. These lexicons are labeled manually with affective…

  13. Machine learning enhanced optical distance sensor

    NASA Astrophysics Data System (ADS)

    Amin, M. Junaid; Riza, N. A.

    2018-01-01

    Presented for the first time is a machine learning enhanced optical distance sensor. The distance sensor is based on our previously demonstrated distance measurement technique that uses an Electronically Controlled Variable Focus Lens (ECVFL) with a laser source to illuminate a target plane with a controlled optical beam spot. This spot with varying spot sizes is viewed by an off-axis camera and the spot size data is processed to compute the distance. In particular, proposed and demonstrated in this paper is the use of a regularized polynomial regression based supervised machine learning algorithm to enhance the accuracy of the operational sensor. The algorithm uses the acquired features and corresponding labels that are the actual target distance values to train a machine learning model. The optimized model is trained over a 1000 mm (or 1 m) experimental target distance range. Using the machine learning algorithm produces training-set and testing-set distance measurement errors of <0.8 mm and <2.2 mm, respectively. The test measurement error is at least a factor of 4 improvement over our prior sensor demonstration without the use of machine learning. Applications for the proposed sensor include industrial distance sensing, where target-material-specific training models can be generated to realize distance measurements with low (<1%) measurement error.
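
    The regularized polynomial regression named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the spot-size feature, the polynomial degree, the ridge penalty `lam`, and the synthetic calibration data are all assumptions made for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ridge_poly(xs, ys, degree=2, lam=1e-6):
    """Fit polynomial coefficients by ridge-regularized least squares.

    The penalty lam is applied to every coefficient except the intercept.
    """
    k = degree + 1
    feats = [[x ** p for p in range(k)] for x in xs]
    gram = [[sum(f[i] * f[j] for f in feats) + (lam if i == j and i > 0 else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(k)]
    return solve(gram, rhs)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** p for p, c in enumerate(coeffs))


# Hypothetical calibration data: spot size (arbitrary units) vs. distance (mm).
spot_sizes = [0.5, 1.0, 1.5, 2.0, 2.5]
distances = [100 + 50 * s + 20 * s * s for s in spot_sizes]
coeffs = fit_ridge_poly(spot_sizes, distances)
```

    In practice the degree and penalty would be chosen on held-out data, which is how a gap between training and testing error such as the one reported above would be observed.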

  14. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges

    PubMed Central

    Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.

    2017-01-01

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868

  15. Machine Learning in Intrusion Detection

    DTIC Science & Technology

    2005-07-01

    machine learning tasks. Anomaly detection provides the core technology for a broad spectrum of security-centric applications. In this dissertation, we examine various aspects of anomaly based intrusion detection in computer security. First, we present a new approach to learn program behavior for intrusion detection. Text categorization techniques are adopted to convert each process to a vector and calculate the similarity between two program activities. Then the k-nearest neighbor classifier is employed to classify program behavior as normal or intrusive. We demonstrate
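
    The approach described in this excerpt (convert each process to a vector, measure similarity, classify with k-nearest neighbors) can be sketched as below. The term-frequency weighting, the cosine similarity, the system-call names, and the tiny labelled set are assumptions chosen for illustration; the excerpt does not specify them.

```python
import math
from collections import Counter


def to_vector(calls):
    """Represent a process as a term-frequency vector over its system calls."""
    return Counter(calls)


def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(trace, labelled, k=3):
    """Label a trace by majority vote among its k most similar labelled traces."""
    query = to_vector(trace)
    ranked = sorted(labelled,
                    key=lambda item: cosine(query, to_vector(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled traces (system-call sequences), for illustration only.
TRAINING = [
    (["open", "read", "read", "close"], "normal"),
    (["open", "read", "write", "close"], "normal"),
    (["open", "write", "close"], "normal"),
    (["execve", "setuid", "chmod", "execve"], "intrusive"),
    (["setuid", "execve", "socket", "connect"], "intrusive"),
    (["chmod", "setuid", "execve"], "intrusive"),
]
```

    A call such as `knn_classify(["open", "read", "close"], TRAINING)` votes among the three most similar labelled traces.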

  16. Learning micro incision surgery without the learning curve

    PubMed Central

    Navin, Shoba; Parikh, Rajul

    2008-01-01

    We describe a method of learning micro incision cataract surgery painlessly with the minimum of learning curves. A large-bore or standard anterior chamber maintainer (ACM) facilitates learning without change of machine or preferred surgical technique. Experience with the use of an ACM during phacoemulsification is desirable. PMID:18292624

  17. Survey of Analysis of Crime Detection Techniques Using Data Mining and Machine Learning

    NASA Astrophysics Data System (ADS)

    Prabakaran, S.; Mitra, Shilpa

    2018-04-01

    Data mining is the field containing procedures for finding designs or patterns in a huge dataset; it includes strategies at the convergence of machine learning and database frameworks. It can be applied to various fields such as future healthcare, market basket analysis, education, manufacturing engineering, and crime investigation. Among these, crime investigation is an interesting application that processes crime characteristics to help society achieve a better standard of living. This paper surveys various data mining techniques used in this domain. This study may be helpful in designing new strategies for crime prediction and analysis.

  18. Extreme Learning Machine and Particle Swarm Optimization in optimizing CNC turning operation

    NASA Astrophysics Data System (ADS)

    Janahiraman, Tiagrajah V.; Ahmad, Nooraziah; Hani Nordin, Farah

    2018-04-01

    The CNC machine is controlled by manipulating cutting parameters that could directly influence the process performance. Many optimization methods have been applied to obtain the optimal cutting parameters for the desired performance function. Nonetheless, the industry still uses the traditional technique to obtain those values. Lack of knowledge of optimization techniques is the main reason this issue persists. Therefore, the simple, easy-to-implement Optimal Cutting Parameters Selection System is introduced to help manufacturers easily understand and determine the optimal parameters for their turning operation. This new system consists of two stages: modelling and optimization. In modelling of input-output and in-process parameters, a hybrid of Extreme Learning Machine and Particle Swarm Optimization is applied. This modelling technique tends to converge faster than other artificial intelligence techniques and gives accurate results. For the optimization stage, Particle Swarm Optimization is again used to obtain the optimal cutting parameters based on the performance function preferred by the manufacturer. Overall, the system can reduce the gap between the academic world and industry by introducing a simple, easy-to-implement optimization technique. This novel optimization technique can give accurate results besides being the fastest technique.
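
    The optimization stage described above can be illustrated with a bare-bones Particle Swarm Optimization loop. This is a sketch under stated assumptions: the objective `toy_roughness` is a made-up quadratic standing in for the paper's Extreme Learning Machine model, and the parameter names, bounds, and swarm settings are hypothetical.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # swarm-wide best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the allowed parameter range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_roughness(params):
    """Hypothetical performance function with its minimum at speed=200, feed=0.15."""
    speed, feed = params
    return (speed - 200.0) ** 2 / 1e4 + (feed - 0.15) ** 2 * 100.0


best_params, best_value = pso(toy_roughness, [(100.0, 300.0), (0.05, 0.5)])
```

    Swapping `toy_roughness` for a trained surrogate model would give the two-stage structure the abstract describes, with the manufacturer's preferred performance function as the objective.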

  19. Machine learning and social network analysis applied to Alzheimer's disease biomarkers.

    PubMed

    Di Deco, Javier; González, Ana M; Díaz, Julia; Mato, Virginia; García-Frank, Daniel; Álvarez-Linera, Juan; Frank, Ana; Hernández-Tamames, Juan A

    2013-01-01

    Because the number of deaths due to Alzheimer's disease is increasing, scientists have a strong interest in early-stage diagnosis of this disease. Alzheimer's patients show different kinds of brain alterations, such as morphological, biochemical, and functional changes. Currently, using magnetic resonance imaging techniques, it is possible to obtain a huge number of biomarkers, but it is difficult to appraise which of them best explain how the pathology evolves as opposed to normal ageing. Machine Learning methods facilitate an efficient analysis of complex data and can be used to discover which biomarkers are more informative. Moreover, automatic models can learn from historical data to suggest the diagnosis of new patients. Social Network Analysis (SNA) views social relationships in terms of network theory consisting of nodes and connections. The resulting graph-based structures are often very complex; there can be many kinds of connections between the nodes. SNA has emerged as a key technique in modern sociology. It has also gained a significant following in medicine, anthropology, biology, information science, etc., and has become a popular topic of speculation and study. This paper presents a review of machine learning and SNA techniques and then a new approach to analyzing magnetic resonance imaging biomarkers with these techniques, obtaining relevant relationships that can explain the different phenotypes in dementia, in particular, different stages of Alzheimer's disease.

  20. Active learning machine learns to create new quantum experiments.

    PubMed

    Melnikov, Alexey A; Poulsen Nautrup, Hendrik; Krenn, Mario; Dunjko, Vedran; Tiersch, Markus; Zeilinger, Anton; Briegel, Hans J

    2018-02-06

    How useful can machine learning be in a quantum laboratory? Here we raise the question of the potential of intelligent machines in the context of scientific research. A major motivation for the present work is the unknown reachability of various entanglement classes in quantum experiments. We investigate this question by using the projective simulation model, a physics-oriented approach to artificial intelligence. In our approach, the projective simulation system is challenged to design complex photonic quantum experiments that produce high-dimensional entangled multiphoton states, which are of high interest in modern quantum experiments. The artificial intelligence system learns to create a variety of entangled states and improves the efficiency of their realization. In the process, the system autonomously (re)discovers experimental techniques which are only now becoming standard in modern quantum optical experiments-a trait which was not explicitly demanded from the system but emerged through the process of learning. Such features highlight the possibility that machines could have a significantly more creative role in future research.

  1. Integrating machine learning techniques and high-resolution imagery to generate GIS-ready information for urban water consumption studies

    NASA Astrophysics Data System (ADS)

    Wolf, Nils; Hof, Angela

    2012-10-01

    Urban sprawl driven by shifts in tourism development produces new suburban landscapes of water consumption on Mediterranean coasts. Golf courses, ornamental, 'Atlantic' gardens and swimming pools are the most striking artefacts of this transformation, threatening the local water supply systems and exacerbating water scarcity. In the face of climate change, urban landscape irrigation is becoming increasingly important from a resource management point of view. This paper adopts urban remote sensing towards a targeted mapping approach using machine learning techniques and high-resolution satellite imagery (WorldView-2) to generate GIS-ready information for urban water consumption studies. Swimming pools, vegetation and - as a subgroup of vegetation - turf grass are extracted as important determinants of water consumption. For image analysis, the complex nature of urban environments suggests spatial-spectral classification, i.e. the complementary use of the spectral signature and spatial descriptors. Multiscale image segmentation provides means to extract the spatial descriptors - namely object feature layers - which can be concatenated at pixel level to the spectral signature. This study assesses the value of object features using different machine learning techniques and amounts of labeled information for learning. The results indicate the benefit of the spatial-spectral approach if combined with appropriate classifiers like tree-based ensembles or support vector machines, which can handle high dimensionality. Finally, a Random Forest classifier was chosen to deliver the classified input data for the estimation of evaporative water loss and net landscape irrigation requirements.

  2. Classifying Structures in the ISM with Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Beaumont, Christopher; Goodman, A. A.; Williams, J. P.

    2011-01-01

    The processes which govern molecular cloud evolution and star formation often sculpt structures in the ISM: filaments, pillars, shells, outflows, etc. Because of their morphological complexity, these objects are often identified manually. Manual classification has several disadvantages; the process is subjective, not easily reproducible, and does not scale well to handle increasingly large datasets. We have explored to what extent machine learning algorithms can be trained to autonomously identify specific morphological features in molecular cloud datasets. We show that the Support Vector Machine algorithm can successfully locate filaments and outflows blended with other emission structures. When the objects of interest are morphologically distinct from the surrounding emission, this autonomous classification achieves >90% accuracy. We have developed a set of IDL-based tools to apply this technique to other datasets.

  3. Boosting compound-protein interaction prediction by deep learning.

    PubMed

    Tian, Kai; Shao, Mingyu; Wang, Yang; Guan, Jihong; Zhou, Shuigeng

    2016-11-01

    The identification of interactions between compounds and proteins plays an important role in network pharmacology and drug discovery. However, experimentally identifying compound-protein interactions (CPIs) is generally expensive and time-consuming, so computational approaches have been introduced. Among these, machine-learning based methods have achieved considerable success. However, due to the nonlinear and imbalanced nature of biological data, many machine learning approaches have their own limitations. Recently, deep learning techniques have shown advantages over many state-of-the-art machine learning methods in some applications. In this study, we aim at improving the performance of CPI prediction based on deep learning, and propose a method called DL-CPI (the abbreviation of Deep Learning for Compound-Protein Interactions prediction), which employs a deep neural network (DNN) to effectively learn the representations of compound-protein pairs. Extensive experiments show that DL-CPI can learn useful features of compound-protein pairs by a layerwise abstraction, and thus achieves better prediction performance than existing methods on both balanced and imbalanced datasets. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Human semi-supervised learning.

    PubMed

    Gibson, Bryan R; Rogers, Timothy T; Zhu, Xiaojin

    2013-01-01

    Most empirical work in human categorization has studied learning in either fully supervised or fully unsupervised scenarios. Most real-world learning scenarios, however, are semi-supervised: Learners receive a great deal of unlabeled information from the world, coupled with occasional experiences in which items are directly labeled by a knowledgeable source. A large body of work in machine learning has investigated how learning can exploit both labeled and unlabeled data provided to a learner. Using equivalences between models found in human categorization and machine learning research, we explain how these semi-supervised techniques can be applied to human learning. A series of experiments are described which show that semi-supervised learning models prove useful for explaining human behavior when exposed to both labeled and unlabeled data. We then discuss some machine learning models that do not have familiar human categorization counterparts. Finally, we discuss some challenges yet to be addressed in the use of semi-supervised models for modeling human categorization. Copyright © 2013 Cognitive Science Society, Inc.

  5. Discovering charge density functionals and structure-property relationships with PROPhet: A general framework for coupling machine learning and first-principles methods

    DOE PAGES

    Kolb, Brian; Lentz, Levi C.; Kolpak, Alexie M.

    2017-04-26

    Modern ab initio methods have rapidly increased our understanding of solid state materials properties, chemical reactions, and the quantum interactions between atoms. However, poor scaling often renders direct ab initio calculations intractable for large or complex systems. There are two obvious avenues through which to remedy this problem: (i) develop new, less expensive methods to calculate system properties, or (ii) make existing methods faster. This paper describes an open source framework designed to pursue both of these avenues. PROPhet (short for PROPerty Prophet) utilizes machine learning techniques to find complex, non-linear mappings between sets of material or system properties. The result is a single code capable of learning analytical potentials, non-linear density functionals, and other structure-property or property-property relationships. These capabilities enable highly accurate mesoscopic simulations, facilitate computation of expensive properties, and enable the development of predictive models for systematic materials design and optimization. Here, this work explores the coupling of machine learning to ab initio methods through means both familiar (e.g., the creation of various potentials and energy functionals) and less familiar (e.g., the creation of density functionals for arbitrary properties), serving both to demonstrate PROPhet’s ability to create exciting post-processing analysis tools and to open the door to improving ab initio methods themselves with these powerful machine learning techniques.

  6. Discovering charge density functionals and structure-property relationships with PROPhet: A general framework for coupling machine learning and first-principles methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kolb, Brian; Lentz, Levi C.; Kolpak, Alexie M.

    Modern ab initio methods have rapidly increased our understanding of solid state materials properties, chemical reactions, and the quantum interactions between atoms. However, poor scaling often renders direct ab initio calculations intractable for large or complex systems. There are two obvious avenues through which to remedy this problem: (i) develop new, less expensive methods to calculate system properties, or (ii) make existing methods faster. This paper describes an open source framework designed to pursue both of these avenues. PROPhet (short for PROPerty Prophet) utilizes machine learning techniques to find complex, non-linear mappings between sets of material or system properties. The result is a single code capable of learning analytical potentials, non-linear density functionals, and other structure-property or property-property relationships. These capabilities enable highly accurate mesoscopic simulations, facilitate computation of expensive properties, and enable the development of predictive models for systematic materials design and optimization. Here, this work explores the coupling of machine learning to ab initio methods through means both familiar (e.g., the creation of various potentials and energy functionals) and less familiar (e.g., the creation of density functionals for arbitrary properties), serving both to demonstrate PROPhet’s ability to create exciting post-processing analysis tools and to open the door to improving ab initio methods themselves with these powerful machine learning techniques.

  7. Supervised Machine Learning for Regionalization of Environmental Data: Distribution of Uranium in Groundwater in Ukraine

    NASA Astrophysics Data System (ADS)

    Govorov, Michael; Gienko, Gennady; Putrenko, Viktor

    2018-05-01

    In this paper, several supervised machine learning algorithms were explored to define homogeneous regions of concentration of uranium in surface waters in Ukraine using multiple environmental parameters. The previous study was focused on finding the primary environmental parameters related to uranium in ground waters using several methods of spatial statistics and unsupervised classification. At this step, we refined the regionalization using Artificial Neural Networks (ANN) techniques including Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Convolutional Neural Network (CNN). The study is focused on building local ANN models which may significantly improve the prediction results of machine learning algorithms by taking into consideration non-stationarity and autocorrelation in spatial data.

  8. Non-invasive estimate of blood glucose and blood pressure from a photoplethysmograph by means of machine learning techniques.

    PubMed

    Monte-Moreno, Enric

    2011-10-01

    This work presents a system for a simultaneous non-invasive estimate of the blood glucose level (BGL) and the systolic (SBP) and diastolic (DBP) blood pressure, using a photoplethysmograph (PPG) and machine learning techniques. The method is independent of the person whose values are being measured and does not need calibration over time or subjects. The architecture of the system consists of a photoplethysmograph sensor, an activity detection module, a signal processing module that extracts features from the PPG waveform, and a machine learning algorithm that estimates the SBP, DBP and BGL values. The idea that underlies the system is that there is a functional relationship between the shape of the PPG waveform and the blood pressure and glucose levels. As described in this paper we tested this method on 410 individuals without performing any personalized calibration. The results were computed after cross validation. The machine learning techniques tested were: ridge linear regression, a multilayer perceptron neural network, support vector machines and random forests. The best results were obtained with the random forest technique. In the case of blood pressure, the resulting coefficients of determination for reference vs. prediction were R²(SBP) = 0.91, R²(DBP) = 0.89, and R²(BGL) = 0.90. For the glucose estimation, distribution of the points on a Clarke error grid placed 87.7% of points in zone A, 10.3% in zone B, and 1.9% in zone D. Blood pressure values complied with the grade B protocol of the British Hypertension Society. An effective system for estimating blood glucose and blood pressure from a photoplethysmograph is presented. The main advantage of the system is that for clinical use it complies with the grade B protocol of the British Hypertension Society for the blood pressure and failed to detect hypoglycemia or hyperglycemia in only 1.9% of the cases. Copyright © 2011 Elsevier B.V. All rights reserved.
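
    The coefficients of determination quoted above compare reference measurements with model predictions. A minimal sketch of one common definition follows; whether the paper used this residual-based form or the squared correlation coefficient is not stated in the abstract, so the choice here is an assumption.

```python
def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot
```

    Under this definition a perfect predictor scores 1, and a predictor that always outputs the mean of the reference values scores 0.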

  9. NMF-Based Image Quality Assessment Using Extreme Learning Machine.

    PubMed

    Wang, Shuigen; Deng, Chenwei; Lin, Weisi; Huang, Guang-Bin; Zhao, Baojun

    2017-01-01

    Numerous state-of-the-art perceptual image quality assessment (IQA) algorithms share a common two-stage process: distortion description followed by distortion effects pooling. As for the first stage, the distortion descriptors or measurements are expected to be effective representatives of human visual variations, while the second stage should well express the relationship among quality descriptors and the perceptual visual quality. However, most of the existing quality descriptors (e.g., luminance, contrast, and gradient) do not seem to be consistent with human perception, and the effects pooling is often done in ad-hoc ways. In this paper, we propose a novel full-reference IQA metric. It applies non-negative matrix factorization (NMF) to measure image degradations by making use of the parts-based representation of NMF. On the other hand, a new machine learning technique [extreme learning machine (ELM)] is employed to address the limitations of the existing pooling techniques. Compared with neural networks and support vector regression, ELM can achieve higher learning accuracy with faster learning speed. Extensive experimental results demonstrate that the proposed metric has better performance and lower computational complexity in comparison with the relevant state-of-the-art approaches.

  10. Machine Learning Force Field Parameters from Ab Initio Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Ying; Li, Hui; Pickard, Frank C.

    Machine learning (ML) techniques with the genetic algorithm (GA) have been applied to determine polarizable force field parameters using only ab initio data from quantum mechanics (QM) calculations of molecular clusters at the MP2/6-31G(d,p), DFMP2(fc)/jul-cc-pVDZ, and DFMP2(fc)/jul-cc-pVTZ levels to predict experimental condensed phase properties (i.e., density and heat of vaporization). The performance of this ML/GA approach is demonstrated on 4943 dimer electrostatic potentials and 1250 cluster interaction energies for methanol. Excellent agreement between the training data set from QM calculations and the optimized force field model was achieved. The results were further improved by introducing an offset factor during the machine learning process to compensate for the discrepancy between the QM calculated energy and the energy reproduced by the optimized force field, while maintaining the local “shape” of the QM energy surface. Throughout the machine learning process, experimental observables were not involved in the objective function, but were only used for model validation. The best model, optimized from the QM data at the DFMP2(fc)/jul-cc-pVTZ level, appears to perform even better than the original AMOEBA force field (amoeba09.prm), which was optimized empirically to match liquid properties. The present effort shows the possibility of using machine learning techniques to develop a descriptive polarizable force field using only QM data. The ML/GA strategy to optimize force field parameters described here could easily be extended to other molecular systems.

  11. Multimodal Learning Analytics and Education Data Mining: Using Computational Technologies to Measure Complex Learning Tasks

    ERIC Educational Resources Information Center

    Blikstein, Paulo; Worsley, Marcelo

    2016-01-01

    New high-frequency multimodal data collection technologies and machine learning analysis techniques could offer new insights into learning, especially when students have the opportunity to generate unique, personalized artifacts, such as computer programs, robots, and solutions to engineering challenges. To date most of the work on learning analytics…

  12. Creating Turbulent Flow Realizations with Generative Adversarial Networks

    NASA Astrophysics Data System (ADS)

    King, Ryan; Graf, Peter; Chertkov, Michael

    2017-11-01

    Generating valid inflow conditions is a crucial, yet computationally expensive, step in unsteady turbulent flow simulations. We demonstrate a new technique for rapid generation of turbulent inflow realizations that leverages recent advances in machine learning for image generation using a deep convolutional generative adversarial network (DCGAN). The DCGAN is an unsupervised machine learning technique consisting of two competing neural networks that are trained against each other using backpropagation. One network, the generator, tries to produce samples from the true distribution of states, while the discriminator tries to distinguish between true and synthetic samples. We present results from a fully-trained DCGAN that is able to rapidly draw random samples from the full distribution of possible inflow states without needing to solve the Navier-Stokes equations, eliminating the costly process of spinning up inflow turbulence. This suggests a new paradigm in physics informed machine learning where the turbulence physics can be encoded in either the discriminator or generator. Finally, we also propose additional applications such as feature identification and subgrid scale modeling.

  13. Use of sentiment analysis for capturing patient experience from free-text comments posted online.

    PubMed

    Greaves, Felix; Ramirez-Cano, Daniel; Millett, Christopher; Darzi, Ara; Donaldson, Liam

    2013-11-01

    There are large amounts of unstructured, free-text information about quality of health care available on the Internet in blogs, social networks, and on physician rating websites that are not captured in a systematic way. New analytical techniques, such as sentiment analysis, may allow us to understand and use this information more effectively to improve the quality of health care. We attempted to use machine learning to understand patients' unstructured comments about their care. We used sentiment analysis techniques to categorize online free-text comments by patients as either positive or negative descriptions of their health care. We tried to automatically predict whether a patient would recommend a hospital, whether the hospital was clean, and whether they were treated with dignity from their free-text description, compared to the patient's own quantitative rating of their care. We applied machine learning techniques to all 6412 online comments about hospitals on the English National Health Service website in 2010 using Weka data-mining software. We also compared the results obtained from sentiment analysis with the paper-based national inpatient survey results at the hospital level using Spearman rank correlation for all 161 acute adult hospital trusts in England. There was 81%, 84%, and 89% agreement between quantitative ratings of care and those derived from free-text comments using sentiment analysis for cleanliness, being treated with dignity, and overall recommendation of hospital respectively (kappa scores: .40-.74, P<.001 for all). We observed mild to moderate associations between our machine learning predictions and responses to the large patient survey for the three categories examined (Spearman rho 0.37-0.51, P<.001 for all). 
    The prediction accuracy that we have achieved using this machine learning process suggests that we are able to predict, from free-text, a reasonably accurate assessment of patients' opinion about different performance aspects of a hospital and that these machine learning predictions are associated with results of more conventional surveys.
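
    The agreement percentages and kappa scores reported in this abstract can be computed as sketched below. This is an illustrative implementation of raw agreement and Cohen's kappa on two label sequences; the label values used in the note that follows are made up, and the study itself used Weka rather than this code.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length label lists."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each rater's marginal label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa
```

    For example, comparing patient ratings `["pos", "pos", "neg", "neg"]` with sentiment predictions `["pos", "neg", "neg", "neg"]` gives 75% raw agreement, while kappa discounts the share of that agreement expected by chance.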

  14. Machine learning approaches to analysing textual injury surveillance data: a systematic review.

    PubMed

    Vallmuur, Kirsten

    2015-06-01

    To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Systematic review. The electronic databases which were searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Occupational injuries were the focus of half of the machine learning studies and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. 
With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see a continued growth and advancement in knowledge of text mining in the injury field. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Prediction of lung cancer patient survival via supervised machine learning classification techniques.

    PubMed

    Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B

    2017-12-01

    Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal to enable comparison of predictive power between the various methods. The prediction is treated like a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as it had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. 
We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time with the ultimate goal to inform patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Deep Learning Neural Networks and Bayesian Neural Networks in Data Analysis

    NASA Astrophysics Data System (ADS)

    Chernoded, Andrey; Dudko, Lev; Myagkov, Igor; Volkov, Petr

    2017-10-01

    Most modern analyses in high energy physics use signal-versus-background classification techniques based on machine learning methods, and neural networks in particular. The deep learning neural network is the most promising modern technique for separating signal and background, and nowadays it can be widely and successfully implemented as a part of physics analysis. In this article we compare the application of deep learning and Bayesian neural networks as classifiers in an instance of top quark analysis.

  17. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    PubMed

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2017-06-14

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predict mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.

  18. Automatic Classification of Sub-Techniques in Classical Cross-Country Skiing Using a Machine Learning Algorithm on Micro-Sensor Data

    PubMed Central

    Seeberg, Trine M.; Tjønnås, Johannes; Haugnes, Pål; Sandbakk, Øyvind

    2017-01-01

    The automatic classification of sub-techniques in classical cross-country skiing provides unique possibilities for analyzing the biomechanical aspects of outdoor skiing. This is currently possible due to the miniaturization and flexibility of wearable inertial measurement units (IMUs) that allow researchers to bring the laboratory to the field. In this study, we aimed to optimize the accuracy of the automatic classification of classical cross-country skiing sub-techniques by using two IMUs attached to the skier’s arm and chest together with a machine learning algorithm. The novelty of our approach is the reliable detection of individual cycles using a gyroscope on the skier’s arm, while a neural network machine learning algorithm robustly classifies each cycle to a sub-technique using sensor data from an accelerometer on the chest. In this study, 24 datasets from 10 different participants were separated into the categories training-, validation- and test-data. Overall, we achieved a classification accuracy of 93.9% on the test-data. Furthermore, we illustrate how an accurate classification of sub-techniques can be combined with data from standard sports equipment including position, altitude, speed and heart rate measuring systems. Combining this information has the potential to provide novel insight into physiological and biomechanical aspects valuable to coaches, athletes and researchers. PMID:29283421

  19. Machine learning-based methods for prediction of linear B-cell epitopes.

    PubMed

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction helps immunologists design peptide-based vaccines and diagnostic tests, and supports disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable-length B-cell epitope prediction is not yet satisfactory. Fortunately, due to increasingly available verified epitope databases, bioinformaticians can apply machine learning-based algorithms to all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noted that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools has become a general way of constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of the support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, in addition to reviewing recently published papers, we have introduced the fundamentals of B-cell epitopes and SVM techniques. In addition, an example of a linear B-cell epitope prediction system based on physicochemical features and amino acid combinations is illustrated in detail.

  20. Prostate cancer detection using machine learning techniques by employing combination of features extracting strategies.

    PubMed

    Hussain, Lal; Ahmed, Adeel; Saeed, Sharjil; Rathore, Saima; Awan, Imtiaz Ahmed; Shah, Saeed Arif; Majid, Abdul; Idris, Adnan; Awan, Anees Ahmed

    2018-02-06

    Prostate cancer is the second leading cause of cancer deaths among men. Early detection can effectively reduce the mortality rate caused by prostate cancer. The high resolution and multiresolution nature of prostate MRIs requires proper diagnostic systems and tools. In the past, researchers developed computer-aided diagnosis (CAD) systems that help the radiologist detect abnormalities. In this research paper, we have employed novel machine learning techniques for detecting prostate cancer: a Bayesian approach, support vector machine (SVM) kernels (polynomial, radial basis function (RBF) and Gaussian), and a Decision Tree. Moreover, different feature extraction strategies are proposed to improve the detection performance. The feature extraction strategies are based on texture, morphological, scale invariant feature transform (SIFT), and elliptic Fourier descriptor (EFD) features. The performance was evaluated on single features as well as on combinations of features using machine learning classification techniques. Cross validation (jack-knife k-fold) was performed, and performance was evaluated in terms of the receiver operating characteristic (ROC) curve, specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), and false positive rate (FPR). Based on single feature extraction strategies, the SVM Gaussian kernel gives the highest accuracy of 98.34% with an AUC of 0.999. Using combinations of feature extraction strategies, the SVM Gaussian kernel with texture + morphological and EFDs + morphological features gives the highest accuracy of 99.71% and an AUC of 1.00.

  1. Model-based and Model-free Machine Learning Techniques for Diagnostic Prediction and Classification of Clinical Outcomes in Parkinson's Disease.

    PubMed

    Gao, Chao; Sun, Hanbo; Wang, Tuo; Tang, Ming; Bohnen, Nicolaas I; Müller, Martijn L T M; Herman, Talia; Giladi, Nir; Kalinin, Alexandr; Spino, Cathie; Dauer, William; Hausdorff, Jeffrey M; Dinov, Ivo D

    2018-05-08

    In this study, we apply a multidisciplinary approach to investigate falls in PD patients using clinical, demographic and neuroimaging data from two independent initiatives (University of Michigan and Tel Aviv Sourasky Medical Center). Using machine learning techniques, we construct predictive models to discriminate fallers and non-fallers. Through controlled feature selection, we identified the most salient predictors of patient falls including gait speed, Hoehn and Yahr stage, postural instability and gait difficulty-related measurements. The model-based and model-free analytical methods we employed included logistic regression, random forests, support vector machines, and XGboost. The reliability of the forecasts was assessed by internal statistical (5-fold) cross validation as well as by external out-of-bag validation. Four specific challenges were addressed in the study: Challenge 1, develop a protocol for harmonizing and aggregating complex, multisource, and multi-site Parkinson's disease data; Challenge 2, identify salient predictive features associated with specific clinical traits, e.g., patient falls; Challenge 3, forecast patient falls and evaluate the classification performance; and Challenge 4, predict tremor dominance (TD) vs. posture instability and gait difficulty (PIGD). Our findings suggest that, compared to other approaches, model-free machine learning based techniques provide a more reliable clinical outcome forecasting of falls in Parkinson's patients, for example, with a classification accuracy of about 70-80%.

  2. Novel jet observables from machine learning

    NASA Astrophysics Data System (ADS)

    Datta, Kaustuv; Larkoski, Andrew J.

    2018-03-01

    Previous studies have demonstrated the utility and applicability of machine learning techniques to jet physics. In this paper, we construct new observables for the discrimination of jets from different originating particles exclusively from information identified by the machine. The approach we propose is to first organize information in the jet by resolved phase space and determine the effective N-body phase space at which discrimination power saturates. This then allows for the construction of a discrimination observable from the N-body phase space coordinates. A general form of this observable can be expressed with numerous parameters that are chosen so that the observable maximizes the signal vs. background likelihood. Here, we illustrate this technique applied to discrimination of $H\to b\overline{b}$ decays from massive $g\to b\overline{b}$ splittings. We show that for a simple parametrization, we can construct an observable that has discrimination power comparable to, or better than, widely-used observables motivated from theory considerations. For the case of jets on which modified mass-drop tagger grooming is applied, the observable that the machine learns is essentially the angle of the dominant gluon emission off of the $b\overline{b}$ pair.

  3. Prediction of Return-to-original-work after an Industrial Accident Using Machine Learning and Comparison of Techniques

    PubMed Central

    2018-01-01

    Background Many studies have tried to develop predictors for return-to-work (RTW). However, since complex factors have been demonstrated to predict RTW, it is difficult to use them practically. This study investigated whether factors used in previous studies could predict whether an individual had returned to his/her original work by four years after termination of the worker's recovery period. Methods An initial logistic regression analysis of 1,567 participants of the fourth Panel Study of Worker's Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were established, and important variables of each model were identified. The predictive abilities of the different models were compared. Results The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not prominent. Conclusion It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy. PMID:29736160

  4. Reviewing the connection between speech and obstructive sleep apnea.

    PubMed

    Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A

    2016-02-20

    Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to the hypothesis that automatic analysis of speech could be used for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. A large speech database including 426 male Spanish speakers suspected of suffering from OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals using machine learning techniques to predict the apnea-hypopnea index (AHI) or classify individuals according to their OSA severity. AHI describes the severity of a patient's condition. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied. The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to carefully review these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results. 
The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We have found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, which are also correlated with other patient characteristics (age, height, sex,…) that act as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study could not only be a useful example of relevant issues when using machine learning for medical diagnosis, but it will also help in guiding further research on the connection between speech and OSA.

  5. Comparison of Nine Statistical Model Based Warfarin Pharmacogenetic Dosing Algorithms Using the Racially Diverse International Warfarin Pharmacogenetic Consortium Cohort Database

    PubMed Central

    Liu, Rong; Li, Xi; Zhang, Wei; Zhou, Hong-Hao

    2015-01-01

    Objective Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, the performance of these algorithms in racially diverse groups has never been objectively evaluated and compared. In this literature-based study, we compared the performance of eight machine learning techniques with that of MLR in a large, racially diverse cohort. Methods MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performance of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performance of these techniques in different races, as well as across the dose ranges of therapeutic warfarin, was compared. Robust results were obtained after 100 rounds of resampling. Results BART, MARS and SVR were statistically indistinguishable and significantly outperformed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than those of MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR performed best among the Black population. 
When patients were grouped in terms of warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20%, and lower MAE (all p values < 0.05) than MLR in the low- and high-dose ranges. Conclusion Overall, the machine learning-based techniques BART, MARS and SVR performed better than MLR in warfarin pharmacogenetic dosing. Differences in the algorithms' performance exist among the races. Moreover, machine learning-based algorithms tended to perform better than MLR in the low- and high-dose ranges. PMID:26305568
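
The two evaluation measures named in this abstract, mean absolute error (MAE) and the percentage of patients whose predicted dose falls within 20% of the actual dose, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the four sample doses are hypothetical, and the study additionally averaged these measures over 100 rounds of resampling, which is not shown here.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between predicted and actual doses."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percentage_within(actual, predicted, tolerance=0.20):
    """Percentage of cases where |predicted - actual| <= tolerance * actual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance * a for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


# Hypothetical weekly warfarin doses in mg/week (not data from the study).
actual_dose = [35.0, 21.0, 49.0, 28.0]
predicted_dose = [30.0, 28.0, 45.0, 29.0]

print(mean_absolute_error(actual_dose, predicted_dose))  # 4.25
print(percentage_within(actual_dose, predicted_dose))    # 75.0
```

The two measures can disagree: MAE is dominated by large misses in absolute terms, while the within-20% rate is relative to each patient's dose, which matters when comparing low- and high-dose groups.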

  6. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets.

    PubMed

    Shuryak, Igor

    2017-01-01

    The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected "signal"; (5) using several machine learning methods to test the "signal's" sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). 
We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation.
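
Technique (1) from this abstract, using synthetic noise variables as a benchmark, can be sketched as follows. This is only an illustration under loud assumptions: the absolute Pearson correlation is swapped in as a simple stand-in importance score (the abstract refers to GLMs and random forests), and the function names, the number of noise variables, and the simulated data are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def noise_benchmark(predictors, outcome, n_noise=20, seed=0):
    """Score each predictor and return the best score reached by pure noise.

    predictors: dict mapping a name to a list of values.
    Returns (scores, threshold); a predictor whose score does not exceed
    the threshold is indistinguishable from synthetic noise by this measure.
    """
    rng = random.Random(seed)
    n = len(outcome)
    noise_scores = []
    for _ in range(n_noise):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # synthetic noise variable
        noise_scores.append(abs(pearson(noise, outcome)))
    threshold = max(noise_scores)
    scores = {name: abs(pearson(col, outcome)) for name, col in predictors.items()}
    return scores, threshold


# Simulated example: abundance falls with radiation; "ph" is unrelated.
rng = random.Random(1)
radiation = [rng.uniform(0.0, 10.0) for _ in range(30)]
ph = [rng.uniform(5.0, 8.0) for _ in range(30)]
abundance = [-2.0 * r + rng.gauss(0.0, 0.5) for r in radiation]

scores, threshold = noise_benchmark({"radiation": radiation, "ph": ph}, abundance)
for name, score in scores.items():
    print(name, round(score, 3), "above noise" if score > threshold else "within noise")
```

With small data sets the noise threshold can be surprisingly high, which is the point of the benchmark: a predictor has to beat what random columns achieve by chance.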

  7. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets

    PubMed Central

    Shuryak, Igor

    2017-01-01

    The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected “signal”; (5) using several machine learning methods to test the “signal’s” sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). 
We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation. PMID:28068401

  8. Scoping Study of Machine Learning Techniques for Visualization and Analysis of Multi-source Data in Nuclear Safeguards

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cui, Yonggang

    In implementation of nuclear safeguards, many different techniques are being used to monitor operation of nuclear facilities and safeguard nuclear materials, ranging from radiation detectors, flow monitors, video surveillance, satellite imagers, digital seals to open source search and reports of onsite inspections/verifications. Each technique measures one or more unique properties related to nuclear materials or operation processes. Because these data sets have no or loose correlations, it could be beneficial to analyze the data sets together to improve the effectiveness and efficiency of safeguards processes. Advanced visualization techniques and machine-learning based multi-modality analysis could be effective tools in such integrated analysis. In this project, we will conduct a survey of existing visualization and analysis techniques for multi-source data and assess their potential values in nuclear safeguards.

  9. Mitigation of time-varying distortions in Nyquist-WDM systems using machine learning

    NASA Astrophysics Data System (ADS)

    Granada Torres, Jhon J.; Varughese, Siddharth; Thomas, Varghese A.; Chiuchiarelli, Andrea; Ralph, Stephen E.; Cárdenas Soto, Ana M.; Guerrero González, Neil

    2017-11-01

    We propose a machine learning-based nonsymmetrical demodulation technique relying on clustering to mitigate time-varying distortions derived from several impairments such as IQ imbalance, bias drift, phase noise and interchannel interference. Experimental results show that those impairments cause centroid movements in the received constellations seen in time-windows of 10k symbols in controlled scenarios. In our demodulation technique, the k-means algorithm iteratively identifies the cluster centroids in the constellation of the received symbols in short time windows by means of the optimization of decision thresholds for a minimum BER. We experimentally verified the effectiveness of this computationally efficient technique in multicarrier 16QAM Nyquist-WDM systems over 270 km links. Our nonsymmetrical demodulation technique outperforms the conventional QAM demodulation technique, reducing the OSNR requirement by up to ∼0.8 dB at a BER of 1 × 10⁻² for signals affected by interchannel interference.

  10. Enhancing interpretability of automatically extracted machine learning features: application to a RBM-Random Forest system on brain lesion segmentation.

    PubMed

    Pereira, Sérgio; Meier, Raphael; McKinley, Richard; Wiest, Roland; Alves, Victor; Silva, Carlos A; Reyes, Mauricio

    2018-02-01

    Machine learning systems are achieving better performances at the cost of becoming increasingly complex. However, because of that, they become less interpretable, which may cause some distrust by the end-user of the system. This is especially important as these systems are pervasively being introduced to critical domains, such as the medical field. Representation Learning techniques are general methods for automatic feature computation. Nevertheless, these techniques are regarded as uninterpretable "black boxes". In this paper, we propose a methodology to enhance the interpretability of automatically extracted machine learning features. The proposed system is composed of a Restricted Boltzmann Machine for unsupervised feature learning, and a Random Forest classifier, which are combined to jointly consider existing correlations between imaging data, features, and target variables. We define two levels of interpretation: global and local. The former is devoted to understanding if the system learned the relevant relations in the data correctly, while the latter is focused on predictions performed on a voxel- and patient-level. In addition, we propose a novel feature importance strategy that considers both imaging data and target variables, and we demonstrate the ability of the approach to leverage the interpretability of the obtained representation for the task at hand. We evaluated the proposed methodology in brain tumor segmentation and penumbra estimation in ischemic stroke lesions. We show the ability of the proposed methodology to unveil information regarding relationships between imaging modalities and extracted features and their usefulness for the task at hand. In both clinical scenarios, we demonstrate that the proposed methodology enhances the interpretability of automatically learned features, highlighting specific learning patterns that resemble how an expert extracts relevant data from medical images. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Resident Space Object Characterization and Behavior Understanding via Machine Learning and Ontology-based Bayesian Networks

    NASA Astrophysics Data System (ADS)

    Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.

    2016-09-01

    In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.

  12. Automatic segmentation of airway tree based on local intensity filter and machine learning technique in 3D chest CT volume.

    PubMed

    Meng, Qier; Kitasaka, Takayuki; Nimura, Yukitaka; Oda, Masahiro; Ueno, Junji; Mori, Kensaku

    2017-02-01

    Airway segmentation plays an important role in analyzing chest computed tomography (CT) volumes for computerized lung cancer detection, emphysema diagnosis and pre- and intra-operative bronchoscope navigation. However, obtaining a complete 3D airway tree structure from a CT volume is quite a challenging task. Several researchers have proposed automated airway segmentation algorithms based mainly on region growing and machine learning techniques. However, these methods fail to detect the peripheral bronchial branches, which results in a large amount of leakage. This paper presents a novel approach for more accurate extraction of the complex airway tree. The proposed segmentation method is composed of three steps. First, Hessian analysis is utilized to enhance the tube-like structure in CT volumes; then, an adaptive multiscale cavity enhancement filter is employed to detect the cavity-like structure with different radii. In the second step, support vector machine learning is used to remove the false positive (FP) regions from the result obtained in the previous step. Finally, the graph-cut algorithm is used to refine the candidate voxels to form an integrated airway tree. A test dataset including 50 standard-dose chest CT volumes was used for evaluating our proposed method. The average extraction rate was about 79.1% with a significantly decreased FP rate. A new method of airway segmentation based on local intensity structure and machine learning technique was developed. The method was shown to be feasible for airway segmentation in a computer-aided diagnosis system for the lung and in a bronchoscope guidance system.

  13. Training Scalable Restricted Boltzmann Machines Using a Quantum Annealer

    NASA Astrophysics Data System (ADS)

    Kumar, V.; Bass, G.; Dulny, J., III

    2016-12-01

    Machine learning and the optimization involved therein are of critical importance for commercial and military applications. Due to the computational complexity of many-variable optimization, the conventional approach is to employ meta-heuristic techniques to find suboptimal solutions. Quantum Annealing (QA) hardware offers a completely novel approach with the potential to obtain significantly better solutions with large speed-ups compared to traditional computing. In this presentation, we describe our development of new machine learning algorithms tailored for QA hardware. We are training restricted Boltzmann machines (RBMs) using QA hardware on large, high-dimensional commercial datasets. Traditional optimization heuristics such as contrastive divergence and other closely related techniques are slow to converge, especially on large datasets. Recent studies have indicated that QA hardware, when used as a sampler, provides better training performance compared to conventional approaches. Most of these studies have been limited to moderately-sized datasets due to the hardware restrictions imposed by existing QA devices, which make it difficult to solve real-world problems at scale. In this work we develop novel strategies to circumvent this issue. We discuss scale-up techniques such as enhanced embedding and partitioned RBMs which allow large commercial datasets to be learned using QA hardware. We present our initial results obtained by training an RBM as an autoencoder on an image dataset. The results obtained so far indicate that the convergence rates can be improved significantly by increasing RBM network connectivity. These ideas can be readily applied to generalized Boltzmann machines and we are currently investigating this in an ongoing project.

  14. Quantitative approaches to energy and glucose homeostasis: machine learning and modelling for precision understanding and prediction

    PubMed Central

    Murphy, Kevin G.; Jones, Nick S.

    2018-01-01

    Obesity is a major global public health problem. Understanding how energy homeostasis is regulated, and can become dysregulated, is crucial for developing new treatments for obesity. Detailed recording of individual behaviour and new imaging modalities offer the prospect of medically relevant models of energy homeostasis that are both understandable and individually predictive. The profusion of data from these sources has led to an interest in applying machine learning techniques to gain insight from these large, relatively unstructured datasets. We review both physiological models and machine learning results across a diverse range of applications in energy homeostasis, and highlight how modelling and machine learning can work together to improve predictive ability. We collect quantitative details in a comprehensive mathematical supplement. We also discuss the prospects of forecasting homeostatic behaviour and stress the importance of characterizing stochasticity within and between individuals in order to provide practical, tailored forecasts and guidance to combat the spread of obesity. PMID:29367240

  15. Predicting the stability of ternary intermetallics with density functional theory and machine learning

    NASA Astrophysics Data System (ADS)

    Schmidt, Jonathan; Chen, Liming; Botti, Silvana; Marques, Miguel A. L.

    2018-06-01

    We use a combination of machine learning techniques and high-throughput density-functional theory calculations to explore ternary compounds with the AB2C2 composition. We chose the two most common intermetallic prototypes for this composition, namely, the tI10-CeAl2Ga2 and the tP10-FeMo2B2 structures. Our results suggest that there may be ~10 times more stable compounds in these phases than previously known. These are mostly metallic and non-magnetic. While the use of machine learning reduces the overall calculation cost by around 75%, some limitations of its predictive power still exist, in particular, for compounds involving the second-row of the periodic table or magnetic elements.

  16. Autonomous Scanning Probe Microscopy in Situ Tip Conditioning through Machine Learning.

    PubMed

    Rashidi, Mohammad; Wolkow, Robert A

    2018-05-23

    Atomic-scale characterization and manipulation with scanning probe microscopy rely upon the use of an atomically sharp probe. Here we present automated methods based on machine learning to automatically detect and recondition the quality of the probe of a scanning tunneling microscope. As a model system, we employ these techniques on the technologically relevant hydrogen-terminated silicon surface, training the network to recognize abnormalities in the appearance of surface dangling bonds. Of the machine learning methods tested, a convolutional neural network yielded the greatest accuracy, achieving a positive identification of degraded tips in 97% of the test cases. By using multiple points of comparison and majority voting, the accuracy of the method is improved beyond 99%.
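
    The gain from majority voting reported above can be illustrated with a short binomial calculation. This is only a sketch: it assumes each comparison is an independent vote with the single-shot accuracy of 97%, and the vote counts (1, 3, 5) are illustrative choices, not values taken from the abstract.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent votes is correct.

    p is the accuracy of a single vote; n must be odd so that ties cannot occur.
    Independence between votes is an assumption of this sketch.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of votes to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, round(majority_vote_accuracy(0.97, n), 4))
```

    Under these assumptions, three votes already lift a 97% classifier above 99%; correlated errors between comparisons would reduce the gain.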

  17. Boosted Regression Trees Outperforms Support Vector Machines in Predicting (Regional) Yields of Winter Wheat from Single and Cumulated Dekadal Spot-VGT Derived Normalized Difference Vegetation Indices

    NASA Astrophysics Data System (ADS)

    Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos

    2016-08-01

    This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed of Normalized Difference Vegetation Indices (NDVI) derived from low resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence stretching from March to June was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
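
    For readers unfamiliar with the predictor, the standard NDVI formula, (NIR - Red) / (NIR + Red), and a running sum over dekads can be sketched as follows. The reflectance values are invented for illustration, and reading "cumulated" as a running sum is an assumption; the abstract does not define how the Single, Incremental and Targeted NDVI predictors were built.

```python
from itertools import accumulate


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero for empty or masked pixels
    return (nir - red) / total


# Hypothetical (NIR, Red) reflectance pairs, one per dekad (10-day period).
dekadal_reflectances = [(0.30, 0.20), (0.40, 0.15), (0.50, 0.10)]

single = [ndvi(nir, red) for nir, red in dekadal_reflectances]
cumulated = list(accumulate(single))  # running sum over the season so far

if __name__ == "__main__":
    print(single)
    print(cumulated)
```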

  18. Sentiment analysis: a comparison of deep learning neural network algorithm with SVM and naïve Bayes for Indonesian text

    NASA Astrophysics Data System (ADS)

    Calvin Frans Mariel, Wahyu; Mariyah, Siti; Pramana, Setia

    2018-03-01

    Deep learning is a new era of machine learning techniques that essentially imitate the structure and function of the human brain. It is a development of deeper Artificial Neural Network (ANN) that uses more than one hidden layer. Deep Learning Neural Network has a great ability to recognize patterns from various data types such as picture, audio, text, and many more. In this paper, the authors try to measure that algorithm's ability by applying it to text classification. The classification task here considers the sentiment content of a text, which is also called sentiment analysis. By using several combinations of text preprocessing and feature extraction techniques, we aim to compare the modelling results of Deep Learning Neural Network with those of two other commonly used algorithms, Naïve Bayes and Support Vector Machine (SVM). This algorithm comparison uses Indonesian text data with balanced and unbalanced sentiment composition. Based on the experimental simulation, Deep Learning Neural Network clearly outperforms Naïve Bayes and SVM and offers a better F-1 score, while the feature extraction technique that most improves the modelling result is the bigram.
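
    Bigram feature extraction, the technique singled out above, can be sketched in a few lines. The whitespace tokenizer, the lowercasing step and the example sentence are assumptions made for illustration; the abstract does not describe the preprocessing actually used.

```python
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs (bigrams) in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


# Invented Indonesian example: "the service is very good and very fast".
features = bigram_counts("Pelayanan sangat baik dan sangat cepat")

if __name__ == "__main__":
    for pair, count in features.items():
        print(pair, count)
```

    Unlike single-word counts, the bigrams keep "sangat baik" and "sangat cepat" as distinct features, which is one plausible reason word pairs can help a sentiment classifier.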

  19. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.

  20. Comparison of machine learning and semi-quantification algorithms for (I123)FP-CIT classification: the beginning of the end for semi-quantification?

    PubMed

    Taylor, Jonathan Christopher; Fenner, John Wesley

    2017-11-29

    Semi-quantification methods are well established in the clinic for assisted reporting of (I123) Ioflupane images. Arguably, these are limited diagnostic tools. Recent research has demonstrated the potential for improved classification performance offered by machine learning algorithms. A direct comparison between methods is required to establish whether a move towards widespread clinical adoption of machine learning algorithms is justified. This study compared three machine learning algorithms with a range of semi-quantification methods, using the Parkinson's Progression Markers Initiative (PPMI) research database and a locally derived clinical database for validation. Machine learning algorithms were based on support vector machine classifiers with three different sets of features: (1) voxel intensities; (2) principal components of image voxel intensities; (3) striatal binding ratios from the putamen and caudate. Semi-quantification methods were based on striatal binding ratios (SBRs) from both putamina, with and without consideration of the caudates. Normal limits for the SBRs were defined through four different methods: (1) minimum of age-matched controls; (2) mean minus 1/1.5/2 standard deviations from age-matched controls; (3) linear regression of normal patient data against age (minus 1/1.5/2 standard errors); (4) selection of the optimum operating point on the receiver operator characteristic curve from normal and abnormal training data. Each machine learning and semi-quantification technique was evaluated with stratified, nested 10-fold cross-validation, repeated 10 times. The mean accuracy of the semi-quantitative methods for classification of local data into Parkinsonian and non-Parkinsonian groups varied from 0.78 to 0.87, contrasting with 0.89 to 0.95 for classifying PPMI data into healthy controls and Parkinson's disease groups. The machine learning algorithms gave mean accuracies between 0.88 and 0.92 and between 0.95 and 0.97 for local and PPMI data respectively. Classification performance was lower for the local database than the research database for both semi-quantitative and machine learning algorithms. However, for both databases, the machine learning methods generated equal or higher mean accuracies (with lower variance) than any of the semi-quantification approaches. The gain in performance from using machine learning algorithms as compared to semi-quantification was relatively small and may be insufficient, when considered in isolation, to offer significant advantages in the clinical context.
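
    Two of the normal-limit definitions listed above (minimum of controls, and mean minus k standard deviations) reduce to simple thresholds. The sketch below uses invented SBR values, ignores age matching, and uses the sample standard deviation; all three choices are assumptions, since the abstract gives no such detail.

```python
from statistics import mean, stdev


def normal_limit_minimum(control_sbrs):
    """Normal limit defined as the lowest SBR seen among the controls."""
    return min(control_sbrs)


def normal_limit_mean_minus_sd(control_sbrs, k):
    """Normal limit defined as the control mean minus k sample standard deviations."""
    return mean(control_sbrs) - k * stdev(control_sbrs)


def is_abnormal(sbr, limit):
    """Flag a scan whose striatal binding ratio falls below the normal limit."""
    return sbr < limit


# Hypothetical SBRs from a small group of controls.
controls = [2.0, 2.2, 2.4, 2.6, 2.8]

if __name__ == "__main__":
    for k in (1, 1.5, 2):
        print(k, round(normal_limit_mean_minus_sd(controls, k), 3))
    print(is_abnormal(1.5, normal_limit_minimum(controls)))
```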

  1. Solving a Higgs optimization problem with quantum annealing for machine learning.

    PubMed

    Mott, Alex; Job, Joshua; Vlimant, Jean-Roch; Lidar, Daniel; Spiropulu, Maria

    2017-10-18

    The discovery of Higgs-boson decays in a background of standard-model processes was assisted by machine learning methods. The classifiers used to separate signals such as these from background are trained using highly unerring but not completely perfect simulations of the physical processes involved, often resulting in incorrect labelling of background processes or signals (label noise) and systematic errors. Here we use quantum and classical annealing (probabilistic techniques for approximating the global maximum or minimum of a given function) to solve a Higgs-signal-versus-background machine learning optimization problem, mapped to a problem of finding the ground state of a corresponding Ising spin model. We build a set of weak classifiers based on the kinematic observables of the Higgs decay photons, which we then use to construct a strong classifier. This strong classifier is highly resilient against overtraining and against errors in the correlations of the physical observables in the training data. We show that the resulting quantum and classical annealing-based classifier systems perform comparably to the state-of-the-art machine learning methods that are currently used in particle physics. However, in contrast to these methods, the annealing-based classifiers are simple functions of directly interpretable experimental parameters with clear physical meaning. The annealer-trained classifiers use the excited states in the vicinity of the ground state and demonstrate some advantage over traditional machine learning methods for small training datasets. Given the relative simplicity of the algorithm and its robustness to error, this technique may find application in other areas of experimental particle physics, such as real-time decision making in event-selection problems and classification in neutrino physics.

  2. An automated ranking platform for machine learning regression models for meat spoilage prediction using multi-spectral imaging and metabolic profiling.

    PubMed

    Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady

    2017-09-01

    Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imaging and biomimetic sensors started gaining popularity as rapid and efficient methods for assessing food quality, safety and authentication, as a sensible alternative to the expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated from such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithms before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application, is presented, able to automate the procedure of identifying the best machine learning method for comparing data from several analytical techniques, to predict the counts of microorganisms responsible for meat spoilage regardless of the packaging system applied. In particular, up to 7 regression methods were applied: ordinary least squares regression, stepwise linear regression, partial least square regression, principal component regression, support vector regression, random forest and k-nearest neighbours. "MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with electronic nose, HPLC, FT-IR, GC-MS and multispectral imaging instruments. Populations of total viable count, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta were predicted. As a result, recommendations of which analytical platforms are suitable to predict each type of bacteria and which machine learning methods to use in each case were obtained. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Solving a Higgs optimization problem with quantum annealing for machine learning

    NASA Astrophysics Data System (ADS)

    Mott, Alex; Job, Joshua; Vlimant, Jean-Roch; Lidar, Daniel; Spiropulu, Maria

    2017-10-01

    The discovery of Higgs-boson decays in a background of standard-model processes was assisted by machine learning methods. The classifiers used to separate signals such as these from background are trained using highly unerring but not completely perfect simulations of the physical processes involved, often resulting in incorrect labelling of background processes or signals (label noise) and systematic errors. Here we use quantum and classical annealing (probabilistic techniques for approximating the global maximum or minimum of a given function) to solve a Higgs-signal-versus-background machine learning optimization problem, mapped to a problem of finding the ground state of a corresponding Ising spin model. We build a set of weak classifiers based on the kinematic observables of the Higgs decay photons, which we then use to construct a strong classifier. This strong classifier is highly resilient against overtraining and against errors in the correlations of the physical observables in the training data. We show that the resulting quantum and classical annealing-based classifier systems perform comparably to the state-of-the-art machine learning methods that are currently used in particle physics. However, in contrast to these methods, the annealing-based classifiers are simple functions of directly interpretable experimental parameters with clear physical meaning. The annealer-trained classifiers use the excited states in the vicinity of the ground state and demonstrate some advantage over traditional machine learning methods for small training datasets. Given the relative simplicity of the algorithm and its robustness to error, this technique may find application in other areas of experimental particle physics, such as real-time decision making in event-selection problems and classification in neutrino physics.

  4. Machine Learning Intermolecular Potentials for 1,3,5-Triamino-2,4,6-trinitrobenzene (TATB) Using Symmetry-Adapted Perturbation Theory

    DTIC Science & Technology

    2018-04-25

    unlimited. NOTICES Disclaimers The findings in this report are not to be construed as an official Department of the Army position unless so...this report, intermolecular potentials for 1,3,5-triamino-2,4,6-trinitrobenzene (TATB) are developed using machine learning techniques. Three...potentials based on support vector regression, kernel ridge regression, and a neural network are fit using symmetry-adapted perturbation theory. The

  5. Collective behaviour across animal species.

    PubMed

    DeLellis, Pietro; Polverino, Giovanni; Ustuner, Gozde; Abaid, Nicole; Macrì, Simone; Bollt, Erik M; Porfiri, Maurizio

    2014-01-16

    We posit a new geometric perspective to define, detect, and classify inherent patterns of collective behaviour across a variety of animal species. We show that machine learning techniques, and specifically the isometric mapping algorithm, allow the identification and interpretation of different types of collective behaviour in five social animal species. These results offer a first glimpse at the transformative potential of machine learning for ethology, similar to its impact on robotics, where it enabled robots to recognize objects and navigate the environment.

  6. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

    PubMed Central

    Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

    2018-01-01

    Abstract The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie PMID:29688379

  7. A Novel Local Learning based Approach With Application to Breast Cancer Diagnosis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Songhua; Tourassi, Georgia

    2012-01-01

    The purpose of this study is to develop and evaluate a novel local learning-based approach for computer-assisted diagnosis of breast cancer. Our new local learning based algorithm using the linear logistic regression method as its base learner is described. Overall, our algorithm will perform its stochastic searching process until the total allowed computing time is used up by our random walk process in identifying the most suitable population subdivision scheme and their corresponding individual base learners. The proposed local learning-based approach was applied for the prediction of breast cancer given 11 mammographic and clinical findings reported by physicians using the BI-RADS lexicon. Our database consisted of 850 patients with biopsy confirmed diagnosis (290 malignant and 560 benign). We also compared the performance of our method with a collection of publicly available state-of-the-art machine learning methods. Predictive performance for all classifiers was evaluated using 10-fold cross validation and Receiver Operating Characteristics (ROC) analysis. Figure 1 reports the performance of 54 machine learning methods implemented in the machine learning toolkit Weka (version 3.0). We introduced a novel local learning-based classifier and compared it with an extensive list of other classifiers for the problem of breast cancer diagnosis. Our experiments show that the algorithm achieves superior prediction performance, outperforming a wide range of other well established machine learning techniques. Our conclusion complements the existing understanding in the machine learning field that local learning may capture complicated, non-linear relationships exhibited by real-world datasets.

  8. Automatic Earthquake Detection by Active Learning

    NASA Astrophysics Data System (ADS)

    Bergen, K.; Beroza, G. C.

    2017-12-01

    In recent years, advances in machine learning have transformed fields such as image recognition, natural language processing and recommender systems. Many of these performance gains have relied on the availability of large, labeled data sets to train high-accuracy models; labeled data sets are those for which each sample includes a target class label, such as waveforms tagged as either earthquakes or noise. Earthquake seismologists are increasingly leveraging machine learning and data mining techniques to detect and analyze weak earthquake signals in large seismic data sets. One of the challenges in applying machine learning to seismic data sets is the limited labeled data problem; learning algorithms need to be given examples of earthquake waveforms, but the number of known events, taken from earthquake catalogs, may be insufficient to build an accurate detector. Furthermore, earthquake catalogs are known to be incomplete, resulting in training data that may be biased towards larger events and contain inaccurate labels. This challenge is compounded by the class imbalance problem; the events of interest, earthquakes, are infrequent relative to noise in continuous data sets, and many learning algorithms perform poorly on rare classes. In this work, we investigate the use of active learning for automatic earthquake detection. Active learning is a type of semi-supervised machine learning that uses a human-in-the-loop approach to strategically supplement a small initial training set. The learning algorithm incorporates domain expertise through interaction between a human expert and the algorithm, with the algorithm actively posing queries to the user to improve detection performance. We demonstrate the potential of active machine learning to improve earthquake detection performance with limited available training data.
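
    The query step of such a human-in-the-loop scheme can be sketched with uncertainty sampling, one common active learning strategy. Its use here is an assumption, as the abstract does not name the query strategy; the probability scores are invented and would come from whatever detector is being trained.

```python
def most_uncertain(probabilities, n_queries):
    """Return indices of the samples whose predicted probability is closest to 0.5.

    probabilities holds the current model's estimate that each unlabeled
    waveform window is an earthquake; the returned windows are the ones a
    human expert would be asked to label next.
    """
    ranked = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:n_queries]


# Hypothetical detector scores for five unlabeled waveform windows.
scores = [0.02, 0.48, 0.91, 0.55, 0.10]
queries = most_uncertain(scores, 2)

if __name__ == "__main__":
    print(queries)
```

    After the expert labels the queried windows, they are added to the training set and the detector is retrained, and the loop repeats until the labeling budget is spent.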

  9. Clinical data miner: an electronic case report form system with integrated data preprocessing and machine-learning libraries supporting clinical diagnostic model research.

    PubMed

    Installé, Arnaud Jf; Van den Bosch, Thierry; De Moor, Bart; Timmerman, Dirk

    2014-10-20

    Using machine-learning techniques, clinical diagnostic model research extracts diagnostic models from patient data. Traditionally, patient data are often collected using electronic Case Report Form (eCRF) systems, while mathematical software is used for analyzing these data using machine-learning techniques. Due to the lack of integration between eCRF systems and mathematical software, extracting diagnostic models is a complex, error-prone process. Moreover, due to the complexity of this process, it is usually only performed once, after a predetermined number of data points have been collected, without insight into the predictive performance of the resulting models. The objective of the study of Clinical Data Miner (CDM) software framework is to offer an eCRF system with integrated data preprocessing and machine-learning libraries, improving efficiency of the clinical diagnostic model research workflow, and to enable optimization of patient inclusion numbers through study performance monitoring. The CDM software framework was developed using a test-driven development (TDD) approach, to ensure high software quality. Architecturally, CDM's design is split over a number of modules, to ensure future extendability. The TDD approach has enabled us to deliver high software quality. CDM's eCRF Web interface is in active use by the studies of the International Endometrial Tumor Analysis consortium, with over 4000 enrolled patients, and more studies planned. Additionally, a derived user interface has been used in six separate interrater agreement studies. CDM's integrated data preprocessing and machine-learning libraries simplify some otherwise manual and error-prone steps in the clinical diagnostic model research workflow. Furthermore, CDM's libraries provide study coordinators with a method to monitor a study's predictive performance as patient inclusions increase. 
To our knowledge, CDM is the only eCRF system integrating data preprocessing and machine-learning libraries. This integration improves the efficiency of the clinical diagnostic model research workflow. Moreover, by simplifying the generation of learning curves, CDM enables study coordinators to assess more accurately when data collection can be terminated, resulting in better models or lower patient recruitment costs.

  10. Learning Extended Finite State Machines

    NASA Technical Reports Server (NTRS)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  11. Prediction of activity type in preschool children using machine learning techniques.

    PubMed

    Hagenbuchner, Markus; Cliff, Dylan P; Trost, Stewart G; Van Tuc, Nguyen; Peoples, Gregory E

    2015-07-01

    Recent research has shown that machine learning techniques can accurately predict activity classes from accelerometer data in adolescents and adults. The purpose of this study is to develop and test machine learning models for predicting activity type in preschool-aged children. Participants completed 12 standardised activity trials (TV, reading, tablet game, quiet play, art, treasure hunt, cleaning up, active game, obstacle course, bicycle riding) over two laboratory visits. Eleven children aged 3-6 years (mean age=4.8±0.87; 55% girls) completed the activity trials while wearing an ActiGraph GT3X+ accelerometer on the right hip. Activities were categorised into five activity classes: sedentary activities, light activities, moderate to vigorous activities, walking, and running. A standard feed-forward Artificial Neural Network and a Deep Learning Ensemble Network were trained on features in the accelerometer data used in previous investigations (10th, 25th, 50th, 75th and 90th percentiles and the lag-one autocorrelation). Overall recognition accuracy for the standard feed forward Artificial Neural Network was 69.7%. Recognition accuracy for sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running was 82%, 79%, 64%, 36% and 46%, respectively. In comparison, overall recognition accuracy for the Deep Learning Ensemble Network was 82.6%. For sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running recognition accuracy was 84%, 91%, 79%, 73% and 73%, respectively. Ensemble machine learning approaches such as Deep Learning Ensemble Network can accurately predict activity type from accelerometer data in preschool children. Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
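
    The feature set named in this abstract (10th, 25th, 50th, 75th and 90th percentiles plus the lag-one autocorrelation) can be sketched as follows. The window contents, the function names, and the choice of the "inclusive" percentile interpolation are assumptions for illustration; the abstract does not say how the authors computed these values.

```python
from statistics import mean, quantiles

def lag_one_autocorrelation(x):
    """Correlation of a series with itself shifted by one sample."""
    m = mean(x)
    denominator = sum((v - m) ** 2 for v in x)
    if denominator == 0:
        return 0.0  # constant signal: autocorrelation is undefined, so report 0
    numerator = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    return numerator / denominator

def window_features(counts):
    """Five percentiles and the lag-one autocorrelation of one window."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cut_points = quantiles(counts, n=100, method="inclusive")
    features = [cut_points[p - 1] for p in (10, 25, 50, 75, 90)]
    features.append(lag_one_autocorrelation(counts))
    return features

# Hypothetical window of accelerometer counts, not data from the study.
print(window_features([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

    Each window of accelerometer counts is reduced to a six-element vector of this kind before being passed to a classifier.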

  12. Machine learning and next-generation asteroid surveys

    NASA Astrophysics Data System (ADS)

    Nugent, Carrie R.; Dailey, John; Cutri, Roc M.; Masci, Frank J.; Mainzer, Amy K.

    2017-10-01

    Next-generation surveys such as NEOCam (Mainzer et al., 2016) will sift through tens of millions of point source detections daily to detect and discover asteroids. This requires new, more efficient techniques to distinguish between solar system objects, background stars and galaxies, and artifacts such as cosmic rays, scattered light and diffraction spikes. Supervised machine learning is a set of algorithms that allows computers to classify data on a training set, and then apply that classification to make predictions on new datasets. It has been employed by a broad range of fields, including computer vision, medical diagnoses, economics, and natural language processing. It has also been applied to astronomical datasets, including transient identification in the Palomar Transient Factory pipeline (Masci et al., 2016), and in the Pan-STARRS1 difference imaging (D. E. Wright et al., 2015). As part of the NEOCam extended phase A work we apply machine learning techniques to the problem of asteroid detection. Asteroid detection is an ideal application of supervised learning, as there is a wealth of metrics associated with each extracted source, and suitable training sets are easily created. Using the vetted NEOWISE dataset (E. L. Wright et al., 2010, Mainzer et al., 2011) as a proof-of-concept of this technique, we applied the python package sklearn. We report on reliability, feature set selection, and the suitability of various algorithms.

  13. Ontology-Based Learner Categorization through Case Based Reasoning and Fuzzy Logic

    ERIC Educational Resources Information Center

    Sarwar, Sohail; García-Castro, Raul; Qayyum, Zia Ul; Safyan, Muhammad; Munir, Rana Faisal

    2017-01-01

    Learner categorization has a pivotal role in making e-learning systems a success. However, learner characteristics exploited at abstract level of granularity by contemporary techniques cannot categorize the learners effectively. In this paper, an architecture of e-learning framework has been presented that exploits the machine learning based…

  14. Genetic algorithm enhanced by machine learning in dynamic aperture optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Yongjun; Cheng, Weixing; Yu, Li Hua

    With the aid of machine learning techniques, the genetic algorithm has been enhanced and applied to the multi-objective optimization problem presented by the dynamic aperture of the National Synchrotron Light Source II (NSLS-II) Storage Ring. During the evolution processes employed by the genetic algorithm, the population is classified into different clusters in the search space. The clusters with top average fitness are given “elite” status. Intervention on the population is implemented by repopulating some potentially competitive candidates based on the experience learned from the accumulated data. These candidates replace randomly selected candidates among the original data pool. The average fitness of the population is therefore improved while diversity is not lost. Maintaining diversity ensures that the optimization is global rather than local. The quality of the population increases and produces more competitive descendants accelerating the evolution process significantly. When identifying the distribution of optimal candidates, they appear to be located in isolated islands within the search space. Some of these optimal candidates have been experimentally confirmed at the NSLS-II storage ring. Furthermore, the machine learning techniques that exploit the genetic algorithm can also be used in other population-based optimization problems such as particle swarm algorithm.

  15. Geologic Carbon Sequestration Leakage Detection: A Physics-Guided Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Lin, Y.; Harp, D. R.; Chen, B.; Pawar, R.

    2017-12-01

    One of the risks of large-scale geologic carbon sequestration is the potential migration of fluids out of the storage formations. Accurate and fast detection of this fluid migration is not only important but also challenging, due to the large subsurface uncertainty and complex governing physics. Traditional leakage detection and monitoring techniques rely on geophysical observations including pressure. However, the accuracy of these methods is limited because the information they provide is indirect and requires expert interpretation, which yields inaccurate estimates of leakage rates and locations. In this work, we develop a novel machine-learning technique based on support vector regression to effectively and efficiently predict the leakage locations and leakage rates based on a limited number of pressure observations. Compared to the conventional data-driven approaches, which can usually be seen as a "black box" procedure, we develop a physics-guided machine learning method to incorporate the governing physics into the learning procedure. To validate the performance of our proposed leakage detection method, we apply our method to both 2D and 3D synthetic subsurface models. Our novel CO2 leakage detection method has shown high detection accuracy in the example problems.

  16. Genetic algorithm enhanced by machine learning in dynamic aperture optimization

    NASA Astrophysics Data System (ADS)

    Li, Yongjun; Cheng, Weixing; Yu, Li Hua; Rainer, Robert

    2018-05-01

    With the aid of machine learning techniques, the genetic algorithm has been enhanced and applied to the multi-objective optimization problem presented by the dynamic aperture of the National Synchrotron Light Source II (NSLS-II) Storage Ring. During the evolution processes employed by the genetic algorithm, the population is classified into different clusters in the search space. The clusters with top average fitness are given "elite" status. Intervention on the population is implemented by repopulating some potentially competitive candidates based on the experience learned from the accumulated data. These candidates replace randomly selected candidates among the original data pool. The average fitness of the population is therefore improved while diversity is not lost. Maintaining diversity ensures that the optimization is global rather than local. The quality of the population increases and produces more competitive descendants accelerating the evolution process significantly. When identifying the distribution of optimal candidates, they appear to be located in isolated islands within the search space. Some of these optimal candidates have been experimentally confirmed at the NSLS-II storage ring. The machine learning techniques that exploit the genetic algorithm can also be used in other population-based optimization problems such as particle swarm algorithm.

  17. Genetic algorithm enhanced by machine learning in dynamic aperture optimization

    DOE PAGES

    Li, Yongjun; Cheng, Weixing; Yu, Li Hua; ...

    2018-05-29

    With the aid of machine learning techniques, the genetic algorithm has been enhanced and applied to the multi-objective optimization problem presented by the dynamic aperture of the National Synchrotron Light Source II (NSLS-II) Storage Ring. During the evolution processes employed by the genetic algorithm, the population is classified into different clusters in the search space. The clusters with top average fitness are given “elite” status. Intervention on the population is implemented by repopulating some potentially competitive candidates based on the experience learned from the accumulated data. These candidates replace randomly selected candidates among the original data pool. The average fitness of the population is therefore improved while diversity is not lost. Maintaining diversity ensures that the optimization is global rather than local. The quality of the population increases and produces more competitive descendants accelerating the evolution process significantly. When identifying the distribution of optimal candidates, they appear to be located in isolated islands within the search space. Some of these optimal candidates have been experimentally confirmed at the NSLS-II storage ring. Furthermore, the machine learning techniques that exploit the genetic algorithm can also be used in other population-based optimization problems such as particle swarm algorithm.

  18. Modalities, Relations, and Learning

    NASA Astrophysics Data System (ADS)

    Müller, Martin Eric

    While the popularity of statistical, probabilistic and exhaustive machine learning techniques still increases, relational and logic approaches are still a niche market in research. While the former approaches focus on predictive accuracy, the latter ones prove to be indispensable in knowledge discovery.

  19. Predicting Solar Activity Using Machine-Learning Methods

    NASA Astrophysics Data System (ADS)

    Bobra, M.

    2017-12-01

    Of all the activity observed on the Sun, two of the most energetic events are flares and coronal mass ejections. However, we do not, as of yet, fully understand the physical mechanism that triggers solar eruptions. A machine-learning algorithm, which is favorable in cases where the amount of data is large, is one way to [1] empirically determine the signatures of this mechanism in solar image data and [2] use them to predict solar activity. In this talk, we discuss the application of various machine learning algorithms - specifically, a Support Vector Machine, a sparse linear regression (Lasso), and Convolutional Neural Network - to image data from the photosphere, chromosphere, transition region, and corona taken by instruments aboard the Solar Dynamics Observatory in order to predict solar activity on a variety of time scales. Such an approach may be useful since, at the present time, there are no physical models of flares available for real-time prediction. We discuss our results (Bobra and Couvidat, 2015; Bobra and Ilonidis, 2016; Jonas et al., 2017) as well as other attempts to predict flares using machine-learning (e.g. Ahmed et al., 2013; Nishizuka et al. 2017) and compare these results with the more traditional techniques used by the NOAA Space Weather Prediction Center (Crown, 2012). We also discuss some of the challenges in using machine-learning algorithms for space science applications.

  20. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations.

    PubMed

    Torkzaban, Bahareh; Kayvanjoo, Amir Hossein; Ardalan, Arman; Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is overwhelmingly turning into a bottleneck for the effectiveness of large biological data. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been less frequently used with empirical population genetics data. In this study, we developed a new combined approach of data analysis using microsatellite marker data from our previous studies of olive populations using machine learning algorithms. Herein, 267 olive accessions of various origins including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties were investigated using a finely selected panel of 11 microsatellite markers. We organized data in two '4-targeted' and '16-targeted' experiments. A strategy of assaying different machine based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles to represent the population and the geography of each olive accession. These analyses revealed microsatellite markers with the highest differentiating capacity and proved efficiency for our method of clustering olive accessions to reflect upon their regions of origin. A distinguished highlight of this study was the discovery of the best combination of markers for better differentiating of populations via machine learning models, which can be exploited to distinguish among other biological populations.

  1. Harnessing information from injury narratives in the 'big data' era: understanding and applying machine learning for injury surveillance.

    PubMed

    Vallmuur, Kirsten; Marucci-Wellman, Helen R; Taylor, Jennifer A; Lehto, Mark; Corns, Helen L; Smith, Gordon S

    2016-04-01

    Vast amounts of injury narratives are collected daily and are available electronically in real time and have great potential for use in injury surveillance and evaluation. Machine learning algorithms have been developed to assist in identifying cases and classifying mechanisms leading to injury in a much timelier manner than is possible when relying on manual coding of narratives. The aim of this paper is to describe the background, growth, value, challenges and future directions of machine learning as applied to injury surveillance. This paper reviews key aspects of machine learning using injury narratives, providing a case study to demonstrate an application to an established human-machine learning approach. The range of applications and utility of narrative text has increased greatly with advancements in computing techniques over time. Practical and feasible methods exist for semiautomatic classification of injury narratives which are accurate, efficient and meaningful. The human-machine learning approach described in the case study achieved high sensitivity and PPV and reduced the need for human coding to less than a third of cases in one large occupational injury database. The last 20 years have seen a dramatic change in the potential for technological advancements in injury surveillance. Machine learning of 'big injury narrative data' opens up many possibilities for expanded sources of data which can provide more comprehensive, ongoing and timely surveillance to inform future injury prevention policy and practice. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  2. New Techniques for Deep Learning with Geospatial Data using TensorFlow, Earth Engine, and Google Cloud Platform

    NASA Astrophysics Data System (ADS)

    Hancher, M.

    2017-12-01

    Recent years have seen promising results from many research teams applying deep learning techniques to geospatial data processing. In that same timeframe, TensorFlow has emerged as the most popular framework for deep learning in general, and Google has assembled petabytes of Earth observation data from a wide variety of sources and made them available in analysis-ready form in the cloud through Google Earth Engine. Nevertheless, developing and applying deep learning to geospatial data at scale has been somewhat cumbersome to date. We present a new set of tools and techniques that simplify this process. Our approach combines the strengths of several underlying tools: TensorFlow for its expressive deep learning framework; Earth Engine for data management, preprocessing, postprocessing, and visualization; and other tools in Google Cloud Platform to train TensorFlow models at scale, perform additional custom parallel data processing, and drive the entire process from a single familiar Python development environment. These tools can be used to easily apply standard deep neural networks, convolutional neural networks, and other custom model architectures to a variety of geospatial data structures. We discuss our experiences applying these and related tools to a range of machine learning problems, including classic problems like cloud detection, building detection, land cover classification, as well as more novel problems like illegal fishing detection. Our improved tools will make it easier for geospatial data scientists to apply modern deep learning techniques to their own problems, and will also make it easier for machine learning researchers to advance the state of the art of those techniques.

  3. Predicting Flavonoid UGT Regioselectivity

    PubMed Central

    Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip

    2011-01-01

    Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849

  4. A strategy to apply machine learning to small datasets in materials science

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Ling, Chen

    2018-12-01

    There is growing interest in applying machine learning techniques in the research of materials science. However, although it is recognized that materials datasets are typically smaller and sometimes more diverse compared to other fields, the influence of availability of materials data on training machine learning models has not yet been studied, which prevents the possibility to establish accurate predictive rules using small materials datasets. Here we analyzed the fundamental interplay between the availability of materials data and the predictive capability of machine learning models. Instead of affecting the model precision directly, the effect of data size is mediated by the degree of freedom (DoF) of model, resulting in the phenomenon of association between precision and DoF. The appearance of precision-DoF association signals the issue of underfitting and is characterized by large bias of prediction, which consequently restricts the accurate prediction in unknown domains. We proposed to incorporate the crude estimation of property in the feature space to establish ML models using small sized materials data, which increases the accuracy of prediction without the cost of higher DoF. In three case studies of predicting the band gap of binary semiconductors, lattice thermal conductivity, and elastic properties of zeolites, the integration of crude estimation effectively boosted the predictive capability of machine learning models to state-of-art levels, demonstrating the generality of the proposed strategy to construct accurate machine learning models using small materials dataset.

  5. Accuracy comparison among different machine learning techniques for detecting malicious codes

    NASA Astrophysics Data System (ADS)

    Narang, Komal

    2016-03-01

    In this paper, a machine learning based model for malware detection is proposed. It can detect newly released malware, i.e. zero-day attacks, by analyzing operation codes on the Android operating system. The accuracy of Naïve Bayes, Support Vector Machine (SVM) and Neural Network classifiers for detecting malicious code has been compared for the proposed model. In the experiment, 400 benign files, 100 system files and 500 malicious files were used to construct the model. The model yields its best accuracy, 88.9%, when a neural network is used as the classifier, with a sensitivity of 95% and a specificity of 82.8%.
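
    Accuracy, sensitivity and specificity, the three figures reported in this abstract, all derive from the same confusion counts. The sketch below shows the standard definitions on a toy label set; the labels and function names are hypothetical and do not reproduce the paper's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def detection_metrics(tp, fp, tn, fn):
    """Standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy labels (1 = malicious, 0 = benign), invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(detection_metrics(*confusion_counts(y_true, y_pred)))
```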

  6. Toward accelerating landslide mapping with interactive machine learning techniques

    NASA Astrophysics Data System (ADS)

    Stumpf, André; Lachiche, Nicolas; Malet, Jean-Philippe; Kerle, Norman; Puissant, Anne

    2013-04-01

    Despite important advances in the development of more automated methods for landslide mapping from optical remote sensing images, the elaboration of inventory maps after major triggering events still remains a tedious task. Image classification with expert defined rules typically still requires significant manual labour for the elaboration and adaption of rule sets for each particular case. Machine learning algorithms, on the contrary, have the ability to learn and identify complex image patterns from labelled examples but may require relatively large amounts of training data. In order to reduce the amount of required training data, active learning has evolved as a key concept to guide the sampling for applications such as document classification, genetics and remote sensing. The general underlying idea of most active learning approaches is to initialize a machine learning model with a small training set, and to subsequently exploit the model state and/or the data structure to iteratively select the most valuable samples that should be labelled by the user and added to the training set. With relatively few queries and labelled samples, an active learning strategy should ideally yield at least the same accuracy as an equivalent classifier trained with many randomly selected samples. Our study was dedicated to the development of an active learning approach for landslide mapping from VHR remote sensing images with special consideration of the spatial distribution of the samples. The developed approach is a region-based query heuristic that guides the user's attention towards a few compact spatial batches rather than distributed points, resulting in time savings of 50% and more compared to standard active learning techniques. The approach was tested with multi-temporal and multi-sensor satellite images capturing recent large scale triggering events in Brazil and China and demonstrated balanced user's and producer's accuracies between 74% and 80%. The assessment also included an experimental evaluation of the uncertainties of manual mappings from multiple experts and demonstrated strong relationships between the uncertainty of the experts and the machine learning model.
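
    The generic active learning step described in this abstract (use the current model state to pick the most valuable unlabelled samples) is often implemented as uncertainty sampling. The sketch below shows that generic step only; it does not reproduce the authors' region-based query heuristic, and the function name and probabilities are hypothetical.

```python
def select_queries(probabilities, k):
    """Indices of the k unlabelled samples the model is least sure about.

    `probabilities` holds the model's predicted probability of the positive
    class (e.g. 'landslide') for each unlabelled sample; values near 0.5
    are treated as the most informative to label next.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]

# Hypothetical model outputs for five unlabelled samples.
print(select_queries([0.95, 0.52, 0.10, 0.45, 0.70], k=2))
```

    In a full loop, the selected samples would be labelled by the user, added to the training set, and the model retrained before the next round of queries.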

  7. A review of intelligent systems for heart sound signal analysis.

    PubMed

    Nabih-Ali, Mohammed; El-Dahshan, El-Sayed A; Yahia, Ashraf S

    2017-10-01

    Intelligent computer-aided diagnosis (CAD) systems can enhance the diagnostic capabilities of physicians and reduce the time required for accurate diagnosis. CAD systems could provide physicians with a suggestion about the diagnosis of heart diseases. The objective of this paper is to review recently published preprocessing, feature extraction and classification techniques and the state of the art of phonocardiogram (PCG) signal analysis. Published literature reviewed in this paper shows the potential of machine learning techniques as a design tool in PCG CAD systems and reveals that CAD systems for PCG signal analysis are still an open problem. Related studies are compared with respect to their datasets, feature extraction techniques and the classifiers they used. Current achievements and limitations in developing CAD systems for PCG signal analysis using machine learning techniques are presented and discussed. In the light of this review, a number of future research directions for PCG signal analysis are provided.

  8. A Comparative Study with RapidMiner and WEKA Tools over some Classification Techniques for SMS Spam

    NASA Astrophysics Data System (ADS)

    Foozy, Cik Feresa Mohd; Ahmad, Rabiah; Faizal Abdollah, M. A.; Chai Wen, Chuah

    2017-08-01

    SMS spamming is a serious attack that abuses SMS by spreading advertisements in bulk. Unwanted SMS messages containing advertisements disturb users and violate the privacy of mobile users. To overcome these issues, many studies have proposed detecting SMS spam using data mining tools. This paper presents a comparative study of five machine learning techniques, namely Naïve Bayes, K-NN (K-Nearest Neighbour Algorithm), Decision Tree, Random Forest and Decision Stumps, to compare the accuracy obtained with RapidMiner and WEKA on the SMS Spam dataset from the UCI Machine Learning repository.

  9. Imbalanced Learning for Functional State Assessment

    NASA Technical Reports Server (NTRS)

    Li, Feng; McKenzie, Frederick; Li, Jiang; Zhang, Guangfan; Xu, Roger; Richey, Carl; Schnell, Tom

    2011-01-01

    This paper presents results of several imbalanced learning techniques applied to operator functional state assessment where the data is highly imbalanced, i.e., some function states (majority classes) have many more training samples than other states (minority classes). Conventional machine learning techniques usually tend to classify all data samples into majority classes and perform poorly for minority classes. In this study, we implemented five imbalanced learning techniques, including random undersampling, random over-sampling, synthetic minority over-sampling technique (SMOTE), borderline-SMOTE and adaptive synthetic sampling (ADASYN) to solve this problem. Experimental results on a benchmark driving test dataset show that accuracies for minority classes could be improved dramatically with a cost of slight performance degradations for majority classes.
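
    Two of the techniques listed in this abstract can be sketched briefly: random over-sampling, and the interpolation step at the core of SMOTE. This is a simplified illustration under assumed names and toy data, not the authors' implementation; in particular, `smote_like` omits the per-sample bookkeeping of the full SMOTE algorithm and needs at least two minority samples.

```python
import random

def random_oversample(samples, labels, rng):
    """Duplicate randomly chosen samples until every class matches the largest."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y

def smote_like(minority, n_new, rng, k=3):
    """Create points on segments between a minority sample and a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic

rng = random.Random(0)
majority = [(float(i), 0.0) for i in range(8)]       # toy majority class
minority = [(0.0, 5.0), (1.0, 5.0), (2.0, 6.0)]      # toy minority class
X, y = random_oversample(majority + minority, [0] * 8 + [1] * 3, rng)
new_points = smote_like(minority, 5, rng)
print(len(X), y.count(0), y.count(1), len(new_points))
```

    Random over-sampling only repeats existing points, whereas the interpolation step produces new points that lie between existing minority samples.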

  10. A data-driven multi-model methodology with deep feature selection for short-term wind forecasting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias

    With the growing wind penetration into the power system worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by first layer models and generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. This developed multi-model wind forecasting methodology is compared to several benchmarks. The effectiveness of the proposed methodology is evaluated to provide 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation network. Numerical results show that, compared to the single-algorithm models, the developed multi-model framework with deep feature selection procedure has improved the forecasting accuracy by up to 30%.

  11. Automatic vetting of planet candidates from ground based surveys: Machine learning with NGTS

    NASA Astrophysics Data System (ADS)

    Armstrong, David J.; Günther, Maximilian N.; McCormac, James; Smith, Alexis M. S.; Bayliss, Daniel; Bouchy, François; Burleigh, Matthew R.; Casewell, Sarah; Eigmüller, Philipp; Gillen, Edward; Goad, Michael R.; Hodgkin, Simon T.; Jenkins, James S.; Louden, Tom; Metrailler, Lionel; Pollacco, Don; Poppenhaeger, Katja; Queloz, Didier; Raynard, Liam; Rauer, Heike; Udry, Stéphane; Walker, Simon R.; Watson, Christopher A.; West, Richard G.; Wheatley, Peter J.

    2018-05-01

    State of the art exoplanet transit surveys are producing ever increasing quantities of data. To make the best use of this resource, in detecting interesting planetary systems or in determining accurate planetary population statistics, requires new automated methods. Here we describe a machine learning algorithm that forms an integral part of the pipeline for the NGTS transit survey, demonstrating the efficacy of machine learning in selecting planetary candidates from multi-night ground based survey data. Our method uses a combination of random forests and self-organising-maps to rank planetary candidates, achieving an AUC score of 97.6% in ranking 12368 injected planets against 27496 false positives in the NGTS data. We build on past examples by using injected transit signals to form a training set, a necessary development for applying similar methods to upcoming surveys. We also make the autovet code used to implement the algorithm publicly accessible. autovet is designed to perform machine learned vetting of planetary candidates, and can utilise a variety of methods. The apparent robustness of machine learning techniques, whether on space-based or the qualitatively different ground-based data, highlights their importance to future surveys such as TESS and PLATO and the need to better understand their advantages and pitfalls in an exoplanetary context.
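
    The AUC score quoted in this abstract measures how well a ranking separates two groups: it is the probability that a randomly chosen positive (here, an injected planet) receives a higher score than a randomly chosen negative (a false positive). The sketch below computes it by direct pairwise comparison; the function name and scores are hypothetical and unrelated to the autovet code.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy classifier scores, invented for illustration.
planet_scores = [0.9, 0.8, 0.4]
false_positive_scores = [0.7, 0.3]
print(auc(planet_scores, false_positive_scores))
```

    The pairwise loop is quadratic in the number of candidates, so a sample as large as the one in the abstract would normally be handled with a sort-based rank formula instead.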

  12. Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning

    PubMed Central

    Cohen, Kevin Bretonnel; Glass, Benjamin; Greiner, Hansel M.; Holland-Bouley, Katherine; Standridge, Shannon; Arya, Ravindra; Faist, Robert; Morita, Diego; Mangano, Francesco; Connolly, Brian; Glauser, Tracy; Pestian, John

    2016-01-01

    Objective: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data comprise free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient’s status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral. PMID:27257386
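
    The F-measure reported in this abstract combines precision and recall into a single score. As a hedged illustration, the standard definition can be sketched as follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score from confusion counts; beta=1 gives the usual F1."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 candidates found, 2 false alarms, 2 missed.
print(f_measure(true_pos=8, false_pos=2, false_neg=2))
```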

  13. Some history and use of the random positioning machine, RPM, in gravity related research

    NASA Astrophysics Data System (ADS)

    van Loon, Jack J. W. A.

    The first experiments using machines and instruments to manipulate gravity, and thus learn about the impact of this force on living systems, were performed by Sir Thomas Andrew Knight in 1806, exactly two centuries ago. What have we learned from these experiments, and in particular what have we learned about the use of instruments to reveal the impact of gravity and rotation on plants and other living systems? In this essay I examine the use of instruments in gravity related research, with emphasis on the Random Positioning Machine (RPM). Going from water wheel via clinostat to RPM, we will address the usefulness and possible working principles of these hypergravity and (as they are mostly called) microgravity, or better, micro-weight, simulation techniques.

  14. Machine learning approaches for estimation of prediction interval for the model output.

    PubMed

    Shrestha, Durga L; Solomatine, Dimitri P

    2006-03-01

    A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of empirical distributions of the errors associated with all instances belonging to the cluster under consideration and propagated from each cluster to the examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using computed prediction limits as targets, and finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods estimating the prediction interval. A new method for evaluating performance for estimating prediction interval is proposed as well.
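
    The cluster-based construction described in this abstract can be sketched in simplified form. The sketch below assumes the model errors have already been grouped by cluster (the fuzzy c-means step is not shown, and errors are assigned crisply rather than weighted by membership), and all function names and numbers are hypothetical.

```python
def empirical_quantile(values, q):
    """Quantile of a sample by linear interpolation, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (position - low) * (ordered[high] - ordered[low])


def cluster_intervals(errors_by_cluster, alpha=0.1):
    """Lower and upper error quantiles for each cluster."""
    return [
        (empirical_quantile(errs, alpha / 2), empirical_quantile(errs, 1 - alpha / 2))
        for errs in errors_by_cluster
    ]


def propagate(memberships, intervals):
    """Membership-weighted prediction limits for one example."""
    lower = sum(m * lo for m, (lo, _) in zip(memberships, intervals))
    upper = sum(m * hi for m, (_, hi) in zip(memberships, intervals))
    return lower, upper


# Two clusters of model errors and one example belonging equally to both.
intervals = cluster_intervals([[-1.0, 0.0, 1.0], [-4.0, 0.0, 4.0]], alpha=0.5)
print(propagate([0.5, 0.5], intervals))
```

    In the method as described, these per-example limits would then serve as targets for a regression model that is applied to out-of-sample data; that step is omitted here.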

  15. Incremental Support Vector Machine Framework for Visual Sensor Networks

    NASA Astrophysics Data System (ADS)

    Awad, Mariette; Jiang, Xianhua; Motai, Yuichi

    2006-12-01

    Motivated by the emerging requirements of surveillance networks, we present in this paper an incremental multiclassification support vector machine (SVM) technique as a new framework for action classification based on real-time multivideo collected by homogeneous sites. The technique is based on an adaptation of the least squares SVM (LS-SVM) formulation but extends beyond the static image-based learning of current SVM methodologies. In applying the technique, an initial supervised offline learning phase is followed by a visual behavior data acquisition and an online learning phase during which the cluster head performs an ensemble of model aggregations based on the sensor nodes' inputs. The cluster head then selectively switches on designated sensor nodes for future incremental learning. Combining sensor data offers an improvement over single camera sensing, especially when the latter has an occluded view of the target object. The optimization involved alleviates the burdens of power consumption and communication bandwidth requirements. The resulting misclassification error rate, the iterative error reduction rate of the proposed incremental learning, and the decision fusion technique prove its validity when applied to visual sensor networks. Furthermore, the enabled online learning allows an adaptive domain knowledge insertion and offers the advantage of reducing both the model training time and the information storage requirements of the overall system, which makes it even more attractive for distributed sensor networks communication.

  16. Summary of vulnerability related technologies based on machine learning

    NASA Astrophysics Data System (ADS)

    Zhao, Lei; Chen, Zhihao; Jia, Qiong

    2018-04-01

    As the scale of information systems increases by an order of magnitude, the complexity of system software grows with it. The interaction of vulnerabilities from the design, development, and deployment stages through to implementation greatly increases the risk of the entire information system being attacked successfully. Considering the limitations and lags of the existing mainstream security vulnerability detection techniques, this paper summarizes the development and current status of related technologies based on machine learning methods that are applied to deal with massive and irregular data and to handle security vulnerabilities.

  17. Collective behaviour across animal species

    PubMed Central

    DeLellis, Pietro; Polverino, Giovanni; Ustuner, Gozde; Abaid, Nicole; Macrì, Simone; Bollt, Erik M.; Porfiri, Maurizio

    2014-01-01

    We posit a new geometric perspective to define, detect, and classify inherent patterns of collective behaviour across a variety of animal species. We show that machine learning techniques, and specifically the isometric mapping algorithm, allow the identification and interpretation of different types of collective behaviour in five social animal species. These results offer a first glimpse at the transformative potential of machine learning for ethology, similar to its impact on robotics, where it enabled robots to recognize objects and navigate the environment. PMID:24430561

  18. Neural Decoder for Topological Codes

    NASA Astrophysics Data System (ADS)

    Torlai, Giacomo; Melko, Roger G.

    2017-07-01

    We present an algorithm for error correction in topological codes that exploits modern machine learning techniques. Our decoder is constructed from a stochastic neural network called a Boltzmann machine, of the type extensively used in deep learning. We provide a general prescription for the training of the network and a decoding strategy that is applicable to a wide variety of stabilizer codes with very little specialization. We demonstrate the neural decoder numerically on the well-known two-dimensional toric code with phase-flip errors.

  19. Metaheuristic Algorithms for Convolution Neural Network

    PubMed Central

    Fanany, Mohamad Ivan; Arymurthy, Aniati Murni

    2016-01-01

    A typical modern optimization technique is usually either heuristic or metaheuristic. This technique has managed to solve some optimization problems in the research area of science, engineering, and industry. However, strategies for implementing metaheuristics to improve the accuracy of convolution neural networks (CNN), a famous deep learning method, are still rarely investigated. Deep learning is a type of machine learning technique whose aim is to move closer to the goal of artificial intelligence: creating a machine that could successfully perform any intellectual task that can be carried out by a human. In this paper, we propose the implementation strategy of three popular metaheuristic approaches, that is, simulated annealing, differential evolution, and harmony search, to optimize CNN. The performances of these metaheuristic methods in optimizing CNN in classifying the MNIST and CIFAR datasets were evaluated and compared. Furthermore, the proposed methods are also compared with the original CNN. Although the proposed methods increase the computation time, they also improve accuracy (by up to 7.14 percent). PMID:27375738

  20. Metaheuristic Algorithms for Convolution Neural Network.

    PubMed

    Rere, L M Rasdi; Fanany, Mohamad Ivan; Arymurthy, Aniati Murni

    2016-01-01

    A typical modern optimization technique is usually either heuristic or metaheuristic. This technique has managed to solve some optimization problems in the research area of science, engineering, and industry. However, strategies for implementing metaheuristics to improve the accuracy of convolution neural networks (CNN), a famous deep learning method, are still rarely investigated. Deep learning is a type of machine learning technique whose aim is to move closer to the goal of artificial intelligence: creating a machine that could successfully perform any intellectual task that can be carried out by a human. In this paper, we propose the implementation strategy of three popular metaheuristic approaches, that is, simulated annealing, differential evolution, and harmony search, to optimize CNN. The performances of these metaheuristic methods in optimizing CNN in classifying the MNIST and CIFAR datasets were evaluated and compared. Furthermore, the proposed methods are also compared with the original CNN. Although the proposed methods increase the computation time, they also improve accuracy (by up to 7.14 percent).

  1. Modeling Gas and Gas Hydrate Accumulation in Marine Sediments Using a K-Nearest Neighbor Machine-Learning Technique

    NASA Astrophysics Data System (ADS)

    Wood, W. T.; Runyan, T. E.; Palmsten, M.; Dale, J.; Crawford, C.

    2016-12-01

    Natural gas (primarily methane) and gas hydrate accumulations require certain bio-geochemical, as well as physical, conditions, some of which are poorly sampled and/or poorly understood. We exploit recent advances in the prediction of seafloor porosity and heat flux via machine learning techniques (e.g. random forests and Bayesian networks) to predict the occurrence of gas and subsequently gas hydrate in marine sediments. The prediction (actually guided interpolation) technique we use for the key parameters in this study is K-nearest neighbor (KNN). KNN requires only minimal pre-processing of the data and predictors, and requires minimal run-time input, so the results are almost entirely data-driven. Specifically, we use new estimates of sedimentation rate and sediment type, along with recently derived compaction modeling, to estimate profiles of porosity and age. We combine the compaction with seafloor heat flux to estimate temperature with depth and geologic age, which, with estimates of organic carbon and models of methanogenesis, yield limits on the production of methane. Results include geospatial predictions of gas (and gas hydrate) accumulations, with quantitative estimates of uncertainty. The Generic Earth Modeling System (GEMS) we have developed to derive the machine learning estimates is modular and easily updated with new algorithms or data.
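
    The guided interpolation mentioned in this abstract can be illustrated with a bare-bones K-nearest neighbor regressor. This is only a sketch under simple assumptions (Euclidean distance, unweighted averaging), not the GEMS implementation; the predictor coordinates and porosity values are invented for the example.

```python
import math


def knn_predict(train_points, train_values, query, k=3):
    """Average the values of the k training points closest to the query."""
    by_distance = sorted(
        (math.dist(point, query), value)
        for point, value in zip(train_points, train_values)
    )
    nearest = by_distance[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical predictor coordinates and porosity observations.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
porosity = [0.40, 0.50, 0.60, 0.90]
print(knn_predict(points, porosity, query=(0.2, 0.2), k=3))
```

    Because the only run-time inputs are k and the distance measure, the estimate is driven almost entirely by the observations, which matches the motivation given in the abstract.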

  2. Relevance Vector Machine Learning for Neonate Pain Intensity Assessment Using Digital Imaging

    PubMed Central

    Gholami, Behnood; Tannenbaum, Allen R.

    2011-01-01

    Pain assessment in patients who are unable to verbally communicate is a challenging problem. The fundamental limitations in pain assessment in neonates stem from subjective assessment criteria, rather than quantifiable and measurable data. This often results in poor quality and inconsistent treatment of patient pain management. Recent advancements in pattern recognition techniques using relevance vector machine (RVM) learning techniques can assist medical staff in assessing pain by constantly monitoring the patient and providing the clinician with quantifiable data for pain management. The RVM classification technique is a Bayesian extension of the support vector machine (SVM) algorithm, which achieves comparable performance to SVM while providing posterior probabilities for class memberships and a sparser model. If classes represent “pure” facial expressions (i.e., extreme expressions that an observer can identify with a high degree of confidence), then the posterior probability of the membership of some intermediate facial expression to a class can provide an estimate of the intensity of such an expression. In this paper, we use the RVM classification technique to distinguish pain from nonpain in neonates as well as assess their pain intensity levels. We also correlate our results with the pain intensity assessed by expert and nonexpert human examiners. PMID:20172803

  3. A Framework for Structuring Learning Assessment in a Online Educational Game: Experiment Centered Design

    ERIC Educational Resources Information Center

    Conrad, Shawn; Clarke-Midura, Jody; Klopfer, Eric

    2014-01-01

    Educational games offer an opportunity to engage and inspire students to take interest in science, technology, engineering, and mathematical (STEM) subjects. Unobtrusive learning assessment techniques coupled with machine learning algorithms can be utilized to record students' in-game actions and formulate a model of the students' knowledge…

  4. Kernel Methods for Mining Instance Data in Ontologies

    NASA Astrophysics Data System (ADS)

    Bloehdorn, Stephan; Sure, York

    The number of ontologies and the amount of meta data available on the Web are constantly growing. The successful application of machine learning techniques for learning of ontologies from textual data, i.e. mining for the Semantic Web, contributes to this trend. However, no principled approaches exist so far for mining from the Semantic Web. We investigate how machine learning algorithms can be made amenable for directly taking advantage of the rich knowledge expressed in ontologies and associated instance data. Kernel methods have been successfully employed in various learning tasks and provide a clean framework for interfacing between non-vectorial data and machine learning algorithms. In this spirit, we express the problem of mining instances in ontologies as the problem of defining valid corresponding kernels. We present a principled framework for designing such kernels by decomposing the kernel computation into specialized kernels for selected characteristics of an ontology, which can be flexibly assembled and tuned. Initial experiments on real world Semantic Web data yield promising results and show the usefulness of our approach.

  5. Anomaly detection for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, Ben; Rau, Markus Michael; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen

    2015-10-01

    We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million 'clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 'anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed 'anomaly-removed' sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly-removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.
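
    The idea behind an elliptical envelope is to fit a Gaussian to the data and flag the points that lie farthest from its centre in Mahalanobis distance. The sketch below illustrates this in two dimensions using the plain sample mean and covariance; robust implementations typically use a robust covariance estimate instead, and the abstract does not say which was used. The function names, the contamination value, and the data are all assumptions.

```python
def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def flag_anomalies(points, contamination=0.1):
    """Indices of the most distant points, given an assumed anomaly fraction."""
    d2 = mahalanobis_sq_2d(points)
    n_flag = max(1, round(contamination * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: d2[i], reverse=True)
    return sorted(ranked[:n_flag])


# Nine clustered points plus one obvious outlier (index 9), invented data.
sample = [(0, 0), (1, 1), (0, 1), (1, 0), (0.5, 0.5),
          (0.2, 0.8), (0.8, 0.2), (0.4, 0.6), (0.6, 0.4), (10, -10)]
print(flag_anomalies(sample, contamination=0.1))
```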

  6. PredPsych: A toolbox for predictive machine learning-based approach in experimental psychology research.

    PubMed

    Koul, Atesh; Becchio, Cristina; Cavallo, Andrea

    2017-12-12

    Recent years have seen an increased interest in machine learning-based predictive methods for analyzing quantitative behavioral data in experimental psychology. While these methods can achieve relatively greater sensitivity compared to conventional univariate techniques, they still lack an established and accessible implementation. The aim of the current work was to build an open-source R toolbox, "PredPsych", that could make these methods readily available to all psychologists. PredPsych is a user-friendly R toolbox based on machine-learning predictive algorithms. In this paper, we present the framework of PredPsych via the analysis of a recently published multiple-subject motion capture dataset. In addition, we discuss examples of possible research questions that can be addressed with the machine-learning algorithms implemented in PredPsych and cannot be easily addressed with univariate statistical analysis. We anticipate that PredPsych will be of use to researchers with limited programming experience not only in the field of psychology, but also in that of clinical neuroscience, enabling computational assessment of putative bio-behavioral markers for both prognosis and diagnosis.

  7. Refining Markov state models for conformational dynamics using ensemble-averaged data and time-series trajectories

    NASA Astrophysics Data System (ADS)

    Matsunaga, Y.; Sugita, Y.

    2018-06-01

    A data-driven modeling scheme is proposed for conformational dynamics of biomolecules based on molecular dynamics (MD) simulations and experimental measurements. In this scheme, an initial Markov State Model (MSM) is constructed from MD simulation trajectories, and then, the MSM parameters are refined using experimental measurements through machine learning techniques. The second step can reduce the bias of MD simulation results due to inaccurate force-field parameters. Either time-series trajectories or ensemble-averaged data are available as a training data set in the scheme. Using a coarse-grained model of a dye-labeled polyproline-20, we compare the performance of machine learning estimations from the two types of training data sets. Machine learning from time-series data could provide the equilibrium populations of conformational states as well as their transition probabilities. It estimates hidden conformational states more robustly than machine learning from ensemble-averaged data, although there are limitations in estimating the transition probabilities between minor states. We discuss how to use the machine learning scheme for various experimental measurements including single-molecule time-series trajectories.
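
    The first step of the scheme, building an initial MSM from simulation trajectories, amounts to counting transitions between discrete states and normalising each row. The following sketch shows only that counting step under simple assumptions (a single discretised trajectory, a fixed lag); it does not cover the refinement against experimental data, and the names and the toy trajectory are hypothetical.

```python
def estimate_transition_matrix(trajectory, n_states, lag=1):
    """Row-normalised transition counts between discrete states at a given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    for origin, target in zip(trajectory[:-lag], trajectory[lag:]):
        counts[origin][target] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy two-state trajectory standing in for a discretised MD trajectory.
traj = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
for row in estimate_transition_matrix(traj, n_states=2):
    print(row)
```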

  8. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study has reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: specimens, analytes, units of measure and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method: hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measure), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  9. (Machine-)Learning to analyze in vivo microscopy: Support vector machines.

    PubMed

    Wang, Michael F Z; Fernandez-Gonzalez, Rodrigo

    2017-11-01

    The development of new microscopy techniques for super-resolved, long-term monitoring of cellular and subcellular dynamics in living organisms is revealing new fundamental aspects of tissue development and repair. However, new microscopy approaches present several challenges. In addition to unprecedented requirements for data storage, the analysis of high resolution, time-lapse images is too complex to be done manually. Machine learning techniques are ideally suited for the (semi-)automated analysis of multidimensional image data. In particular, support vector machines (SVMs) have emerged as an efficient method to analyze microscopy images obtained from animals. Here, we discuss the use of SVMs to analyze in vivo microscopy data. We introduce the mathematical framework behind SVMs, and we describe the metrics used by SVMs and other machine learning approaches to classify image data. We discuss the influence of different SVM parameters in the context of an algorithm for cell segmentation and tracking. Finally, we describe how the application of SVMs has been critical to study protein localization in yeast screens, for lineage tracing in C. elegans, or to determine the developmental stage of Drosophila embryos to investigate gene expression dynamics. We propose that SVMs will become central tools in the analysis of the complex image data that novel microscopy modalities have made possible. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman.

  10. Passenger baggage object database (PBOD)

    NASA Astrophysics Data System (ADS)

    Gittinger, Jaxon M.; Suknot, April N.; Jimenez, Edward S.; Spaulding, Terry W.; Wenrich, Steve A.

    2018-04-01

    Detection of anomalies of interest in x-ray images is an ever-evolving problem that requires the rapid development of automatic detection algorithms. Automatic detection algorithms are developed using machine learning techniques, which would require developers to obtain the x-ray machine that was used to create the images being trained on, and compile all associated metadata for those images by hand. The Passenger Baggage Object Database (PBOD) and data acquisition application were designed and developed for acquiring and persisting 2-D and 3-D x-ray image data and associated metadata. PBOD was specifically created to capture simulated airline passenger "stream of commerce" luggage data, but could be applied to other areas of x-ray imaging to utilize machine-learning methods.

  11. Identifying predictive features in drug response using machine learning: opportunities and challenges.

    PubMed

    Vidyasagar, Mathukumalli

    2015-01-01

    This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (pattern analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.
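
    For reference, the three sparse regression methods named in this abstract differ only in the penalty added to the least-squares loss. The block below states their standard textbook objectives; the symbols (response vector y, feature matrix X, coefficient vector β, and penalty weights λ, λ1, λ2) are notation assumed here and do not come from the article.

```latex
\begin{align*}
\text{Ridge:}\quad
  \hat{\beta} &= \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
                 + \lambda \lVert \beta \rVert_2^2 \\
\text{LASSO:}\quad
  \hat{\beta} &= \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
                 + \lambda \lVert \beta \rVert_1 \\
\text{Elastic net:}\quad
  \hat{\beta} &= \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
                 + \lambda_1 \lVert \beta \rVert_1
                 + \lambda_2 \lVert \beta \rVert_2^2
\end{align*}
```

    The L1 term is what drives coefficients exactly to zero, which is why LASSO and the elastic net are suited to selecting a small number of features from tens of thousands.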

  12. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis.

    PubMed

    You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing

    2013-01-01

    Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although a large amount of PPI data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequence information. Focusing on dimension reduction, an effective feature extraction method, PCA, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machines removes the dependence of results on initial random weights and improves the prediction performance. When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with the state-of-the-art Support Vector Machine (SVM) technique. Experimental results demonstrate that the proposed PCA-EELM outperforms the SVM method under 5-fold cross-validation. Besides, PCA-EELM performs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance and less time.
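
    The aggregation step described in this abstract, combining several trained classifiers into a consensus by majority voting, can be sketched as follows. The function name and the label lists are hypothetical, and the individual extreme learning machines are not implemented here; only the voting is shown.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one consensus label per sample."""
    consensus = []
    for sample_votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count.
        consensus.append(Counter(sample_votes).most_common(1)[0][0])
    return consensus


# Hypothetical outputs of three classifiers on four protein pairs
# (1 = interacting, 0 = non-interacting).
model_outputs = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(majority_vote(model_outputs))
```

    Using an odd number of models avoids ties for binary labels, and averaging over differently initialised models is what reduces the dependence on any single set of random weights.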

  13. Big data integration for regional hydrostratigraphic mapping

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.

    2013-12-01

    Numerical models provide a way to evaluate groundwater systems, but determining the hydrostratigraphic units (HSUs) used in devising these models remains subjective, nonunique, and uncertain. A novel geophysical-hydrogeologic data integration scheme is proposed to constrain the estimation of continuous HSUs. First, machine-learning and multivariate statistical techniques are used to simultaneously integrate borehole hydrogeologic (lithology, hydraulic conductivity, aqueous field parameters, dissolved constituents) and geophysical (gamma, spontaneous potential, and resistivity) measurements. Second, airborne electromagnetic measurements are numerically inverted to obtain subsurface resistivity structure at randomly selected locations. Third, the machine-learning algorithm is trained using the borehole hydrostratigraphic units and inverted airborne resistivity profiles. The trained machine-learning algorithm is then used to estimate HSUs at independent resistivity profile locations. We demonstrate efficacy of the proposed approach to map the hydrostratigraphy of a heterogeneous surficial aquifer in northwestern Nebraska.

  14. Machine learning-based dual-energy CT parametric mapping

    NASA Astrophysics Data System (ADS)

    Su, Kuan-Hao; Kuo, Jung-Wen; Jordan, David W.; Van Hedent, Steven; Klahr, Paul; Wei, Zhouping; Helo, Rose Al; Liang, Fan; Qian, Pengjiang; Pereira, Gisele C.; Rassouli, Negin; Gilkeson, Robert C.; Traughber, Bryan J.; Cheng, Chee-Wai; Muzic, Raymond F., Jr.

    2018-06-01

    The aim is to develop and evaluate machine learning methods for generating quantitative parametric maps of effective atomic number (Z_eff), relative electron density (ρ_e), mean excitation energy (I_x), and relative stopping power (RSP) from clinical dual-energy CT data. The maps could be used for material identification and radiation dose calculation. Machine learning methods of historical centroid (HC), random forest (RF), and artificial neural networks (ANN) were used to learn the relationship between dual-energy CT input data and ideal output parametric maps calculated for phantoms from the known compositions of 13 tissue substitutes. After training and model selection steps, the machine learning predictors were used to generate parametric maps from independent phantom and patient input data. Precision and accuracy were evaluated using the ideal maps. This process was repeated for a range of exposure doses, and performance was compared to that of the clinically-used dual-energy, physics-based method which served as the reference. The machine learning methods generated more accurate and precise parametric maps than those obtained using the reference method. Their performance advantage was particularly evident when using data from the lowest exposure, one-fifth of a typical clinical abdomen CT acquisition. The RF method achieved the greatest accuracy. In comparison, the ANN method was only 1% less accurate but had much better computational efficiency than RF, being able to produce parametric maps in 15 s. Machine learning methods outperformed the reference method in terms of accuracy and noise tolerance when generating parametric maps, encouraging further exploration of the techniques. Among the methods we evaluated, ANN is the most suitable for clinical use due to its combination of accuracy, excellent low-noise performance, and computational efficiency.

  15. Machine learning-based dual-energy CT parametric mapping.

    PubMed

    Su, Kuan-Hao; Kuo, Jung-Wen; Jordan, David W; Van Hedent, Steven; Klahr, Paul; Wei, Zhouping; Al Helo, Rose; Liang, Fan; Qian, Pengjiang; Pereira, Gisele C; Rassouli, Negin; Gilkeson, Robert C; Traughber, Bryan J; Cheng, Chee-Wai; Muzic, Raymond F

    2018-06-08

    The aim is to develop and evaluate machine learning methods for generating quantitative parametric maps of effective atomic number (Zeff), relative electron density (ρe), mean excitation energy (Ix), and relative stopping power (RSP) from clinical dual-energy CT data. The maps could be used for material identification and radiation dose calculation. Machine learning methods of historical centroid (HC), random forest (RF), and artificial neural networks (ANN) were used to learn the relationship between dual-energy CT input data and ideal output parametric maps calculated for phantoms from the known compositions of 13 tissue substitutes. After training and model selection steps, the machine learning predictors were used to generate parametric maps from independent phantom and patient input data. Precision and accuracy were evaluated using the ideal maps. This process was repeated for a range of exposure doses, and performance was compared to that of the clinically-used dual-energy, physics-based method which served as the reference. The machine learning methods generated more accurate and precise parametric maps than those obtained using the reference method. Their performance advantage was particularly evident when using data from the lowest exposure, one-fifth of a typical clinical abdomen CT acquisition. The RF method achieved the greatest accuracy. In comparison, the ANN method was only 1% less accurate but had much better computational efficiency than RF, being able to produce parametric maps in 15 s. Machine learning methods outperformed the reference method in terms of accuracy and noise tolerance when generating parametric maps, encouraging further exploration of the techniques. Among the methods we evaluated, ANN is the most suitable for clinical use due to its combination of accuracy, excellent low-noise performance, and computational efficiency.

  16. Quantum-state anomaly detection for arbitrary errors using a machine-learning technique

    NASA Astrophysics Data System (ADS)

    Hara, Satoshi; Ono, Takafumi; Okamoto, Ryo; Washio, Takashi; Takeuchi, Shigeki

    2016-10-01

    The accurate detection of small deviations in given density matrices is important for quantum information processing, but it is a difficult task because of the intrinsic fluctuation in density matrices reconstructed using a limited number of experiments. We previously proposed a method for decoherence error detection using a machine-learning technique [S. Hara, T. Ono, R. Okamoto, T. Washio, and S. Takeuchi, Phys. Rev. A 89, 022104 (2014), 10.1103/PhysRevA.89.022104]. However, the previous method is not valid when the errors are just changes in phase. Here, we propose a method that is valid for arbitrary errors in density matrices. The performance of the proposed method is verified using both numerical simulation data and real experimental data.

  17. Application of machine learning techniques to analyse the effects of physical exercise in ventricular fibrillation.

    PubMed

    Caravaca, Juan; Soria-Olivas, Emilio; Bataller, Manuel; Serrano, Antonio J; Such-Miquel, Luis; Vila-Francés, Joan; Guerrero, Juan F

    2014-02-01

    This work presents the application of machine learning techniques to analyse the influence of physical exercise on the physiological properties of the heart during ventricular fibrillation. To this end, different kinds of classifiers (linear and neural models) are used to distinguish between trained and sedentary rabbit hearts. Using those classifiers in combination with a wrapper feature selection algorithm makes it possible to extract knowledge about the most relevant features in the problem. The obtained results show that neural models outperform linear classifiers (better performance indices and a better dimensionality reduction). The most relevant features to describe the benefits of physical exercise are those related to myocardial heterogeneity, mean activation rate and activation complexity. © 2013 Published by Elsevier Ltd.

  18. A computational visual saliency model based on statistics and machine learning.

    PubMed

    Lin, Ru-Je; Lin, Wei-Song

    2014-08-01

    Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.

  19. Multimodal Neuroimaging: Basic Concepts and Classification of Neuropsychiatric Diseases.

    PubMed

    Tulay, Emine Elif; Metin, Barış; Tarhan, Nevzat; Arıkan, Mehmet Kemal

    2018-06-01

    Neuroimaging techniques are widely used in neuroscience to visualize neural activity, to improve our understanding of brain mechanisms, and to identify biomarkers, especially for psychiatric diseases; however, each neuroimaging technique has several limitations. These limitations led to the development of multimodal neuroimaging (MN), which combines data obtained from multiple neuroimaging techniques, such as electroencephalography and functional magnetic resonance imaging, and yields more detailed information about brain dynamics. There are several types of MN, including visual inspection, data integration, and data fusion. This literature review aimed to provide a brief summary and basic information about MN techniques (data fusion approaches in particular) and classification approaches. Data fusion approaches are generally categorized as asymmetric and symmetric. The present review focused exclusively on studies based on symmetric data fusion methods (data-driven methods), such as independent component analysis and principal component analysis. Machine learning techniques have recently been introduced for use in identifying diseases and biomarkers of disease. The machine learning technique most widely used by neuroscientists is classification, especially support vector machine classification. Several studies differentiated patients with psychiatric diseases from healthy controls using combined datasets. The common conclusion among these studies is that the prediction of diseases improves when combining data via MN techniques; however, there remain a few challenges associated with MN, such as sample size. Perhaps in the future N-way fusion can be used to combine multiple neuroimaging techniques or nonimaging predictors (e.g., cognitive ability) to overcome the limitations of MN.

  20. A Review of Current Machine Learning Methods Used for Cancer Recurrence Modeling and Prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hemphill, Geralyn M.

    Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type has become a necessity in cancer research. A major challenge in cancer management is the classification of patients into appropriate risk groups for better treatment and follow-up. Such risk assessment is critically important in order to optimize the patient’s health and the use of medical resources, as well as to avoid cancer recurrence. This paper focuses on the application of machine learning methods for predicting the likelihood of a recurrence of cancer. It is not meant to be an extensive review of the literature on the subject of machine learning techniques for cancer recurrence modeling. Other recent papers have performed such a review, and I will rely heavily on the results and outcomes from these papers. The electronic databases that were used for this review include PubMed, Google, and Google Scholar. Query terms used include “cancer recurrence modeling”, “cancer recurrence and machine learning”, “cancer recurrence modeling and machine learning”, and “machine learning for cancer recurrence and prediction”. The most recent and most applicable papers to the topic of this review have been included in the references. It also includes a list of modeling and classification methods to predict cancer recurrence.

  1. Advanced Technologies in Safe and Efficient Operating Rooms

    DTIC Science & Technology

    2009-10-01

    focused on the video, not (initially) any other sensors and ii) tried to capture using machine learning techniques the ability of an expert surgeon to...plant (with humans playing the role of team leader) o a learning environment (where humans play the role of students ). As can be seen, this work...increased cognitive demands associated with the one-handed technique occur because the surgeon is providing instructions to the assistant performing

  2. Robust Library Building for Autonomous Classification of Downhole Geophysical Logs Using Gaussian Processes

    NASA Astrophysics Data System (ADS)

    Silversides, Katherine L.; Melkumyan, Arman

    2017-03-01

    Machine learning techniques such as Gaussian Processes can be used to identify stratigraphically important features in geophysical logs. The marker shales in the banded iron formation hosted iron ore deposits of the Hamersley Ranges, Western Australia, form distinctive signatures in the natural gamma logs. The identification of these marker shales is important for stratigraphic identification of unit boundaries for the geological modelling of the deposit. Machine learning techniques each have different unique properties that will impact the results. For Gaussian Processes (GPs), the output values are inclined towards the mean value, particularly when there is not sufficient information in the library. The impact that these inclinations have on the classification can vary depending on the parameter values selected by the user. Therefore, when applying machine learning techniques, care must be taken to fit the technique to the problem correctly. This study focuses on optimising the settings and choices for training a GPs system to identify a specific marker shale. We show that the final results converge even when different, but equally valid starting libraries are used for the training. To analyse the impact on feature identification, GP models were trained so that the output was inclined towards a positive, neutral or negative output. For this type of classification, the best results were when the pull was towards a negative output. We also show that the GP output can be adjusted by using a standard deviation coefficient that changes the balance between certainty and accuracy in the results.
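
    The pull towards the mean described in this abstract follows from the standard Gaussian Process regression equations. The block below is a textbook sketch, not the authors' formulation: the constant prior mean $m$, kernel $k$, noise variance $\sigma_n^2$, and the coefficient $c$ in the last line are assumed notation, and the final expression is only one plausible way a standard deviation coefficient could enter a decision score.

```latex
% Library (training) inputs X with labels y; test input x_*.
% K = k(X, X), \; k_* = k(X, x_*), \; \mathbf{1} is a vector of ones.
\begin{align}
  \mu(x_*)      &= m + k_*^{\top} \left(K + \sigma_n^{2} I\right)^{-1} \left(y - m\mathbf{1}\right) \\
  \sigma^{2}(x_*) &= k(x_*, x_*) - k_*^{\top} \left(K + \sigma_n^{2} I\right)^{-1} k_*
\end{align}
% Far from every library example, k_* \to 0, so \mu(x_*) \to m and
% \sigma^{2}(x_*) \to k(x_*, x_*): the output reverts to the prior mean.
% A hypothetical uncertainty-adjusted score with coefficient c \ge 0:
\begin{equation}
  s(x_*) = \mu(x_*) - c\,\sigma(x_*)
\end{equation}
```

    Under these assumptions, choosing $m$ to be positive, neutral, or negative decides which way sparsely supported predictions lean, and a larger $c$ trades detections for certainty.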

  3. Machine Learning Techniques for Global Sensitivity Analysis in Climate Models

    NASA Astrophysics Data System (ADS)

    Safta, C.; Sargsyan, K.; Ricciuto, D. M.

    2017-12-01

    Climate model studies are challenged not only by the compute-intensive nature of these models but also by the high dimensionality of the input parameter space. In our previous work with the land model components (Sargsyan et al., 2014) we identified subsets of 10 to 20 parameters relevant for each QoI via Bayesian compressive sensing and variance-based decomposition. Nevertheless, the algorithms were challenged by the nonlinear input-output dependencies for some of the relevant QoIs. In this work we will explore a combination of techniques to extract relevant parameters for each QoI and subsequently construct surrogate models with quantified uncertainty necessary for future developments, e.g. model calibration and prediction studies. In the first step, we will compare the skill of machine-learning models (e.g. neural networks, support vector machine) to identify the optimal number of classes in selected QoIs and construct robust multi-class classifiers that will partition the parameter space into regions with smooth input-output dependencies. These classifiers will be coupled with techniques aimed at building sparse and/or low-rank surrogate models tailored to each class. Specifically, we will explore and compare sparse learning techniques with low-rank tensor decompositions. These models will be used to identify parameters that are important for each QoI. Surrogate accuracy requirements are higher for subsequent model calibration studies and we will ascertain the performance of this workflow for multi-site ALM simulation ensembles.

  4. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection

    PubMed Central

    Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M.

    2016-01-01

    Nondestructive Testing (NDT) assessment of materials’ health condition is useful for distinguishing healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on both metallics and dielectrics, such as microwave testing. Reducing complexity and expenses associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on feature selection and classification techniques of machine learning. Current microwave NDT methods are generally based on measuring variation in the S-matrix over the entire operating frequency ranges of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on the reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the number of sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the top important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. The investigated feature selection techniques include information gain, gain ratio, Relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing feature selection algorithms. PMID:27104533
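
    Treating each sweeping frequency as a feature and ranking it by information gain can be sketched as follows. This is an illustrative sketch only: the toy data, the feature names `f1` to `f3`, and the discretisation of sensor responses into "high"/"low" are assumptions, not the authors' measurements or pipeline.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature: Sequence[str], labels: Sequence[str]) -> float:
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups: Dict[str, List[str]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: discretised sensor response at three sweep frequencies.
labels = ["crack", "crack", "ok", "ok"]
features = {
    "f1": ["high", "high", "low", "low"],   # separates the classes perfectly
    "f2": ["high", "low", "high", "low"],   # carries no class information
    "f3": ["high", "high", "high", "low"],  # partially informative
}

# Rank frequencies from most to least informative.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
print(ranking)
```

    Keeping only the first few entries of `ranking` corresponds to measuring at fewer frequencies; continuous responses would need to be binned before this measure applies.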

  5. Quantum-Assisted Learning of Hardware-Embedded Probabilistic Graphical Models

    NASA Astrophysics Data System (ADS)

    Benedetti, Marcello; Realpe-Gómez, John; Biswas, Rupak; Perdomo-Ortiz, Alejandro

    2017-10-01

    Mainstream machine-learning techniques such as deep learning and probabilistic programming rely heavily on sampling from generally intractable probability distributions. There is increasing interest in the potential advantages of using quantum computing technologies as sampling engines to speed up these tasks or to make them more effective. However, some pressing challenges in state-of-the-art quantum annealers have to be overcome before we can assess their actual performance. The sparse connectivity, resulting from the local interaction between quantum bits in physical hardware implementations, is considered the most severe limitation to the quality of constructing powerful generative unsupervised machine-learning models. Here, we use embedding techniques to add redundancy to data sets, allowing us to increase the modeling capacity of quantum annealers. We illustrate our findings by training hardware-embedded graphical models on a binarized data set of handwritten digits and two synthetic data sets in experiments with up to 940 quantum bits. Our model can be trained in quantum hardware without full knowledge of the effective parameters specifying the corresponding quantum Gibbs-like distribution; therefore, this approach avoids the need to infer the effective temperature at each iteration, speeding up learning; it also mitigates the effect of noise in the control parameters, making it robust to deviations from the reference Gibbs distribution. Our approach demonstrates the feasibility of using quantum annealers for implementing generative models, and it provides a suitable framework for benchmarking these quantum technologies on machine-learning-related tasks.

  6. Deep Learning for Computer Vision: A Brief Review

    PubMed Central

    Doulamis, Nikolaos; Doulamis, Anastasios; Protopapadakis, Eftychios

    2018-01-01

    Over the last years deep learning methods have been shown to outperform previous state-of-the-art machine learning techniques in several fields, with computer vision being one of the most prominent cases. This review paper provides a brief overview of some of the most significant deep learning schemes used in computer vision problems, that is, Convolutional Neural Networks, Deep Boltzmann Machines and Deep Belief Networks, and Stacked Denoising Autoencoders. A brief account of their history, structure, advantages, and limitations is given, followed by a description of their applications in various computer vision tasks, such as object detection, face recognition, action and activity recognition, and human pose estimation. Finally, a brief overview is given of future directions in designing deep learning schemes for computer vision problems and the challenges involved therein. PMID:29487619

  7. Unintended consequences of machine learning in medicine?

    PubMed

    McDonald, Laura; Ramagopalan, Sreeram V; Cox, Andrew P; Oguz, Mustafa

    2017-01-01

    Machine learning (ML) has the potential to significantly aid medical practice. However, a recent article highlighted some negative consequences that may arise from using ML decision support in medicine. We argue here that whilst the concerns raised by the authors may be appropriate, they are not specific to ML, and thus the article may lead to an adverse perception about this technique in particular. Whilst ML is not without its limitations like any methodology, a balanced view is needed in order to not hamper its use in potentially enabling better patient care.

  8. Modeling Spanish Mood Choice in Belief Statements

    ERIC Educational Resources Information Center

    Robinson, Jason R.

    2013-01-01

    This work develops a computational methodology new to linguistics that empirically evaluates competing linguistic theories on Spanish verbal mood choice through the use of computational techniques to learn mood and other hidden linguistic features from Spanish belief statements found in corpora. The machine learned probabilistic linguistic models…

  9. Automatic Quality Inspection of Percussion Cap Mass Production by Means of 3D Machine Vision and Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Tellaeche, A.; Arana, R.; Ibarguren, A.; Martínez-Otzeta, J. M.

    Exhaustive quality control is becoming very important in the world's globalized market. One example where quality control becomes critical is the mass production of percussion caps. These elements must achieve a minimum tolerance deviation in their fabrication. This paper outlines a machine vision development using a 3D camera for the inspection of the whole production of percussion caps. This system presents multiple problems, such as metallic reflections in the percussion caps, high speed movement of the system, and mechanical errors and irregularities in percussion cap placement. Due to these problems, it is impossible to solve the problem by traditional image processing methods, and hence machine learning algorithms have been tested to provide a feasible classification of the possible errors present in the percussion caps.

  10. An implementation of support vector machine on sentiment classification of movie reviews

    NASA Astrophysics Data System (ADS)

    Yulietha, I. M.; Faraby, S. A.; Adiwijaya; Widyaningtyas, W. C.

    2018-03-01

    With technological advances, all information about movies is available on the internet. If the information is processed properly, it can yield quality information. This research proposes to classify sentiments in movie review documents. It uses the Support Vector Machine (SVM) method because SVM can classify high-dimensional data, which matches the text data used in this research. SVM is a popular machine learning technique for text classification because it learns from a collection of previously classified documents and can provide good results. Across the dataset compositions tested, the 90-10 composition gave the best result, 85.6%. Across the SVM kernels tested, the linear kernel with constant 1 gave the best result, 84.9%.

  11. Machine learning, medical diagnosis, and biomedical engineering research - commentary.

    PubMed

    Foster, Kenneth R; Koprowski, Robert; Skufca, Joseph D

    2014-07-05

    A large number of papers are appearing in the biomedical engineering literature that describe the use of machine learning techniques to develop classifiers for detection or diagnosis of disease. However, the usefulness of this approach in developing clinically validated diagnostic techniques so far has been limited and the methods are prone to overfitting and other problems which may not be immediately apparent to the investigators. This commentary is intended to help sensitize investigators as well as readers and reviewers of papers to some potential pitfalls in the development of classifiers, and suggests steps that researchers can take to help avoid these problems. Building classifiers should be viewed not simply as an add-on statistical analysis, but as part and parcel of the experimental process. Validation of classifiers for diagnostic applications should be considered as part of a much larger process of establishing the clinical validity of the diagnostic technique.

  12. Machine-Learning Techniques Applied to Antibacterial Drug Discovery

    PubMed Central

    Durrant, Jacob D.; Amaro, Rommie E.

    2014-01-01

    The emergence of drug-resistant bacteria threatens to catapult humanity back to the pre-antibiotic era. Even now, multi-drug-resistant bacterial infections annually result in millions of hospital days, billions in healthcare costs, and, most importantly, tens of thousands of lives lost. As many pharmaceutical companies have abandoned antibiotic development in search of more lucrative therapeutics, academic researchers are uniquely positioned to fill the resulting vacuum. Traditional high-throughput screens and lead-optimization efforts are expensive and labor intensive. Computer-aided drug discovery techniques, which are cheaper and faster, can accelerate the identification of novel antibiotics in an academic setting, leading to improved hit rates and faster transitions to pre-clinical and clinical testing. The current review describes two machine-learning techniques, neural networks and decision trees, that have been used to identify experimentally validated antibiotics. We conclude by describing the future directions of this exciting field. PMID:25521642

  13. Statistical Learning Analysis in Neuroscience: Aiming for Transparency

    PubMed Central

    Hanke, Michael; Halchenko, Yaroslav O.; Haxby, James V.; Pollmann, Stefan

    2009-01-01

    Encouraged by a rise of reciprocal interest between the machine learning and neuroscience communities, several recent studies have demonstrated the explanatory power of statistical learning techniques for the analysis of neural data. In order to facilitate a wider adoption of these methods, neuroscientific research needs to ensure a maximum of transparency to allow for comprehensive evaluation of the employed procedures. We argue that such transparency requires “neuroscience-aware” technology for the performance of multivariate pattern analyses of neural data that can be documented in a comprehensive, yet comprehensible way. Recently, we introduced PyMVPA, a specialized Python framework for machine learning based data analysis that addresses this demand. Here, we review its features and applicability to various neural data modalities. PMID:20582270

  14. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    PubMed Central

    Dipnall, Joanna F.

    2016-01-01

    Background: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). Conclusion: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571

  15. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
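
    For readers interpreting the reported odds ratios, the usual relationship between a logistic regression coefficient and its odds ratio is sketched below. This is the generic Wald-type form; the abstract does not say how its intervals were computed, and a survey-weighted, multiply imputed analysis may use a different critical value than the 1.96 assumed here.

```latex
% \hat\beta: estimated logistic regression coefficient for one biomarker,
% \mathrm{SE}: its standard error (both are assumed notation).
\begin{align}
  \mathrm{OR} &= e^{\hat\beta} \\
  95\%\ \mathrm{CI} &= \left( e^{\hat\beta - 1.96\,\mathrm{SE}},\; e^{\hat\beta + 1.96\,\mathrm{SE}} \right)
\end{align}
% Example using the reported total bilirubin estimate:
% \mathrm{OR} = 0.12 \;\Rightarrow\; \hat\beta = \ln 0.12 \approx -2.12,
% i.e. a negative coefficient, so higher values are associated with lower odds.
```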

  16. An Event-Triggered Machine Learning Approach for Accelerometer-Based Fall Detection.

    PubMed

    Putra, I Putu Edy Suardiyana; Brusey, James; Gaura, Elena; Vesilo, Rein

    2017-12-22

    The fixed-size non-overlapping sliding window (FNSW) and fixed-size overlapping sliding window (FOSW) approaches are the most commonly used data-segmentation techniques in machine learning-based fall detection using accelerometer sensors. However, these techniques do not segment by fall stages (pre-impact, impact, and post-impact) and thus useful information is lost, which may reduce the detection rate of the classifier. Aligning the segment with the fall stage is difficult, as the segment size varies. We propose an event-triggered machine learning (EvenT-ML) approach that aligns each fall stage so that the characteristic features of the fall stages are more easily recognized. To evaluate our approach, two publicly accessible datasets were used. Classification and regression tree (CART), k -nearest neighbor ( k -NN), logistic regression (LR), and the support vector machine (SVM) were used to train the classifiers. EvenT-ML gives classifier F-scores of 98% for a chest-worn sensor and 92% for a waist-worn sensor, and significantly reduces the computational cost compared with the FNSW- and FOSW-based approaches, with reductions of up to 8-fold and 78-fold, respectively. EvenT-ML achieves a significantly better F-score than existing fall detection approaches. These results indicate that aligning feature segments with fall stages significantly increases the detection rate and reduces the computational cost.
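
    The two baseline segmentation schemes named in this abstract, FNSW and FOSW, can be sketched in a few lines. This is a generic illustration, not the authors' code: the function name `sliding_windows`, the `overlap` fraction parameter, and the choice to drop an incomplete trailing window are assumptions. The event-triggered EvenT-ML segmentation itself is not reproduced here.

```python
from typing import List, Sequence


def sliding_windows(
    samples: Sequence[float], size: int, overlap: float = 0.0
) -> List[List[float]]:
    """Split samples into fixed-size windows.

    overlap=0.0 gives non-overlapping windows (FNSW);
    0 < overlap < 1 gives overlapping windows (FOSW).
    Trailing samples that do not fill a whole window are dropped.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the starts of consecutive windows (at least one sample).
    step = max(1, int(round(size * (1.0 - overlap))))
    return [
        list(samples[start:start + size])
        for start in range(0, len(samples) - size + 1, step)
    ]


signal = list(range(10))  # stand-in for one accelerometer axis
fnsw = sliding_windows(signal, size=4)               # windows start at 0 and 4
fosw = sliding_windows(signal, size=4, overlap=0.5)  # windows start at 0, 2, 4, 6
print(len(fnsw), len(fosw))
```

    Because window boundaries are fixed in advance, a fall's pre-impact, impact, and post-impact stages can straddle several windows, which is the information loss the abstract describes.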

  17. Estimation of Alpine Skier Posture Using Machine Learning Techniques

    PubMed Central

    Nemec, Bojan; Petrič, Tadej; Babič, Jan; Supej, Matej

    2014-01-01

    High precision Global Navigation Satellite System (GNSS) measurements are becoming more and more popular in alpine skiing due to the relatively undemanding setup and excellent performance. However, GNSS provides only single-point measurements that are defined with the antenna placed typically behind the skier's neck. A key issue is how to estimate other more relevant parameters of the skier's body, like the center of mass (COM) and ski trajectories. Previously, these parameters were estimated by modeling the skier's body with an inverted-pendulum model that oversimplified the skier's body. In this study, we propose two machine learning methods that overcome this shortcoming and estimate COM and skis trajectories based on a more faithful approximation of the skier's body with nine degrees-of-freedom. The first method utilizes a well-established approach of artificial neural networks, while the second method is based on a state-of-the-art statistical generalization method. Both methods were evaluated using the reference measurements obtained on a typical giant slalom course and compared with the inverted-pendulum method. Our results outperform the results of commonly used inverted-pendulum methods and demonstrate the applicability of machine learning techniques in biomechanical measurements of alpine skiing. PMID:25313492

  18. Feasibility study of stain-free classification of cell apoptosis based on diffraction imaging flow cytometry and supervised machine learning techniques.

    PubMed

    Feng, Jingwen; Feng, Tong; Yang, Chengwen; Wang, Wei; Sa, Yu; Feng, Yuanming

    2018-06-01

    This study explored the feasibility of predicting and classifying cells in different stages of apoptosis with a stain-free method based on diffraction images and supervised machine learning. Apoptosis was induced in human chronic myelogenous leukemia K562 cells by cis-platinum (DDP). A newly developed technique of polarization diffraction imaging flow cytometry (p-DIFC) was performed to acquire diffraction images of the cells in three different statuses (viable, early apoptotic and late apoptotic/necrotic) after cell separation through fluorescence activated cell sorting with Annexin V-PE and SYTOX® Green double staining. The texture features of the diffraction images were extracted with in-house software based on the gray-level co-occurrence matrix algorithm to generate datasets for cell classification with a supervised machine learning method. The new method was also verified in a hydrogen peroxide-induced apoptosis model of HL-60 cells. Results show that an accuracy higher than 90% was achieved in independent test datasets from each cell type based on logistic regression with ridge estimators, indicating that the p-DIFC system has great potential for predicting and classifying cells in different stages of apoptosis.
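    The gray-level co-occurrence matrix (GLCM) mentioned above can be illustrated with a small sketch. The authors' in-house software is not described in the abstract, so the function names, the single pixel offset, the two texture features (contrast and energy), and the toy image below are all assumptions chosen for illustration.

```python
from typing import List


def glcm(image: List[List[int]], levels: int,
         dr: int = 0, dc: int = 1) -> List[List[float]]:
    """Normalized gray-level co-occurrence matrix for one offset (dr, dc).

    Entry [i][j] is the fraction of pixel pairs in which a pixel of
    level i has a neighbor of level j at the given offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[v / total for v in row] for row in counts]


def contrast(p: List[List[float]]) -> float:
    """Weights each co-occurrence by the squared gray-level difference."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))


def energy(p: List[List[float]]) -> float:
    """Sum of squared matrix entries (high for uniform textures)."""
    return sum(v * v for row in p for v in row)


# Toy 3x3 image quantized to 3 gray levels (illustrative values only).
toy = [[0, 0, 1],
       [1, 2, 2],
       [0, 1, 2]]
p = glcm(toy, levels=3)
features = [contrast(p), energy(p)]
```

    Feature vectors of this kind, computed per image, would then be the inputs to a classifier such as the ridge-penalized logistic regression the abstract reports.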

  19. Investigating Mesoscale Convective Systems and their Predictability Using Machine Learning

    NASA Astrophysics Data System (ADS)

    Daher, H.; Duffy, D.; Bowen, M. K.

    2016-12-01

    A mesoscale convective system (MCS) is a thunderstorm region that lasts several hours, forms near weather fronts, and can often develop into tornadoes. Here we seek to answer the question of whether these tornadoes are "predictable" by looking for a defining characteristic(s) separating MCSs that evolve into tornadoes versus those that do not. Using NASA's Modern Era Retrospective-analysis for Research and Applications 2 reanalysis data (M2R12K), we apply several state-of-the-art machine learning techniques to investigate this question. The spatial region examined in this experiment is Tornado Alley in the United States over the peak tornado months. A database containing select variables from M2R12K is created using PostgreSQL. This database is then analyzed using machine learning methods such as Symbolic Aggregate approXimation (SAX) and DBSCAN (an unsupervised density-based data clustering algorithm). The incentive behind using these methods is to mathematically define an MCS so that association rule mining techniques can be used to uncover some sort of signal or teleconnection that will help us forecast which MCSs will result in tornadoes and therefore give society more time to prepare and in turn reduce casualties and destruction.
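    As a rough illustration of Symbolic Aggregate approXimation (SAX), the sketch below z-normalizes a series, averages it over equal-length segments (piecewise aggregate approximation), and maps each segment mean to a letter. The four-letter alphabet, the function name `sax`, and the sample series are assumptions for the example; the abstract does not state which settings or variables the authors used.

```python
import bisect
import statistics
from typing import Sequence

# Breakpoints that cut a standard normal distribution into four
# (approximately) equiprobable regions, one per alphabet letter.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax(series: Sequence[float], segments: int) -> str:
    """Convert a numeric series into a SAX word of length `segments`."""
    if segments <= 0 or len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")
    mean = statistics.mean(series)
    std = statistics.pstdev(series)
    # z-normalize; a constant series maps to all zeros.
    z = [0.0 if std == 0 else (x - mean) / std for x in series]
    seg_len = len(z) // segments
    # Piecewise aggregate approximation: mean of each segment.
    paa = [statistics.mean(z[i * seg_len:(i + 1) * seg_len])
           for i in range(segments)]
    # Map each segment mean to the letter of the region it falls in.
    return "".join(ALPHABET[bisect.bisect_left(BREAKPOINTS, v)] for v in paa)


# Hypothetical time series (illustrative values only).
word = sax([1, 1, 2, 2, 8, 8, 9, 9], segments=4)
```

    Symbolic words like this can be stored in a database column and compared or mined as discrete items, which is one reason SAX pairs naturally with association rule mining.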

  20. Downscaling Coarse Scale Microwave Soil Moisture Product using Machine Learning

    NASA Astrophysics Data System (ADS)

    Abbaszadeh, P.; Moradkhani, H.; Yan, H.

    2016-12-01

    Soil moisture (SM) is a key variable in partitioning and examining the global water-energy cycle, agricultural planning, and water resource management. It is also strongly coupled with climate change, playing an important role in weather forecasting and drought monitoring and prediction, flood modeling and irrigation management. Although satellite retrievals can provide unprecedented information on soil moisture at a global scale, the products might be inadequate for basin-scale studies or regional assessment. To improve the spatial resolution of SM, this work presents a novel approach based on a machine learning (ML) technique that allows for downscaling of the satellite soil moisture to fine resolution. For this purpose, the SMAP L-band radiometer SM products were used and conditioned on the Variable Infiltration Capacity (VIC) model prediction to describe the relationship between the coarse and fine scale soil moisture data. The proposed downscaling approach was applied to a western US basin and the products were compared against the available SM data from in-situ gauge stations. The obtained results indicated a great potential of the machine learning technique to derive the fine resolution soil moisture information that is currently used for land data assimilation applications.

  1. Machine Learning and Data Mining for Comprehensive Test Ban Treaty Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Russell, S; Vaidya, S

    2009-07-30

    The Comprehensive Test Ban Treaty (CTBT) is gaining renewed attention in light of growing worldwide interest in mitigating risks of nuclear weapons proliferation and testing. Since the International Monitoring System (IMS) installed the first suite of sensors in the late 1990s, the IMS network has steadily progressed, providing valuable support for event diagnostics. This progress was highlighted at the recent International Scientific Studies (ISS) Conference in Vienna in June 2009, where scientists and domain experts met with policy makers to assess the current status of the CTBT Verification System. A strategic theme within the ISS Conference centered on exploring opportunities for further enhancing the detection and localization accuracy of low magnitude events by drawing upon modern tools and techniques for machine learning and large-scale data analysis. Several promising approaches for data exploitation were presented at the Conference. These are summarized in a companion report. In this paper, we introduce essential concepts in machine learning and assess techniques which could provide both incremental and comprehensive value for event discrimination by increasing the accuracy of the final data product, refining On-Site-Inspection (OSI) conclusions, and potentially reducing the cost of future network operations.

  2. Clustering Single-Cell Expression Data Using Random Forest Graphs.

    PubMed

    Pouyan, Maziyar Baran; Nourani, Mehrdad

    2017-07-01

    Complex tissues such as brain and bone marrow are made up of multiple cell types. As the study of biological tissue structure progresses, the role of cell-type-specific research becomes increasingly important. Novel sequencing technology such as single-cell cytometry provides researchers access to valuable biological data. Applying machine-learning techniques to these high-throughput datasets provides deep insights into the cellular landscape of the tissue of which those cells are a part. In this paper, we propose the use of random-forest-based single-cell profiling, a new machine-learning-based technique, to profile different cell types of intricate tissues using single-cell cytometry data. Our technique utilizes random forests to capture cell marker dependences and model the cellular populations using the cell network concept. This cellular network helps us discover what cell types are in the tissue. Our experimental results on public-domain datasets indicate promising performance and accuracy of our technique in extracting cell populations of complex tissues.

  3. Development of techniques to enhance man/machine communication

    NASA Technical Reports Server (NTRS)

    Targ, R.; Cole, P.; Puthoff, H.

    1974-01-01

    A four-state random stimulus generator, considered to function as an ESP teaching machine was used to investigate an approach to facilitating interactions between man and machines. A subject tries to guess in which of four states the machine is. The machine offers the user feedback and reinforcement as to the correctness of his choice. Using this machine, 148 volunteer subjects were screened under various protocols. Several whose learning slope and/or mean score departed significantly from chance expectation were identified. Direct physiological evidence of perception of remote stimuli not presented to any known sense of the percipient using electroencephalographic (EEG) output when a light was flashed in a distant room was also studied.

  4. A machine learning approach to computer-aided molecular design

    NASA Astrophysics Data System (ADS)

    Bolis, Giorgio; Di Pace, Luigi; Fabrocini, Filippo

    1991-12-01

    Preliminary results of a machine learning application concerning computer-aided molecular design applied to drug discovery are presented. The artificial intelligence techniques of machine learning use a sample of active and inactive compounds, which is viewed as a set of positive and negative examples, to allow the induction of a molecular model characterizing the interaction between the compounds and a target molecule. The algorithm is based on a twofold phase. In the first one — the specialization step — the program identifies a number of active/inactive pairs of compounds which appear to be the most useful in order to make the learning process as effective as possible and generates a dictionary of molecular fragments, deemed to be responsible for the activity of the compounds. In the second phase — the generalization step — the fragments thus generated are combined and generalized in order to select the most plausible hypothesis with respect to the sample of compounds. A knowledge base concerning physical and chemical properties is utilized during the inductive process.

  5. Effective Learning of Probabilistic Models for Clinical Predictions from Longitudinal Data

    ERIC Educational Resources Information Center

    Yang, Shuo

    2017-01-01

    With the expeditious advancement of information technologies, health-related data presented unprecedented potentials for medical and health discoveries but at the same time significant challenges for machine learning techniques both in terms of size and complexity. Those challenges include: the structured data with various storage formats and…

  6. Use of machine-learning classifiers to predict requests for preoperative acute pain service consultation.

    PubMed

    Tighe, Patrick J; Lucas, Stephen D; Edwards, David A; Boezaart, André P; Aytug, Haldun; Bihorac, Azra

    2012-10-01

    The purpose of this project was to determine whether machine-learning classifiers could predict which patients would require a preoperative acute pain service (APS) consultation. Retrospective cohort. University teaching hospital. The records of 9,860 surgical patients posted between January 1 and June 30, 2010 were reviewed. Request for APS consultation. A cohort of machine-learning classifiers was compared according to its ability or inability to classify surgical cases as requiring a request for a preoperative APS consultation. Classifiers were then optimized utilizing ensemble techniques. Computational efficiency was measured with the central processing unit processing times required for model training. Classifiers were tested using the full feature set, as well as the reduced feature set that was optimized using a merit-based dimensional reduction strategy. Machine-learning classifiers correctly predicted preoperative requests for APS consultations in 92.3% (95% confidence intervals [CI], 91.8-92.8) of all surgical cases. Bayesian methods yielded the highest area under the receiver operating curve (0.87, 95% CI 0.84-0.89) and lowest training times (0.0018 seconds, 95% CI, 0.0017-0.0019 for the NaiveBayesUpdateable algorithm). An ensemble of high-performing machine-learning classifiers did not yield a higher area under the receiver operating curve than its component classifiers. Dimensional reduction decreased the computational requirements for multiple classifiers, but did not adversely affect classification performance. Using historical data, machine-learning classifiers can predict which surgical cases should prompt a preoperative request for an APS consultation. Dimensional reduction improved computational efficiency and preserved predictive performance. Wiley Periodicals, Inc.

  7. Making Individual Prognoses in Psychiatry Using Neuroimaging and Machine Learning.

    PubMed

    Janssen, Ronald J; Mourão-Miranda, Janaina; Schnack, Hugo G

    2018-04-22

    Psychiatric prognosis is a difficult problem. Making a prognosis requires looking far into the future, as opposed to making a diagnosis, which is concerned with the current state. During the follow-up period, many factors will influence the course of the disease. Combined with the usually scarcer longitudinal data and the variability in the definition of outcomes/transition, this makes prognostic predictions a challenging endeavor. Employing neuroimaging data in this endeavor introduces the additional hurdle of high dimensionality. Machine-learning techniques are especially suited to tackle this challenging problem. This review starts with a brief introduction to machine learning in the context of its application to clinical neuroimaging data. We highlight a few issues that are especially relevant for prediction of outcome and transition using neuroimaging. We then review the literature that discusses the application of machine learning for this purpose. Critical examination of the studies and their results with respect to the relevant issues revealed the following: 1) there is growing evidence for the prognostic capability of machine-learning-based models using neuroimaging; and 2) reported accuracies may be too optimistic owing to small sample sizes and the lack of independent test samples. Finally, we discuss options to improve the reliability of (prognostic) prediction models. These include new methodologies and multimodal modeling. Paramount, however, is our conclusion that future work will need to provide properly (cross-)validated accuracy estimates of models trained on sufficiently large datasets. Nevertheless, with the technological advances enabling acquisition of large databases of patients and healthy subjects, machine learning represents a powerful tool in the search for psychiatric biomarkers. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  8. Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

    PubMed

    Howard, Rebecca; Rattray, Magnus; Prosperi, Mattia; Custovic, Adnan

    2015-07-01

    Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which is caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as 'asthma endotypes'. The discovery of different asthma subtypes has moved from subjective approaches in which putative phenotypes are assigned by experts to data-driven ones which incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique, latent class analysis, and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies.

  9. Graph Representations of Flow and Transport in Fracture Networks using Machine Learning

    NASA Astrophysics Data System (ADS)

    Srinivasan, G.; Viswanathan, H. S.; Karra, S.; O'Malley, D.; Godinez, H. C.; Hagberg, A.; Osthus, D.; Mohd-Yusof, J.

    2017-12-01

    Flow and transport of fluids through fractured systems are governed by the properties and interactions at the micro-scale. Retaining information about the micro-structure such as fracture length, orientation, aperture and connectivity in mesh-based computational models results in solving for millions to billions of degrees of freedom and quickly renders the problem computationally intractable. Our approach depicts fracture networks graphically, by mapping fractures to nodes and intersections to edges, thereby greatly reducing computational burden. Additionally, we use machine learning techniques to build simulators on the graph representation, trained on data from the mesh-based high fidelity simulations to speed up computation by orders of magnitude. We demonstrate our methodology on ensembles of discrete fracture networks, dividing up the data into training and validation sets. Our machine-learned graph-based solvers result in over 3 orders of magnitude speedup without any significant sacrifice in accuracy.
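    The mapping described above (fractures become nodes, intersections become edges) can be sketched with a plain adjacency structure. The fracture labels, the `inlet` and `outlet` pseudo-nodes, and the use of a breadth-first shortest path as a crude connectivity query are assumptions made for this example; the machine-learned simulators from the abstract are not reproduced.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_fracture_graph(
        intersections: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Each fracture is a node; each intersection is an undirected edge."""
    graph: Dict[str, Set[str]] = {}
    for a, b in intersections:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph: Dict[str, Set[str]],
                  source: str, target: str) -> List[str]:
    """Breadth-first search; returns [] if target is unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return []


# Hypothetical network: which fractures intersect which.
graph = build_fracture_graph([
    ("inlet", "f1"), ("f1", "f2"), ("f2", "outlet"),
    ("f1", "f3"), ("f3", "f4"), ("f4", "outlet"),
])
path = shortest_path(graph, "inlet", "outlet")
```

    A graph with a handful of nodes per fracture network is far smaller than a mesh with millions of degrees of freedom, which is the source of the reduction in computational burden the abstract refers to.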

  10. Machine Learning to Differentiate Between Positive and Negative Emotions Using Pupil Diameter

    PubMed Central

    Babiker, Areej; Faye, Ibrahima; Prehn, Kristin; Malik, Aamir

    2015-01-01

    Pupil diameter (PD) has been suggested as a reliable parameter for identifying an individual’s emotional state. In this paper, we introduce a machine learning technique to detect and differentiate between positive and negative emotions. We presented 30 participants with positive and negative sound stimuli and recorded pupillary responses. The results showed a significant increase in pupil dilation during the processing of negative and positive sound stimuli, with a greater increase for negative stimuli. We also found a more sustained dilation for negative compared to positive stimuli at the end of the trial, which was utilized to differentiate between positive and negative emotions using a machine learning approach which gave an accuracy of 96.5% with sensitivity of 97.93% and specificity of 98%. The obtained results were validated using another dataset designed for a different study and which was recorded while 30 participants processed word pairs with positive and negative emotions. PMID:26733912
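    For readers unfamiliar with the three metrics reported above, the sketch below shows how accuracy, sensitivity and specificity are derived from a binary confusion matrix. The counts are hypothetical and are not taken from the study; the function name is likewise an assumption.

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts.

    tp/fn: positive-class cases classified correctly/incorrectly.
    tn/fp: negative-class cases classified correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts (illustrative only, not the paper's data).
metrics = binary_metrics(tp=45, fn=5, tn=40, fp=10)
```

    When all three are computed on the same test set, accuracy is a class-size-weighted average of sensitivity and specificity, so it lies between the two.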

  11. Spike sorting based upon machine learning algorithms (SOMA).

    PubMed

    Horton, P M; Nicol, A U; Kendrick, K M; Feng, J F

    2007-02-15

    We have developed a spike sorting method, using a combination of various machine learning algorithms, to analyse electrophysiological data and automatically determine the number of sampled neurons from an individual electrode, and discriminate their activities. We discuss extensions to a standard unsupervised learning algorithm (Kohonen), as using a simple application of this technique would only identify a known number of clusters. Our extra techniques automatically identify the number of clusters within the dataset, and their sizes, thereby reducing the chance of misclassification. We also discuss a new pre-processing technique, which transforms the data into a higher dimensional feature space revealing separable clusters. Using principal component analysis (PCA) alone may not achieve this. Our new approach appends the features acquired using PCA with features describing the geometric shapes that constitute a spike waveform. To validate our new spike sorting approach, we have applied it to multi-electrode array datasets acquired from the rat olfactory bulb and from the sheep infero-temporal cortex, and to simulated data. The SOMA software is available at http://www.sussex.ac.uk/Users/pmh20/spikes.

  12. Using decision-tree classifier systems to extract knowledge from databases

    NASA Technical Reports Server (NTRS)

    St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.

    1990-01-01

    One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.

  13. Machine Learning for Education: Learning to Teach

    DTIC Science & Technology

    2016-12-01

    such as commercial aviation, healthcare, and military operations. In the context of military applications, serious gaming – the training warfighters...problems. Playing these games not only allowed the warfighter to discover and learn new tactics, techniques, and procedures, but also allowed the...collecting information across relevant sample sizes have motivated a data-driven, game-based simulation approach. For example, industry and academia alike

  14. Machine learning on brain MRI data for differential diagnosis of Parkinson's disease and Progressive Supranuclear Palsy.

    PubMed

    Salvatore, C; Cerasa, A; Castiglioni, I; Gallivanone, F; Augimeri, A; Lopez, M; Arabia, G; Morelli, M; Gilardi, M C; Quattrone, A

    2014-01-30

    Supervised machine learning has been proposed as a revolutionary approach for identifying sensitive medical image biomarkers (or combination of them) allowing for automatic diagnosis of individual subjects. The aim of this work was to assess the feasibility of a supervised machine learning algorithm for the assisted diagnosis of patients with clinically diagnosed Parkinson's disease (PD) and Progressive Supranuclear Palsy (PSP). Morphological T1-weighted Magnetic Resonance Images (MRIs) of PD patients (28), PSP patients (28) and healthy control subjects (28) were used by a supervised machine learning algorithm based on the combination of Principal Components Analysis as feature extraction technique and on Support Vector Machines as classification algorithm. The algorithm was able to obtain voxel-based morphological biomarkers of PD and PSP. The algorithm allowed individual diagnosis of PD versus controls, PSP versus controls and PSP versus PD with an Accuracy, Specificity and Sensitivity > 90%. Voxels influencing classification between PD and PSP patients involved midbrain, pons, corpus callosum and thalamus, four critical regions known to be strongly involved in the pathophysiological mechanisms of PSP. Classification accuracy of individual PSP patients was consistent with previous manual morphological metrics and with other supervised machine learning applications to MRI data, whereas accuracy in the detection of individual PD patients was significantly higher with our classification method. The algorithm provides excellent discrimination of PD patients from PSP patients at an individual level, thus encouraging the application of computer-based diagnosis in clinical practice. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Auto-SEIA: simultaneous optimization of image processing and machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Negro Maggio, Valentina; Iocchi, Luca

    2015-02-01

    Object classification from images is an important task for machine vision and it is a crucial ingredient for many computer vision applications, ranging from security and surveillance to marketing. Image based object classification techniques properly integrate image processing and machine learning (i.e., classification) procedures. In this paper we present a system for automatic simultaneous optimization of algorithms and parameters for object classification from images. More specifically, the proposed system is able to process a dataset of labelled images and to return a best configuration of image processing and classification algorithms and of their parameters with respect to the accuracy of classification. Experiments with real public datasets are used to demonstrate the effectiveness of the developed system.

  16. Salient Feature Identification and Analysis using Kernel-Based Classification Techniques for Synthetic Aperture Radar Automatic Target Recognition

    DTIC Science & Technology

    2014-03-27

    and machine learning for a range of research including such topics as medical imaging [10] and handwriting recognition [11]. The type of feature...1989. [11] C. Bahlmann, B. Haasdonk, and H. Burkhardt, “Online handwriting recognition with support vector machines-a kernel approach,” in Eighth...International Workshop on Frontiers in Handwriting Recognition, pp. 49–54, IEEE, 2002. [12] C. Cortes and V. Vapnik, “Support-vector networks,” Machine

  17. Supervised machine learning for analysing spectra of exoplanetary atmospheres

    NASA Astrophysics Data System (ADS)

    Márquez-Neila, Pablo; Fisher, Chloe; Sznitman, Raphael; Heng, Kevin

    2018-06-01

    The use of machine learning is becoming ubiquitous in astronomy [1-3], but remains rare in the study of the atmospheres of exoplanets. Given the spectrum of an exoplanetary atmosphere, a multi-parameter space is swept through in real time to find the best-fit model [4-6]. Known as atmospheric retrieval, this technique originates in the Earth and planetary sciences [7]. Such methods are very time-consuming, and by necessity there is a compromise between physical and chemical realism and computational feasibility. Machine learning has previously been used to determine which molecules to include in the model, but the retrieval itself was still performed using standard methods [8]. Here, we report an adaptation of the 'random forest' method of supervised machine learning [9,10], trained on a precomputed grid of atmospheric models, which retrieves full posterior distributions of the abundances of molecules and the cloud opacity. The use of a precomputed grid allows a large part of the computational burden to be shifted offline. We demonstrate our technique on a transmission spectrum of the hot gas-giant exoplanet WASP-12b using a five-parameter model (temperature, a constant cloud opacity and the volume mixing ratios or relative abundances of molecules of water, ammonia and hydrogen cyanide) [11]. We obtain results consistent with the standard nested-sampling retrieval method. We also estimate the sensitivity of the measured spectrum to the model parameters, and we are able to quantify the information content of the spectrum. Our method can be straightforwardly applied using more sophisticated atmospheric models to interpret an ensemble of spectra without having to retrain the random forest.

  18. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques

    PubMed Central

    Macyszyn, Luke; Akbari, Hamed; Pisapia, Jared M.; Da, Xiao; Attiah, Mark; Pigrish, Vadim; Bi, Yingtao; Pal, Sharmistha; Davuluri, Ramana V.; Roccograndi, Laura; Dahmane, Nadia; Martinez-Lage, Maria; Biros, George; Wolf, Ronald L.; Bilello, Michel; O'Rourke, Donald M.; Davatzikos, Christos

    2016-01-01

    Background: MRI characteristics of brain gliomas have been used to predict clinical outcome and molecular tumor characteristics. However, previously reported imaging biomarkers have not been sufficiently accurate or reproducible to enter routine clinical practice and often rely on relatively simple MRI measures. The current study leverages advanced image analysis and machine learning algorithms to identify complex and reproducible imaging patterns predictive of overall survival and molecular subtype in glioblastoma (GB). Methods: One hundred five patients with GB were first used to extract approximately 60 diverse features from preoperative multiparametric MRIs. These imaging features were used by a machine learning algorithm to derive imaging predictors of patient survival and molecular subtype. Cross-validation ensured generalizability of these predictors to new patients. Subsequently, the predictors were evaluated in a prospective cohort of 29 new patients. Results: Survival curves yielded a hazard ratio of 10.64 for predicted long versus short survivors. The overall, 3-way (long/medium/short survival) accuracy in the prospective cohort approached 80%. Classification of patients into the 4 molecular subtypes of GB achieved 76% accuracy. Conclusions: By employing machine learning techniques, we were able to demonstrate that imaging patterns are highly predictive of patient survival. Additionally, we found that GB subtypes have distinctive imaging phenotypes. These results reveal that when imaging markers related to infiltration, cell density, microvascularity, and blood–brain barrier compromise are integrated via advanced pattern analysis methods, they form very accurate predictive biomarkers. These predictive markers used solely preoperative images, hence they can significantly augment diagnosis and treatment of GB patients. PMID:26188015

  19. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

    PubMed

    Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif

    2017-01-01

    Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who were free of any known coronary artery disease or heart failure, who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009, and who had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of class imbalance on the constructed model was handled by the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
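    The core idea of SMOTE, as generally described, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below is a simplified version under stated assumptions: the function name, the default `k`, the fixed random seed, and the toy points are illustrative, and this is not the implementation or configuration used in the study.

```python
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int,
          k: int = 2, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic points from minority-class samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors (illustrative only).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=5)
```

    Oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution.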

  20. An improved method of early diagnosis of smoking-induced respiratory changes using machine learning algorithms.

    PubMed

    Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L

    2013-12-01

    The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
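    The area under the ROC curve (AUC) used as the performance measure above can be computed directly as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. The sketch below uses that pairwise definition; the function name, labels and scores are hypothetical and not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier outputs, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores (illustrative only).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

    In a 10-fold cross-validation such as the one described, a value like this would be computed on each held-out fold and then averaged.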

  1. Neural Control of a Tracking Task via Attention-Gated Reinforcement Learning for Brain-Machine Interfaces.

    PubMed

    Wang, Yiwen; Wang, Fang; Xu, Kai; Zhang, Qiaosheng; Zhang, Shaomin; Zheng, Xiaoxiang

    2015-05-01

    Reinforcement learning (RL)-based brain machine interfaces (BMIs) enable the user to learn from the environment through interactions to complete the task without desired signals, which is promising for clinical applications. Previous studies exploited Q-learning techniques to discriminate neural states into simple directional actions providing the trial initial timing. However, the movements in BMI applications can be quite complicated, and the action timing explicitly shows the intention when to move. The rich actions and the corresponding neural states form a large state-action space, imposing generalization difficulty on Q-learning. In this paper, we propose to adopt attention-gated reinforcement learning (AGREL) as a new learning scheme for BMIs to adaptively decode high-dimensional neural activities into seven distinct movements (directional moves, holdings and resting) due to the efficient weight-updating. We apply AGREL on neural data recorded from M1 of a monkey to directly predict a seven-action set in a time sequence to reconstruct the trajectory of a center-out task. Compared to Q-learning techniques, AGREL could improve the target acquisition rate to 90.16% on average with faster convergence and more stability to follow neural activity over multiple days, indicating the potential to achieve better online decoding performance for more complicated BMI tasks.

  2. Automatic microseismic event picking via unsupervised machine learning

    NASA Astrophysics Data System (ADS)

    Chen, Yangkang

    2018-01-01

    Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms are sensitive to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example by removing a sufficient amount of noise, and then analysed by arrival pickers. To overcome the noise issue in arrival picking for weak microseismic or earthquake events, I leverage machine learning techniques to help recognize seismic waveforms in microseismic or earthquake data. Because supervised machine learning algorithms depend on a large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for this purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.
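
    For orientation, the STA/LTA baseline that the abstract compares against can be sketched as follows. This is the conventional ratio picker, not the fuzzy clustering method the author proposes; the window lengths, the threshold, and the names `sta_lta` and `pick_arrival` are illustrative assumptions.

```python
def sta_lta(trace, n_sta, n_lta):
    """Classic STA/LTA ratio on the squared amplitude of a trace.

    For each sample i >= n_lta - 1, the short-term average covers the
    last n_sta samples and the long-term average the last n_lta samples
    (both windows end at i). Earlier samples get a ratio of 0.0.
    """
    if not 0 < n_sta < n_lta <= len(trace):
        raise ValueError("need 0 < n_sta < n_lta <= len(trace)")
    # Prefix sums of the energy make every window average an O(1) lookup.
    prefix = [0.0]
    for x in trace:
        prefix.append(prefix[-1] + x * x)
    ratio = [0.0] * len(trace)
    for i in range(n_lta - 1, len(trace)):
        sta = (prefix[i + 1] - prefix[i + 1 - n_sta]) / n_sta
        lta = (prefix[i + 1] - prefix[i + 1 - n_lta]) / n_lta
        ratio[i] = sta / lta if lta > 0 else 0.0
    return ratio


def pick_arrival(trace, n_sta=5, n_lta=50, threshold=3.0):
    """Return the first sample index whose ratio exceeds threshold, or None."""
    for i, r in enumerate(sta_lta(trace, n_sta, n_lta)):
        if r > threshold:
            return i
    return None
```

    Because the ratio responds to any burst of energy, strong ambient noise can push it over the threshold, which is the weakness the abstract describes.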

  3. A Machine Learning Framework for Plan Payment Risk Adjustment.

    PubMed

    Rose, Sherri

    2016-12-01

    To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R². Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.

  4. Big Data and Machine Learning in Plastic Surgery: A New Frontier in Surgical Innovation.

    PubMed

    Kanevsky, Jonathan; Corban, Jason; Gaster, Richard; Kanevsky, Ari; Lin, Samuel; Gilardino, Mirko

    2016-05-01

    Medical decision-making is increasingly based on quantifiable data. From the moment patients come into contact with the health care system, their entire medical history is recorded electronically. Whether a patient is in the operating room or on the hospital ward, technological advancement has facilitated the expedient and reliable measurement of clinically relevant health metrics, all in an effort to guide care and ensure the best possible clinical outcomes. However, as the volume and complexity of biomedical data grow, it becomes challenging to effectively process "big data" using conventional techniques. Physicians and scientists must be prepared to look beyond classic methods of data processing to extract clinically relevant information. The purpose of this article is to introduce the modern plastic surgeon to machine learning and computational interpretation of large data sets. What is machine learning? Machine learning, a subfield of artificial intelligence, can address clinically relevant problems in several domains of plastic surgery, including burn surgery; microsurgery; and craniofacial, peripheral nerve, and aesthetic surgery. This article provides a brief introduction to current research and suggests future projects that will allow plastic surgeons to explore this new frontier of surgical science.

  5. Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

    NASA Astrophysics Data System (ADS)

    Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei

    2017-02-01

    Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of whom were randomly selected as the “derivation cohort” to develop the dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.

  6. An immune-inspired semi-supervised algorithm for breast cancer diagnosis.

    PubMed

    Peng, Lingxi; Chen, Wenbin; Zhou, Wubai; Li, Fufang; Yang, Jin; Zhang, Jiandong

    2016-10-01

    Breast cancer is the most frequently diagnosed life-threatening cancer worldwide and the leading cause of cancer death among women. Early, accurate diagnosis is a major advantage in treating breast cancer. Researchers have approached this problem using various data mining and machine learning techniques such as support vector machines and artificial neural networks. Computer immunology is also an intelligent method, inspired by the biological immune system, which has been successfully applied in pattern recognition, combination optimization, machine learning, etc. However, most of these diagnosis methods are supervised, and it is very expensive to obtain labeled data in biology and medicine. In this paper, we seamlessly integrate the state-of-the-art research on life science with artificial intelligence, and propose a semi-supervised learning algorithm to reduce the need for labeled data. We use two well-known benchmark breast cancer datasets in our study, which are acquired from the UCI machine learning repository. Extensive experiments are conducted and evaluated on those two datasets. Our experimental results demonstrate the effectiveness and efficiency of our proposed algorithm, which proves that our algorithm is a promising automatic diagnosis method for breast cancer. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  7. CD process control through machine learning

    NASA Astrophysics Data System (ADS)

    Utzny, Clemens

    2016-10-01

    For the specific requirements of the 14nm and 20nm site applications a new CD map approach was developed at the AMTC. This approach relies on a well established machine learning technique called recursive partitioning. Recursive partitioning is a powerful technique which creates a decision tree by successively testing whether the quantity of interest can be explained by one of the supplied covariates. The test performed is generally a statistical test with a pre-supplied significance level. Once the test indicates a significant association between the variable of interest and a covariate, a split is performed at a threshold value which minimizes the variation within the newly attained groups. This partitioning is repeated until either no significant association can be detected or the resulting subgroup size falls below a pre-supplied level.
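
    The partitioning loop described above can be sketched for a single covariate. This is a simplified illustration under stated assumptions: a minimum relative reduction in squared error (`min_gain`) stands in for the statistical significance test, and the names `sse`, `best_split` and `partition` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys, min_size):
    """Threshold on xs minimizing the summed within-group SSE of ys.

    Only splits leaving at least min_size points on each side are
    considered. Returns (threshold, cost), or None if no split exists.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(min_size, len(pairs) - min_size + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical covariate values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        cost = sse([y for _, y in pairs[:k]]) + sse([y for _, y in pairs[k:]])
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


def partition(xs, ys, min_size=2, min_gain=0.5):
    """Recursively split the data and return a nested dict tree.

    min_gain replaces the significance test of the original technique:
    a split is accepted only if it removes at least that fraction of
    the group's SSE (an assumption made for this sketch).
    """
    total = sse(ys)
    split = best_split(xs, ys, min_size)
    if split is None or total == 0.0 or (total - split[1]) / total < min_gain:
        return {"mean": sum(ys) / len(ys), "n": len(ys)}
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": partition([x for x, _ in left], [y for _, y in left],
                          min_size, min_gain),
        "right": partition([x for x, _ in right], [y for _, y in right],
                           min_size, min_gain),
    }
```

    The two stopping rules in the abstract map onto the two guards here: the gain check (no sufficiently strong association) and `min_size` (subgroup too small).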

  8. Analyzing Activity Behavior and Movement in a Naturalistic Environment using Smart Home Techniques

    PubMed Central

    Cook, Diane J.; Schmitter-Edgecombe, Maureen; Dawadi, Prafulla

    2015-01-01

    One of the many services that intelligent systems can provide is the ability to analyze the impact of different medical conditions on daily behavior. In this study we use smart home and wearable sensors to collect data while (n=84) older adults perform complex activities of daily living. We analyze the data using machine learning techniques and reveal that differences between healthy older adults and adults with Parkinson disease not only exist in their activity patterns, but that these differences can be automatically recognized. Our machine learning classifiers reach an accuracy of 0.97 with an AUC value of 0.97 in distinguishing these groups. Our permutation-based testing confirms that the sensor-based differences between these groups are statistically significant. PMID:26259225

  9. Analyzing Activity Behavior and Movement in a Naturalistic Environment Using Smart Home Techniques.

    PubMed

    Cook, Diane J; Schmitter-Edgecombe, Maureen; Dawadi, Prafulla

    2015-11-01

    One of the many services that intelligent systems can provide is the ability to analyze the impact of different medical conditions on daily behavior. In this study, we use smart home and wearable sensors to collect data while (n = 84) older adults perform complex activities of daily living. We analyze the data using machine learning techniques and reveal that differences between healthy older adults and adults with Parkinson disease not only exist in their activity patterns, but that these differences can be automatically recognized. Our machine learning classifiers reach an accuracy of 0.97 with an area under the ROC curve value of 0.97 in distinguishing these groups. Our permutation-based testing confirms that the sensor-based differences between these groups are statistically significant.

  10. Using machine learning techniques to automate sky survey catalog generation

    NASA Technical Reports Server (NTRS)

    Fayyad, Usama M.; Roden, J. C.; Doyle, R. J.; Weir, Nicholas; Djorgovski, S. G.

    1993-01-01

    We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10^7 galaxies and 10^8 stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.

  11. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features.

    PubMed

    Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela

    2015-05-01

    Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  12. PCA-based polling strategy in machine learning framework for coronary artery disease risk assessment in intravascular ultrasound: A link between carotid and coronary grayscale plaque morphology.

    PubMed

    Araki, Tadashi; Ikeda, Nobutaka; Shukla, Devarshi; Jain, Pankaj K; Londhe, Narendra D; Shrivastava, Vimal K; Banchhor, Sumit K; Saba, Luca; Nicolaides, Andrew; Shafique, Shoaib; Laird, John R; Suri, Jasjit S

    2016-05-01

    Percutaneous coronary interventional procedures need advance planning prior to stenting or an endarterectomy. Cardiologists use intravascular ultrasound (IVUS) for screening, risk assessment and stratification of coronary artery disease (CAD). We hypothesize that plaque components are vulnerable to rupture due to plaque progression. Currently, there are no standard grayscale IVUS tools for risk assessment of plaque rupture. This paper presents a novel strategy for risk stratification based on plaque morphology embedded with principal component analysis (PCA) for plaque feature dimensionality reduction and dominant feature selection technique. The risk assessment utilizes 56 grayscale coronary features in a machine learning framework while linking information from carotid and coronary plaque burdens due to their common genetic makeup. This system consists of a machine learning paradigm which uses a support vector machine (SVM) combined with PCA for optimal and dominant coronary artery morphological feature extraction. Carotid artery proven intima-media thickness (cIMT) biomarker is adapted as a gold standard during the training phase of the machine learning system. For the performance evaluation, K-fold cross validation protocol is adapted with 20 trials per fold. For choosing the dominant features out of the 56 grayscale features, a polling strategy of PCA is adapted where the original value of the features is unaltered. Different protocols are designed for establishing the stability and reliability criteria of the coronary risk assessment system (cRAS). Using the PCA-based machine learning paradigm and cross-validation protocol, a classification accuracy of 98.43% (AUC 0.98) with K=10 folds using an SVM radial basis function (RBF) kernel was achieved. A reliability index of 97.32% and machine learning stability criteria of 5% were met for the cRAS. 
    This is the first Computer aided design (CADx) system of its kind that is able to demonstrate the ability of coronary risk assessment and stratification while demonstrating a successful design of the machine learning system based on our assumptions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  13. Learning atoms for materials discovery.

    PubMed

    Zhou, Quan; Tang, Peizhe; Liu, Shenxiu; Pan, Jinbo; Yan, Qimin; Zhang, Shou-Cheng

    2018-06-26

    Exciting advances have been made in artificial intelligence (AI) during recent decades. Among them, applications of machine learning (ML) and deep learning techniques brought human-competitive performances in various tasks of fields, including image recognition, speech recognition, and natural language understanding. Even in Go, the ancient game of profound complexity, the AI player has already beat human world champions convincingly with and without learning from the human. In this work, we show that our unsupervised machines (Atom2Vec) can learn the basic properties of atoms by themselves from the extensive database of known compounds and materials. These learned properties are represented in terms of high-dimensional vectors, and clustering of atoms in vector space classifies them into meaningful groups consistent with human knowledge. We use the atom vectors as basic input units for neural networks and other ML models designed and trained to predict materials properties, which demonstrate significant accuracy. Copyright © 2018 the Author(s). Published by PNAS.

  14. A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

    PubMed

    S K, Somasundaram; P, Alli

    2017-11-09

    The main complication of diabetes is diabetic retinopathy (DR), a retinal vascular disease that leads to blindness. Regular screening for early DR detection is considered a labor- and resource-intensive task. Therefore, automatic detection of DR using computational techniques is an attractive solution. An automatic method is more reliable for determining the presence of an abnormality in fundus images (FI), but the classification process performs poorly. Recently, a few research works have analyzed texture discrimination capacity in FI to distinguish healthy images. However, the feature extraction (FE) process did not perform well because of the high dimensionality. Therefore, a machine learning and ensemble classification method called the Machine Learning Bagging Ensemble Classifier (ML-BEC) is designed to identify retinal features for DR diagnosis and early detection. The ML-BEC method comprises two stages. The first stage comprises extraction of the candidate objects from retinal images (RI). The candidate objects or the features for DR disease diagnosis include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying a machine learning technique called t-distributed Stochastic Neighbor Embedding (t-SNE). In addition, t-SNE generates a probability distribution across high-dimensional images, in which the images are separated into similar and dissimilar pairs. Then, t-SNE describes a similar probability distribution across the points in the low-dimensional map. It then minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points on the map. The second stage comprises the application of ensemble classifiers to the extracted features to provide accurate analysis of digital FI using machine learning.
    In this stage, an automatic DR screening system using the Bagging Ensemble Classifier (BEC) is investigated. With the help of the voting process in ML-BEC, bagging minimizes the error due to variance of the base classifier. Using publicly available retinal image databases, our classifier is trained with 25% of the RI. Results show that the ensemble classifier can achieve better classification accuracy (CA) than single classification models. Empirical experiments suggest that the machine learning-based ensemble classifier is efficient for further reducing DR classification time (CT).
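
    The Kullback-Leibler objective mentioned in the first stage is the standard t-SNE cost. As a reminder, with $p_{ij}$ the pairwise similarities in the high-dimensional space and $q_{ij}$ those in the low-dimensional map (symbols introduced here for illustration, not taken from the abstract), it reads:

```latex
\[
  C \;=\; \mathrm{KL}(P \,\|\, Q) \;=\; \sum_{i \neq j} p_{ij} \,\log \frac{p_{ij}}{q_{ij}}
\]
```

    The map coordinates are adjusted to minimize $C$, so that pairs that are similar in the original space stay close together in the map.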

  15. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    PubMed

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are (i) correlation-based feature-subset selection (CFS) and (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is (iii) principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R² = 0.94 and mean absolute error (MAE) = 1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Estimating Global Seafloor Total Organic Carbon Using a Machine Learning Technique and Its Relevance to Methane Hydrates

    NASA Astrophysics Data System (ADS)

    Lee, T. R.; Wood, W. T.; Dale, J.

    2017-12-01

    Empirical and theoretical models of sub-seafloor organic matter transformation, degradation and methanogenesis require estimates of initial seafloor total organic carbon (TOC). This subsurface methane, under the appropriate geophysical and geochemical conditions may manifest as methane hydrate deposits. Despite the importance of seafloor TOC, actual observations of TOC in the world's oceans are sparse and large regions of the seafloor yet remain unmeasured. To provide an estimate in areas where observations are limited or non-existent, we have implemented interpolation techniques that rely on existing data sets. Recent geospatial analyses have provided accurate accounts of global geophysical and geochemical properties (e.g. crustal heat flow, seafloor biomass, porosity) through machine learning interpolation techniques. These techniques find correlations between the desired quantity (in this case TOC) and other quantities (predictors, e.g. bathymetry, distance from coast, etc.) that are more widely known. Predictions (with uncertainties) of seafloor TOC in regions lacking direct observations are made based on the correlations. Global distribution of seafloor TOC at 1 x 1 arc-degree resolution was estimated from a dataset of seafloor TOC compiled by Seiter et al. [2004] and a non-parametric (i.e. data-driven) machine learning algorithm, specifically k-nearest neighbors (KNN). Built-in predictor selection and a ten-fold validation technique generated statistically optimal estimates of seafloor TOC and uncertainties. In addition, inexperience was estimated. Inexperience is effectively the distance in parameter space to the single nearest neighbor, and it indicates geographic locations where future data collection would most benefit prediction accuracy. These improved geospatial estimates of TOC in data deficient areas will provide new constraints on methane production and subsequent methane hydrate accumulation.
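
    The k-nearest-neighbors estimate and the "inexperience" measure described above can be illustrated with a short sketch. It assumes predictors have already been scaled to comparable units and uses a plain unweighted mean of the neighbors; the function name `knn_predict` and the default `k` are hypothetical, and the study's predictor selection and validation steps are not reproduced.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Return (prediction, inexperience) for one query point.

    prediction:   mean target value of the k nearest training points.
    inexperience: Euclidean distance in predictor space from the query
                  to its single nearest training point.
    """
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    prediction = sum(y for _, y in nearest) / len(nearest)
    return prediction, nearest[0][0]
```

    A large inexperience value flags a grid cell whose predictors are unlike anything in the training data, which is where new samples would be most informative.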

  17. Machine Learning and Data Mining Methods in Diabetes Research.

    PubMed

    Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna

    2017-01-01

    The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and d) Health Care and Management with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.

  18. Machine learning and predictive data analytics enabling metrology and process control in IC fabrication

    NASA Astrophysics Data System (ADS)

    Rana, Narender; Zhang, Yunlin; Wall, Donald; Dirahoui, Bachir; Bailey, Todd C.

    2015-03-01

    Integrated circuit (IC) technology is going through multiple changes in terms of patterning techniques (multiple patterning, EUV and DSA), device architectures (FinFET, nanowire, graphene) and patterning scale (few nanometers). These changes require tight controls on processes and measurements to achieve the required device performance, and challenge the metrology and process control in terms of capability and quality. Multivariate data with complex nonlinear trends and correlations generally cannot be described well by mathematical or parametric models but can be relatively easily learned by computing machines and used to predict or extrapolate. This paper introduces the predictive metrology approach which has been applied to three different applications. Machine learning and predictive analytics have been leveraged to accurately predict dimensions of EUV resist patterns down to 18 nm half pitch leveraging resist shrinkage patterns. These patterns could not be directly and accurately measured due to metrology tool limitations. Machine learning has also been applied to predict the electrical performance early in the process pipeline for deep trench capacitance and metal line resistance. As the wafer goes through various processes its associated cost multiplies. It may take days to weeks to get the electrical performance readout. Predicting the electrical performance early on can be very valuable in enabling timely actionable decisions such as rework, scrap, feedforward, feedback predicted information or information derived from prediction to improve or monitor processes. This paper provides a general overview of machine learning and advanced analytics application in the advanced semiconductor development and manufacturing.

  19. Machine-Learned Data Structures of Lipid Marker Serum Concentrations in Multiple Sclerosis Patients Differ from Those in Healthy Subjects.

    PubMed

    Lötsch, Jörn; Thrun, Michael; Lerch, Florian; Brunkhorst, Robert; Schiffmann, Susanne; Thomas, Dominique; Tegder, Irmgard; Geisslinger, Gerd; Ultsch, Alfred

    2017-06-07

    Lipid metabolism has been suggested to be a major pathophysiological mechanism of multiple sclerosis (MS). With the increasing knowledge about lipid signaling, acquired data become increasingly complex making bioinformatics necessary in lipid research. We used unsupervised machine-learning to analyze lipid marker serum concentrations, pursuing the hypothesis that for the most relevant markers the emerging data structures will coincide with the diagnosis of MS. Machine learning was implemented as emergent self-organizing feature maps (ESOM) combined with the U*-matrix visualization technique. The data space consisted of serum concentrations of three main classes of lipid markers comprising eicosanoids (d = 11 markers), ceramides (d = 10), and lysophosphatidic acids (d = 6). They were analyzed in cohorts of MS patients (n = 102) and healthy subjects (n = 301). Clear data structures in the high-dimensional data space were observed in eicosanoid and ceramide serum concentrations whereas no clear structure could be found in lysophosphatidic acid concentrations. With ceramide concentrations, the structures that had emerged from unsupervised machine-learning almost completely overlapped with the known grouping of MS patients versus healthy subjects. This was only partly provided by eicosanoid serum concentrations. Thus, unsupervised machine-learning identified distinct data structures of bioactive lipid serum concentrations. These structures could be superimposed with the known grouping of MS patients versus healthy subjects, which was almost completely possible with ceramides. Therefore, based on the present analysis, ceramides are first-line candidates for further exploration as druggable targets or biomarkers in MS.

  20. A Machine Learning Approach to Automated Gait Analysis for the Noldus Catwalk System.

    PubMed

    Frohlich, Holger; Claes, Kasper; De Wolf, Catherine; Van Damme, Xavier; Michel, Anne

    2018-05-01

    Gait analysis of animal disease models can provide valuable insights into in vivo compound effects and thus help in preclinical drug development. The purpose of this paper is to establish a computational gait analysis approach for the Noldus Catwalk system, in which footprints are automatically captured and stored. We present what is, to our knowledge, the first machine learning based approach for the Catwalk system, which comprises step decomposition, definition and extraction of meaningful features, multivariate step sequence alignment, feature selection, and training of different classifiers (gradient boosting machine, random forest, and elastic net). Using animal-wise leave-one-out cross validation we demonstrate that with our method we can reliably separate movement patterns of a putative Parkinson's disease animal model and several control groups. Furthermore, we show that we can predict the time point after and the type of different brain lesions and can even forecast the brain region where the intervention was applied. We provide an in-depth analysis of the features involved in our classifiers via statistical techniques for model interpretation. A machine learning method for automated analysis of data from the Noldus Catwalk system was established. Our work shows the ability of machine learning to discriminate pharmacologically relevant animal groups based on their walking behavior in a multivariate manner. Further interesting aspects of the approach include the ability to learn from past experiments, to improve as more data arrive, and to make predictions for single animals in future studies.
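
    The animal-wise leave-one-out scheme mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the animal identifiers, and the toy data are assumptions. The point it shows is that all step sequences from one animal are held out together, so no animal contributes to both training and test data.

```python
from collections import defaultdict

def leave_one_animal_out(animal_ids):
    """Yield (held_out_animal, train_indices, test_indices), one split per animal."""
    by_animal = defaultdict(list)
    for idx, animal in enumerate(animal_ids):
        by_animal[animal].append(idx)
    for animal, test_idx in by_animal.items():
        # Every sample from the held-out animal goes to the test side.
        train_idx = [i for i in range(len(animal_ids)) if animal_ids[i] != animal]
        yield animal, train_idx, test_idx

# Hypothetical example: five step sequences recorded from three animals.
animal_ids = ["rat_a", "rat_a", "rat_b", "rat_c", "rat_c"]
splits = list(leave_one_animal_out(animal_ids))
```

    Splitting by animal rather than by individual sample avoids an optimistic bias when several samples from the same animal are correlated.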

  1. Machine learning strategies for systems with invariance properties

    NASA Astrophysics Data System (ADS)

    Ling, Julia; Jones, Reese; Templeton, Jeremy

    2016-08-01

    In many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds Averaged Navier Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high performance computing has led to a growing availability of high fidelity simulation data. These data open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these empirical models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance at significantly reduced computational training costs.
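
    The two strategies compared in this abstract can be illustrated with a toy sketch. It assumes a 2-D input whose target is invariant under rotation; the function names, the choice of the squared norm as invariant basis, and the data are illustrative assumptions, not taken from the paper.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point counter-clockwise by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def invariant_basis(point):
    """Method 1: map the raw input to a rotation-invariant feature (squared norm)."""
    x, y = point
    return (x * x + y * y,)

def augment(points, targets, n_rotations):
    """Method 2: replicate each sample under several rotations of the raw input."""
    aug_points, aug_targets = [], []
    for p, t in zip(points, targets):
        for k in range(n_rotations):
            aug_points.append(rotate(p, 2.0 * math.pi * k / n_rotations))
            aug_targets.append(t)
    return aug_points, aug_targets

points = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
targets = [1.0, 4.0, 25.0]  # toy target that depends only on the norm

features = [invariant_basis(p) for p in points]            # 3 training rows
aug_points, aug_targets = augment(points, targets, n_rotations=8)  # 24 training rows
```

    In this toy setting the first method keeps the training set at its original size, while the second multiplies it by the number of transformations, which is consistent with the difference in training cost the abstract reports.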

  2. Machine learning strategies for systems with invariance properties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    Here, in many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.

  3. Machine learning strategies for systems with invariance properties

    DOE PAGES

    Ling, Julia; Jones, Reese E.; Templeton, Jeremy Alan

    2016-05-06

    Here, in many scientific fields, empirical models are employed to facilitate computational simulations of engineering systems. For example, in fluid mechanics, empirical Reynolds stress closures enable computationally-efficient Reynolds-Averaged Navier-Stokes simulations. Likewise, in solid mechanics, constitutive relations between the stress and strain in a material are required in deformation analysis. Traditional methods for developing and tuning empirical models usually combine physical intuition with simple regression techniques on limited data sets. The rise of high-performance computing has led to a growing availability of high-fidelity simulation data, which open up the possibility of using machine learning algorithms, such as random forests or neural networks, to develop more accurate and general empirical models. A key question when using data-driven algorithms to develop these models is how domain knowledge should be incorporated into the machine learning process. This paper will specifically address physical systems that possess symmetry or invariance properties. Two different methods for teaching a machine learning model an invariance property are compared. In the first method, a basis of invariant inputs is constructed, and the machine learning model is trained upon this basis, thereby embedding the invariance into the model. In the second method, the algorithm is trained on multiple transformations of the raw input data until the model learns invariance to that transformation. Results are discussed for two case studies: one in turbulence modeling and one in crystal elasticity. It is shown that in both cases embedding the invariance property into the input features yields higher performance with significantly reduced computational training costs.

  4. Applying Machine Learning to Star Cluster Classification

    NASA Astrophysics Data System (ADS)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time-consuming; it thus creates a bottleneck and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) by applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data are HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate the performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results and suggest several directions for further improvement.

  5. Machine learning in the rational design of antimicrobial peptides.

    PubMed

    Rondón-Villarreal, Paola; Sierra, Daniel A; Torres, Rodrigo

    2014-01-01

    One of the most important public health issues is the resistance of pathogenic microorganisms to conventional antibiotics. In recent years, much research has focused on the development of new antibiotics. Among these, antimicrobial peptides (AMPs) have arisen as a promising alternative to combat antibiotic-resistant microorganisms. For this reason, many theoretical efforts have been made to develop new computational tools for the rational design of better and more effective AMPs. In this review, we present an overview of the rational design of AMPs using machine learning techniques and new research fields.

  6. Learning and Optimization of Cognitive Capabilities. Final Project Report.

    ERIC Educational Resources Information Center

    Lumsdaine, A.A.; And Others

    The work of a three-year series of experimental studies of human cognition is summarized in this report. Problem solving and learning in man-machine interaction were investigated, as well as relevant variables and processes. The work included four separate projects: (1) computer-aided problem solving, (2) computer-aided instruction techniques, (3)…

  7. Developing a Hybrid Model to Predict Student First Year Retention in STEM Disciplines Using Machine Learning Techniques

    ERIC Educational Resources Information Center

    Alkhasawneh, Ruba; Hargraves, Rosalyn Hobson

    2014-01-01

    The purpose of this research was to develop a hybrid framework to model first year student retention for underrepresented minority (URM) students comprising African Americans, Hispanic Americans, and Native Americans. Identifying inputs that best contribute to student retention provides significant information for institutions to learn about…

  8. An Analysis of Learning To Plan as a Search Problem.

    ERIC Educational Resources Information Center

    Gratch, Jonathan; DeJong, Gerald

    Increasingly, machine learning is entertained as a mechanism for improving the efficiency of planning systems. Research in this area has generated an impressive battery of techniques and a growing body of empirical successes. Unfortunately the formal properties of these systems are not well understood. This is highlighted by a growing corpus of…

  9. Machine learning derived risk prediction of anorexia nervosa.

    PubMed

    Guo, Yiran; Wei, Zhi; Keating, Brendan J; Hakonarson, Hakon

    2016-01-20

    Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict risk of diseases in which genetics play an important role. In this study, we collected whole genome genotyping data on 3940 AN cases and 9266 controls from the Genetic Consortium for Anorexia Nervosa (GCAN), the Wellcome Trust Case Control Consortium 3 (WTCCC3), Price Foundation Collaborative Group and the Children's Hospital of Philadelphia (CHOP), and applied machine learning methods for predicting AN disease risk. The prediction performance is measured by area under the receiver operating characteristic curve (AUC), indicating how well the model distinguishes cases from unaffected control subjects. A logistic regression model with the lasso penalty technique generated an AUC of 0.693, while Support Vector Machines and Gradient Boosted Trees reached AUCs of 0.691 and 0.623, respectively. Using different sample sizes, our results suggest that larger datasets are required to optimize the machine learning models and achieve higher AUC values. To our knowledge, this is the first attempt to assess AN risk based on genome wide genotype level data. Future integration of genomic, environmental and family-based information is likely to improve the AN risk evaluation process, eventually benefitting AN patients and families in the clinical setting.
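
    The AUC metric described in this abstract ("how well the model distinguishes cases from unaffected control subjects") can be computed directly from its rank interpretation. The sketch below is a generic illustration; the scores and labels are invented toy values, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy risk scores: label 1 = case, label 0 = control.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 7 of the 9 case/control pairs are ranked correctly
```

    Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of 0.62 to 0.69 in context.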

  10. Scaling Support Vector Machines On Modern HPC Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Fu, Haohuan; Song, Shuaiwen

    2015-02-01

    We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multicore and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools.

  11. Maharshi Pathak | NREL

    Science.gov Websites

    working at NREL, Maharshi was a graduate student at ASU where he focused on incorporation of machine learning techniques into the modeling of commercial building stocks that can assist the policy-level

  12. Crowdsourcing: A Primer and Its implications for Systems Engineering

    DTIC Science & Technology

    2012-08-01

    detailing areas to be improved within current crowdsourcing frameworks. Finally, an agent-based simulation using machine learning techniques is defined, preliminary results are presented, and future research directions are described.

  13. Wavelet-enhanced convolutional neural network: a new idea in a deep learning paradigm.

    PubMed

    Savareh, Behrouz Alizadeh; Emami, Hassan; Hajiabadi, Mohamadreza; Azimi, Seyed Majid; Ghafoori, Mahyar

    2018-05-29

    Manual brain tumor segmentation is a challenging task that requires the use of machine learning techniques. One of the machine learning techniques that has been given much attention is the convolutional neural network (CNN). The performance of the CNN can be enhanced by combining it with other data analysis tools such as the wavelet transform. In this study, one of the famous implementations of CNN, a fully convolutional network (FCN), was used in brain tumor segmentation and its architecture was enhanced by wavelet transform. In this combination, a wavelet transform was used as a complementary and enhancing tool for CNN in brain tumor segmentation. Comparing the performance of the basic FCN architecture against the wavelet-enhanced form revealed a remarkable superiority of the enhanced architecture in brain tumor segmentation tasks. Using enhancing tools such as the wavelet transform and other mathematical functions can improve the performance of CNN in any image processing task such as segmentation and classification.

  14. Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    PubMed Central

    Sheth, Amit; Perera, Sujan; Wijeratne, Sanjaya; Thirunarayan, Krishnaprasad

    2018-01-01

    Machine Learning has been a big success story during the AI resurgence. One particular stand out success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition for utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex, (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques.

  15. Artificial Intelligence in Precision Cardiovascular Medicine.

    PubMed

    Krittanawong, Chayakrit; Zhang, HongJu; Wang, Zhen; Aydar, Mehmet; Kitai, Takeshi

    2017-05-30

    Artificial intelligence (AI) is a field of computer science that aims to mimic human thought processes, learning capacity, and knowledge storage. AI techniques have been applied in cardiovascular medicine to explore novel genotypes and phenotypes in existing diseases, improve the quality of patient care, enable cost-effectiveness, and reduce readmission and mortality rates. Over the past decade, several machine-learning techniques have been used for cardiovascular disease diagnosis and prediction. Each problem requires some degree of understanding of the problem, in terms of cardiovascular medicine and statistics, to apply the optimal machine-learning algorithm. In the near future, AI will result in a paradigm shift toward precision cardiovascular medicine. The potential of AI in cardiovascular medicine is tremendous; however, ignorance of the challenges may overshadow its potential clinical impact. This paper gives a glimpse of AI's application in cardiovascular clinical care and discusses its potential role in facilitating precision cardiovascular medicine. Copyright © 2017 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.

  16. Precision Parameter Estimation and Machine Learning

    NASA Astrophysics Data System (ADS)

    Wandelt, Benjamin D.

    2008-12-01

    I discuss the strategy of "Acceleration by Parallel Precomputation and Learning" (APPLe) that can vastly accelerate parameter estimation in high-dimensional parameter spaces and costly likelihood functions, using trivially parallel computing to speed up sequential exploration of parameter space. This strategy combines the power of distributed computing with machine learning and Markov-Chain Monte Carlo techniques efficiently to explore a likelihood function, posterior distribution or χ²-surface. This strategy is particularly successful in cases where computing the likelihood is costly and the number of parameters is moderate or large. We apply this technique to two central problems in cosmology: the solution of the cosmological parameter estimation problem with sufficient accuracy for the Planck data using PICo; and the detailed calculation of cosmological helium and hydrogen recombination with RICO. Since the APPLe approach is designed to be able to use massively parallel resources to speed up problems that are inherently serial, we can bring the power of distributed computing to bear on parameter estimation problems. We have demonstrated this with the Cosmology@Home project.

  17. Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach

    NASA Technical Reports Server (NTRS)

    Das, Santanu; Oza, Nikunj C.

    2011-01-01

    In this paper we propose an innovative learning algorithm - a variation of the One-class nu Support Vector Machines (SVMs) learning algorithm - to produce sparser solutions with much reduced computational complexities. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machines algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class nu SVMs while reducing both training time and test time by several factors.

  18. Photometric Supernova Classification with Machine Learning

    NASA Astrophysics Data System (ADS)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  19. Machine learning search for variable stars

    NASA Astrophysics Data System (ADS)

    Pashchenko, Ilya N.; Sokolovsky, Kirill V.; Gavras, Panagiotis

    2018-04-01

    Photometric variability detection is often considered as a hypothesis testing problem: an object is variable if the null hypothesis that its brightness is constant can be ruled out given the measurements and their uncertainties. The practical applicability of this approach is limited by uncorrected systematic errors. We propose a new variability detection technique sensitive to a wide range of variability types while being robust to outliers and underestimated measurement uncertainties. We consider variability detection as a classification problem that can be approached with machine learning. Logistic Regression (LR), Support Vector Machines (SVM), k Nearest Neighbours (kNN), Neural Nets (NN), Random Forests (RF), and Stochastic Gradient Boosting classifier (SGB) are applied to 18 features (variability indices) quantifying scatter and/or correlation between points in a light curve. We use a subset of Optical Gravitational Lensing Experiment phase two (OGLE-II) Large Magellanic Cloud (LMC) photometry (30 265 light curves) that was searched for variability using traditional methods (168 known variable objects) as the training set and then apply the NN to a new test set of 31 798 OGLE-II LMC light curves. Among 205 candidates selected in the test set, 178 are real variables, while 13 low-amplitude variables are new discoveries. The machine learning classifiers considered are found to be more efficient (select more variables and fewer false candidates) compared to traditional techniques using individual variability indices or their linear combination. The NN, SGB, SVM, and RF show a higher efficiency compared to LR and kNN.
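
    The traditional hypothesis-testing view described at the start of this abstract can be sketched with a reduced chi-squared statistic against a constant-brightness model. This is a generic illustration with invented magnitudes; it is not one of the 18 variability indices as defined in the paper.

```python
def reduced_chi2(mags, errs):
    """Reduced chi-squared of a light curve against a constant-brightness model."""
    weights = [1.0 / (e * e) for e in errs]
    # Inverse-variance weighted mean is the best-fit constant magnitude.
    mean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    chi2 = sum(((m - mean) / e) ** 2 for m, e in zip(mags, errs))
    return chi2 / (len(mags) - 1)

# Toy light curves with a quoted uncertainty of 0.01 mag per point.
constant = reduced_chi2([15.00, 15.01, 14.99, 15.00], [0.01] * 4)  # about 0.67
variable = reduced_chi2([15.0, 15.3, 14.7, 15.2], [0.01] * 4)      # about 700

# A constant star whose true scatter is 0.03 mag but whose errors are quoted
# as 0.01 mag also yields a large value, mimicking real variability.
underestimated = reduced_chi2([15.00, 15.03, 14.97, 15.00], [0.01] * 4)  # about 6
```

    The last case shows why the abstract calls the approach limited by uncorrected systematic errors: the statistic cannot tell real variability from underestimated uncertainties.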

  20. Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

    PubMed

    Hussain, Lal

    2018-06-01

    Epilepsy is a neurological disorder caused by abnormal excitability of neurons in the brain. Brain activity of patients suffering from seizures is monitored through the electroencephalogram (EEG) to detect epileptic seizures. The performance of EEG-based epilepsy detection depends on feature extracting strategies. In this research, we applied varying feature extracting strategies based on time and frequency domain characteristics, nonlinear and wavelet based entropy, and a few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels were evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we evaluated different distance metrics, neighbor weights and numbers of neighbors. Similarly, for the decision trees we tuned the parameters based on maximum splits and split criteria, and ensemble classifiers were evaluated based on different ensemble methods and learning rate. For training/testing, tenfold cross validation was employed and performance was evaluated in terms of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis was performed using diverse feature extracting strategies and robust machine learning classifiers with more advanced optimal options. The Support Vector Machine with linear kernel and KNN with city block distance metric gave the overall highest accuracy of 99.5%, which was higher than that obtained using the default parameters for these classifiers. Moreover, the highest separation (AUC = 0.9991, 0.9990) was obtained at different kernel scales using SVM. Additionally, K-nearest neighbors with inverse squared distance weight gave higher performance at different numbers of neighbors. Moreover, in distinguishing postictal heart rate oscillations from epileptic ictal subjects, the highest performance of 100% was obtained using different machine learning classifiers.

  1. Use of Machine Learning Classifiers and Sensor Data to Detect Neurological Deficit in Stroke Patients.

    PubMed

    Park, Eunjeong; Chang, Hyuk-Jae; Nam, Hyo Suk

    2017-04-18

    The pronator drift test (PDT), a neurological examination, is widely used in clinics to measure motor weakness of stroke patients. The aim of this study was to develop a PDT tool with machine learning classifiers to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. We extracted features of drift and pronation from accelerometer signals of wearable devices on the inner wrists of 16 stroke patients and 10 healthy controls. Signal processing and feature selection approaches were applied to discriminate PDT features used to classify stroke patients. A series of machine learning techniques, namely support vector machine (SVM), radial basis function network (RBFN), and random forest (RF), were implemented to discriminate stroke patients from controls with leave-one-out cross-validation. Signal processing by the PDT tool extracted a total of 12 PDT features from sensors. Feature selection abstracted the major attributes from the 12 PDT features to elucidate the dominant characteristics of proximal weakness of stroke patients using machine learning classification. Our proposed PDT classifiers had an area under the receiver operating characteristic curve (AUC) of .806 (SVM), .769 (RBFN), and .900 (RF) without feature selection, and feature selection improved the AUCs to .913 (SVM), .956 (RBFN), and .975 (RF), representing an average performance enhancement of 15.3%. Sensors and machine learning methods can reliably detect stroke signs and quantify proximal arm weakness. Our proposed solution will facilitate pervasive monitoring of stroke patients. ©Eunjeong Park, Hyuk-Jae Chang, Hyo Suk Nam. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.04.2017.
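
    The reported average enhancement of 15.3% can be checked from the AUC values given in this abstract. The abstract does not state how the average was formed; the sketch below assumes it is the mean of the per-classifier relative AUC gains, which reproduces the figure.

```python
# AUC values as reported in the abstract.
auc_without = {"SVM": 0.806, "RBFN": 0.769, "RF": 0.900}
auc_with = {"SVM": 0.913, "RBFN": 0.956, "RF": 0.975}

# Relative gain per classifier: (after - before) / before.
relative_gain = {
    name: (auc_with[name] - auc_without[name]) / auc_without[name]
    for name in auc_without
}

# Mean of the three relative gains, expressed in percent.
mean_gain_percent = 100.0 * sum(relative_gain.values()) / len(relative_gain)
```

    The individual gains are roughly 13.3% (SVM), 24.3% (RBFN), and 8.3% (RF), so most of the average comes from the RBFN classifier.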

  2. Machine learning-based quantitative texture analysis of CT images of small renal masses: Differentiation of angiomyolipoma without visible fat from renal cell carcinoma.

    PubMed

    Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei

    2018-04-01

    To evaluate the diagnostic performance of machine-learning based quantitative texture analysis of CT images to differentiate small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institutional retrospective study included 58 patients with pathologically proven small renal mass (17 in AMLwvf and 41 in RCC groups). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation in preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied to select features preliminarily. Then support vector machine with recursive feature elimination (SVM-RFE) and synthetic minority oversampling technique (SMOTE) were adopted to establish discriminative classifiers, and the performance of classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P < 0.05) and had good interobserver agreement. An optimal feature subset including 11 features was further selected by the SVM-RFE method. The SVM-RFE+SMOTE classifier achieved the best performance in discriminating between small AMLwvf and RCC, with the highest accuracy, sensitivity, specificity and AUC of 93.9 %, 87.8 %, 100 % and 0.955, respectively. Machine learning analysis of CT texture features can facilitate the accurate differentiation of small AMLwvf from RCC. • Although conventional CT is useful for diagnosis of SRMs, it has limitations. • Machine-learning based CT texture analysis facilitates differentiation of small AMLwvf from RCC. • The highest accuracy of the SVM-RFE+SMOTE classifier reached 93.9 %. • Texture analysis combined with machine-learning methods might spare unnecessary surgery for AMLwvf.

  3. Toward interactive search in remote sensing imagery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Porter, Reid B; Hush, Do; Harvey, Neal

    2010-01-01

    To move from data to information in almost all science and defense applications requires a human-in-the-loop to validate information products, resolve inconsistencies, and account for incomplete and potentially deceptive sources of information. This is a key motivation for visual analytics which aims to develop techniques that complement and empower human users. By contrast, the vast majority of algorithms developed in machine learning aim to replace human users in data exploitation. In this paper we describe a recently introduced machine learning problem, called rare category detection, which may be a better match to visual analytic environments. We describe a new design criteria for this problem, and present comparisons to existing techniques with both synthetic and real-world datasets. We conclude by describing an application in broad-area search of remote sensing imagery.

  4. Machine learning techniques to predict sensitive patterns to fault attack in the Java Card application

    NASA Astrophysics Data System (ADS)

    Chahrazed, Yahiaoui; Jean-Louis, Lanet; Mohamed, Mezghiche; Karim, Tamine

    2018-01-01

    Fault attacks represent one of the serious threats against Java Card security. They consist of physical perturbation of chip components to introduce faults in the code execution. A fault may be induced using a laser beam to impact opcodes and operands of instructions. This could lead to a mutation of the application code in such a way that it becomes hostile. Any successful attack may reveal secret information stored in the card or grant an undesired authorisation. We propose a methodology to recognise, during the development step, the patterns in Java Card applications that are sensitive to fault attacks. It is based on concepts from text categorisation and machine learning. In this method, we represented the patterns using opcode n-grams as features, and we evaluated different machine learning classifiers. The results show that the classifiers performed poorly when classifying dangerous sensitive patterns, due to the imbalance of our data-set: the number of dangerous sensitive patterns is much lower than the number of non-dangerous patterns. We used resampling techniques to balance the class distribution in our data-set. The experimental results indicated that the resampling techniques improved the accuracy of the classifiers. In addition, our proposed method reduces the execution time of sensitive pattern classification in comparison to the SmartCM tool. This tool is used in our study to evaluate the effect of faults on Java Card applications.
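
    The feature representation described here, opcode n-grams borrowed from text categorisation, can be sketched as follows. The helper name `opcode_ngrams` and the short opcode sequence are hypothetical and chosen only for illustration; the abstract does not specify the value of n or how the counts are weighted.

```python
from collections import Counter

def opcode_ngrams(opcodes, n=2):
    """Count contiguous n-grams in a sequence of opcode mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

# Hypothetical opcode sequence, for illustration only
method = ["aload_0", "getfield", "sload_1", "if_scmplt", "aload_0", "getfield"]
features = opcode_ngrams(method, n=2)
```

    Each distinct n-gram becomes one dimension of the feature vector passed to a classifier, in the same way that word n-grams are used in text classification.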

  5. Full-Physics Inverse Learning Machine for Satellite Remote Sensing Retrievals

    NASA Astrophysics Data System (ADS)

    Loyola, D. G.

    2017-12-01

    The satellite remote sensing retrievals are usually ill-posed inverse problems that are typically solved by finding a state vector that minimizes the residual between simulated data and real measurements. The classical inversion methods are very time-consuming as they require iterative calls to complex radiative-transfer forward models to simulate radiances and Jacobians, and subsequent inversion of relatively large matrices. In this work we present a novel and extremely fast algorithm for solving inverse problems called full-physics inverse learning machine (FP-ILM). The FP-ILM algorithm consists of a training phase in which machine learning techniques are used to derive an inversion operator based on synthetic data generated using a radiative transfer model (which expresses the "full-physics" component) and the smart sampling technique, and an operational phase in which the inversion operator is applied to real measurements. FP-ILM has been successfully applied to the retrieval of the SO2 plume height during volcanic eruptions and to the retrieval of ozone profile shapes from UV/VIS satellite sensors. Furthermore, FP-ILM will be used for the near-real-time processing of the upcoming generation of European Sentinel sensors with their unprecedented spectral and spatial resolution and associated large increases in the amount of data.

  6. An analysis of a digital variant of the Trail Making Test using machine learning techniques.

    PubMed

    Dahmen, Jessamyn; Cook, Diane; Fellows, Robert; Schmitter-Edgecombe, Maureen

    2017-01-01

    The goal of this work is to develop a digital version of a standard cognitive assessment, the Trail Making Test (TMT), and assess its utility. This paper introduces a novel digital version of the TMT and introduces a machine learning based approach to assess its capabilities. Using digital Trail Making Test (dTMT) data collected from (N = 54) older adult participants as feature sets, we use machine learning techniques to analyze the utility of the dTMT and evaluate the insights provided by the digital features. Predicted TMT scores correlate well with clinical digital test scores (r = 0.98) and paper time to completion scores (r = 0.65). Predicted TICS exhibited a small correlation with clinically derived TICS scores (r = 0.12 Part A, r = 0.10 Part B). Predicted FAB scores exhibited a small correlation with clinically derived FAB scores (r = 0.13 Part A, r = 0.29 for Part B). Digitally derived features were also used to predict diagnosis (AUC of 0.65). Our findings indicate that the dTMT is capable of measuring the same aspects of cognition as the paper-based TMT. Furthermore, the dTMT's additional data may be able to help monitor other cognitive processes not captured by the paper-based TMT alone.

  7. Enhanced Quality Control in Pharmaceutical Applications by Combining Raman Spectroscopy and Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Martinez, J. C.; Guzmán-Sepúlveda, J. R.; Bolañoz Evia, G. R.; Córdova, T.; Guzmán-Cabrera, R.

    2018-06-01

    In this work, we applied machine learning techniques to Raman spectra for the characterization and classification of manufactured pharmaceutical products. Our measurements were taken with commercial equipment, for accurate assessment of variations with respect to one calibrated control sample. Unlike the typical use of Raman spectroscopy in pharmaceutical applications, in our approach the principal components of the Raman spectrum are used concurrently as attributes in machine learning algorithms. This permits an efficient comparison and classification of the spectra measured from the samples under study. This also allows for accurate quality control as all relevant spectral components are considered simultaneously. We demonstrate our approach with respect to the specific case of acetaminophen, which is one of the most widely used analgesics in the market. In the experiments, commercial samples from thirteen different laboratories were analyzed and compared against a control sample. The raw data were analyzed based on an arithmetic difference between the nominal active substance and the measured values in each commercial sample. The principal component analysis was applied to the data for quantitative verification (i.e., without considering the actual concentration of the active substance) of the difference in the calibrated sample. Our results show that by following this approach adulterations in pharmaceutical compositions can be clearly identified and accurately quantified.

  8. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis

    PubMed Central

    2013-01-01

    Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. 
    Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance and in less time. PMID:23815620
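
    The aggregation step described in this abstract, combining several independently trained classifiers by majority voting, can be sketched as below. The function `majority_vote` and the three prediction lists are illustrative assumptions; they stand in for the outputs of separately initialised extreme learning machines and are not taken from the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per model, all of equal length.
    Returns the most common label for each sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Illustrative binary predictions from three hypothetical base models
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
consensus = majority_vote([model_a, model_b, model_c])
```

    Using an odd number of binary classifiers avoids ties. Averaging over models in this way is what reduces the dependence of the final prediction on any single model's random initial weights.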

  9. Restricted Boltzmann machines based oversampling and semi-supervised learning for false positive reduction in breast CAD.

    PubMed

    Cao, Peng; Liu, Xiaoli; Bao, Hang; Yang, Jinzhu; Zhao, Dazhe

    2015-01-01

    The false-positive reduction (FPR) is a crucial step in the computer aided detection system for the breast. The issues of imbalanced data distribution and the limitation of labeled samples complicate the classification procedure. To overcome these challenges, we propose oversampling and semi-supervised learning methods based on the restricted Boltzmann machines (RBMs) to solve the classification of imbalanced data with a few labeled samples. To evaluate the proposed method, we conducted a comprehensive performance study and compared its results with the commonly used techniques. Experiments on benchmark dataset of DDSM demonstrate the effectiveness of the RBMs based oversampling and semi-supervised learning method in terms of geometric mean (G-mean) for false positive reduction in Breast CAD.
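
    The evaluation measure mentioned here, the geometric mean (G-mean), is commonly defined as the square root of sensitivity times specificity. A minimal sketch under that definition follows; the function name and the toy labels are assumptions for illustration, and the code assumes both classes are present in `y_true`.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# Toy imbalanced example: 2 positives, 8 negatives
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
score = g_mean(y_true, y_pred)
```

    In this toy case the plain accuracy is 0.8, while the G-mean is lower because only half of the positives are found. That is why the G-mean is often preferred over accuracy on imbalanced data.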

  10. A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data

    PubMed Central

    2015-01-01

    Background Investigations into novel biomarkers using omics techniques generate large amounts of data. Due to their size and numbers of attributes, these data are suitable for analysis with machine learning methods. A key component of typical machine learning pipelines for omics data is feature selection, which is used to reduce the raw high-dimensional data into a tractable number of features. Feature selection needs to balance the objective of using as few features as possible, while maintaining high predictive power. This balance is crucial when the goal of data analysis is the identification of highly accurate but small panels of biomarkers with potential clinical utility. In this paper we propose a heuristic for the selection of very small feature subsets, via an iterative feature elimination process that is guided by rule-based machine learning, called RGIFE (Rule-guided Iterative Feature Elimination). We use this heuristic to identify putative biomarkers of osteoarthritis (OA), articular cartilage degradation and synovial inflammation, using both proteomic and transcriptomic datasets. Results and discussion Our RGIFE heuristic increased the classification accuracies achieved for all datasets when no feature selection is used, and performed well in a comparison with other feature selection methods. Using this method the datasets were reduced to a smaller number of genes or proteins, including those known to be relevant to OA, cartilage degradation and joint inflammation. The results have shown the RGIFE feature reduction method to be suitable for analysing both proteomic and transcriptomics data. Methods that generate large ‘omics’ datasets are increasingly being used in the area of rheumatology. Conclusions Feature reduction methods are advantageous for the analysis of omics data in the field of rheumatology, as the applications of such techniques are likely to result in improvements in diagnosis, treatment and drug discovery. PMID:25923811

  11. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques.

    PubMed

    Macyszyn, Luke; Akbari, Hamed; Pisapia, Jared M; Da, Xiao; Attiah, Mark; Pigrish, Vadim; Bi, Yingtao; Pal, Sharmistha; Davuluri, Ramana V; Roccograndi, Laura; Dahmane, Nadia; Martinez-Lage, Maria; Biros, George; Wolf, Ronald L; Bilello, Michel; O'Rourke, Donald M; Davatzikos, Christos

    2016-03-01

    MRI characteristics of brain gliomas have been used to predict clinical outcome and molecular tumor characteristics. However, previously reported imaging biomarkers have not been sufficiently accurate or reproducible to enter routine clinical practice and often rely on relatively simple MRI measures. The current study leverages advanced image analysis and machine learning algorithms to identify complex and reproducible imaging patterns predictive of overall survival and molecular subtype in glioblastoma (GB). One hundred five patients with GB were first used to extract approximately 60 diverse features from preoperative multiparametric MRIs. These imaging features were used by a machine learning algorithm to derive imaging predictors of patient survival and molecular subtype. Cross-validation ensured generalizability of these predictors to new patients. Subsequently, the predictors were evaluated in a prospective cohort of 29 new patients. Survival curves yielded a hazard ratio of 10.64 for predicted long versus short survivors. The overall, 3-way (long/medium/short survival) accuracy in the prospective cohort approached 80%. Classification of patients into the 4 molecular subtypes of GB achieved 76% accuracy. By employing machine learning techniques, we were able to demonstrate that imaging patterns are highly predictive of patient survival. Additionally, we found that GB subtypes have distinctive imaging phenotypes. These results reveal that when imaging markers related to infiltration, cell density, microvascularity, and blood-brain barrier compromise are integrated via advanced pattern analysis methods, they form very accurate predictive biomarkers. These predictive markers used solely preoperative images, hence they can significantly augment diagnosis and treatment of GB patients. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. 

  12. Quantitative sensory testing response patterns to capsaicin- and ultraviolet-B–induced local skin hypersensitization in healthy subjects: a machine-learned analysis

    PubMed Central

    Lötsch, Jörn; Geisslinger, Gerd; Heinemann, Sarah; Lerch, Florian; Oertel, Bruno G.; Ultsch, Alfred

    2018-01-01

    The comprehensive assessment of pain-related human phenotypes requires combinations of nociceptive measures that produce complex high-dimensional data, posing challenges to bioinformatic analysis. In this study, we assessed established experimental models of heat hyperalgesia of the skin, consisting of local ultraviolet-B (UV-B) irradiation or capsaicin application, in 82 healthy subjects using a variety of noxious stimuli. We extended the original heat stimulation by applying cold and mechanical stimuli and assessing the hypersensitization effects with a clinically established quantitative sensory testing (QST) battery (German Research Network on Neuropathic Pain). This study provided a 246 × 10-sized data matrix (82 subjects assessed at baseline, following UV-B application, and following capsaicin application) with respect to 10 QST parameters, which we analyzed using machine-learning techniques. We observed statistically significant effects of the hypersensitization treatments in 9 different QST parameters. Supervised machine-learned analysis implemented as random forests followed by ABC analysis pointed to heat pain thresholds as the most relevantly affected QST parameter. However, decision tree analysis indicated that UV-B additionally modulated sensitivity to cold. Unsupervised machine-learning techniques, implemented as emergent self-organizing maps, hinted at subgroups responding to topical application of capsaicin. The distinction among subgroups was based on sensitivity to pressure pain, which could be attributed to sex differences, with women being more sensitive than men. Thus, while UV-B and capsaicin share a major component of heat pain sensitization, they differ in their effects on QST parameter patterns in healthy subjects, suggesting a lack of redundancy between these models. PMID:28700537

  13. Using Machine Learning as a fast emulator of physical processes within the Met Office's Unified Model

    NASA Astrophysics Data System (ADS)

    Prudden, R.; Arribas, A.; Tomlinson, J.; Robinson, N.

    2017-12-01

    The Unified Model is a numerical model of the atmosphere used at the UK Met Office (and numerous partner organisations including the Korean Meteorological Agency, the Australian Bureau of Meteorology and the US Air Force) for both weather and climate applications. Specifically, dynamical models such as the Unified Model are now a central part of weather forecasting. Starting from basic physical laws, these models make it possible to predict events such as storms before they have even begun to form. The Unified Model can be simply described as having two components: one component solves the Navier-Stokes equations (usually referred to as the "dynamics"); the other solves relevant sub-grid physical processes (usually referred to as the "physics"). Running weather forecasts requires substantial computing resources - for example, the UK Met Office operates the largest operational High Performance Computer in Europe - and the cost of a typical simulation is spent roughly 50% in the "dynamics" and 50% in the "physics". There is therefore a strong incentive to reduce the cost of weather forecasts, and Machine Learning is a possible option because, once a machine learning model has been trained, it is often much faster to run than a full simulation. This is the motivation for a technique called model emulation, the idea being to build a fast statistical model which closely approximates a far more expensive simulation. In this paper we discuss the use of Machine Learning as an emulator to replace the "physics" component of the Unified Model. Various approaches and options will be presented, and the implications for further model development, operational running of forecasting systems, development of data assimilation schemes, and development of ensemble prediction techniques will be discussed.

  14. Quantum Support Vector Machine for Big Data Classification

    NASA Astrophysics Data System (ADS)

    Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth

    2014-09-01

    Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.

  15. Applying data fusion techniques for benthic habitat mapping and monitoring in a coral reef ecosystem

    NASA Astrophysics Data System (ADS)

    Zhang, Caiyun

    2015-06-01

    Accurate mapping and effective monitoring of benthic habitat in the Florida Keys are critical in developing management strategies for this valuable coral reef ecosystem. For this study, a framework was designed for automated benthic habitat mapping by combining multiple data sources (hyperspectral, aerial photography, and bathymetry data) and four contemporary imagery processing techniques (data fusion, Object-based Image Analysis (OBIA), machine learning, and ensemble analysis). In the framework, 1-m digital aerial photograph was first merged with 17-m hyperspectral imagery and 10-m bathymetry data using a pixel/feature-level fusion strategy. The fused dataset was then preclassified by three machine learning algorithms (Random Forest, Support Vector Machines, and k-Nearest Neighbor). Final object-based habitat maps were produced through ensemble analysis of outcomes from three classifiers. The framework was tested for classifying a group-level (3-class) and code-level (9-class) habitats in a portion of the Florida Keys. Informative and accurate habitat maps were achieved with an overall accuracy of 88.5% and 83.5% for the group-level and code-level classifications, respectively.

  16. Application of a model of instrumental conditioning to mobile robot control

    NASA Astrophysics Data System (ADS)

    Saksida, Lisa M.; Touretzky, D. S.

    1997-09-01

    Instrumental conditioning is a psychological process whereby an animal learns to associate its actions with their consequences. This type of learning is exploited in animal training techniques such as 'shaping by successive approximations,' which enables trainers to gradually adjust the animal's behavior by giving strategically timed reinforcements. While this is similar in principle to reinforcement learning, the real phenomenon includes many subtle effects not considered in the machine learning literature. In addition, a good deal of domain information is utilized by an animal learning a new task; it does not start from scratch every time it learns a new behavior. For these reasons, it is not surprising that mobile robot learning algorithms have yet to approach the sophistication and robustness of animal learning. A serious attempt to model instrumental learning could prove fruitful for improving machine learning techniques. In the present paper, we develop a computational theory of shaping at a level appropriate for controlling mobile robots. The theory is based on a series of mechanisms for 'behavior editing,' in which pre-existing behaviors, either innate or previously learned, can be dramatically changed in magnitude, shifted in direction, or otherwise manipulated so as to produce new behavioral routines. We have implemented our theory on Amelia, an RWI B21 mobile robot equipped with a gripper and color video camera. We provide results from training Amelia on several tasks, all of which were constructed as variations of one innate behavior, object-pursuit.

  17. Automated Essay Grading using Machine Learning Algorithm

    NASA Astrophysics Data System (ADS)

    Ramalingam, V. V.; Pandian, A.; Chetry, Prateek; Nigam, Himanshu

    2018-04-01

    Essays are paramount for assessing academic excellence, along with the ability to link different ideas and to recall them, but they are notably time consuming when assessed manually. Manual grading takes a significant amount of the evaluator’s time and is hence an expensive process. Automated grading, if proven effective, will not only reduce the time for assessment but, by comparison with human scores, will also make the score realistic. The project aims to develop an automated essay assessment system using machine learning techniques, classifying a corpus of textual entities into a small number of discrete categories corresponding to possible grades. A linear regression technique will be utilized for training the model, along with various other classification and clustering techniques. We intend to train classifiers on the training set, run them on the downloaded dataset, and then measure performance by comparing the obtained values with the dataset values. We have implemented our model using Java.

  18. Planning for rover opportunistic science

    NASA Technical Reports Server (NTRS)

    Gaines, Daniel M.; Estlin, Tara; Forest, Fisher; Chouinard, Caroline; Castano, Rebecca; Anderson, Robert C.

    2004-01-01

    The Mars Exploration Rover Spirit recently set a record for the furthest distance traveled in a single sol on Mars. Future planetary exploration missions are expected to use even longer drives to position rovers in areas of high scientific interest. This increase provides the potential for a large rise in the number of new science collection opportunities as the rover traverses the Martian surface. In this paper, we describe the OASIS system, which provides autonomous capabilities for dynamically identifying and pursuing these science opportunities during long-range traverses. OASIS uses machine learning and planning and scheduling techniques to address this goal. Machine learning techniques are applied to analyze data as it is collected and quickly determine new science goals and priorities on these goals. Planning and scheduling techniques are used to alter the behavior of the rover so that new science measurements can be performed while still obeying resource and other mission constraints. We will introduce OASIS and describe how planning and scheduling algorithms support opportunistic science.

  19. A feasibility study of automatic lung nodule detection in chest digital tomosynthesis with machine learning based on support vector machine

    NASA Astrophysics Data System (ADS)

    Lee, Donghoon; Kim, Ye-seul; Choi, Sunghoon; Lee, Haenghwa; Jo, Byungdu; Choi, Seungyeon; Shin, Jungwook; Kim, Hee-Joung

    2017-03-01

    Chest digital tomosynthesis (CDT) is a recently developed medical device that has several advantages for diagnosing lung disease. For example, CDT provides depth information with a relatively low radiation dose compared to computed tomography (CT). However, a major problem with CDT is the image artifacts associated with data incompleteness resulting from limited-angle data acquisition in CDT geometry. For this reason, its sensitivity for lung disease has not been clear compared to CT. In this study, to improve the sensitivity of lung disease detection in CDT, we developed computer aided diagnosis (CAD) systems based on machine learning. To design the CAD systems, we used 100 cropped images of lung nodules and 100 cropped images of normal lesions acquired with lung man phantoms and a prototype CDT. We used machine learning techniques based on a support vector machine and a Gabor filter. The Gabor filter was used for extracting characteristics of lung nodules, and we compared the feature extraction performance of the Gabor filter with various scale and orientation parameters. We used 3, 4, and 5 scales and 4, 6, and 8 orientations. After extracting features, a support vector machine (SVM) was used for classifying the features of lesions. The linear, polynomial and Gaussian kernels of the SVM were compared to decide the best SVM conditions for CDT reconstruction images. The results of the CAD system with machine learning showed the capability of automatic lung lesion detection. Furthermore, detection performance was best when a Gabor filter with 5 scales and 8 orientations and an SVM with a Gaussian kernel were used. In conclusion, our suggested CAD system improved the sensitivity of lung lesion detection in CDT, and we determined the Gabor filter and SVM conditions that achieve higher detection performance of our developed CAD system for CDT.

  20. Machine learning for the automatic detection of anomalous events

    NASA Astrophysics Data System (ADS)

    Fisher, Wendy D.

    In this dissertation, we describe our research contributions for a novel approach to the application of machine learning for the automatic detection of anomalous events. We work in two different domains to ensure a robust data-driven workflow that could be generalized for monitoring other systems. Specifically, in our first domain, we begin with the identification of internal erosion events in earth dams and levees (EDLs) using geophysical data collected from sensors located on the surface of the levee. As EDLs across the globe reach the end of their design lives, effectively monitoring their structural integrity is of critical importance. The second domain of interest is related to mobile telecommunications, where we investigate a system for automatically detecting non-commercial base station routers (BSRs) operating in protected frequency space. The presence of non-commercial BSRs can disrupt the connectivity of end users, cause service issues for the commercial providers, and introduce significant security concerns. We provide our motivation, experimentation, and results from investigating a generalized novel data-driven workflow using several machine learning techniques. In Chapter 2, we present results from our performance study that uses popular unsupervised clustering algorithms to gain insights to our real-world problems, and evaluate our results using internal and external validation techniques. Using EDL passive seismic data from an experimental laboratory earth embankment, results consistently show a clear separation of events from non-events in four of the five clustering algorithms applied. Chapter 3 uses a multivariate Gaussian machine learning model to identify anomalies in our experimental data sets. For the EDL work, we used experimental data from two different laboratory earth embankments. Additionally, we explore five wavelet transform methods for signal denoising. The best performance is achieved with the Haar wavelets. 
    We achieve up to 97.3% overall accuracy and less than 1.4% false negatives in anomaly detection. In Chapter 4, we research using two-class and one-class support vector machines (SVMs) for an effective anomaly detection system. We again use the two different EDL data sets from experimental laboratory earth embankments (each having approximately 80% normal and 20% anomalies) to ensure our workflow is robust enough to work with multiple data sets and different types of anomalous events (e.g., cracks and piping). We apply Haar wavelet-denoising techniques and extract nine spectral features from decomposed segments of the time series data. The two-class SVM with 10-fold cross validation achieved over 94% overall accuracy and 96% F1-score. Our approach provides a means for automatically identifying anomalous events using various machine learning techniques. Detecting internal erosion events in aging EDLs, earlier than is currently possible, can allow more time to prevent or mitigate catastrophic failures. Results show that we can successfully separate normal from anomalous data observations in passive seismic data, and provide a step towards techniques for continuous real-time monitoring of EDL health. Our lightweight non-commercial BSR detection system also has promise in separating commercial from non-commercial BSR scans without the need for prior geographic location information, extensive time-lapse surveys, or a database of known commercial carriers. (Abstract shortened by ProQuest.).
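
    The Haar wavelet denoising mentioned in this abstract can be illustrated with a one-level sketch. The abstract does not state the decomposition depth or the thresholding rule, so the single level, the soft-thresholding step, the function names, and the sample signal below are all assumptions made for illustration.

```python
import math

def haar_forward(signal):
    """One level of the orthonormal Haar transform; len(signal) must be even."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def haar_denoise(signal, threshold):
    """Soft-threshold the detail coefficients, then reconstruct the signal."""
    approx, detail = haar_forward(signal)
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    return haar_inverse(approx, shrunk)

# Illustrative signal values, not data from the dissertation
signal = [4.0, 4.2, 3.9, 4.1, 9.0, 9.3, 4.0, 3.8]
smoothed = haar_denoise(signal, threshold=0.1)
```

    With a threshold of zero the signal is reconstructed unchanged, and with a very large threshold each pair of samples collapses to its mean. Practical denoising usually applies several decomposition levels and picks the threshold from an estimate of the noise level.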

  1. Segmenting overlapping nano-objects in atomic force microscopy image

    NASA Astrophysics Data System (ADS)

    Wang, Qian; Han, Yuexing; Li, Qing; Wang, Bing; Konagaya, Akihiko

    2018-01-01

    Recently, techniques for nanoparticles have been rapidly developed for various fields, such as materials science, medicine, and biology. In particular, image processing methods have been widely used to automatically analyze nanoparticles. A technique to automatically segment overlapping nanoparticles with image processing and machine learning is proposed. Here, two tasks are necessary: elimination of image noise and handling of the overlapping shapes. For the first task, mean square error and the seed fill algorithm are adopted to remove noise and improve the quality of the original image. For the second task, four steps are needed to segment the overlapping nanoparticles. First, possible split lines are obtained by connecting the high curvature pixels on the contours. Second, the candidate split lines are classified with a machine learning algorithm. Third, the overlapping regions are detected with the method of density-based spatial clustering of applications with noise (DBSCAN). Finally, the best split lines are selected with a constrained minimum value. We give some experimental examples and compare our technique with two other methods. The results show the effectiveness of the proposed technique.

  2. Discriminating Induced-Microearthquakes Using New Seismic Features

    NASA Astrophysics Data System (ADS)

    Mousavi, S. M.; Horton, S.

    2016-12-01

    We studied characteristics of induced-microearthquakes on the basis of the waveforms recorded on a limited number of surface receivers using machine-learning techniques. Forty features in the time, frequency, and time-frequency domains were measured on each waveform, and several techniques such as correlation-based feature selection, Artificial Neural Networks (ANNs), Logistic Regression (LR) and X-means were used as research tools to explore the relationship between these seismic features and source parameters. The results show that spectral features have the highest correlation to source depth. Two new measurements developed as seismic features for this study, spectral centroids and 2D cross-correlations in the time-frequency domain, performed better than the common seismic measurements. These features can be used by machine learning techniques for efficient automatic classification of low energy signals recorded at one or more seismic stations. We applied the technique to 440 microearthquakes-1.7 [...] Reference: Mousavi, S. M., S. P. Horton, C. A. Langston, B. Samei (2016), Seismic features and automatic discrimination of deep and shallow induced-microearthquakes using neural network and logistic regression, Geophys. J. Int., doi: 10.1093/gji/ggw258.

  3. Multivariate Time Series Forecasting of Crude Palm Oil Price Using Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Kanchymalay, Kasturi; Salim, N.; Sukprasert, Anupong; Krishnan, Ramesh; Raba'ah Hashim, Ummi

    2017-08-01

    The aim of this paper was to study the correlation between crude palm oil (CPO) price, selected vegetable oil prices (such as soybean oil, coconut oil, olive oil, rapeseed oil and sunflower oil), crude oil and the monthly exchange rate. Comparative analysis was then performed on CPO price forecasting results using machine learning techniques. Monthly CPO prices, selected vegetable oil prices, crude oil prices and monthly exchange rate data from January 1987 to February 2017 were utilized. Preliminary analysis showed a positive and high correlation between the CPO price and soybean oil price and also between CPO price and crude oil price. Experiments were conducted using multi-layer perceptron, support vector regression and Holt-Winters exponential smoothing techniques. The results were assessed using the criteria of root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and directional accuracy (DA). Among these three techniques, support vector regression (SVR) with the sequential minimal optimization (SMO) algorithm showed relatively better results compared to the multi-layer perceptron and Holt-Winters exponential smoothing methods.
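
    The four assessment criteria listed in this abstract can be written out as a short sketch. RMSE, MAE and MAPE follow their standard definitions; the directional-accuracy rule used here (a forecast counts as a hit when it moves in the same direction as the actual price relative to the previous actual value) is one common convention and an assumption, since the abstract does not define it. The price series is made up for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def directional_accuracy(actual, predicted):
    """Percent of steps where forecast and actual move the same way (assumed rule)."""
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)

# Hypothetical monthly prices and forecasts, not data from the study.
actual = [100, 110, 105, 120]
predicted = [100, 108, 107, 115]
scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "DA": directional_accuracy(actual, predicted),
}
```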

  4. Fluorescence excitation-emission matrix spectroscopy for degradation monitoring of machinery lubricants

    NASA Astrophysics Data System (ADS)

    Sosnovski, Oleg; Suresh, Pooja; Dudelzak, Alexander E.; Green, Benjamin

    2018-02-01

    Lubrication oil is a vital component of heavy rotating machinery defining the machine's health, operational safety and effectiveness. Recently, the focus has been on developing sensors that provide real-time/online monitoring of oil condition/lubricity. Industrial practices and standards for assessing oil condition involve various analytical methods. Most of these techniques are unsuitable for online applications. The paper presents the results of studying degradation of antioxidant additives in machinery lubricants using Fluorescence Excitation-Emission Matrix (EEM) Spectroscopy and Machine Learning techniques. EEM Spectroscopy is capable of rapid and even standoff sensing; it is potentially applicable to real-time online monitoring.

  5. Formation enthalpies for transition metal alloys using machine learning

    NASA Astrophysics Data System (ADS)

    Ubaru, Shashanka; Miedlar, Agnieszka; Saad, Yousef; Chelikowsky, James R.

    2017-06-01

    The enthalpy of formation is an important thermodynamic property. Developing fast and accurate methods for its prediction is of practical interest in a variety of applications. Material informatics techniques based on machine learning have recently been introduced in the literature as an inexpensive means of exploiting materials data, and can be used to examine a variety of thermodynamic properties. We investigate the use of such machine learning tools for predicting the formation enthalpies of binary intermetallic compounds that contain at least one transition metal. We consider certain easily available properties of the constituent elements, complemented by some basic properties of the compounds, to predict the formation enthalpies. We show how choosing these properties (input features) based on a literature study (using prior physics knowledge) seems to outperform machine-learning-based feature selection methods such as sensitivity analysis and LASSO (least absolute shrinkage and selection operator) based methods. A nonlinear kernel-based support vector regression method is employed to perform the predictions. The predictive ability of our model is illustrated via several experiments on a dataset containing 648 binary alloys. We train and validate the model using the formation enthalpies calculated using a model by Miedema, which is a popular semiempirical model used for the prediction of formation enthalpies of metal alloys.

  6. Automatic detection of Martian dark slope streaks by machine learning using HiRISE images

    NASA Astrophysics Data System (ADS)

    Wang, Yexin; Di, Kaichang; Xin, Xin; Wan, Wenhui

    2017-07-01

    Dark slope streaks (DSSs) on the Martian surface are one of the active geologic features that can be observed on Mars nowadays. The detection of DSS is a prerequisite for studying its appearance, morphology, and distribution to reveal its underlying geological mechanisms. In addition, increasingly massive amounts of Mars high resolution data are now available. Hence, an automatic detection method for locating DSSs is highly desirable. In this research, we present an automatic DSS detection method by combining interest region extraction and machine learning techniques. The interest region extraction combines gradient and regional grayscale information. Moreover, a novel recognition strategy is proposed that takes the normalized minimum bounding rectangles (MBRs) of the extracted regions to calculate the Local Binary Pattern (LBP) feature and train a DSS classifier using the Adaboost machine learning algorithm. Comparative experiments using five different feature descriptors and three different machine learning algorithms show the superiority of the proposed method. Experimental results utilizing 888 extracted region samples from 28 HiRISE images show that the overall detection accuracy of our proposed method is 92.4%, with a true positive rate of 79.1% and false positive rate of 3.7%, which in particular indicates great performance of the method at eliminating non-DSS regions.
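
    The Local Binary Pattern (LBP) feature used in this abstract can be sketched as follows. This is the basic 3x3, 8-neighbour variant with a 256-bin histogram; the bit ordering, the `>=` comparison and the tiny test image are assumptions for illustration, and the abstract does not say which LBP variant the authors used.

```python
# Offsets of the 8 neighbours, clockwise from the top-left pixel (assumed order).
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, row, col):
    """8-bit LBP code of an interior pixel: bit k is set when neighbour k >= centre."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

# Hypothetical 3x3 grayscale patch with a single interior pixel (value 6).
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
```

    A histogram like this, computed on each normalized region, is the kind of fixed-length vector a boosted classifier can be trained on.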

  7. Advances in Patient Classification for Traditional Chinese Medicine: A Machine Learning Perspective

    PubMed Central

    Zhao, Changbo; Li, Guo-Zheng; Wang, Chengjun; Niu, Jinling

    2015-01-01

    As a complementary and alternative medicine in the medical field, traditional Chinese medicine (TCM) has drawn great attention both domestically and overseas. In practice, TCM provides a quite distinct methodology for patient diagnosis and treatment compared to western medicine (WM). Syndrome (ZHENG or pattern) is differentiated by a set of symptoms and signs examined from an individual by four main diagnostic methods: inspection, auscultation and olfaction, interrogation, and palpation which reflects the pathological and physiological changes of disease occurrence and development. Patient classification divides patients into several classes based on different criteria. In this paper, from the machine learning perspective, we survey the patient classification issue in three major aspects of TCM: sign classification, syndrome differentiation, and disease classification. Considering the different diagnostic data analyzed by different computational methods, we present an overview of four subfields of TCM diagnosis. For each subfield, we design a rectangular reference list with applications in the horizontal direction and machine learning algorithms in the longitudinal direction. According to the current development of objective TCM diagnosis for patient classification, a discussion of the research issues around machine learning techniques with applications to TCM diagnosis is given to facilitate further research on TCM patient classification. PMID:26246834

  8. Markerless gating for lung cancer radiotherapy based on machine learning techniques

    NASA Astrophysics Data System (ADS)

    Lin, Tong; Li, Ruijiang; Tang, Xiaoli; Dy, Jennifer G.; Jiang, Steve B.

    2009-03-01

    In lung cancer radiotherapy, radiation to a mobile target can be delivered by respiratory gating, for which we need to know whether the target is inside or outside a predefined gating window at any time point during the treatment. This can be achieved by tracking one or more fiducial markers implanted inside or near the target, either fluoroscopically or electromagnetically. However, the clinical implementation of marker tracking is limited for lung cancer radiotherapy mainly due to the risk of pneumothorax. Therefore, gating without implanted fiducial markers is a promising clinical direction. We have developed several template-matching methods for fluoroscopic marker-less gating. Recently, we have modeled the gating problem as a binary pattern classification problem, in which principal component analysis (PCA) and support vector machine (SVM) are combined to perform the classification task. Following the same framework, we investigated different combinations of dimensionality reduction techniques (PCA and four nonlinear manifold learning methods) and two machine learning classification methods (artificial neural networks—ANN and SVM). Performance was evaluated on ten fluoroscopic image sequences of nine lung cancer patients. We found that among all combinations of dimensionality reduction techniques and classification methods, PCA combined with either ANN or SVM achieved a better performance than the other nonlinear manifold learning methods. ANN when combined with PCA achieves a better performance than SVM in terms of classification accuracy and recall rate, although the target coverage is similar for the two classification methods. Furthermore, the running time for both ANN and SVM with PCA is within tolerance for real-time applications. Overall, ANN combined with PCA is a better candidate than other combinations we investigated in this work for real-time gated radiotherapy.

  9. Classification of sodium MRI data of cartilage using machine learning.

    PubMed

    Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R

    2015-11-01

    To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
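
    One reading of how the 12 predictors per subject arise is 2 acquisitions x 2 per-region statistics (MEAN, STD) x 3 aggregates (min, max, mean). The sketch below follows that reading; the acquisition keys, the use of the sample standard deviation, and the concentration values are all assumptions made for illustration.

```python
from statistics import mean, stdev

def subject_features(rois_by_acquisition):
    """Build the per-subject predictors from per-region concentration samples.

    rois_by_acquisition maps an acquisition name to a list of regions of
    interest, each region being a list of concentration measurements.
    """
    features = {}
    for acquisition, rois in rois_by_acquisition.items():
        roi_means = [mean(values) for values in rois]
        roi_stds = [stdev(values) for values in rois]  # sample STD (assumed)
        for stat_name, per_roi in (("MEAN", roi_means), ("STD", roi_stds)):
            features[f"{acquisition}: Min [{stat_name}]"] = min(per_roi)
            features[f"{acquisition}: Max [{stat_name}]"] = max(per_roi)
            features[f"{acquisition}: Mean [{stat_name}]"] = mean(per_roi)
    return features

# Hypothetical subject with two regions per acquisition (made-up values).
subject = {
    "fluid_suppressed": [[200, 220, 240], [250, 260, 270]],
    "non_suppressed": [[180, 200, 220], [230, 240, 250]],
}
features = subject_features(subject)
```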

  10. Improved Saturated Hydraulic Conductivity Pedotransfer Functions Using Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Araya, S. N.; Ghezzehei, T. A.

    2017-12-01

    Saturated hydraulic conductivity (Ks) is one of the fundamental hydraulic properties of soils. Its measurement, however, is cumbersome and instead pedotransfer functions (PTFs) are often used to estimate it. Despite a lot of progress over the years, generic PTFs that estimate hydraulic conductivity generally don't have a good performance. We develop significantly improved PTFs by applying state of the art machine learning techniques coupled with high-performance computing on a large database of over 20,000 soils—USKSAT and the Florida Soil Characterization databases. We compared the performance of four machine learning algorithms (k-nearest neighbors, gradient boosted model, support vector machine, and relevance vector machine) and evaluated the relative importance of several soil properties in explaining Ks. An attempt is also made to better account for soil structural properties; we evaluated the importance of variables derived from transformations of soil water retention characteristics and other soil properties. The gradient boosted models gave the best performance with root mean square errors less than 0.7 and mean errors in the order of 0.01 on a log scale of Ks [cm/h]. The effective particle size, D10, was found to be the single most important predictor. Other important predictors included percent clay, bulk density, organic carbon percent, coefficient of uniformity and values derived from water retention characteristics. Model performances were consistently better for Ks values greater than 10 cm/h. This study maximizes the extraction of information from a large database to develop generic machine learning based PTFs to estimate Ks. The study also evaluates the importance of various soil properties and their transformations in explaining Ks.

  11. An efficient ensemble learning method for gene microarray classification.

    PubMed

    Osareh, Alireza; Shadgar, Bita

    2013-01-01

    Gene microarray analysis and classification have proven to be an effective approach to the diagnosis of diseases and cancers. However, it has also been shown that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  12. Coupling machine learning with mechanistic models to study runoff production and river flow at the hillslope scale

    NASA Astrophysics Data System (ADS)

    Marçais, J.; Gupta, H. V.; De Dreuzy, J. R.; Troch, P. A. A.

    2016-12-01

    Geomorphological structure and geological heterogeneity of hillslopes are major controls on runoff responses. The diversity of hillslopes (morphological shapes and geological structures) on one hand, and the highly nonlinear runoff mechanism response on the other hand, make it difficult to transpose what has been learnt at one specific hillslope to another. Therefore, making reliable predictions of runoff appearance or river flow for a given hillslope is a challenge. Applying a classic model calibration (based on inverse problem techniques) requires doing it for each specific hillslope and having some data available for calibration. When applied to thousands of cases, this is not always practical. Here we propose a novel modeling framework based on coupling process-based models with a data-based approach. First, we develop a mechanistic model, based on hillslope storage Boussinesq equations (Troch et al. 2003), able to model nonlinear runoff responses to rainfall at the hillslope scale. Second, we set up a model database, representing thousands of non-calibrated simulations. These simulations investigate different hillslope shapes (real ones obtained by analyzing a 5 m digital elevation model of Brittany and synthetic ones), different hillslope geological structures (i.e. different parametrizations) and different hydrologic forcing terms (i.e. different infiltration chronicles). Then, we use this model library to train a machine learning model on this physically based database. Machine learning model performance is then assessed by a classic validation phase (testing it on new hillslopes and comparing machine learning with mechanistic outputs). Finally, we use this machine learning model to learn which hillslope properties control runoff. This methodology will be further tested by combining synthetic datasets with real ones.

  13. Geological applications of machine learning on hyperspectral remote sensing data

    NASA Astrophysics Data System (ADS)

    Tse, C. H.; Li, Yi-liang; Lam, Edmund Y.

    2015-02-01

    The CRISM imaging spectrometer orbiting Mars has been producing a vast amount of data in the visible to infrared wavelengths in the form of hyperspectral data cubes. These data, compared with those obtained from previous remote sensing techniques, yield an unprecedented level of detailed spectral resolution in addition to an ever increasing level of spatial information. A major challenge brought about by the data is the burden of processing and interpreting these datasets and extracting the relevant information from them. This research aims at approaching the challenge by exploring machine learning methods, especially unsupervised learning, to achieve cluster density estimation and classification, and ultimately devising an efficient means leading to identification of minerals. A set of software tools has been constructed in Python to access and experiment with CRISM hyperspectral cubes selected from two specific Mars locations. A machine learning pipeline is proposed and unsupervised learning methods were applied to pre-processed datasets. The resulting data clusters are compared with the published ASTER spectral library and browse data products from the Planetary Data System (PDS). The results demonstrated that this approach is capable of processing the huge amount of hyperspectral data and potentially providing guidance to scientists for more detailed studies.

  14. Utilizing Machine Learning for Analysis of Tiara for Texas

    NASA Astrophysics Data System (ADS)

    van Slycke, Jacqueline; Christian, Greg

    2017-09-01

    The Tiara for Texas detector at Texas A&M University consists of a target chamber housing an array of silicon detectors and surrounded by four high purity germanium clovers that generate voltage pulses proportional to detected gamma ray energies. While some radiation is fully absorbed in one photopeak, others undergo Compton scattering between detectors. This process is thoroughly simulated in GEANT4. Machine learning with scikit-learn allows for the reconstruction of scattered photons to the original energy of the incident gamma ray. In a given simulation, a defined number of rays are emitted from the source. Each ray is marked as an event and its path is tracked. Scikit-learn uses the events' paths to train an algorithm, which recognizes which events should be summed to reconstruct the full gamma ray energy and additional events to test the algorithm. These predictions are not exact, but were analyzed to further understand any discrepancies and increase the effectiveness of the simulation. The results from this research project compare various machine learning techniques to determine which methods should be expanded on in the future. National Science Foundation Grant PHY-1659847 and United States Department of Energy Grant DE-FG02-93ER40773.

  15. Behavioral Modeling for Mental Health using Machine Learning Algorithms.

    PubMed

    Srividya, M; Mohanavalli, S; Bhalaji, N

    2018-04-03

    Mental health is an indicator of emotional, psychological and social well-being of an individual. It determines how an individual thinks, feels and handles situations. Positive mental health helps one to work productively and realize their full potential. Mental health is important at every stage of life, from childhood and adolescence through adulthood. Many factors contribute to mental health problems which lead to mental illness like stress, social anxiety, depression, obsessive compulsive disorder, drug addiction, and personality disorders. It is becoming increasingly important to determine the onset of mental illness to maintain proper life balance. The nature of machine learning algorithms and Artificial Intelligence (AI) can be fully harnessed for predicting the onset of mental illness. Such applications when implemented in real time will benefit society by serving as a monitoring tool for individuals with deviant behavior. This research work proposes to apply various machine learning algorithms such as support vector machines, decision trees, naïve Bayes classifier, K-nearest neighbor classifier and logistic regression to identify the state of mental health in a target group. The responses obtained from the target group for the designed questionnaire were first subjected to unsupervised learning techniques. The labels obtained as a result of clustering were validated by computing the Mean Opinion Score. These cluster labels were then used to build classifiers to predict the mental health of an individual. Populations from various groups like high school students, college students and working professionals were considered as target groups. The research presents an analysis of applying the aforementioned machine learning algorithms on the target groups and also suggests directions for future work.

  16. Machine-Learned Data Structures of Lipid Marker Serum Concentrations in Multiple Sclerosis Patients Differ from Those in Healthy Subjects

    PubMed Central

    Lötsch, Jörn; Thrun, Michael; Lerch, Florian; Brunkhorst, Robert; Schiffmann, Susanne; Thomas, Dominique; Tegder, Irmgard; Geisslinger, Gerd; Ultsch, Alfred

    2017-01-01

    Lipid signaling has been suggested to be a major pathophysiological mechanism of multiple sclerosis (MS). With the increasing knowledge about lipid signaling, acquired data become increasingly complex, making bioinformatics necessary in lipid research. We used unsupervised machine-learning to analyze lipid marker serum concentrations, pursuing the hypothesis that for the most relevant markers the emerging data structures will coincide with the diagnosis of MS. Machine learning was implemented as emergent self-organizing feature maps (ESOM) combined with the U*-matrix visualization technique. The data space consisted of serum concentrations of three main classes of lipid markers comprising eicosanoids (d = 11 markers), ceramides (d = 10), and lysophosphatidic acids (d = 6). They were analyzed in cohorts of MS patients (n = 102) and healthy subjects (n = 301). Clear data structures in the high-dimensional data space were observed in eicosanoid and ceramide serum concentrations whereas no clear structure could be found in lysophosphatidic acid concentrations. With ceramide concentrations, the structures that had emerged from unsupervised machine-learning almost completely overlapped with the known grouping of MS patients versus healthy subjects. This was only partly provided by eicosanoid serum concentrations. Thus, unsupervised machine-learning identified distinct data structures of bioactive lipid serum concentrations. These structures could be superimposed with the known grouping of MS patients versus healthy subjects, which was almost completely possible with ceramides. Therefore, based on the present analysis, ceramides are first-line candidates for further exploration as druggable targets or biomarkers in MS. PMID:28590455

  17. Improved detection of chemical substances from colorimetric sensor data using probabilistic machine learning

    NASA Astrophysics Data System (ADS)

    Mølgaard, Lasse L.; Buus, Ole T.; Larsen, Jan; Babamoradi, Hamid; Thygesen, Ida L.; Laustsen, Milan; Munk, Jens Kristian; Dossi, Eleftheria; O'Keeffe, Caroline; Lässig, Lina; Tatlow, Sol; Sandström, Lars; Jakobsen, Mogens H.

    2017-05-01

    We present a data-driven machine learning approach to detect drug and explosives precursors using colorimetric sensor technology for air sampling. The sensing technology has been developed in the context of the CRIM-TRACK project. At present a fully integrated portable prototype for air sampling with disposable sensing chips and automated data acquisition has been developed. The prototype allows for fast, user-friendly sampling, which has made it possible to produce large datasets of colorimetric data for different target analytes in laboratory and simulated real-world application scenarios. To make use of the highly multivariate data produced from the colorimetric chip, a number of machine learning techniques are employed to provide reliable classification of target analytes from confounders found in the air streams. We demonstrate that a data-driven machine learning method using dimensionality reduction in combination with a probabilistic classifier makes it possible to produce informative features and a high detection rate of analytes. Furthermore, the probabilistic machine learning approach provides a means of automatically identifying unreliable measurements that could produce false predictions. The robustness of the colorimetric sensor has been evaluated in a series of experiments focusing on the amphetamine precursor phenylacetone as well as the improvised explosives precursor hydrogen peroxide. The analysis demonstrates that the system is able to detect analytes in clean air and mixed with substances that occur naturally in real-world sampling scenarios. The technology under development in CRIM-TRACK has potential as an effective tool to control trafficking of illegal drugs, for explosive detection, or in other law enforcement applications.

  18. Machine learning of big data in gaining insight into successful treatment of hypertension.

    PubMed

    Koren, Gideon; Nordon, Galia; Radinsky, Kira; Shalev, Varda

    2018-06-01

    Despite effective medications, rates of uncontrolled hypertension remain high. Treatment protocols are largely based on randomized trials and meta-analyses of these studies. The objective of this study was to test the utility of machine learning of big data in gaining insight into the treatment of hypertension. We applied machine learning techniques, such as decision trees and neural networks, to identify determinants that contribute to the success of hypertension drug treatment on a large set of patients. We also identified concomitant drugs not considered to have antihypertensive activity, which may contribute to lowering blood pressure (BP). Higher initial BP predicts lower success rates. Among the medication options and their combinations, treatment with beta blockers appears to be more commonly effective, which is not reflected in contemporary guidelines. Among numerous concomitant drugs taken by hypertensive patients, proton pump inhibitors (PPIs) and HMG-CoA reductase inhibitors (statins) significantly improved the success rate of hypertension treatment. In conclusion, machine learning of big data is a novel method to identify effective antihypertensive therapy and for repurposing medications already on the market for new indications. Our results related to beta blockers, stemming from machine learning of a large and diverse set of big data, in contrast to the much narrower criteria for randomized clinical trials (RCTs), should be corroborated and affirmed by other methods, as they hold potential promise for an old class of drugs which may be presently underutilized. These previously unrecognized effects of PPIs and statins have been very recently identified as effective in lowering BP in preliminary clinical observations, lending credibility to our big data results.

  19. Automated Data Assimilation and Flight Planning for Multi-Platform Observation Missions

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj; Morris, Robert A.; Strawa, Anthony; Kurklu, Elif; Keely, Leslie

    2008-01-01

    This is a progress report on an effort in which our goal is to demonstrate the effectiveness of automated data mining and planning for the daily management of Earth Science missions. Currently, data mining and machine learning technologies are being used by scientists at research labs for validating Earth science models. However, few if any of these advanced techniques are currently being integrated into daily mission operations. Consequently, there are significant gaps in the knowledge that can be derived from the models and data that are used each day for guiding mission activities. The result can be sub-optimal observation plans, lack of useful data, and wasteful use of resources. Recent advances in data mining, machine learning, and planning make it feasible to migrate these technologies into the daily mission planning cycle. We describe the design of a closed loop system for data acquisition, processing, and flight planning that integrates the results of machine learning into the flight planning process.

  20. Combining Machine Learning Systems and Multiple Docking Simulation Packages to Improve Docking Prediction Reliability for Network Pharmacology

    PubMed Central

    Hsin, Kun-Yi; Ghosh, Samik; Kitano, Hiroaki

    2013-01-01

    Increased availability of bioinformatics resources is creating opportunities for the application of network pharmacology to predict drug effects and toxicity resulting from multi-target interactions. Here we present a high-precision computational prediction approach that combines two elaborately built machine learning systems and multiple molecular docking tools to assess binding potentials of a test compound against proteins involved in a complex molecular network. One of the two machine learning systems is a re-scoring function to evaluate binding modes generated by docking tools. The second is a binding mode selection function to identify the most predictive binding mode. Results from a series of benchmark validations and a case study show that this approach surpasses the prediction reliability of other techniques and that it also identifies either primary or off-targets of kinase inhibitors. Integrating this approach with molecular network maps makes it possible to address drug safety issues by comprehensively investigating network-dependent effects of a drug or drug candidate. PMID:24391846

  1. Development of machine learning models to predict inhibition of 3-dehydroquinate dehydratase.

    PubMed

    de Ávila, Maurício Boff; de Azevedo, Walter Filgueira

    2018-04-20

    In this study, we describe the development of new machine learning models to predict inhibition of the enzyme 3-dehydroquinate dehydratase (DHQD). This enzyme is the third step of the shikimate pathway and is responsible for the synthesis of chorismate, which is a natural precursor of aromatic amino acids. The enzymes of the shikimate pathway are absent in humans, which makes them protein targets for the design of antimicrobial drugs. We focus our study on the crystallographic structures of DHQD in complex with competitive inhibitors, for which experimental inhibition constant data are available. Application of supervised machine learning techniques produced a robust DHQD-targeted model to predict binding affinity. Combination of high-resolution crystallographic structures and binding information indicates that the prevalence of intermolecular electrostatic interactions between DHQD and competitive inhibitors is of pivotal importance for the binding affinity against this enzyme. The present findings can be used to speed up virtual screening studies focused on the DHQD structure. © 2018 John Wiley & Sons A/S.

  2. Machine learning for outcome prediction of acute ischemic stroke post intra-arterial therapy.

    PubMed

    Asadi, Hamed; Dowling, Richard; Yan, Bernard; Mitchell, Peter

    2014-01-01

    Stroke is a major cause of death and disability. Accurately predicting stroke outcome from a set of predictive variables may identify high-risk patients and guide treatment approaches, leading to decreased morbidity. Logistic regression models allow for the identification and validation of predictive variables. However, advanced machine learning algorithms offer an alternative, in particular for large-scale multi-institutional data, with the advantage of easily incorporating newly available data to improve prediction performance. Our aim was to design and compare different machine learning methods capable of predicting the outcome of endovascular intervention in acute anterior circulation ischaemic stroke. We conducted a retrospective study of a prospectively collected database of acute ischaemic stroke treated by endovascular intervention. Using SPSS®, MATLAB®, and Rapidminer®, classical statistics as well as artificial neural network and support vector algorithms were applied to design a supervised machine capable of classifying these predictors into potential good and poor outcomes. These algorithms were trained, validated and tested using randomly divided data. We included 107 consecutive acute anterior circulation ischaemic stroke patients treated by endovascular technique. Sixty-six were male, and the mean age was 65.3 years. All the available demographic, procedural and clinical factors were included in the models. The final confusion matrix of the neural network demonstrated an overall congruency of ∼80% between the target and output classes, with favourable receiver operating characteristics. However, after optimisation, the support vector machine had a relatively better performance, with a root mean squared error of 2.064 (SD: ± 0.408). We showed promising accuracy of outcome prediction using supervised machine learning algorithms, with potential for incorporation of larger multicenter datasets, likely further improving prediction. Finally, we propose that a robust machine learning system can potentially optimise the selection process for endovascular versus medical treatment in the management of acute stroke.

  3. Impact of pixel-based machine-learning techniques on automated frameworks for delineation of gross tumor volume regions for stereotactic body radiation therapy.

    PubMed

    Kawata, Yasuo; Arimura, Hidetaka; Ikushima, Koujirou; Jin, Ze; Morita, Kento; Tokunaga, Chiaki; Yabu-Uchi, Hidetake; Shioyama, Yoshiyuki; Sasaki, Tomonari; Honda, Hiroshi; Sasaki, Masayuki

    2017-10-01

    The aim of this study was to investigate the impact of pixel-based machine learning (ML) techniques, i.e., the fuzzy c-means clustering method (FCM), the artificial neural network (ANN), and the support vector machine (SVM), on an automated framework for delineation of gross tumor volume (GTV) regions of lung cancer for stereotactic body radiation therapy. The morphological and metabolic features for GTV regions, which were determined based on the knowledge of radiation oncologists, were fed on a pixel-by-pixel basis into the respective FCM, ANN, and SVM ML techniques. Then, the ML techniques were incorporated into the automated delineation framework of GTVs followed by an optimum contour selection (OCS) method, which we proposed in a previous study. The three ML-based frameworks were evaluated for 16 lung cancer cases (six solid, four ground glass opacity (GGO), six part-solid GGO) with the datasets of planning computed tomography (CT) and 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET)/CT images using the three-dimensional Dice similarity coefficient (DSC). DSC denotes the degree of region similarity between the GTVs contoured by radiation oncologists and those estimated using the automated framework. The FCM-based framework achieved the highest DSC of 0.79±0.06, whereas the DSCs of the ANN-based and SVM-based frameworks were 0.76±0.14 and 0.73±0.14, respectively. The FCM-based framework provided the highest segmentation accuracy and precision without a learning process (lowest calculation cost). Therefore, the FCM-based framework can be useful for delineation of tumor regions in practical treatment planning. Copyright © 2017 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
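
    The Dice similarity coefficient used as the evaluation metric in the record above can be illustrated with a minimal sketch. It uses the standard definition DSC = 2|A ∩ B| / (|A| + |B|); the function name, the representation of a region as a set of voxel coordinates, and the toy masks are assumptions made for illustration and are not taken from the study.

```python
def dice_coefficient(reference, estimate):
    """Dice similarity coefficient between two regions.

    Each region is an iterable of voxel coordinates, e.g. (x, y, z) tuples.
    Returns a value in [0, 1]; 1 means the regions coincide exactly.
    """
    reference, estimate = set(reference), set(estimate)
    if not reference and not estimate:
        # Convention chosen here: two empty regions are treated as identical.
        return 1.0
    overlap = len(reference & estimate)
    return 2.0 * overlap / (len(reference) + len(estimate))


# Hypothetical toy regions (not data from the study).
manual_contour = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
automated_contour = {(0, 1, 0), (1, 1, 0), (2, 1, 0)}

dsc = dice_coefficient(manual_contour, automated_contour)
print(f"DSC = {dsc:.3f}")
```

    Here two voxels are shared between a four-voxel and a three-voxel region, so the coefficient is 2·2/(4+3). A set-based representation keeps the sketch dependency-free; an array-based mask would be the more usual choice for real image volumes.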

  4. Outcomes and Complications After Endovascular Treatment of Brain Arteriovenous Malformations: A Prognostication Attempt Using Artificial Intelligence.

    PubMed

    Asadi, Hamed; Kok, Hong Kuan; Looby, Seamus; Brennan, Paul; O'Hare, Alan; Thornton, John

    2016-12-01

    To identify factors influencing outcome in brain arteriovenous malformations (BAVM) treated with endovascular embolization. We also assessed the feasibility of using machine learning techniques to prognosticate and predict outcome and compared this to conventional statistical analyses. A retrospective study of patients undergoing endovascular treatment of BAVM during a 22-year period in a national neuroscience center was performed. Clinical presentation, imaging, procedural details, complications, and outcome were recorded. The data was analyzed with artificial intelligence techniques to identify predictors of outcome and assess accuracy in predicting clinical outcome at final follow-up. One-hundred ninety-nine patients underwent treatment for BAVM with a mean follow-up duration of 63 months. The commonest clinical presentation was intracranial hemorrhage (56%). During the follow-up period, there were 51 further hemorrhagic events, comprising spontaneous hemorrhage (n = 27) and procedural related hemorrhage (n = 24). All spontaneous events occurred in previously embolized BAVMs remote from the procedure. Complications included ischemic stroke in 10%, symptomatic hemorrhage in 9.8%, and mortality rate of 4.7%. Standard regression analysis model had an accuracy of 43% in predicting final outcome (mortality), with the type of treatment complication identified as the most important predictor. The machine learning model showed superior accuracy of 97.5% in predicting outcome and identified the presence or absence of nidal fistulae as the most important factor. BAVMs can be treated successfully by endovascular techniques or combined with surgery and radiosurgery with an acceptable risk profile. Machine learning techniques can predict final outcome with greater accuracy and may help individualize treatment based on key predicting factors. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. Using machine-learning methods to analyze economic loss function of quality management processes

    NASA Astrophysics Data System (ADS)

    Dzedik, V. A.; Lontsikh, P. A.

    2018-05-01

    During analysis of quality management systems, their economic component is often analyzed insufficiently. To overcome this issue, it is necessary to withdraw the concept of economic loss functions from tolerance thinking and address it. Input data about economic losses in processes have a complex form, thus, using standard tools to solve this problem is complicated. Use of machine learning techniques allows one to obtain precise models of the economic loss function based on even the most complex input data. Results of such analysis contain data about the true efficiency of a process and can be used to make investment decisions.

  6. Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

    PubMed

    Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R

    2017-01-01

    Injury narratives are now available in real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury found in large batches of narratives, such as workers compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes single-word and bi-gram models, Support Vector Machine and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event leading to injury classifications for a large workers compensation database. These algorithms are known to do well classifying narrative text and are fairly easy to implement with off-the-shelf software packages such as Python. We propose human-machine learning ensemble approaches which maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine, whereby the triple ensemble NB_SW = NB_BI-GRAM = SVM had very high performance (0.93 overall sensitivity/positive predictive value and high accuracy (i.e. high sensitivity and positive predictive values)) across both large and small categories, leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly. For large administrative datasets we propose incorporation of methods based on human-machine pairings such as we have done here, utilizing readily-available off-the-shelf machine learning techniques and resulting in only a fraction of narratives that require manual review. Human-machine ensemble methods are likely to improve performance over total manual coding. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
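
    The two filtering strategies compared in the record above (filtering on classifier prediction strength vs. agreement between algorithms) can be sketched roughly as follows. This is an illustrative sketch only: the function names, the tuple layout, and the toy narratives and codes are hypothetical, and the authors' actual implementation is not described in the abstract.

```python
def split_by_confidence(predictions, review_fraction=0.30):
    """Send the least confident predictions to manual review.

    predictions: list of (narrative_id, assigned_code, confidence) tuples.
    Returns (machine_coded, manual_review), both lists of the same tuples.
    """
    ranked = sorted(predictions, key=lambda p: p[2])  # weakest first
    n_review = int(round(review_fraction * len(ranked)))
    return ranked[n_review:], ranked[:n_review]


def split_by_agreement(codes_by_model):
    """Keep a machine code only when every model assigns the same code.

    codes_by_model: dict mapping narrative_id -> list of codes, one per model.
    Returns (agreed, disputed): a dict of accepted codes and a list of
    narrative ids left for manual review.
    """
    agreed, disputed = {}, []
    for narrative_id, codes in codes_by_model.items():
        if len(set(codes)) == 1:
            agreed[narrative_id] = codes[0]
        else:
            disputed.append(narrative_id)
    return agreed, disputed


# Hypothetical toy data: ten narratives with made-up codes and confidences.
toy_predictions = [(i, "fall" if i % 2 else "struck", 0.05 + 0.1 * i)
                   for i in range(10)]
machine_coded, manual_review = split_by_confidence(toy_predictions)

toy_votes = {
    "n1": ["fall", "fall", "fall"],      # all three models agree
    "n2": ["fall", "struck", "fall"],    # disagreement -> manual review
    "n3": ["struck", "struck", "struck"],
}
agreed, disputed = split_by_agreement(toy_votes)
print(len(machine_coded), len(manual_review), agreed, disputed)
```

    The confidence threshold trades manual workload against the accuracy of the machine-coded portion, which is the trade-off the abstract reports (30% vs. 41% of narratives left for review under the two strategies).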

  7. Amp: A modular approach to machine learning in atomistic simulations

    NASA Astrophysics Data System (ADS)

    Khorshidi, Alireza; Peterson, Andrew A.

    2016-10-01

    Electronic structure calculations, such as those employing Kohn-Sham density functional theory or ab initio wavefunction theories, have allowed for atomistic-level understandings of a wide variety of phenomena and properties of matter at small scales. However, the computational cost of electronic structure methods drastically increases with length and time scales, which makes these methods difficult for long time-scale molecular dynamics simulations or large-sized systems. Machine-learning techniques can provide accurate potentials that can match the quality of electronic structure calculations, provided sufficient training data. These potentials can then be used to rapidly simulate large and long time-scale phenomena at similar quality to the parent electronic structure approach. Machine-learning potentials usually take a bias-free mathematical form and can be readily developed for a wide variety of systems. Electronic structure calculations have favorable properties (namely that they are noiseless and targeted training data can be produced on-demand) that make them particularly well-suited for machine learning. This paper discusses our modular approach to atomistic machine learning through the development of the open-source Atomistic Machine-learning Package (Amp), which allows for representations of both the total and atom-centered potential energy surface, in both periodic and non-periodic systems. Potentials developed through the atom-centered approach are simultaneously applicable for systems with various sizes. Interpolation can be enhanced by introducing custom descriptors of the local environment. We demonstrate this in the current work for Gaussian-type, bispectrum, and Zernike-type descriptors. Amp has an intuitive and modular structure with an interface through the Python scripting language yet has parallelizable Fortran components for demanding tasks; it is designed to integrate closely with the widely used Atomic Simulation Environment (ASE), which makes it compatible with a wide variety of commercial and open-source electronic structure codes. We finally demonstrate that the neural network model inside Amp can accurately interpolate electronic structure energies as well as forces of thousands of multi-species atomic systems.

  8. Time-Frequency Learning Machines for Nonstationarity Detection Using Surrogates

    NASA Astrophysics Data System (ADS)

    Borgnat, Pierre; Flandrin, Patrick; Richard, Cédric; Ferrari, André; Amoud, Hassan; Honeine, Paul

    2012-03-01

    Time-frequency representations provide a powerful tool for nonstationary signal analysis and classification, supporting a wide range of applications [12]. As opposed to conventional Fourier analysis, these techniques reveal the evolution in time of the spectral content of signals. In Ref. [7,38], time-frequency analysis is used to test stationarity of any signal. The proposed method consists of a comparison between global and local time-frequency features. The originality is to make use of a family of stationary surrogate signals for defining the null hypothesis of stationarity and, based upon this information, to derive statistical tests. An open question remains, however, about how to choose relevant time-frequency features. Over the last decade, a number of new pattern recognition methods based on reproducing kernels have been introduced. These learning machines have gained popularity due to their conceptual simplicity and their outstanding performance [30]. Initiated by Vapnik’s support vector machines (SVM) [35], they offer now a wide class of supervised and unsupervised learning algorithms. In Ref. [17-19], the authors have shown how the most effective and innovative learning machines can be tuned to operate in the time-frequency domain. This chapter follows this line of research by taking advantage of learning machines to test and quantify stationarity. Based on one-class SVM, our approach uses the entire time-frequency representation and does not require arbitrary feature extraction. Applied to a set of surrogates, it provides the domain boundary that includes most of these stationarized signals. This allows us to test the stationarity of the signal under investigation. This chapter is organized as follows. In Section 22.2, we introduce the surrogate data method to generate stationarized signals, namely, the null hypothesis of stationarity. The concept of time-frequency learning machines is presented in Section 22.3, and applied to one-class SVM in order to derive a stationarity test in Section 22.4. The relevance of the latter is illustrated by simulation results in Section 22.5.

  9. Machine Learning to Improve Energy Expenditure Estimation in Children With Disabilities: A Pilot Study in Duchenne Muscular Dystrophy.

    PubMed

    Pande, Amit; Mohapatra, Prasant; Nicorici, Alina; Han, Jay J

    2016-07-19

    Children with physical impairments are at a greater risk for obesity and decreased physical activity. A better understanding of physical activity pattern and energy expenditure (EE) would lead to a more targeted approach to intervention. This study focuses on studying the use of machine-learning algorithms for EE estimation in children with disabilities. A pilot study was conducted on children with Duchenne muscular dystrophy (DMD) to identify important factors for determining EE and develop a novel algorithm to accurately estimate EE from wearable sensor-collected data. There were 7 boys with DMD, 6 healthy control boys, and 22 control adults recruited. Data were collected using smartphone accelerometer and chest-worn heart rate sensors. The gold standard EE values were obtained from the COSMED K4b2 portable cardiopulmonary metabolic unit worn by boys (aged 6-10 years) with DMD and controls. Data from this sensor setup were collected simultaneously during a series of concurrent activities. Linear regression and nonlinear machine-learning-based approaches were used to analyze the relationship between accelerometer and heart rate readings and COSMED values. Existing calorimetry equations using linear regression and nonlinear machine-learning-based models, developed for healthy adults and young children, give low correlation to actual EE values in children with disabilities (14%-40%). The proposed model for boys with DMD uses ensemble machine learning techniques and gives a 91% correlation with actual measured EE values (root mean square error of 0.017). Our results confirm that the methods developed to determine EE using accelerometer and heart rate sensor values in normal adults are not appropriate for children with disabilities and should not be used. A much more accurate model is obtained using machine-learning-based nonlinear regression specifically developed for this target population. ©Amit Pande, Prasant Mohapatra, Alina Nicorici, Jay J Han. Originally published in JMIR Rehabilitation and Assistive Technology (http://rehab.jmir.org), 19.07.2016.

  10. THE MILKY WAY PROJECT: LEVERAGING CITIZEN SCIENCE AND MACHINE LEARNING TO DETECT INTERSTELLAR BUBBLES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beaumont, Christopher N.; Williams, Jonathan P.; Goodman, Alyssa A.

    We present Brut, an algorithm to identify bubbles in infrared images of the Galactic midplane. Brut is based on the Random Forest algorithm, and uses bubbles identified by >35,000 citizen scientists from the Milky Way Project to discover the identifying characteristics of bubbles in images from the Spitzer Space Telescope. We demonstrate that Brut's ability to identify bubbles is comparable to expert astronomers. We use Brut to re-assess the bubbles in the Milky Way Project catalog, and find that 10%-30% of the objects in this catalog are non-bubble interlopers. Relative to these interlopers, high-reliability bubbles are more confined to the mid-plane, and display a stronger excess of young stellar objects along and within bubble rims. Furthermore, Brut is able to discover bubbles missed by previous searches, particularly bubbles near bright sources which have low contrast relative to their surroundings. Brut demonstrates the synergies that exist between citizen scientists, professional scientists, and machine learning techniques. In cases where 'untrained' citizens can identify patterns that machines cannot detect without training, machine learning algorithms like Brut can use the output of citizen science projects as input training sets, offering tremendous opportunities to speed the pace of scientific discovery. A hybrid model of machine learning combined with crowdsourced training data from citizen scientists can not only classify large quantities of data, but also address the weakness of each approach if deployed alone.

  11. Jet-images — deep learning edition

    DOE PAGES

    de Oliveira, Luke; Kagan, Michael; Mackey, Lester; ...

    2016-07-13

    Building on the notion of a particle physics detector as a camera and the collimated streams of high energy particles, or jets, it measures as an image, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons. Modern deep learning algorithms trained on jet images can out-perform standard physically-motivated feature driven approaches to jet tagging. We develop techniques for visualizing how these features are learned by the network and what additional information is used to improve performance. Finally, this interplay between physically-motivated feature driven tools and supervised learning algorithms is general and can be used to significantly increase the sensitivity to discover new particles and new forces, and gain a deeper understanding of the physics within jets.

  12. Jet-images — deep learning edition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    de Oliveira, Luke; Kagan, Michael; Mackey, Lester

    Building on the notion of a particle physics detector as a camera and the collimated streams of high energy particles, or jets, it measures as an image, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons. Modern deep learning algorithms trained on jet images can out-perform standard physically-motivated feature driven approaches to jet tagging. We develop techniques for visualizing how these features are learned by the network and what additional information is used to improve performance. Finally, this interplay between physically-motivated feature driven tools and supervised learning algorithms is general and can be used to significantly increase the sensitivity to discover new particles and new forces, and gain a deeper understanding of the physics within jets.

  13. Toward Intelligent Machine Learning Algorithms

    DTIC Science & Technology

    1988-05-01

    Machine learning is recognized as a tool for improving the performance of many kinds of systems, yet most machine learning systems themselves are not...directed systems, and with the addition of a knowledge store for organizing and maintaining knowledge to assist learning, a learning machine learning (L...ML) algorithm is possible. The necessary components of L-ML systems are presented along with several case descriptions of existing machine learning systems

  14. Web Mining: Machine Learning for Web Applications.

    ERIC Educational Resources Information Center

    Chen, Hsinchun; Chau, Michael

    2004-01-01

    Presents an overview of machine learning research and reviews methods used for evaluating machine learning systems. Ways that machine-learning algorithms were used in traditional information retrieval systems in the "pre-Web" era are described, and the field of Web mining and how machine learning has been used in different Web mining…

  15. Machine Learning and Computer Vision System for Phenotype Data Acquisition and Analysis in Plants.

    PubMed

    Navarro, Pedro J; Pérez, Fernando; Weiss, Julia; Egea-Cortines, Marcos

    2016-05-05

    Phenomics is a technology-driven approach with a promising future for obtaining unbiased data of biological systems. Image acquisition is relatively simple. However, data handling and analysis are not as developed compared to the sampling capacities. We present a system based on machine learning (ML) algorithms and computer vision intended to solve the automatic phenotype data analysis in plant material. We developed a growth-chamber able to accommodate species of various sizes. Night image acquisition requires near infrared lighting. For the ML process, we tested three different algorithms: k-nearest neighbour (kNN), Naive Bayes Classifier (NBC), and Support Vector Machine. Each ML algorithm was executed with different kernel functions and they were trained with raw data and two types of data normalisation. Different metrics were computed to determine the optimal configuration of the machine learning algorithms. We obtained a performance of 99.31% in kNN for RGB images and 99.34% in SVM for NIR. Our results show that ML techniques can speed up phenomic data analysis. Furthermore, both RGB and NIR images can be segmented successfully but may require different ML algorithms for segmentation.

  16. Fall classification by machine learning using mobile phones.

    PubMed

    Albert, Mark V; Kording, Konrad; Herrmann, Megan; Jayaraman, Arun

    2012-01-01

    Fall prevention is a critical component of health care; falls are a common source of injury in the elderly and are associated with significant levels of mortality and morbidity. Automatically detecting falls can allow rapid response to potential emergencies; in addition, knowing the cause or manner of a fall can be beneficial for prevention studies or a more tailored emergency response. The purpose of this study is to demonstrate techniques to not only reliably detect a fall but also to automatically classify the type. We asked 15 subjects to simulate four different types of falls (left and right lateral, forward trips, and backward slips) while wearing mobile phones and previously validated, dedicated accelerometers. Nine subjects also wore the devices for ten days, to provide data for comparison with the simulated falls. We applied five machine learning classifiers to a large time-series feature set to detect falls. Support vector machines and regularized logistic regression were able to identify a fall with 98% accuracy and classify the type of fall with 99% accuracy. This work demonstrates how current machine learning approaches can simplify data collection for prevention in fall-related research as well as improve rapid response to potential injuries due to falls.

  17. Data Mining and Machine Learning in Astronomy

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Brunner, Robert J.

    We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

  18. Fast machine-learning online optimization of ultra-cold-atom experiments.

    PubMed

    Wigley, P B; Everitt, P J; van den Hengel, A; Bastian, J W; Sooriyabandara, M A; McDonald, G D; Hardman, K S; Quinlivan, C D; Manju, P; Kuhn, C C N; Petersen, I R; Luiten, A N; Hope, J J; Robins, N P; Hush, M R

    2016-05-16

    We apply an online optimization process based on machine learning to the production of Bose-Einstein condensates (BEC). BEC is typically created with an exponential evaporation ramp that is optimal for ergodic dynamics with two-body s-wave interactions and no other loss rates, but likely sub-optimal for real experiments. Through repeated machine-controlled scientific experimentation and observations our 'learner' discovers an optimal evaporation ramp for BEC production. In contrast to previous work, our learner uses a Gaussian process to develop a statistical model of the relationship between the parameters it controls and the quality of the BEC produced. We demonstrate that the Gaussian process machine learner is able to discover a ramp that produces high quality BECs in 10 times fewer iterations than a previously used online optimization technique. Furthermore, we show the internal model developed can be used to determine which parameters are essential in BEC creation and which are unimportant, providing insight into the optimization process of the system.

  19. Fast machine-learning online optimization of ultra-cold-atom experiments

    PubMed Central

    Wigley, P. B.; Everitt, P. J.; van den Hengel, A.; Bastian, J. W.; Sooriyabandara, M. A.; McDonald, G. D.; Hardman, K. S.; Quinlivan, C. D.; Manju, P.; Kuhn, C. C. N.; Petersen, I. R.; Luiten, A. N.; Hope, J. J.; Robins, N. P.; Hush, M. R.

    2016-01-01

    We apply an online optimization process based on machine learning to the production of Bose-Einstein condensates (BEC). BEC is typically created with an exponential evaporation ramp that is optimal for ergodic dynamics with two-body s-wave interactions and no other loss rates, but likely sub-optimal for real experiments. Through repeated machine-controlled scientific experimentation and observations our ‘learner’ discovers an optimal evaporation ramp for BEC production. In contrast to previous work, our learner uses a Gaussian process to develop a statistical model of the relationship between the parameters it controls and the quality of the BEC produced. We demonstrate that the Gaussian process machine learner is able to discover a ramp that produces high quality BECs in 10 times fewer iterations than a previously used online optimization technique. Furthermore, we show the internal model developed can be used to determine which parameters are essential in BEC creation and which are unimportant, providing insight into the optimization process of the system. PMID:27180805

  20. Bridge Health Monitoring Using a Machine Learning Strategy

    DOT National Transportation Integrated Search

    2017-01-01

    The goal of this project was to cast the SHM problem within a statistical pattern recognition framework. Techniques borrowed from speaker recognition, particularly speaker verification, were used as this discipline deals with problems very similar to...

  1. On the convergence of nanotechnology and Big Data analysis for computer-aided diagnosis.

    PubMed

    Rodrigues, Jose F; Paulovich, Fernando V; de Oliveira, Maria Cf; de Oliveira, Osvaldo N

    2016-04-01

    An overview is provided of the challenges involved in building computer-aided diagnosis systems capable of precise medical diagnostics based on integration and interpretation of data from different sources and formats. The availability of massive amounts of data and computational methods associated with the Big Data paradigm has brought hope that such systems may soon be available in routine clinical practices, which is not the case today. We focus on visual and machine learning analysis of medical data acquired with varied nanotech-based techniques and on methods for Big Data infrastructure. Because diagnosis is essentially a classification task, we address the machine learning techniques with supervised and unsupervised classification, making a critical assessment of the progress already made in the medical field and the prospects for the near future. We also advocate that successful computer-aided diagnosis requires a merge of methods and concepts from nanotechnology and Big Data analysis.

  2. Hybrid machine learning technique for forecasting Dhaka stock market timing decisions.

    PubMed

    Banik, Shipra; Khodadad Khan, A F M; Anwer, Mohammad

    2014-01-01

    Forecasting the stock market has been a difficult job for applied researchers owing to the nature of the facts, which are very noisy and time varying. However, this hypothesis has been featured by several empirical experiential studies, and a number of researchers have efficiently applied machine learning techniques to forecast the stock market. This paper studied stock prediction for the use of investors. Investors typically incur losses because of uncertain investment purposes and unsighted assets. This paper proposes a rough set model, a neural network model, and a hybrid neural network and rough set model to find the optimal buy and sell times for a share on the Dhaka stock exchange. Investigational findings demonstrate that our proposed hybrid model has higher precision than the single rough set model and the neural network model. We believe this paper's findings will help stock investors decide on the optimal buy and/or sell time on the Dhaka stock exchange.

  3. Artificial intelligence in healthcare: past, present and future.

    PubMed

    Jiang, Fei; Jiang, Yong; Zhi, Hui; Dong, Yi; Li, Hao; Ma, Sufeng; Wang, Yilong; Dong, Qiang; Shen, Haipeng; Wang, Yongjun

    2017-12-01

    Artificial intelligence (AI) aims to mimic human cognitive functions. It is bringing a paradigm shift to healthcare, powered by increasing availability of healthcare data and rapid progress of analytics techniques. We survey the current status of AI applications in healthcare and discuss its future. AI can be applied to various types of healthcare data (structured and unstructured). Popular AI techniques include machine learning methods for structured data, such as the classical support vector machine and neural network, and the modern deep learning, as well as natural language processing for unstructured data. Major disease areas that use AI tools include cancer, neurology and cardiology. We then review in more details the AI applications in stroke, in the three major areas of early detection and diagnosis, treatment, as well as outcome prediction and prognosis evaluation. We conclude with discussion about pioneer AI systems, such as IBM Watson, and hurdles for real-life deployment of AI.

  4. Hybrid Machine Learning Technique for Forecasting Dhaka Stock Market Timing Decisions

    PubMed Central

    Banik, Shipra; Khodadad Khan, A. F. M.; Anwer, Mohammad

    2014-01-01

    Forecasting the stock market has been a difficult job for applied researchers owing to the nature of the facts, which are very noisy and time varying. However, this hypothesis has been featured by several empirical experiential studies, and a number of researchers have efficiently applied machine learning techniques to forecast the stock market. This paper studied stock prediction for the use of investors. Investors typically incur losses because of uncertain investment purposes and unsighted assets. This paper proposes a rough set model, a neural network model, and a hybrid neural network and rough set model to find the optimal buy and sell times for a share on the Dhaka stock exchange. Investigational findings demonstrate that our proposed hybrid model has higher precision than the single rough set model and the neural network model. We believe this paper's findings will help stock investors decide on the optimal buy and/or sell time on the Dhaka stock exchange. PMID:24701205

  5. Using machine learning to accelerate sampling-based inversion

    NASA Astrophysics Data System (ADS)

    Valentine, A. P.; Sambridge, M.

    2017-12-01

    In most cases, a complete solution to a geophysical inverse problem (including robust understanding of the uncertainties associated with the result) requires a sampling-based approach. However, the computational burden is high, and proves intractable for many problems of interest. There is therefore considerable value in developing techniques that can accelerate sampling procedures. The main computational cost lies in evaluation of the forward operator (e.g. calculation of synthetic seismograms) for each candidate model. Modern machine learning techniques, such as Gaussian Processes, offer a route for constructing a computationally-cheap approximation to this calculation, which can replace the accurate solution during sampling. Importantly, the accuracy of the approximation can be refined as inversion proceeds, to ensure high-quality results. In this presentation, we describe and demonstrate this approach, which can be seen as an extension of popular current methods, such as the Neighbourhood Algorithm, and bridges the gap between prior- and posterior-sampling frameworks.

  6. Computational Analysis of Behavior.

    PubMed

    Egnor, S E Roian; Branson, Kristin

    2016-07-08

    In this review, we discuss the emerging field of computational behavioral analysis: the use of modern methods from computer science and engineering to quantitatively measure animal behavior. We discuss aspects of experiment design important to both obtaining biologically relevant behavioral data and enabling the use of machine vision and learning techniques for automation. These two goals are often in conflict. Restraining or restricting the environment of the animal can simplify automatic behavior quantification, but it can also degrade the quality or alter important aspects of behavior. To enable biologists to design experiments to obtain better behavioral measurements, and computer scientists to pinpoint fruitful directions for algorithm improvement, we review known effects of artificial manipulation of the animal on behavior. We also review machine vision and learning techniques for tracking, feature extraction, automated behavior classification, and automated behavior discovery, the assumptions they make, and the types of data they work best with.

  7. Predicting ozone profile shape from satellite UV spectra

    NASA Astrophysics Data System (ADS)

    Xu, Jian; Loyola, Diego; Romahn, Fabian; Doicu, Adrian

    2017-04-01

    Identifying ozone profile shape is a critical yet challenging job for the accurate reconstruction of vertical distributions of atmospheric ozone that is relevant to climate change and air quality. Motivated by the need to develop an approach to reliably and efficiently estimate vertical information of ozone and inspired by the success of machine learning techniques, this work proposes a new algorithm for deriving ozone profile shapes from ultraviolet (UV) absorption spectra that are recorded by satellite instruments, e.g. GOME series and the future Sentinel missions. The proposed algorithm formulates this particular inverse problem in a classification framework rather than a conventional inversion one and places an emphasis on effectively characterizing various profile shapes based on machine learning techniques. Furthermore, a comparison of the ozone profiles from real GOME-2 data estimated by our algorithm and the classical retrieval algorithm (Optimal Estimation Method) is performed.

  8. Artificial intelligence in healthcare: past, present and future

    PubMed Central

    Jiang, Fei; Jiang, Yong; Zhi, Hui; Dong, Yi; Li, Hao; Ma, Sufeng; Wang, Yilong; Dong, Qiang; Shen, Haipeng; Wang, Yongjun

    2017-01-01

    Artificial intelligence (AI) aims to mimic human cognitive functions. It is bringing a paradigm shift to healthcare, powered by increasing availability of healthcare data and rapid progress of analytics techniques. We survey the current status of AI applications in healthcare and discuss its future. AI can be applied to various types of healthcare data (structured and unstructured). Popular AI techniques include machine learning methods for structured data, such as the classical support vector machine and neural network, and the modern deep learning, as well as natural language processing for unstructured data. Major disease areas that use AI tools include cancer, neurology and cardiology. We then review in more details the AI applications in stroke, in the three major areas of early detection and diagnosis, treatment, as well as outcome prediction and prognosis evaluation. We conclude with discussion about pioneer AI systems, such as IBM Watson, and hurdles for real-life deployment of AI. PMID:29507784

  9. Reverse engineering highlights potential principles of large gene regulatory network design and learning.

    PubMed

    Carré, Clément; Mas, André; Krouk, Gabriel

    2017-01-01

    Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. There are several techniques used presently to experimentally assay transcription factors to target relationships, defining important information about real gene regulatory networks connections. These techniques include classical ChIP-seq, yeast one-hybrid, or more recently, DAP-seq or target technologies. These techniques are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory networks-simulating engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that is able to simulate large gene regulatory networks (containing 10^4 genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale free properties (built ex nihilo). In combination with supervised (accepting prior knowledge) support vector machine algorithm we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks and in particular we demonstrate that prior-knowledge structure is crucial for accurate learning, and (ii) draw conclusions to inform experimental design to perform learning able to solve gene regulatory networks in the future. By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity hold true on real data (Escherichia coli K14 network reconstruction using network and transcriptomic data), we show that the formalism used to build FRANK can to some extent be a reasonable model for gene regulatory networks in real cells.

  10. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
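    The AUC metric quoted above has a simple rank interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way; the function name and the toy labels and scores are illustrative assumptions, not data from the record.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    A value of 1 means every positive outranks every negative, and 0.5 corresponds to chance-level ranking. The pairwise loop is quadratic, which is fine for a small illustration but not for survey-scale data.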

  11. Application of machine learning for the evaluation of turfgrass plots using aerial images

    NASA Astrophysics Data System (ADS)

    Ding, Ke; Raheja, Amar; Bhandari, Subodh; Green, Robert L.

    2016-05-01

    Historically, investigation of turfgrass characteristics has been limited to visual ratings. Although relevant information may result from such evaluations, final inferences may be questionable because of the subjective nature in which the data is collected. Recent advances in computer vision techniques allow researchers to objectively measure turfgrass characteristics such as percent ground cover, turf color, and turf quality from the digital images. This paper focuses on developing a methodology for automated assessment of turfgrass quality from aerial images. Images of several turfgrass plots of varying quality were gathered using a camera mounted on an unmanned aerial vehicle. The quality of these plots was also evaluated based on visual ratings. The goal was to use the aerial images to generate quality evaluations on a regular basis for the optimization of water treatment. Aerial images are used to train a neural network so that appropriate features such as intensity, color, and texture of the turfgrass are extracted from these images. A neural network is a nonlinear classifier commonly used in machine learning. The output of the neural network trained model is the ratings of the grass, which are compared to the visual ratings. Currently, the quality and the color of turfgrass, measured as the greenness of the grass, are evaluated. The textures are calculated using the Gabor filter and co-occurrence matrix. Other classifiers such as support vector machines and simpler linear regression models such as Ridge regression and LARS regression are also used. The performance of each model is compared. The results show encouraging potential for using machine learning techniques for the evaluation of turfgrass quality and color.
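    As an illustration of the co-occurrence texture feature mentioned above, the following sketch builds a gray-level co-occurrence matrix for one pixel offset and derives a contrast value from it. The quantized toy image, the offset, and the choice of contrast as the summary statistic are assumptions; the record does not say which statistics the authors used.

```python
def cooccurrence(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy).

    `image` is a list of rows of integers in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def contrast(counts):
    """Mean squared gray-level difference over all counted pixel pairs."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("co-occurrence matrix is empty")
    weighted = sum((i - j) ** 2 * n
                   for i, row in enumerate(counts)
                   for j, n in enumerate(row))
    return weighted / total
```

    A uniform patch gives a contrast of zero, while a patch with frequent jumps between gray levels gives a larger value, which is what makes such statistics usable as texture inputs to a classifier.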

  12. Machine Learning Topological Invariants with Neural Networks

    NASA Astrophysics Data System (ADS)

    Zhang, Pengfei; Shen, Huitao; Zhai, Hui

    2018-02-01

    In this Letter we supervisedly train neural networks to distinguish different topological phases in the context of topological band insulators. After training with Hamiltonians of one-dimensional insulators with chiral symmetry, the neural network can predict their topological winding numbers with nearly 100% accuracy, even for Hamiltonians with larger winding numbers that are not included in the training data. These results show a remarkable success that the neural network can capture the global and nonlinear topological features of quantum phases from local inputs. By opening up the neural network, we confirm that the network does learn the discrete version of the winding number formula. We also make a couple of remarks regarding the role of the symmetry and the opposite effect of regularization techniques when applying machine learning to physical systems.
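    For readers unfamiliar with the quantity being learned, the sketch below evaluates a standard discretized winding number for a two-band chiral Hamiltonian written as h(k) = h_x(k) + i h_y(k), by summing the phase increments of h around the Brillouin zone. The Su-Schrieffer-Heeger form used as an example and the function names are assumptions for illustration; this is not necessarily the exact discrete formula the network in the record was found to learn.

```python
import cmath
import math

def winding_number(h, n_points=200):
    """Sum phase increments of h(k) over k in [0, 2*pi] and divide by 2*pi.

    `h` maps a momentum k to the complex number h_x(k) + 1j * h_y(k).
    The grid must be fine enough that each phase step stays below pi.
    """
    total = 0.0
    prev = h(0.0)
    for j in range(1, n_points + 1):
        cur = h(2.0 * math.pi * j / n_points)
        total += cmath.phase(cur / prev)  # increment in (-pi, pi]
        prev = cur
    return round(total / (2.0 * math.pi))

def ssh(t1, t2):
    """Example model (assumed): h(k) = t1 + t2 * exp(i k)."""
    return lambda k: complex(t1 + t2 * math.cos(k), t2 * math.sin(k))
```

    With this example model the curve h(k) encircles the origin once when t2 exceeds t1 and not at all otherwise, so the function returns 1 and 0 respectively.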

  13. Using Trained Pixel Classifiers to Select Images of Interest

    NASA Technical Reports Server (NTRS)

    Mazzoni, D.; Wagstaff, K.; Castano, R.

    2004-01-01

    We present a machine-learning-based approach to ranking images based on learned priorities. Unlike previous methods for image evaluation, which typically assess the value of each image based on the presence of predetermined specific features, this method involves using two levels of machine-learning classifiers: one level is used to classify each pixel as belonging to one of a group of rather generic classes, and another level is used to rank the images based on these pixel classifications, given some example rankings from a scientist as a guide. Initial results indicate that the technique works well, producing new rankings that match the scientist's rankings significantly better than would be expected by chance. The method is demonstrated for a set of images collected by a Mars field-test rover.

  14. Using Machine Learning to Advance Personality Assessment and Theory.

    PubMed

    Bleidorn, Wiebke; Hopwood, Christopher James

    2018-05-01

    Machine learning has led to important advances in society. One of the most exciting applications of machine learning in psychological science has been the development of assessment tools that can powerfully predict human behavior and personality traits. Thus far, machine learning approaches to personality assessment have focused on the associations between social media and other digital records with established personality measures. The goal of this article is to expand the potential of machine learning approaches to personality assessment by embedding it in a more comprehensive construct validation framework. We review recent applications of machine learning to personality assessment, place machine learning research in the broader context of fundamental principles of construct validation, and provide recommendations for how to use machine learning to advance our understanding of personality.

  15. Finding Waldo: Learning about Users from their Interactions.

    PubMed

    Brown, Eli T; Ottley, Alvitta; Zhao, Helen; Quan Lin; Souvenir, Richard; Endert, Alex; Chang, Remco

    2014-12-01

    Visual analytics is inherently a collaboration between human and computer. However, in current visual analytics systems, the computer has limited means of knowing about its users and their analysis processes. While existing research has shown that a user's interactions with a system reflect a large amount of the user's reasoning process, there has been limited advancement in developing automated, real-time techniques that mine interactions to learn about the user. In this paper, we demonstrate that we can accurately predict a user's task performance and infer some user personality traits by using machine learning techniques to analyze interaction data. Specifically, we conduct an experiment in which participants perform a visual search task, and apply well-known machine learning algorithms to three encodings of the users' interaction data. We achieve, depending on algorithm and encoding, between 62% and 83% accuracy at predicting whether each user will be fast or slow at completing the task. Beyond predicting performance, we demonstrate that using the same techniques, we can infer aspects of the user's personality factors, including locus of control, extraversion, and neuroticism. Further analyses show that strong results can be attained with limited observation time: in one case 95% of the final accuracy is gained after a quarter of the average task completion time. Overall, our findings show that interactions can provide information to the computer about its human collaborator, and establish a foundation for realizing mixed-initiative visual analytics systems.

  16. Automatic welding detection by an intelligent tool pipe inspection

    NASA Astrophysics Data System (ADS)

    Arizmendi, C. J.; Garcia, W. L.; Quintero, M. A.

    2015-07-01

    This work provides a model based on machine learning techniques for weld recognition, based on signals obtained through an in-line inspection tool called a “smart pig” in oil and gas pipelines. The model uses a signal noise reduction phase by means of pre-processing algorithms and attribute-selection techniques. The noise reduction techniques were selected after a literature review and testing with survey data. Subsequently, the model was trained using recognition and classification algorithms, specifically artificial neural networks and support vector machines. Finally, the trained model was validated with different data sets and the performance was measured with cross validation and ROC analysis. The results show that it is possible to identify welding automatically with an efficiency between 90 and 98 percent.
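    The cross-validation step mentioned above can be sketched as a generator of train and test index sets. The record does not state the number of folds or whether the data were shuffled, so the contiguous, unshuffled folds below are an assumption made to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds.

    Fold sizes differ by at most one; every sample is tested exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

    A model would be fitted on each `train` list and scored on the matching `test` list, with the per-fold scores averaged. For ordered sensor signals, shuffling or grouping by pipeline segment may be needed to avoid leakage between folds.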

  17. Leveraging knowledge engineering and machine learning for microbial bio-manufacturing.

    PubMed

    Oyetunde, Tolutola; Bao, Forrest Sheng; Chen, Jiung-Wen; Martin, Hector Garcia; Tang, Yinjie J

    2018-05-03

    Genome scale modeling (GSM) predicts the performance of microbial workhorses and helps identify beneficial gene targets. GSM integrated with intracellular flux dynamics, omics, and thermodynamics has shown remarkable progress in both elucidating complex cellular phenomena and computational strain design (CSD). Nonetheless, these models still show high uncertainty due to a poor understanding of innate pathway regulations, metabolic burdens, and other factors (such as stress tolerance and metabolite channeling). Besides, the engineered hosts may have genetic mutations or non-genetic variations in bioreactor conditions and thus CSD rarely foresees fermentation rate and titer. Metabolic models play an important role in design-build-test-learn cycles for strain improvement, and machine learning (ML) may provide a viable complementary approach for driving strain design and deciphering cellular processes. In order to develop quality ML models, knowledge engineering leverages and standardizes the wealth of information in literature (e.g., genomic/phenomic data, synthetic biology strategies, and bioprocess variables). Data driven frameworks can offer new constraints for mechanistic models to describe cellular regulations, to design pathways, to search gene targets, and to estimate fermentation titer/rate/yield under specified growth conditions (e.g., mixing, nutrients, and O2). This review highlights the scope of information collections, database constructions, and machine learning techniques (such as deep learning and transfer learning), which may facilitate "Learn and Design" for strain development. Copyright © 2018. Published by Elsevier Inc.

  18. Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach.

    PubMed

    Borghetti, Brett J; Giametta, Joseph J; Rusnock, Christina F

    2017-02-01

    We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.

  19. Code Optimization and Parallelization on the Origins: Looking from Users' Perspective

    NASA Technical Reports Server (NTRS)

    Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)

    2002-01-01

    Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.

  20. In vivo quantification of plant starch reserves at micrometer resolution using X-ray microCT imaging and machine learning.

    PubMed

    Earles, J Mason; Knipfer, Thorsten; Tixier, Aude; Orozco, Jessica; Reyes, Clarissa; Zwieniecki, Maciej A; Brodersen, Craig R; McElrone, Andrew J

    2018-03-08

    Starch is the primary energy storage molecule used by most terrestrial plants to fuel respiration and growth during periods of limited to no photosynthesis, and its depletion can drive plant mortality. Destructive techniques at coarse spatial scales exist to quantify starch, but these techniques face methodological challenges that can lead to uncertainty about the lability of tissue-specific starch pools and their role in plant survival. Here, we demonstrate how X-ray microcomputed tomography (microCT) and a machine learning algorithm can be coupled to quantify plant starch content in vivo, repeatedly and nondestructively over time in grapevine stems (Vitis spp.). Starch content estimated for xylem axial and ray parenchyma cells from microCT images was correlated strongly with enzymatically measured bulk-tissue starch concentration on the same stems. After validating our machine learning algorithm, we then characterized the spatial distribution of starch concentration in living stems at micrometer resolution, and identified starch depletion in live plants under experimental conditions designed to halt photosynthesis and starch production, initiating the drawdown of stored starch pools. Using X-ray microCT technology for in vivo starch monitoring should enable novel research directed at resolving the spatial and temporal patterns of starch accumulation and depletion in woody plant species. No claim to original US Government works New Phytologist © 2018 New Phytologist Trust.

  1. In-lab versus at-home activity recognition in ambulatory subjects with incomplete spinal cord injury.

    PubMed

    Albert, Mark V; Azeze, Yohannes; Courtois, Michael; Jayaraman, Arun

    2017-02-06

    Although commercially available activity trackers can aid in tracking therapy and recovery of patients, most devices perform poorly for patients with irregular movement patterns. Standard machine learning techniques can be applied on recorded accelerometer signals in order to classify the activities of ambulatory subjects with incomplete spinal cord injury in a way that is specific to this population and the location of the recording: at home or in the clinic. Subjects were instructed to perform a standardized set of movements while wearing a waist-worn accelerometer in the clinic and at home. Activities included lying, sitting, standing, walking, wheeling, and stair climbing. Multiple classifiers and validation methods were used to quantify the ability of the machine learning techniques to distinguish the activities recorded in-lab or at-home. In the lab, classifiers trained and tested using within-subject cross-validation provided an accuracy of 91.6%. When the classifier was trained on data collected in the lab but tested on at-home data, the accuracy fell to 54.6%, indicating distinct movement patterns between locations. However, the accuracy of the at-home classifications, when training the classifier with at-home data, improved to 85.9%. Individuals with unique movement patterns can benefit from using tailored activity recognition algorithms easily implemented using modern machine learning methods on collected movement data.

  2. Using Blood Indexes to Predict Overweight Statuses: An Extreme Learning Machine-Based Approach

    PubMed Central

    Chen, Huiling; Yang, Bo; Liu, Dayou; Liu, Wenbin; Liu, Yanlong; Zhang, Xiuhua; Hu, Lufeng

    2015-01-01

    The number of overweight people continues to rise across the world. Studies have shown that being overweight can increase health risks, such as high blood pressure, diabetes mellitus, coronary heart disease, and certain forms of cancer. Therefore, identifying the overweight status in people is critical to prevent and decrease health risks. This study explores a new technique that uses blood and biochemical measurements to recognize the overweight condition. A new machine learning technique, an extreme learning machine, was developed to accurately detect the overweight status from a pool of 225 overweight and 251 healthy subjects. The group included 179 males and 297 females. The detection method was rigorously evaluated against the real-life dataset for accuracy, sensitivity, specificity, and AUC (area under the receiver operating characteristic (ROC) curve) criterion. Additionally, the feature selection was investigated to identify correlating factors for the overweight status. The results demonstrate that there are significant differences in blood and biochemical indexes between healthy and overweight people (p-value < 0.01). According to the feature selection, the most important correlated indexes are creatinine, hemoglobin, hematocrit, uric acid, red blood cells, high density lipoprotein, alanine transaminase, triglyceride, and γ-glutamyl transpeptidase. These are consistent with the results of Spearman test analysis. The proposed method holds promise as a new, accurate method for identifying the overweight status in subjects. PMID:26600199
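    The evaluation criteria named above (accuracy, sensitivity, specificity) follow directly from the four confusion-matrix counts. The sketch below assumes binary labels with 1 for the positive class (here, overweight) and 0 for the negative class; the function name and the toy labels in the usage note are illustrative, not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

    For example, with true labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 0, 1]`, two of three positives and one of two negatives are recovered. Reporting sensitivity and specificity separately matters when the two groups differ in size, since accuracy alone can hide poor performance on the smaller class.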

  3. PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants.

    PubMed

    Vieira, Lucas Maciel; Grativol, Clicia; Thiebaut, Flavia; Carvalho, Thais G; Hardoim, Pablo R; Hemerly, Adriana; Lifschitz, Sergio; Ferreira, Paulo Cavalcanti Gomes; Walter, Maria Emilia M T

    2017-03-04

    Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large amount of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few computational methods considered are so specific that they cannot be successfully applied to other species different from those that they have been originally designed to. Prediction of lncRNAs has been performed with machine learning techniques. Particularly, for lincRNA prediction, supervised learning methods have been explored in recent literature. As far as we know, there are no methods nor workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs on plants, considering a workflow that includes known bioinformatics tools together with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed to identify novel lincRNAs, in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we also could identify differentially-expressed lincRNAs in sugarcane and maize plants submitted to pathogenic and beneficial microorganisms.

  4. PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants

    PubMed Central

    Vieira, Lucas Maciel; Grativol, Clicia; Thiebaut, Flavia; Carvalho, Thais G.; Hardoim, Pablo R.; Hemerly, Adriana; Lifschitz, Sergio; Ferreira, Paulo Cavalcanti Gomes; Walter, Maria Emilia M. T.

    2017-01-01

    Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large number of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few computational methods considered are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques. In particular, for lincRNA prediction, supervised learning methods have been explored in recent literature. As far as we know, there are no methods or workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants subjected to pathogenic and beneficial microorganisms. PMID:29657283

  5. An Analysis of a Digital Variant of the Trail Making Test Using Machine Learning Techniques

    PubMed Central

    Dahmen, Jessamyn; Cook, Diane; Fellows, Robert; Schmitter-Edgecombe, Maureen

    2017-01-01

    BACKGROUND The goal of this work is to develop a digital version of a standard cognitive assessment, the Trail Making Test (TMT), and assess its utility. OBJECTIVE This paper introduces a novel digital version of the TMT and introduces a machine learning based approach to assess its capabilities. METHODS Using digital Trail Making Test (dTMT) data collected from (N=54) older adult participants as feature sets, we use machine learning techniques to analyze the utility of the dTMT and evaluate the insights provided by the digital features. RESULTS Predicted TMT scores correlate well with clinical digital test scores (r=0.98) and paper time to completion scores (r=0.65). Predicted TICS exhibited a small correlation with clinically-derived TICS scores (r=0.12 Part A, r=0.10 Part B). Predicted FAB scores exhibited a small correlation with clinically-derived FAB scores (r=0.13 Part A, r=0.29 for Part B). Digitally-derived features were also used to predict diagnosis (AUC of 0.65). CONCLUSION Our findings indicate that the dTMT is capable of measuring the same aspects of cognition as the paper-based TMT. Furthermore, the dTMT’s additional data may be able to help monitor other cognitive processes not captured by the paper-based TMT alone. PMID:27886019
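
    The r values quoted in this abstract are correlation coefficients between predicted and clinically derived scores. As a rough illustration, assuming Pearson's r is the statistic meant (the abstract does not say), it can be computed as below. The two score lists are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical completion times in seconds (illustrative values only).
clinical_scores = [30, 42, 55, 61, 78]
predicted_scores = [33, 40, 52, 65, 75]

print("r =", round(pearson_r(clinical_scores, predicted_scores), 3))
```

    A value near 1 means the predicted scores rank and scale almost linearly with the clinical ones; values near 0.1, as reported for TICS, indicate almost no linear relationship.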

  6. The identification of cis-regulatory elements: A review from a machine learning perspective.

    PubMed

    Li, Yifeng; Chen, Chih-Yu; Kaye, Alice M; Wasserman, Wyeth W

    2015-12-01

    The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have unveiled that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, insulators, etc. These regulatory elements can play crucial roles in controlling gene expressions in specific cell types, conditions, and developmental stages. Disruption to these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency for detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular, machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided in order to facilitate testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.

  7. As above, so below? Towards understanding inverse models in BCI

    NASA Astrophysics Data System (ADS)

    Lindgren, Jussi T.

    2018-02-01

    Objective. In brain-computer interfaces (BCI), measurements of the user’s brain activity are classified into commands for the computer. With EEG-based BCIs, the origins of the classified phenomena are often considered to be spatially localized in the cortical volume and mixed in the EEG. We investigate if more accurate BCIs can be obtained by reconstructing the source activities in the volume. Approach. We contrast the physiology-driven source reconstruction with data-driven representations obtained by statistical machine learning. We explain these approaches in a common linear dictionary framework and review the different ways to obtain the dictionary parameters. We consider the effect of source reconstruction on some major difficulties in BCI classification, namely information loss, feature selection and nonstationarity of the EEG. Main results. Our analysis suggests that the approaches differ mainly in their parameter estimation. Physiological source reconstruction may thus be expected to improve BCI accuracy if machine learning is not used or where it produces less optimal parameters. We argue that the considered difficulties of surface EEG classification can remain in the reconstructed volume and that data-driven techniques are still necessary. Finally, we provide some suggestions for comparing approaches. Significance. The present work illustrates the relationships between source reconstruction and machine learning-based approaches for EEG data representation. The provided analysis and discussion should help in understanding, applying, comparing and improving such techniques in the future.

  8. Cardiovascular Event Prediction by Machine Learning: The Multi-Ethnic Study of Atherosclerosis.

    PubMed

    Ambale-Venkatesh, Bharath; Yang, Xiaoying; Wu, Colin O; Liu, Kiang; Hundley, W Gregory; McClelland, Robyn; Gomes, Antoinette S; Folsom, Aaron R; Shea, Steven; Guallar, Eliseo; Bluemke, David A; Lima, João A C

    2017-10-13

    Machine learning may be useful to characterize cardiovascular risk, predict outcomes, and identify biomarkers in population studies. To test the ability of random survival forests, a machine learning technique, to predict 6 cardiovascular outcomes in comparison to standard cardiovascular risk scores. We included participants from the MESA (Multi-Ethnic Study of Atherosclerosis). Baseline measurements were used to predict cardiovascular outcomes over 12 years of follow-up. MESA was designed to study progression of subclinical disease to cardiovascular events where participants were initially free of cardiovascular disease. All 6814 participants from MESA, aged 45 to 84 years, from 4 ethnicities, and 6 centers across the United States were included. Seven-hundred thirty-five variables from imaging and noninvasive tests, questionnaires, and biomarker panels were obtained. We used the random survival forests technique to identify the top-20 predictors of each outcome. Imaging, electrocardiography, and serum biomarkers featured heavily on the top-20 lists as opposed to traditional cardiovascular risk factors. Age was the most important predictor for all-cause mortality. Fasting glucose levels and carotid ultrasonography measures were important predictors of stroke. Coronary Artery Calcium score was the most important predictor of coronary heart disease and all atherosclerotic cardiovascular disease combined outcomes. Left ventricular structure and function and cardiac troponin-T were among the top predictors for incident heart failure. Creatinine, age, and ankle-brachial index were among the top predictors of atrial fibrillation. TNF-α (tissue necrosis factor-α) and IL (interleukin)-2 soluble receptors and NT-proBNP (N-Terminal Pro-B-Type Natriuretic Peptide) levels were important across all outcomes. The random survival forests technique performed better than established risk scores with increased prediction accuracy (decreased Brier score by 10%-25%). 
    Machine learning in conjunction with deep phenotyping improves prediction accuracy in cardiovascular event prediction in an initially asymptomatic population. These methods may lead to greater insights on subclinical disease markers without a priori assumptions of causality. URL: http://www.clinicaltrials.gov. Unique identifier: NCT00005487. © 2017 American Heart Association, Inc.
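
    The abstract reports improvement as a 10%-25% decrease in Brier score. The sketch below shows the plain binary Brier score and a relative reduction between two sets of predicted probabilities. It is a simplification: survival models such as random survival forests are normally scored with a time-dependent Brier score that accounts for censoring, which is not shown here, and all numbers are made up.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0 is a perfect forecast.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def relative_reduction(baseline, model):
    """Fractional decrease of `model` relative to `baseline` (0.2 means 20% lower)."""
    return (baseline - model) / baseline


# Hypothetical outcomes (1 = event occurred) and predicted event probabilities.
outcomes = [1, 0, 0, 1]
risk_score_probs = [0.5, 0.5, 0.5, 0.5]   # uninformative baseline (assumed)
ml_model_probs = [0.6, 0.4, 0.4, 0.6]     # slightly sharper model (assumed)

baseline_bs = brier_score(outcomes, risk_score_probs)
model_bs = brier_score(outcomes, ml_model_probs)
print("baseline Brier:", baseline_bs)
print("model Brier:   ", model_bs)
print("relative reduction:", relative_reduction(baseline_bs, model_bs))
```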

  9. Deep convolutional neural network for classifying Fusarium wilt of radish from unmanned aerial vehicles

    NASA Astrophysics Data System (ADS)

    Ha, Jin Gwan; Moon, Hyeonjoon; Kwak, Jin Tae; Hassan, Syed Ibrahim; Dang, Minh; Lee, O. New; Park, Han Yong

    2017-10-01

    Recently, unmanned aerial vehicles (UAVs) have gained much attention. In particular, there is a growing interest in utilizing UAVs for agricultural applications such as crop monitoring and management. We propose a computerized system that is capable of detecting Fusarium wilt of radish with high accuracy. The system adopts computer vision and machine learning techniques, including deep learning, to process the images captured by UAVs at low altitudes and to identify the infected radish. The whole radish field is first segmented into three distinctive regions (radish, bare ground, and mulching film) via a softmax classifier and K-means clustering. Then, the identified radish regions are further classified into healthy radish and Fusarium wilt of radish using a deep convolutional neural network (CNN). In identifying radish, bare ground, and mulching film from a radish field, we achieved an accuracy of ≥97.4%. In detecting Fusarium wilt of radish, the CNN obtained an accuracy of 93.3%. It also outperformed the standard machine learning algorithm, obtaining 82.9% accuracy. Therefore, UAVs equipped with computational techniques are promising tools for improving the quality and efficiency of agriculture today.

  10. Exploring prediction uncertainty of spatial data in geostatistical and machine learning Approaches

    NASA Astrophysics Data System (ADS)

    Klump, J. F.; Fouedjio, F.

    2017-12-01

    Geostatistical methods such as kriging with external drift as well as machine learning techniques such as quantile regression forest have been intensively used for modelling spatial data. In addition to providing predictions for target variables, both approaches are able to deliver a quantification of the uncertainty associated with the prediction at a target location. Geostatistical approaches are, by essence, adequate for providing such prediction uncertainties and their behaviour is well understood. However, they often require significant data pre-processing and rely on assumptions that are rarely met in practice. Machine learning algorithms such as random forest regression, on the other hand, require less data pre-processing and are non-parametric. This makes the application of machine learning algorithms to geostatistical problems an attractive proposition. The objective of this study is to compare kriging with external drift and quantile regression forest with respect to their ability to deliver reliable prediction uncertainties of spatial data. In our comparison we use both simulated and real world datasets. Apart from classical performance indicators, comparisons make use of accuracy plots, probability interval width plots, and the visual examinations of the uncertainty maps provided by the two approaches. By comparing random forest regression to kriging we found that both methods produced comparable maps of estimated values for our variables of interest. However, the measure of uncertainty provided by random forest seems to be quite different to the measure of uncertainty provided by kriging. In particular, the lack of spatial context can give misleading results in areas without ground truth data. These preliminary results raise questions about assessing the risks associated with decisions based on the predictions from geostatistical and machine learning algorithms in a spatial context, e.g. mineral exploration.
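
    The comparison in this abstract rests on prediction intervals, their widths, and how often they contain the observed value. The sketch below illustrates those three quantities for an ensemble-style predictor, assuming that each target location comes with a collection of candidate response values (as a quantile regression forest would provide from its leaves). It is not the authors' code, and the sample values are invented.

```python
def empirical_quantile(values, q):
    """Quantile of `values` at level q in [0, 1], using linear interpolation."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    frac = pos - lower
    return xs[lower] + frac * (xs[upper] - xs[lower])


def prediction_interval(values, level=0.9):
    """Central interval holding roughly `level` of the candidate values."""
    alpha = (1.0 - level) / 2.0
    return empirical_quantile(values, alpha), empirical_quantile(values, 1.0 - alpha)


def interval_width(interval):
    low, high = interval
    return high - low


def coverage(intervals, observed):
    """Share of observed values that fall inside their prediction interval."""
    inside = sum(1 for (low, high), y in zip(intervals, observed) if low <= y <= high)
    return inside / len(observed)


# Hypothetical candidate values at two locations and the values later observed there.
candidates_a = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6]   # tight spread: confident
candidates_b = [0.5, 3.9, 1.2, 5.1, 2.7, 4.4, 0.9]   # wide spread: uncertain
intervals = [prediction_interval(candidates_a), prediction_interval(candidates_b)]
observed = [2.45, 6.0]

for interval in intervals:
    print(interval, "width:", round(interval_width(interval), 3))
print("coverage:", coverage(intervals, observed))
```

    Plotting the achieved coverage against the nominal level gives an accuracy plot, and plotting the mean width against the level gives a probability interval width plot, the two diagnostics the abstract mentions.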

  11. Machine-learning-assisted materials discovery using failed experiments

    NASA Astrophysics Data System (ADS)

    Raccuglia, Paul; Elbert, Katherine C.; Adler, Philip D. F.; Falk, Casey; Wenny, Malia B.; Mollo, Aurelio; Zeller, Matthias; Friedler, Sorelle A.; Schrier, Joshua; Norquist, Alexander J.

    2016-05-01

    Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. 
    When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions for new organically templated inorganic product formation with a success rate of 89 per cent. Inverting the machine-learning model reveals new hypotheses regarding the conditions for successful product formation.

  12. Comprehensive decision tree models in bioinformatics.

    PubMed

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

  13. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449

  14. Optimization of classification and regression analysis of four monoclonal antibodies from Raman spectra using collaborative machine learning approach.

    PubMed

    Le, Laetitia Minh Maï; Kégl, Balázs; Gramfort, Alexandre; Marini, Camille; Nguyen, David; Cherti, Mehdi; Tfaili, Sana; Tfayli, Ali; Baillet-Guffroy, Arlette; Prognon, Patrice; Chaminade, Pierre; Caudron, Eric

    2018-07-01

    The use of monoclonal antibodies (mAbs) constitutes one of the most important strategies to treat patients suffering from cancers such as hematological malignancies and solid tumors. These antibodies are prescribed by the physician and prepared by hospital pharmacists. An analytical control enables the quality of the preparations to be ensured. The aim of this study was to explore the development of a rapid analytical method for quality control. The method used four mAbs (Infliximab, Bevacizumab, Rituximab and Ramucirumab) at various concentrations and was based on recording Raman data and coupling them to a traditional chemometric and machine learning approach for data analysis. Compared to a conventional linear approach, prediction errors are reduced with a data-driven approach using statistical machine learning methods. In the latter, preprocessing and predictive models are jointly optimized. An additional original aspect of the work involved submitting the problem to a collaborative data challenge platform called Rapid Analytics and Model Prototyping (RAMP). This allowed the use of solutions from about 300 data scientists in collaborative work. Using machine learning, the prediction of the four mAbs samples was considerably improved. The best predictive model showed a combined error of 2.4% versus 14.6% using the linear approach. The concentration and classification errors were 5.8% and 0.7%; only three spectra were misclassified among the 429 spectra of the test set. This large improvement obtained with machine learning techniques was uniform for all molecules but maximal for Bevacizumab, with an 88.3% reduction in combined error (2.1% versus 17.9%). Copyright © 2018 Elsevier B.V. All rights reserved.

  15. Integrated Multi-Scale Data Analytics and Machine Learning for the Distribution Grid and Building-to-Grid Interface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stewart, Emma M.; Hendrix, Val; Chertkov, Michael

    This white paper introduces the application of advanced data analytics to the modernized grid. In particular, we consider the field of machine learning and where it is both useful, and not useful, for the particular field of the distribution grid and buildings interface. While analytics, in general, is a growing field of interest, and often seen as the golden goose in the burgeoning distribution grid industry, its application is often limited by communications infrastructure, or lack of a focused technical application. Overall, the linkage of analytics to purposeful application in the grid space has been limited. In this paper we consider the field of machine learning as a subset of analytical techniques, and discuss its ability and limitations to enable the future distribution grid and the building-to-grid interface. To that end, we also consider the potential for mixing distributed and centralized analytics and the pros and cons of these approaches. Machine learning is a subfield of computer science that studies and constructs algorithms that can learn from data and make predictions and improve forecasts. Incorporation of machine learning in grid monitoring and analysis tools may have the potential to solve data and operational challenges that result from increasing penetration of distributed and behind-the-meter energy resources. There is an exponentially expanding volume of measured data being generated on the distribution grid, which, with appropriate application of analytics, may be transformed into intelligible, actionable information that can be provided to the right actors – such as grid and building operators, at the appropriate time to enhance grid or building resilience, efficiency, and operations against various metrics or goals – such as total carbon reduction or other economic benefit to customers. While some basic analysis into these data streams can provide a wealth of information, computational and human boundaries on performing the analysis are becoming significant, with more data and multi-objective concerns. Efficient applications of analysis and the machine learning field are being considered in the loop.

  16. Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, Wenjian; Singh, Rajiv R. P.; Scalettar, Richard T.

    Here, we apply unsupervised machine learning techniques, mainly principal component analysis (PCA), to compare and contrast the phase behavior and phase transitions in several classical spin models - the square and triangular-lattice Ising models, the Blume-Capel model, a highly degenerate biquadratic-exchange spin-one Ising (BSI) model, and the 2D XY model, and examine critically what machine learning is teaching us. We find that quantified principal components from PCA not only allow exploration of different phases and symmetry-breaking, but can distinguish phase transition types and locate critical points. We show that the corresponding weight vectors have a clear physical interpretation, which is particularly interesting in the frustrated models such as the triangular antiferromagnet, where they can point to incipient orders. Unlike the other well-studied models, the properties of the BSI model are less well known. Using both PCA and conventional Monte Carlo analysis, we demonstrate that the BSI model shows an absence of phase transition and macroscopic ground-state degeneracy. The failure to capture the 'charge' correlations (vorticity) in the BSI model (XY model) from raw spin configurations points to some of the limitations of PCA. Finally, we employ a nonlinear unsupervised machine learning procedure, the 'autoencoder method', and demonstrate that it too can be trained to capture phase transitions and critical points.

  17. Integrating multisensor satellite data merging and image reconstruction in support of machine learning for better water quality management.

    PubMed

    Chang, Ni-Bin; Bai, Kaixu; Chen, Chi-Farn

    2017-10-01

    Monitoring water quality changes in lakes, reservoirs, estuaries, and coastal waters is critical in response to the needs for sustainable development. This study develops a remote sensing-based multiscale modeling system by integrating multi-sensor satellite data merging and image reconstruction algorithms in support of feature extraction with machine learning leading to automate continuous water quality monitoring in environmentally sensitive regions. This new Earth observation platform, termed "cross-mission data merging and image reconstruction with machine learning" (CDMIM), is capable of merging multiple satellite imageries to provide daily water quality monitoring through a series of image processing, enhancement, reconstruction, and data mining/machine learning techniques. Two existing key algorithms, including Spectral Information Adaptation and Synthesis Scheme (SIASS) and SMart Information Reconstruction (SMIR), are highlighted to support feature extraction and content-based mapping. Whereas SIASS can support various data merging efforts to merge images collected from cross-mission satellite sensors, SMIR can overcome data gaps by reconstructing the information of value-missing pixels due to impacts such as cloud obstruction. Practical implementation of CDMIM was assessed by predicting the water quality over seasons in terms of the concentrations of nutrients and chlorophyll-a, as well as water clarity in Lake Nicaragua, providing synergistic efforts to better monitor the aquatic environment and offer insightful lake watershed management strategies. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Machine learning for predicting the response of breast cancer to neoadjuvant chemotherapy

    PubMed Central

    Mani, Subramani; Chen, Yukun; Li, Xia; Arlinghaus, Lori; Chakravarthy, A Bapsi; Abramson, Vandana; Bhave, Sandeep R; Levy, Mia A; Xu, Hua; Yankeelov, Thomas E

    2013-01-01

    Objective To employ machine learning methods to predict the eventual therapeutic response of breast cancer patients after a single cycle of neoadjuvant chemotherapy (NAC). Materials and methods Quantitative dynamic contrast-enhanced MRI and diffusion-weighted MRI data were acquired on 28 patients before and after one cycle of NAC. A total of 118 semiquantitative and quantitative parameters were derived from these data and combined with 11 clinical variables. We used Bayesian logistic regression in combination with feature selection using a machine learning framework for predictive model building. Results The best predictive models using feature selection obtained an area under the curve of 0.86 and an accuracy of 0.86, with a sensitivity of 0.88 and a specificity of 0.82. Discussion With the numerous options for NAC available, development of a method to predict response early in the course of therapy is needed. Unfortunately, by the time most patients are found not to be responding, their disease may no longer be surgically resectable, and this situation could be avoided by the development of techniques to assess response earlier in the treatment regimen. The method outlined here is one possible solution to this important clinical problem. Conclusions Predictive modeling approaches based on machine learning using readily available clinical and quantitative MRI data show promise in distinguishing breast cancer responders from non-responders after the first cycle of NAC. PMID:23616206

  19. Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination

    DOE PAGES

    Hu, Wenjian; Singh, Rajiv R. P.; Scalettar, Richard T.

    2017-06-19

    Here, we apply unsupervised machine learning techniques, mainly principal component analysis (PCA), to compare and contrast the phase behavior and phase transitions in several classical spin models - the square and triangular-lattice Ising models, the Blume-Capel model, a highly degenerate biquadratic-exchange spin-one Ising (BSI) model, and the 2D XY model, and examine critically what machine learning is teaching us. We find that quantified principal components from PCA not only allow exploration of different phases and symmetry-breaking, but can distinguish phase transition types and locate critical points. We show that the corresponding weight vectors have a clear physical interpretation, which is particularly interesting in the frustrated models such as the triangular antiferromagnet, where they can point to incipient orders. Unlike the other well-studied models, the properties of the BSI model are less well known. Using both PCA and conventional Monte Carlo analysis, we demonstrate that the BSI model shows an absence of phase transition and macroscopic ground-state degeneracy. The failure to capture the 'charge' correlations (vorticity) in the BSI model (XY model) from raw spin configurations points to some of the limitations of PCA. Finally, we employ a nonlinear unsupervised machine learning procedure, the 'autoencoder method', and demonstrate that it too can be trained to capture phase transitions and critical points.

  20. Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination

    NASA Astrophysics Data System (ADS)

    Hu, Wenjian; Singh, Rajiv R. P.; Scalettar, Richard T.

    2017-06-01

    We apply unsupervised machine learning techniques, mainly principal component analysis (PCA), to compare and contrast the phase behavior and phase transitions in several classical spin models—the square- and triangular-lattice Ising models, the Blume-Capel model, a highly degenerate biquadratic-exchange spin-1 Ising (BSI) model, and the two-dimensional X Y model—and we examine critically what machine learning is teaching us. We find that quantified principal components from PCA not only allow the exploration of different phases and symmetry-breaking, but they can distinguish phase-transition types and locate critical points. We show that the corresponding weight vectors have a clear physical interpretation, which is particularly interesting in the frustrated models such as the triangular antiferromagnet, where they can point to incipient orders. Unlike the other well-studied models, the properties of the BSI model are less well known. Using both PCA and conventional Monte Carlo analysis, we demonstrate that the BSI model shows an absence of phase transition and macroscopic ground-state degeneracy. The failure to capture the "charge" correlations (vorticity) in the BSI model (X Y model) from raw spin configurations points to some of the limitations of PCA. Finally, we employ a nonlinear unsupervised machine learning procedure, the "autoencoder method," and we demonstrate that it too can be trained to capture phase transitions and critical points.
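    The idea that a quantified principal component can act as an order parameter can be illustrated with a toy sketch. It rests on loudly stated assumptions: the configurations below are not Monte Carlo samples of any spin model but uniform states with independent random flips, and the helper names (`sample_configs`, `pca_order_parameter`) are hypothetical. The leading component is found by power iteration, and the mean absolute projection, scaled by the square root of the number of spins, plays the role of the magnetization per spin.

```python
import random


def sample_configs(n_spins, n_samples, flip_prob, rng):
    """Toy stand-in for simulation output: start from all +1 or all -1
    and flip each spin independently with probability flip_prob."""
    configs = []
    for _ in range(n_samples):
        sign = rng.choice((-1, 1))
        configs.append([-sign if rng.random() < flip_prob else sign
                        for _ in range(n_spins)])
    return configs


def pca_order_parameter(configs, n_iter=100):
    """Mean |projection| onto the leading principal component,
    divided by sqrt(n_spins) so that it is comparable to |m|."""
    n, d = len(configs), len(configs[0])
    mean = [sum(row[j] for row in configs) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in configs]
    v = [1.0 / d ** 0.5] * d
    for _ in range(n_iter):
        # Apply the covariance matrix (up to a constant) as X^T (X v).
        proj = [sum(r[j] * v[j] for j in range(d)) for r in centred]
        w = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    proj = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return sum(abs(p) for p in proj) / (n * d ** 0.5)


rng = random.Random(0)
ordered = pca_order_parameter(sample_configs(16, 200, 0.05, rng))
disordered = pca_order_parameter(sample_configs(16, 200, 0.5, rng))
print(ordered, disordered)  # the ordered value should be clearly larger
```

    In a real study the flip probability would be replaced by temperature-dependent Monte Carlo sampling, and the projection would be tracked across temperatures to look for a transition.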

  1. A cooperative approach among methods for photometric redshifts estimation: an application to KiDS data

    NASA Astrophysics Data System (ADS)

    Cavuoti, S.; Tortora, C.; Brescia, M.; Longo, G.; Radovich, M.; Napolitano, N. R.; Amaro, V.; Vellucci, C.; La Barbera, F.; Getman, F.; Grado, A.

    2017-04-01

    Photometric redshifts (photo-z) are fundamental in galaxy surveys to address different topics, from gravitational lensing and dark matter distribution to galaxy evolution. The Kilo Degree Survey (KiDS), i.e. the European Southern Observatory (ESO) public survey on the VLT Survey Telescope (VST), provides the unprecedented opportunity to exploit a large galaxy data set with an exceptional image quality and depth in the optical wavebands. Using a KiDS subset of about 25000 galaxies with measured spectroscopic redshifts, we have derived photo-z using (i) three different empirical methods based on supervised machine learning; (ii) the Bayesian photometric redshift model (or BPZ); and (iii) a classical spectral energy distribution (SED) template fitting procedure (LE PHARE). We confirm that, in the regions of the photometric parameter space properly sampled by the spectroscopic templates, machine learning methods provide better redshift estimates, with a lower scatter and a smaller fraction of outliers. SED fitting techniques, however, provide useful information on the galaxy spectral type, which can be effectively used to constrain systematic errors and to better characterize potential catastrophic outliers. Such classification is then used to specialize the training of regression machine learning models, by demonstrating that a hybrid approach, involving SED fitting and machine learning in a single collaborative framework, can be effectively used to improve the accuracy of photo-z estimates.

  2. Nonlinear machine learning in soft materials engineering and design

    NASA Astrophysics Data System (ADS)

    Ferguson, Andrew

    The inherently many-body nature of molecular folding and colloidal self-assembly makes it challenging to identify the underlying collective mechanisms and pathways governing system behavior, and has hindered rational design of soft materials with desired structure and function. Fundamentally, there exists a predictive gulf between the architecture and chemistry of individual molecules or colloids and the collective many-body thermodynamics and kinetics. Integrating machine learning techniques with statistical thermodynamics provides a means to bridge this divide and identify emergent folding pathways and self-assembly mechanisms from computer simulations or experimental particle tracking data. We will survey a few of our applications of this framework that illustrate the value of nonlinear machine learning in understanding and engineering soft materials: the non-equilibrium self-assembly of Janus colloids into pinwheels, clusters, and archipelagos; engineering reconfigurable "digital colloids" as a novel high-density information storage substrate; probing hierarchically self-assembling conjugated asphaltenes in crude oil; and determining macromolecular folding funnels from measurements of single experimental observables. We close with an outlook on the future of machine learning in soft materials engineering, and share some personal perspectives on working at this disciplinary intersection. We acknowledge support for this work from a National Science Foundation CAREER Award (Grant No. DMR-1350008) and the Donors of the American Chemical Society Petroleum Research Fund (ACS PRF #54240-DNI6).

  3. Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers.

    PubMed

    Maniruzzaman, Md; Rahman, Md Jahanur; Al-MehediHasan, Md; Suri, Harman S; Abedin, Md Menhazul; El-Baz, Ayman; Suri, Jasjit S

    2018-04-10

    Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world was diabetic in 2017. It is projected that this will reach nearly 10% by 2045. The major challenge is that applying machine learning-based classifiers to such data sets for risk stratification leads to lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the assumption that missing values or outliers, if replaced by a median configuration, will yield higher risk stratification accuracy. This ML-based risk stratification is designed, optimized and evaluated, where: (i) the features are extracted and optimized from the six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different types of classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, Adaboost, logistic regression, decision tree, and random forest) under the hypothesis that both missing values and outliers, when replaced by computed medians, will improve the risk stratification accuracy. The Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that replacing the missing values and outliers by group median and median values, respectively, and then using the combination of random forest feature selection and random forest classification yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the curve of 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in the literature. The system was validated for its stability and reliability. The RF-based model showed the best performance when outliers were replaced by median values.
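    The preprocessing described in this abstract (missing values replaced by the group median, outliers replaced by the median) can be sketched as follows. The abstract does not state how outliers were identified, so the 1.5 x IQR rule used here is an assumption, as are the function names and the example values.

```python
from statistics import median, quantiles


def impute_missing_by_group(values, labels):
    """Replace None entries with the median of the observed values
    that share the same class label (the 'group median')."""
    group_median = {}
    for lab in set(labels):
        observed = [v for v, l in zip(values, labels)
                    if l == lab and v is not None]
        group_median[lab] = median(observed)
    return [group_median[l] if v is None else v
            for v, l in zip(values, labels)]


def replace_outliers_by_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the overall
    median. The IQR criterion is an assumption, not taken from the paper."""
    q1, _, q3 = quantiles(values, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    med = median(values)
    return [med if (v < lo or v > hi) else v for v in values]


# Hypothetical feature column with two missing entries and one outlier.
feature = [90, None, 100, 110, 150, None, 160, 170, 900]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 = control, 1 = diabetic
filled = impute_missing_by_group(feature, labels)
cleaned = replace_outliers_by_median(filled)
print(filled)
print(cleaned)
```

    A class with no observed values at all would make `median` raise an error; a production version would need a fallback such as the overall median.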

  4. Manifold learning in machine vision and robotics

    NASA Astrophysics Data System (ADS)

    Bernstein, Alexander

    2017-02-01

    Smart algorithms are used in machine vision and robotics to organize or extract high-level information from the available data. Nowadays, machine learning is an essential and ubiquitous tool for automating the extraction of patterns or regularities from data (images in machine vision; camera, laser, and sonar sensor data in robotics) in order to solve various subject-oriented tasks such as understanding and classifying image content, navigating mobile autonomous robots in uncertain environments, robot manipulation in medical robotics and computer-assisted surgery, and others. Usually such data have high dimensionality; however, due to various dependencies between their components and constraints imposed by physical reasons, all "feasible and usable data" occupy only a very small part of the high-dimensional "observation space" and have a smaller intrinsic dimensionality. The generally accepted model of such data is the manifold model, according to which the data lie on or near an unknown manifold (surface) of lower dimensionality embedded in an ambient high-dimensional observation space; real-world high-dimensional data obtained from "natural" sources meet, as a rule, this model. The use of manifold learning techniques in machine vision and robotics, which discover a low-dimensional structure in high-dimensional data and result in effective algorithms for solving a large number of subject-oriented tasks, is the subject of the conference plenary speech, some topics of which are covered in the paper.

  5. ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

    PubMed Central

    2011-01-01

    Background Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases. Results We propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which allows to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases. Conclusions ProDiGe implements a new machine learning paradigm for gene prioritization, which could help the identification of new disease genes. It is freely available at http://cbio.ensmp.fr/prodige. PMID:21977986

  6. Machine learning properties of materials and molecules with entropy-regularized kernels

    NASA Astrophysics Data System (ADS)

    Ceriotti, Michele; Bartók, Albert; Csányi, Gábor; De, Sandip

    Application of machine-learning methods to physics, chemistry and materials science is gaining traction as a strategy to obtain accurate predictions of the properties of matter at a fraction of the typical cost of quantum mechanical electronic structure calculations. In this endeavor, one can leverage general-purpose frameworks for supervised-learning. It is however very important that the input data - for instance the positions of atoms in a molecule or solid - is processed into a form that reflects all the underlying physical symmetries of the problem, and that possesses the regularity properties that are required by machine-learning algorithms. Here we introduce a general strategy to build a representation of this kind. We will start from existing approaches to compare local environments (basically, groups of atoms), and combine them using techniques borrowed from optimal transport theory, discussing the relation between this idea and additive energy decompositions. We will present a few examples demonstrating the potential of this approach as a tool to predict molecular and materials' properties with an accuracy on par with state-of-the-art electronic structure methods. MARVEL NCCR (Swiss National Science Foundation) and ERC StG HBMAP (European Research Council, G.A. 677013).

  7. Predictability of machine learning techniques to forecast the trends of market index prices: Hypothesis testing for the Korean stock markets.

    PubMed

    Pyo, Sujin; Lee, Jaewook; Cha, Mincheol; Jang, Huisu

    2017-01-01

    The prediction of the trends of stocks and index prices is one of the important issues to market participants. Investors have set trading or fiscal strategies based on the trends, and considerable research in various academic fields has been conducted to forecast financial markets. This study predicts the trends of the Korea Composite Stock Price Index 200 (KOSPI 200) prices using nonparametric machine learning models: an artificial neural network and support vector machines with polynomial and radial basis function kernels. In addition, this study states controversial issues and tests hypotheses about the issues. Accordingly, our results are inconsistent with those of the preceding research, which are generally considered to have high prediction performance. Moreover, Google Trends data proved not to be effective factors in predicting the KOSPI 200 index prices in our frameworks. Furthermore, the ensemble methods did not improve the accuracy of the prediction.

  8. The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction.

    PubMed

    Casey, M

    1996-08-15

    Recurrent neural networks (RNNs) can learn to perform finite state computations. It is shown that an RNN performing a finite state computation must organize its state space to mimic the states in the minimal deterministic finite state machine that can perform that computation, and a precise description of the attractor structure of such systems is given. This knowledge effectively predicts activation space dynamics, which allows one to understand RNN computation dynamics in spite of complexity in activation dynamics. This theory provides a theoretical framework for understanding finite state machine (FSM) extraction techniques and can be used to improve training methods for RNNs performing FSM computations. This provides an example of a successful approach to understanding a general class of complex systems that has not been explicitly designed, e.g., systems that have evolved or learned their internal structure.

  9. Predictability of machine learning techniques to forecast the trends of market index prices: Hypothesis testing for the Korean stock markets

    PubMed Central

    Pyo, Sujin; Lee, Jaewook; Cha, Mincheol

    2017-01-01

    The prediction of the trends of stocks and index prices is one of the important issues to market participants. Investors have set trading or fiscal strategies based on the trends, and considerable research in various academic fields has been conducted to forecast financial markets. This study predicts the trends of the Korea Composite Stock Price Index 200 (KOSPI 200) prices using nonparametric machine learning models: an artificial neural network and support vector machines with polynomial and radial basis function kernels. In addition, this study states controversial issues and tests hypotheses about the issues. Accordingly, our results are inconsistent with those of the preceding research, which are generally considered to have high prediction performance. Moreover, Google Trends data proved not to be effective factors in predicting the KOSPI 200 index prices in our frameworks. Furthermore, the ensemble methods did not improve the accuracy of the prediction. PMID:29136004

  10. Graph theory for feature extraction and classification: a migraine pathology case study.

    PubMed

    Jorge-Hernandez, Fernando; Garcia Chimeno, Yolanda; Garcia-Zapirain, Begonya; Cabrera Zubizarreta, Alberto; Gomez Beldarrain, Maria Angeles; Fernandez-Ruanova, Begonya

    2014-01-01

    Graph theory is also widely used as a representational form and characterization of brain connectivity network, as is machine learning for classifying groups depending on the features extracted from images. Many of these studies use different techniques, such as preprocessing, correlations, features or algorithms. This paper proposes an automatic tool to perform a standard process using images of the Magnetic Resonance Imaging (MRI) machine. The process includes pre-processing, building the graph per subject with different correlations, atlas, relevant feature extraction according to the literature, and finally providing a set of machine learning algorithms which can produce analyzable results for physicians or specialists. In order to verify the process, a set of images from prescription drug abusers and patients with migraine have been used. In this way, the proper functioning of the tool has been proved, providing results of 87% and 92% of success depending on the classifier used.

  11. An integer batch scheduling model considering learning, forgetting, and deterioration effects for a single machine to minimize total inventory holding cost

    NASA Astrophysics Data System (ADS)

    Yusriski, R.; Sukoyo; Samadhi, T. M. A. A.; Halim, A. H.

    2018-03-01

    This research deals with a single machine batch scheduling model considering the influence of learning, forgetting, and machine deterioration effects. The objective of the model is to minimize total inventory holding cost, and the decision variables are the number of batches (N), batch sizes (Q[i], i = 1, 2, ..., N) and the sequence of processing the resulting batches. The parts to be processed are received at the right time and in the right quantities, and all completed parts must be delivered at a common due date. We propose a heuristic procedure based on the Lagrange method to solve the problem. The effectiveness of the procedure is evaluated by comparing the resulting solution to the optimal solution obtained from the enumeration procedure using the integer composition technique; the comparison shows that the average effectiveness is 94%.

  12. Quantum machine learning: a classical perspective

    NASA Astrophysics Data System (ADS)

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.

  13. Local Learning Strategies for Wake Identification

    NASA Astrophysics Data System (ADS)

    Colvert, Brendan; Alsalman, Mohamad; Kanso, Eva

    2017-11-01

    Swimming agents, biological and engineered alike, must navigate the underwater environment to survive. Tasks such as autonomous navigation, foraging, mating, and predation require the ability to extract critical cues from the hydrodynamic environment. A substantial body of evidence supports the hypothesis that biological systems leverage local sensing modalities, including flow sensing, to gain knowledge of their global surroundings. The nonlinear nature and high degree of complexity of fluid dynamics make the development of algorithms for implementing localized sensing in bioinspired engineering systems essentially intractable for many systems of practical interest. In this work, we use techniques from machine learning for training a bioinspired swimmer to learn from its environment. We demonstrate the efficacy of this strategy by learning how to sense global characteristics of the wakes of other swimmers measured only from local sensory information. We conclude by commenting on the advantages and limitations of this data-driven, machine learning approach and its potential impact on broader applications in underwater sensing and navigation.

  14. Quantum machine learning: a classical perspective

    PubMed Central

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed. PMID:29434508

  15. Quantum machine learning: a classical perspective.

    PubMed

    Ciliberto, Carlo; Herbster, Mark; Ialongo, Alessandro Davide; Pontil, Massimiliano; Rocchetto, Andrea; Severini, Simone; Wossnig, Leonard

    2018-01-01

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning (ML) techniques to impressive results in regression, classification, data generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical ML algorithms. Here we review the literature in quantum ML and discuss perspectives for a mixed readership of classical ML and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in ML are identified as promising directions for the field. Practical questions, such as how to upload classical data into quantum form, will also be addressed.

  16. Automatic detection of tweets reporting cases of influenza like illnesses in Australia

    PubMed Central

    2015-01-01

    Early detection of disease outbreaks is critical for disease spread control and management. In this work we investigate the suitability of statistical machine learning approaches to automatically detect Twitter messages (tweets) that are likely to report cases of possible influenza like illnesses (ILI). Empirical results obtained on a large set of tweets originating from the state of Victoria, Australia, in a 3.5 month period show evidence that machine learning classifiers are effective in identifying tweets that mention possible cases of ILI (up to 0.736 F-measure, i.e. the harmonic mean of precision and recall), regardless of the specific technique implemented by the classifier investigated in the study. PMID:25870759
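    The F-measure quoted above is the harmonic mean of precision and recall. A minimal sketch of the calculation from confusion-matrix counts follows; the counts are invented for illustration and are not the study's data.

```python
def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives and
    false negatives. Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier output: 8 ILI tweets found correctly,
# 2 false alarms, 4 ILI tweets missed.
score = f_measure(tp=8, fp=2, fn=4)
print(round(score, 4))
```

    Because it is a harmonic mean, the score is pulled towards the lower of precision and recall, which is why it is preferred over accuracy when positive tweets are rare.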

  17. Feature Extraction and Machine Learning for the Classification of Brazilian Savannah Pollen Grains

    PubMed Central

    Souza, Junior Silva; da Silva, Gercina Gonçalves

    2016-01-01

    The classification of pollen species and types is an important task in many areas like forensic palynology, archaeological palynology and melissopalynology. This paper presents the first annotated image dataset for the Brazilian Savannah pollen types that can be used to train and test computer vision based automatic pollen classifiers. A first baseline human and computer performance for this dataset has been established using 805 pollen images of 23 pollen types. In order to assess the computer performance, a combination of three feature extractors and four machine learning techniques has been implemented, fine-tuned and tested. The results of these tests are also presented in this paper. PMID:27276196

  18. Classification of fMRI resting-state maps using machine learning techniques: A comparative study

    NASA Astrophysics Data System (ADS)

    Gallos, Ioannis; Siettos, Constantinos

    2017-11-01

    We compare the efficiency of Principal Component Analysis (PCA) and nonlinear manifold learning algorithms (ISOMAP and Diffusion Maps) for classifying brain maps between groups of schizophrenia patients and healthy controls from fMRI scans during a resting-state experiment. After a standard pre-processing pipeline, we applied spatial Independent Component Analysis (ICA) to reduce (a) noise and (b) the spatial-temporal dimensionality of fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and Diffusion Maps to find an embedded low-dimensional space. Finally, support vector machine (SVM) and k-NN algorithms were used to evaluate the performance of the algorithms in classifying between the two groups.

  19. Machine-learning approach for local classification of crystalline structures in multiphase systems

    NASA Astrophysics Data System (ADS)

    Dietz, C.; Kretz, T.; Thoma, M. H.

    2017-07-01

    Machine learning is one of the most popular fields in computer science and has a vast number of applications. In this work we will propose a method that will use a neural network to locally identify crystal structures in a mixed phase Yukawa system consisting of fcc, hcp, and bcc clusters and disordered particles similar to plasma crystals. We compare our approach to already used methods and show that the quality of identification increases significantly. The technique works very well for highly disturbed lattices and shows a flexible and robust way to classify crystalline structures that can be used by only providing particle positions. This leads to insights into highly disturbed crystalline structures.

  20. Pileup Mitigation with Machine Learning (PUMML)

    NASA Astrophysics Data System (ADS)

    Komiske, Patrick T.; Metodiev, Eric M.; Nachman, Benjamin; Schwartz, Matthew D.

    2017-12-01

    Pileup involves the contamination of the energy distribution arising from the primary collision of interest (leading vertex) by radiation from soft collisions (pileup). We develop a new technique for removing this contamination using machine learning and convolutional neural networks. The network takes as input the energy distribution of charged leading vertex particles, charged pileup particles, and all neutral particles and outputs the energy distribution of particles coming from leading vertex alone. The PUMML algorithm performs remarkably well at eliminating pileup distortion on a wide range of simple and complex jet observables. We test the robustness of the algorithm in a number of ways and discuss how the network can be trained directly on data.

  1. Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database.

    PubMed

    Chen-Ying Hung; Wei-Chen Chen; Po-Tsun Lai; Ching-Heng Lin; Chi-Chun Lee

    2017-07-01

    Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is a growing interest in the application of machine learning (ML) techniques to address clinical problems, the use of deep learning in healthcare has only recently gained attention. Deep learning, such as deep neural network (DNN), has achieved impressive results in the areas of speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexities in its framework. Furthermore, this method has not yet been demonstrated to achieve better performance than other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we utilize a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The results show that DNN and gradient boosting decision tree (GBDT) achieve similarly high prediction accuracies, better than those of the logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results using less patient data than the GBDT method.

  2. Evaluation of an Integrated Multi-Task Machine Learning System with Humans in the Loop

    DTIC Science & Technology

    2007-01-01

    machine learning components natural language processing, and optimization...was examined with a test explicitly developed to measure the impact of integrated machine learning when used by a human user in a real world setting...study revealed that integrated machine learning does produce a positive impact on overall performance. This paper also discusses how specific machine learning components contributed to human-system

  3. Ecological interactions and the Netflix problem.

    PubMed

    Desjardins-Proulx, Philippe; Laigle, Idaline; Poisot, Timothée; Gravel, Dominique

    2017-01-01

    Species interactions are a key component of ecosystems but we generally have an incomplete picture of who-eats-who in a given community. Different techniques have been devised to predict species interactions using theoretical models or abundances. Here, we explore the K nearest neighbour approach, with a special emphasis on recommendation, along with a supervised machine learning technique. Recommenders are algorithms developed for companies like Netflix to predict whether a customer will like a product given the preferences of similar customers. These machine learning techniques are well-suited to study binary ecological interactions since they focus on positive-only data. By removing a prey from a predator, we find that recommenders can guess the missing prey around 50% of the time on the first try, with up to 881 possibilities. Traits do not significantly improve the results for the K nearest neighbour, although a simple test with a supervised learning approach (random forests) shows we can predict interactions with high accuracy using only three traits per species. This result shows that binary interactions can be predicted without regard to the ecological community given only three variables: body mass and two variables for the species' phylogeny. These techniques are complementary, as recommenders can predict interactions in the absence of traits, using only information about other species' interactions, while supervised learning algorithms such as random forests base their predictions on traits only but do not exploit other species' interactions. Further work should focus on developing custom similarity measures specialized for ecology to improve the KNN algorithms and using richer data to capture indirect relationships between species.

  4. Ecological interactions and the Netflix problem

    PubMed Central

    Laigle, Idaline; Poisot, Timothée; Gravel, Dominique

    2017-01-01

    Species interactions are a key component of ecosystems but we generally have an incomplete picture of who-eats-who in a given community. Different techniques have been devised to predict species interactions using theoretical models or abundances. Here, we explore the K nearest neighbour approach, with a special emphasis on recommendation, along with a supervised machine learning technique. Recommenders are algorithms developed for companies like Netflix to predict whether a customer will like a product given the preferences of similar customers. These machine learning techniques are well-suited to study binary ecological interactions since they focus on positive-only data. By removing a prey from a predator, we find that recommenders can guess the missing prey around 50% of the time on the first try, with up to 881 possibilities. Traits do not significantly improve the results for the K nearest neighbour, although a simple test with a supervised learning approach (random forests) shows we can predict interactions with high accuracy using only three traits per species. This result shows that binary interactions can be predicted without regard to the ecological community given only three variables: body mass and two variables for the species’ phylogeny. These techniques are complementary, as recommenders can predict interactions in the absence of traits, using only information about other species’ interactions, while supervised learning algorithms such as random forests base their predictions on traits only but do not exploit other species’ interactions. Further work should focus on developing custom similarity measures specialized for ecology to improve the KNN algorithms and using richer data to capture indirect relationships between species. PMID:28828250

  5. Discovering Communicable Scientific Knowledge from Spatio-Temporal Data

    NASA Technical Reports Server (NTRS)

    Schwabacher, Mark; Langley, Pat; Norvig, Peter (Technical Monitor)

    2001-01-01

    This paper describes how we used regression rules to improve upon a result previously published in the Earth science literature. In such a scientific application of machine learning, it is crucially important for the learned models to be understandable and communicable. We recount how we selected a learning algorithm to maximize communicability, and then describe two visualization techniques that we developed to aid in understanding the model by exploiting the spatial nature of the data. We also report how evaluating the learned models across time let us discover an error in the data.

  6. Discovering Communicable Models from Earth Science Data

    NASA Technical Reports Server (NTRS)

    Schwabacher, Mark; Langley, Pat; Potter, Christopher; Klooster, Steven; Torregrosa, Alicia

    2002-01-01

    This chapter describes how we used regression rules to improve upon results previously published in the Earth science literature. In such a scientific application of machine learning, it is crucially important for the learned models to be understandable and communicable. We recount how we selected a learning algorithm to maximize communicability, and then describe two visualization techniques that we developed to aid in understanding the model by exploiting the spatial nature of the data. We also report how evaluating the learned models across time let us discover an error in the data.

  7. Big Data in radiation therapy: challenges and opportunities.

    PubMed

    Lustberg, Tim; van Soest, Johan; Jochems, Arthur; Deist, Timo; van Wijk, Yvonka; Walsh, Sean; Lambin, Philippe; Dekker, Andre

    2017-01-01

    Data collected and generated by radiation oncology can be classified by the Volume, Variety, Velocity and Veracity (4Vs) of Big Data because they are spread across different care providers and not easily shared owing to patient privacy protection. The magnitude of the 4Vs is substantial in oncology, especially owing to imaging modalities and unclear data definitions. To create useful models ideally all data of all care providers are understood and learned from; however, this presents challenges in the guise of poor data quality, patient privacy concerns, geographical spread, interoperability and large volume. In radiation oncology, there are many efforts to collect data for research and innovation purposes. Clinical trials are the gold standard when proving any hypothesis that directly affects the patient. Collecting data in registries with strict predefined rules is also a common approach to find answers. A third approach is to develop data stores that can be used by modern machine learning techniques to provide new insights or answer hypotheses. We believe all three approaches have their strengths and weaknesses, but they should all strive to create Findable, Accessible, Interoperable, Reusable (FAIR) data. To learn from these data, we need distributed learning techniques, sending machine learning algorithms to FAIR data stores around the world, learning from trial data, registries and routine clinical data rather than trying to centralize all data. To improve and personalize medicine, rapid learning platforms must be able to process FAIR "Big Data" to evaluate current clinical practice and to guide further innovation.
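
    The distributed learning idea in this abstract (sending the algorithm to the data rather than centralizing the data) can be sketched in a few lines of Python. This is an illustrative assumption, not the authors' platform: each site returns only aggregate sums for a simple least-squares line, and the function names and toy values are hypothetical.

```python
def local_statistics(records):
    """Run at each site: summarise (x, y) pairs without exposing individual records."""
    n = len(records)
    sx = sum(x for x, _ in records)
    sy = sum(y for _, y in records)
    sxx = sum(x * x for x, _ in records)
    sxy = sum(x * y for x, y in records)
    return n, sx, sy, sxx, sxy

def fit_from_statistics(site_stats):
    """Run at the coordinator: combine site summaries into a least-squares line y = a + b*x."""
    n, sx, sy, sxx, sxy = (sum(values) for values in zip(*site_stats))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Hypothetical per-site data; the values are made up for illustration.
site_a = [(1.0, 3.0), (2.0, 5.0)]
site_b = [(3.0, 7.0), (4.0, 9.0), (5.0, 11.0)]
intercept, slope = fit_from_statistics(
    [local_statistics(site_a), local_statistics(site_b)]
)
print(intercept, slope)
```

    The coordinator obtains the same fit it would get from pooled data, although only five numbers per site are exchanged. Real systems would add access control, privacy safeguards and iterative algorithms for richer models.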

  8. A fuzzy pattern matching method based on graph kernel for lithography hotspot detection

    NASA Astrophysics Data System (ADS)

    Nitta, Izumi; Kanazawa, Yuzi; Ishida, Tsutomu; Banno, Koji

    2017-03-01

    In advanced technology nodes, lithography hotspot detection has become one of the most significant issues in design for manufacturability. Recently, machine learning based lithography hotspot detection has been widely investigated, but it involves a trade-off between detection accuracy and false alarms. To apply machine learning based techniques to the physical verification phase, designers need to minimize undetected hotspots to avoid yield degradation. They also need a ranking of known patterns similar to a detected hotspot to prioritize the layout patterns to be corrected. To achieve high detection accuracy and to prioritize detected hotspots, we propose a novel lithography hotspot detection method using Delaunay triangulation and graph kernel based machine learning. Delaunay triangulation extracts features of hotspot patterns in which polygons are located irregularly and close to one another, and the graph kernel expresses the inner structure of graphs. Additionally, our method provides the similarity between two patterns and creates a list of training patterns similar to a detected hotspot. Experimental results on the ICCAD 2012 benchmarks show that our method achieves high accuracy within an allowable range of false alarms. We also show the ranking of the known patterns similar to a detected hotspot.

  9. Machine Learning Techniques for Prediction of Early Childhood Obesity.

    PubMed

    Dugan, T M; Mukhopadhyay, S; Carroll, A; Downs, S

    2015-01-01

    This paper aims to predict childhood obesity after age two, using only data collected prior to the second birthday by a clinical decision support system called CHICA. Analyses of six different machine learning methods: RandomTree, RandomForest, J48, ID3, Naïve Bayes, and Bayes trained on CHICA data show that an accurate, sensitive model can be created. Of the methods analyzed, the ID3 model trained on the CHICA dataset provided the best overall performance with accuracy of 85% and sensitivity of 89%. Additionally, the ID3 model had a positive predictive value of 84% and a negative predictive value of 88%. The structure of the tree also gives insight into the strongest predictors of future obesity in children. Many of the strongest predictors seen in the ID3 modeling of the CHICA dataset have been independently validated in the literature as correlated with obesity, thereby supporting the validity of the model. This study demonstrated that data from a production clinical decision support system can be used to build an accurate machine learning model to predict obesity in children after age two.
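
    The four figures of merit quoted in this abstract (accuracy, sensitivity, positive and negative predictive value) all follow from a binary confusion matrix. A minimal Python sketch is given below; the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, PPV and NPV from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts for illustration only; they are not the study's data.
metrics = binary_metrics(tp=80, fp=20, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

    Sensitivity and the predictive values use different denominators, so a model can score well on one while scoring poorly on another; that is why the abstract reports all four.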

  10. A Framework for Final Drive Simultaneous Failure Diagnosis Based on Fuzzy Entropy and Sparse Bayesian Extreme Learning Machine

    PubMed Central

    Ye, Qing; Pan, Hao; Liu, Changhua

    2015-01-01

    This research proposes a novel framework for final drive simultaneous failure diagnosis comprising feature extraction, training of paired diagnostic models, generation of a decision threshold, and recognition of simultaneous failure modes. In the feature extraction module, wavelet packet transform and fuzzy entropy are adopted to reduce noise interference and extract representative features of each failure mode. Single failure samples are used to construct probability classifiers based on the paired sparse Bayesian extreme learning machine, which is trained only on single failure modes and has the high generalization and sparsity of the sparse Bayesian learning approach. To generate an optimal decision threshold that converts the probability output of the classifiers into final simultaneous failure modes, this research proposes using samples containing both single and simultaneous failure modes together with a grid search method, which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machines and probabilistic neural networks, experimental results based on the F1-measure verify that the diagnostic accuracy and efficiency of the proposed framework, which are crucial for simultaneous failure diagnosis, are superior to those of the existing approaches. PMID:25722717
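
    The decision-threshold step described in this abstract can be sketched as a one-dimensional grid search. The Python below is an assumption-laden illustration, not the authors' method: it uses a single shared threshold, a micro-averaged F1 score as the search objective, and hypothetical failure-mode names and probabilities.

```python
def f1_score(true_sets, pred_sets):
    """Micro-averaged F1 over samples whose labels are sets of failure modes."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def grid_search_threshold(probabilities, true_sets, steps=19):
    """Try evenly spaced thresholds in (0, 1) and keep the first one with the best F1."""
    best_threshold, best_f1 = None, -1.0
    for i in range(1, steps + 1):
        threshold = i / (steps + 1)
        pred_sets = [
            {mode for mode, p in probs.items() if p >= threshold}
            for probs in probabilities
        ]
        score = f1_score(true_sets, pred_sets)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1

# Hypothetical classifier outputs: one probability per failure mode and sample.
probabilities = [
    {"wear": 0.92, "crack": 0.18, "misalign": 0.07},
    {"wear": 0.71, "crack": 0.63, "misalign": 0.12},
    {"wear": 0.31, "crack": 0.84, "misalign": 0.57},
    {"wear": 0.08, "crack": 0.22, "misalign": 0.93},
]
# Ground truth includes both single and simultaneous failures.
true_sets = [{"wear"}, {"wear", "crack"}, {"crack", "misalign"}, {"misalign"}]

best_threshold, best_f1 = grid_search_threshold(probabilities, true_sets)
print(best_threshold, best_f1)
```

    Every mode whose probability clears the threshold is reported, so one sample can yield several simultaneous failure modes even though each classifier was trained on single failures.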

  11. Machine learning of frustrated classical spin models. I. Principal component analysis

    NASA Astrophysics Data System (ADS)

    Wang, Ce; Zhai, Hui

    2017-10-01

    This work aims at determining whether artificial intelligence can recognize a phase transition without prior human knowledge. If this were successful, it could be applied to, for instance, analyzing data from the quantum simulation of unsolved physical models. Toward this goal, we first need to apply the machine learning algorithm to well-understood models and see whether the outputs are consistent with our prior knowledge, which serves as the benchmark for this approach. In this work, we feed the computer data generated by the classical Monte Carlo simulation for the XY model in frustrated triangular and union jack lattices, which has two order parameters and exhibits two phase transitions. We show that the outputs of the principal component analysis agree very well with our understanding of different orders in different phases, and the temperature dependences of the major components detect the nature and the locations of the phase transitions. Our work offers promise for using machine learning techniques to study sophisticated statistical models, and our results can be further improved by using principal component analysis with kernel tricks and the neural network method.

  12. Exploring Genome-Wide Expression Profiles Using Machine Learning Techniques.

    PubMed

    Kebschull, Moritz; Papapanou, Panos N

    2017-01-01

    Although contemporary high-throughput -omics methods produce high-dimensional data, the resulting wealth of information is difficult to assess using traditional statistical procedures. Machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups. Here, we demonstrate the utility of (1) supervised classification algorithms in class validation, and (2) unsupervised clustering in class discovery. We use data from our previous work that described the transcriptional profiles of gingival tissue samples obtained from subjects suffering from chronic or aggressive periodontitis (1) to test whether the two diagnostic entities were also characterized by differences on the molecular level, and (2) to search for a novel, alternative classification of periodontitis based on the tissue transcriptomes. Using machine learning technology, we provide evidence for diagnostic imprecision in the currently accepted classification of periodontitis, and demonstrate that a novel, alternative classification based on differences in gingival tissue transcriptomes is feasible. The outlined procedures allow for the unbiased interrogation of high-dimensional datasets for characteristic underlying classes, and are applicable to a broad range of -omics data.

  13. Machine-learning-based real-bogus system for the HSC-SSP moving object detection pipeline

    NASA Astrophysics Data System (ADS)

    Lin, Hsing-Wen; Chen, Ying-Tung; Wang, Jen-Hung; Wang, Shiang-Yu; Yoshida, Fumi; Ip, Wing-Huen; Miyazaki, Satoshi; Terai, Tsuyoshi

    2018-01-01

    Machine-learning techniques are widely applied in many modern optical sky surveys, e.g., Pan-STARRS1, PTF/iPTF, and the Subaru/Hyper Suprime-Cam survey, to reduce human intervention in data verification. In this study, we have established a machine-learning-based real-bogus system to reject false detections in the Subaru/Hyper-Suprime-Cam Strategic Survey Program (HSC-SSP) source catalog. Therefore, the HSC-SSP moving object detection pipeline can operate more effectively due to the reduction of false positives. To train the real-bogus system, we use stationary sources as the real training set and "flagged" data as the bogus set. The training set contains 47 features, most of which are photometric measurements and shape moments generated from the HSC image reduction pipeline (hscPipe). Our system can reach a true positive rate (tpr) ~96% with a false positive rate (fpr) ~1% or tpr ~99% at fpr ~5%. Therefore, we conclude that stationary sources are decent real training samples, and using photometry measurements and shape moments can reject false positives effectively.
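
    The paired tpr/fpr figures in this abstract describe operating points on a score threshold. The following Python sketch shows how such a point might be chosen for a given false positive budget; the scores, labels and function names are hypothetical and unrelated to the HSC-SSP data.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (tpr, fpr) when scores >= threshold are classified as real (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

def best_threshold_for_fpr(scores, labels, max_fpr):
    """Pick the candidate threshold with the highest tpr whose fpr stays within max_fpr."""
    best = None
    for threshold in sorted(set(scores)):
        tpr, fpr = rates_at_threshold(scores, labels, threshold)
        if fpr <= max_fpr and (best is None or tpr > best[1]):
            best = (threshold, tpr, fpr)
    return best

# Hypothetical classifier scores (1 = real source, 0 = bogus detection).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

print(best_threshold_for_fpr(scores, labels, max_fpr=0.2))
print(best_threshold_for_fpr(scores, labels, max_fpr=0.0))
```

    Loosening the false positive budget lets the threshold drop and the true positive rate rise, which is the same trade-off as the two operating points quoted in the abstract.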

  14. A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data.

    PubMed

    Wolfson, Julian; Bandyopadhyay, Sunayan; Elidrisi, Mohamed; Vazquez-Benitez, Gabriela; Vock, David M; Musgrove, Donald; Adomavicius, Gediminas; Johnson, Paul E; O'Connor, Patrick J

    2015-09-20

    Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system. Copyright © 2015 John Wiley & Sons, Ltd.

  15. Full-Physics Inverse Learning Machine for Satellite Remote Sensing of Ozone Profile Shapes and Tropospheric Columns

    NASA Astrophysics Data System (ADS)

    Xu, J.; Heue, K.-P.; Coldewey-Egbers, M.; Romahn, F.; Doicu, A.; Loyola, D.

    2018-04-01

    Characterizing vertical distributions of ozone from nadir-viewing satellite measurements is known to be challenging, particularly the ozone information in the troposphere. A novel retrieval algorithm called Full-Physics Inverse Learning Machine (FP-ILM) has been developed at DLR in order to estimate ozone profile shapes based on machine learning techniques. In contrast to traditional inversion methods, the FP-ILM algorithm formulates the profile shape retrieval as a classification problem. Its implementation comprises a training phase to derive an inverse function from synthetic measurements, and an operational phase in which the inverse function is applied to real measurements. This paper extends the ability of the FP-ILM retrieval to derive tropospheric ozone columns from GOME-2 measurements. Results of total and tropical tropospheric ozone columns are compared with the ones using the official GOME Data Processing (GDP) product and the convective-cloud-differential (CCD) method, respectively. Furthermore, the FP-ILM framework will be used for the near-real-time processing of the new European Sentinel sensors with their unprecedented spectral and spatial resolution and corresponding large increases in the amount of data.

  16. The applications of machine learning algorithms in the modeling of estrogen-like chemicals.

    PubMed

    Liu, Huanxiang; Yao, Xiaojun; Gramatica, Paola

    2009-06-01

    Increasing concern is being shown by the scientific community, government regulators, and the public about endocrine-disrupting chemicals that, in the environment, are adversely affecting human and wildlife health through a variety of mechanisms, mainly estrogen receptor-mediated mechanisms of toxicity. Because of the large number of such chemicals in the environment, there is a great need for an effective means of rapidly assessing endocrine-disrupting activity in the toxicology assessment process. When faced with the challenging task of screening large libraries of molecules for biological activity, the benefits of computational predictive models based on quantitative structure-activity relationships to identify possible estrogens become immediately obvious. Recently, in order to improve the accuracy of prediction, some machine learning techniques were introduced to build more effective predictive models. In this review we will focus our attention on some recent advances in the use of these methods in modeling estrogen-like chemicals. The advantages and disadvantages of the machine learning algorithms used in solving this problem, the importance of the validation and performance assessment of the built models as well as their applicability domains will be discussed.

  17. Galaxy morphology - An unsupervised machine learning approach

    NASA Astrophysics Data System (ADS)

    Schutter, A.; Shamir, L.

    2015-09-01

    Structural properties possess valuable information about the formation and evolution of galaxies, and are important for understanding the past, present, and future universe. Here we use unsupervised machine learning methodology to analyze a network of similarities between galaxy morphological types, and automatically deduce a morphological sequence of galaxies. Application of the method to the EFIGI catalog shows that the morphological scheme produced by the algorithm is largely in agreement with the De Vaucouleurs system, demonstrating the ability of computer vision and machine learning methods to automatically profile galaxy morphological sequences. The unsupervised analysis method is based on comprehensive computer vision techniques that compute the visual similarities between the different morphological types. Rather than relying on human cognition, the proposed system deduces the similarities between sets of galaxy images in an automatic manner, and is therefore not limited by the number of galaxies being analyzed. The source code of the method is publicly available, and the protocol of the experiment is included in the paper so that the experiment can be replicated, and the method can be used to analyze user-defined datasets of galaxy images.

  18. Electrical test prediction using hybrid metrology and machine learning

    NASA Astrophysics Data System (ADS)

    Breton, Mary; Chao, Robin; Muthinti, Gangadhara Raja; de la Peña, Abraham A.; Simon, Jacques; Cepler, Aron J.; Sendelbach, Matthew; Gaudiello, John; Emans, Susan; Shifrin, Michael; Etzioni, Yoav; Urenski, Ronen; Lee, Wei Ti

    2017-03-01

    Electrical test measurement in the back-end of line (BEOL) is crucial for wafer and die sorting as well as comparing intended process splits. Any in-line, nondestructive technique in the process flow to accurately predict these measurements can significantly improve mean-time-to-detect (MTTD) of defects and improve cycle times for yield and process learning. Measuring after BEOL metallization is commonly done for process control and learning, particularly with scatterometry (also called OCD (Optical Critical Dimension)), which can solve for multiple profile parameters such as metal line height or sidewall angle and does so within patterned regions. This gives scatterometry an advantage over inline microscopy-based techniques, which provide top-down information, since such techniques can be insensitive to sidewall variations hidden under the metal fill of the trench. But when faced with correlation to electrical test measurements that are specific to the BEOL processing, both techniques face the additional challenge of sampling. Microscopy-based techniques are sampling-limited by their small probe size, while scatterometry is traditionally limited (for microprocessors) to scribe targets that mimic device ground rules but are not necessarily designed to be electrically testable. A solution to this sampling challenge lies in a fast reference-based machine learning capability that allows for OCD measurement directly of the electrically-testable structures, even when they are not OCD-compatible. By incorporating such direct OCD measurements, correlation to, and therefore prediction of, resistance of BEOL electrical test structures is significantly improved. Improvements in prediction capability for multiple types of in-die electrically-testable device structures are demonstrated. To further improve the quality of the prediction of the electrical resistance measurements, hybrid metrology using the OCD measurements as well as X-ray metrology (XRF) is used. Hybrid metrology is the practice of combining information from multiple sources in order to enable or improve the measurement of one or more critical parameters. Here, the XRF measurements are used to detect subtle changes in barrier layer composition and thickness that can have second-order effects on the electrical resistance of the test structures. By accounting for such effects with the aid of the X-ray-based measurements, further improvement in the OCD correlation to electrical test measurements is achieved. Using both types of solution (incorporation of fast reference-based machine learning on non-OCD-compatible test structures, and hybrid metrology combining OCD with XRF technology), improvement in BEOL cycle time learning could be accomplished through improved prediction capability.

  19. A Novel Application of Machine Learning Methods to Model Microcontroller Upset Due to Intentional Electromagnetic Interference

    NASA Astrophysics Data System (ADS)

    Bilalic, Rusmir

    A novel application of support vector machines (SVMs), artificial neural networks (ANNs), and Gaussian processes (GPs) for machine learning (GPML) to model microcontroller unit (MCU) upset due to intentional electromagnetic interference (IEMI) is presented. In this approach, an MCU performs a counting operation (0-7) while electromagnetic interference in the form of a radio frequency (RF) pulse is direct-injected into the MCU clock line. Injection times with respect to the clock signal are the clock low, clock rising edge, clock high, and the clock falling edge periods in the clock window during which the MCU is performing initialization and executing the counting procedure. The intent is to cause disruption in the counting operation and model the probability of effect (PoE) using machine learning tools. Five experiments were executed as part of this research, each of which contained a set of 38,300 training points and 38,300 test points, for a total of 383,000 points, with the following experiment variables: injection times with respect to the clock signal, injected RF power, injected RF pulse width, and injected RF frequency. For the 191,500 training points, the average training error was 12.47%, while for the 191,500 test points the average test error was 14.85%, meaning that on average, the machine was able to predict MCU upset with 85.15% accuracy. Leaving out the results for the worst-performing model (SVM with a linear kernel), the test prediction accuracy for the remaining machines is almost 89%. All three machine learning methods (ANNs, SVMs, and GPML) showed excellent and consistent results in their ability to model and predict the PoE on an MCU due to IEMI. The GP approach performed best during training with a 7.43% average training error, while the ANN technique was most accurate during the test with a 10.80% error.
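
    The dataset sizes and the accuracy figure quoted in this abstract follow from simple arithmetic, which the short Python check below reproduces. The variable names are chosen for this example; the numbers are those stated in the abstract.

```python
experiments = 5
train_per_experiment = 38_300
test_per_experiment = 38_300

total_train = experiments * train_per_experiment   # training points over all experiments
total_test = experiments * test_per_experiment     # test points over all experiments
total_points = total_train + total_test            # all points combined

average_test_error = 14.85                         # percent, as reported
average_test_accuracy = 100 - average_test_error   # percent

print(total_train, total_test, total_points)
print(f"{average_test_accuracy:.2f}%")
```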

  20. Data Collision Prevention with Overflow Hashing Technique in Closed Hash Searching Process

    NASA Astrophysics Data System (ADS)

    Rahim, Robbi; Nurjamiyah; Rafika Dewi, Arie

    2017-12-01

    Hash search is a method that can be used for various search processes such as search engines, sorting, machine learning, neural networks and so on. During the search process, data collisions can occur, and they can be prevented in several ways. One of them is the overflow technique, which works with data of varying length and can prevent data collisions.
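
    One common reading of an overflow technique is a fixed-size primary table plus a separate overflow area for keys whose home slot is already occupied. The Python sketch below illustrates that reading only; the class name, the character-sum hash function and the table size are assumptions and may differ from the scheme used in the paper.

```python
class OverflowHashTable:
    """Fixed-size primary table; colliding keys are stored in a separate overflow list."""

    def __init__(self, size=7):
        self.size = size
        self.primary = [None] * size   # one (key, value) pair per slot
        self.overflow = []             # pairs whose home slot was already taken

    def _slot(self, key):
        # Simple deterministic hash: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in str(key)) % self.size

    def put(self, key, value):
        i = self._slot(key)
        if self.primary[i] is None or self.primary[i][0] == key:
            self.primary[i] = (key, value)
            return
        for j, (k, _) in enumerate(self.overflow):
            if k == key:               # key already in overflow: update in place
                self.overflow[j] = (key, value)
                return
        self.overflow.append((key, value))

    def get(self, key, default=None):
        entry = self.primary[self._slot(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        for k, v in self.overflow:     # fall back to a linear scan of the overflow area
            if k == key:
                return v
        return default

table = OverflowHashTable(size=7)
table.put("ab", 1)
table.put("ba", 2)    # same character sum as "ab", so it lands in the overflow area
table.put("abc", 3)   # keys of different lengths are handled the same way
print(table.get("ab"), table.get("ba"), table.get("abc"))
```

    Lookups stay fast while the overflow area is small; once it grows, the linear scan dominates and a larger primary table becomes worthwhile.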
