Web Mining: Machine Learning for Web Applications.
ERIC Educational Resources Information Center
Chen, Hsinchun; Chau, Michael
2004-01-01
Presents an overview of machine learning research and reviews methods used for evaluating machine learning systems. Ways that machine-learning algorithms were used in traditional information retrieval systems in the "pre-Web" era are described, and the field of Web mining and how machine learning has been used in different Web mining…
Advances in Machine Learning and Data Mining for Astronomy
NASA Astrophysics Data System (ADS)
Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.
2012-03-01
Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.
ERIC Educational Resources Information Center
Chen, Hsinchun
2003-01-01
Discusses information retrieval techniques used on the World Wide Web. Topics include machine learning in information extraction; relevance feedback; information filtering and recommendation; text classification and text clustering; Web mining, based on data mining techniques; hyperlink structure; and Web size. (LRW)
Luo, Gang
2017-12-01
For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic.
Luo, Gang
2017-01-01
For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic. PMID:29177022
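The abstract does not describe the proposed framework, so the following is only a minimal sketch of the basic idea of a progress indicator: track the portion of work finished and estimate the remaining time. It assumes work can be counted in equal-cost units and that the rate stays constant, which is the kind of trivial estimate the article wants to improve on. The class name `ProgressIndicator` and its parameters are hypothetical.

```python
import time


class ProgressIndicator:
    """Estimate fraction done and remaining time from the observed work rate.

    Assumption (not from the article): every unit of work costs about the
    same, so the average rate so far predicts the rate of the remaining work.
    """

    def __init__(self, total_units, clock=time.monotonic):
        if total_units <= 0:
            raise ValueError("total_units must be positive")
        self.total_units = total_units
        self.done_units = 0
        self._clock = clock          # injectable so the estimator can be tested
        self._start = clock()

    def update(self, units=1):
        """Record that `units` more units of work have finished."""
        self.done_units = min(self.total_units, self.done_units + units)

    def fraction_done(self):
        """Portion of the task that has been finished, between 0 and 1."""
        return self.done_units / self.total_units

    def remaining_seconds(self):
        """Estimated seconds left, or None before any work has finished."""
        if self.done_units == 0:
            return None
        elapsed = self._clock() - self._start
        rate = self.done_units / elapsed if elapsed > 0 else float("inf")
        return (self.total_units - self.done_units) / rate
```

In a model-building loop, `update()` would be called once per finished unit (for example, per training epoch), and `fraction_done()` and `remaining_seconds()` would be polled for display. A non-trivial indicator would replace the constant-rate assumption with a cost model of the algorithm being run.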
AstroML: Python-powered Machine Learning for Astronomy
NASA Astrophysics Data System (ADS)
Vander Plas, Jake; Connolly, A. J.; Ivezic, Z.
2014-01-01
As astronomical data sets grow in size and complexity, automated machine learning and data mining methods are becoming an increasingly fundamental component of research in the field. The astroML project (http://astroML.org) provides a common repository for practical examples of the data mining and machine learning tools used and developed by astronomical researchers, written in Python. The astroML module contains a host of general-purpose data analysis and machine learning routines, loaders for openly available astronomical datasets, and fast implementations of specific computational methods often used in astronomy and astrophysics. The associated website features hundreds of examples of these routines being used for analysis of real astronomical datasets, while the associated textbook provides a curriculum resource for graduate-level courses focusing on practical statistics, machine learning, and data mining approaches within astronomical research. This poster will highlight several of the more powerful and unique examples of analysis performed with astroML, all of which can be reproduced in their entirety on any computer with the proper packages installed.
A systematic review of data mining and machine learning for air pollution epidemiology.
Bellinger, Colin; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar; Osornio-Vargas, Alvaro
2017-11-28
Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geo-spatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology.
This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geo-spatial mining, and deep learning. This is further supported by new sensors and storage media that enable larger, better quality data. This suggests that many more fruitful applications can be expected in the future.
2017-03-01
neuro ICP care beyond trauma care. 15. SUBJECT TERMS Advanced machine learning techniques, intracranial pressure, vital signs, monitoring...death and disability in combat casualties [1,2]. Approximately 2 million head injuries occur annually in the United States, resulting in more than...editor. Machine learning and data mining in pattern recognition. Proceedings of the 8th International Workshop on Machine Learning and Data Mining in
ERIC Educational Resources Information Center
Kirrane, Diane E.
1990-01-01
As scientists seek to develop machines that can "learn," that is, solve problems by imitating the human brain, a gold mine of information on the processes of human learning is being discovered, expert systems are being improved, and human-machine interactions are being enhanced. (SK)
Kernel Methods for Mining Instance Data in Ontologies
NASA Astrophysics Data System (ADS)
Bloehdorn, Stephan; Sure, York
The number of ontologies and the amount of metadata available on the Web are constantly growing. The successful application of machine learning techniques for learning ontologies from textual data, i.e. mining for the Semantic Web, contributes to this trend. However, no principled approaches exist so far for mining from the Semantic Web. We investigate how machine learning algorithms can be made amenable to directly taking advantage of the rich knowledge expressed in ontologies and associated instance data. Kernel methods have been successfully employed in various learning tasks and provide a clean framework for interfacing between non-vectorial data and machine learning algorithms. In this spirit, we express the problem of mining instances in ontologies as the problem of defining valid corresponding kernels. We present a principled framework for designing such kernels by decomposing the kernel computation into specialized kernels for selected characteristics of an ontology, which can be flexibly assembled and tuned. Initial experiments on real-world Semantic Web data yield promising results and show the usefulness of our approach.
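The decomposition idea in this abstract can be illustrated with a small sketch. It relies on two standard facts: the size of a set intersection is a valid (positive semi-definite) kernel, and a non-negative weighted sum of valid kernels is again valid. The choice of set-intersection kernels, the characteristic names `"classes"` and `"properties"`, and the function names are assumptions made for illustration, not the kernels defined by the authors.

```python
def intersection_kernel(a, b):
    """Set-intersection kernel: the number of elements two sets share."""
    return float(len(set(a) & set(b)))


def composite_kernel(x, y, weights):
    """Non-negative weighted sum of per-characteristic kernels.

    Each instance is a dict mapping a characteristic name (hypothetical
    keys such as "classes" or "properties") to a set of identifiers.
    Non-negative weights keep the combined function a valid kernel.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    return sum(
        w * intersection_kernel(x.get(name, ()), y.get(name, ()))
        for name, w in weights.items()
    )


# Hypothetical instance data: class memberships and asserted properties.
alice = {"classes": {"Person", "Researcher"}, "properties": {"worksAt", "authorOf"}}
bob = {"classes": {"Person", "Student"}, "properties": {"authorOf"}}
weights = {"classes": 1.0, "properties": 0.5}

similarity = composite_kernel(alice, bob, weights)
```

Tuning then amounts to adjusting the weights or swapping in a different sub-kernel for one characteristic, while any kernel-based learner (such as a support vector machine) can consume the resulting similarity values unchanged.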
Data Mining and Machine Learning in Astronomy
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.; Brunner, Robert J.
We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those in which data mining techniques directly contributed to improving science, and important current and future directions, including probability density functions, parallel algorithms, Peta-Scale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
An Evolutionary Machine Learning Framework for Big Data Sequence Mining
ERIC Educational Resources Information Center
Kamath, Uday Krishna
2014-01-01
Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…
Data Mining at NASA: From Theory to Applications
NASA Technical Reports Server (NTRS)
Srivastava, Ashok N.
2009-01-01
This slide presentation, prepared for Knowledge Discovery and Data Mining 2009, demonstrates the data mining/machine learning capabilities of NASA Ames and the Intelligent Data Understanding (IDU) group. It encompasses the work done recently in the group by various group members. The IDU group develops novel algorithms to detect, classify, and predict events in large data streams for scientific and engineering systems.
Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers
García-Gonzalo, Esperanza; Fernández-Muñiz, Zulima; García Nieto, Paulino José; Bernardo Sánchez, Antonio; Menéndez Fernández, Marta
2016-01-01
The mining industry relies heavily on empirical analysis for design and prediction. An empirical design method, called the critical span graph, was developed specifically for rock stability analysis in entry-type excavations, based on an extensive case-history database of cut and fill mining in Canada. This empirical span design chart plots the critical span against rock mass rating for the observed case histories and has been accepted by many mining operations for the initial span design of cut and fill stopes. Different types of analysis have been used to classify the observed cases into stable, potentially unstable and unstable groups. The main purpose of this paper is to present a new method for defining rock stability areas of the critical span graph, which applies machine learning classifiers (support vector machine and extreme learning machine). The results show a reasonable correlation with previous guidelines. These machine learning methods are good tools for developing empirical methods, since they make no assumptions about the regression function. With this software, it is easy to add new field observations to a previous database, improving prediction output with the addition of data that consider the local conditions for each mine. PMID:28773653
Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers.
García-Gonzalo, Esperanza; Fernández-Muñiz, Zulima; García Nieto, Paulino José; Bernardo Sánchez, Antonio; Menéndez Fernández, Marta
2016-06-29
The mining industry relies heavily on empirical analysis for design and prediction. An empirical design method, called the critical span graph, was developed specifically for rock stability analysis in entry-type excavations, based on an extensive case-history database of cut and fill mining in Canada. This empirical span design chart plots the critical span against rock mass rating for the observed case histories and has been accepted by many mining operations for the initial span design of cut and fill stopes. Different types of analysis have been used to classify the observed cases into stable, potentially unstable and unstable groups. The main purpose of this paper is to present a new method for defining rock stability areas of the critical span graph, which applies machine learning classifiers (support vector machine and extreme learning machine). The results show a reasonable correlation with previous guidelines. These machine learning methods are good tools for developing empirical methods, since they make no assumptions about the regression function. With this software, it is easy to add new field observations to a previous database, improving prediction output with the addition of data that consider the local conditions for each mine.
Thutmose - Investigation of Machine Learning-Based Intrusion Detection Systems
2016-06-01
research is being done to incorporate the field of machine learning into intrusion detection. Machine learning is a branch of artificial intelligence (AI...adversarial drift." Proceedings of the 2013 ACM workshop on Artificial intelligence and security. ACM. (2013) Kantarcioglu, M., Xi, B., and Clifton, C. "A..." Proceedings of the 4th ACM workshop on Security and artificial intelligence. ACM. (2011) Dua, S., and Du, X. Data Mining and Machine Learning in
Data mining in bioinformatics using Weka.
Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H
2004-10-12
The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.
Survey of Analysis of Crime Detection Techniques Using Data Mining and Machine Learning
NASA Astrophysics Data System (ADS)
Prabakaran, S.; Mitra, Shilpa
2018-04-01
Data mining is the field concerned with procedures for finding patterns in large datasets; it includes methods at the intersection of machine learning and database systems. It can be applied to various fields such as future healthcare, market basket analysis, education, manufacturing engineering, and crime investigation. Among these, crime investigation is an interesting application in which crime characteristics are processed to benefit society. This paper surveys various data mining techniques used in this domain. This study may be helpful in designing new strategies for crime prediction and analysis.
Transfer Learning beyond Text Classification
NASA Astrophysics Data System (ADS)
Yang, Qiang
Transfer learning is a new machine learning and data mining framework that allows the training and test data to come from different distributions or feature spaces. We can find many novel applications of machine learning and data mining where transfer learning is necessary. While much has been done in transfer learning in text classification and reinforcement learning, there has been a lack of documented success stories of novel applications of transfer learning in other areas. In this invited article, I will argue that transfer learning is in fact quite ubiquitous in many real world applications. In this article, I will illustrate this point through an overview of a broad spectrum of applications of transfer learning that range from collaborative filtering to sensor based location estimation and logical action model learning for AI planning. I will also discuss some potential future directions of transfer learning.
AstroML: "better, faster, cheaper" towards state-of-the-art data mining and machine learning
NASA Astrophysics Data System (ADS)
Ivezic, Zeljko; Connolly, Andrew J.; Vanderplas, Jacob
2015-01-01
We present AstroML, a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, matplotlib, and astropy, and distributed under an open license. AstroML contains a growing library of statistical and machine learning routines for analyzing astronomical data in Python, loaders for several open astronomical datasets (such as SDSS and other recent major surveys), and a large suite of examples of analyzing and visualizing astronomical datasets. AstroML is especially suitable for introducing undergraduate students to numerical research projects and for graduate students to rapidly undertake cutting-edge research. The long-term goal of astroML is to provide a community repository for fast Python implementations of common tools and routines used for statistical data analysis in astronomy and astrophysics (see http://www.astroml.org).
Machine Learning and Data Mining Methods in Diabetes Research.
Kavakiotis, Ioannis; Tsave, Olga; Salifoglou, Athanasios; Maglaveras, Nicos; Vlahavas, Ioannis; Chouvarda, Ioanna
2017-01-01
The remarkable advances in biotechnology and health sciences have led to a significant production of data, such as high throughput genetic data and clinical information, generated from large Electronic Health Records (EHRs). To this end, application of machine learning and data mining methods in biosciences is presently, more than ever before, vital and indispensable in efforts to transform intelligently all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has led to the generation of huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in the field of diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and d) Health Care and Management, with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were characterized by supervised learning approaches and 15% by unsupervised ones, and more specifically, association rules. Support vector machines (SVM) arise as the most successful and widely used algorithm. Concerning the type of data, clinical datasets were mainly used. The title applications in the selected articles project the usefulness of extracting valuable knowledge leading to new hypotheses targeting deeper understanding and further investigation in DM.
Automated Data Assimilation and Flight Planning for Multi-Platform Observation Missions
NASA Technical Reports Server (NTRS)
Oza, Nikunj; Morris, Robert A.; Strawa, Anthony; Kurklu, Elif; Keely, Leslie
2008-01-01
This is a progress report on an effort in which our goal is to demonstrate the effectiveness of automated data mining and planning for the daily management of Earth Science missions. Currently, data mining and machine learning technologies are being used by scientists at research labs for validating Earth science models. However, few if any of these advanced techniques are currently being integrated into daily mission operations. Consequently, there are significant gaps in the knowledge that can be derived from the models and data that are used each day for guiding mission activities. The result can be sub-optimal observation plans, lack of useful data, and wasteful use of resources. Recent advances in data mining, machine learning, and planning make it feasible to migrate these technologies into the daily mission planning cycle. We describe the design of a closed loop system for data acquisition, processing, and flight planning that integrates the results of machine learning into the flight planning process.
Current Developments in Machine Learning Techniques in Biological Data Mining.
Dumancas, Gerard G; Adrianto, Indra; Bello, Ghalib; Dozmorov, Mikhail
2017-01-01
This supplement is intended to focus on the use of machine learning techniques to generate meaningful information on biological data. This supplement under Bioinformatics and Biology Insights aims to provide scientists and researchers working in this rapidly evolving field with online, open-access articles authored by leading international experts in this field. Advances in the field of biology have generated massive opportunities to allow the implementation of modern computational and statistical techniques. Machine learning, a subfield of computer science, has evolved into an indispensable tool applied to a wide spectrum of bioinformatics applications. It is broadly used to investigate the underlying mechanisms leading to a specific disease, as well as in the biomarker discovery process. With growth in this specific area of science comes the need to access up-to-date, high-quality scholarly articles that will leverage the knowledge of scientists and researchers in the various applications of machine learning techniques in mining biological data.
Machine learning and medicine: book review and commentary.
Koprowski, Robert; Foster, Kenneth R
2018-02-01
This article is a review of the book "Master machine learning algorithms, discover how they work and implement them from scratch" (ISBN: not available, 37 USD, 163 pages) edited by Jason Brownlee, published by the author, edition v1.10, http://MachineLearningMastery.com. An accompanying commentary discusses some of the issues involved in using machine learning and data mining techniques to develop predictive models for diagnosis or prognosis of disease, and calls attention to additional requirements for developing diagnostic and prognostic algorithms that are generally useful in medicine. An appendix provides examples that illustrate potential problems with machine learning that are not addressed in the reviewed book.
ERIC Educational Resources Information Center
Dhar, Vasant
1998-01-01
Shows how counterfactuals and machine learning methods can be used to guide exploration of large databases that addresses some of the fundamental problems that organizations face in learning from data. Discusses data mining, particularly in the financial arena; generating useful knowledge from data; and the evaluation of counterfactuals. (LRW)
Machine learning approaches to analysing textual injury surveillance data: a systematic review.
Vallmuur, Kirsten
2015-06-01
To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Systematic review. The electronic databases which were searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years.
With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see a continued growth and advancement in knowledge of text mining in the injury field. Copyright © 2015 Elsevier Ltd. All rights reserved.
Ten quick tips for machine learning in computational biology.
Chicco, Davide
2017-01-01
Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects. We believe our ten suggestions can strongly help any machine learning practitioner to carry out a successful project in computational biology and related sciences.
Development of a Workbench to Address the Educational Data Mining Bottleneck
ERIC Educational Resources Information Center
Rodrigo, Ma. Mercedes T.; Baker, Ryan S. J. d.; McLaren, Bruce M.; Jayme, Alejandra; Dy, Thomas T.
2012-01-01
In recent years, machine-learning software packages have made it easier for educational data mining researchers to create real-time detectors of cognitive skill as well as of metacognitive and motivational behavior that can be used to improve student learning. However, there remain challenges to overcome for these methods to become available to…
Software tool for data mining and its applications
NASA Astrophysics Data System (ADS)
Yang, Jie; Ye, Chenzhou; Chen, Nianyi
2002-03-01
A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principles and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems and to the diagnosis of brain glioma.
Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics
Torii, Manabu; Tilak, Sameer S.; Doan, Son; Zisook, Daniel S.; Fan, Jung-wei
2016-01-01
In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research. PMID:27375358
Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics.
Torii, Manabu; Tilak, Sameer S; Doan, Son; Zisook, Daniel S; Fan, Jung-Wei
2016-01-01
In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research.
Mining the Galaxy Zoo Database: Machine Learning Applications
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.
2010-01-01
The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we are also aiming to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunteer-provided classifications). Our second case study will focus on similar data mining analyses and machine learning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years -- the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.
Constructing and Classifying Email Networks from Raw Forensic Images
2016-09-01
data mining for sequence and pattern mining; in medical imaging for image segmentation; and in computer vision for object recognition” [28]. 2.3.1...machine learning and data mining suite that is written in Python. It provides a platform for experiment selection, recommendation systems, and...predictive modeling. The Orange library is a hierarchically-organized toolbox of data mining components. Data filtering and probability assessment are at the
Recent advances in environmental data mining
NASA Astrophysics Data System (ADS)
Leuenberger, Michael; Kanevski, Mikhail
2016-04-01
Due to the large amount and complexity of data available nowadays in geo- and environmental sciences, we face the need to develop and incorporate more robust and efficient methods for their analysis, modelling and visualization. An important part of these developments deals with an elaboration and application of a contemporary and coherent methodology following the process from data collection to the justification and communication of the results. Recent fundamental progress in machine learning (ML) can considerably contribute to the development of the emerging field - environmental data science. The present research highlights and investigates the different issues that can occur when dealing with environmental data mining using cutting-edge machine learning algorithms. In particular, the main attention is paid to the description of the self-consistent methodology and two efficient algorithms - Random Forest (RF, Breiman, 2001) and Extreme Learning Machines (ELM, Huang et al., 2006), which recently gained a great popularity. Despite the fact that they are based on two different concepts, i.e. decision trees vs artificial neural networks, they both propose promising results for complex, high dimensional and non-linear data modelling. In addition, the study discusses several important issues of data driven modelling, including feature selection and uncertainties. The approach considered is accompanied by simulated and real data case studies from renewable resources assessment and natural hazards tasks. In conclusion, the current challenges and future developments in statistical environmental data learning are discussed. References - Breiman, L., 2001. Random Forests. Machine Learning 45 (1), 5-32. - Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1-3), 489-501. - Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data. EPFL Press; Lausanne, Switzerland, p.392. 
- Leuenberger, M., Kanevski, M., 2015. Extreme Learning Machines for spatial environmental data. Computers and Geosciences 85, 64-73.
ERIC Educational Resources Information Center
Blikstein, Paulo; Worsley, Marcelo
2016-01-01
New high-frequency multimodal data collection technologies and machine learning analysis techniques could offer new insights into learning, especially when students have the opportunity to generate unique, personalized artifacts, such as computer programs, robots, and solutions engineering challenges. To date most of the work on learning analytics…
Mining protein function from text using term-based support vector machines
Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J
2005-01-01
Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
Dipnall, Joanna F.
2016-01-01
Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571
Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny
2016-01-01
Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). 
The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
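The odds ratios and confidence intervals reported above follow from logistic regression coefficients in the usual way: OR = exp(beta), with an approximate 95% interval of exp(beta ± 1.96·SE). The sketch below shows that conversion only. The function name `odds_ratio_ci` and the example coefficient and standard error are hypothetical, and the normal-approximation multiplier 1.96 is an assumption; a survey-weighted, multiply imputed model such as the one in the study may use a different reference distribution.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with an approximate confidence interval.

    z=1.96 assumes a normal approximation for a 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, chosen only for illustration;
# they are not taken from the study.
or_, lo, hi = odds_ratio_ci(beta=-2.0, se=0.4)
print(f"OR {or_:.2f}; 95% CI {lo:.2f}, {hi:.2f}")
```

A negative coefficient gives an odds ratio below 1 (lower odds of the outcome per unit increase), which is the direction reported for total bilirubin.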
Morota, Gota; Ventura, Ricardo V; Silva, Fabyano F; Koyama, Masanori; Fernando, Samodha C
2018-04-14
Precision animal agriculture is poised to rise to prominence in the livestock enterprise in the domains of management, production, welfare, sustainability, health surveillance, and environmental footprint. Considerable progress has been made in the use of tools to routinely monitor and collect information from animals and farms in a less laborious manner than before. These efforts have enabled the animal sciences to embark on information technology-driven discoveries to improve animal agriculture. However, the growing amount and complexity of data generated by fully automated, high-throughput data recording or phenotyping platforms, including digital images, sensor and sound data, unmanned systems, and information obtained from real-time noninvasive computer vision, pose challenges to the successful implementation of precision animal agriculture. The emerging fields of machine learning and data mining are expected to be instrumental in helping meet the daunting challenges facing global agriculture. Yet, their impact and potential in "big data" analysis have not been adequately appreciated in the animal science community, where this recognition has remained only fragmentary. To address such knowledge gaps, this article outlines a framework for machine learning and data mining and offers a glimpse into how they can be applied to solve pressing problems in animal sciences.
Author Detection on a Mobile Phone
2011-03-01
handwriting, and to mine sales data for profitable trends. Two broad categories of machine learning are supervised learning and unsupervised learning...evaluation,” AI 2006: Advances in Artificial Intelligence, p. 1015–1021, 2006. [23] “Gartner says worldwide mobile phone sales grew 17 per cent in first
Open Research Challenges with Big Data - A Data-Scientist's Perspective
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R
In this paper, we discuss data-driven discovery challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical data mining and machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are data mining algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across domains of national security, healthcare and manufacturing to suggest our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data-retrieval); (ii) the science of data challenge - the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge - the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.
Traffic Flow Management: Data Mining Update
NASA Technical Reports Server (NTRS)
Grabbe, Shon R.
2012-01-01
This presentation provides an update on recent data mining efforts that have been designed to (1) identify like/similar days in the national airspace system, (2) cluster/aggregate national-level rerouting data and (3) apply machine learning techniques to predict when Ground Delay Programs are required at a weather-impacted airport
ERIC Educational Resources Information Center
Ifenthaler, Dirk; Widanapathirana, Chathuranga
2014-01-01
Interest in collecting and mining large sets of educational data on student background and performance to conduct research on learning and instruction has developed as an area generally referred to as learning analytics. Higher education leaders are recognizing the value of learning analytics for improving not only learning and teaching but also…
Developing an Intelligent Diagnosis and Assessment E-Learning Tool for Introductory Programming
ERIC Educational Resources Information Center
Huang, Chenn-Jung; Chen, Chun-Hua; Luo, Yun-Cheng; Chen, Hong-Xin; Chuang, Yi-Ta
2008-01-01
Recently, a lot of open source e-learning platforms have been offered for free in the Internet. We thus incorporate the intelligent diagnosis and assessment tool into an open software e-learning platform developed for programming language courses, wherein the proposed learning diagnosis assessment tools based on text mining and machine learning…
CANFAR+Skytree: A Cloud Computing and Data Mining System for Astronomy
NASA Astrophysics Data System (ADS)
Ball, N. M.
2013-10-01
This is a companion Focus Demonstration article to the CANFAR+Skytree poster (Ball 2013, this volume), demonstrating the usage of the Skytree machine learning software on the Canadian Advanced Network for Astronomical Research (CANFAR) cloud computing system. CANFAR+Skytree is the world's first cloud computing system for data mining in astronomy.
Data Mining in Earth System Science (DMESS 2011)
Forrest M. Hoffman; J. Walter Larson; Richard Tran Mills; Bhorn-Gustaf Brooks; Auroop R. Ganguly; William Hargrove; et al
2011-01-01
From field-scale measurements to global climate simulations and remote sensing, the growing body of very large and long time series Earth science data are increasingly difficult to analyze, visualize, and interpret. Data mining, information theoretic, and machine learning techniques, such as cluster analysis, singular value decomposition, block entropy, Fourier and...
NASA Astrophysics Data System (ADS)
Gaber, Mohamed Medhat; Zaslavsky, Arkady; Krishnaswamy, Shonali
Data mining is concerned with the process of computationally extracting hidden knowledge structures represented in models and patterns from large data repositories. It is an interdisciplinary field of study that has its roots in databases, statistics, machine learning, and data visualization. Data mining has emerged as a direct outcome of the data explosion that resulted from the success in database and data warehousing technologies over the past two decades (Fayyad, 1997; Fayyad, 1998; Kantardzic, 2003).
Survey of Machine Learning Methods for Database Security
NASA Astrophysics Data System (ADS)
Kamra, Ashish; Ber, Elisa
Application of machine learning techniques to database security is an emerging area of research. In this chapter, we present a survey of various approaches that use machine learning/data mining techniques to enhance the traditional security mechanisms of databases. There are two key database security areas in which these techniques have found applications, namely, detection of SQL Injection attacks and anomaly detection for defending against insider threats. Apart from the research prototypes and tools, various third-party commercial products are also available that provide database activity monitoring solutions by profiling database users and applications. We present a survey of such products. We end the chapter with a primer on mechanisms for responding to database anomalies.
VoPham, Trang; Hart, Jaime E; Laden, Francine; Chiang, Yao-Yi
2018-04-17
Geospatial artificial intelligence (geoAI) is an emerging scientific discipline that combines innovations in spatial science, artificial intelligence methods in machine learning (e.g., deep learning), data mining, and high-performance computing to extract knowledge from spatial big data. In environmental epidemiology, exposure modeling is a commonly used approach to conduct exposure assessment to determine the distribution of exposures in study populations. geoAI technologies provide important advantages for exposure modeling in environmental epidemiology, including the ability to incorporate large amounts of big spatial and temporal data in a variety of formats; computational efficiency; flexibility in algorithms and workflows to accommodate relevant characteristics of spatial (environmental) processes including spatial nonstationarity; and scalability to model other environmental exposures across different geographic areas. The objectives of this commentary are to provide an overview of key concepts surrounding the evolving and interdisciplinary field of geoAI including spatial data science, machine learning, deep learning, and data mining; recent geoAI applications in research; and potential future directions for geoAI in environmental epidemiology.
Introduction to machine learning for brain imaging.
Lemm, Steven; Blankertz, Benjamin; Dickhaus, Thorsten; Müller, Klaus-Robert
2011-05-15
Machine learning and pattern recognition algorithms have in the past years developed to become a workhorse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally, machine learning techniques should be usable by any non-expert; unfortunately, they typically are not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences. Copyright © 2010 Elsevier Inc. All rights reserved.
The Next Era: Deep Learning in Pharmaceutical Research.
Ekins, Sean
2016-11-01
Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use from internet searches, voice recognition, social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but to predict a molecule's properties and behavior in future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernable edge in predictive performance. The time has come for a balanced review of this technique but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique.
Piccinini, Filippo; Balassa, Tamas; Szkalisity, Abel; Molnar, Csaba; Paavolainen, Lassi; Kujala, Kaisa; Buzas, Krisztina; Sarazova, Marie; Pietiainen, Vilja; Kutay, Ulrike; Smith, Kevin; Horvath, Peter
2017-06-28
High-content, imaging-based screens now routinely generate data on a scale that precludes manual verification and interrogation. Software applying machine learning has become an essential tool to automate analysis, but these methods require annotated examples to learn from. Efficiently exploring large datasets to find relevant examples remains a challenging bottleneck. Here, we present Advanced Cell Classifier (ACC), a graphical software package for phenotypic analysis that addresses these difficulties. ACC applies machine-learning and image-analysis methods to high-content data generated by large-scale, cell-based experiments. It features methods to mine microscopic image data, discover new phenotypes, and improve recognition performance. We demonstrate that these features substantially expedite the training process, successfully uncover rare phenotypes, and improve the accuracy of the analysis. ACC is extensively documented, designed to be user-friendly for researchers without machine-learning expertise, and distributed as a free open-source tool at www.cellclassifier.org. Copyright © 2017 Elsevier Inc. All rights reserved.
76 FR 70075 - Proximity Detection Systems for Continuous Mining Machines in Underground Coal Mines
Federal Register 2010, 2011, 2012, 2013, 2014
2011-11-10
... Detection Systems for Continuous Mining Machines in Underground Coal Mines AGENCY: Mine Safety and Health... proposed rule addressing Proximity Detection Systems for Continuous Mining Machines in Underground Coal... Detection Systems for Continuous Mining Machines in Underground Coal Mines. MSHA conducted hearings on...
76 FR 63238 - Proximity Detection Systems for Continuous Mining Machines in Underground Coal Mines
Federal Register 2010, 2011, 2012, 2013, 2014
2011-10-12
... Detection Systems for Continuous Mining Machines in Underground Coal Mines AGENCY: Mine Safety and Health... Agency's proposed rule addressing Proximity Detection Systems for Continuous Mining Machines in... proposed rule for Proximity Detection Systems on Continuous Mining Machines in Underground Coal Mines. Due...
A systematic mapping study of process mining
NASA Astrophysics Data System (ADS)
Maita, Ana Rocío Cárdenas; Martins, Lucas Corrêa; López Paz, Carlos Ramón; Rafferty, Laura; Hung, Patrick C. K.; Peres, Sarajane Marques; Fantinato, Marcelo
2018-05-01
This study systematically assesses the process mining scenario from 2005 to 2014. The analysis of 705 papers evidenced 'discovery' (71%) as the main type of process mining addressed and 'categorical prediction' (25%) as the main mining task solved. The most applied traditional techniques are the 'graph structure-based' ones (38%). Specifically concerning computational intelligence and machine learning techniques, we concluded that little relevance has been given to them. The most applied are 'evolutionary computation' (9%) and 'decision tree' (6%), respectively. Process mining challenges, such as balancing among robustness, simplicity, accuracy and generalization, could benefit from a larger use of such techniques.
The LSST Data Mining Research Agenda
NASA Astrophysics Data System (ADS)
Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.
2008-12-01
We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.
A Comparison of different learning models used in Data Mining for Medical Data
NASA Astrophysics Data System (ADS)
Srimani, P. K.; Koti, Manjula Sanjay
2011-12-01
The present study aims to investigate different data mining learning models on different medical data sets and to give practical guidelines for selecting the most appropriate algorithm for a specific medical data set. In practical situations, it is necessary to make decisions about the appropriate models and parameters for diagnosis and prediction problems. Learning models and algorithms are widely implemented for rule extraction and the prediction of system behavior. In this paper, some of the well-known Machine Learning (ML) systems are investigated for different methods and are tested on five medical data sets. The practical criteria for evaluating different learning models are presented and the potential benefits of the proposed methodology for diagnosis and learning are suggested.
Generating a Spanish Affective Dictionary with Supervised Learning Techniques
ERIC Educational Resources Information Center
Bermudez-Gonzalez, Daniel; Miranda-Jiménez, Sabino; García-Moreno, Raúl-Ulises; Calderón-Nepamuceno, Dora
2016-01-01
Nowadays, machine learning techniques are being used in several Natural Language Processing (NLP) tasks such as Opinion Mining (OM). OM is used to analyse and determine the affective orientation of texts. Usually, OM approaches use affective dictionaries in order to conduct sentiment analysis. These lexicons are labeled manually with affective…
Obtaining Accurate Probabilities Using Classifier Calibration
ERIC Educational Resources Information Center
Pakdaman Naeini, Mahdi
2016-01-01
Learning probabilistic classification and prediction models that generate accurate probabilities is essential in many prediction and decision-making tasks in machine learning and data mining. One way to achieve this goal is to post-process the output of classification models to obtain more accurate probabilities. These post-processing methods are…
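To make the idea of post-processing classifier output concrete, here is a minimal sketch of histogram binning, one well-known calibration method. The truncated abstract does not say which methods the work studies, so this choice is an assumption made for illustration, and the function names `fit_histogram_binning` and `calibrate` and the toy scores are hypothetical.

```python
def fit_histogram_binning(scores, labels, n_bins=5):
    """Learn one calibrated probability per equal-width score bin.

    scores: uncalibrated classifier outputs in [0, 1]
    labels: true classes, 0 or 1
    Returns a list where entry b is the fraction of positives in bin b.
    """
    counts = [0] * n_bins
    positives = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        counts[b] += 1
        positives[b] += y
    # Empty bins fall back to the bin midpoint.
    return [
        positives[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]

def calibrate(score, bin_probs):
    """Map a raw score to the calibrated probability of its bin."""
    n_bins = len(bin_probs)
    return bin_probs[min(int(score * n_bins), n_bins - 1)]

# Toy held-out scores and labels, invented for illustration.
bin_probs = fit_histogram_binning(
    [0.1, 0.15, 0.9, 0.95, 0.85, 0.92], [0, 1, 1, 1, 0, 1], n_bins=2
)
print(bin_probs, calibrate(0.99, bin_probs))
```

The bins should be fitted on held-out data rather than on the data used to train the classifier, otherwise the calibrated probabilities inherit the classifier's overconfidence.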
Nariya, Maulik K; Kim, Jae Hyun; Xiong, Jian; Kleindl, Peter A; Hewarathna, Asha; Fisher, Adam C; Joshi, Sangeeta B; Schöneich, Christian; Forrest, M Laird; Middaugh, C Russell; Volkin, David B; Deeds, Eric J
2017-11-01
There is growing interest in generating physicochemical and biological analytical data sets to compare complex mixture drugs, for example, products from different manufacturers. In this work, we compare various crofelemer samples prepared from a single lot by filtration with varying molecular weight cutoffs combined with incubation for different times at different temperatures. The 2 preceding articles describe experimental data sets generated from analytical characterization of fractionated and degraded crofelemer samples. In this work, we use data mining techniques such as principal component analysis and mutual information scores to help visualize the data and determine discriminatory regions within these large data sets. The mutual information score identifies chemical signatures that differentiate crofelemer samples. These signatures, in many cases, would likely be missed by traditional data analysis tools. We also found that supervised learning classifiers robustly discriminate samples with around 99% classification accuracy, indicating that mathematical models of these physicochemical data sets are capable of identifying even subtle differences in crofelemer samples. Data mining and machine learning techniques can thus identify fingerprint-type attributes of complex mixture drugs that may be used for comparative characterization of products. Copyright © 2017 American Pharmacists Association®. All rights reserved.
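The mutual information score mentioned above measures how much knowing a feature value reduces uncertainty about the sample class. The following is a minimal sketch for discrete (already binned) features, computed in bits. The function name `mutual_information` and the toy data are hypothetical, and the study's actual binning and estimation procedure is not described in the abstract.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

# Toy example: a binned signal that perfectly separates two sample classes,
# and one that carries no information about the class.
classes = ["fresh", "fresh", "degraded", "degraded"]
print(mutual_information([0, 0, 1, 1], classes))
print(mutual_information([0, 1, 0, 1], classes))
```

Ranking spectral regions or other measured variables by this score is one way to pick out the discriminatory regions the abstract refers to.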
On-line Machine Learning and Event Detection in Petascale Data Streams
NASA Astrophysics Data System (ADS)
Thompson, David R.; Wagstaff, K. L.
2012-01-01
Traditional statistical data mining involves off-line analysis in which all data are available and equally accessible. However, petascale datasets have challenged this premise since it is often impossible to store, let alone analyze, the relevant observations. This has led the machine learning community to investigate adaptive processing chains where data mining is a continuous process. Here pattern recognition permits triage and followup decisions at multiple stages of a processing pipeline. Such techniques can also benefit new astronomical instruments such as the Large Synoptic Survey Telescope (LSST) and Square Kilometre Array (SKA) that will generate petascale data volumes. We summarize some machine learning perspectives on real time data mining, with representative cases of astronomical applications and event detection in high volume datastreams. The first is a "supervised classification" approach currently used for transient event detection at the Very Long Baseline Array (VLBA). It injects known signals of interest - faint single-pulse anomalies - and tunes system parameters to recover these events. This permits meaningful event detection for diverse instrument configurations and observing conditions whose noise cannot be well-characterized in advance. Second, "semi-supervised novelty detection" finds novel events based on statistical deviations from previous patterns. It detects outlier signals of interest while considering known examples of false alarm interference. Applied to data from the Parkes pulsar survey, the approach identifies anomalous "peryton" phenomena that do not match previous event models. Finally, we consider online light curve classification that can trigger adaptive followup measurements of candidate events. Classifier performance analyses suggest optimal survey strategies, and permit principled followup decisions from incomplete data. 
These examples trace a broad range of algorithm possibilities available for online astronomical data mining. This talk describes research performed at the Jet Propulsion Laboratory, California Institute of Technology. Copyright 2012, All Rights Reserved. U.S. Government support acknowledged.
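The novelty-detection idea in the abstract above (flagging events by their statistical deviation from previous patterns, without storing the stream) can be illustrated with a minimal sketch. This is not the authors' method: it assumes a scalar stream, uses Welford's one-pass mean/variance update as a stand-in for the "previous pattern" model, and the names `RunningStats`, `detect_events`, `threshold`, and `warmup` are hypothetical.

```python
import math


class RunningStats:
    """Welford's online mean/variance: one pass, constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until at least two samples are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect_events(stream, threshold=5.0, warmup=10):
    """Yield (index, value) for samples more than `threshold` sigma away
    from the statistics of the samples accepted so far."""
    stats = RunningStats()
    for i, x in enumerate(stream):
        sigma = stats.std()
        if stats.n >= warmup and sigma > 0 and abs(x - stats.mean) > threshold * sigma:
            yield i, x  # candidate event; it is kept out of the baseline
            continue
        stats.update(x)


# Illustrative (made-up) stream: quiet background with one strong pulse.
stream = [1.0, 1.1, 0.9, 1.05, 0.95] * 4 + [9.0] + [1.0, 1.1]
events = list(detect_events(stream))
```

Because flagged samples are excluded from the baseline, a single strong pulse does not inflate the variance estimate used for later samples; whether that is desirable depends on how quickly the background itself drifts.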
Jo, ByungWan
2018-01-01
The implementation of wireless sensor networks (WSNs) for monitoring the complex, dynamic, and harsh environment of underground coal mines (UCMs) is sought around the world to enhance safety. However, previously developed smart systems are limited to monitoring or, in a few cases, can report events. Therefore, this study introduces a reliable, efficient, and cost-effective internet of things (IoT) system for air quality monitoring with newly added features of assessment and pollutant prediction. The system comprises sensor modules, communication protocols, and a base station running Azure Machine Learning (AML) Studio. Arduino-based sensor modules with eight different parameters were installed at separate locations of an operational UCM. Based on the sensed data, the proposed system assesses mine air quality in terms of the mine environment index (MEI). Principal component analysis (PCA) identified CH₄, CO, SO₂, and H₂S as the gases that most strongly affect mine air quality. The results of PCA were fed into the ANN model in AML Studio, which enabled the prediction of MEI. An optimum number of neurons was determined for both actual input and PCA-based input parameters. The results showed a better performance of the PCA-based ANN for MEI prediction, with R² and RMSE values of 0.6654 and 0.2104, respectively. Therefore, the proposed Arduino and AML-based system enhances mine environmental safety by quickly assessing and predicting mine air quality. PMID:29561777
Jo, ByungWan; Khan, Rana Muhammad Asad
2018-03-21
The implementation of wireless sensor networks (WSNs) for monitoring the complex, dynamic, and harsh environment of underground coal mines (UCMs) is sought around the world to enhance safety. However, previously developed smart systems are limited to monitoring or, in a few cases, can report events. Therefore, this study introduces a reliable, efficient, and cost-effective internet of things (IoT) system for air quality monitoring with newly added features of assessment and pollutant prediction. The system comprises sensor modules, communication protocols, and a base station running Azure Machine Learning (AML) Studio. Arduino-based sensor modules with eight different parameters were installed at separate locations of an operational UCM. Based on the sensed data, the proposed system assesses mine air quality in terms of the mine environment index (MEI). Principal component analysis (PCA) identified CH₄, CO, SO₂, and H₂S as the gases that most strongly affect mine air quality. The results of PCA were fed into the ANN model in AML Studio, which enabled the prediction of MEI. An optimum number of neurons was determined for both actual input and PCA-based input parameters. The results showed a better performance of the PCA-based ANN for MEI prediction, with R² and RMSE values of 0.6654 and 0.2104, respectively. Therefore, the proposed Arduino and AML-based system enhances mine environmental safety by quickly assessing and predicting mine air quality.
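The abstract above reports model quality as R² and RMSE. As a reminder of what those two figures measure, here is a minimal sketch of both metrics in plain Python. The function names and the sample values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up index values standing in for observed and predicted MEI.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

RMSE is expressed in the units of the predicted index, while R² is unitless, so the two are usually read together rather than compared with each other.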
PMLB: a large benchmark suite for machine learning evaluation and comparison.
Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H
2017-01-01
The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
Data Mining Research with the LSST
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Strauss, M. A.; Tyson, J. A.
2007-12-01
The LSST catalog database will exceed 10 petabytes, comprising several hundred attributes for 5 billion galaxies, 10 billion stars, and over 1 billion variable sources (optical variables, transients, or moving objects), extracted from over 20,000 square degrees of deep imaging in 5 passbands with thorough time domain coverage: 1000 visits over the 10-year LSST survey lifetime. The opportunities are enormous for novel scientific discoveries within this rich time-domain ultra-deep multi-band survey database. Data Mining, Machine Learning, and Knowledge Discovery research opportunities with the LSST are now under study, with a potential for new collaborations to develop to contribute to these investigations. We will describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. We also give some illustrative examples of current scientific data mining research in astronomy, and point out where new research is needed. In particular, the data mining research community will need to address several issues in the coming years as we prepare for the LSST data deluge. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; visual data mining algorithms for visual exploration of the data; indexing of multi-attribute multi-dimensional astronomical databases (beyond RA-Dec spatial indexing) for rapid querying of petabyte databases; and more. 
Finally, we will identify opportunities for synergistic collaboration between the data mining research group and the LSST Data Management and Science Collaboration teams.
30 CFR 75.1719-4 - Mining machines, cap lamps; requirements.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 30 Mineral Resources 1 2012-07-01 2012-07-01 false Mining machines, cap lamps; requirements. 75... Mining machines, cap lamps; requirements. (a) Paint used on exterior surfaces of mining machines shall... frames or reflecting tape shall be installed on each end of mining machines, except that continuous...
30 CFR 75.1719-4 - Mining machines, cap lamps; requirements.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 30 Mineral Resources 1 2014-07-01 2014-07-01 false Mining machines, cap lamps; requirements. 75... Mining machines, cap lamps; requirements. (a) Paint used on exterior surfaces of mining machines shall... frames or reflecting tape shall be installed on each end of mining machines, except that continuous...
The Next Era: Deep Learning in Pharmaceutical Research
Ekins, Sean
2016-01-01
Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use from internet searches, voice recognition, social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but to predict a molecule’s properties and behavior in future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernable edge in predictive performance. The time has come for a balanced review of this technique but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique. PMID:27599991
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data mining has important application value in today's industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. The Support Vector Machine (SVM) is a good classifier in machine learning research. This paper studies the classification performance of the SVM method on Chinese text data, uses it to classify Chinese text, and aims to combine academic research with practical application.
Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree.
Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-Koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid
2016-10-01
Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. The ESD diseases are placed into six different classes. Data mining is the process of detecting hidden patterns. In the case of ESD, data mining helps us to predict the diseases. Different algorithms have been developed for this purpose. We aimed to use the Classification and Regression Tree (CART) to predict the differential diagnosis of ESD. We used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set was obtained from the UCI machine learning repository. The Clementine 12.0 software from IBM was used for modelling. To evaluate the model, we calculated its accuracy, sensitivity and specificity. The proposed model had an accuracy of 94.84% (. 24.42) in correctly predicting the ESD disease. The results indicated that this classifier could be useful. However, combining machine learning methods is strongly recommended, as it could be more useful for the prediction of ESD.
Scalable Machine Learning for Massive Astronomical Datasets
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.; Gray, A.
2014-04-01
We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. 
These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
Scalable Machine Learning for Massive Astronomical Datasets
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.; Astronomy Data Centre, Canadian
2014-01-01
We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
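The nearest-neighbor outlier search mentioned in the abstract above can be illustrated with a small sketch: each object is scored by the distance to its k-th nearest neighbor, and the largest scores are the outlier candidates. This is a brute-force O(n²) illustration under the assumption of a tiny in-memory point list, not the Skytree implementation; `knn_outlier_scores` and `k` are hypothetical names, and a catalog of the size described would need a tree-based or otherwise sub-quadratic method.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Larger scores mean the point sits in a sparser region.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Made-up 2-D feature vectors: a tight group plus one isolated object.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2)
most_isolated = scores.index(max(scores))
```

Using k greater than 1 makes the score less sensitive to pairs of unusual objects that happen to lie close to each other.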
Screening Electronic Health Record-Related Patient Safety Reports Using Machine Learning.
Marella, William M; Sparnon, Erin; Finley, Edward
2017-03-01
The objective of this study was to develop a semiautomated approach to screening cases that describe hazards associated with the electronic health record (EHR) from a mandatory, population-based patient safety reporting system. Potentially relevant cases were identified through a query of the Pennsylvania Patient Safety Reporting System. A random sample of cases were manually screened for relevance and divided into training, testing, and validation data sets to develop a machine learning model. This model was used to automate screening of remaining potentially relevant cases. Of the 4 algorithms tested, a naive Bayes kernel performed best, with an area under the receiver operating characteristic curve of 0.927 ± 0.023, accuracy of 0.855 ± 0.033, and F score of 0.877 ± 0.027. The machine learning model and text mining approach described here are useful tools for identifying and analyzing adverse event and near-miss reports. Although reporting systems are beginning to incorporate structured fields on health information technology and the EHR, these methods can identify related events that reporters classify in other ways. These methods can facilitate analysis of legacy safety reports by retrieving health information technology-related and EHR-related events from databases without fields and controlled values focused on this subject and distinguishing them from reports in which the EHR is mentioned only in passing. Machine learning and text mining are useful additions to the patient safety toolkit and can be used to semiautomate screening and analysis of unstructured text in safety reports from frontline staff.
Literature Mining of Pathogenesis-Related Proteins in Human Pathogens for Database Annotation
2009-10-01
...submission and for literature mining result display with automatically tagged abstracts. I. Literature data sets for machine learning algorithm training...mass spectrometry) proteomics data from Burkholderia strains. • Task 1 (M13-15): Preliminary analysis of the Burkholderia proteomic space
Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves
2010-09-15
In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).
NASA Astrophysics Data System (ADS)
Gibril, Mohamed Barakat A.; Idrees, Mohammed Oludare; Yao, Kouame; Shafri, Helmi Zulhaidi Mohd
2018-01-01
The growing use of optimization for geographic object-based image analysis and the possibility to derive a wide range of information about the image in textual form makes machine learning (data mining) a versatile tool for information extraction from multiple data sources. This paper presents an application of data mining for land-cover classification by fusing SPOT-6, RADARSAT-2, and derived datasets. First, the images and other derived indices (normalized difference vegetation index, normalized difference water index, and soil adjusted vegetation index) were combined and subjected to a segmentation process with optimal segmentation parameters obtained using a combination of spatial and Taguchi statistical optimization. The image objects, which carry all the attributes of the input datasets, were extracted and related to the target land-cover classes through data mining algorithms (decision tree) for classification. To evaluate the performance, the result was compared with two nonparametric classifiers: support vector machine (SVM) and random forest (RF). Furthermore, the decision tree classification result was evaluated against six unoptimized trials segmented using arbitrary parameter combinations. The result shows that the optimized process produces better land-use land-cover classification, with an overall classification accuracy of 91.79% for the decision tree, compared with 87.25% and 88.69% for SVM and RF, respectively, while the six unoptimized classifications yield overall accuracies between 84.44% and 88.08%. The higher accuracy of the optimized data mining classification approach compared to the unoptimized results indicates that the optimization process has a significant impact on the classification quality.
NASA Astrophysics Data System (ADS)
Yosipof, Abraham; Guedes, Rita C.; García-Sosa, Alfonso T.
2018-05-01
Data mining approaches can uncover underlying patterns in chemical and pharmacological property space decisive for drug discovery and development. Two of the most common approaches are visualization and machine learning methods. Visualization methods use dimensionality reduction techniques in order to reduce multi-dimension data into 2D or 3D representations with a minimal loss of information. Machine learning attempts to find correlations between specific activities or classifications for a set of compounds and their features by means of recurring mathematical models. Both models take advantage of the different and deep relationships that can exist between features of compounds, and helpfully provide classification of compounds based on such features. Drug-likeness has been studied from several viewpoints, but here we provide the first implementation in chemoinformatics of the t-Distributed Stochastic Neighbor Embedding (t-SNE) method for the visualization and the representation of chemical space, and the use of different machine learning methods separately and together to form a new ensemble learning method called AL Boost. The models obtained from AL Boost synergistically combine decision tree, random forests (RF), support vector machine (SVM), artificial neuronal network (ANN), k nearest neighbors (kNN), and logistic regression models. In this work, we show that together they form a predictive model that not only improves the predictive force but also decreases bias. This resulted in a corrected classification rate of over 0.81, as well as higher sensitivity and specificity rates for the models. In addition, separation and good models were also achieved for disease categories such as antineoplastic compounds and nervous system diseases, among others. 
Such models can be used to guide decision on the feature landscape of compounds and their likeness to either drugs or other characteristics, such as specific or multiple disease-category(ies) or organ(s) of action of a molecule.
Yosipof, Abraham; Guedes, Rita C; García-Sosa, Alfonso T
2018-01-01
Data mining approaches can uncover underlying patterns in chemical and pharmacological property space decisive for drug discovery and development. Two of the most common approaches are visualization and machine learning methods. Visualization methods use dimensionality reduction techniques in order to reduce multi-dimension data into 2D or 3D representations with a minimal loss of information. Machine learning attempts to find correlations between specific activities or classifications for a set of compounds and their features by means of recurring mathematical models. Both models take advantage of the different and deep relationships that can exist between features of compounds, and helpfully provide classification of compounds based on such features or in case of visualization methods uncover underlying patterns in the feature space. Drug-likeness has been studied from several viewpoints, but here we provide the first implementation in chemoinformatics of the t-Distributed Stochastic Neighbor Embedding (t-SNE) method for the visualization and the representation of chemical space, and the use of different machine learning methods separately and together to form a new ensemble learning method called AL Boost. The models obtained from AL Boost synergistically combine decision tree, random forests (RF), support vector machine (SVM), artificial neural network (ANN), k nearest neighbors (kNN), and logistic regression models. In this work, we show that together they form a predictive model that not only improves the predictive force but also decreases bias. This resulted in a corrected classification rate of over 0.81, as well as higher sensitivity and specificity rates for the models. In addition, separation and good models were also achieved for disease categories such as antineoplastic compounds and nervous system diseases, among others. 
Such models can be used to guide decision on the feature landscape of compounds and their likeness to either drugs or other characteristics, such as specific or multiple disease-category(ies) or organ(s) of action of a molecule.
Alves, Pedro; Liu, Shuang; Wang, Daifeng; Gerstein, Mark
2018-01-01
Machine learning is an integral part of computational biology, and has already shown its use in various applications, such as prognostic tests. In the last few years in the non-biological machine learning community, ensembling techniques have shown their power in data mining competitions such as the Netflix challenge; however, such methods have not found wide use in computational biology. In this work, we endeavor to show how ensembling techniques can be applied to practical problems, including problems in the field of bioinformatics, and how they often outperform other machine learning techniques in both predictive power and robustness. Furthermore, we develop a methodology of ensembling, Multi-Swarm Ensemble (MSWE) by using multiple particle swarm optimizations and demonstrate its ability to further enhance the performance of ensembles.
An IPSO-SVM algorithm for security state prediction of mine production logistics system
NASA Astrophysics Data System (ADS)
Zhang, Yanliang; Lei, Junhui; Ma, Qiuli; Chen, Xin; Bi, Runfang
2017-06-01
This paper provides a theoretical basis for regulating corporate security warnings and resources by revealing the laws behind the security state in mine production logistics. Because the mine production logistics system is complex and its variables are difficult to acquire, a security status prediction model based on improved particle swarm optimization and a support vector machine (IPSO-SVM) is proposed. First, linear adjustments of the inertia weight and learning weights enhance the convergence speed and search accuracy, in order to deal with the changeable complexity and the difficulty of data acquisition. The improved particle swarm optimization (IPSO) is then introduced to resolve the problem of parameter setting in traditional support vector machines (SVM). At the same time, a security status index system is built to determine the classification standards of safety status. Finally, the feasibility and effectiveness of the method are verified by experimental results.
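The linear adjustment of the inertia weight mentioned in the abstract above can be sketched as follows. This is a generic particle swarm optimizer with a linearly decreasing inertia weight, not the paper's IPSO: the learning weights `c1` and `c2` are held constant here for brevity, the objective is a stand-in quadratic rather than an SVM validation error, and all names and default values (`pso_minimize`, `w_start`, `w_end`, and so on) are assumptions made for illustration.

```python
import random


def pso_minimize(f, bounds, n_particles=30, n_iter=200,
                 w_start=0.9, w_end=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using PSO with a linearly decreasing inertia weight."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for t in range(n_iter):
        # Inertia weight falls linearly from w_start to w_end:
        # broad exploration early, fine local search late.
        w = w_start - (w_start - w_end) * t / (n_iter - 1)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def stand_in_objective(p):
    """Placeholder objective with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best, best_val = pso_minimize(stand_in_objective, [(-5.0, 5.0), (-5.0, 5.0)])
```

To tune an SVM in this style, the objective would be replaced by a function that maps candidate hyperparameters (for example a penalty term and a kernel width) to a cross-validated error, which is the role the abstract assigns to the swarm.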
Study of Environmental Data Complexity using Extreme Learning Machine
NASA Astrophysics Data System (ADS)
Leuenberger, Michael; Kanevski, Mikhail
2017-04-01
The main goals of environmental data science using machine learning algorithms concern, in a broad sense, the calibration, prediction and visualization of hidden relationships between input and output variables. In order to optimize the models and to understand the phenomenon under study, the characterization of the complexity (at different levels) should be taken into account. Therefore, the identification of the linear or non-linear behavior between input and output variables adds valuable information for the knowledge of the phenomenon complexity. The present research highlights and investigates the different issues that can occur when identifying the complexity (linear/non-linear) of environmental data using machine learning algorithms. In particular, the main attention is paid to the description of a self-consistent methodology for the use of Extreme Learning Machines (ELM, Huang et al., 2006), which have recently gained great popularity. By applying two ELM models (with linear and non-linear activation functions) and comparing their efficiency, the degree of linearity can be quantified. The considered approach is accompanied by simulated and real high dimensional and multivariate data case studies. In conclusion, the current challenges and future development in complexity quantification using environmental data mining are discussed. References - Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1-3), 489-501. - Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data. EPFL Press; Lausanne, Switzerland, p.392. - Leuenberger, M., Kanevski, M., 2015. Extreme Learning Machines for spatial environmental data. Computers and Geosciences 85, 64-73.
Distributed communications and control network for robotic mining
NASA Technical Reports Server (NTRS)
Schiffbauer, William H.
1989-01-01
The application of robotics to coal mining machines is one approach pursued to increase productivity while providing enhanced safety for the coal miner. Toward that end, a network composed of microcontrollers, computers, expert systems, real time operating systems, and a variety of programming languages is being integrated to act as the backbone for intelligent machine operation. Actual mining machines, including a few customized ones, have been given telerobotic semiautonomous capabilities by applying the described network. Control devices, intelligent sensors and computers onboard these machines are showing promise of achieving improved mining productivity and safety benefits. Current research using these machines involves navigation, multiple machine interaction, machine diagnostics, mineral detection, and graphical machine representation. Guidance sensors and systems employed include: sonar, laser rangers, gyroscopes, magnetometers, clinometers, and accelerometers. Information on the network of hardware/software and its implementation on mining machines is presented. Anticipated coal production operations using the network are discussed. A parallel is also drawn between the direction of present-day underground coal mining research and how the lunar soil (regolith) may be mined. A conceptual lunar mining operation that employs a distributed communication and control network is detailed.
Topic categorisation of statements in suicide notes with integrated rules and machine learning.
Kovačević, Aleksandar; Dehghan, Azad; Keane, John A; Nenadic, Goran
2012-01-01
We describe and evaluate an automated approach used as part of the i2b2 2011 challenge to identify and categorise statements in suicide notes into one of 15 topics, including Love, Guilt, Thankfulness, Hopelessness and Instructions. The approach combines a set of lexico-syntactic rules with a set of models derived by machine learning from a training dataset. The machine learning models rely on named entities, lexical, lexico-semantic and presentation features, as well as the rules that are applicable to a given statement. On a testing set of 300 suicide notes, the approach showed the overall best micro F-measure of up to 53.36%. The best precision achieved was 67.17% when only rules are used, whereas best recall of 50.57% was with integrated rules and machine learning. While some topics (eg, Sorrow, Anger, Blame) prove challenging, the performance for relatively frequent (eg, Love) and well-scoped categories (eg, Thankfulness) was comparatively higher (precision between 68% and 79%), suggesting that automated text mining approaches can be effective in topic categorisation of suicide notes.
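The abstract above reports a micro-averaged F-measure across topics. A minimal sketch of that metric follows, assuming per-topic counts of true positives, false positives, and false negatives are already available. The function name and the counts are hypothetical and are not figures from the challenge.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall and F1.

    counts maps each topic to a (true_pos, false_pos, false_neg) tuple;
    the counts are pooled across topics before the ratios are taken.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up counts: one frequent, easy topic and one hard topic.
counts = {"Love": (8, 2, 2), "Sorrow": (2, 3, 8)}
precision, recall, f1 = micro_prf(counts)
```

Because the counts are pooled, frequent topics dominate the micro average, which is one reason a system can score well overall while still struggling on rarer or less well-scoped topics.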
Large-scale machine learning and evaluation platform for real-time traffic surveillance
NASA Astrophysics Data System (ADS)
Eichel, Justin A.; Mishra, Akshaya; Miller, Nicholas; Jankovic, Nicholas; Thomas, Mohan A.; Abbott, Tyler; Swanson, Douglas; Keller, Joel
2016-09-01
In traffic engineering, vehicle detectors are trained on limited datasets, resulting in poor accuracy when deployed in real-world surveillance applications. Annotating large-scale high-quality datasets is challenging. Typically, these datasets have limited diversity; they do not reflect the real-world operating environment. There is a need for a large-scale, cloud-based positive and negative mining process and a large-scale learning and evaluation system for the application of automatic traffic measurements and classification. The proposed positive and negative mining process addresses the quality of crowd sourced ground truth data through machine learning review and human feedback mechanisms. The proposed learning and evaluation system uses a distributed cloud computing framework to handle data-scaling issues associated with large numbers of samples and a high-dimensional feature space. The system is trained using AdaBoost on 1,000,000 Haar-like features extracted from 70,000 annotated video frames. The trained real-time vehicle detector achieves an accuracy of at least 95% for 1/2 and about 78% for 19/20 of the time when tested on ~7,500,000 video frames. At the end of 2016, the dataset is expected to have over 1 billion annotated video frames.
Microcomputer network for control of a continuous mining machine. Information circular/1993
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schiffbauer, W.H.
1993-01-01
The paper details a microcomputer-based control and monitoring network that was developed in-house by the U.S. Bureau of Mines, and installed on a Joy 14 continuous mining machine. The network consists of microcomputers that are connected together via a single twisted pair cable. Each microcomputer was developed to provide a particular function in the control process. Machine-mounted microcomputers in conjunction with the appropriate sensors provide closed-loop control of the machine, navigation, and environmental monitoring. Off-the-machine microcomputers provide remote control of the machine, sensor status, and a connection to the network so that external computers can access network data and control the continuous mining machine. Although the network was installed on a Joy 14 continuous mining machine, its use extends beyond it. Its generic structure lends itself to installation onto most mining machine types.
Machine learning approach for the outcome prediction of temporal lobe epilepsy surgery.
Armañanzas, Rubén; Alonso-Nanclares, Lidia; Defelipe-Oroquieta, Jesús; Kastanauskaite, Asta; de Sola, Rafael G; Defelipe, Javier; Bielza, Concha; Larrañaga, Pedro
2013-01-01
Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE). Nevertheless, a significant proportion of these patients continue suffering seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose to include these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery. PMID:23646148
Automation and robotics technology for intelligent mining systems
NASA Technical Reports Server (NTRS)
Welsh, Jeffrey H.
1989-01-01
The U.S. Bureau of Mines is approaching the problems of accidents and efficiency in the mining industry through the application of automation and robotics to mining systems. This technology can increase safety by removing workers from hazardous areas of the mines or from performing hazardous tasks. The short-term goal of the Automation and Robotics program is to develop technology that can be implemented in the form of an autonomous mining machine using current continuous mining machine equipment. In the longer term, the goal is to conduct research that will lead to new intelligent mining systems that capitalize on the capabilities of robotics. The Bureau of Mines Automation and Robotics program has been structured to produce the technology required for the short- and long-term goals. The short-term goal of application of automation and robotics to an existing mining machine, resulting in autonomous operation, is expected to be accomplished within five years. Key technology elements required for an autonomous continuous mining machine are well underway and include machine navigation systems, coal-rock interface detectors, machine condition monitoring, and intelligent computer systems. The Bureau of Mines program is described, including status of key technology elements for an autonomous continuous mining machine, the program schedule, and future work. Although the program is directed toward underground mining, much of the technology being developed may have applications for space systems or mining on the Moon or other planets.
Exploring Characterizations of Learning Object Repositories Using Data Mining Techniques
NASA Astrophysics Data System (ADS)
Segura, Alejandra; Vidal, Christian; Menendez, Victor; Zapata, Alfredo; Prieto, Manuel
Learning object repositories provide a platform for the sharing of Web-based educational resources. As these repositories evolve independently, it is difficult for users to have a clear picture of the kind of contents they give access to. Metadata can be used to automatically extract a characterization of these resources by using machine learning techniques. This paper presents an exploratory study carried out on the contents of four public repositories that uses clustering and association rule mining algorithms to extract characterizations of repository contents. The results of the analysis include potential relationships between different attributes of learning objects that may be useful to gain an understanding of the kind of resources available and eventually develop search mechanisms that consider repository descriptions as a criterion in federated search.
He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo
2017-03-01
Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms (decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model) were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
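To make the combination of n-gram representations with a classifier more concrete, here is a minimal sketch of a multinomial naive Bayes classifier over unigrams and bigrams, written with the Python standard library only. It is not the authors' system or their product score model; the class name, the whitespace tokenisation, the Laplace smoothing, the toy sentences and the labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels, n_max=2):
        self.n_max = n_max
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngrams(text, n_max))
        self.vocab = {f for c in self.feat_counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat in ngrams(text, self.n_max):
                if feat in self.vocab:  # n-grams unseen in training are ignored
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy narratives and labels, for illustration only.
train_texts = [
    "i keep having nightmares about the crash",
    "loud noises make me panic and relive it",
    "i slept well and enjoyed the weekend",
    "work was busy but i feel fine",
]
train_labels = ["screen_positive", "screen_positive",
                "screen_negative", "screen_negative"]
model = NaiveBayes().fit(train_texts, train_labels, n_max=2)
print(model.predict("nightmares and panic after the crash"))
```

Setting `n_max=1` restricts the features to unigrams; larger values add multigrams, which is the kind of representation choice the abstract compares.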
Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree
Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid
2016-01-01
Introduction: Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. The ESD diseases are placed into six different classes. Data mining is the process of detecting hidden patterns. In the case of ESD, data mining helps us to predict the diseases. Different algorithms were developed for this purpose. Objective: we aimed to use the Classification and Regression Tree (CART) to predict differential diagnosis of ESD. Methods: we used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set was obtained from the UCI machine learning repository. The Clementine 12.0 software from IBM Company was used for modelling. To evaluate the model, we calculated its accuracy, sensitivity and specificity. Results: The proposed model had an accuracy of 94.84% (Standard Deviation: 24.42) in correctly predicting the ESD disease. Conclusions: Results indicated that using this classifier could be useful. However, it is strongly recommended that machine learning methods be combined, as this could be more useful for the prediction of ESD. PMID:28077889
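The three evaluation measures named in the abstract can be sketched as follows for a multi-class problem, with sensitivity and specificity computed one class against the rest. This is only an illustration of the definitions, not the Clementine workflow used in the study; the function names and the toy labels "a", "b", "c" are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity and specificity for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return sensitivity, specificity


# Hypothetical true and predicted classes, for illustration only.
y_true = ["a", "a", "a", "b", "b", "c", "c", "c"]
y_pred = ["a", "a", "b", "b", "b", "c", "a", "c"]
print(accuracy(y_true, y_pred))
print(one_vs_rest_metrics(y_true, y_pred, positive="a"))
```

For a six-class data set such as the one described, the one-vs-rest call would be repeated once per class.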
30 CFR 75.205 - Installation of roof support using mining machines with integral roof bolters.
Code of Federal Regulations, 2011 CFR
2011-07-01
... machines with integral roof bolters. 75.205 Section 75.205 Mineral Resources MINE SAFETY AND HEALTH... Roof Support § 75.205 Installation of roof support using mining machines with integral roof bolters. When roof bolts are installed by a continuous mining machine with integral roof bolting equipment: (a...
Human Systems Integration (HSI) Associated Development Activities in Japan
2008-06-12
machine learning and data mining methods. The continuous effort (KAIZEN) to improve the analysis phases are illustrated in Figure 14. Although there...model Extraction of a workflow Extraction of a control rule Variation analysis and improvement Plant operation KAIZEN Fig. 14
SparkText: Biomedical Text Mining on Big Data Framework
Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M
2016-01-01
Background Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
Automatic Earthquake Detection by Active Learning
NASA Astrophysics Data System (ADS)
Bergen, K.; Beroza, G. C.
2017-12-01
In recent years, advances in machine learning have transformed fields such as image recognition, natural language processing and recommender systems. Many of these performance gains have relied on the availability of large, labeled data sets to train high-accuracy models; labeled data sets are those for which each sample includes a target class label, such as waveforms tagged as either earthquakes or noise. Earthquake seismologists are increasingly leveraging machine learning and data mining techniques to detect and analyze weak earthquake signals in large seismic data sets. One of the challenges in applying machine learning to seismic data sets is the limited labeled data problem; learning algorithms need to be given examples of earthquake waveforms, but the number of known events, taken from earthquake catalogs, may be insufficient to build an accurate detector. Furthermore, earthquake catalogs are known to be incomplete, resulting in training data that may be biased towards larger events and contain inaccurate labels. This challenge is compounded by the class imbalance problem; the events of interest, earthquakes, are infrequent relative to noise in continuous data sets, and many learning algorithms perform poorly on rare classes. In this work, we investigate the use of active learning for automatic earthquake detection. Active learning is a type of semi-supervised machine learning that uses a human-in-the-loop approach to strategically supplement a small initial training set. The learning algorithm incorporates domain expertise through interaction between a human expert and the algorithm, with the algorithm actively posing queries to the user to improve detection performance. We demonstrate the potential of active machine learning to improve earthquake detection performance with limited available training data.
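The query loop behind active learning can be sketched in a few lines. The example below uses uncertainty sampling with a deliberately simple one-dimensional threshold classifier; it is not the authors' detector. The integer "feature" values, the oracle function standing in for the human expert, and all names are assumptions, and the classes are assumed to be separable by a single threshold.

```python
def fit_threshold(labeled):
    """Place the decision threshold midway between the largest labeled
    negative and the smallest labeled positive (assumes separable classes)."""
    max_neg = max(x for x, y in labeled if y == 0)
    min_pos = min(x for x, y in labeled if y == 1)
    return (max_neg + min_pos) / 2


def active_learn(pool, oracle, seed, n_queries):
    """Uncertainty sampling: repeatedly ask the oracle to label the
    unlabeled sample closest to the current decision threshold."""
    labeled = list(seed)
    seen = {x for x, _ in seed}
    unlabeled = [x for x in pool if x not in seen]
    for _ in range(n_queries):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # human-in-the-loop step
    return fit_threshold(labeled), labeled


def expert(x):
    """Hypothetical oracle: values of 60 or more count as events (1),
    everything else as noise (0)."""
    return 1 if x >= 60 else 0


pool = list(range(101))       # hypothetical 1-D feature values
seed = [(0, 0), (100, 1)]     # one labeled example per class to start
threshold, labeled = active_learn(pool, expert, seed, n_queries=6)
print(threshold, len(labeled))
```

In this toy setting the loop behaves like a binary search: six queries plus the two seed labels are enough to separate all 101 pool values, whereas labeling samples at random would typically need many more.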
Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.
2014-01-01
Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
Supporting Solar Physics Research via Data Mining
NASA Astrophysics Data System (ADS)
Angryk, Rafal; Banda, J.; Schuh, M.; Ganesan Pillai, K.; Tosun, H.; Martens, P.
2012-05-01
In this talk we will briefly introduce three pillars of data mining (i.e. frequent patterns discovery, classification, and clustering), and discuss some possible applications of known data mining techniques which can directly benefit solar physics research. In particular, we plan to demonstrate applicability of frequent patterns discovery methods for the verification of hypotheses about co-occurrence (in space and time) of filaments and sigmoids. We will also show how classification/machine learning algorithms can be utilized to verify human-created software modules to discover individual types of solar phenomena. Finally, we will discuss applicability of clustering techniques to image data processing.
NASA Astrophysics Data System (ADS)
Herold, Julia; Abouna, Sylvie; Zhou, Luxian; Pelengaris, Stella; Epstein, David B. A.; Khan, Michael; Nattkemper, Tim W.
2009-02-01
In recent years, bioimaging has turned from qualitative measurements towards a high-throughput and high-content modality, providing multiple variables for each biological sample analyzed. We present a system which combines machine learning based semantic image annotation and visual data mining to analyze such new multivariate bioimage data. Machine learning is employed for automatic semantic annotation of regions of interest. The annotation is the prerequisite for a biological object-oriented exploration of the feature space derived from the image variables. With the aid of visual data mining, the obtained data can be explored simultaneously in the image as well as in the feature domain. Especially when little is known of the underlying data, for example in the case of exploring the effects of a drug treatment, visual data mining can greatly aid the process of data evaluation. We demonstrate how our system is used for image evaluation to obtain information relevant to the study of diabetes and the screening of new anti-diabetes treatments. Cells of the Islet of Langerhans and whole pancreas in pancreas tissue samples are annotated and object-specific molecular features are extracted from aligned multichannel fluorescence images. These are interactively evaluated for cell type classification in order to determine the cell number and mass. Only a few parameters need to be specified, which makes the system usable by non-experts in computing and allows for high-throughput analysis.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-08
... Collection; Comment Request; High-Voltage Continuous Mining Machines Standards for Underground Coal Mines... Act of 1995. This program helps to assure that requested data can be provided in the desired format... maintains the safe use of high-voltage continuous mining machines in underground coal mines by requiring...
Smart Point Cloud: Definition and Remaining Challenges
NASA Astrophysics Data System (ADS)
Poux, F.; Hallot, P.; Neuville, R.; Billen, R.
2016-10-01
Dealing with coloured point clouds acquired from terrestrial laser scanners, this paper identifies remaining challenges for a new data structure: the smart point cloud. This concept arises from the observation that massive and discretized spatial information from active remote sensing technology is often underused due to data mining limitations. The generalisation of point cloud data associated with the heterogeneity and temporality of such datasets is the main issue regarding structure, segmentation, classification, and interaction for an immediate understanding. We propose to use both point cloud properties and human knowledge through machine learning to rapidly extract pertinent information, using user-centered information (smart data) rather than raw data. A review of feature detection, machine learning frameworks and database systems indexed both for mining queries and data visualisation is presented. Based on existing approaches, we propose a new 3-block flexible framework around device expertise, analytic expertise and domain base reflexion. This contribution serves as the first step for the realisation of a comprehensive smart point cloud data structure.
Shouval, R; Bondi, O; Mishan, H; Shimoni, A; Unger, R; Nagler, A
2014-03-01
Data collected from hematopoietic SCT (HSCT) centers are becoming more abundant and complex owing to the formation of organized registries and incorporation of biological data. Typically, conventional statistical methods are used for the development of outcome prediction models and risk scores. However, these analyses carry inherent properties limiting their ability to cope with large data sets with multiple variables and samples. Machine learning (ML), a field stemming from artificial intelligence, is part of a wider approach for data analysis termed data mining (DM). It enables prediction in complex data scenarios, familiar to practitioners and researchers. Technological and commercial applications are all around us, gradually entering clinical research. In the following review, we would like to expose hematologists and stem cell transplanters to the concepts, clinical applications, strengths and limitations of such methods and discuss current research in HSCT. The aim of this review is to encourage utilization of the ML and DM techniques in the field of HSCT, including prediction of transplantation outcome and donor selection.
Men, Hong; Shi, Yan; Fu, Songlin; Jiao, Yanan; Qiao, Yu; Liu, Jingjing
2017-01-01
Multi-sensor data fusion can provide more comprehensive and more accurate analysis results. However, it also brings some redundant information, which is an important issue with respect to finding a feature-mining method for intuitive and efficient analysis. This paper demonstrates a feature-mining method based on variable accumulation to find the best expression form and variables’ behavior affecting beer flavor. First, e-tongue and e-nose were used to gather the taste and olfactory information of beer, respectively. Second, principal component analysis (PCA), genetic algorithm-partial least squares (GA-PLS), and variable importance of projection (VIP) scores were applied to select feature variables of the original fusion set. Finally, the classification models based on support vector machine (SVM), random forests (RF), and extreme learning machine (ELM) were established to evaluate the efficiency of the feature-mining method. The result shows that the feature-mining method based on variable accumulation obtains the main feature affecting beer flavor information, and the best classification performance for the SVM, RF, and ELM models with 96.67%, 94.44%, and 98.33% prediction accuracy, respectively. PMID:28753917
Knowledge based word-concept model estimation and refinement for biomedical text mining.
Jimeno Yepes, Antonio; Berlanga, Rafael
2015-02-01
Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, thus performance of KB-based methods is usually lower when compared to supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data and are therefore not useful for large scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
An immune-inspired semi-supervised algorithm for breast cancer diagnosis.
Peng, Lingxi; Chen, Wenbin; Zhou, Wubai; Li, Fufang; Yang, Jin; Zhang, Jiandong
2016-10-01
Breast cancer is the most frequently diagnosed life-threatening cancer worldwide and the leading cause of cancer death among women. Early accurate diagnosis can be a big plus in treating breast cancer. Researchers have approached this problem using various data mining and machine learning techniques such as support vector machine, artificial neural network, etc. Computer immunology is also an intelligent method inspired by the biological immune system, which has been successfully applied in pattern recognition, combination optimization, machine learning, etc. However, most of these diagnosis methods are supervised. It is very expensive to obtain labeled data in biology and medicine. In this paper, we seamlessly integrate the state-of-the-art research on life science with artificial intelligence, and propose a semi-supervised learning algorithm to reduce the need for labeled data. We use two well-known benchmark breast cancer datasets in our study, which are acquired from the UCI machine learning repository. Extensive experiments are conducted and evaluated on those two datasets. Our experimental results demonstrate the effectiveness and efficiency of our proposed algorithm, which proves that our algorithm is a promising automatic diagnosis method for breast cancer. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Gaur, Pallavi; Chaturvedi, Anoop
2017-07-22
The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.
Semi-Supervised Clustering for High-Dimensional and Sparse Features
ERIC Educational Resources Information Center
Yan, Su
2010-01-01
Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…
Topic Models for Link Prediction in Document Networks
ERIC Educational Resources Information Center
Kataria, Saurabh
2012-01-01
Recent explosive growth of interconnected document collections such as citation networks, network of web pages, content generated by crowd-sourcing in collaborative environments, etc., has posed several challenging problems for data mining and machine learning community. One central problem in the domain of document networks is that of "link…
Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela
2015-05-01
Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Data Mining Citizen Science Results
NASA Astrophysics Data System (ADS)
Borne, K. D.
2012-12-01
Scientific discovery from big data is enabled through multiple channels, including data mining (through the application of machine learning algorithms) and human computation (commonly implemented through citizen science tasks). We will describe the results of new data mining experiments on the results from citizen science activities. Discovering patterns, trends, and anomalies in data are among the powerful contributions of citizen science. Establishing scientific algorithms that can subsequently re-discover the same types of patterns, trends, and anomalies in automatic data processing pipelines will ultimately result from the transformation of those human algorithms into computer algorithms, which can then be applied to much larger data collections. Scientific discovery from big data is thus greatly amplified through the marriage of data mining with citizen science.
Lu, Ake Tzu-Hui; Austin, Erin; Bonner, Ashley; Huang, Hsin-Hsiung; Cantor, Rita M
2014-09-01
Machine learning methods (MLMs), designed to develop models using high-dimensional predictors, have been used to analyze genome-wide genetic and genomic data to predict risks for complex traits. We summarize the results from six contributions to our Genetic Analysis Workshop 18 working group; these investigators applied MLMs and data mining to analyses of rare and common genetic variants measured in pedigrees. To develop risk profiles, group members analyzed blood pressure traits along with single-nucleotide polymorphisms and rare variant genotypes derived from sequence and imputation analyses in large Mexican American pedigrees. Supervised MLMs included penalized regression with varying penalties, support vector machines, and permanental classification. Unsupervised MLMs included sparse principal components analysis and sparse graphical models. Entropy-based components analyses were also used to mine these data. None of the investigators fully capitalized on the genetic information provided by the complete pedigrees. Their approaches either corrected for the nonindependence of the individuals within the pedigrees or analyzed only those who were independent. Some methods allowed for covariate adjustment, whereas others did not. We evaluated these methods using a variety of metrics. Four contributors conducted primary analyses on the real data, and the other two research groups used the simulated data with and without knowledge of the underlying simulation model. One group used the answers to the simulated data to assess power and type I errors. Although the MLMs applied were substantially different, each research group concluded that MLMs have advantages over standard statistical approaches with these high-dimensional data. © 2014 WILEY PERIODICALS, INC.
A Comparative Study with RapidMiner and WEKA Tools over some Classification Techniques for SMS Spam
NASA Astrophysics Data System (ADS)
Foozy, Cik Feresa Mohd; Ahmad, Rabiah; Faizal Abdollah, M. A.; Chai Wen, Chuah
2017-08-01
SMS spamming is a serious attack that abuses SMS by spreading advertisements in bulk. Unwanted SMS messages containing advertisements disturb users and violate the privacy of mobile users. To overcome these issues, many studies have proposed detecting SMS spam using data mining tools. This paper presents a comparative study using five machine learning techniques, namely Naïve Bayes, K-NN (K-Nearest Neighbour algorithm), Decision Tree, Random Forest and Decision Stumps, to compare the accuracy obtained with RapidMiner and WEKA on the SMS Spam dataset from the UCI Machine Learning Repository.
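As a rough illustration of the first technique named in the abstract above, the following is a minimal multinomial Naive Bayes sketch with add-one smoothing. It is not the RapidMiner or WEKA implementation; the function names, the whitespace tokenizer, and the four toy messages are all hypothetical and stand in for the UCI SMS Spam data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(messages, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(messages, labels):
        tokens = text.lower().split()  # assumption: naive whitespace tokenizer
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen during training
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the UCI SMS Spam collection.
TOY_MESSAGES = [
    "win a free prize now",
    "free cash offer claim now",
    "are we meeting for lunch today",
    "see you at home tonight",
]
TOY_LABELS = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(TOY_MESSAGES, TOY_LABELS)
print(predict(model, "claim your free prize"))
print(predict(model, "lunch at home today"))
```

A comparison such as the one the abstract describes would train each of the five classifiers on the same split and report accuracy per tool; the sketch covers only the classifier itself.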
Greene, Casey S; Tan, Jie; Ung, Matthew; Moore, Jason H; Cheng, Chao
2014-12-01
Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the "big data" era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both "machine learning" algorithms as well as "unsupervised" and "supervised" examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. © 2014 Wiley Periodicals, Inc.
Machine Learning for Detecting Gene-Gene Interactions
McKinney, Brett A.; Reif, David M.; Ritchie, Marylyn D.; Moore, Jason H.
2011-01-01
Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are ‘the norm’ and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics. PMID:16722772
NASA Astrophysics Data System (ADS)
Leighs, J. A.; Halling-Brown, M. D.; Patel, M. N.
2018-03-01
The UK currently has a national breast cancer-screening program and images are routinely collected from a number of screening sites, representing a wealth of invaluable data that is currently under-used. Radiologists evaluate screening images manually and recall suspicious cases for further analysis such as biopsy. Histological testing of biopsy samples confirms the malignancy of the tumour, along with other diagnostic and prognostic characteristics such as disease grade. Machine learning is becoming increasingly popular for clinical image classification problems, as it is capable of discovering patterns in data otherwise invisible. This is particularly true when applied to medical imaging features; however clinical datasets are often relatively small. A texture feature extraction toolkit has been developed to mine a wide range of features from medical images such as mammograms. This study analysed a dataset of 1,366 radiologist-marked, biopsy-proven malignant lesions obtained from the OPTIMAM Medical Image Database (OMI-DB). Exploratory data analysis methods were employed to better understand extracted features. Machine learning techniques including Classification and Regression Trees (CART), ensemble methods (e.g. random forests), and logistic regression were applied to the data to predict the disease grade of the analysed lesions. Prediction scores of up to 83% were achieved; sensitivity and specificity of the models trained have been discussed to put the results into a clinical context. The results show promise in the ability to predict prognostic indicators from the texture features extracted and thus enable prioritisation of care for patients at greatest risk.
New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases
NASA Astrophysics Data System (ADS)
Brescia, Massimo
2012-11-01
Data mining, or Knowledge Discovery in Databases (KDD), is the main methodology for extracting the scientific information contained in Massive Data Sets (MDS), but it must tackle crucial problems, since it has to orchestrate complex challenges posed by transparent access to different computing environments, scalability of algorithms, and reusability of resources. To achieve a leap forward for the progress of e-science in the data avalanche era, the community needs to implement an infrastructure capable of performing data access, processing and mining in a distributed but integrated context. The increasing complexity of modern technologies has produced a huge volume of data, and the related warehouse management, together with the need to optimize analysis and mining procedures, has led to a conceptual change in modern science. Classical data exploration, based on the user's own local data storage and limited computing infrastructures, is no longer efficient in the case of MDS, which are spread worldwide over inhomogeneous data centres and require teraflop processing power. In this context, modern experimental and observational science requires a good understanding of computer science, network infrastructures, data mining, etc., i.e. of all those techniques that fall into the domain of so-called e-science (recently assessed also by the Fourth Paradigm of Science). Such understanding is almost completely absent in the older generations of scientists, and this is reflected in the inadequacy of most academic and research programs. A paradigm shift is needed: statistical pattern recognition, object-oriented programming, distributed computing, and parallel programming need to become an essential part of the scientific background. A possible practical solution is to provide the research community with easy-to-understand, easy-to-use tools based on Web 2.0 technologies and machine learning methodology: tools in which almost all the complexity is hidden from the end user, but which are still flexible and able to produce efficient and reliable scientific results. All these considerations are described in detail in the chapter. Moreover, examples of modern applications that offer a wide variety of e-science communities a large spectrum of computational facilities to exploit the wealth of available massive data sets and powerful machine learning and statistical algorithms are also introduced.
A microcomputer network for control of a continuous mining machine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schiffbauer, W.H.
1993-12-31
This report details a microcomputer-based control and monitoring network that was developed in-house by the U.S. Bureau of Mines and installed on a continuous mining machine. The network consists of microcomputers that are connected together via a single twisted-pair cable. Each microcomputer was developed to provide a particular function in the control process. Machine-mounted microcomputers, in conjunction with the appropriate sensors, provide closed-loop control of the machine, navigation, and environmental monitoring. Off-the-machine microcomputers provide remote control of the machine, sensor status, and a connection to the network so that external computers can access network data and control the continuous mining machine. Because of the network's generic structure, it can be installed on most mining machines.
Unmanned Mine of the 21st Centuries
NASA Astrophysics Data System (ADS)
Semykina, Irina; Grigoryev, Aleksandr; Gargayev, Andrey; Zavyalov, Valeriy
2017-11-01
The article is analytical. It considers the construction principles of an automation system structure that realizes the concept of the «unmanned mine». All of these principles are intended to deal with problems caused by the continuing complication of mining and geological conditions at coal mines, such as labor safety and health protection, the weak integration of different mining automation subsystems, and the lack of an optimal balance between the quantity of resources and energy consumed by mining machines and their throughput. The authors describe the main problems and bottlenecks in the autonomous operation of mining machines and in automation subsystems. The article gives a general survey of «unmanned technology» applied in the field of mining, such as remotely operated autonomous complexes and underground positioning systems for mining machines that use infrared radiation in mine workings. The concept of the «unmanned mine» is considered using the example of a robotic road heading machine. Finally, the authors analyze the techniques and methods that could solve the task of underground mining without human labor.
NASA Astrophysics Data System (ADS)
Hoffmann, Achim; Mahidadia, Ashesh
The purpose of this chapter is to present fundamental ideas and techniques of machine learning suitable for the field of this book, i.e., for automated scientific discovery. The chapter focuses on those symbolic machine learning methods that produce results suitable to be interpreted and understood by humans. This is particularly important in the context of automated scientific discovery, as the scientific theories to be produced by machines are usually meant to be interpreted by humans. This chapter contains some of the most influential ideas and concepts in machine learning research to give the reader a basic insight into the field. After the introduction in Sect. 1, general ideas of how learning problems can be framed are given in Sect. 2. The section provides useful perspectives to better understand what learning algorithms actually do. Section 3 presents the Version space model, which is an early learning algorithm as well as a conceptual framework that provides important insight into the general mechanisms behind most learning algorithms. In Sect. 4, a family of learning algorithms, the AQ family for learning classification rules, is presented. The AQ family belongs to the early approaches in machine learning. Next, Sect. 5 presents the basic principles of decision tree learners. Decision tree learners belong to the most influential class of inductive learning algorithms today. Finally, a more recent group of learning systems, which learn relational concepts within the framework of logic programming, is presented in Sect. 6. This is a particularly interesting group of learning systems, since the framework also allows background knowledge to be incorporated, which may assist in generalisation. Section 7 discusses Association Rules, a technique that comes from the related field of data mining. Section 8 presents the basic idea of the Naive Bayesian Classifier.
While this is a very popular learning technique, the learning result is not well suited for human comprehension as it is essentially a large collection of probability values. In Sect. 9, we present a generic method for improving the accuracy of a given learner by generating multiple classifiers using variations of the training data. While this works well in most cases, the resulting classifiers have significantly increased complexity and, hence, tend to destroy the human readability of the learning result that a single learner may produce. Section 10 contains a summary, briefly mentions other techniques not discussed in this chapter, and presents an outlook on the potential of machine learning in the future.
Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J; Bao, Forrest Sheng
2016-04-01
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux (http://mflux.org) that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species.
Wu, Stephen Gang; Wang, Yuxuan; Jiang, Wu; Oyetunde, Tolutola; Yao, Ruilian; Zhang, Xuehong; Shimizu, Kazuyuki; Tang, Yinjie J.; Bao, Forrest Sheng
2016-01-01
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform MFlux (http://mflux.org) that predicts the bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolisms. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search of the best parameter set for each algorithm and verified their performance through 10-fold cross validations. SVM yields the highest accuracy among all three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Due to the interest of studying model organism under particular carbon sources, bias of fluxome in the dataset may limit the applicability of machine learning models. This problem can be resolved after more papers on 13C-MFA are published for non-model species. PMID:27092947
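The grid search with 10-fold cross-validation mentioned in the MFlux abstract can be sketched in a few lines. This is only an illustration under stated assumptions: it tunes the neighbourhood size of a one-dimensional k-NN regressor on synthetic data, whereas MFlux works on real fluxomic features and also tunes SVM and decision tree models. All function names, the candidate values of k, and the data are hypothetical.

```python
import random


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to `query`."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


def cross_val_mse(xs, ys, k, n_folds=10):
    """Mean squared error of k-NN regression under n-fold cross-validation."""
    indices = list(range(len(xs)))
    folds = [indices[f::n_folds] for f in range(n_folds)]  # disjoint, near-equal folds
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in indices if i not in held]
        train_y = [ys[i] for i in indices if i not in held]
        for i in held_out:
            prediction = knn_predict(train_x, train_y, xs[i], k)
            squared_errors.append((prediction - ys[i]) ** 2)
    return sum(squared_errors) / len(squared_errors)


def grid_search(xs, ys, candidate_ks, n_folds=10):
    """Score every candidate k and return the one with the lowest CV error."""
    scores = {k: cross_val_mse(xs, ys, k, n_folds) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in data: a noisy linear relationship (not fluxomic data).
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

best_k, scores = grid_search(xs, ys, candidate_ks=[1, 3, 5, 9, 15])
print(best_k, scores[best_k])
```

Every held-out point is predicted by a model that never saw it, so the score that drives the parameter choice is an out-of-sample estimate rather than a fit to the training data.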
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research
ERIC Educational Resources Information Center
He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne
2018-01-01
In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…
30 CFR 18.54 - High-voltage continuous mining machines.
Code of Federal Regulations, 2010 CFR
2010-07-01
... and Design Requirements § 18.54 High-voltage continuous mining machines. (a) Separation of high... removed. (c) Circuit-interrupting devices. Circuit-interrupting devices must be designed and installed to... ground. (e) Onboard ungrounded, three-phase power circuit. A continuous mining machine designed with an...
30 CFR 18.54 - High-voltage continuous mining machines.
Code of Federal Regulations, 2013 CFR
2013-07-01
... and Design Requirements § 18.54 High-voltage continuous mining machines. (a) Separation of high... removed. (c) Circuit-interrupting devices. Circuit-interrupting devices must be designed and installed to... ground. (e) Onboard ungrounded, three-phase power circuit. A continuous mining machine designed with an...
30 CFR 18.54 - High-voltage continuous mining machines.
Code of Federal Regulations, 2014 CFR
2014-07-01
... and Design Requirements § 18.54 High-voltage continuous mining machines. (a) Separation of high... removed. (c) Circuit-interrupting devices. Circuit-interrupting devices must be designed and installed to... ground. (e) Onboard ungrounded, three-phase power circuit. A continuous mining machine designed with an...
30 CFR 18.54 - High-voltage continuous mining machines.
Code of Federal Regulations, 2012 CFR
2012-07-01
... and Design Requirements § 18.54 High-voltage continuous mining machines. (a) Separation of high... removed. (c) Circuit-interrupting devices. Circuit-interrupting devices must be designed and installed to... ground. (e) Onboard ungrounded, three-phase power circuit. A continuous mining machine designed with an...
30 CFR 18.54 - High-voltage continuous mining machines.
Code of Federal Regulations, 2011 CFR
2011-07-01
... and Design Requirements § 18.54 High-voltage continuous mining machines. (a) Separation of high... removed. (c) Circuit-interrupting devices. Circuit-interrupting devices must be designed and installed to... ground. (e) Onboard ungrounded, three-phase power circuit. A continuous mining machine designed with an...
Semisupervised Support Vector Machines With Tangent Space Intrinsic Manifold Regularization.
Sun, Shiliang; Xie, Xijiong
2016-09-01
Semisupervised learning has been an active research topic in machine learning and data mining. One main reason is that labeling examples is expensive and time-consuming, while there are large numbers of unlabeled examples available in many practical problems. So far, Laplacian regularization has been widely used in semisupervised learning. In this paper, we propose a new regularization method called tangent space intrinsic manifold regularization. It is intrinsic to data manifold and favors linear functions on the manifold. Fundamental elements involved in the formulation of the regularization are local tangent space representations, which are estimated by local principal component analysis, and the connections that relate adjacent tangent spaces. Simultaneously, we explore its application to semisupervised classification and propose two new learning algorithms called tangent space intrinsic manifold regularized support vector machines (TiSVMs) and tangent space intrinsic manifold regularized twin SVMs (TiTSVMs). They effectively integrate the tangent space intrinsic manifold regularization consideration. The optimization of TiSVMs can be solved by a standard quadratic programming, while the optimization of TiTSVMs can be solved by a pair of standard quadratic programmings. The experimental results of semisupervised classification problems show the effectiveness of the proposed semisupervised learning algorithms.
Toward Usable Interactive Analytics: Coupling Cognition and Computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Endert, Alexander; North, Chris; Chang, Remco
Interactive analytics provide users a myriad of computational means to aid in extracting meaningful information from large and complex datasets. Much prior work focuses either on advancing the capabilities of machine-centric approaches by the data mining and machine learning communities, or human-driven methods by the visualization and CHI communities. However, these methods do not yet support a true human-machine symbiotic relationship where users and machines work together collaboratively and adapt to each other to advance an interactive analytic process. In this paper we discuss some of the inherent issues, outlining what we believe are the steps toward usable interactive analytics that will ultimately increase the effectiveness for both humans and computers to produce insights.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-07-30
... Coal Mines ACTION: Notice. SUMMARY: The Department of Labor (DOL) is submitting the Mine Safety and... Continuous Mining Machines Standards for Underground Coal Mines,'' to the Office of Management and Budget... continuous mining machines (HVCMM) in underground coal mines by requiring records of testing, examination and...
Real-time individualized training vectors for experiential learning.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Willis, Matt; Tucker, Eilish Marie; Raybourn, Elaine Marie
2011-01-01
Military training utilizing serious games or virtual worlds potentially generates data that can be mined to better understand how trainees learn in experiential exercises. Few data mining approaches for deployed military training games exist. Opportunities exist to collect and analyze these data, as well as to construct a full-history learner model. Outcomes discussed in the present document include results from a quasi-experimental research study on military game-based experiential learning, the deployment of an online game for training evidence collection, and results from a proof-of-concept pilot study on the development of individualized training vectors. This Lab Directed Research & Development (LDRD) project leveraged products within projects, such as Titan (Network Grand Challenge), Real-Time Feedback and Evaluation System, (America's Army Adaptive Thinking and Leadership, DARWARS Ambush! NK), and Dynamic Bayesian Networks to investigate whether machine learning capabilities could perform real-time, in-game similarity vectors of learner performance, toward adaptation of content delivery, and quantitative measurement of experiential learning.
30 CFR 18.96 - Preparation of machines for inspection; requirements.
Code of Federal Regulations, 2010 CFR
2010-07-01
... TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Field Approval of Electrically Operated Mining Equipment § 18.96 Preparation of machines for inspection... place at which a field approval investigation will be conducted with respect to any machine, the...
30 CFR 18.96 - Preparation of machines for inspection; requirements.
Code of Federal Regulations, 2011 CFR
2011-07-01
... TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Field Approval of Electrically Operated Mining Equipment § 18.96 Preparation of machines for inspection... place at which a field approval investigation will be conducted with respect to any machine, the...
Mining Twitter Data to Improve Detection of Schizophrenia
McManus, Kimberly; Mallory, Emily K.; Goldfeder, Rachel L.; Haynes, Winston A.; Tatum, Jonathan D.
2015-01-01
Individuals who suffer from schizophrenia comprise 1 percent of the United States population and are four times more likely to die of suicide than the general US population. Identification of at-risk individuals with schizophrenia is challenging when they do not seek treatment. Microblogging platforms allow users to share their thoughts and emotions with the world in short snippets of text. In this work, we leveraged the large corpus of Twitter posts and machine-learning methodologies to detect individuals with schizophrenia. Using features from tweets such as emoticon use, posting time of day, and dictionary terms, we trained, built, and validated several machine learning models. Our support vector machine model achieved the best performance with 92% precision and 71% recall on the held-out test set. Additionally, we built a web application that dynamically displays summary statistics between cohorts. This enables outreach to undiagnosed individuals, improved physician diagnoses, and destigmatization of schizophrenia. PMID:26306253
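To make the reported precision and recall concrete, the short sketch below computes both from confusion-matrix counts. The counts are hypothetical, chosen only so that the results round to the 92% and 71% quoted in the abstract; the actual test-set counts are not given there.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 100 true cases, 71 found, plus 6 false alarms.
precision, recall = precision_recall(
    true_positives=71, false_positives=6, false_negatives=29
)
f1 = f1_score(precision, recall)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these counts, most flagged accounts are correct (high precision), while 29 of every 100 true cases go undetected (lower recall).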
Machine Learning methods for Quantitative Radiomic Biomarkers.
Parmar, Chintan; Grossmann, Patrick; Bussink, Johan; Lambin, Philippe; Aerts, Hugo J W L
2015-08-17
Radiomics extracts and mines a large number of medical imaging features quantifying tumor phenotypic characteristics. Highly accurate and reliable machine-learning approaches can drive the success of radiomic applications in clinical care. In this radiomic study, fourteen feature selection methods and twelve classification methods were examined in terms of their performance and stability for predicting overall survival. A total of 440 radiomic features were extracted from pre-treatment computed tomography (CT) images of 464 lung cancer patients. To ensure the unbiased evaluation of different machine-learning methods, publicly available implementations along with reported parameter configurations were used. Furthermore, we used two independent radiomic cohorts for training (n = 310 patients) and validation (n = 154 patients). We identified that the Wilcoxon test based feature selection method WLCX (stability = 0.84 ± 0.05, AUC = 0.65 ± 0.02) and the random forest classification method RF (RSD = 3.52%, AUC = 0.66 ± 0.03) had the highest prognostic performance with high stability against data perturbation. Our variability analysis indicated that the choice of classification method is the most dominant source of performance variation (34.21% of total variance). Identification of optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.
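One common way to use a Wilcoxon rank-sum test for feature selection is to score each feature by how well its values separate the two outcome groups. The sketch below does this through the equivalent Mann-Whitney AUC. It is an assumption about the general idea, not the study's WLCX implementation; the feature names, values, and binary outcomes are hypothetical.

```python
def rank_sum_auc(positives, negatives):
    """AUC via the Mann-Whitney U statistic: the share of (positive, negative)
    pairs in which the positive value is larger, counting ties as one half."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rank_features(feature_table, outcomes):
    """Order feature names by |AUC - 0.5|, most discriminative first."""
    scores = {}
    for name, values in feature_table.items():
        positives = [v for v, y in zip(values, outcomes) if y == 1]
        negatives = [v for v, y in zip(values, outcomes) if y == 0]
        scores[name] = abs(rank_sum_auc(positives, negatives) - 0.5)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: six patients, binary outcome, three made-up features.
outcomes = [1, 1, 1, 0, 0, 0]
features = {
    "texture_a": [5.1, 4.8, 5.5, 2.0, 2.3, 1.9],
    "texture_b": [1.0, 3.0, 2.0, 2.5, 1.5, 3.5],
    "shape_c": [0.9, 0.2, 0.8, 0.3, 0.7, 0.1],
}

print(rank_features(features, outcomes))
```

Taking the distance from 0.5 treats a feature that is consistently lower in the positive group as just as informative as one that is consistently higher.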
2011-01-01
Background: The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. Results: Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). Conclusion: The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions.
PMID:21834981
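The abstract above scores each MARS model with the Generalized Cross-Validation (GCV) criterion but does not spell out the formula. A minimal sketch of the commonly cited form follows: the mean squared residual divided by a complexity penalty. The function name, the argument names, and the per-knot penalty of 3.0 are assumptions for illustration, not values taken from the paper.

```python
def gcv_score(y_true, y_pred, n_basis, n_knots, penalty=3.0):
    """Generalized Cross-Validation score (lower is better).

    effective_params = n_basis + penalty * n_knots is the usual MARS
    complexity charge; penalty=3.0 is an assumed, commonly used default.
    """
    n = len(y_true)
    # Residual sum of squares between observed and fitted responses.
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    effective_params = n_basis + penalty * n_knots
    if effective_params >= n:
        raise ValueError("model is too complex for the number of observations")
    # Mean squared error inflated by the complexity penalty.
    return (rss / n) / (1.0 - effective_params / n) ** 2
```

Comparing enrichment-estimation methods then amounts to fitting one model per method on the same genes and preferring the method whose model yields the lower score.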
Water spray ventilator system for continuous mining machines
Page, Steven J.; Mal, Thomas
1995-01-01
The invention relates to a water spray ventilator system mounted on a continuous mining machine to streamline airflow and provide effective face ventilation of both respirable dust and methane in underground coal mines. This system has two side spray nozzles mounted one on each side of the mining machine and six spray nozzles disposed on a manifold mounted to the underside of the machine boom. The six spray nozzles are angularly and laterally oriented on the manifold so as to provide non-overlapping spray patterns along the length of the cutter drum.
Applications of Deep Learning and Reinforcement Learning to Biological Data.
Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano
2018-06-01
Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promise to revolutionize the future of artificial intelligence. The growth in computational power accompanied by faster and increased data storage, and declining computing costs have already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.
Diamond Eye: a distributed architecture for image data mining
NASA Astrophysics Data System (ADS)
Burl, Michael C.; Fowlkes, Charless; Roden, Joe; Stechert, Andre; Mukhtar, Saleem
1999-02-01
Diamond Eye is a distributed software architecture, which enables users (scientists) to analyze large image collections by interacting with one or more custom data mining servers via a Java applet interface. Each server is coupled with an object-oriented database and a computational engine, such as a network of high-performance workstations. The database provides persistent storage and supports querying of the 'mined' information. The computational engine provides parallel execution of expensive image processing, object recognition, and query-by-content operations. Key benefits of the Diamond Eye architecture are: (1) the design promotes trial evaluation of advanced data mining and machine learning techniques by potential new users (all that is required is to point a web browser to the appropriate URL), (2) software infrastructure that is common across a range of science mining applications is factored out and reused, and (3) the system facilitates closer collaborations between algorithm developers and domain experts.
ERIC Educational Resources Information Center
Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank
2012-01-01
Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…
ERIC Educational Resources Information Center
Jarman, Jay
2011-01-01
This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms,…
75 FR 20918 - High-Voltage Continuous Mining Machine Standard for Underground Coal Mines
Federal Register 2010, 2011, 2012, 2013, 2014
2010-04-22
... DEPARTMENT OF LABOR Mine Safety and Health Administration 30 CFR Parts 18 and 75 RIN 1219-AB34 High-Voltage Continuous Mining Machine Standard for Underground Coal Mines Correction In rule document 2010-7309 beginning on page 17529 in the issue of Tuesday, April 6, 2010, make the following correction...
The accident analysis of mobile mine machinery in Indian opencast coal mines.
Kumar, R; Ghosh, A K
2014-01-01
This paper presents an analysis of accidents related to large mining machinery in Indian opencast coal mines. The trends of coal production, share of mining methods in production, machinery deployment in opencast mines, size and population of machinery, accidents due to machinery, and types and causes of accidents have been analysed for the years 1995 to 2008. Scrutiny of accidents during this period reveals that the main responsible factors are machine reversal, haul road design, human fault, operator's fault, machine fault, visibility and dump design. Considering the types of machines together, namely dumpers, excavators, dozers and loaders, the maximum number of fatal accidents was caused jointly by operator's faults and human faults during the period from 1995 to 2008. The novel finding of this analysis is that large machines with state-of-the-art safety systems did not reduce fatal accidents in Indian opencast coal mines.
High pressure water jet mining machine
Barker, Clark R.
1981-05-05
A high pressure water jet mining machine for the longwall mining of coal is described. The machine is generally in the shape of a plowshare and is advanced in the direction in which the coal is cut. The machine has mounted thereon a plurality of nozzle modules each containing a high pressure water jet nozzle disposed to oscillate in a particular plane. The nozzle modules are oriented to cut in vertical and horizontal planes on the leading edge of the machine and the coal so cut is cleaved off by the wedge-shaped body.
Human factors model concerning the man-machine interface of mining crewstations
NASA Technical Reports Server (NTRS)
Rider, James P.; Unger, Richard L.
1989-01-01
The U.S. Bureau of Mines is developing a computer model to analyze the human factors aspect of mining machine operator compartments. The model will be used as a research tool and as a design aid. It will have the capability to perform the following: simulated anthropometric or reach assessment, visibility analysis, illumination analysis, structural analysis of the protective canopy, operator fatigue analysis, and computation of an ingress-egress rating. The model will make extensive use of graphics to simplify data input and output. Two dimensional orthographic projections of the machine and its operator compartment are digitized and the data rebuilt into a three dimensional representation of the mining machine. Anthropometric data from either an individual or any size population may be used. The model is intended for use by equipment manufacturers and mining companies during initial design work on new machines. In addition to its use in machine design, the model should prove helpful as an accident investigation tool and for determining the effects of machine modifications made in the field on the critical areas of visibility and control reach ability.
NASA Astrophysics Data System (ADS)
Koptev, V. Yu
2017-02-01
The work presents the results of studying the basic interconnected criteria of individual equipment units in a transport network machine fleet, as they depend on production and mining factors, with the aim of improving transport systems management. Justifying the selection of a control system requires new methodologies and models, augmented with stability and transport flow criteria, that account for the dynamics of mining work development on mining sites. A necessary condition is accounting for the technical and operating parameters related to vehicle operation. Modern open pit mining dispatching systems must include this kind of information database. An algorithm for forming a machine fleet is presented, based on solving a multi-variant task in connection with defining reasonable operating features of a machine working as part of a complex. The proposals in the work may apply to mining machines (drilling equipment, excavators), construction equipment (bulldozers, cranes, pile-drivers), city transport, and other types of production activity that use a machine fleet.
Opinion mining on book review using CNN-L2-SVM algorithm
NASA Astrophysics Data System (ADS)
Rozi, M. F.; Mukhlash, I.; Soetrisno; Kimura, M.
2018-03-01
Reviews of a product can reflect the quality of the product itself. Extracting information from a review can reveal the sentiment of the opinion it expresses. The process of extracting useful information from user reviews is called opinion mining. Deep learning models are increasingly used for review extraction, and many researchers have used them to obtain excellent performance in natural language processing. In this research, one deep learning model, the Convolutional Neural Network (CNN), is used for feature extraction, with an L2 Support Vector Machine (SVM) as the classifier. These methods are applied to determine the sentiment of book review data. The method shows state-of-the-art performance of 83.23% for the training phase and 64.6% for the testing phase.
30 CFR 56.14107 - Moving machine parts.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Moving machine parts. 56.14107 Section 56.14107 Mineral Resources MINE SAFETY AND HEALTH ADMINISTRATION, DEPARTMENT OF LABOR METAL AND NONMETAL MINE... Safety Devices and Maintenance Requirements § 56.14107 Moving machine parts. (a) Moving machine parts...
30 CFR 57.14107 - Moving machine parts.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Moving machine parts. 57.14107 Section 57.14107 Mineral Resources MINE SAFETY AND HEALTH ADMINISTRATION, DEPARTMENT OF LABOR METAL AND NONMETAL MINE... Equipment Safety Devices and Maintenance Requirements § 57.14107 Moving machine parts. (a) Moving machine...
NASA Astrophysics Data System (ADS)
Biały, Witold
2017-06-01
Failure frequency in the mining process, with a focus on the mining machine, has been presented and illustrated by the example of two coal-mines. Two mining systems have been subjected to analysis: a cutter-loader and a plough system. In order to reduce costs generated by failures, maintenance teams should regularly make sure that the machines are used and operated in a rational and effective way. Such activities will allow downtimes to be reduced, and, in consequence, will increase the effectiveness of a mining plant. The evaluation of mining machines' failure frequency contained in this study has been based on one of the traditional quality management tools - the Pareto chart.
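The Pareto analysis named in the abstract above reduces to sorting failure causes by frequency and accumulating their share of the total. The sketch below shows that computation; the function name and the cause labels in the usage note are hypothetical and do not come from the study.

```python
from collections import Counter

def pareto_table(failure_causes):
    """Return (cause, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(failure_causes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    # Sort by descending count; break ties alphabetically for stable output.
    for cause, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        cumulative += count
        rows.append((cause, count, 100.0 * cumulative / total))
    return rows
```

For a made-up log such as `["conveyor", "conveyor", "conveyor", "shearer"]`, the first row covers 75% of failures, which is the kind of "vital few" signal a Pareto chart is meant to expose.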
Chang, Ni-Bin; Bai, Kaixu; Chen, Chi-Farn
2017-10-01
Monitoring water quality changes in lakes, reservoirs, estuaries, and coastal waters is critical in response to the needs for sustainable development. This study develops a remote sensing-based multiscale modeling system by integrating multi-sensor satellite data merging and image reconstruction algorithms in support of feature extraction with machine learning leading to automate continuous water quality monitoring in environmentally sensitive regions. This new Earth observation platform, termed "cross-mission data merging and image reconstruction with machine learning" (CDMIM), is capable of merging multiple satellite imageries to provide daily water quality monitoring through a series of image processing, enhancement, reconstruction, and data mining/machine learning techniques. Two existing key algorithms, including Spectral Information Adaptation and Synthesis Scheme (SIASS) and SMart Information Reconstruction (SMIR), are highlighted to support feature extraction and content-based mapping. Whereas SIASS can support various data merging efforts to merge images collected from cross-mission satellite sensors, SMIR can overcome data gaps by reconstructing the information of value-missing pixels due to impacts such as cloud obstruction. Practical implementation of CDMIM was assessed by predicting the water quality over seasons in terms of the concentrations of nutrients and chlorophyll-a, as well as water clarity in Lake Nicaragua, providing synergistic efforts to better monitor the aquatic environment and offer insightful lake watershed management strategies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Anomaly detection using temporal data mining in a smart home environment.
Jakkula, V; Cook, D J
2008-01-01
To many people, home is a sanctuary. With the maturing of smart home technologies, many people with cognitive and physical disabilities can lead independent lives in their own homes for extended periods of time. In this paper, we investigate the design of machine learning algorithms that support this goal. We hypothesize that machine learning algorithms can be designed to automatically learn models of resident behavior in a smart home, and that the results can be used to perform automated health monitoring and to detect anomalies. Specifically, our algorithms draw upon the temporal nature of sensor data collected in a smart home to build a model of expected activities and to detect unexpected, and possibly health-critical, events in the home. We validate our algorithms using synthetic data and real activity data collected from volunteers in an automated smart environment. The results from our experiments support our hypothesis that a model can be learned from observed smart home data and used to report anomalies, as they occur, in a smart home.
Quantum-enhanced feature selection with forward selection and backward elimination
NASA Astrophysics Data System (ADS)
He, Zhimin; Li, Lvzhou; Huang, Zhiming; Situ, Haozhen
2018-07-01
Feature selection is a well-known preprocessing technique in machine learning, which can remove irrelevant features to improve the generalization capability of a classifier and reduce training and inference time. However, feature selection is time-consuming, particularly for applications that have thousands of features, such as image retrieval, text mining and microarray data analysis. It is crucial to accelerate the feature selection process. We propose a quantum version of wrapper-based feature selection, which converts a classical feature selection to its quantum counterpart. It is valuable for machine learning on a quantum computer. In this paper, we focus on two popular kinds of feature selection methods, i.e., wrapper-based forward selection and backward elimination. The proposed feature selection algorithm can quadratically accelerate the classical one.
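For orientation, the classical wrapper-based forward selection that the abstract takes as its starting point can be sketched as a greedy loop: at each round, add the single feature that most improves a scoring function, and stop when nothing helps. This is only the classical baseline, not the quantum algorithm; the `score` callback stands in for training and evaluating a classifier and is an assumption of this sketch.

```python
def forward_selection(features, score, max_features=None):
    """Greedy wrapper-based forward selection.

    `score(subset)` is a caller-supplied (hypothetical) function returning
    a quality measure for a list of features; higher is better.
    """
    selected = []
    remaining = list(features)
    best_score = score(selected)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate every candidate extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no single feature improves the score any further
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score
```

Each round evaluates every remaining feature, so the number of `score` calls grows with the feature count; that search over candidates is the step a quadratic speed-up would target.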
Use of IT platform in determination of efficiency of mining machines
NASA Astrophysics Data System (ADS)
Brodny, Jarosław; Tutak, Magdalena
2018-01-01
Determining how effectively mining equipment is used is very important for mining enterprises. Because the costs of purchasing and leasing equipment are high, these enterprises aim to make the best use of their technical potential. However, the specifics of mining production mean that this process does not always proceed without disruption. Practical experience shows that determining an objective measure of machine utilization in a mining enterprise is not simple. This paper proposes a solution to this problem, using an IT platform and the overall efficiency model (OEE). This model makes it possible to evaluate a machine in terms of its availability, performance, and product quality, and constitutes a quantitative tool of the TPM strategy. Adapted to the specifics of the mining industry, the OEE model, together with data acquired from an industrial automation system, made it possible to determine the partial indicators and the overall efficiency of the tested machines. Studies were performed for a set of machines used directly in the coal extraction process: a longwall shearer, an armoured face conveyor, and a beam stage loader. The results clearly indicate that the degree to which mining enterprises use their machines is unsatisfactory. Use of IT platforms will significantly facilitate the registration, archiving, and analytical processing of the acquired data. The paper presents a methodology for determining the partial indices and the total OEE, together with a practical example of its application to the investigated set of machines. The IT platform is also characterized in terms of its construction, function, and application.
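The three partial indicators and the overall OEE value mentioned above combine by simple multiplication. The sketch below uses the standard textbook definitions; the argument names and the numbers in the usage note are illustrative assumptions, not data from the paper.

```python
def oee(planned_time, run_time, ideal_cycle_time, total_count, good_count):
    """Return (availability, performance, quality, overall OEE).

    Times share one unit (e.g. minutes); ideal_cycle_time is the ideal
    time per unit of output; counts are units of output.
    """
    availability = run_time / planned_time                    # share of planned time actually run
    performance = (ideal_cycle_time * total_count) / run_time  # actual vs. ideal output rate
    quality = good_count / total_count                         # share of output that is acceptable
    return availability, performance, quality, availability * performance * quality
```

For example, a hypothetical 480-minute shift with 360 minutes of running, an ideal cycle of 1 minute per unit, 300 units produced and 270 acceptable gives an availability of 0.75, a performance of about 0.83, a quality of 0.90, and an overall OEE of about 0.56.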
Shouval, Roni; Hadanny, Amir; Shlomo, Nir; Iakobishvili, Zaza; Unger, Ron; Zahger, Doron; Alcalai, Ronny; Atar, Shaul; Gottlieb, Shmuel; Matetzky, Shlomi; Goldenberg, Ilan; Beigel, Roy
2017-11-01
Risk scores for prediction of mortality 30 days following an ST-segment elevation myocardial infarction (STEMI) have been developed using a conventional statistical approach. This study aimed to evaluate an array of machine learning (ML) algorithms for prediction of mortality at 30 days in STEMI patients and to compare these to the conventional validated risk scores. This was a retrospective, supervised learning, data mining study. Out of a cohort of 13,422 patients from the Acute Coronary Syndrome Israeli Survey (ACSIS) registry, 2782 patients fulfilled inclusion criteria and 54 variables were considered. Prediction models for overall mortality 30 days after STEMI were developed using 6 ML algorithms. Models were compared to each other and to the Global Registry of Acute Coronary Events (GRACE) and Thrombolysis In Myocardial Infarction (TIMI) scores. Depending on the algorithm, using all available variables, prediction models' performance measured in an area under the receiver operating characteristic curve (AUC) ranged from 0.64 to 0.91. The best models performed similarly to the Global Registry of Acute Coronary Events (GRACE) score (0.87 SD 0.06) and outperformed the Thrombolysis In Myocardial Infarction (TIMI) score (0.82 SD 0.06, p<0.05). Performance of most algorithms plateaued when introduced with 15 variables. Among the top predictors were creatinine, Killip class on admission, blood pressure, glucose level, and age. We present a data mining approach for prediction of mortality post-ST-segment elevation myocardial infarction. The algorithms selected showed competence in prediction across an increasing number of variables. ML may be used for outcome prediction in complex cardiology settings. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
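The AUC used above to compare models has a direct interpretation: it is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as half. A minimal sketch of that pairwise definition follows; the function name and the toy inputs in the usage note are assumptions, and the quadratic loop is meant for clarity rather than for registry-sized data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. death within 30 days), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four positive-negative pairs are ordered correctly, so the function returns 0.75.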
Service Modules for Coal Extraction
NASA Technical Reports Server (NTRS)
Gangal, M. D.; Lewis, E. V.
1985-01-01
Service train follows group of mining machines, paying out utility lines as machines progress into coal face. Service train for four mining machines removes gases and coal and provides water and electricity. Flexible, coiling armored carriers protect cables and hoses. High coal production attained by arraying row of machines across face, working side by side.
NASA Technical Reports Server (NTRS)
Gangal, M. D.; Isenberg, L.; Lewis, E. V.
1985-01-01
Proposed system offers safety and large return on investment. System, operating by year 2000, employs machines and processes based on proven principles. According to concept, line of parallel machines, connected in groups of four to service modules, attacks face of coal seam. High-pressure water jets and central auger on each machine break face. Jaws scoop up coal chunks, and auger grinds them and forces fragments into slurry-transport system. Slurry pumped through pipeline to point of use. Concept for highly automated coal-mining system increases productivity, makes mining safer, and protects health of mine workers.
TEES 2.2: Biomedical Event Extraction for Diverse Corpora
2015-01-01
Background The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. Results The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. Conclusions The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction are presented. PMID:26551925
TEES 2.2: Biomedical Event Extraction for Diverse Corpora.
Björne, Jari; Salakoski, Tapio
2015-01-01
The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction are presented.
Comprehensive decision tree models in bioinformatics.
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
2012-01-01
Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. 
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.
Comprehensive Decision Tree Models in Bioinformatics
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
2012-01-01
Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. 
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449
Digital Family History Data Mining with Neural Networks: A Pilot Study.
Hoyt, Robert; Linnville, Steven; Thaler, Stephen; Moore, Jeffrey
2016-01-01
Following the passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009, electronic health records were widely adopted by eligible physicians and hospitals in the United States. Stage 2 meaningful use menu objectives include a digital family history but no stipulation as to how that information should be used. A variety of data mining techniques now exist for these data, which include artificial neural networks (ANNs) for supervised or unsupervised machine learning. In this pilot study, we applied an ANN-based simulation to a previously reported digital family history to mine the database for trends. A graphical user interface was created to display the input of multiple conditions in the parents and output as the likelihood of diabetes, hypertension, and coronary artery disease in male and female offspring. The results of this pilot study show promise in using ANNs to data mine digital family histories for clinical and research purposes.
New directions in biomedical text annotation: definitions, guidelines and corpus construction
Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit
2006-01-01
Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. 
We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available. PMID:16867190
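The abstract above reports 70–80% inter-annotator agreement without stating how it was computed. One simple reading is average pairwise agreement: over every pair of annotators and every sentence, the fraction of cases where both assigned the same label. The sketch below implements that reading as an assumption; the function name and the labels in the usage note are hypothetical, and chance-corrected measures would give different numbers.

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """Average pairwise agreement across annotators.

    annotations: one list of labels per annotator, all covering the same
    items in the same order.
    """
    agree = 0
    total = 0
    # Compare every unordered pair of annotators item by item.
    for first, second in combinations(annotations, 2):
        for label_a, label_b in zip(first, second):
            agree += int(label_a == label_b)
            total += 1
    if total == 0:
        raise ValueError("need at least two annotators and one item")
    return agree / total
```

Three annotators labelling three sentences as `["pos", "neg", "pos"]`, `["pos", "neg", "neg"]` and `["pos", "pos", "neg"]` agree on 5 of the 9 pairwise comparisons, giving roughly 0.56.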
Application of data mining approaches to drug delivery.
Ekins, Sean; Shimada, Jun; Chang, Cheng
2006-11-30
Computational approaches play a key role in all areas of the pharmaceutical industry from data mining, experimental and clinical data capture to pharmacoeconomics and adverse events monitoring. They will likely continue to be indispensable assets along with a growing library of software applications. This is primarily due to the increasingly massive amount of biology, chemistry and clinical data, which is now entering the public domain mainly as a result of NIH and commercially funded projects. We are therefore in need of new methods for mining this mountain of data in order to enable new hypothesis generation. The computational approaches include, but are not limited to, database compilation, quantitative structure activity relationships (QSAR), pharmacophores, network visualization models, decision trees, machine learning algorithms and multidimensional data visualization software that could be used to improve drug delivery after mining public and/or proprietary data. We will discuss some areas of unmet needs in the area of data mining for drug delivery that can be addressed with new software tools or databases of relevance to future pharmaceutical projects.
30 CFR 18.97 - Inspection of machines; minimum requirements.
Code of Federal Regulations, 2010 CFR
2010-07-01
... TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Field Approval of Electrically Operated Mining Equipment § 18.97 Inspection of machines; minimum... shall be conducted by an electrical representative and such inspection shall include: (1) Examination of...
30 CFR 18.97 - Inspection of machines; minimum requirements.
Code of Federal Regulations, 2011 CFR
2011-07-01
... TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Field Approval of Electrically Operated Mining Equipment § 18.97 Inspection of machines; minimum... shall be conducted by an electrical representative and such inspection shall include: (1) Examination of...
4. CARPENTER AND MACHINE SHOP AT EAST GREY ROCK MINE, ...
4. CARPENTER AND MACHINE SHOP AT EAST GREY ROCK MINE, LOOKING EAST. THIS IS SAID TO BE THE OLDEST MINE BUILDING LEFT ON BUTTE HILL. SHIV WHEELS FROM VARIOUS LOCATIONS AROUND THE HILL ARE ALSO VISIBLE - Butte Mineyards, Butte, Silver Bow County, MT
CANFAR + Skytree: Mining Massive Datasets as an Essential Part of the Future of Astronomy
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.
2013-01-01
The future study of large astronomical datasets, consisting of hundreds of millions to billions of objects, will be dominated by large computing resources, and by analysis tools of the necessary scalability and sophistication to extract useful information. Significant effort will be required to fulfil their potential as a provider of the next generation of science results. To date, computing systems have allowed either sophisticated analysis of small datasets, e.g., most astronomy software, or simple analysis of large datasets, e.g., database queries. At the Canadian Astronomy Data Centre, we have combined our cloud computing system, the Canadian Advanced Network for Astronomical Research (CANFAR), with the world's most advanced machine learning software, Skytree, to create the world's first cloud computing system for data mining in astronomy. This allows the full sophistication of the huge fields of data mining and machine learning to be applied to the hundreds of millions of objects that make up current large datasets. CANFAR works by utilizing virtual machines, which appear to the user as equivalent to a desktop. Each machine is replicated as desired to perform large-scale parallel processing. Such an arrangement carries far more flexibility than other cloud systems, because it enables the user to immediately install and run the same code that they already utilize for science on their desktop. We demonstrate the utility of the CANFAR + Skytree system by showing science results obtained, including assigning photometric redshifts with full probability density functions (PDFs) to a catalog of approximately 133 million galaxies from the MegaPipe reductions of the Canada-France-Hawaii Telescope Legacy Wide and Deep surveys. Each PDF is produced nonparametrically from 100 instances of the photometric parameters for each galaxy, generated by perturbing within the errors on the measurements.
Hence, we produce, store, and assign redshifts to, a catalog of over 13 billion object instances. This catalog is comparable in size to those expected from next-generation surveys, such as Large Synoptic Survey Telescope. The CANFAR+Skytree system is open for use by any interested member of the astronomical community.
Koedinger, Kenneth R; D'Mello, Sidney; McLaughlin, Elizabeth A; Pardos, Zachary A; Rosé, Carolyn P
2015-01-01
An emerging field of educational data mining (EDM) is building on and contributing to a wide variety of disciplines through analysis of data coming from various educational technologies. EDM researchers are addressing questions of cognition, metacognition, motivation, affect, language, social discourse, etc. using data from intelligent tutoring systems, massive open online courses, educational games and simulations, and discussion forums. The data include detailed action and timing logs of student interactions in user interfaces such as graded responses to questions or essays, steps in rich problem solving environments, games or simulations, discussion forum posts, or chat dialogs. They might also include external sensors such as eye tracking, facial expression, body movement, etc. We review how EDM has addressed the research questions that surround the psychology of learning with an emphasis on assessment, transfer of learning and model discovery, the role of affect, motivation and metacognition on learning, and analysis of language data and collaborative learning. For example, we discuss (1) how different statistical assessment methods were used in a data mining competition to improve prediction of student responses to intelligent tutor tasks, (2) how better cognitive models can be discovered from data and used to improve instruction, (3) how data-driven models of student affect can be used to focus discussion in a dialog-based tutoring system, and (4) how machine learning techniques applied to discussion data can be used to produce automated agents that support student learning as they collaborate in a chat room or a discussion board. © 2015 John Wiley & Sons, Ltd.
30 CFR 75.832 - Frequency of examinations; recordkeeping.
Code of Federal Regulations, 2010 CFR
2010-07-01
... machine examination. At least once every 7 days, a qualified person must examine each high-voltage continuous mining machine to verify that electrical protection, equipment grounding, permissibility, cable... least once every 7 days, and prior to tramming the high-voltage continuous mining machine, a qualified...
NASA Astrophysics Data System (ADS)
Hardinata, Lingga; Warsito, Budi; Suparti
2018-05-01
The complexity of bankruptcy makes accurate bankruptcy prediction models difficult to achieve. Various prediction models have been developed to improve the accuracy of bankruptcy predictions. Machine learning has been widely used for prediction because of its adaptive capabilities. Artificial Neural Networks (ANN) are one machine learning approach that has proved able to complete inference tasks such as prediction and classification, especially in data mining. In this paper, we propose the implementation of Jordan Recurrent Neural Networks (JRNN) to classify and predict corporate bankruptcy based on financial ratios. The feedback interconnection in JRNN enables the network to retain important information, allowing it to work more effectively. The analysis showed that JRNN works very well in bankruptcy prediction, with an average success rate of 81.3785%.
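The feedback connection that distinguishes a Jordan network can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions, not the authors' model: the layer sizes, the random weights, the activation functions and the toy financial ratios are all hypothetical, and no training is shown. The context unit stores the previous output, which is what separates a Jordan network from an Elman network (where the context stores the previous hidden state).

```python
import math
import random


def jordan_forward(inputs, w_in, w_ctx, b_h, w_out, b_out):
    """Forward pass of a single-output Jordan network over a sequence.

    The context value fed back into the hidden layer is the network's
    previous output, initialised to 0.0 before the first step.
    """
    context = 0.0
    outputs = []
    for x in inputs:
        # Hidden layer sees the current input plus the previous output.
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(w_row, x)) + w_c * context + b)
            for w_row, w_c, b in zip(w_in, w_ctx, b_h)
        ]
        # Sigmoid output, read here as a bankruptcy score in (0, 1).
        pre = sum(w * h for w, h in zip(w_out, hidden)) + b_out
        y = 1.0 / (1.0 + math.exp(-pre))
        outputs.append(y)
        context = y  # feedback connection: output becomes next context
    return outputs


# Hypothetical setup: 3 financial ratios per period, 4 hidden units.
rng = random.Random(0)
n_features, n_hidden = 3, 4
w_in = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
w_ctx = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_h = [0.0] * n_hidden
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

ratios = [[0.4, 1.2, 0.1], [0.3, 1.1, 0.05], [0.1, 0.9, -0.02]]  # toy data
scores = jordan_forward(ratios, w_in, w_ctx, b_h, w_out, b_out)
```

With the context weights set to zero the network reduces to a feed-forward model, so repeated identical inputs give identical outputs; with non-zero context weights each score also depends on the score before it.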
VizieR Online Data Catalog: SDSS-DR9 photometric redshifts (Brescia+, 2014)
NASA Astrophysics Data System (ADS)
Brescia, M.; Cavuoti, S.; Longo, G.; de Stefano, V.
2014-07-01
We present an application of a machine learning method to the estimation of photometric redshifts for the galaxies in the SDSS Data Release 9 (SDSS-DR9). Photometric redshifts for more than 143 million galaxies were produced. The MLPQNA (Multi Layer Perceptron with Quasi Newton Algorithm) model provided within the framework of the DAMEWARE (DAta Mining and Exploration Web Application REsource) is an interpolative method derived from machine learning models. The obtained redshifts have an overall uncertainty of σ=0.023 with a very small average bias of about 3×10⁻⁵ and a fraction of catastrophic outliers of about 5%. After removal of the catastrophic outliers, the uncertainty is about σ=0.017. The catalogue files report in their name the range of DEC degrees related to the included objects. (60 data files).
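The quality figures quoted above (bias, σ, outlier fraction, and σ after outlier removal) can be computed along the following lines. This is a hedged sketch: the abstract does not define the residual or the outlier cut, so the sign convention (photometric minus spectroscopic), the `outlier_threshold` parameter and the toy redshifts are assumptions made for illustration.

```python
import statistics


def photoz_summary(z_spec, z_phot, outlier_threshold):
    """Summarise photometric-redshift residuals.

    Assumed definitions (not taken from the catalogue description):
    residual = z_phot - z_spec, and a catastrophic outlier is any object
    whose absolute residual exceeds `outlier_threshold`.
    """
    residuals = [zp - zs for zs, zp in zip(z_spec, z_phot)]
    kept = [r for r in residuals if abs(r) <= outlier_threshold]
    return {
        "bias": statistics.mean(residuals),
        "sigma": statistics.pstdev(residuals),
        "outlier_fraction": 1 - len(kept) / len(residuals),
        # Scatter recomputed after removing the catastrophic outliers.
        "sigma_clean": statistics.pstdev(kept) if kept else float("nan"),
    }


# Toy values for illustration only.
z_spec = [0.1, 0.2, 0.3, 0.4]
z_phot = [0.11, 0.19, 0.31, 0.9]
summary = photoz_summary(z_spec, z_phot, outlier_threshold=0.15)
```

In this toy set one object in four is flagged, and removing it shrinks the scatter, mirroring the drop from σ=0.023 to σ=0.017 reported for the catalogue.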
e-IQ and IQ knowledge mining for generalized LDA
NASA Astrophysics Data System (ADS)
Jenkins, Jeffrey; van Bergem, Rutger; Sweet, Charles; Vietsch, Eveline; Szu, Harold
2015-05-01
How can the human brain uncover patterns, associations and features in real-time, real-world data? There must be a general strategy used to transform raw signals into useful features, but representing this generalization in the context of our information extraction tool set is lacking. In contrast to Big Data (BD), Large Data Analysis (LDA) has become a reachable multi-disciplinary goal in recent years due in part to high performance computers and algorithm development, as well as the availability of large data sets. However, the experience of Machine Learning (ML) and information communities has not been generalized into an intuitive framework that is useful to researchers across disciplines. The data exploration phase of data mining is a prime example of this unspoken, ad-hoc nature of ML - the Computer Scientist works with a Subject Matter Expert (SME) to understand the data, and then build tools (i.e. classifiers, etc.) which can benefit the SME and the rest of the researchers in that field. We ask, why is there not a tool to represent information in a meaningful way to the researcher asking the question? Meaning is subjective and contextual across disciplines, so to ensure robustness, we draw examples from several disciplines and propose a generalized LDA framework for independent data understanding of heterogeneous sources which contribute to Knowledge Discovery in Databases (KDD). Then, we explore the concept of adaptive Information resolution through a 6W unsupervised learning methodology feedback system. In this paper, we will describe the general process of man-machine interaction in terms of an asymmetric directed graph theory (digging for embedded knowledge), and model the inverse machine-man feedback (digging for tacit knowledge) as an ANN unsupervised learning methodology. 
Finally, we propose a collective learning framework which utilizes a 6W semantic topology to organize heterogeneous knowledge and diffuse information to entities within a society in a personalized way.
30 CFR 18.49 - Connection boxes on machines.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Connection boxes on machines. 18.49 Section 18.49 Mineral Resources MINE SAFETY AND HEALTH ADMINISTRATION, DEPARTMENT OF LABOR TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Construction and...
30 CFR 18.61 - Final inspection of complete machine.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Final inspection of complete machine. 18.61 Section 18.61 Mineral Resources MINE SAFETY AND HEALTH ADMINISTRATION, DEPARTMENT OF LABOR TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Inspections...
Bidirectional, Automatic Coal-Mining Machine
NASA Technical Reports Server (NTRS)
Collins, Earl R., Jr.
1986-01-01
Proposed coal-mining machine operates in both forward and reverse directions along mine face. New design increases efficiency and productivity, because does not stop cutting as it retreats to starting position after completing pass along face. To further increase efficiency, automatic miner carries its own machinery for crushing coal and feeding it to slurry-transport tube. Dual-drum mining machine cuts coal in two layers, crushes, mixes with water, and feeds it as slurry to haulage tube. At end of pass, forward drum raised so it becomes rear drum, and rear drum lowered, becoming forward drum for return pass.
Investigating Mesoscale Convective Systems and their Predictability Using Machine Learning
NASA Astrophysics Data System (ADS)
Daher, H.; Duffy, D.; Bowen, M. K.
2016-12-01
A mesoscale convective system (MCS) is a thunderstorm region that lasts several hours, forms near weather fronts, and can often develop into tornadoes. Here we seek to answer the question of whether these tornadoes are "predictable" by looking for a defining characteristic(s) separating MCSs that evolve into tornadoes versus those that do not. Using NASA's Modern Era Retrospective-analysis for Research and Applications 2 reanalysis data (M2R12K), we apply several state-of-the-art machine learning techniques to investigate this question. The spatial region examined in this experiment is Tornado Alley in the United States over the peak tornado months. A database containing select variables from M2R12K is created using PostgreSQL. This database is then analyzed using machine learning methods such as Symbolic Aggregate approXimation (SAX) and DBSCAN (an unsupervised density-based data clustering algorithm). The incentive behind using these methods is to mathematically define an MCS so that association rule mining techniques can be used to uncover some sort of signal or teleconnection that will help us forecast which MCSs will result in tornadoes and therefore give society more time to prepare and in turn reduce casualties and destruction.
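As an illustration of one of the methods named above, SAX turns a numeric time series into a short string in three steps: z-normalise, average over equal-width segments (piecewise aggregate approximation), then map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The sketch below is a generic version under simplifying assumptions (series length divisible by the number of segments, a four-letter alphabet, toy input values); it is not the authors' pipeline and DBSCAN is not shown.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length n_segments."""
    if len(series) % n_segments:
        raise ValueError("series length must be divisible by n_segments")

    # Step 1: z-normalise (a constant series maps to all zeros).
    mu, sd = mean(series), pstdev(series)
    z = [(v - mu) / sd for v in series] if sd else [0.0] * len(series)

    # Step 2: piecewise aggregate approximation (mean of each segment).
    width = len(series) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # Step 3: breakpoints cut N(0, 1) into len(alphabet) equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[sum(p > b for b in breakpoints)] for p in paa)


# Toy series: low values followed by high values.
word = sax([1, 1, 2, 2, 8, 8, 9, 9], n_segments=4)  # -> "aadd"
```

Once series are reduced to words like this, discrete techniques such as association rule mining can operate on the symbols rather than on raw values.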
Machine Learning and Data Mining for Comprehensive Test Ban Treaty Monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Russell, S; Vaidya, S
2009-07-30
The Comprehensive Test Ban Treaty (CTBT) is gaining renewed attention in light of growing worldwide interest in mitigating risks of nuclear weapons proliferation and testing. Since the International Monitoring System (IMS) installed the first suite of sensors in the late 1990's, the IMS network has steadily progressed, providing valuable support for event diagnostics. This progress was highlighted at the recent International Scientific Studies (ISS) Conference in Vienna in June 2009, where scientists and domain experts met with policy makers to assess the current status of the CTBT Verification System. A strategic theme within the ISS Conference centered on exploring opportunities for further enhancing the detection and localization accuracy of low magnitude events by drawing upon modern tools and techniques for machine learning and large-scale data analysis. Several promising approaches for data exploitation were presented at the Conference. These are summarized in a companion report. In this paper, we introduce essential concepts in machine learning and assess techniques which could provide both incremental and comprehensive value for event discrimination by increasing the accuracy of the final data product, refining On-Site-Inspection (OSI) conclusions, and potentially reducing the cost of future network operations.
Machine Learning for Flood Prediction in Google Earth Engine
NASA Astrophysics Data System (ADS)
Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.
2015-12-01
With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct at significant financial cost. In addition, desktop modeling software and limited local server storage can impose restraints on the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable in this platform. Our project pushes these limits by testing the performance of two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, at predicting flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created using the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of deploying unbalanced training data sets based on rare extreme events.
Machine learning for a Toolkit for Image Mining
NASA Technical Reports Server (NTRS)
Delanoy, Richard L.
1995-01-01
A prototype user environment is described that enables a user with very limited computer skills to collaborate with a computer algorithm to develop search tools (agents) that can be used for image analysis, creating metadata for tagging images, searching for images in an image database on the basis of image content, or as a component of computer vision algorithms. Agents are learned in an ongoing, two-way dialogue between the user and the algorithm. The user points to mistakes made in classification. The algorithm, in response, attempts to discover which image attributes are discriminating between objects of interest and clutter. It then builds a candidate agent and applies it to an input image, producing an 'interest' image highlighting features that are consistent with the set of objects and clutter indicated by the user. The dialogue repeats until the user is satisfied. The prototype environment, called the Toolkit for Image Mining (TIM), is currently capable of learning spectral and textural patterns. Learning exhibits rapid convergence to reasonable levels of performance and, when thoroughly trained, appears to be competitive in discrimination accuracy with other classification techniques.
30 CFR 18.21 - Machines equipped with powered dust collectors.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Machines equipped with powered dust collectors. 18.21 Section 18.21 Mineral Resources MINE SAFETY AND HEALTH ADMINISTRATION, DEPARTMENT OF LABOR TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES...
30 CFR 70.207 - Bimonthly sampling; mechanized mining units.
Code of Federal Regulations, 2011 CFR
2011-07-01
... air will be used to determine the average concentration for that mechanized mining unit. (e) Unless... sampling device as follows: (1) Conventional section using cutting machine. On the cutting machine operator or on the cutting machine within 36 inches inby the normal working position; (2) Conventional section...
Prediction of Backbreak in Open-Pit Blasting Operations Using the Machine Learning Method
NASA Astrophysics Data System (ADS)
Khandelwal, Manoj; Monjezi, M.
2013-03-01
Backbreak is an undesirable phenomenon in blasting operations. It can cause instability of mine walls, falling down of machinery, improper fragmentation, reduced efficiency of drilling, etc. The existence of various effective parameters and their unknown relationships are the main reasons for inaccuracy of the empirical models. Presently, the application of new approaches such as artificial intelligence is highly recommended. In this paper, an attempt has been made to predict backbreak in blasting operations of Soungun iron mine, Iran, incorporating rock properties and blast design parameters using the support vector machine (SVM) method. To investigate the suitability of this approach, the predictions by SVM have been compared with multivariate regression analysis (MVRA). The coefficient of determination (CoD) and the mean absolute error (MAE) were taken as performance measures. It was found that the CoD between measured and predicted backbreak was 0.987 and 0.89 by SVM and MVRA, respectively, whereas the MAE was 0.29 and 1.07 by SVM and MVRA, respectively.
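The two performance measures used above can be written out in a few lines. This is a sketch under an assumption: the coefficient of determination is taken here as 1 − SS_res/SS_tot, whereas some papers report the squared correlation coefficient instead, and the abstract does not say which was used. The measured and predicted values are toy numbers, not data from the Soungun mine.

```python
def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def coefficient_of_determination(measured, predicted):
    """R^2 = 1 - SS_res / SS_tot (assumed definition, see lead-in)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


# Toy backbreak values in metres, for illustration only.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
mae = mean_absolute_error(measured, predicted)
cod = coefficient_of_determination(measured, predicted)
```

A CoD close to 1 together with a small MAE, as reported for SVM, indicates predictions that track the measurements both in trend and in absolute size.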
Visual management of large scale data mining projects.
Shah, I; Hunter, L
2000-01-01
This paper describes a unified framework for visualizing the preparations for, and results of, hundreds of machine learning experiments. These experiments were designed to improve the accuracy of enzyme functional predictions from sequence, and in many cases were successful. Our system provides graphical user interfaces for defining and exploring training datasets and various representational alternatives, for inspecting the hypotheses induced by various types of learning algorithms, for visualizing the global results, and for inspecting in detail results for specific training sets (functions) and examples (proteins). The visualization tools serve as a navigational aid through a large amount of sequence data and induced knowledge. They provided significant help in understanding both the significance and the underlying biological explanations of our successes and failures. Using these visualizations it was possible to efficiently identify weaknesses of the modular sequence representations and induction algorithms which suggest better learning strategies. The context in which our data mining visualization toolkit was developed was the problem of accurately predicting enzyme function from protein sequence data. Previous work demonstrated that approximately 6% of enzyme protein sequences are likely to be assigned incorrect functions on the basis of sequence similarity alone. In order to test the hypothesis that more detailed sequence analysis using machine learning techniques and modular domain representations could address many of these failures, we designed a series of more than 250 experiments using information-theoretic decision tree induction and naive Bayesian learning on local sequence domain representations of problematic enzyme function classes. In more than half of these cases, our methods were able to perfectly discriminate among various possible functions of similar sequences. We developed and tested our visualization techniques on this application.
Abar, Orhan; Charnigo, Richard J.; Rayapati, Abner
2017-01-01
Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules even for moderate sized datasets making it very difficult for end users to identify both statistically significant and potentially novel rules that could lead to interesting new insights and hypotheses. Researchers have proposed many domain independent interestingness measures using which, one can rank the rules and potentially glean useful rules from the top ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets owing to the relatively large sizes of the datasets often encountered in healthcare and also due to limited access to domain experts for review/analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules. PMID:28736771
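To make the notion of an interestingness measure concrete, the sketch below computes three of the most common ones (support, confidence and lift) for a single rule. The abstract does not list which measures were evaluated, so this choice is an assumption, and the visit records and item names are invented for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent.

    `transactions` is a list of sets of items, one set per patient visit.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(a <= t for t in transactions)         # visits containing A
    n_c = sum(c <= t for t in transactions)         # visits containing C
    n_ac = sum((a | c) <= t for t in transactions)  # visits containing both

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift > 1 means C is more frequent among A-visits than overall.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical visits; item names are placeholders, not EMR codes.
visits = [
    {"insomnia", "depression"},
    {"insomnia", "depression", "sertraline"},
    {"insomnia"},
    {"hypertension"},
]
support, confidence, lift = rule_measures(visits, {"insomnia"}, {"depression"})
```

Ranking all mined rules by one such measure and comparing the top of the list against expert judgments is the kind of evaluation the abstract describes.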
2017-03-01
Warfare. 14. SUBJECT TERMS data mining, natural language processing, machine learning, algorithm design, information warfare, propaganda 15. NUMBER OF... Speech Tags. Adapted from [12]. CC: Coordinating conjunction; PRP$: Possessive pronoun; CD: Cardinal number; RB: Adverb; DT: Determiner; RBR: Adverb, comparative ... comparative; UH: Interjection; JJS: Adjective, superlative; VB: Verb, base form; LS: List item marker; VBD: Verb, past tense; MD: Modal; VBG: Verb, gerund or
Pattern Activity Clustering and Evaluation (PACE)
NASA Astrophysics Data System (ADS)
Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna
2012-06-01
With the vast amount of network information available on activities of people (i.e. motions, transportation routes, and site visits) there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors that enable grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information theory metrics to characterize behavior. A scenario is presented to demonstrate the concept and metrics that could be useful for layered sensing behavior pattern learning and analysis. We leverage work on group tracking, learning and clustering approaches; as well as utilize information theoretical metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries and sensor management for data extraction, relations discovery, and situation analysis of existing data.
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
Over the past few decades, tremendous volumes of data have become available on the Internet in both structured and unstructured form. Because information on the Internet is growing exponentially, there is an urgent need for text classifiers. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. To handle this situation, a wide range of supervised learning algorithms has been introduced. Among these, K-Nearest Neighbor (KNN) is an efficient classifier and the simplest in the text classification family. However, KNN suffers from imbalanced class distributions and noisy term features. To cope with this challenge, we use document-based centroid dimensionality reduction (CentroidDR), implemented in R. By combining these two text classification techniques, KNN and centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN, which performs substantially better than CenKNN.
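One way to combine a centroid classifier with KNN is to project each document onto its similarities to the class centroids and then run KNN in that much smaller space. The sketch below follows that general idea in Python rather than R; it is an assumption about the shape of the approach, not the authors' MCenKNN, and the helper names, the raw term counts and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def class_centroids(docs, labels):
    """Mean term-count vector of the documents in each class."""
    sums, counts = defaultdict(Counter), Counter(labels)
    for doc, label in zip(docs, labels):
        sums[label].update(doc)  # adds term counts
    return {c: {t: w / counts[c] for t, w in vec.items()} for c, vec in sums.items()}


def project(doc, centroids, order):
    """Reduce a document to one similarity value per class centroid."""
    return [cosine(doc, centroids[c]) for c in order]


def knn_predict(query, points, labels, k):
    """Majority vote among the k nearest points (Euclidean distance)."""
    nearest = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Toy corpus for illustration only.
texts = ["stock market falls", "market rally stock", "team wins match", "match team score"]
labels = ["finance", "finance", "sport", "sport"]
docs = [Counter(t.split()) for t in texts]

centroids = class_centroids(docs, labels)
order = sorted(centroids)
train_points = [project(d, centroids, order) for d in docs]
query_point = project(Counter("stock rally".split()), centroids, order)
prediction = knn_predict(query_point, train_points, labels, k=3)
```

Because the projected space has only as many dimensions as there are classes, the KNN distance computations become cheap, which is the scalability argument usually made for this kind of reduction.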
Organiscak, J.A.; Beck, T.W.
2015-01-01
The U.S. National Institute for Occupational Safety and Health (NIOSH) Office of Mine Safety and Health Research (OMSHR) has recently studied several redirected scrubber discharge configurations in its full-scale continuous miner gallery for both dust and gas control when using an exhaust face ventilation system. Dust and gas measurements around the continuous mining machine in the laboratory showed that the conventional scrubber discharge directed outby the face with a 12.2-m (40-ft) exhaust curtain setback appeared to be one of the better configurations for controlling dust and gas. Redirecting all the air toward the face equally up both sides of the machine increased the dust and gas concentrations around the machine. When all of the air was redirected toward the face on the off-curtain side of the machine, gas accumulations tended to be reduced at the face, at the expense of increased dust levels in the return and on the curtain side of the mining machine. A 6.1-m (20-ft) exhaust curtain setback without the scrubber operating resulted in the lowest dust levels around the continuous mining machine, but this configuration resulted in some of the highest levels of dust in the return and gas on the off-curtain side of the mining face. Two field studies showed some similarities to the laboratory findings, with elevated dust levels at the rear corners of the continuous miner when all of the scrubber exhaust was redirected toward the face either up the off-tubing side or equally up both sides of the mining machine. PMID:26251566
A data mining approach to predict in situ chlorinated ethene detoxification potential
NASA Astrophysics Data System (ADS)
Lee, J.; Im, J.; Kim, U.; Loeffler, F. E.
2015-12-01
Despite major advances in physicochemical remediation technologies, in situ biostimulation and bioaugmentation treatment aimed at stimulating Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach to remedy sites impacted with chlorinated ethenes. In practice, selecting the best remedial strategy is challenging due to uncertainties associated with the microbiology (e.g., presence and activity of Dhc) and geochemical factors influencing Dhc activity. Extensive groundwater datasets collected over decades of monitoring exist, but have not been systematically analyzed. In the present study, geochemical and microbial data sets collected from 35 wells at 5 contaminated sites were used to develop a predictive empirical model using a machine learning algorithm (i) to rank the relative importance of parameters that affect in situ reductive dechlorination potential, and (ii) to provide recommendations for selecting the optimal remediation strategy at a specific site. Classification and regression tree (CART) analysis was applied, and a representative classification tree model was developed that allowed short-term prediction of dechlorination potential. Indirect indicators for low dissolved oxygen (e.g., low NO3- and NO2-, high Fe2+ and CH4) were the most influential factors for predicting dechlorination potential, followed by total organic carbon content (TOC) and Dhc cell abundance. These findings indicate that machine learning-based data mining techniques applied to groundwater monitoring data can lead to the development of predictive groundwater remediation models. A major need for improving the predictive capabilities of the data mining approach is a curated, up-to-date and comprehensive collection of groundwater monitoring data.
Tseng, Chih-Jen; Lu, Chi-Jie; Chang, Chi-Chang; Chen, Gin-Den; Cheewakriangkrai, Chalong
2017-05-01
Ovarian cancer is the second leading cause of deaths among gynecologic cancers in the world. Approximately 90% of women with ovarian cancer reported having symptoms long before a diagnosis was made. Literature shows that recurrence should be predicted with regard to their personal risk factors and the clinical symptoms of this devastating cancer. In this study, ensemble learning and five data mining approaches, including support vector machine (SVM), C5.0, extreme learning machine (ELM), multivariate adaptive regression splines (MARS), and random forest (RF), were integrated to rank the importance of risk factors and diagnose the recurrence of ovarian cancer. The medical records and pathologic status were extracted from the Chung Shan Medical University Hospital Tumor Registry. Experimental results illustrated that the integrated C5.0 model is a superior approach in predicting the recurrence of ovarian cancer. Moreover, the classification accuracies of C5.0, ELM, MARS, RF, and SVM indeed increased after using the selected important risk factors as predictors. Our findings suggest that The International Federation of Gynecology and Obstetrics (FIGO), Pathologic M, Age, and Pathologic T were the four most critical risk factors for ovarian cancer recurrence. In summary, the above information can support the important influence of personality and clinical symptom representations on all phases of guide interventions, with the complexities of multiple symptoms associated with ovarian cancer in all phases of the recurrent trajectory. Copyright © 2017 Elsevier B.V. All rights reserved.
30 CFR 18.93 - Application for field approval; filing procedures.
Code of Federal Regulations, 2012 CFR
2012-07-01
... pursuant to individual written applications for each machine submitted in triplicate on MSHA Form No. 6-1481, by the owner-coal mine operator of the machine. (2) Except as provided in paragraph (b) of this... Mine Health and Safety District Manager for the District in which such machine will be employed. (b...
30 CFR 18.93 - Application for field approval; filing procedures.
Code of Federal Regulations, 2014 CFR
2014-07-01
... pursuant to individual written applications for each machine submitted in triplicate on MSHA Form No. 6-1481, by the owner-coal mine operator of the machine. (2) Except as provided in paragraph (b) of this... Mine Health and Safety District Manager for the District in which such machine will be employed. (b...
30 CFR 18.93 - Application for field approval; filing procedures.
Code of Federal Regulations, 2013 CFR
2013-07-01
... pursuant to individual written applications for each machine submitted in triplicate on MSHA Form No. 6-1481, by the owner-coal mine operator of the machine. (2) Except as provided in paragraph (b) of this... Mine Health and Safety District Manager for the District in which such machine will be employed. (b...
Gene prioritization and clustering by multi-view text mining
2010-01-01
Background: Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. Results: We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods. Conclusions: In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification. PMID:20074336
Underground coal mine instrumentation and test
NASA Technical Reports Server (NTRS)
Burchill, R. F.; Waldron, W. D.
1976-01-01
The need to evaluate mechanical performance of mine tools and to obtain test performance data from candidate systems dictate that an engineering data recording system be built. Because of the wide range of test parameters which would be evaluated, a general purpose data gathering system was designed and assembled to permit maximum versatility. A primary objective of this program was to provide a specific operating evaluation of a longwall mining machine vibration response under normal operating conditions. A number of mines were visited and a candidate for test evaluation was selected, based upon management cooperation, machine suitability, and mine conditions. Actual mine testing took place in a West Virginia mine.
Semantic Framework of Internet of Things for Smart Cities: Case Studies.
Zhang, Ningyu; Chen, Huajun; Chen, Xi; Chen, Jiaoyan
2016-09-14
In recent years, the advancement of sensor technology has led to the generation of heterogeneous Internet-of-Things (IoT) data by smart cities. Thus, the development and deployment of various aspects of IoT-based applications are necessary to mine the potential value of data to the benefit of people and their lives. However, the variety, volume, heterogeneity, and real-time nature of data obtained from smart cities pose considerable challenges. In this paper, we propose a semantic framework that integrates the IoT with machine learning for smart cities. The proposed framework retrieves and models urban data for certain kinds of IoT applications based on semantic and machine-learning technologies. Moreover, we propose two case studies: pollution detection from vehicles and traffic pattern detection. The experimental results show that our system is scalable and capable of accommodating a large number of urban regions with different types of IoT applications.
Semantic Framework of Internet of Things for Smart Cities: Case Studies
Zhang, Ningyu; Chen, Huajun; Chen, Xi; Chen, Jiaoyan
2016-01-01
In recent years, the advancement of sensor technology has led to the generation of heterogeneous Internet-of-Things (IoT) data by smart cities. Thus, the development and deployment of various aspects of IoT-based applications are necessary to mine the potential value of data to the benefit of people and their lives. However, the variety, volume, heterogeneity, and real-time nature of data obtained from smart cities pose considerable challenges. In this paper, we propose a semantic framework that integrates the IoT with machine learning for smart cities. The proposed framework retrieves and models urban data for certain kinds of IoT applications based on semantic and machine-learning technologies. Moreover, we propose two case studies: pollution detection from vehicles and traffic pattern detection. The experimental results show that our system is scalable and capable of accommodating a large number of urban regions with different types of IoT applications. PMID:27649185
Predicting adverse hemodynamic events in critically ill patients.
Yoon, Joo H; Pinsky, Michael R
2018-06-01
The art of predicting future hemodynamic instability in the critically ill has rapidly become a science with the advent of advanced analytical processes based on computer-driven machine learning techniques. How these methods have progressed beyond severity scoring systems to interface with decision-support is summarized. Data mining of large multidimensional clinical time-series databases using a variety of machine learning tools has led to our ability to identify alert artifact and filter it from bedside alarms, display real-time risk stratification at the bedside to aid in clinical decision-making and predict the subsequent development of cardiorespiratory insufficiency hours before these events occur. This fast-evolving field is primarily limited by linkage of high-quality granular data to physiologic rationale across heterogeneous clinical care domains. Using advanced analytic tools to glean knowledge from clinical data streams is rapidly becoming a reality whose clinical impact potential is great.
NASA Astrophysics Data System (ADS)
Bailly, J. S.; Delenne, C.; Chahinian, N.; Bringay, S.; Commandré, B.; Chaumont, M.; Derras, M.; Deruelle, L.; Roche, M.; Rodriguez, F.; Subsol, G.; Teisseire, M.
2017-12-01
In France, local government institutions must establish a detailed description of wastewater networks. The information should be available, but it remains fragmented (different formats held by different stakeholders) and incomplete. In the "Cart'Eaux" project, a multidisciplinary team, including an industrial partner, develops a global methodology using Machine Learning and Data Mining approaches applied to various types of large data to recover information with the aim of mapping urban sewage systems for hydraulic modelling. Deep learning is first applied using a convolutional neural network to localize manhole covers on 5 cm resolution aerial RGB images. The detected manhole covers are then automatically connected using a tree-shaped graph constrained by industry rules. Based on a Delaunay triangulation, connections are chosen to minimize a cost function depending on pipe length, slope and possible intersection with roads or buildings. A stochastic version of this algorithm is currently being developed to account for positional uncertainty and detection errors, and generate sets of probable networks. As more information is required for hydraulic modeling (slopes, diameters, materials, etc.), text data mining is used to extract network characteristics from data posted on the Web or available through governmental or specific databases. Using an appropriate list of keywords, the web is scoured for documents which are saved in text format. The thematic entities are identified and linked to the surrounding spatial and temporal entities. The methodology is developed and tested on two towns in southern France. The primary results are encouraging: 54% of manhole covers are detected with few false detections, enabling the reconstruction of probable networks. The data mining results are still being investigated. It is clear at this stage that getting numerical values on specific pipes will be challenging.
Thus, when no information is found, decision rules will be used to assign admissible numerical values to enable the final hydraulic modelling. Consequently, sensitivity analysis of the hydraulic model will be performed to take into account the uncertainty associated with each piece of information. Project funded by the European Regional Development Fund and the Occitanie Region.
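The step of connecting detected manhole covers with a minimum-cost, tree-shaped graph can be sketched with Kruskal's minimum spanning tree algorithm. This is only an illustration under stated assumptions: it considers every pair of manholes as a candidate pipe instead of restricting candidates to Delaunay edges, the cost function (length plus a weighted slope term) and the slope_weight parameter are hypothetical, and the road and building intersection terms mentioned in the abstract are left out. Manhole positions are assumed to be distinct.

```python
import math
from itertools import combinations

def connect_manholes(points, slope_weight=10.0):
    """Return the edges (i, j) of a minimum-cost tree connecting all manholes.

    points: list of (x, y, z) positions with z the elevation; positions must differ in (x, y).
    Edge cost = horizontal length + slope_weight * |slope| (hypothetical cost function).
    """
    def cost(a, b):
        length = math.hypot(a[0] - b[0], a[1] - b[1])
        slope = abs(a[2] - b[2]) / length
        return length + slope_weight * slope

    # Union-find structure used to reject edges that would close a cycle.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Candidate pipes: all pairs, cheapest first (a Delaunay triangulation would prune this set).
    candidates = sorted(
        (cost(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for _, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j))
    return tree

# Four hypothetical manholes: three along a street and one on a side branch.
manholes = [(0, 0, 10.0), (10, 0, 9.8), (20, 0, 9.6), (10, 10, 9.9)]
network = connect_manholes(manholes)
```

A stochastic variant of the kind the abstract describes could perturb the positions and rerun the function to obtain a set of plausible networks.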
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kevrekidis, Ioannis G.
The work explored the linking of modern developing machine learning techniques (manifold learning and in particular diffusion maps) with traditional PDE modeling/discretization/scientific computation techniques via the equation-free methodology developed by the PI. The result (in addition to several PhD degrees, two of them by CSGF Fellows) was a sequence of strong developments - in part on the algorithmic side, linking data mining with scientific computing, and in part on applications, ranging from PDE discretizations to molecular dynamics and complex network dynamics.
Automated annotation of functional imaging experiments via multi-label classification
Turner, Matthew D.; Chakrabarti, Chayan; Jones, Thomas B.; Xu, Jiawei F.; Fox, Peter T.; Luger, George F.; Laird, Angela R.; Turner, Jessica A.
2013-01-01
Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text. PMID:24409112
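Two ideas from this abstract lend themselves to a short sketch: the binary relevance transformation, which turns one multi-label problem into one yes/no problem per label, and the exact match score, which counts an abstract as correct only when the full predicted label set equals the expert's. The label names below are invented placeholders, not CogPO terms, and the function names are illustrative.

```python
def binary_relevance_targets(label_sets):
    """Turn per-abstract label sets into one 0/1 target vector per label."""
    all_labels = sorted(set().union(*label_sets))
    return {label: [int(label in labels) for labels in label_sets] for label in all_labels}

def exact_match_ratio(true_label_sets, predicted_label_sets):
    """Fraction of abstracts whose predicted label set equals the expert's label set."""
    matches = sum(
        set(true) == set(pred)
        for true, pred in zip(true_label_sets, predicted_label_sets)
    )
    return matches / len(true_label_sets)

# Hypothetical expert annotations and classifier output for four abstracts.
expert = [{"visual", "button_press"}, {"auditory"}, {"visual"}, {"visual", "speech"}]
predicted = [{"visual", "button_press"}, {"auditory", "speech"}, {"visual"}, {"visual"}]

targets = binary_relevance_targets(expert)
score = exact_match_ratio(expert, predicted)
```

Each vector in targets could be used to train a separate binary classifier such as naive Bayes. Exact match is a strict measure, since a single missing or extra label makes the whole abstract count as wrong.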
Data-Rich Astronomy: Mining Sky Surveys with PhotoRApToR
NASA Astrophysics Data System (ADS)
Cavuoti, Stefano; Brescia, Massimo; Longo, Giuseppe
2014-05-01
In the last decade a new generation of telescopes and sensors has allowed the production of a very large amount of data, and astronomy has become a data-rich science. New automatic methods largely based on machine learning are needed to cope with such a data tsunami. We present some results in the fields of photometric redshifts and galaxy classification, obtained using the MLPQNA algorithm available in the DAMEWARE (Data Mining and Web Application Resource) for the SDSS galaxies (DR9 and DR10). We present PhotoRApToR (Photometric Research Application To Redshift): a Java-based desktop application capable of solving regression and classification problems and specialized for photo-z estimation.
Information mining in remote sensing imagery
NASA Astrophysics Data System (ADS)
Li, Jiang
The volume of remotely sensed imagery continues to grow at an enormous rate due to the advances in sensor technology, and our capability for collecting and storing images has greatly outpaced our ability to analyze and retrieve information from the images. This motivates us to develop image information mining techniques, which is very much an interdisciplinary endeavor drawing upon expertise in image processing, databases, information retrieval, machine learning, and software design. This dissertation proposes and implements an extensive remote sensing image information mining (ReSIM) system prototype for mining useful information implicitly stored in remote sensing imagery. The system consists of three modules: image processing subsystem, database subsystem, and visualization and graphical user interface (GUI) subsystem. Land cover and land use (LCLU) information corresponding to spectral characteristics is identified by supervised classification based on support vector machines (SVM) with automatic model selection, while textural features that characterize spatial information are extracted using Gabor wavelet coefficients. Within LCLU categories, textural features are clustered using an optimized k-means clustering approach to acquire search efficient space. The clusters are stored in an object-oriented database (OODB) with associated images indexed in an image database (IDB). A k-nearest neighbor search is performed using a query-by-example (QBE) approach. Furthermore, an automatic parametric contour tracing algorithm and an O(n) time piecewise linear polygonal approximation (PLPA) algorithm are developed for shape information mining of interesting objects within the image. A fuzzy object-oriented database based on the fuzzy object-oriented data (FOOD) model is developed to handle the fuzziness and uncertainty. 
Three specific applications are presented: integrated land cover and texture pattern mining, shape information mining for change detection of lakes, and fuzzy normalized difference vegetation index (NDVI) pattern mining. The study results show the effectiveness of the proposed system prototype and the potentials for other applications in remote sensing.
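The query-by-example retrieval step can be illustrated with a brute-force k-nearest neighbor search over feature vectors. The image identifiers and two-dimensional features below are made up for illustration; the dissertation's system works on Gabor texture features stored in clustered form, which this sketch does not reproduce.

```python
import math

def query_by_example(query_features, database, k=2):
    """Return the ids of the k stored images closest to the query in feature space."""
    ranked = sorted(
        database,
        key=lambda image_id: math.dist(query_features, database[image_id]),
    )
    return ranked[:k]

# Hypothetical texture feature vectors keyed by image id.
image_features = {
    "lake_a": (0.1, 0.9),
    "lake_b": (0.2, 0.8),
    "urban_a": (0.9, 0.1),
}

# Features extracted from the example image supplied by the user.
nearest = query_by_example((0.15, 0.8), image_features, k=2)
```

Searching only within the cluster nearest to the query, rather than the whole database, is the usual way a clustering step makes this kind of lookup cheaper.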
Knowledge mining from clinical datasets using rough sets and backpropagation neural network.
Nahato, Kindie Biredagn; Harichandran, Khanna Nehemiah; Arputharaj, Kannan
2015-01-01
The availability of clinical datasets and knowledge mining methodologies encourages the researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from the minimal set of attributes that has been extracted from the clinical dataset. In this work rough set indiscernibility relation method with backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is handling of missing values to obtain a smooth data set and selection of appropriate attributes from the clinical dataset by indiscernibility relation method. The second stage is classification using backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
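The indiscernibility relation used for attribute selection groups together records that have identical values on a chosen set of attributes. The sketch below computes those groups and checks whether a smaller attribute subset induces the same partition as the full set, which is one simple way to describe a candidate reduct. The records and attribute names are invented, and the check ignores the decision attribute, so it is a simplification of the rough set procedure the abstract refers to.

```python
from collections import defaultdict

def indiscernibility_classes(records, attributes):
    """Group record indices that share the same values on the given attributes."""
    classes = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record[name] for name in attributes)
        classes[key].append(index)
    return sorted(classes.values())

def preserves_partition(records, subset, full):
    """True if the attribute subset distinguishes records exactly as the full set does."""
    return indiscernibility_classes(records, subset) == indiscernibility_classes(records, full)

# Hypothetical patient records.
patients = [
    {"fever": "yes", "fatigue": "yes", "age_group": "adult"},
    {"fever": "yes", "fatigue": "yes", "age_group": "adult"},
    {"fever": "no", "fatigue": "yes", "age_group": "adult"},
    {"fever": "no", "fatigue": "no", "age_group": "adult"},
]
all_attributes = ["fever", "fatigue", "age_group"]

partition = indiscernibility_classes(patients, all_attributes)
keeps_information = preserves_partition(patients, ["fever", "fatigue"], all_attributes)
loses_information = not preserves_partition(patients, ["fever"], all_attributes)
```

Here age_group is constant and can be dropped without merging any groups, whereas dropping fatigue as well would make patients 2 and 3 indistinguishable.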
Karim, Ahmad; Salleh, Rosli; Khan, Muhammad Khurram
2016-01-01
The botnet phenomenon in smartphones is evolving with the proliferation of mobile phone technologies, after having had a considerable impact on personal computers. A botnet is a network of computers, laptops, mobile devices or tablets that is remotely controlled by cybercriminals to initiate various distributed coordinated attacks, including spam emails, ad-click fraud, Bitcoin mining, Distributed Denial of Service (DDoS), dissemination of other malware and much more. Like traditional PC-based botnets, mobile botnets have the same operational impact, except that the target audience is specific to smartphone users. Therefore, it is important to uncover this security issue before its widespread adoption. We propose SMARTbot, a novel dynamic analysis framework augmented with machine learning techniques to automatically detect botnet binaries in a malicious corpus. SMARTbot is a component-based, off-device behavioral analysis framework that can generate a mobile botnet learning model using the back-propagation method of artificial neural networks. Moreover, this framework can detect mobile botnet binaries with remarkable accuracy even in the case of obfuscated program code. The results show that a classifier model based on simple logistic regression outperforms the other machine learning classifiers for botnet app detection, achieving 99.49% accuracy. Further, from manual inspection of the botnet dataset we have extracted interesting trends in those applications. As an outcome of this research, a mobile botnet dataset has been devised that will become the benchmark for future studies. PMID:26978523
A New Data Mining Scheme Using Artificial Neural Networks
Kamruzzaman, S. M.; Jehad Sarkar, A. M.
2011-01-01
Classification is one of the data mining problems receiving enormous attention in the database community. Although artificial neural networks (ANNs) have been successfully applied in a wide range of machine learning applications, they are however often regarded as black boxes, i.e., their predictions cannot be explained. To enhance the explanation of ANNs, a novel algorithm to extract symbolic rules from ANNs has been proposed in this paper. ANN methods have not been effectively utilized for data mining tasks because how the classifications were made is not explicitly stated as symbolic rules that are suitable for verification or interpretation by human experts. With the proposed approach, concise symbolic rules with high accuracy, that are easily explainable, can be extracted from the trained ANNs. Extracted rules are comparable with other methods in terms of number of rules, average number of conditions for a rule, and the accuracy. The effectiveness of the proposed approach is clearly demonstrated by the experimental results on a set of benchmark data mining classification problems. PMID:22163866
A new genome-mining tool redefines the lasso peptide biosynthetic landscape
Tietz, Jonathan I.; Schwalen, Christopher J.; Patel, Parth S.; Maxson, Tucker; Blair, Patricia M.; Tai, Hua-Chia; Zakai, Uzma I.; Mitchell, Douglas A.
2016-01-01
Ribosomally synthesized and post-translationally modified peptide (RiPP) natural products are attractive for genome-driven discovery and re-engineering, but limitations in bioinformatic methods and exponentially increasing genomic data make large-scale mining difficult. We report RODEO (Rapid ORF Description and Evaluation Online), which combines hidden Markov model-based analysis, heuristic scoring, and machine learning to identify biosynthetic gene clusters and predict RiPP precursor peptides. We initially focused on lasso peptides, which display intriguing physiochemical properties and bioactivities, but their hypervariability renders them challenging prospects for automated mining. Our approach yielded the most comprehensive mapping of lasso peptide space, revealing >1,300 compounds. We characterized the structures and bioactivities of six lasso peptides, prioritized based on predicted structural novelty, including an unprecedented handcuff-like topology and another with a citrulline modification exceptionally rare among bacteria. These combined insights significantly expand the knowledge of lasso peptides, and more broadly, provide a framework for future genome-mining efforts. PMID:28244986
Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.
2014-01-01
Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953
SWIFT-Review: a text-mining workbench for systematic review.
Howard, Brian E; Phillips, Jason; Miller, Kyle; Tandon, Arpit; Mav, Deepak; Shah, Mihir R; Holmgren, Stephanie; Pelch, Katherine E; Walker, Vickie; Rooney, Andrew A; Macleod, Malcolm; Shah, Ruchir R; Thayer, Kristina
2016-05-23
There is growing interest in using machine learning approaches to priority rank studies and reduce human burden in screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of systematic review can be challenging, especially for topics having a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize research areas that are data rich/data poor within a large literature corpus. Twenty case studies, including 15 public data sets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95 % of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as EDCs were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics. Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 out of 15 of the public datasets. 
Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50 % of the screening effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation. Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.
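The "Work Saved over Sampling" metric mentioned above is commonly defined at a recall level R as (TN + FN) / N - (1 - R), where TN + FN is the number of references left unscreened once the ranked list has yielded a fraction R of the relevant ones. The sketch below computes it for a ranked list of relevance flags; the function name and the toy ranking are illustrative and not part of SWIFT-Review.

```python
import math

def wss_at_recall(ranked_relevance, recall=0.95):
    """Work Saved over Sampling for a ranking given as 1 (relevant) / 0 (irrelevant) flags."""
    total = len(ranked_relevance)
    # Number of relevant references that must be found to reach the target recall.
    needed = math.ceil(recall * sum(ranked_relevance))
    found = 0
    for screened, is_relevant in enumerate(ranked_relevance, start=1):
        found += is_relevant
        if found >= needed:
            break
    # References after position 'screened' never have to be read.
    unscreened = total - screened
    return unscreened / total - (1.0 - recall)

# Toy ranking of ten references, two of them relevant, both placed near the top.
ranking = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
saved = wss_at_recall(ranking)
```

In this toy case both relevant references are found after screening three of ten, so seven references are skipped and the score is 0.7 - 0.05 = 0.65. A ranking that places the last needed relevant reference at the very end scores -0.05, so values near zero mean the ranking saved little screening effort.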
Argumentation Based Joint Learning: A Novel Ensemble Learning Approach
Xu, Junyi; Yao, Li; Li, Le
2015-01-01
Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high quality knowledge for ensemble classifier and improve the performance of classification. PMID:25966359
Alcaide-Leon, P; Dufort, P; Geraldo, A F; Alshafai, L; Maralani, P J; Spears, J; Bharatha, A
2017-06-01
Accurate preoperative differentiation of primary central nervous system lymphoma and enhancing glioma is essential to avoid unnecessary neurosurgical resection in patients with primary central nervous system lymphoma. The purpose of the study was to evaluate the diagnostic performance of a machine-learning algorithm by using texture analysis of contrast-enhanced T1-weighted images for differentiation of primary central nervous system lymphoma and enhancing glioma. Seventy-one adult patients with enhancing gliomas and 35 adult patients with primary central nervous system lymphomas were included. The tumors were manually contoured on contrast-enhanced T1WI, and the resulting volumes of interest were mined for textural features and subjected to a support vector machine-based machine-learning protocol. Three readers classified the tumors independently on contrast-enhanced T1WI. Areas under the receiver operating characteristic curves were estimated for each reader and for the support vector machine classifier. A noninferiority test for diagnostic accuracy based on paired areas under the receiver operating characteristic curve was performed with a noninferiority margin of 0.15. The mean areas under the receiver operating characteristic curve were 0.877 (95% CI, 0.798-0.955) for the support vector machine classifier; 0.878 (95% CI, 0.807-0.949) for reader 1; 0.899 (95% CI, 0.833-0.966) for reader 2; and 0.845 (95% CI, 0.757-0.933) for reader 3. The mean area under the receiver operating characteristic curve of the support vector machine classifier was significantly noninferior to the mean area under the curve of reader 1 (P = .021), reader 2 (P = .035), and reader 3 (P = .007). Support vector machine classification based on textural features of contrast-enhanced T1WI is noninferior to expert human evaluation in the differentiation of primary central nervous system lymphoma and enhancing glioma. © 2017 by American Journal of Neuroradiology.
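The area under the receiver operating characteristic curve reported for each reader and for the classifier equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way from two lists of scores. The scores are invented and the choice of lymphoma as the positive class is an assumption made for illustration; the confidence intervals and noninferiority test from the study are not reproduced.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    wins = 0.0
    for positive in scores_positive:
        for negative in scores_negative:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical classifier scores; higher means "more likely lymphoma".
lymphoma_scores = [0.9, 0.8, 0.4]
glioma_scores = [0.7, 0.3, 0.2]

area = auc(lymphoma_scores, glioma_scores)
```

With these toy scores, eight of the nine positive-negative pairs are ordered correctly, giving an area of 8/9. The pairwise form is quadratic in the number of cases, which is not a concern at the sample sizes described in the abstract.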
GREENE, CASEY S.; TAN, JIE; UNG, MATTHEW; MOORE, JASON H.; CHENG, CHAO
2017-01-01
Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the “big data” era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both “machine learning” algorithms as well as “unsupervised” and “supervised” examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. PMID:27908398
Color machine vision in industrial process control: case limestone mine
NASA Astrophysics Data System (ADS)
Paernaenen, Pekka H. T.; Lemstrom, Guy F.; Koskinen, Seppo
1994-11-01
An optical sorter technology has been developed to improve the profitability of a mine by using color line scan machine vision technology. The newly adopted technology extends the expected lifetime of the limestone mine and improves its efficiency. The project has also shown that today's color line scan technology can be applied successfully to industrial use in harsh environments.
Bissert, P T; Carr, J L; DuCarme, J P; Smith, A K
2016-01-01
The continuous mining machine is a key piece of equipment used in underground coal mining operations. Over the past several decades these machines have been involved in a number of mine worker fatalities. Proximity detection systems have been developed to avert hazards associated with operating continuous mining machines. Incorporating intelligent design into proximity detection systems allows workers greater freedom to position themselves to see visual cues or avoid other hazards such as haulage equipment or unsupported roof or ribs. However, intelligent systems must be as safe as conventional proximity detection systems. An evaluation of the 39 fatal accidents for which the Mine Safety and Health Administration has published fatality investigation reports was conducted to determine whether the accident may have been prevented by conventional or intelligent proximity. Multiple zone configurations for the intelligent systems were studied to determine how system performance might be affected by the zone configuration. Researchers found that 32 of the 39 fatalities, or 82 percent, may have been prevented by both conventional and intelligent proximity systems. These results indicate that, by properly configuring the zones of an intelligent proximity detection system, equivalent protection to a conventional system is possible.
Electromagnetic Signal Feedback Control for Proximity Detection Systems
NASA Astrophysics Data System (ADS)
Smith, Adam K.
Coal is the most abundant fossil fuel in the United States and remains an essential source of energy. While more than half of coal production comes from surface mining, nearly twice as many workers are employed by underground operations. One of the key pieces of equipment used in underground coal mining is the continuous mining machine. These large and powerful machines are operated in confined spaces by remote control. Since 1984, 40 mine workers in the U. S. have been killed when struck or pinned by a continuous mining machine. It is estimated that a majority of these accidents could have been prevented with the application of proximity detection systems. While proximity detection systems can significantly increase safety around a continuous mining machine, there are some system limitations. Commercially available proximity warning systems for continuous mining machines use magnetic field generators to detect workers and establish safe work areas around the machines. Several environmental factors, however, can influence and distort the magnetic fields. To minimize these effects, a control system has been developed using electromagnetic field strength and generator current to stabilize and control field drift induced by internal and external environmental factors. A laboratory test set-up was built using a ferrite-core magnetic field generator to produce a stable magnetic field. Previous work based on a field-invariant magnetic flux density model, which generically describes the electromagnetic field, is expanded upon. The analytically established transferable shell-based flux density distribution model is used to experimentally validate the control system. By controlling the current input to the ferrite-core generator, a more reliable and consistent magnetic field is produced. Implementation of this technology will improve accuracy and performance of existing commercial proximity detection systems. 
These research results will help reduce the risk of traumatic injuries and improve overall safety in the mining workplace.
Schneider, Nadine; Lowe, Daniel M; Sayle, Roger A; Landrum, Gregory A
2015-01-26
Fingerprint methods applied to molecules have proven to be useful for similarity determination and as inputs to machine-learning models. Here, we present the development of a new fingerprint for chemical reactions and validate its usefulness in building machine-learning models and in similarity assessment. Our final fingerprint is constructed as the difference of the atom-pair fingerprints of products and reactants and includes agents via calculated physicochemical properties. We validated the fingerprints on a large data set of reactions text-mined from granted United States patents from the last 40 years that have been classified using a substructure-based expert system. We applied machine learning to build a 50-class predictive model for reaction-type classification that correctly predicts 97% of the reactions in an external test set. Impressive accuracies were also observed when applying the classifier to reactions from an in-house electronic laboratory notebook. The performance of the novel fingerprint for assessing reaction similarity was evaluated by a cluster analysis that recovered 48 out of 50 of the reaction classes with a median F-score of 0.63 for the clusters. The data sets used for training and primary validation as well as all python scripts required to reproduce the analysis are provided in the Supporting Information.
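The "difference of the fingerprints of products and reactants" idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature strings are hypothetical stand-ins for atom-pair features, the function name `difference_fingerprint` is invented here, and no cheminformatics toolkit is used, so it is not the authors' implementation.

```python
from collections import Counter


def difference_fingerprint(reactant_features, product_features):
    """Count-based difference fingerprint: product counts minus
    reactant counts, keeping only features whose count changed."""
    diff = Counter(product_features)
    diff.subtract(Counter(reactant_features))
    return {feature: count for feature, count in diff.items() if count != 0}


# Hypothetical feature lists for a toy oxidation (alcohol -> carbonyl).
reactant_feats = ["C-O", "C-C", "O-H"]
product_feats = ["C-C", "C=O"]
fp = difference_fingerprint(reactant_feats, product_feats)
print(fp)
```

Features that are unchanged by the reaction cancel out, so the resulting vector describes only what the reaction creates (positive counts) and destroys (negative counts).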
Agarwal, Shashank; Liu, Feifan; Yu, Hong
2011-10-03
Protein-protein interaction (PPI) is an important biomedical phenomenon. Automatically detecting PPI-relevant articles and identifying methods that are used to study PPI are important text mining tasks. In this study, we have explored domain independent features to develop two open source machine learning frameworks. One performs binary classification to determine whether the given article is PPI relevant or not, named "Simple Classifier", and the other one maps the PPI relevant articles with corresponding interaction method nodes in a standardized PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology, named "OntoNorm". We evaluated our system in the context of BioCreative challenge competition using the standardized data set. Our systems are amongst the top systems reported by the organizers, attaining 60.8% F1-score for identifying relevant documents, and 52.3% F1-score for mapping articles to interaction method ontology. Our results show that domain-independent machine learning frameworks can perform competitively well at the tasks of detecting PPI relevant articles and identifying the methods that were used to study the interaction in such articles. Simple Classifier is available at http://sourceforge.net/p/simpleclassify/home/ and OntoNorm at http://sourceforge.net/p/ontonorm/home/.
Use of sentiment analysis for capturing patient experience from free-text comments posted online.
Greaves, Felix; Ramirez-Cano, Daniel; Millett, Christopher; Darzi, Ara; Donaldson, Liam
2013-11-01
There are large amounts of unstructured, free-text information about quality of health care available on the Internet in blogs, social networks, and on physician rating websites that are not captured in a systematic way. New analytical techniques, such as sentiment analysis, may allow us to understand and use this information more effectively to improve the quality of health care. We attempted to use machine learning to understand patients' unstructured comments about their care. We used sentiment analysis techniques to categorize online free-text comments by patients as either positive or negative descriptions of their health care. We tried to automatically predict whether a patient would recommend a hospital, whether the hospital was clean, and whether they were treated with dignity from their free-text description, compared to the patient's own quantitative rating of their care. We applied machine learning techniques to all 6412 online comments about hospitals on the English National Health Service website in 2010 using Weka data-mining software. We also compared the results obtained from sentiment analysis with the paper-based national inpatient survey results at the hospital level using Spearman rank correlation for all 161 acute adult hospital trusts in England. There was 81%, 84%, and 89% agreement between quantitative ratings of care and those derived from free-text comments using sentiment analysis for cleanliness, being treated with dignity, and overall recommendation of hospital respectively (kappa scores: .40-.74, P<.001 for all). We observed mild to moderate associations between our machine learning predictions and responses to the large patient survey for the three categories examined (Spearman rho 0.37-0.51, P<.001 for all). 
The prediction accuracy that we have achieved using this machine learning process suggests that we are able to predict, from free-text, a reasonably accurate assessment of patients' opinion about different performance aspects of a hospital and that these machine learning predictions are associated with results of more conventional surveys.
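The agreement percentages and kappa scores reported above can be illustrated with a short Python sketch of Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The function name `cohen_kappa` and the toy ratings are assumptions made for illustration; they are not data from the study.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length
    sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating sequences must be non-empty and equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    categories = set(ratings_a) | set(ratings_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return observed, 1.0  # both raters used a single identical category
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical toy data: 1 = positive, 0 = negative.
patient_rating = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted_from_text = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
agreement, kappa = cohen_kappa(patient_rating, predicted_from_text)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this toy case the raw agreement is 80% while kappa is about 0.6, which shows why high percentage agreement can coexist with only moderate kappa values.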
NASA Astrophysics Data System (ADS)
Yoon, Seung-Chul; Park, Bosoon; Lawrence, Kurt C.
2017-05-01
Various types of optical imaging techniques measuring light reflectivity and scattering can detect microbial colonies of foodborne pathogens on agar plates. Until recently, these techniques were developed to provide solutions for hypothesis-driven studies, which focused on developing tools and batch/offline machine learning methods with well defined sets of data. These have relatively high accuracy and rapid response time because the tools and methods are often optimized for the collected data. However, they often need to be retrained or recalibrated when new untrained data and/or features are added. A big-data driven technique is more suitable for online learning of new/ambiguous samples and for mining unknown or hidden features. Although big data research in hyperspectral imaging is emerging in remote sensing and many tools and methods have been developed so far in many other applications such as bioinformatics, the tools and methods still need to be evaluated and adjusted in applications where the conventional batch machine learning algorithms were dominant. The primary objective of this study is to evaluate appropriate big data analytic tools and methods for online learning and mining of foodborne pathogens on agar plates. After the tools and methods are successfully identified, they will be applied to rapidly search big color and hyperspectral image data of microbial colonies collected over the past 5 years in house and find the most probable colony or a group of colonies in the collected big data. The meta-data, such as collection time and any unstructured data (e.g. comments), will also be analyzed and presented with output results. The expected results will be novel, big data-driven technology to correctly detect and recognize microbial colonies of various foodborne pathogens on agar plates.
Machine-related injuries in the US mining industry and priorities for safety research.
Ruff, Todd; Coleman, Patrick; Martini, Laura
2011-03-01
Researchers at the National Institute for Occupational Safety and Health studied mining accidents that involved a worker entangled in, struck by, or in contact with machinery or equipment in motion. The motivation for this study came from the large number of severe accidents, i.e. accidents resulting in a fatality or permanent disability, that are occurring despite available interventions. Accident descriptions were taken from an accident database maintained by the United States Department of Labor, Mine Safety and Health Administration, and 562 accidents that occurred during 2000-2007 fit the search criteria. Machine-related accidents accounted for 41% of all severe accidents in the mining industry during this period. Machinery most often involved in these accidents included conveyors, rock bolting machines, milling machines and haulage equipment such as trucks and loaders. The most common activities associated with these accidents were operation of the machine and maintenance and repair. The current methods to safeguard workers near machinery include mechanical guarding around moving components, lockout/tagout of machine power during maintenance and backup alarms for mobile equipment. To decrease accidents further, researchers recommend additional efforts in the development of new control technologies, training materials and dissemination of information on best practices.
Granular support vector machines with association rules mining for protein homology prediction.
Tang, Yuchun; Jin, Bo; Zhang, Yan-Qing
2005-01-01
Protein homology prediction between protein sequences is one of the critical problems in computational biology. Such a complex classification problem is common in medical or biological information processing applications. How to build a model with superior generalization capability from training samples is an essential issue for mining knowledge to accurately predict/classify unseen new samples and to effectively support human experts in making correct decisions. A new learning model called granular support vector machines (GSVM) is proposed based on our previous work. GSVM systematically and formally combines the principles from statistical learning theory and granular computing theory and thus provides an interesting new mechanism to address complex classification problems. It works by building a sequence of information granules and then building support vector machines (SVM) in some of these information granules on demand. A good granulation method to find suitable granules is crucial for modeling a GSVM with good performance. In this paper, we also propose an association rules-based granulation method. The granules induced by association rules with high enough confidence and significant support are left as they are because of their high "purity" and significant effect on simplifying the classification task. For every other granule, an SVM is modeled to discriminate the corresponding data. In this way, a complex classification problem is divided into multiple smaller problems so that the learning task is simplified. The proposed algorithm, here named GSVM-AR, is compared with SVM on the KDDCUP04 protein homology prediction data. The experimental results show that finding the splitting hyperplane is not a trivial task (we should be careful to select the association rules to avoid overfitting) and GSVM-AR does show significant improvement compared to building one single SVM in the whole feature space.
Another advantage is that GSVM-AR is practical to use because it is easy to implement. More importantly and more interestingly, GSVM provides a new mechanism to address complex classification problems.
Shouval, Roni; Labopin, Myriam; Bondi, Ori; Mishan-Shamay, Hila; Shimoni, Avichai; Ciceri, Fabio; Esteve, Jordi; Giebel, Sebastian; Gorin, Norbert C; Schmid, Christoph; Polge, Emmanuelle; Aljurf, Mahmoud; Kroger, Nicolaus; Craddock, Charles; Bacigalupo, Andrea; Cornelissen, Jan J; Baron, Frederic; Unger, Ron; Nagler, Arnon; Mohty, Mohamad
2015-10-01
Allogeneic hematopoietic stem-cell transplantation (HSCT) is potentially curative for acute leukemia (AL), but carries considerable risk. Machine learning algorithms, which are part of the data mining (DM) approach, may serve for transplantation-related mortality risk prediction. This work is a retrospective DM study on a cohort of 28,236 adult HSCT recipients from the AL registry of the European Group for Blood and Marrow Transplantation. The primary objective was prediction of overall mortality (OM) at 100 days after HSCT. Secondary objectives were estimation of nonrelapse mortality, leukemia-free survival, and overall survival at 2 years. Donor, recipient, and procedural characteristics were analyzed. The alternating decision tree machine learning algorithm was applied for model development on 70% of the data set and validated on the remaining data. OM prevalence at day 100 was 13.9% (n=3,936). Of the 20 variables considered, 10 were selected by the model for OM prediction, and several interactions were discovered. By using a logistic transformation function, the crude score was transformed into individual probabilities for 100-day OM (range, 3% to 68%). The model's discrimination for the primary objective performed better than the European Group for Blood and Marrow Transplantation score (area under the receiver operating characteristics curve, 0.701 v 0.646; P<.001). Calibration was excellent. Scores assigned were also predictive of secondary objectives. The alternating decision tree model provides a robust tool for risk evaluation of patients with AL before HSCT, and is available online (http://bioinfo.lnx.biu.ac.il/∼bondi/web1.html). It is presented as a continuous probabilistic score for the prediction of day 100 OM, extending prediction to 2 years. The DM method has proved useful for clinical prediction in HSCT. © 2015 by American Society of Clinical Oncology.
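The step "by using a logistic transformation function, the crude score was transformed into individual probabilities" can be sketched as follows. The intercept and slope are hypothetical placeholders, since the abstract does not report the fitted coefficients, and `score_to_probability` is a name chosen here for illustration rather than part of the published model.

```python
import math


def score_to_probability(crude_score, intercept=0.0, slope=1.0):
    """Map a real-valued crude score to a probability in (0, 1) with a
    logistic function. The intercept and slope are placeholders; a real
    model would fit them to observed outcomes."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * crude_score)))


# Hypothetical crude scores; higher scores map to higher predicted risk.
for score in (-2.0, 0.0, 2.0):
    print(f"score {score:+.1f} -> probability {score_to_probability(score):.3f}")
```

Because the logistic function is monotonic, the transformation preserves the ranking of patients by crude score and only rescales the output to a probability.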
Application of Elements of TPM Strategy for Operation Analysis of Mining Machine
NASA Astrophysics Data System (ADS)
Brodny, Jaroslaw; Tutak, Magdalena
2017-12-01
The Total Productive Maintenance (TPM) strategy comprises a group of activities and actions intended to keep machines in a failure-free state, without breakdowns, by limiting failures, unplanned shutdowns, defects and unplanned servicing of machines. These actions are meant to increase the effectiveness with which a company uses its devices and machines. A very significant element of this strategy is combining technical actions with changes in how employees perceive them. The fundamental aim of introducing this strategy is to improve the economic efficiency of the enterprise. Increasing competition and the need to reduce production costs mean that mining enterprises are also forced to introduce this strategy. The paper presents examples of using the OEE model for quantitative evaluation of selected mining devices. The OEE model is a quantitative tool of the TPM strategy and can be the basis for further work connected with its introduction. The OEE indicator is the product of three components, which cover the availability and performance of the studied machine and the quality of the obtained product. The paper presents the results of the effectiveness analysis of the use of a set of mining machines included in the longwall system, which is the first and most important link in the technological line of coal production. The set of analyzed machines included the longwall shearer, armored face conveyor and crusher. From a reliability point of view, the analyzed set of machines is a system characterized by a serial structure. The analysis was based on data recorded by the industrial automation system used in the mines. This method of data acquisition ensured high credibility and full time synchronization of the data. Conclusions from the research and analyses should be used to reduce breakdowns, failures and unplanned downtime, increase performance and improve production quality.
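Since the abstract defines the OEE indicator as the product of availability, performance and quality, a small worked example may help. The sketch below uses the commonly cited definitions of the three components, which are an assumption here because the abstract does not spell them out; the function name and the shift figures are likewise hypothetical and are not taken from the paper.

```python
def oee(planned_time, run_time, ideal_rate, total_output, good_output):
    """Overall Equipment Effectiveness as availability * performance * quality.

    planned_time, run_time : same time unit (e.g. minutes)
    ideal_rate             : ideal output per time unit
    total_output           : everything produced during run_time
    good_output            : the part of total_output meeting quality requirements
    """
    availability = run_time / planned_time
    performance = total_output / (ideal_rate * run_time)
    quality = good_output / total_output
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 min planned, 360 min running, ideal rate of
# 10 t/min, 2880 t mined in total, of which 2736 t met quality requirements.
a, p, q, value = oee(480, 360, 10, 2880, 2736)
print(f"availability={a:.2f} performance={p:.2f} quality={q:.2f} OEE={value:.2f}")
```

With these toy figures the three components are 0.75, 0.80 and 0.95, giving an OEE of 0.57, which equals the good output divided by the ideal output over the whole planned time.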
Mass Estimation and Its Applications
2012-02-23
parameters); e.g., the rectangular kernel function has fixed width or fixed per unit size. But the rectangular function used in mass has no parameter...MassTER is implemented in JAVA, and we use DBSCAN in WEKA [13] and a version of DENCLUE implemented in R (www.r-project.org) in our empirical evaluation...Proceedings of SIGKDD, 2010, 989-998. [13] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
Chapter 16: text mining for translational bioinformatics.
Cohen, K Bretonnel; Hunter, Lawrence E
2013-04-01
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
75 FR 17511 - Coal Mine Dust Sampling Devices
Federal Register 2010, 2011, 2012, 2013, 2014
2010-04-06
... Part III Department of Labor Mine Safety and Health Administration 30 CFR Parts 18, 74, and 75 Coal Mine Dust Sampling Devices; High-Voltage Continuous Mining Machine Standard for Underground Coal Mines...-AB61 Coal Mine Dust Sampling Devices AGENCY: Mine Safety and Health Administration, Labor. ACTION...
Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia
2013-01-01
Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The success of the query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
Photometric redshift estimation based on data mining with PhotoRApToR
NASA Astrophysics Data System (ADS)
Cavuoti, S.; Brescia, M.; De Stefano, V.; Longo, G.
2015-03-01
Photometric redshifts (photo-z) are crucial to the scientific exploitation of modern panchromatic digital surveys. In this paper we present PhotoRApToR (Photometric Research Application To Redshift): a Java/C++ based desktop application capable of solving non-linear regression and multi-variate classification problems, specialized in particular for photo-z estimation. It embeds a machine learning algorithm, namely a multi-layer neural network trained by the Quasi Newton learning rule, and special tools dedicated to pre- and post-processing data. PhotoRApToR has been successfully tested on several scientific cases. The application is available for free download from the DAME Program web site.
Learning classification with auxiliary probabilistic information
Nguyen, Quang; Valizadegan, Hamed; Hauskrecht, Milos
2012-01-01
Finding ways of incorporating auxiliary information or auxiliary data into the learning process has been the topic of active data mining and machine learning research in recent years. In this work we study and develop a new framework for the classification learning problem in which, in addition to class labels, the learner is provided with auxiliary (probabilistic) information that reflects how strongly the expert feels about the class label. This approach can be extremely useful for many practical classification tasks that rely on subjective label assessment and where the cost of acquiring additional auxiliary information is negligible when compared to the cost of the example analysis and labelling. We develop classification algorithms capable of using the auxiliary information to make the learning process more efficient in terms of the sample complexity. We demonstrate the benefit of the approach on a number of synthetic and real world data sets by comparing it to learning with class labels only. PMID:25309141
Methodology of selecting dozers for lignite open pit mines in Serbia
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stojanovic, D.; Ignjatovic, D.; Kovacevic, S.
1996-12-31
Apart from the main production processes (coal and overburden mining, rail conveyor transportation and storage of excavated masses) performed by high-capacity mechanization at open pit mines, numerous and varied auxiliary works are present, which often have a crucial influence on both the work efficiency of the main equipment and the maintenance of optimum technical conditions of the machines and plants making up the technological system of the open pit. Successful realization of work requires a proper and adequate selection of auxiliary machines according to their type, quantity, capacity, power, etc., taking into account the specific conditions existing at each open pit mine. A dozer is certainly the most important and representative auxiliary machine at an open pit mine. It is widely used in numerous works that are, in fact, preconditions for successful work of the main mechanization, and consequently the selection of a dozer ranks among the most important operations when selecting mechanization. This paper presents the methodology of dozer selection for lignite open pit mines. A mathematical model was designed that defines the volume of work required of dozers at open pit mines and, consequently, the number of dozers needed. The model was tested in practice at large open pit mines and can be used in the design of future open pit mines.
Machine Learning for Big Data: A Study to Understand Limits at Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R.; Del-Castillo-Negrete, Carlos Emilio
This report aims to empirically understand the limits of machine learning when applied to Big Data. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical data mining and machine learning under more scrutiny, evaluation and application for gleaning insights from the data than ever before. Much is expected from algorithms without understanding their limitations at scale while dealing with massive datasets. In that context, we pose and address the following questions: How does a machine learning algorithm perform on measures such as accuracy and execution time with increasing sample size and feature dimensionality? Does training with more samples guarantee better accuracy? How many features should be computed for a given problem? Do more features guarantee better accuracy? Are the efforts to derive and calculate more features and train on larger samples worthwhile? As problems become more complex and traditional binary classification algorithms are replaced with multi-task, multi-class categorization algorithms, do parallel learners perform better? What happens to the accuracy of the learning algorithm when trained to categorize multiple classes within the same feature space? Towards finding answers to these questions, we describe the design of an empirical study and present the results.
We conclude with the following observations: (i) accuracy of the learning algorithm increases with increasing sample size but saturates at a point, beyond which more samples do not contribute to better accuracy/learning; (ii) the richness of the feature space dictates performance, both accuracy and training time; (iii) increased dimensionality is often reflected in better performance (higher accuracy in spite of longer training times), but the improvements are not commensurate with the efforts for feature computation and training; (iv) accuracy of the learning algorithms drops significantly with multi-class learners training on the same feature matrix; and (v) learning algorithms perform well when categories in labeled data are independent (i.e., no relationship or hierarchy exists among categories).
Li, Linglong; Yang, Yaodong; Zhang, Dawei; ...
2018-03-30
Exploration of phase transitions and construction of associated phase diagrams are of fundamental importance for condensed matter physics and materials science alike, and remain the focus of extensive research for both theoretical and experimental studies. For the latter, comprehensive studies involving scattering, thermodynamics, and modeling are typically required. We present a new approach to data mining multiple realizations of collective dynamics, measured through piezoelectric relaxation studies, to identify the onset of a structural phase transition in nanometer-scale volumes, that is, the probed volume of an atomic force microscope tip. Machine learning is used to analyze the multidimensional data sets describing relaxation to voltage and thermal stimuli, producing the temperature-bias phase diagram for a relaxor crystal without the need to measure (or know) the order parameter. The suitability of the approach to determine the phase diagram is shown with simulations based on a two-dimensional Ising model. Finally, these results indicate that machine learning approaches can be used to determine phase transitions in ferroelectrics, providing a general, statistically significant, and robust approach toward determining the presence of critical regimes and phase boundaries.
Classification of ROTSE Variable Stars using Machine Learning
NASA Astrophysics Data System (ADS)
Wozniak, P. R.; Akerlof, C.; Amrose, S.; Brumby, S.; Casperson, D.; Gisler, G.; Kehoe, R.; Lee, B.; Marshall, S.; McGowan, K. E.; McKay, T.; Perkins, S.; Priedhorsky, W.; Rykoff, E.; Smith, D. A.; Theiler, J.; Vestrand, W. T.; Wren, J.; ROTSE Collaboration
2001-12-01
We evaluate several Machine Learning algorithms as potential tools for automated classification of variable stars. Using the ROTSE sample of ~1800 variables from a pilot study of 5% of the whole sky, we compare the effectiveness of a supervised technique (Support Vector Machines, SVM) versus unsupervised methods (K-means and Autoclass). There are 8 types of variables in the sample: RR Lyr AB, RR Lyr C, Delta Scuti, Cepheids, detached eclipsing binaries, contact binaries, Miras and LPVs. Preliminary results suggest a very high (~95%) efficiency of SVM in isolating a few best defined classes against the rest of the sample, and good accuracy (~70-75%) for all classes considered simultaneously. This includes some degeneracies, irreducible with the information at hand. Supervised methods naturally outperform unsupervised methods, in terms of final error rate, but unsupervised methods offer many advantages for large sets of unlabeled data. Therefore, both types of methods should be considered as promising tools for mining vast variability surveys. We project that there are more than 30,000 periodic variables in the ROTSE-I database covering the entire local sky between V=10 and 15.5 mag. This sample size is already stretching the time capabilities of human analysts.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Linglong; Yang, Yaodong; Zhang, Dawei
Exploration of phase transitions and construction of associated phase diagrams are of fundamental importance for condensed matter physics and materials science alike, and remain the focus of extensive research for both theoretical and experimental studies. For the latter, comprehensive studies involving scattering, thermodynamics, and modeling are typically required. We present a new approach to data mining multiple realizations of collective dynamics, measured through piezoelectric relaxation studies, to identify the onset of a structural phase transition in nanometer-scale volumes, that is, the probed volume of an atomic force microscope tip. Machine learning is used to analyze the multidimensional data sets describing relaxation to voltage and thermal stimuli, producing the temperature-bias phase diagram for a relaxor crystal without the need to measure (or know) the order parameter. The suitability of the approach to determine the phase diagram is shown with simulations based on a two-dimensional Ising model. Finally, these results indicate that machine learning approaches can be used to determine phase transitions in ferroelectrics, providing a general, statistically significant, and robust approach toward determining the presence of critical regimes and phase boundaries.
Models of Weather Impact on Air Traffic
NASA Technical Reports Server (NTRS)
Kulkarni, Deepak; Wang, Yao
2017-01-01
Flight delays have been a serious problem in the national airspace system, costing about $30B per year. About 70% of the delays are attributed to weather, and up to two thirds of these are avoidable. Better decision support tools would reduce these delays and improve air traffic management tools. Such tools would benefit from models of weather impacts on airspace operations. This presentation discusses the use of machine learning methods to mine various types of weather and traffic data to develop such models.
Evolving optimised decision rules for intrusion detection using particle swarm paradigm
NASA Astrophysics Data System (ADS)
Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.
2012-12-01
The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic patterns and classifies them as normal or anomalous. The objective of this article is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of an IDS. In this article, a rule-based approach is introduced that uses a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree, to detect anomalous network patterns. In particular, the proposed swarm optimisation-based approach selects the instances that compose the training set, and the optimised decision trees operate over this training set, producing classification rules with improved coverage, classification capability and generalisation ability. Experiments with the Knowledge Discovery and Data mining (KDD) data set, which contains information on traffic patterns during normal and intrusive behaviour, show that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithms.
30 CFR 57.22308 - Methane monitors (III mines).
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Methane monitors (III mines). 57.22308 Section... Standards for Methane in Metal and Nonmetal Mines Equipment § 57.22308 Methane monitors (III mines). (a) Methane monitors shall be installed on continuous mining machines and longwall mining systems. (b) The...
30 CFR 57.22308 - Methane monitors (III mines).
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Methane monitors (III mines). 57.22308 Section... Standards for Methane in Metal and Nonmetal Mines Equipment § 57.22308 Methane monitors (III mines). (a) Methane monitors shall be installed on continuous mining machines and longwall mining systems. (b) The...
Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng
2014-01-01
Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA), and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used an ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing–based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress–related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes. PMID:24520154
Supervised Learning Based Hypothesis Generation from Biomedical Literature.
Sang, Shengtian; Yang, Zhihao; Li, Zongyao; Lin, Hongfei
2015-01-01
The amount of biomedical literature is growing at an explosive speed, and much useful knowledge in this literature remains undiscovered. Researchers can form biomedical hypotheses by mining these works. In this paper, we propose a supervised learning based approach to generate hypotheses from biomedical literature. This approach splits the traditional hypothesis generation process of the classic ABC model into an AB model and a BC model, both of which are constructed with a supervised learning method. Compared with concept co-occurrence and grammar engineering-based approaches like SemRep, machine learning based models can usually achieve better performance in information extraction (IE) from texts. By combining the two models, the approach then reconstructs the ABC model and generates biomedical hypotheses from literature. The experimental results on the three classic Swanson hypotheses show that our approach outperforms the SemRep system.
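As a rough illustration of the ABC model described in this abstract, the sketch below joins A-B and B-C relations into candidate A-C hypotheses. The relation pairs are toy values loosely modeled on Swanson's fish oil and Raynaud example, and the function name `generate_hypotheses` is hypothetical. In the paper the AB and BC relations come from supervised classifiers, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict


def generate_hypotheses(ab_pairs, bc_pairs, known_ac=frozenset()):
    """Join A-B and B-C relations into candidate A-C hypotheses.

    Returns a dict mapping each (a, c) pair that is not already in
    known_ac to the sorted list of intermediate B terms linking them.
    """
    # Index the B-C relations by their B term for the join.
    c_by_b = defaultdict(set)
    for b, c in bc_pairs:
        c_by_b[b].add(c)

    links = defaultdict(set)
    for a, b in ab_pairs:
        for c in c_by_b.get(b, ()):
            if (a, c) not in known_ac:
                links[(a, c)].add(b)
    return {pair: sorted(bs) for pair, bs in links.items()}


# Toy relations for illustration only; not data from the paper.
ab = [("fish oil", "blood viscosity"), ("fish oil", "platelet aggregation")]
bc = [("blood viscosity", "Raynaud disease"),
      ("platelet aggregation", "Raynaud disease")]

hypotheses = generate_hypotheses(ab, bc)
print(hypotheses)
```

Passing already documented A-C pairs as `known_ac` filters them out, so that only undiscovered links are reported as hypotheses.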
Fang, Xingang; Bagui, Sikha; Bagui, Subhash
2017-08-01
The readily available high throughput screening (HTS) data from the PubChem database provides an opportunity for mining of small molecules in a variety of biological systems using machine learning techniques. Given the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structure-activity relationship (QSAR) model. To develop a systematic descriptor selection strategy, we need to understand the relationship between: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset on the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross validation test. In testing the primary HTS screening data with more than 200K molecular structures, the logistic regression model exhibited the capability of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the Human kallikrein 5 (hK 5) target suggested a feasible descriptor/model selection strategy on similar targets. Copyright © 2017 Elsevier Ltd. All rights reserved.
Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.
Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J
2015-02-01
The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) an LR model that assumed linearity and additivity (simple LR model), (2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), (3) a support vector machine, (4) a random forest, and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.
30 CFR 57.22306 - Methane monitors (I-A mines).
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Methane monitors (I-A mines). 57.22306 Section... Standards for Methane in Metal and Nonmetal Mines Equipment § 57.22306 Methane monitors (I-A mines). (a) Methane monitors shall be installed on continuous mining machines, longwall mining systems, and on loading...
30 CFR 57.22307 - Methane monitors (II-A mines).
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Methane monitors (II-A mines). 57.22307 Section... Standards for Methane in Metal and Nonmetal Mines Equipment § 57.22307 Methane monitors (II-A mines). (a) Methane monitors shall be installed on continuous mining machines, longwall mining systems, bench and face...
30 CFR 57.22306 - Methane monitors (I-A mines).
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Methane monitors (I-A mines). 57.22306 Section... Standards for Methane in Metal and Nonmetal Mines Equipment § 57.22306 Methane monitors (I-A mines). (a) Methane monitors shall be installed on continuous mining machines, longwall mining systems, and on loading...
30 CFR 57.22307 - Methane monitors (II-A mines).
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Methane monitors (II-A mines). 57.22307 Section... Standards for Methane in Metal and Nonmetal Mines Equipment § 57.22307 Methane monitors (II-A mines). (a) Methane monitors shall be installed on continuous mining machines, longwall mining systems, bench and face...
NASA Technical Reports Server (NTRS)
Oza, Nikunj C.
2004-01-01
Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary: any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
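The committee argument in this abstract can be made concrete with a small majority-vote sketch. The labels and member predictions below are invented for illustration, and simple majority voting is only one of many possible combination rules; it is used here as an assumption, not as the method of the report.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the members' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def accuracy(predicted, truth):
    """Fraction of positions where the prediction matches the truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Invented binary labels and three members that err on different examples.
truth = [1, 0, 1, 1, 0, 1]
m1 = [0, 1, 1, 1, 0, 1]  # wrong on examples 0 and 1
m2 = [1, 0, 0, 0, 0, 1]  # wrong on examples 2 and 3
m3 = [1, 0, 1, 1, 1, 0]  # wrong on examples 4 and 5

# Complementary members: every error is outvoted by the other two.
ensemble = [majority_vote(votes) for votes in zip(m1, m2, m3)]

# Members that always agree: the committee adds nothing over one member.
redundant = [majority_vote(votes) for votes in zip(m1, m1, m1)]

print(accuracy(m1, truth), accuracy(ensemble, truth), accuracy(redundant, truth))
```

Each member alone is right on four of six examples, the complementary committee is right on all six, and the committee of identical members is no better than a single member.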
Mining EEG with SVM for Understanding Cognitive Underpinnings of Math Problem Solving Strategies
López, Julio
2018-01-01
We have developed a new methodology for examining and extracting patterns from brain electric activity by using data mining and machine learning techniques. Data was collected from experiments focused on the study of cognitive processes that might evoke different specific strategies in the resolution of math problems. A binary classification problem was constructed using correlations and phase synchronization between different electroencephalographic channels as characteristics and, as labels or classes, the math performances of individuals participating in specially designed experiments. The proposed methodology is based on using well-established procedures of feature selection, which were used to determine a suitable brain functional network size related to math problem solving strategies and also to discover the most relevant links in this network without including noisy connections or excluding significant connections. PMID:29670667
Mining EEG with SVM for Understanding Cognitive Underpinnings of Math Problem Solving Strategies.
Bosch, Paul; Herrera, Mauricio; López, Julio; Maldonado, Sebastián
2018-01-01
We have developed a new methodology for examining and extracting patterns from brain electric activity by using data mining and machine learning techniques. Data was collected from experiments focused on the study of cognitive processes that might evoke different specific strategies in the resolution of math problems. A binary classification problem was constructed using correlations and phase synchronization between different electroencephalographic channels as characteristics and, as labels or classes, the math performances of individuals participating in specially designed experiments. The proposed methodology is based on using well-established procedures of feature selection, which were used to determine a suitable brain functional network size related to math problem solving strategies and also to discover the most relevant links in this network without including noisy connections or excluding significant connections.
Cumulative Risk and Impact Modeling on Environmental Chemical and Social Stressors.
Huang, Hongtai; Wang, Aolin; Morello-Frosch, Rachel; Lam, Juleen; Sirota, Marina; Padula, Amy; Woodruff, Tracey J
2018-03-01
The goal of this review is to identify cumulative modeling methods used to evaluate combined effects of exposures to environmental chemicals and social stressors. The specific review question is: What are the existing quantitative methods used to examine the cumulative impacts of exposures to environmental chemical and social stressors on health? There has been an increase in literature that evaluates combined effects of exposures to environmental chemicals and social stressors on health using regression models; very few studies applied other data mining and machine learning techniques to this problem. The majority of studies we identified used regression models to evaluate combined effects of multiple environmental and social stressors. With proper study design and appropriate modeling assumptions, additional data mining methods may be useful to examine combined effects of environmental and social stressors.
Mining key elements for severe convection prediction based on CNN
NASA Astrophysics Data System (ADS)
Liu, Ming; Pan, Ning; Zhang, Changan; Sha, Hongzhou; Zhang, Bolei; Liu, Liang; Zhang, Meng
2017-04-01
Severe convective weather is a kind of weather disaster accompanied by heavy rainfall, wind gusts, hail, etc. With recent developments in remote sensing and numerical modeling, high-volume, long-term observational and modeling data have accumulated that capture massive severe convective events over particular areas and time periods. With those high-volume and high-variety weather data, most existing studies and methods address dynamical laws, cause analysis, potential rules, and prediction enhancement by utilizing the governing equations from fluid dynamics and thermodynamics. In this study, a key-element mining method is proposed for severe convection prediction based on a convolutional neural network (CNN). It aims to identify the key areas and key elements from huge amounts of historical weather data, including conventional measurements, weather radar, and satellite data, as well as numerical modeling and/or reanalysis data. In this manner, the machine-learning-based method could help human forecasters in their decision-making on operational forecasts of severe convective weather by extracting key information from real-time and historical weather big data. The method first uses computer vision technology to complete the data preprocessing of the meteorological variables. Then, it uses information such as radar maps and expert knowledge to annotate all images automatically. Finally, using the CNN model, it can analyze and evaluate each weather element (e.g., particular variables, patterns, features, etc.), identify key areas of those critical weather elements, and then help forecasters quickly screen out the key elements from huge amounts of observation data under current weather conditions.
Based on the rich weather measurement and model data (up to 10 years) over Fujian province in China, where severe convective weather is very active during the summer months, experimental tests are conducted with the new machine-learning method via CNN models. Based on the analysis of those experimental results and case studies, the proposed method has the following benefits for severe convection prediction: (1) helping forecasters narrow down the scope of analysis and save lead time for high-impact severe convection; (2) processing huge amounts of weather big data with machine learning methods rather than relying on traditional theory and knowledge, which provides a new way to explore and quantify severe convective weather; (3) providing machine-learning-based end-to-end analysis and processing with considerable scalability in data volume, accomplishing the analysis work without human intervention.
NASA Astrophysics Data System (ADS)
Kotelnikov, E. V.; Milov, V. R.
2018-05-01
Rule-based learning algorithms are more transparent and easier to interpret than neural networks and deep learning algorithms. These properties make it possible to use such algorithms effectively for descriptive data mining tasks. The choice of an algorithm also depends on its ability to solve predictive tasks. The article compares the quality of solutions to binary and multiclass classification problems based on experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best classification quality in comparison with Ripper and C4.5; however, the latter two generate more compact rule sets.
SVS: data and knowledge integration in computational biology.
Zycinski, Grzegorz; Barla, Annalisa; Verri, Alessandro
2011-01-01
In this paper we present a framework for structured variable selection (SVS). The main concept of the proposed schema is to take a step towards the integration of two different aspects of data mining: the database and machine learning perspectives. The framework is flexible enough to use not only microarray data, but other high-throughput data of choice (e.g. from mass spectrometry, microarray, next generation sequencing). Moreover, the feature selection phase incorporates prior biological knowledge in a modular way from various repositories and is ready to host different statistical learning techniques. We present a proof of concept of SVS, illustrating some implementation details and describing current results on high-throughput microarray data.
Construction accident narrative classification: An evaluation of text mining techniques.
Goh, Yang Miang; Ubeynarayana, C U
2017-11-01
Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM was also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with unigram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
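The per-label precision, recall and F1 figures quoted in this abstract follow the usual one-vs-rest definitions, which the sketch below computes for a toy example. The three labels and the six predictions are hypothetical and are not taken from the study's 11 labels or its 251 test cases.

```python
def per_label_scores(y_true, y_pred):
    """Compute one-vs-rest (precision, recall, F1) for every label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Report 0.0 instead of dividing by zero when a label is never
        # predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical narratives labeled with invented accident types.
y_true = ["fall", "fall", "fall", "struck", "struck", "other"]
y_pred = ["fall", "fall", "struck", "struck", "other", "other"]

scores = per_label_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a label with perfect precision but low recall still receives a moderate F1, which is why the abstract reports the three ranges separately.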
Sajn, Luka; Kukar, Matjaž
2011-12-01
The paper presents results of our long-term study on using image processing and data mining methods in medical imaging. Since evaluation of modern medical images is becoming increasingly complex, advanced analytical and decision support tools are involved in the integration of partial diagnostic results. Such partial results, frequently obtained from tests with substantial imperfections, are integrated into an ultimate diagnostic conclusion about the probability of disease for a given patient. We study various topics such as improving the predictive power of clinical tests by utilizing pre-test and post-test probabilities, texture representation, multi-resolution feature extraction, feature construction and data mining algorithms that significantly outperform medical practice. Our long-term study reveals three significant milestones. The first improvement was achieved by significantly increasing post-test diagnostic probabilities with respect to expert physicians. The second, even more significant improvement utilizes multi-resolution image parametrization. Machine learning methods in conjunction with feature subset selection on these parameters significantly improve diagnostic performance. However, further feature construction with principal component analysis on these features elevates results to an even higher accuracy level that represents the third milestone. With the proposed approach clinical results are significantly improved throughout the study. The most significant result of our study is improvement in the diagnostic power of the whole diagnostic process. Our compound approach aids, but does not replace, the physician's judgment and may assist in decisions on cost effectiveness of tests. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
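The relation between pre-test and post-test probabilities mentioned in this abstract is usually expressed through likelihood ratios. The sketch below applies that standard textbook formula; the pre-test probability, sensitivity and specificity are made-up numbers, and nothing here is taken from the paper's own models.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Update a disease probability after a positive or negative test.

    Uses post-test odds = pre-test odds * likelihood ratio, with
    LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec.
    """
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must lie strictly between 0 and 1")
    if not (0.0 < sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie in (0, 1]"
                         " and (0, 1) respectively")
    pre_odds = pre_test / (1.0 - pre_test)
    if positive:
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        likelihood_ratio = (1.0 - sensitivity) / specificity
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Made-up example: 30% pre-test probability, 80% sensitive, 90% specific.
after_positive = post_test_probability(0.30, 0.80, 0.90, positive=True)
after_negative = post_test_probability(0.30, 0.80, 0.90, positive=False)
print(round(after_positive, 3), round(after_negative, 3))
```

With these numbers a positive result raises the probability of disease from 0.30 to about 0.77 and a negative result lowers it to about 0.09, which is the sense in which a test's usefulness depends on the pre-test estimate it starts from.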
Parodi, Stefano; Dosi, Corrado; Zambon, Antonella; Ferrari, Enrico; Muselli, Marco
2017-12-01
Identifying potential risk factors for problem gambling (PG) is of primary importance for planning preventive and therapeutic interventions. We illustrate a new approach based on the combination of standard logistic regression (LR) and an innovative method of supervised data mining (Logic Learning Machine or LLM). Data were taken from a pilot cross-sectional study to identify subjects with PG behaviour, assessed by two internationally validated scales (SOGS and Lie/Bet). Information was obtained from 251 gamblers recruited in six betting establishments. Data on socio-demographic characteristics, lifestyle and cognitive-related factors, and type, place and frequency of preferred gambling were obtained by a self-administered questionnaire. The following variables associated with PG were identified: instant gratification games, alcohol abuse, cognitive distortion, illegal behaviours and having started gambling with a relative or a friend. Furthermore, the combination of LLM and LR indicated the presence of two different types of PG, namely: (a) daily gamblers, more prone to illegal behaviour, with poor money management skills and who started gambling at an early age, and (b) non-daily gamblers, characterised by superstitious beliefs and a higher preference for immediate reward games. Finally, instant gratification games were strongly associated with the number of games usually played. Studies on gamblers habitually frequenting betting shops are rare. The finding of different types of PG among habitual gamblers deserves further analysis in larger studies. Advanced data mining algorithms, like LLM, are powerful tools and potentially useful in identifying risk factors for PG.
NASA Astrophysics Data System (ADS)
Mote, P.; Foster, J. G.; Daley-Laursen, S. B.
2014-12-01
The Northwest has the nation's strongest geographic, institutional, and scientific alignment between NOAA RISA, DOI Climate Science Center, USDA Climate Hub, and participating universities. Considering each of those institutions' distinct missions, funding structures, governance, stakeholder engagement, methods of priority-setting, and deliverables, it is a challenge to find areas of common interest and ways for these institutions to work together. In view of the rich history of stakeholder engagement and the deep base of previous research on climate change in the region, these institutions are cooperating in developing a regional capacity to mine the vast available data in ways that are mutually beneficial, synergistic, and regionally relevant. Fundamentally, data mining means exploring connections across and within multiple datasets using advanced statistical techniques, development of multidimensional indices, machine learning, and more. The challenge is not just what we do with big datasets, but how we integrate the wide variety and types of data coming out of scenario analyses to create knowledge and inform decision-making. Federal agencies and their partners need to learn to integrate big data on climate change and develop useful tools for important stakeholders to assist them in anticipating the main stresses of climate change on their own resources and preparing to abate those stresses.
VisualUrText: A Text Analytics Tool for Unstructured Textual Data
NASA Astrophysics Data System (ADS)
Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.
2018-05-01
The amount of unstructured text on the Internet is growing tremendously. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text mining is a well-known technique for discovering interesting patterns and trends, which are non-trivial knowledge, from massive unstructured text data. Text mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that is proficient in extracting, processing and analyzing unstructured text data and visualizing cleaned text data in multiple forms such as Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendrogram. This tool, VisualUrText, is developed to assist students and researchers in extracting interesting patterns and trends in document analyses.
Yosipof, Abraham; Nahum, Oren E; Anderson, Assaf Y; Barad, Hannah-Noa; Zaban, Arie; Senderowitz, Hanoch
2015-06-01
Growth in energy demands, coupled with the need for clean energy, is likely to make solar cells an important part of future energy resources. In particular, cells entirely made of metal oxides (MOs) have the potential to provide clean and affordable energy if their power conversion efficiencies are improved. Such improvements require the development of new MOs, which could benefit from combining combinatorial material sciences for producing solar cell libraries with data mining tools to direct synthesis efforts. In this work we developed a data mining workflow and applied it to the analysis of two recently reported solar cell libraries based on Titanium and Copper oxides. Our results demonstrate that QSAR models with good prediction statistics for multiple solar cell properties could be developed and that these models highlight important factors affecting these properties in accord with experimental findings. The resulting models are therefore suitable for designing better solar cells. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Vestrand, W. T.; Theiler, J.; Woznia, P. R.
2004-10-01
The existence of rapidly slewing robotic telescopes and fast alert distribution via the Internet is revolutionizing our capability to study the physics of fast astrophysical transients. But the salient challenge that optical time domain surveys must conquer is mining the torrent of data to recognize important transients in a scene full of normal variations. Humans simply do not have the attention span, memory, or reaction time required to recognize fast transients and rapidly respond. Autonomous robotic instrumentation with the ability to extract pertinent information from the data stream in real time will therefore be essential for recognizing transients and commanding rapid follow-up observations while the ephemeral behavior is still present. Here we discuss how the development and integration of three technologies: (1) robotic telescope networks; (2) machine learning; and (3) advanced database technology, can enable the construction of smart robotic telescopes, which we loosely call "thinking" telescopes, capable of mining the sky in real time.
Chemical named entities recognition: a review on approaches and applications.
Eltyeb, Safaa; Salim, Naomie
2014-01-01
The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to "text mine" these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
Mechanization for Optimal Landscape Reclamation
NASA Astrophysics Data System (ADS)
Vondráčková, Terezie; Voštová, Věra; Kraus, Michal
2017-12-01
Reclamation is a method of ultimate utilization of land adversely affected by mining or other industrial activity. The paper explains the types of reclamation and the term “optimal reclamation”. It covers the technological options of the long-lasting process of mine dump reclamation, starting with the removal of overlying rocks, transport and backfilling, up to the follow-up remodelling of the mine dump terrain; the technological units and equipment for stripping flow division; and the stripping flow solution with respect to optimal reclamation. We recommend that the application of logistic chains and mining simulation with follow-up reclamation to open-pit mines be used for the implementation of optimal reclamation. In addition to a database of local heterogeneities of the stripped soil and reclaimed land, the flow of earths should be resolved in a manner allowing the most suitable soil substrate to be created for the restoration of agricultural and forest land on mine dumps. The methodology under development addresses a number of problems, including the geological survey of overlying rocks, extraction of stripping, their transport and backfilling in specified locations with the follow-up deployment of goal-directed reclamation. It will make it possible to reduce the financial resources needed for the complex process chain by utilizing GIS, GPS and DGPS technologies, logistic tools and synergistic effects. When selecting machines for transport, moving and spreading of earths, various points of view and aspects must be taken into account. Among such aspects are, e.g., the kind of earth to be operated by the respective construction machine, the kind of work activities to be performed, the machine’s capacity, the option to control the machine’s implement, economic aspects and clients’ requirements.
All these points of view must be considered in the decision-making process so that the selected machine is capable of executing the required activity and that the use of an unsuitable machine is eliminated as it would result in a delay and increase in the project costs. Therefore, reclamation always includes extensive earth-moving work activities restoring the required relief of the land being reclaimed. Using the earth-moving machine capacity, the kind of soil in mine dumps, the kind of the work activity performed and the machine design, a SW application has been developed that allows the most suitable machine for the respective work technology to be selected with a view to preparing the land intended for reclamation.
Machine Learning in Medical Imaging.
Giger, Maryellen L
2018-03-01
Advances in both imaging and computers have synergistically led to a rapid rise in the potential use of artificial intelligence in various radiological imaging tasks, such as risk assessment, detection, diagnosis, prognosis, and therapy response, as well as in multi-omics disease discovery. A brief overview of the field is given here, allowing the reader to recognize the terminology, the various subfields, and components of machine learning, as well as the clinical potential. Radiomics, an expansion of computer-aided diagnosis, has been defined as the conversion of images to minable data. The ultimate benefit of quantitative radiomics is to (1) yield predictive image-based phenotypes of disease for precision medicine or (2) yield quantitative image-based phenotypes for data mining with other -omics for discovery (ie, imaging genomics). For deep learning in radiology to succeed, note that well-annotated large data sets are needed since deep networks are complex, computer software and hardware are evolving constantly, and subtle differences in disease states are more difficult to perceive than differences in everyday objects. In the future, machine learning in radiology is expected to have a substantial clinical impact with imaging examinations being routinely obtained in clinical practice, providing an opportunity to improve decision support in medical image interpretation. The term of note is decision support, indicating that computers will augment human decision making, making it more effective and efficient. The clinical impact of having computers in the routine clinical practice may allow radiologists to further integrate their knowledge with their clinical colleagues in other medical specialties and allow for precision medicine. Copyright © 2018. Published by Elsevier Inc.
30 CFR 57.22309 - Methane monitors (V-A mines).
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Methane monitors (V-A mines). 57.22309 Section... Standards for Methane in Metal and Nonmetal Mines Equipment § 57.22309 Methane monitors (V-A mines). (a) Methane monitors shall be installed on continuous mining machines used in or beyond the last open crosscut...
30 CFR 57.22309 - Methane monitors (V-A mines).
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Methane monitors (V-A mines). 57.22309 Section... Standards for Methane in Metal and Nonmetal Mines Equipment § 57.22309 Methane monitors (V-A mines). (a) Methane monitors shall be installed on continuous mining machines used in or beyond the last open crosscut...
Gobeill, Julien; Pasche, Emilie; Vishnyakova, Dina; Ruch, Patrick
2013-01-01
The available curated data lag behind current biological knowledge contained in the literature. Text mining can assist biologists and curators to locate and access this knowledge, for instance by characterizing the functional profile of publications. Gene Ontology (GO) category assignment in free text already supports various applications, such as powering ontology-based search engines, finding curation-relevant articles (triage) or helping the curator to identify and encode functions. Popular text mining tools for GO classification are based on so-called thesaurus-based (or dictionary-based) approaches, which exploit similarities between the input text and GO terms themselves. But their effectiveness remains limited owing to the complex nature of GO terms, which rarely occur in text. In contrast, machine learning approaches exploit similarities between the input text and already curated instances contained in a knowledge base to infer a functional profile. GO Annotations (GOA) and MEDLINE make it possible to exploit a growing amount of curated abstracts (97 000 in November 2012) for populating this knowledge base. Our study compares a state-of-the-art thesaurus-based system with a machine learning system (based on a k-Nearest Neighbours algorithm) for the task of proposing a functional profile for unseen MEDLINE abstracts, and shows how resources and performances have evolved. Systems are evaluated on their ability to propose for a given abstract the GO terms (2.8 on average) used for curation in GOA. We show that since 2006, although a massive effort was put into adding synonyms in GO (+300%), our thesaurus-based system's effectiveness is rather constant, moving from 0.28 to 0.31 for Recall at 20 (R20). In contrast, thanks to its knowledge base growth, our machine learning system has steadily improved, rising from 0.38 in 2006 to 0.56 for R20 in 2012.
Integrated in semi-automatic workflows or in fully automatic pipelines, such systems are more and more efficient to provide assistance to biologists. DATABASE URL: http://eagl.unige.ch/GOCat/
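The Recall at 20 (R20) measure used in the abstract above can be sketched as follows. This is a generic illustration of recall at a rank cutoff, not the authors' evaluation code; the function names and the placeholder term labels are assumptions.

```python
def recall_at_k(ranked_terms, gold_terms, k=20):
    """Fraction of the gold (curated) terms found among the top-k proposed terms."""
    gold = set(gold_terms)
    if not gold:
        return 0.0  # convention chosen for this sketch when nothing was curated
    hits = gold & set(ranked_terms[:k])
    return len(hits) / len(gold)

def mean_recall_at_k(predictions, gold, k=20):
    """Average recall at k over a collection of abstracts."""
    scores = [recall_at_k(p, g, k) for p, g in zip(predictions, gold)]
    return sum(scores) / len(scores)

# Placeholder labels standing in for GO terms; not real identifiers.
proposed = [["term_a", "term_b", "term_c"], ["term_a", "term_b", "term_c"]]
curated = [["term_b", "term_z"], ["term_c"]]
score = mean_recall_at_k(proposed, curated, k=2)
```

With a cutoff of 20 and an average of 2.8 curated terms per abstract, a score of 0.56 would mean that slightly more than half of the curated terms appear somewhere in the first 20 proposals.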
30 CFR 70.207 - Bimonthly sampling; mechanized mining units.
Code of Federal Regulations, 2012 CFR
2012-07-01
... sampling device as follows: (1) Conventional section using cutting machine. On the cutting machine operator or on the cutting machine within 36 inches inby the normal working position; (2) Conventional section shooting off the solid. On the loading machine operator or on the loading machine within 36 inches inby the...
30 CFR 70.207 - Bimonthly sampling; mechanized mining units.
Code of Federal Regulations, 2014 CFR
2014-07-01
... sampling device as follows: (1) Conventional section using cutting machine. On the cutting machine operator or on the cutting machine within 36 inches inby the normal working position; (2) Conventional section shooting off the solid. On the loading machine operator or on the loading machine within 36 inches inby the...
30 CFR 70.207 - Bimonthly sampling; mechanized mining units.
Code of Federal Regulations, 2013 CFR
2013-07-01
... sampling device as follows: (1) Conventional section using cutting machine. On the cutting machine operator or on the cutting machine within 36 inches inby the normal working position; (2) Conventional section shooting off the solid. On the loading machine operator or on the loading machine within 36 inches inby the...
A Fast SVD-Hidden-nodes based Extreme Learning Machine for Large-Scale Data Analytics.
Deng, Wan-Yu; Bai, Zuo; Huang, Guang-Bin; Zheng, Qing-Hua
2016-05-01
Big dimensional data is a growing trend that is emerging in many real world contexts, extending from web mining, gene expression analysis and protein-protein interaction to high-frequency financial data. Nowadays, there is a growing consensus that increasing dimensionality impedes the performance of classifiers, which is termed the "peaking phenomenon" in the field of machine intelligence. To address the issue, dimensionality reduction is commonly employed as a preprocessing step on the Big dimensional data before building the classifiers. In this paper, we propose an Extreme Learning Machine (ELM) approach for large-scale data analytics. In contrast to existing approaches, we embed hidden nodes that are designed using singular value decomposition (SVD) into the classical ELM. These SVD nodes in the hidden layer are shown to capture the underlying characteristics of the Big dimensional data well, exhibiting excellent generalization performances. The drawback of using SVD on the entire dataset, however, is the high computational complexity involved. To address this, a fast divide and conquer approximation scheme is introduced to maintain computational tractability on high volume data. The resultant algorithm proposed is labeled here as Fast Singular Value Decomposition-Hidden-nodes based Extreme Learning Machine, or FSVD-H-ELM in short. In FSVD-H-ELM, instead of identifying the SVD hidden nodes directly from the entire dataset, SVD hidden nodes are derived from multiple random subsets of data sampled from the original dataset. Comprehensive experiments and comparisons are conducted to assess the FSVD-H-ELM against other state-of-the-art algorithms. The results obtained demonstrated the superior generalization performance and efficiency of the FSVD-H-ELM. Copyright © 2016 Elsevier Ltd. All rights reserved.
Lötsch, Jörn; Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred
2017-01-01
Genes causally involved in human insensitivity to pain provide a unique molecular source of studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic target in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified sharing important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence. PMID:28848388
LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team
2011-01-01
The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.
NASA Astrophysics Data System (ADS)
Moturu, Sai T.; Liu, Huan; Johnson, William G.
Rapidly rising healthcare costs represent one of the major issues plaguing the healthcare system. Data from the Arizona Health Care Cost Containment System, Arizona's Medicaid program provide a unique opportunity to exploit state-of-the-art machine learning and data mining algorithms to analyze data and provide actionable findings that can aid cost containment. Our work addresses specific challenges in this real-life healthcare application with respect to data imbalance in the process of building predictive risk models for forecasting high-cost patients. We survey the literature and propose novel data mining approaches customized for this compelling application with specific focus on non-random sampling. Our empirical study indicates that the proposed approach is highly effective and can benefit further research on cost containment in the healthcare industry.
Support Vector Machines for Multitemporal and Multisensor Change Detection in a Mining Area
NASA Astrophysics Data System (ADS)
Hecheltjen, Antje; Waske, Bjorn; Thonfeld, Frank; Braun, Matthias; Menz, Gunter
2010-12-01
Long-term change detection often implies the challenge of incorporating multitemporal data from different sensors. Most of the conventional change detection algorithms are designed for bi-temporal datasets from the same sensors, detecting only the existence of changes. The labeling of change areas remains a difficult task. To overcome such drawbacks, much attention has been given lately to algorithms arising from machine learning, such as Support Vector Machines (SVMs). While SVMs have been applied successfully for land cover classifications, the exploitation of this approach for change detection is still in its infancy. A few studies have already proven the applicability of SVMs for bi- and multitemporal change detection using data from one sensor only. In this paper we demonstrate the application of SVM for multitemporal and -sensor change detection. Our study site covers lignite open pit mining areas in the German state North Rhine-Westphalia. The dataset consists of bi-temporal Landsat data and multi-temporal ERS SAR data covering two time slots (2001 and 2009). The SVM is conducted using the IDL program imageSVM. Change is deduced from one time slot to the next, resulting in two change maps. In contrast to change detection based on post-classification comparison, change detection is seen here as a specific classification problem. Thus, changes are directly classified from a layer-stack of the two years. To reduce the number of change classes, we created a change mask using the magnitude of Change Vector Analysis (CVA). Training data were selected for different change classes (e.g. forest to mining or mining to agriculture) as well as for the no-change classes (e.g. agriculture). Subsequently, they were divided into two independent sets for training the SVMs and accuracy assessment, respectively. Our study shows the applicability of SVMs to classifying changes. The proposed method yielded a change map of reclaimed and active mines.
The use of ERS SAR data, however, did not add to the accuracy compared to Landsat data only. A great advantage compared to other change detection approaches are the labeled change maps, which are a direct output of the methodology. Our approach also overcomes the drawback of post-classification comparison, namely the propagation of classification inaccuracies.
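The change mask based on the magnitude of Change Vector Analysis, as described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions: pixels are given as tuples of band values, the images are flat lists of pixels, and the threshold is a hypothetical parameter, since the abstract does not say how it was chosen.

```python
import math

def change_magnitude(pixel_t1, pixel_t2):
    """Euclidean length of the spectral change vector between two dates."""
    return math.sqrt(sum((b2 - b1) ** 2 for b1, b2 in zip(pixel_t1, pixel_t2)))

def change_mask(image_t1, image_t2, threshold):
    """Mark a pixel as changed when its change magnitude exceeds the threshold.

    The threshold is an assumption of this sketch, not a value from the study.
    """
    return [change_magnitude(p1, p2) > threshold
            for p1, p2 in zip(image_t1, image_t2)]

# Two tiny hypothetical two-band images, each with two pixels.
image_2001 = [(0, 0), (1, 1)]
image_2009 = [(3, 4), (1, 1)]
mask = change_mask(image_2001, image_2009, threshold=1.0)
```

Only pixels flagged by such a mask would then need to be assigned to a change class by the classifier, which is how a mask reduces the number of change classes to consider.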
Banaee, Hadi; Ahmed, Mobyen Uddin; Loutfi, Amy
2013-01-01
The past few years have witnessed an increase in the development of wearable sensors for health monitoring systems. This increase has been due to several factors such as development in sensor technology as well as directed efforts on political and stakeholder levels to promote projects which address the need for providing new methods for care given increasing challenges with an aging population. An important aspect of study in such system is how the data is treated and processed. This paper provides a recent review of the latest methods and algorithms used to analyze data from wearable sensors used for physiological monitoring of vital signs in healthcare services. In particular, the paper outlines the more common data mining tasks that have been applied such as anomaly detection, prediction and decision making when considering in particular continuous time series measurements. Moreover, the paper further details the suitability of particular data mining and machine learning methods used to process the physiological data and provides an overview of the properties of the data sets used in experimental validation. Finally, based on this literature review, a number of key challenges have been outlined for data mining methods in health monitoring systems. PMID:24351646
Banaee, Hadi; Ahmed, Mobyen Uddin; Loutfi, Amy
2013-12-17
The past few years have witnessed an increase in the development of wearable sensors for health monitoring systems. This increase has been due to several factors such as development in sensor technology as well as directed efforts on political and stakeholder levels to promote projects which address the need for providing new methods for care given increasing challenges with an aging population. An important aspect of study in such system is how the data is treated and processed. This paper provides a recent review of the latest methods and algorithms used to analyze data from wearable sensors used for physiological monitoring of vital signs in healthcare services. In particular, the paper outlines the more common data mining tasks that have been applied such as anomaly detection, prediction and decision making when considering in particular continuous time series measurements. Moreover, the paper further details the suitability of particular data mining and machine learning methods used to process the physiological data and provides an overview of the properties of the data sets used in experimental validation. Finally, based on this literature review, a number of key challenges have been outlined for data mining methods in health monitoring systems.
Text Mining to Support Gene Ontology Curation and Vice Versa.
Ruch, Patrick
2017-01-01
In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the developments of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Databases workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.
30 CFR 77.401 - Stationary grinding machines; protective devices.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Stationary grinding machines; protective... OF UNDERGROUND COAL MINES Safeguards for Mechanical Equipment § 77.401 Stationary grinding machines; protective devices. (a) Stationary grinding machines other than special bit grinders shall be equipped with...
30 CFR 77.401 - Stationary grinding machines; protective devices.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Stationary grinding machines; protective... OF UNDERGROUND COAL MINES Safeguards for Mechanical Equipment § 77.401 Stationary grinding machines; protective devices. (a) Stationary grinding machines other than special bit grinders shall be equipped with...
NASA Astrophysics Data System (ADS)
Galitsky, Boris; Kovalerchuk, Boris
2006-04-01
We develop a software system, Text Scanner for Emotional Distress (TSED), to help detect email messages suspected of coming from people under strong emotional distress. It has been confirmed by multiple studies that terrorist attackers have experienced substantial emotional distress at some point before committing a terrorist attack. Therefore, if an individual in emotional distress can be detected on the basis of email texts, some preventive measures can be taken. The proposed detection machinery is based on extraction and classification of emotional profiles from emails. An emotional profile is a formal representation of a sequence of emotional states through a textual discourse where communicative actions are attached to these emotional states. The issues of extraction of emotional profiles from text and reasoning about it are discussed and illustrated. We then develop an inductive machine learning and reasoning framework to relate an emotional profile to the class "Emotional distress" or "No emotional distress", given a training dataset where the class is assigned by an expert. TSED's machine learning is evaluated using the database of structured customer complaints.
An open experimental database for exploring inorganic materials
Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; ...
2018-04-03
The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half of these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.
Ritchie, Marylyn D; White, Bill C; Parker, Joel S; Hahn, Lance W; Moore, Jason H
2003-01-01
Background: Appropriate definition of neural network architecture prior to data analysis is crucial for successful data mining. This can be challenging when the underlying model of the data is unknown. The goal of this study was to determine whether optimizing neural network architecture using genetic programming as a machine learning strategy would improve the ability of neural networks to model and detect nonlinear interactions among genes in studies of common human diseases. Results: Using simulated data, we show that a genetic programming optimized neural network approach is able to model gene-gene interactions as well as a traditional back propagation neural network. Furthermore, the genetic programming optimized neural network is better than the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present. Conclusion: This study suggests that a machine learning strategy for optimizing neural network architecture may be preferable to traditional trial-and-error approaches for the identification and characterization of gene-gene interactions in common, complex human diseases. PMID:12846935
An open experimental database for exploring inorganic materials.
Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; Perkins, John D; White, Robert; Munch, Kristin; Tumas, William; Phillips, Caleb
2018-04-03
The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half of these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.
An open experimental database for exploring inorganic materials
Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; Perkins, John D.; White, Robert; Munch, Kristin; Tumas, William; Phillips, Caleb
2018-01-01
The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half of these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource. PMID:29611842
An open experimental database for exploring inorganic materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus
The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half of these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.
Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming
2015-01-01
Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary feature. Features used in current machine learning-based methods are usually singleton features which may be due to explosive features and a large number of noisy features when singleton features are combined into conjunction features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
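The idea of conjoining singleton features and then ranking them with a Chi-square score can be illustrated with a minimal sketch. This is not the authors' system: the function names `conjoin` and `chi_square`, the binary feature encoding, and the toy token data are all assumptions made for illustration only.

```python
from typing import List, Sequence


def conjoin(f1: Sequence[int], f2: Sequence[int]) -> List[int]:
    """Conjunction feature: fires only when both singleton features fire."""
    return [a & b for a, b in zip(f1, f2)]


def chi_square(feature: Sequence[int], label: Sequence[int]) -> float:
    """Chi-square statistic of the 2x2 table (binary feature vs. binary label)."""
    n = len(feature)
    a = sum(1 for f, y in zip(feature, label) if f and y)          # feature on, positive
    b = sum(1 for f, y in zip(feature, label) if f and not y)      # feature on, negative
    c = sum(1 for f, y in zip(feature, label) if not f and y)      # feature off, positive
    d = n - a - b - c                                              # feature off, negative
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant feature or label carries no information
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical toy data: six tokens, 1 = token is part of a drug name.
label = [1, 1, 0, 0, 0, 0]
capitalized = [1, 1, 1, 0, 0, 0]   # hypothetical word-shape feature
in_dictionary = [1, 1, 0, 1, 0, 0]  # hypothetical dictionary feature

print(chi_square(capitalized, label))                          # singleton score
print(chi_square(conjoin(capitalized, in_dictionary), label))  # conjunction score
```

In this toy case the conjunction separates the classes better than either singleton, which is the kind of effect a selection step would be expected to surface; with real data the conjunction space is much larger and noisier, which is why filtering is needed.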
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64).
The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing. PMID:21849043
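Since the comparison above rests on accuracy, sensitivity and specificity, a short sketch of how these are derived from a confusion matrix may help. The function name and the counts below are invented for illustration and are not taken from the study; they are chosen only to show how perfect specificity and low sensitivity can coexist with a reasonable overall accuracy when the negative class dominates.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics; assumes both classes are present."""
    return {
        "sensitivity": tp / (tp + fn),            # share of true cases detected
        "specificity": tn / (tn + fp),            # share of non-cases correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented counts: 10 converters to dementia, 20 non-converters.
metrics = classification_metrics(tp=3, fp=0, tn=20, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A classifier like this misses most true cases while still scoring well on accuracy, which is why the abstract weighs all three measures together rather than accuracy alone.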
NASA Astrophysics Data System (ADS)
Matetic, Rudy J.
Over-exposure to noise remains a widespread and serious health hazard in the U.S. mining industries despite 25 years of regulation. Every day, 80% of the nation's miners go to work in an environment where the time-weighted average (TWA) noise level exceeds 85 dBA, and more than 25% of the miners are exposed to a TWA noise level that exceeds 90 dBA, the permissible exposure limit (PEL). Additionally, MSHA coal noise sample data collected from 2000 to 2002 show that 65% of the equipment whose operators exceeded 100% noise dosage comprises only seven different types of machines: auger miners, bulldozers, continuous miners, front end loaders, roof bolters, shuttle cars (electric), and trucks. In addition, the MSHA data indicate that the roof bolter is third among all the equipment and second among equipment in underground coal whose operators exceed 100% dosage. A research program was implemented to: (1) determine, characterize and measure sound power levels radiated by a roof bolting machine during differing drilling configurations (thrust, rotational speed, penetration rate, etc.) and with differing types of drilling methods in high compressive strength rock media (>20,000 psi).
The research approach characterized the sound power level results from laboratory testing and provided the mining industry with empirical data on using differing noise control technologies (drilling configurations and types of drilling methods) to reduce sound power level emissions of a roof bolting machine; (2) distinguish and correlate the empirical data into one statistically valid equation, which provides the mining industry with a tool to predict overall sound power levels of a roof bolting machine given any type of drilling configuration and drilling method used in industry; (3) provide the mining industry with several approaches to predict or determine sound pressure levels in an underground coal mine using laboratory test results from a roof bolting machine; and (4) describe a method for determining an operator's noise dosage on a roof bolting machine using predicted or determined sound pressure levels.
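The relationship between sound levels, noise dosage and TWA mentioned above can be sketched as follows. This is not the dissertation's prediction method; it is a minimal illustration that assumes the 90 dBA criterion level and 5 dB exchange rate commonly used in U.S. occupational noise rules, and it omits threshold handling (levels below the integration threshold are not excluded here). The function names and the example shift are hypothetical.

```python
import math


def permitted_hours(level_dba: float, criterion: float = 90.0,
                    exchange_rate: float = 5.0) -> float:
    """Allowed exposure time at a given level: 8 h at the criterion level,
    halved for every `exchange_rate` dB above it."""
    return 8.0 / 2 ** ((level_dba - criterion) / exchange_rate)


def noise_dose(exposures) -> float:
    """Dose in percent from (level_dBA, hours) pairs: 100 * sum(C_i / T_i)."""
    return 100.0 * sum(hours / permitted_hours(level) for level, hours in exposures)


def twa8(dose_percent: float) -> float:
    """Eight-hour TWA equivalent to a dose: 16.61 * log10(D / 100) + 90."""
    return 16.61 * math.log10(dose_percent / 100.0) + 90.0


# Hypothetical shift: 4 h of drilling at 95 dBA, then 4 h at 90 dBA.
dose = noise_dose([(95.0, 4.0), (90.0, 4.0)])
print(f"dose = {dose:.0f}%, TWA = {twa8(dose):.1f} dBA")
```

In this sketch a dose above 100% corresponds to a TWA above 90 dBA, which is the sense in which operators "exceed 100% dosage" in the abstract.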
Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure
Badger, Jonathan; LaRose, Eric; Shirzadi, Ehsan; Mahnke, Andrea; Mayer, John; Ye, Zhan; Page, David; Peissig, Peggy
2017-01-01
Background The study of adverse drug events (ADEs) is a tenured topic in medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited use for ADE study and with little known about the content with respect to ADEs. Objective The aim of this study was to develop a big data analytics strategy that mines the content of scientific articles and health-related Web-based social media to detect and identify ADEs. Methods We analyzed the following two data sources: (1) biomedical articles and (2) health-related social media blog posts. We developed an intelligent and scalable text mining solution on big data infrastructures composed of Apache Spark, natural language processing, and machine learning. This was combined with an Elasticsearch No-SQL distributed database to explore and visualize ADEs. Results The accuracy, precision, recall, and area under receiver operating characteristic of the system were 92.7%, 93.6%, 93.0%, and 0.905, respectively, and showed better results in comparison with traditional approaches in the literature. This work not only detected and classified ADE sentences from big data biomedical literature but also scientifically visualized ADE interactions. Conclusions To the best of our knowledge, this work is the first to investigate a big data machine learning strategy for ADE discovery on massive datasets downloaded from PubMed Central and social media. This contribution illustrates possible capacities in big data biomedical text analysis using advanced computational methods with real-time update from new data published on a daily basis. PMID:29222076
Sudha, M
2017-09-27
As a recent trend, various computational intelligence and machine learning approaches have been used for mining inferences hidden in large clinical databases to assist the clinician in strategic decision making. In any target data, irrelevant information may be detrimental, confusing the mining algorithm and degrading the prediction outcome. To address this issue, this study attempts to identify an intelligent approach to assist the disease diagnostic procedure using an optimal set of attributes instead of all attributes present in the clinical data set. In the proposed Application Specific Intelligent Computing (ASIC) decision support system, a rough set based genetic algorithm is employed in the pre-processing phase and a back propagation neural network is applied in the training and testing phase. ASIC has two phases. The first phase handles outliers, noisy data, and missing values to obtain quality target data, and generates appropriate attribute reduct sets from the input data using a rough computing based genetic algorithm centred on a relative fitness function measure. The succeeding phase involves both training and testing of a back propagation neural network classifier on the selected reducts. The model performance is evaluated against widely adopted existing classifiers. The proposed ASIC system for clinical decision support has been tested with breast cancer, fertility diagnosis and heart disease data sets from the University of California at Irvine (UCI) machine learning repository. The proposed system outperformed the existing approaches, attaining accuracy rates of 95.33%, 97.61%, and 93.04% for breast cancer, fertility issue and heart disease diagnosis, respectively.
Text mining for traditional Chinese medical knowledge discovery: a survey.
Zhou, Xuezhong; Peng, Yonghong; Liu, Baoyan
2010-08-01
Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development into knowledge discovery approaches and to modernize TCM. In order to contribute to this still growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, (4) a discussion of the research issues around TCM text mining and its future directions. Copyright 2010 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Petropoulos, G.; Partsinevelos, P.; Mitraka, Z.
2012-04-01
Surface mining has been shown to cause intensive environmental degradation in terms of landscape, vegetation and biological communities. Nowadays, the commercial availability of remote sensing imagery at high spatiotemporal scales, has improved dramatically our ability to monitor surface mining activity and evaluate its impact on the environment and society. In this study we investigate the potential use of Landsat TM imagery combined with diverse classification techniques, namely artificial neural networks and support vector machines for delineating mining exploration and assessing its effect on vegetation in various surface mining sites in the Greek island of Milos. Assessment of the mining impact in the study area is validated through the analysis of available QuickBird imagery acquired nearly concurrently to the TM overpasses. Results indicate the capability of the TM sensor combined with the image analysis applied herein as a potential economically viable solution to provide rapidly and at regular time intervals information on mining activity and its impact to the local environment. KEYWORDS: mining environmental impact, remote sensing, image classification, change detection, land reclamation, support vector machines, neural networks
Determining Underground Mining Work Postures Using Motion Capture and Digital Human Modeling
Lutz, Timothy J.; DuCarme, Joseph P.; Smith, Adam K.; Ambrose, Dean
2017-01-01
According to Mine Safety and Health Administration (MSHA) data, during 2008–2012 in the U.S., there were, on average, 65 lost-time accidents per year during routine mining and maintenance activities involving remote-controlled continuous mining machines (CMMs). To address this problem, the National Institute for Occupational Safety and Health (NIOSH) is currently investigating the implementation and integration of existing and emerging technologies in underground mines to provide automated, intelligent proximity detection (iPD) devices on CMMs. One research goal of NIOSH is to enhance the proximity detection system by improving its capability to track and determine the identity, position, and posture of multiple workers, and to selectively disable machine functions to keep workers and machine operators safe. The posture of the miner can determine the safe working distance from a CMM by way of the variation in the proximity detection magnetic field. NIOSH collected and analyzed motion capture data and calculated joint angles of the back, hips, and knees from various postures on 12 human subjects. The results of the analysis suggest that lower body postures can be identified by observing the changes in joint angles of the right hip, left hip, right knee, and left knee. PMID:28626796
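A joint angle of the kind described above can be computed from three motion capture marker positions. The abstract does not say how NIOSH computed its angles, so the following is only a generic geometric sketch: the function name, the marker coordinates, and the choice of the included angle between the two segment vectors are assumptions.

```python
import math
from typing import Sequence


def joint_angle(proximal: Sequence[float], joint: Sequence[float],
                distal: Sequence[float]) -> float:
    """Included angle in degrees at `joint` between the segments pointing
    to the proximal and distal markers (180 = fully extended)."""
    u = [p - j for p, j in zip(proximal, joint)]
    v = [d - j for d, j in zip(distal, joint)]
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("markers must not coincide with the joint")
    cos_theta = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


# Hypothetical hip, knee and ankle markers (metres) for a deep knee bend.
hip, knee, ankle = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(f"knee angle: {joint_angle(hip, knee, ankle):.1f} degrees")
```

Tracking how such angles change at the hips and knees over time is one plausible way to separate standing, kneeling and squatting postures, though any thresholds would have to come from measured data.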
jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints
2011-01-01
Background The decomposition of a chemical graph is a convenient approach to encode information of the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, atom- and pharmacophore typing. Furthermore, it provides the functionality to combine, to compare, or to export the fingerprints into several formats. Results We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g. CATS2D), radial fingerprints (e.g. Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint that only includes the subset of shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. 
On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data set, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al. Conclusions jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LGPL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics like benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining. PMID:21219648
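To make the notion of a path-based fingerprint and similarity searching concrete, here is a heavily simplified sketch. It is not jCompoundMapper's algorithm or API: it ignores bond orders, hydrogens, atom typing and hashing, and the names `path_features` and `tanimoto` as well as the two toy molecules are assumptions for illustration.

```python
from typing import List, Set, Tuple


def path_features(atoms: List[str], bonds: List[Tuple[int, int]],
                  max_depth: int) -> Set[str]:
    """Enumerate simple paths of up to `max_depth` bonds by depth-first search
    and return them as direction-independent strings of element symbols."""
    adjacency = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)

    features: Set[str] = set()

    def dfs(path: List[int]) -> None:
        labels = [atoms[k] for k in path]
        forward, backward = "-".join(labels), "-".join(reversed(labels))
        features.add(min(forward, backward))  # same key for both directions
        if len(path) - 1 == max_depth:
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:         # simple paths only
                dfs(path + [neighbour])

    for start in adjacency:
        dfs([start])
    return features


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Shared features divided by all features present in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


# Heavy-atom skeletons only: ethanol (C-C-O) and dimethyl ether (C-O-C).
ethanol = path_features(["C", "C", "O"], [(0, 1), (1, 2)], max_depth=2)
ether = path_features(["C", "O", "C"], [(0, 1), (1, 2)], max_depth=2)
print(sorted(ethanol), sorted(ether), tanimoto(ethanol, ether))
```

The two isomers share their atom and C-O features but differ in the longer paths, so the similarity falls between 0 and 1; feature sets like these can also be fed to a support vector machine as sparse vectors.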
Li, Yanpeng; Hu, Xiaohua; Lin, Hongfei; Yang, Zhihao
2011-01-01
Feature representation is essential to machine learning and text mining. In this paper, we present a feature coupling generalization (FCG) framework for generating new features from unlabeled data. It selects two special types of features, i.e., example-distinguishing features (EDFs) and class-distinguishing features (CDFs), from the original feature set, and then generalizes EDFs into higher-level features based on their coupling degrees with CDFs in unlabeled data. The advantage is that EDFs with extreme sparsity in labeled data can be enriched by their co-occurrences with CDFs in unlabeled data, so that the performance of these low-frequency features can be greatly boosted and new information from unlabeled data can be incorporated. We apply this approach to three tasks in biomedical literature mining: gene named entity recognition (NER), protein-protein interaction extraction (PPIE), and text classification (TC) for gene ontology (GO) annotation. New features are generated from over 20 GB of unlabeled PubMed abstracts. The experimental results on BioCreative 2, the AIMED corpus, and the TREC 2005 Genomics Track show that 1) FCG can make good use of the sparse features ignored by supervised learning; 2) it improves the performance of supervised baselines by 7.8 percent, 5.0 percent, and 5.8 percent, respectively, in the three tasks; and 3) our methods achieve 89.1, 64.5 F-score, and 60.1 normalized utility on the three benchmark data sets.
30 CFR 18.22 - Boring-type machines equipped for auxiliary face ventilation.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 30 Mineral Resources 1 2012-07-01 2012-07-01 false Boring-type machines equipped for auxiliary... AND ACCESSORIES Construction and Design Requirements § 18.22 Boring-type machines equipped for auxiliary face ventilation. Each boring-type continuous-mining machine that is submitted for approval shall...
30 CFR 18.22 - Boring-type machines equipped for auxiliary face ventilation.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 30 Mineral Resources 1 2013-07-01 2013-07-01 false Boring-type machines equipped for auxiliary... AND ACCESSORIES Construction and Design Requirements § 18.22 Boring-type machines equipped for auxiliary face ventilation. Each boring-type continuous-mining machine that is submitted for approval shall...
30 CFR 18.22 - Boring-type machines equipped for auxiliary face ventilation.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 30 Mineral Resources 1 2014-07-01 2014-07-01 false Boring-type machines equipped for auxiliary... AND ACCESSORIES Construction and Design Requirements § 18.22 Boring-type machines equipped for auxiliary face ventilation. Each boring-type continuous-mining machine that is submitted for approval shall...
30 CFR 18.22 - Boring-type machines equipped for auxiliary face ventilation.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Boring-type machines equipped for auxiliary... AND ACCESSORIES Construction and Design Requirements § 18.22 Boring-type machines equipped for auxiliary face ventilation. Each boring-type continuous-mining machine that is submitted for approval shall...
30 CFR 18.22 - Boring-type machines equipped for auxiliary face ventilation.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Boring-type machines equipped for auxiliary... AND ACCESSORIES Construction and Design Requirements § 18.22 Boring-type machines equipped for auxiliary face ventilation. Each boring-type continuous-mining machine that is submitted for approval shall...
NASA Astrophysics Data System (ADS)
Broido, V. L.; Krasnoshtanov, S. U.
2018-03-01
The problems of choosing a rational technology and materials for restoring crucial parts and large-sized welded constructions of dredges and other mining machines using welding and surfacing methods are considered. Welding and surfacing account for a significant share of the overall labor intensity of repair work at mining enterprises. Both manual arc welding and surfacing as well as mechanized methods are used, which ensure a 24-fold increase in productivity. The work shows examples of using the technology of restoring parts and structures at gold mining enterprises in the Irkutsk region. Some grades of welding and surfacing materials are shown, the production of which has been mastered by the Irkutsk Heavy Engineering Plant (IZTM).
Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry
2018-06-25
The extent to which genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to provide better insights into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at the gene and alternative splicing isoform levels. However, leveraging RNA-Seq data requires the development of new data mining and analytics methods. Supervised machine learning methods are commonly used approaches for biological data analysis, and have recently gained attention for their applications to RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that isoform-level expression data is more informative for biological classification tasks than gene-level expression data. Our large-scale assessment is done by utilizing multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage for the samples from cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods.
The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Information Theory, Inference and Learning Algorithms
NASA Astrophysics Data System (ADS)
Mackay, David J. C.
2003-10-01
Information theory and inference, often taught separately, are here united in one entertaining textbook. These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography. This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction. A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, are developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks. The final part of the book describes the state of the art in error-correcting codes, including low-density parity-check codes, turbo codes, and digital fountain codes -- the twenty-first century standards for satellite communications, disk drives, and data broadcast. Richly illustrated, filled with worked examples and over 400 exercises, some with detailed solutions, David MacKay's groundbreaking book is ideal for self-learning and for undergraduate or graduate courses. Interludes on crosswords, evolution, and sex provide entertainment along the way. In sum, this is a textbook on information, communication, and coding for a new generation of students, and an unparalleled entry point into these subjects for professionals in areas as diverse as computational biology, financial engineering, and machine learning.
Finding Waldo: Learning about Users from their Interactions.
Brown, Eli T; Ottley, Alvitta; Zhao, Helen; Quan Lin; Souvenir, Richard; Endert, Alex; Chang, Remco
2014-12-01
Visual analytics is inherently a collaboration between human and computer. However, in current visual analytics systems, the computer has limited means of knowing about its users and their analysis processes. While existing research has shown that a user's interactions with a system reflect a large amount of the user's reasoning process, there has been limited advancement in developing automated, real-time techniques that mine interactions to learn about the user. In this paper, we demonstrate that we can accurately predict a user's task performance and infer some user personality traits by using machine learning techniques to analyze interaction data. Specifically, we conduct an experiment in which participants perform a visual search task, and apply well-known machine learning algorithms to three encodings of the users' interaction data. We achieve, depending on algorithm and encoding, between 62% and 83% accuracy at predicting whether each user will be fast or slow at completing the task. Beyond predicting performance, we demonstrate that using the same techniques, we can infer aspects of the user's personality factors, including locus of control, extraversion, and neuroticism. Further analyses show that strong results can be attained with limited observation time: in one case 95% of the final accuracy is gained after a quarter of the average task completion time. Overall, our findings show that interactions can provide information to the computer about its human collaborator, and establish a foundation for realizing mixed-initiative visual analytics systems.
NASA Astrophysics Data System (ADS)
Hengl, Tomislav
2016-04-01
Preliminary results of predicting the distribution of organic soils (Histosols) and soil organic carbon stock (in tonnes per ha) using global compilations of soil profiles (about 150,000 points) and covariates at 250 m spatial resolution (about 150 covariates; mainly MODIS seasonal land products, SRTM DEM derivatives, climatic images, lithological and land cover and landform maps) are presented. We focus on using a data-driven approach, i.e. Machine Learning techniques that often require no knowledge about the distribution of the target variable or about the possible relationships. Other advantages of using machine learning are (DOI: 10.1371/journal.pone.0125814): All rules required to produce outputs are formalized. The whole procedure is documented (the statistical model and associated computer script), enabling reproducible research. Predicted surfaces can make use of various information sources and can be optimized relative to all available quantitative point and covariate data. There is more flexibility in terms of the spatial extent, resolution and support of requested maps. Automated mapping is also more cost-effective: once the system is operational, maintenance and production of updates are an order of magnitude faster and cheaper. Consequently, prediction maps can be updated and improved at shorter and shorter time intervals. Some disadvantages of automated soil mapping based on Machine Learning are: Models are data-driven, and any serious blunders or artifacts in the input data can propagate to order-of-magnitude larger errors than in the case of expert-based systems. Fitting machine learning models is computationally an order of magnitude more demanding. Computing effort can be even tens of thousands of times higher than if e.g. linear geostatistics is used.
Many machine learning models are fairly complex and often abstract, and interpreting such models is not trivial and requires special multidimensional / multivariable plotting and data mining tools. Results of model fitting using the R packages nnet, randomForest and the h2o software (machine learning functions) show that significant models can be fitted for soil classes, bulk density (R-square 0.76), soil organic carbon (R-square 0.62) and coarse fragments (R-square 0.59). Consequently, we were able to estimate soil organic carbon stock for the majority of the land mask (excluding permanent ice) and detect patches of landscape containing mainly organic soils (peat and similar). Our results confirm that hotspots of soil organic carbon in the Tropics are peatlands in Indonesia, the north of Peru, the west Amazon and the Congo river basin. The majority of the world's soil organic carbon stock is likely in the Northern latitudes (tundra and taiga of the north). The distribution of Histosols seems to be mainly controlled by climatic conditions (especially temperature regime and water vapor) and hydrologic position in the landscape. Predicted distributions of organic soils (probability of occurrence) and total soil organic carbon stock at resolutions of 1 km and 250 m are available via the SoilGrids.org project homepage.
Text Classification for Organizational Researchers
Kobayashi, Vladimer B.; Mol, Stefan T.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.
2017-01-01
Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output. PMID:29881249
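The sequence of steps named above (training data preparation, preprocessing, transformation, classification) can be sketched end to end in a few lines. The article's own tutorial uses R; this illustration is in Python and is not the authors' code. The multinomial naive Bayes classifier is swapped in as one common choice, and the function names, labels and job-vacancy sentences are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Preprocessing: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs):
    """Transformation and training: per-class bag-of-words counts."""
    class_docs, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in docs:
        tokens = tokenize(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, text: str) -> str:
    """Classification: highest log-posterior with add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training sentences labelled by the kind of vacancy information.
training = [
    ("design and build software systems", "skill"),
    ("write code and test software", "skill"),
    ("competitive salary and pension", "benefit"),
    ("paid holidays and salary bonus", "benefit"),
]
model = train_nb(training)
print(predict_nb(model, "test the software code"))
print(predict_nb(model, "salary bonus"))
```

Validation, the final step in the article's list, would require holding out labelled sentences that were not used for training and comparing predictions against them; four training sentences are far too few for that to be meaningful here.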
Promoter Sequences Prediction Using Relational Association Rule Mining
Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely
2012-01-01
In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still being developed to approach the problem of promoter identification in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233
Process-based upscaling of surface-atmosphere exchange
NASA Astrophysics Data System (ADS)
Keenan, T. F.; Prentice, I. C.; Canadell, J.; Williams, C. A.; Wang, H.; Raupach, M. R.; Collatz, G. J.; Davis, T.; Stocker, B.; Evans, B. J.
2015-12-01
Empirical upscaling techniques such as machine learning and data-mining have proven invaluable tools for the global scaling of disparate observations of surface-atmosphere exchange, but are not based on a theoretical understanding of the key processes involved. This makes spatial and temporal extrapolation outside of the training domain difficult at best. There is therefore a clear need for the incorporation of knowledge of ecosystem function, in combination with the strength of data mining. Here, we present such an approach. We describe a novel diagnostic process-based model of global photosynthesis and ecosystem respiration, which is directly informed by a variety of global datasets relevant to ecosystem state and function. We use the model framework to estimate global carbon cycling both spatially and temporally, with a specific focus on the mechanisms responsible for long-term change. Our results show the importance of incorporating process knowledge into upscaling approaches, and highlight the effect of key processes on the terrestrial carbon cycle.
Building a protein name dictionary from full text: a machine learning term extraction approach.
Shi, Lei; Campagne, Fabien
2005-04-07
The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt.
Building a protein name dictionary from full text: a machine learning term extraction approach
Shi, Lei; Campagne, Fabien
2005-01-01
Background: The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. Results: We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. Conclusion: This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt. PMID:15817129
Code of Federal Regulations, 2010 CFR
2010-07-01
... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Grounding offtrack direct-current machines and...-UNDERGROUND COAL MINES Grounding § 75.703 Grounding offtrack direct-current machines and the enclosures of related detached components. [Statutory Provisions] The frames of all offtrack direct-current machines and...
Code of Federal Regulations, 2011 CFR
2011-07-01
... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Grounding offtrack direct-current machines and...-UNDERGROUND COAL MINES Grounding § 75.703 Grounding offtrack direct-current machines and the enclosures of related detached components. [Statutory Provisions] The frames of all offtrack direct-current machines and...
Lötsch, Jörn; Kringel, Dario
2018-06-01
The novel research area of functional genomics investigates biochemical, cellular, or physiological properties of gene products with the goal of understanding the relationship between the genome and the phenotype. These developments have made analgesic drug research a data-rich discipline mastered only by making use of parallel developments in computer science, including the establishment of knowledge bases, mining methods for big data, machine-learning, and artificial intelligence, (Table ) which will be exemplarily introduced in the following. © 2018 The Authors Clinical Pharmacology & Therapeutics published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Bioinformatics in proteomics: application, terminology, and pitfalls.
Wiemer, Jan C; Prokudin, Alexander
2004-01-01
Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge amounts of data. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating overly complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
[A Maternal Health Care System Based on Mobile Health Care].
Du, Xin; Zeng, Weijie; Li, Chengwei; Xue, Junwei; Wu, Xiuyong; Liu, Yinjia; Wan, Yuxin; Zhang, Yiru; Ji, Yurong; Wu, Lei; Yang, Yongzhe; Zhang, Yue; Zhu, Bin; Huang, Yueshan; Wu, Kai
2016-02-01
Wearable devices are used in the new design of the maternal health care system to detect electrocardiogram and oxygen saturation signals, while smart terminals are used to perform assessments and enter maternal clinical information. All the results, combined with biochemical analysis from the hospital, are uploaded to a cloud server via the mobile Internet. Machine learning algorithms are used for data mining of all information on the subjects. This system can achieve the assessment and care of maternal physical health as well as mental health. Moreover, the system can send the results and health guidance to smart terminals.
An Approach to Realizing Process Control for Underground Mining Operations of Mobile Machines
Song, Zhen; Schunnesson, Håkan; Rinne, Mikael; Sturgul, John
2015-01-01
The excavation and production in underground mines are complicated processes which consist of many different operations. The process of underground mining is considerably constrained by the geometry and geology of the mine. The various mining operations are normally performed in series at each working face. A delay in a single operation leads to a domino effect, delaying the starting time of the next process and the completion time of the entire process. This paper presents a new approach to process control for underground mining operations, e.g. drilling, bolting, mucking. This approach can estimate the working time and its probability for each operation more efficiently and objectively by improving the existing PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). If the delay of the critical operation (which is on a critical path) inevitably affects the productivity of mined ore, the approach can rapidly assign mucking machines new jobs to increase this amount to a maximum level by using a new mucking algorithm under external constraints. PMID:26062092
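For orientation, the classic PERT calculation that the abstract above says it improves on can be sketched as follows. This is the textbook three-point estimate with a normal approximation, not the authors' improved method, and the durations (in hours) are made-up values for illustration.

```python
from statistics import NormalDist

def pert_estimate(optimistic, most_likely, pessimistic):
    """Classic PERT three-point estimate: (mean, variance) of a duration."""
    mean = (optimistic + 4 * most_likely + pessimistic) / 6
    variance = ((pessimistic - optimistic) / 6) ** 2
    return mean, variance

# Hypothetical (optimistic, most likely, pessimistic) durations in hours
# for operations performed in series at one working face.
operations = {
    "drilling": (2.0, 3.0, 4.0),
    "bolting": (1.0, 2.0, 3.0),
    "mucking": (3.0, 4.0, 5.0),
}

estimates = {name: pert_estimate(*times) for name, times in operations.items()}

# Operations in series: means and variances add along the path.
total_mean = sum(mean for mean, _ in estimates.values())
total_variance = sum(var for _, var in estimates.values())

# Probability of finishing the whole cycle within a deadline,
# under the usual normal approximation.
deadline = 10.0
prob_on_time = NormalDist(total_mean, total_variance ** 0.5).cdf(deadline)
print(total_mean, round(prob_on_time, 3))
```

Because the operations run in series, every one of them lies on the critical path here; with parallel branches, CPM would first select the longest path before the variances are summed.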
A Review of Extra-Terrestrial Mining Robot Concepts
NASA Technical Reports Server (NTRS)
Mueller, Robert P.; Van Susante, Paul J.
2011-01-01
Outer space contains a vast amount of resources that offer virtually unlimited wealth to the humans that can access and use them for commercial purposes. One of the key technologies for harvesting these resources is robotic mining of regolith, minerals, ices and metals. The harsh environment and vast distances create challenges that are handled best by robotic machines working in collaboration with human explorers. Humans will benefit from the resources that will be mined by robots. They will visit outposts and mining camps as required for exploration, commerce and scientific research, but a continuous presence is most likely to be provided by robotic mining machines that are remotely controlled by humans. There have been a variety of extra-terrestrial robotic mining concepts proposed over the last 40 years and this paper will attempt to summarize and review concepts in the public domain (government, industry and academia) to serve as an informational resource for future mining robot developers and operators. The challenges associated with these concepts will be discussed and feasibility will be assessed. Future needs associated with commercial efforts will also be investigated.
An Approach to Realizing Process Control for Underground Mining Operations of Mobile Machines.
Song, Zhen; Schunnesson, Håkan; Rinne, Mikael; Sturgul, John
2015-01-01
The excavation and production in underground mines are complicated processes which consist of many different operations. The process of underground mining is considerably constrained by the geometry and geology of the mine. The various mining operations are normally performed in series at each working face. A delay in a single operation leads to a domino effect, delaying the starting time of the next process and the completion time of the entire process. This paper presents a new approach to process control for underground mining operations, e.g. drilling, bolting, mucking. This approach can estimate the working time and its probability for each operation more efficiently and objectively by improving the existing PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method). If the delay of the critical operation (which is on a critical path) inevitably affects the productivity of mined ore, the approach can rapidly assign mucking machines new jobs to increase this amount to a maximum level by using a new mucking algorithm under external constraints.
A Review of Extra-Terrestrial Mining Concepts
NASA Technical Reports Server (NTRS)
Mueller, R. P.; van Susante, P. J.
2012-01-01
Outer space contains a vast amount of resources that offer virtually unlimited wealth to the humans that can access and use them for commercial purposes. One of the key technologies for harvesting these resources is robotic mining of regolith, minerals, ices and metals. The harsh environment and vast distances create challenges that are handled best by robotic machines working in collaboration with human explorers. Humans will benefit from the resources that will be mined by robots. They will visit outposts and mining camps as required for exploration, commerce and scientific research, but a continuous presence is most likely to be provided by robotic mining machines that are remotely controlled by humans. There have been a variety of extra-terrestrial robotic mining concepts proposed over the last 40 years and this paper will attempt to summarize and review concepts in the public domain (government, industry and academia) to serve as an informational resource for future mining robot developers and operators. The challenges associated with these concepts will be discussed and feasibility will be assessed. Future needs associated with commercial efforts will also be investigated.
Code of Federal Regulations, 2011 CFR
2011-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Code of Federal Regulations, 2013 CFR
2013-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Code of Federal Regulations, 2010 CFR
2010-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Code of Federal Regulations, 2014 CFR
2014-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Code of Federal Regulations, 2012 CFR
2012-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Inductive System Health Monitoring
NASA Technical Reports Server (NTRS)
Iverson, David L.
2004-01-01
The Inductive Monitoring System (IMS) software was developed to provide a technique to automatically produce health monitoring knowledge bases for systems that are either difficult to model (simulate) with a computer or which require computer models that are too complex to use for real time monitoring. IMS uses nominal data sets collected either directly from the system or from simulations to build a knowledge base that can be used to detect anomalous behavior in the system. Machine learning and data mining techniques are used to characterize typical system behavior by extracting general classes of nominal data from archived data sets. IMS is able to monitor the system by comparing real time operational data with these classes. We present a description of the learning and monitoring method used by IMS and summarize some recent IMS results.
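The general idea described above, learning classes from nominal data and then comparing live data against them, can be sketched in a heavily simplified form. This is not the IMS algorithm: the sketch assumes a single class represented by per-parameter minimum and maximum values, and the function names and sensor readings are hypothetical.

```python
import math

def learn_nominal_ranges(nominal_vectors):
    """Learning: record the (min, max) range of each parameter in nominal data."""
    columns = list(zip(*nominal_vectors))
    return [(min(column), max(column)) for column in columns]

def deviation(ranges, vector):
    """Monitoring: Euclidean distance from a vector to the nominal ranges.

    Returns 0.0 when every parameter lies inside its learned range.
    """
    total = 0.0
    for (low, high), value in zip(ranges, vector):
        if value < low:
            total += (low - value) ** 2
        elif value > high:
            total += (value - high) ** 2
    return math.sqrt(total)

# Hypothetical archived nominal readings: (temperature, pressure)
nominal = [(20.0, 1.0), (22.0, 1.2), (21.0, 1.1)]
ranges = learn_nominal_ranges(nominal)

inside = deviation(ranges, (21.5, 1.05))   # within the nominal ranges
outside = deviation(ranges, (25.0, 1.1))   # temperature above the learned maximum
print(inside, outside)
```

A monitoring loop would compare each incoming vector's deviation with a threshold chosen by the operator; a larger deviation indicates behavior further from anything seen in the nominal data.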
Cutter-loader apparatus having overhung shearer drum
DOE Office of Scientific and Technical Information (OSTI.GOV)
Groger, H.; Harms, E.E.
1984-05-01
A longwall mining machine includes a drum cutter-loader and face conveyor wherein the drum cutter-loader is overhung and is supported by a support arm adjacent to the mine face. Nozzles direct high pressure liquid jets against the forward edge of the support arm to cut away the mining face and permit the face side support arm to advance as the mining machine advances. In one embodiment the nozzles are provided along an inclined cutting edge at the forward end of the support arm. Such nozzles may be fixed or oscillating. In an alternative embodiment the nozzles are provided in the cylindrical edge zone of the shearer drum and direct the high pressure fluid jets against the cutter edge at the forward end of the support arm.
Oztekin, Asil; Delen, Dursun; Kong, Zhenyu James
2009-12-01
Predicting the survival of heart-lung transplant patients has the potential to play a critical role in understanding and improving the matching procedure between the recipient and graft. Although voluminous data related to the transplantation procedures is being collected and stored, only a small subset of the predictive factors has been used in modeling heart-lung transplantation outcomes. The previous studies have mainly focused on applying statistical techniques to a small set of factors selected by the domain-experts in order to reveal the simple linear relationships between the factors and survival. The collection of methods known as 'data mining' offers significant advantages over conventional statistical techniques in dealing with the latter's limitations such as normality assumption of observations, independence of observations from each other, and linearity of the relationship between the observations and the output measure(s). There are statistical methods that overcome these limitations. Yet, they are computationally more expensive and do not provide fast and flexible solutions as do data mining techniques in large datasets. The main objective of this study is to improve the prediction of outcomes following combined heart-lung transplantation by proposing an integrated data-mining methodology. A large and feature-rich dataset (16,604 cases with 283 variables) is used to (1) develop machine learning based predictive models and (2) extract the most important predictive factors. Then, using three different variable selection methods, namely (i) variables driven by machine learning methods (decision trees, neural networks, and logistic regression), (ii) literature review-based expert-defined variables, and (iii) common sense-based interaction variables, a consolidated set of factors is generated and used to develop Cox regression models for heart-lung graft survival.
The predictive models' performance in terms of 10-fold cross-validation accuracy rates for two multi-imputed datasets ranged from 79% to 86% for neural networks, from 78% to 86% for logistic regression, and from 71% to 79% for decision trees. The results indicate that the proposed integrated data mining methodology using Cox hazard models better predicted the graft survival with different variables than the conventional approaches commonly used in the literature. This result is validated by the comparison of the corresponding Gains charts for our proposed methodology and the literature review based Cox results, and by the comparison of Akaike information criteria (AIC) values received from each. Data mining-based methodology proposed in this study reveals that there are undiscovered relationships (i.e. interactions of the existing variables) among the survival-related variables, which helps better predict the survival of the heart-lung transplants. It also brings a different set of variables into the scene to be evaluated by the domain-experts and be considered prior to the organ transplantation.
Knowledge-Based Reinforcement Learning for Data Mining
NASA Astrophysics Data System (ADS)
Kudenko, Daniel; Grzes, Marek
Data Mining is the process of extracting patterns from data. Two general avenues of research in the intersecting areas of agents and data mining can be distinguished. The first approach is concerned with mining an agent’s observation data in order to extract patterns, categorize environment states, and/or make predictions of future states. In this setting, data is normally available as a batch, and the agent’s actions and goals are often independent of the data mining task. The data collection is mainly considered as a side effect of the agent’s activities. Machine learning techniques applied in such situations fall into the class of supervised learning. In contrast, the second scenario occurs where an agent is actively performing the data mining, and is responsible for the data collection itself. For example, a mobile network agent is acquiring and processing data (where the acquisition may incur a certain cost), or a mobile sensor agent is moving in a (perhaps hostile) environment, collecting and processing sensor readings. In these settings, the tasks of the agent and the data mining are highly intertwined and interdependent (or even identical). Supervised learning is not a suitable technique for these cases. Reinforcement Learning (RL) enables an agent to learn from experience (in form of reward and punishment for explorative actions) and adapt to new situations, without a teacher. RL is an ideal learning technique for these data mining scenarios, because it fits the agent paradigm of continuous sensing and acting, and the RL agent is able to learn to make decisions on the sampling of the environment which provides the data. Nevertheless, RL still suffers from scalability problems, which have prevented its successful use in many complex real-world domains. The more complex the tasks, the longer it takes a reinforcement learning algorithm to converge to a good solution. For many real-world tasks, human expert knowledge is available. 
For example, human experts have developed heuristics that help them in planning and scheduling resources in their work place. However, this domain knowledge is often rough and incomplete. When the domain knowledge is used directly by an automated expert system, the solutions are often sub-optimal, due to the incompleteness of the knowledge, the uncertainty of environments, and the possibility to encounter unexpected situations. RL, on the other hand, can overcome the weaknesses of the heuristic domain knowledge and produce optimal solutions. In the talk we propose two techniques, which represent first steps in the area of knowledge-based RL (KBRL). The first technique [1] uses high-level STRIPS operator knowledge in reward shaping to focus the search for the optimal policy. Empirical results show that the plan-based reward shaping approach outperforms other RL techniques, including alternative manual and MDP-based reward shaping when it is used in its basic form. We showed that MDP-based reward shaping may fail and successful experiments with STRIPS-based shaping suggest modifications which can overcome encountered problems. The STRIPS-based method we propose allows expressing the same domain knowledge in a different way and the domain expert can choose whether to define an MDP or STRIPS planning task. We also evaluated the robustness of the proposed STRIPS-based technique to errors in the plan knowledge. In case STRIPS knowledge is not available, we propose a second technique [2] that shapes the reward with hierarchical tile coding. Where the Q-function is represented with low-level tile coding, a V-function with coarser tile coding can be learned in parallel and used to approximate the potential for ground states. In the context of data mining, our KBRL approaches can also be used for any data collection task where the acquisition of data may incur considerable cost.
In addition, observing the data collection agent in specific scenarios may lead to new insights into optimal data collection behaviour in the respective domains. In future work, we intend to demonstrate and evaluate our techniques on concrete real-world data mining applications.
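The abstract's mention of a "potential for ground states" points to potential-based reward shaping, in which a shaping term of the form gamma * Phi(s') - Phi(s) is added to the environment reward. The sketch below shows that standard form inside a tabular Q-learning update. It does not reproduce the authors' STRIPS-based or tile-coding methods: the corridor environment, the distance-based potential, and all parameter values are assumptions made for illustration.

```python
GAMMA = 0.9   # discount factor (assumed value)
GOAL = 4      # hypothetical goal state in a corridor of states 0..4

def potential(state):
    """Hypothetical potential: larger (less negative) closer to the goal.

    In a knowledge-based setting this is where domain knowledge, such as
    progress along a plan, would be encoded.
    """
    return -abs(GOAL - state)

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Environment reward plus the potential-based shaping term."""
    return reward + gamma * potential(next_state) - potential(state)

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=GAMMA):
    """One tabular Q-learning step using the shaped reward."""
    target = (shaped_reward(reward, state, next_state, gamma)
              + gamma * max(q[next_state].values()))
    q[state][action] += alpha * (target - q[state][action])

# Q-table initialised to zero for every state and action
q = {s: {"left": 0.0, "right": 0.0} for s in range(GOAL + 1)}

# A step toward the goal receives a positive shaped reward,
# a step away from it a negative one, even though the raw reward is 0.
q_update(q, state=2, action="right", reward=0.0, next_state=3)
q_update(q, state=3, action="left", reward=0.0, next_state=2)
print(q[2]["right"], q[3]["left"])
```

The shaping term gives the agent informative feedback before it ever reaches the goal, which is the mechanism by which prior knowledge can speed up convergence.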
Scalable Regression Tree Learning on Hadoop using OpenPlanet
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor
As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool to build prediction models. Regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. MapReduce is well suited for addressing such data intensive learning applications, and a proprietary regression tree algorithm, PLANET, using MapReduce has been proposed earlier. In this paper, we describe an open source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies of Hadoop parameters to improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.
Singhal, Ayush; Simmons, Michael; Lu, Zhiyong
2016-11-01
The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. 
Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.
Information extraction from multi-institutional radiology reports.
Hassanpour, Saeed; Langlotz, Curtis P
2016-01-01
The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. 
We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.
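The precision, recall and F1 figures quoted above follow standard definitions, which the sketch below applies to extracted entities. The entity labels and phrases are invented for illustration and are not drawn from the paper's information model. As a rough consistency check, the harmonic mean of the reported precision (87%) and recall (84%) is about 0.855, in line with the reported F1 of 85%.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_recall_f1(gold, predicted):
    """Score predicted (label, phrase) pairs against gold annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)

# Hypothetical annotations for one report: (label, phrase) pairs
gold = {
    ("anatomy", "left lung"),
    ("observation", "nodule"),
    ("observation", "effusion"),
    ("uncertainty", "possible"),
    ("anatomy", "pleura"),
}
predicted = {
    ("anatomy", "left lung"),
    ("observation", "nodule"),
    ("uncertainty", "possible"),
    ("observation", "lung"),   # a false positive
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
reported_check = f1_from(0.87, 0.84)
print(precision, recall, round(f1, 3), round(reported_check, 3))
```

Exact-match scoring of (label, phrase) pairs is only one possible convention; partial-overlap scoring is also common in named-entity evaluation and would give different numbers.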
Influence of continuous mining arrangements on respirable dust exposures
Beck, T. W.; Organiscak, J. A.; Pollock, D. E.; Potts, J. D.; Reed, W. R.
2017-01-01
In underground continuous mining operations, ventilation, water sprays and machine-mounted flooded-bed scrubbers are the primary means of controlling respirable dust exposures at the working face. Changes in mining arrangements — such as face ventilation configuration, orientation of crosscuts mined in relation to the section ventilation and equipment operator positioning — can have impacts on the ability of dust controls to reduce occupational respirable dust exposures. This study reports and analyzes dust concentrations measured by the Pittsburgh Mining Research Division for remote-controlled continuous mining machine operators as well as haulage operators at 10 U.S. underground mines. The results of these respirable dust surveys show that continuous miner exposures varied little with depth of cut but are significantly higher with exhaust ventilation. Haulage operators experienced elevated concentrations with blowing face ventilation. Elevated dust concentrations were observed for both continuous miner operators and haulage operators when working in crosscuts driven into or counter to the section airflow. Individual cuts are highlighted to demonstrate instances of minimal and excessive dust exposures attributable to particular mining configurations. These findings form the basis for recommendations for lowering face worker respirable dust exposures. PMID:28529441
Wagland, Richard; Recio-Saucedo, Alejandra; Simon, Michael; Bracher, Michael; Hunt, Katherine; Foster, Claire; Downing, Amy; Glaser, Adam; Corner, Jessica
2016-08-01
Quality of cancer care may greatly impact on patients' health-related quality of life (HRQoL). Free-text responses to patient-reported outcome measures (PROMs) provide rich data but analysis is time and resource-intensive. This study developed and tested a learning-based text-mining approach to facilitate analysis of patients' experiences of care and develop an explanatory model illustrating impact on HRQoL. Respondents to a population-based survey of colorectal cancer survivors provided free-text comments regarding their experience of living with and beyond cancer. An existing coding framework was tested and adapted, which informed learning-based text mining of the data. Machine-learning algorithms were trained to identify comments relating to patients' specific experiences of service quality, which were verified by manual qualitative analysis. Comparisons between coded retrieved comments and a HRQoL measure (EQ5D) were explored. The survey response rate was 63.3% (21 802/34 467), of which 25.8% (n=5634) participants provided free-text comments. Of retrieved comments on experiences of care (n=1688), over half (n=1045, 62%) described positive care experiences. Most negative experiences concerned a lack of post-treatment care (n=191, 11% of retrieved comments) and insufficient information concerning self-management strategies (n=135, 8%) or treatment side effects (n=160, 9%). Associations existed between HRQoL scores and coded algorithm-retrieved comments. Analysis indicated that the mechanism by which service quality impacted on HRQoL was the extent to which services prevented or alleviated challenges associated with disease and treatment burdens. Learning-based text mining techniques were found useful and practical tools to identify specific free-text comments within a large dataset, facilitating resource-efficient qualitative analysis. This method should be considered for future PROM analysis to inform policy and practice. 
Study findings indicated that perceived care quality directly impacts on HRQoL. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Cañada, Andres; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso
2017-01-01
A considerable effort has been devoted to systematically retrieving information on genes and proteins as well as the relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes, or CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es PMID:28531339
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
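The final stage described above, in which pre-computed centroids are handed to K-means for iterative refinement, can be sketched as follows. This is a minimal illustration under stated assumptions: the discretization step and the binary search initialization are not specified in the abstract and are not reproduced here, so the caller supplies the initial centroids; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], init_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain K-means starting from caller-supplied centroids."""
    centroids = list(init_centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


centroids, labels = kmeans([(0.0,), (1.0,), (10.0,), (11.0,)],
                           [(0.0,), (10.0,)])
print(centroids, labels)
```

Because K-means only converges to a local optimum, the quality of the supplied centroids matters, which is presumably why the paper pairs it with a dedicated initialization method.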
A planetary nervous system for social mining and collective awareness
NASA Astrophysics Data System (ADS)
Giannotti, F.; Pedreschi, D.; Pentland, A.; Lukowicz, P.; Kossmann, D.; Crowley, J.; Helbing, D.
2012-11-01
We present a research roadmap of a Planetary Nervous System (PNS), capable of sensing and mining the digital breadcrumbs of human activities and unveiling the knowledge hidden in the big data for addressing the big questions about social complexity. We envision the PNS as a globally distributed, self-organizing, techno-social system for answering analytical questions about the status of world-wide society, based on three pillars: social sensing, social mining and the idea of trust networks and privacy-aware social mining. We discuss the ingredients of a science and a technology necessary to build the PNS upon the three mentioned pillars, beyond the limitations of their respective state-of-art. Social sensing is aimed at developing better methods for harvesting the big data from the techno-social ecosystem and make them available for mining, learning and analysis at a properly high abstraction level. Social mining is the problem of discovering patterns and models of human behaviour from the sensed data across the various social dimensions by data mining, machine learning and social network analysis. Trusted networks and privacy-aware social mining is aimed at creating a new deal around the questions of privacy and data ownership empowering individual persons with full awareness and control on own personal data, so that users may allow access and use of their data for their own good and the common good. The PNS will provide a goal-oriented knowledge discovery framework, made of technology and people, able to configure itself to the aim of answering questions about the pulse of global society. Given an analytical request, the PNS activates a process composed by a variety of interconnected tasks exploiting the social sensing and mining methods within the transparent ecosystem provided by the trusted network. The PNS we foresee is the key tool for individual and collective awareness for the knowledge society. 
We need such a tool for everyone to become fully aware of how powerful is the knowledge of our society we can achieve by leveraging our wisdom as a crowd, and how important is that everybody participates both as a consumer and as a producer of the social knowledge, for it to become a trustable, accessible, safe and useful public good.
Toward Intelligent Machine Learning Algorithms
1988-05-01
Machine learning is recognized as a tool for improving the performance of many kinds of systems, yet most machine learning systems themselves are not...directed systems, and with the addition of a knowledge store for organizing and maintaining knowledge to assist learning, a learning machine learning (L...ML) algorithm is possible. The necessary components of L-ML systems are presented along with several case descriptions of existing machine learning systems
30 CFR 18.8 - Date for conducting investigation and tests.
Code of Federal Regulations, 2010 CFR
2010-07-01
..., EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES General... determine the order of precedence for investigation and testing. If an electrical machine component or...
NASA Astrophysics Data System (ADS)
Klump, J. F.; Huber, R.; Robertson, J.; Cox, S. J. D.; Woodcock, R.
2014-12-01
Despite the recent explosion of quantitative geological data, geology remains a fundamentally qualitative science. Numerical data only constitute a certain part of data collection in the geosciences. In many cases, geological observations are compiled as text into reports and annotations on drill cores, thin sections or drawings of outcrops. The observations are classified into concepts such as lithology, stratigraphy, geological structure, etc. These descriptions are semantically rich and are generally supported by more quantitative observations using geochemical analyses, XRD, hyperspectral scanning, etc, but the goal is geological semantics. In practice it has been difficult to bring the different observations together due to differing perception or granularity of classification in human observation, or the partial observation of only some characteristics using quantitative sensors. In the past years many geological classification schemas have been transferred into ontologies and vocabularies, formalized using RDF and OWL, and published through SPARQL endpoints. Several lithological ontologies were compiled by stratigraphy.net and published through a SPARQL endpoint. This work is complemented by the development of a Python API to integrate this vocabulary into Python-based text mining applications. The applications for the lithological vocabulary and Python API are automated semantic tagging of geochemical data and descriptions of drill cores, machine learning of geochemical compositions that are diagnostic for lithological classifications, and text mining for lithological concepts in reports and geological literature. This combination of applications can be used to identify anomalies in databases, where composition and lithological classification do not match. It can also be used to identify lithological concepts in the literature and infer quantitative values. The resulting semantic tagging opens new possibilities for linking these diverse sources of data.
Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure.
P Tafti, Ahmad; Badger, Jonathan; LaRose, Eric; Shirzadi, Ehsan; Mahnke, Andrea; Mayer, John; Ye, Zhan; Page, David; Peissig, Peggy
2017-12-08
The study of adverse drug events (ADEs) is a tenured topic in medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited use for ADE study and with little known about the content with respect to ADEs. The aim of this study was to develop a big data analytics strategy that mines the content of scientific articles and health-related Web-based social media to detect and identify ADEs. We analyzed the following two data sources: (1) biomedical articles and (2) health-related social media blog posts. We developed an intelligent and scalable text mining solution on big data infrastructures composed of Apache Spark, natural language processing, and machine learning. This was combined with an Elasticsearch No-SQL distributed database to explore and visualize ADEs. The accuracy, precision, recall, and area under receiver operating characteristic of the system were 92.7%, 93.6%, 93.0%, and 0.905, respectively, and showed better results in comparison with traditional approaches in the literature. This work not only detected and classified ADE sentences from big data biomedical literature but also scientifically visualized ADE interactions. To the best of our knowledge, this work is the first to investigate a big data machine learning strategy for ADE discovery on massive datasets downloaded from PubMed Central and social media. This contribution illustrates possible capacities in big data biomedical text analysis using advanced computational methods with real-time update from new data published on a daily basis. ©Ahmad P Tafti, Jonathan Badger, Eric LaRose, Ehsan Shirzadi, Andrea Mahnke, John Mayer, Zhan Ye, David Page, Peggy Peissig. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 08.12.2017.
OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.
Naderi, Nona; Kappler, Thomas; Baker, Christopher J O; Witte, René
2011-10-01
Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger. witte@semanticsoftware.info.
30 CFR 18.10 - Notice of approval or disapproval.
Code of Federal Regulations, 2010 CFR
2010-07-01
..., EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES General... assembly of an electrical machine or accessory, MSHA will issue to the applicant either a written notice of...
Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz
2017-04-01
Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing text records and for predicting hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ² and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (linear kernel) and Nu-Support Vector Machine (linear kernel). Prediction performance is evaluated by F1-scores. Precision and recall values are also reported for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
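The three text representations named above (binary, term frequency, and term frequency-inverse document frequency over unigrams, bigrams and trigrams) can be illustrated with a small standard-library sketch. The function names and sample records are hypothetical, whitespace tokenization is an assumption, and the idf used here is the plain log(N/df) variant, since the abstract does not state the exact weighting.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n_max: int = 3) -> List[str]:
    """All contiguous n-grams of length 1..n_max, joined by spaces."""
    out: List[str] = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def vectorize(docs: List[str], n_max: int = 3,
              scheme: str = "tfidf") -> List[Dict[str, float]]:
    """Sparse feature dictionaries under one of three weighting schemes."""
    counts = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    if scheme == "binary":
        return [{t: 1.0 for t in c} for c in counts]
    if scheme == "tf":
        return [{t: float(v) for t, v in c.items()} for c in counts]
    if scheme != "tfidf":
        raise ValueError(f"unknown scheme: {scheme}")
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for c in counts for t in c)
    n_docs = len(docs)
    return [{t: v * math.log(n_docs / df[t]) for t, v in c.items()}
            for c in counts]


# Hypothetical triage notes, for illustration only.
records = ["chest pain", "chest fever"]
print(vectorize(records, scheme="tfidf"))
```

With this idf variant, a term that appears in every record ("chest" here) receives weight zero, which is the intended effect of down-weighting uninformative terms before feature selection.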
Literature classification for semi-automated updating of biological knowledgebases
2013-01-01
Background: As the output of biological assays increases in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. Results: We defined and applied a machine learning approach for literature classification to support updating of TANTIGEN, a knowledgebase of tumor T-cell antigens. Abstracts from PubMed were downloaded and classified as either "relevant" or "irrelevant" for database update. Training and five-fold cross-validation of a k-NN classifier on 310 abstracts yielded classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. Conclusion: We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and machine learning. The addition of such data will aid in the transition of biological databases to knowledgebases. PMID:24564403
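The evaluation protocol mentioned above, a k-NN classifier scored by five-fold cross-validation, can be sketched in a few lines. This is an illustration rather than the authors' pipeline: the feature vectors, the Euclidean distance, the value of k and the interleaved fold assignment are all assumptions, and the toy data are invented.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def knn_predict(train_x: Sequence[Vector], train_y: Sequence[str],
                x: Vector, k: int = 3) -> str:
    """Majority label among the k nearest training vectors."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(xs: Sequence[Vector], ys: Sequence[str],
                       folds: int = 5, k: int = 3) -> float:
    """Fraction of items classified correctly when each fold is held out."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(xs), folds))  # interleaved folds
        train_x: List[Vector] = [x for i, x in enumerate(xs)
                                 if i not in test_idx]
        train_y: List[str] = [y for i, y in enumerate(ys)
                              if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
                correct += 1
    return correct / len(xs)


# Toy one-dimensional features standing in for abstract representations.
xs = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,),
      (5.0,), (5.1,), (5.2,), (5.3,), (5.4,)]
ys = ["irrelevant"] * 5 + ["relevant"] * 5
print(cross_val_accuracy(xs, ys))
```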
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML)
Lechevalier, D.; Ak, R.; Ferguson, M.; Law, K. H.; Lee, Y.-T. T.; Rachuri, S.
2017-01-01
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain. PMID:29202125
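For orientation, the "confidence bounds and distribution" mentioned above correspond to the standard GPR predictive equations. The block below states the textbook form under a zero-mean prior with Gaussian observation noise; the symbols are assumptions introduced here for illustration and are not taken from the PMML schema. Here $K$ is the kernel matrix over the training inputs, $\mathbf{k}_*$ the vector of kernel values between the training inputs and a test input $x_*$, $\mathbf{y}$ the training targets, and $\sigma_n^2$ the noise variance.

```latex
\begin{align}
  \mu_*      &= \mathbf{k}_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y}, \\
  \sigma_*^2 &= k(x_*, x_*) - \mathbf{k}_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_*, \\
  \text{95\% bounds on the latent function:}\quad
             & \mu_* \pm 1.96\,\sigma_* .
\end{align}
```

The predictive mean gives the point estimate, and the predictive variance supplies the uncertainty from which bounds of this kind are derived.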
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML).
Park, J; Lechevalier, D; Ak, R; Ferguson, M; Law, K H; Lee, Y-T T; Rachuri, S
2017-01-01
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain.
Using Machine Learning to Advance Personality Assessment and Theory.
Bleidorn, Wiebke; Hopwood, Christopher James
2018-05-01
Machine learning has led to important advances in society. One of the most exciting applications of machine learning in psychological science has been the development of assessment tools that can powerfully predict human behavior and personality traits. Thus far, machine learning approaches to personality assessment have focused on the associations between social media and other digital records with established personality measures. The goal of this article is to expand the potential of machine learning approaches to personality assessment by embedding it in a more comprehensive construct validation framework. We review recent applications of machine learning to personality assessment, place machine learning research in the broader context of fundamental principles of construct validation, and provide recommendations for how to use machine learning to advance our understanding of personality.
Multi-objects recognition for distributed intelligent sensor networks
NASA Astrophysics Data System (ADS)
He, Haibo; Chen, Sheng; Cao, Yuan; Desai, Sachi; Hohil, Myron E.
2008-04-01
This paper proposes an innovative approach for multi-objects recognition for homeland security and defense based intelligent sensor networks. Unlike the conventional way of information analysis, data mining in such networks is typically characterized by high information ambiguity/uncertainty, data redundancy, high dimensionality and real-time constraints. Furthermore, since a typical military based network normally includes multiple mobile sensor platforms, ground forces, fortified tanks, combat flights, and other resources, it is critical to develop intelligent data mining approaches to fuse different information resources to understand dynamic environments, to support decision making processes, and finally to achieve the goals. This paper aims to address these issues with a focus on multi-objects recognition. Instead of classifying a single object as in the traditional image classification problems, the proposed method can automatically learn multiple objectives simultaneously. Image segmentation techniques are used to identify the interesting regions in the field, which correspond to multiple objects such as soldiers or tanks. Since different objects will come with different feature sizes, we propose a feature scaling method to represent each object in the same number of dimensions. This is achieved by linear/nonlinear scaling and sampling techniques. Finally, support vector machine (SVM) based learning algorithms are developed to learn and build the associations for different objects, and such knowledge will be adaptively accumulated for objects recognition in the testing stage. We test the effectiveness of the proposed method in different simulated military environments.
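The feature scaling idea above, representing objects of different sizes in the same number of dimensions, can be illustrated with linear interpolation. The abstract does not specify the scaling or sampling scheme, so the sketch below is only one possible linear instance; the function name and sample values are hypothetical.

```python
from typing import List, Sequence


def resample_linear(values: Sequence[float], target_len: int) -> List[float]:
    """Resample a feature sequence to target_len points by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    if len(values) == 1 or target_len == 1:
        return [float(values[0])] * target_len
    # Spacing of the output samples measured in input index units.
    step = (len(values) - 1) / (target_len - 1)
    out: List[float] = []
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


# Two segmented regions of different sizes mapped to the same dimensionality.
small_region = [0, 10]
large_region = [0, 1, 2, 3, 4]
print(resample_linear(small_region, 5))
print(resample_linear(large_region, 3))
```

Once every object is expressed as a vector of the same length, the vectors can be passed to a single classifier such as the SVM the abstract describes.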
30 CFR 18.96 - Preparation of machines for inspection; requirements.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 30 Mineral Resources 1 2012-07-01 2012-07-01 false Preparation of machines for inspection... Field Approval of Electrically Operated Mining Equipment § 18.96 Preparation of machines for inspection; requirements. (a) Upon receipt of written notice from the Health and Safety District Manager of the time and...
30 CFR 18.96 - Preparation of machines for inspection; requirements.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 30 Mineral Resources 1 2014-07-01 2014-07-01 false Preparation of machines for inspection... Field Approval of Electrically Operated Mining Equipment § 18.96 Preparation of machines for inspection; requirements. (a) Upon receipt of written notice from the Health and Safety District Manager of the time and...
30 CFR 18.96 - Preparation of machines for inspection; requirements.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 30 Mineral Resources 1 2013-07-01 2013-07-01 false Preparation of machines for inspection... Field Approval of Electrically Operated Mining Equipment § 18.96 Preparation of machines for inspection; requirements. (a) Upon receipt of written notice from the Health and Safety District Manager of the time and...
Finding Waldo: Learning about Users from their Interactions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Eli T.; Ottley, Alvitta; Zhao, Helen
Visual analytics is inherently a collaboration between human and computer. However, in current visual analytics systems, the computer has limited means of knowing about its users and their analysis processes. While existing research has shown that a user’s interactions with a system reflect a large amount of the user’s reasoning process, there has been limited advancement in developing automated, real-time techniques that mine interactions to learn about the user. In this paper, we demonstrate that we can accurately predict a user’s task performance and infer some user personality traits by using machine learning techniques to analyze interaction data. Specifically, we conduct an experiment in which participants perform a visual search task and we apply well-known machine learning algorithms to three encodings of the users’ interaction data. We achieve, depending on algorithm and encoding, between 62% and 96% accuracy at predicting whether each user will be fast or slow at completing the task. Beyond predicting performance, we demonstrate that using the same techniques, we can infer aspects of the user’s personality factors, including locus of control, extraversion, and neuroticism. Further analyses show that strong results can be attained with limited observation time; in some cases, 82% of the final accuracy is gained after a quarter of the average task completion time. Overall, our findings show that interactions can provide information to the computer about its human collaborator, and establish a foundation for realizing mixed-initiative visual analytics systems.
30 CFR 18.99 - Notice of approval or disapproval; letters of approval and approval plates.
Code of Federal Regulations, 2010 CFR
2010-07-01
... approval or disapproval of the machine. (a) If the qualified electrical representative recommends field..., DEPARTMENT OF LABOR TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Field Approval of Electrically Operated Mining Equipment § 18.99 Notice of approval or...
30 CFR 18.99 - Notice of approval or disapproval; letters of approval and approval plates.
Code of Federal Regulations, 2011 CFR
2011-07-01
... approval or disapproval of the machine. (a) If the qualified electrical representative recommends field..., DEPARTMENT OF LABOR TESTING, EVALUATION, AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Field Approval of Electrically Operated Mining Equipment § 18.99 Notice of approval or...
2. GENERAL VIEW LOOKING NORTHEAST, SHOWING COKE MACHINE (CENTER), INTERMEDIATE ...
2. GENERAL VIEW LOOKING NORTHEAST, SHOWING COKE MACHINE (CENTER), INTERMEDIATE TIPPLE (RIGHT), AND OVENS - Shoaf Mine & Coke Works, East side of Shoaf, off Township Route 472, Shoaf, Fayette County, PA
Shrivastava, Vimal K; Londhe, Narendra D; Sonawane, Rajendra S; Suri, Jasjit S
2015-10-01
A large percentage of a dermatologist's decision in psoriasis disease assessment is based on color. The current computer-aided diagnosis systems for psoriasis risk stratification and classification lack the vigor of the color paradigm. The paper presents an automated psoriasis computer-aided diagnosis (pCAD) system for classification of psoriasis skin images into psoriatic lesion and healthy skin, which solves two major challenges: (i) it fulfills the color feature requirements and (ii) it selects the powerful dominant color features while retaining high classification accuracy. Fourteen color spaces are explored for psoriasis disease analysis, leading to 86 color features. The pCAD system is implemented in a support vector-based machine learning framework where the offline image data set is used for computing the offline color machine learning parameters. These are then used for transformation of the online color features to predict the class labels for healthy vs. diseased cases. The above paradigm uses principal component analysis for color feature selection of dominant features, keeping the original color features unaltered. Using the cross-validation protocol, the above machine learning protocol is compared against the standalone grayscale features with 60 features and against the combined grayscale and color feature set of 146. Using a fixed data size of 540 images with an equal number of healthy and diseased, a 10-fold cross-validation protocol, and an SVM with a polynomial kernel of type two, the pCAD system shows an accuracy of 99.94% with sensitivity and specificity of 99.93% and 99.96%. Using a varying data size protocol, the mean classification accuracies for color, grayscale, and combined scenarios are: 92.85%, 93.83% and 93.99%, respectively. The reliabilities of the system in these three scenarios are: 94.42%, 97.39% and 96.00%, respectively.
We conclude that pCAD system using color space alone is compatible to grayscale space or combined color and grayscale spaces. We validated our pCAD system against facial color databases and the results are consistent in accuracy and reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
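As a rough illustration of the evaluation protocol described in this abstract (a balanced data set of 540 images, 10-fold cross-validation, and accuracy, sensitivity and specificity as outcome measures), the following Python sketch builds class-balanced folds and computes the three measures from confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the paper; the classifier itself (principal component analysis followed by a polynomial-kernel SVM) is not reproduced here.

```python
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int) -> List[List[int]]:
    """Split sample indices into k folds with roughly equal class proportions."""
    folds: List[List[int]] = [[] for _ in range(k)]
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # diseased cases correctly flagged
        "specificity": tn / (tn + fp),  # healthy cases correctly cleared
    }


# 540 images, half diseased (1) and half healthy (0), as in the abstract.
labels = [1] * 270 + [0] * 270
folds = stratified_folds(labels, k=10)

# Hypothetical counts for one test fold, for illustration only.
example = diagnostic_metrics(tp=9, fn=1, tn=8, fp=2)
print(len(folds), len(folds[0]), example)
```

Each fold serves once as the test set while the remaining nine are used for training; the reported figures would then be averages over the ten folds.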
Park, Seong-Cheol; Chung, Chun Kee
2018-06-01
The objective of this study was to introduce a new machine learning guided by outcome of resective epilepsy surgery defined as the presence/absence of seizures to improve data mining for interictal pathological activities in neocortical epilepsy. Electrocorticographies for 39 patients with medically intractable neocortical epilepsy were analyzed. We separately analyzed 38 frequencies from 0.9 to 800 Hz including both high-frequency activities and low-frequency activities to select bands related to seizure outcome. An automatic detector using amplitude-duration-number thresholds was used. Interictal electrocorticography data sets of 8 min for each patient were selected. In the first training data set of 20 patients, the automatic detector was optimized to best differentiate the seizure-free group from not-seizure-free-group based on ranks of resection percentages of activities detected using a genetic algorithm. The optimization was validated in a different data set of 19 patients. There were 16 (41%) seizure-free patients. The mean follow-up duration was 21 ± 11 mo (range, 13-44 mo). After validation, frequencies significantly related to seizure outcome were 5.8, 8.4-25, 30, 36, 52, and 75 among low-frequency activities and 108 and 800 Hz among high-frequency activities. Resection for 5.8, 8.4-25, 108, and 800 Hz activities consistently improved seizure outcome. Resection effects of 17-36, 52, and 75 Hz activities on seizure outcome were variable according to thresholds. We developed and validated an automated detector for monitoring interictal pathological and inhibitory/physiological activities in neocortical epilepsy using a data-driven approach through outcome-guided machine learning. NEW & NOTEWORTHY Outcome-guided machine learning based on seizure outcome was used to improve detections for interictal electrocorticographic low- and high-frequency activities. 
This method resulted in better separation of seizure outcome groups than others reported in the literature. The automatic detector can be trained without human intervention and no prior information. It is based only on objective seizure outcome data without relying on an expert's manual annotations. Using the method, we could find and characterize pathological and inhibitory activities.
Anomaly detection in reconstructed quantum states using a machine-learning technique
NASA Astrophysics Data System (ADS)
Hara, Satoshi; Ono, Takafumi; Okamoto, Ryo; Washio, Takashi; Takeuchi, Shigeki
2014-02-01
The accurate detection of small deviations in given density matrices is important for quantum information processing. Here we propose a method based on the concept of data mining. We demonstrate that the proposed method can more accurately detect small erroneous deviations in reconstructed density matrices, which contain intrinsic fluctuations due to the limited number of samples, than a naive method of checking the trace distance from the average of the given density matrices. This method has the potential to be a key tool in broad areas of physics where the detection of small deviations of quantum states reconstructed using a limited number of samples is essential.
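The naive baseline mentioned in this abstract, checking the trace distance of each reconstructed density matrix from the average of the given matrices, can be sketched as follows. This is a minimal sketch that assumes single-qubit (2x2) density matrices, for which the trace distance has a closed form; the function names and the sample states are hypothetical, and the paper's proposed data-mining method is not reproduced.

```python
import math
from typing import List

Matrix = List[List[complex]]


def average_state(states: List[Matrix]) -> Matrix:
    """Element-wise mean of a list of 2x2 density matrices."""
    n = len(states)
    return [[sum(s[i][j] for s in states) / n for j in range(2)] for i in range(2)]


def trace_distance(rho: Matrix, sigma: Matrix) -> float:
    """Trace distance T = (1/2) Tr|rho - sigma| for 2x2 density matrices.

    The difference of two unit-trace Hermitian matrices is traceless and
    Hermitian, [[a, b], [conj(b), -a]], with eigenvalues +/- sqrt(a^2 + |b|^2),
    so half the sum of their absolute values is sqrt(a^2 + |b|^2).
    """
    a = (rho[0][0] - sigma[0][0]).real
    b = rho[0][1] - sigma[0][1]
    return math.sqrt(a * a + abs(b) ** 2)


# Hypothetical single-qubit states used only to exercise the functions.
zero = [[1.0, 0.0], [0.0, 0.0]]    # |0><0|
one = [[0.0, 0.0], [0.0, 1.0]]     # |1><1|
mixed = average_state([zero, one])  # maximally mixed state I/2

print(trace_distance(zero, one), trace_distance(zero, mixed))
```

A naive detector would flag a state whose distance from the average exceeds some threshold; the abstract reports that the proposed method detects small deviations more accurately than this check.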
Aspect level sentiment analysis using machine learning
NASA Astrophysics Data System (ADS)
Shubham, D.; Mithil, P.; Shobharani, Meesala; Sumathy, S.
2017-11-01
In the modern world, the development of the web and smartphones has increased the use of online shopping. Overall feedback about a product is generated with the help of sentiment analysis using text processing. Opinion mining, or sentiment analysis, is used to collect and categorize product reviews. The proposed system uses aspect-level detection, in which features are extracted from the datasets. The system performs pre-processing operations such as tokenization, part-of-speech tagging and lemmatization on the data to find meaningful information, which is used to detect the polarity level and assign a rating to the product. The proposed model focuses on aspects to produce accurate results by avoiding spam reviews.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanfilippo, Antonio P.; McGrath, Liam R.; Whitney, Paul D.
2011-11-17
We present a computational approach to radical rhetoric that leverages the co-expression of rhetoric and action features in discourse to identify violent intent. The approach combines text mining and machine learning techniques with insights from Frame Analysis and theories that explain the emergence of violence in terms of moral disengagement, the violation of sacred values and social isolation in order to build computational models that identify messages from terrorist sources and estimate their proximity to an attack. We discuss a specific application of this approach to a body of documents from and about radical and terrorist groups in the Middle East and present the results achieved.
Advances in natural language processing.
Hirschberg, Julia; Manning, Christopher D
2015-07-17
Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.
Estimating procedure times for surgeries by determining location parameters for the lognormal model.
Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H
2004-05-01
We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.
Continuous Rating for Diggability Assessment in Surface Mines
NASA Astrophysics Data System (ADS)
IPHAR, Melih
2016-10-01
The rocks can be loosened either by drilling-blasting or direct excavation using powerful machines in opencast mining operations. The economics of rock excavation is considered for each method to be applied. If blasting operation is not preferred and also the geological structures and rock mass properties in site are convenient (favourable ground conditions) for ripping or direct excavation method by mining machines, the next step is to determine which machine or excavator should be selected for the excavation purposes. Many researchers have proposed several diggability or excavatability assessment methods for deciding on excavator type to be used in the field. Most of these systems are generally based on assigning a rating for the parameters having importance in rock excavation process. However, the sharp transitions between the two adjacent classes for a given parameter can lead to some uncertainties. In this paper, it has been proposed that varying rating should be assigned for a given parameter called as “continuous rating” instead of giving constant rating for a given class.
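The difference between a constant rating per class and a continuous rating can be sketched as below. This is only an illustration of the idea: the parameter, class limits and rating values are hypothetical, and linear interpolation is an assumed way of making the rating continuous, not necessarily the scheme used in the paper.

```python
from bisect import bisect_right

# Hypothetical class limits for one rock parameter and the constant
# rating assigned to each of the four resulting classes.
CLASS_LIMITS = [20.0, 40.0, 60.0]
CLASS_RATINGS = [0.0, 10.0, 20.0, 30.0]

# Hypothetical (value, rating) anchor points for the continuous version.
ANCHORS = [(0.0, 0.0), (20.0, 10.0), (40.0, 20.0), (60.0, 30.0)]


def class_rating(value: float) -> float:
    """Constant rating per class: jumps sharply at every class limit."""
    return CLASS_RATINGS[bisect_right(CLASS_LIMITS, value)]


def continuous_rating(value: float) -> float:
    """Rating that varies linearly between anchor points (assumed scheme)."""
    if value <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    if value >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("value outside anchor range")


# Two nearly identical measurements on either side of a class limit.
for v in (39.9, 40.1):
    print(v, class_rating(v), round(continuous_rating(v), 2))
```

With the class-based scheme the two nearly identical measurements differ by a full class step, whereas the continuous scheme gives them nearly identical ratings, which is the uncertainty the abstract aims to reduce.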
LeadMine: a grammar and dictionary driven approach to entity recognition.
Lowe, Daniel M; Sayle, Roger A
2015-01-01
Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to lookup an entity in a database or, in the case of chemical names, converting them to structures. Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yields a better performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution.
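The reported F1-score follows from the reported precision and recall, since F1 is their harmonic mean. A short check in Python (the function name is chosen here for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the complete system on the CHEMDNER test set.
f1 = f1_score(0.899, 0.854)
print(f"{f1:.3f}")  # prints 0.876
```

The result agrees with the 87.6% F1-score stated in the abstract.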
LeadMine: a grammar and dictionary driven approach to entity recognition
2015-01-01
Background Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to lookup an entity in a database or, in the case of chemical names, converting them to structures. Results Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yields a better performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Conclusions Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution. PMID:25810776
Evaluation of an Integrated Multi-Task Machine Learning System with Humans in the Loop
2007-01-01
machine learning components natural language processing, and optimization...was examined with a test explicitly developed to measure the impact of integrated machine learning when used by a human user in a real world setting...study revealed that integrated machine learning does produce a positive impact on overall performance. This paper also discusses how specific machine learning components contributed to human-system
NASA Astrophysics Data System (ADS)
Scheele, C. J.; Huang, Q.
2016-12-01
In the past decade, the rise in social media has led to the development of a vast number of social media services and applications. Disaster management represents one of such applications leveraging massive data generated for event detection, response, and recovery. In order to find disaster relevant social media data, current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these approaches cannot be perfectly accurate due to the variability and uncertainty in language used on social media. To improve current methods, the enhanced text-mining framework is proposed to incorporate location information from social media and authoritative remote sensing datasets for detecting disaster relevant social media posts, which are determined by assessing the textual content using common text mining methods and how the post relates spatiotemporally to the disaster event. To assess the framework, geo-tagged Tweets were collected for three different spatial and temporal disaster events: hurricane, flood, and tornado. Remote sensing data and products for each event were then collected using RealEarthTM. Both Naive Bayes and Logistic Regression classifiers were used to compare the accuracy within the enhanced text-mining framework. Finally, the accuracies from the enhanced text-mining framework were compared to the current text-only methods for each of the case study disaster events. The results from this study address the need for more authoritative data when using social media in disaster management applications.
Pulsed, Hydraulic Coal-Mining Machine
NASA Technical Reports Server (NTRS)
Collins, Earl R., Jr.
1986-01-01
In proposed coal-cutting machine, piston forces water through nozzle, expelling pulsed jet that cuts into coal face. Spring-loaded piston reciprocates at end of travel to refill water chamber. Machine a one-cylinder, two-cycle, internal-combustion engine, fueled by gasoline, diesel fuel, or hydrogen. Fuel converted more directly into mechanical energy of water jet.
30 CFR 75.209 - Automated Temporary Roof Support (ATRS) systems.
Code of Federal Regulations, 2012 CFR
2012-07-01
... paragraphs (b) and (c) of this section, an ATRS system shall be used with roof bolting machines and continuous-mining machines with integral roof bolters operated in a working section. The requirements of this paragraph shall be met according to the following schedule: (1) All new machines ordered after March 28...
30 CFR 75.209 - Automated Temporary Roof Support (ATRS) systems.
Code of Federal Regulations, 2013 CFR
2013-07-01
... paragraphs (b) and (c) of this section, an ATRS system shall be used with roof bolting machines and continuous-mining machines with integral roof bolters operated in a working section. The requirements of this paragraph shall be met according to the following schedule: (1) All new machines ordered after March 28...
30 CFR 75.209 - Automated Temporary Roof Support (ATRS) systems.
Code of Federal Regulations, 2014 CFR
2014-07-01
... paragraphs (b) and (c) of this section, an ATRS system shall be used with roof bolting machines and continuous-mining machines with integral roof bolters operated in a working section. The requirements of this paragraph shall be met according to the following schedule: (1) All new machines ordered after March 28...
NASA Astrophysics Data System (ADS)
Huang, Yin; Chen, Jianhua; Xiong, Shaojun
2009-07-01
Mobile-Learning (M-learning) gives many learners the advantages of both traditional learning and E-learning. Currently, Web-based Mobile-Learning Systems have created many new ways and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rule explosion is a serious problem that causes great concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Since a Web-based Mobile-Learning System collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners, and so on. Therefore, this paper focuses on a new data-mining algorithm, combining the advantages of the genetic algorithm and the simulated annealing algorithm, called ARGSA (Association Rules based on an improved Genetic Simulated Annealing Algorithm), to mine the association rules. This paper first takes advantage of the Parallel Genetic Algorithm and Simulated Annealing Algorithm designed specifically for discovering association rules. Moreover, analysis and experiments are presented to show that the proposed method is superior to the Apriori algorithm in this Mobile-Learning system.
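Association rules of the kind discussed in this abstract are usually scored by support and confidence. The sketch below shows only these two standard measures on a few hypothetical learner records; it does not implement ARGSA or Apriori, and the item names are invented for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in the itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical learner records (not data from the paper).
records = [
    frozenset({"watched_video", "passed_quiz"}),
    frozenset({"watched_video", "passed_quiz", "used_forum"}),
    frozenset({"watched_video"}),
    frozenset({"used_forum"}),
]

rule_support = support(records, {"watched_video", "passed_quiz"})
rule_confidence = confidence(records, {"watched_video"}, {"passed_quiz"})
print(rule_support, round(rule_confidence, 3))
```

A rule miner typically keeps only rules whose support and confidence exceed chosen thresholds, which is one common way of limiting the rule explosion the abstract mentions.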
A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns
ERIC Educational Resources Information Center
Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam
2013-01-01
Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify differentially…
Novel diesel exhaust filters for underground mining vehicles
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bickel, K.L.; Taubert, T.R.
1995-12-31
The U.S. Bureau of Mines (USBM) pioneered the development of disposable filters for reducing diesel particulate emissions from permissible mining machines. The USBM is now evaluating filter media that can withstand the high exhaust temperatures on nonpermissible machines. The goal of the evaluation is to find an inexpensive medium that can be cleaned or disposed of after use, and will reduce particulate emissions by 50% or more. This report summarizes the results from screening tests of a lava rock and woven fiberglass filter media. The lava rock media exhibited low collection efficiencies, but with very low increases in exhaust back pressure. Preliminary results indicate a collection efficiency exceeding 80% for the woven fiber media. Testing of both media is continuing.
Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth
2017-09-13
Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.
NASA Astrophysics Data System (ADS)
Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth
2017-09-01
Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.
Chaudhary, Dhanjee Kumar; Bhattacherjee, Ashis; Patra, Aditya Kumar; Chau, Nearkasen
2015-12-01
This study aimed to assess the whole-body vibration (WBV) exposure among large blast hole drill machine operators with regard to the International Organization for Standardization (ISO) recommended threshold values and its association with machine- and rock-related factors and workers' individual characteristics. The study population included 28 drill machine operators who had worked in four opencast iron ore mines in eastern India. The study protocol comprised the following: measurements of WBV exposure [frequency weighted root mean square (RMS) acceleration (m/s²)], machine-related data (manufacturer of machine, age of machine, seat height, thickness, and rest height) collected from mine management offices, measurements of rock hardness, uniaxial compressive strength and density, and workers' characteristics via face-to-face interviews. More than 90% of the operators were exposed to a higher level WBV than the ISO upper limit and only 3.6% between the lower and upper limits, mainly in the vertical axis. Bivariate correlations revealed that potential predictors of total WBV exposure were: machine manufacturer (r = 0.453, p = 0.015), age of drill (r = 0.533, p = 0.003), and hardness of rock (r = 0.561, p = 0.002). The stepwise multiple regression model revealed that the potential predictors are age of operator (regression coefficient β = -0.052, standard error SE = 0.023), manufacturer (β = 1.093, SE = 0.227), rock hardness (β = 0.045, SE = 0.018), uniaxial compressive strength (β = 0.027, SE = 0.009), and density (β = -1.135, SE = 0.235). Prevention should include using appropriate machines to handle rock hardness, rock uniaxial compressive strength and density, and seat improvement using ergonomic approaches such as including a suspension system.
Chaudhary, Dhanjee Kumar; Bhattacherjee, Ashis; Patra, Aditya Kumar; Chau, Nearkasen
2015-01-01
Background This study aimed to assess the whole-body vibration (WBV) exposure among large blast hole drill machine operators with regard to the International Organization for Standardization (ISO) recommended threshold values and its association with machine- and rock-related factors and workers' individual characteristics. Methods The study population included 28 drill machine operators who had worked in four opencast iron ore mines in eastern India. The study protocol comprised the following: measurements of WBV exposure [frequency weighted root mean square (RMS) acceleration (m/s²)], machine-related data (manufacturer of machine, age of machine, seat height, thickness, and rest height) collected from mine management offices, measurements of rock hardness, uniaxial compressive strength and density, and workers' characteristics via face-to-face interviews. Results More than 90% of the operators were exposed to a higher level WBV than the ISO upper limit and only 3.6% between the lower and upper limits, mainly in the vertical axis. Bivariate correlations revealed that potential predictors of total WBV exposure were: machine manufacturer (r = 0.453, p = 0.015), age of drill (r = 0.533, p = 0.003), and hardness of rock (r = 0.561, p = 0.002). The stepwise multiple regression model revealed that the potential predictors are age of operator (regression coefficient β = −0.052, standard error SE = 0.023), manufacturer (β = 1.093, SE = 0.227), rock hardness (β = 0.045, SE = 0.018), uniaxial compressive strength (β = 0.027, SE = 0.009), and density (β = −1.135, SE = 0.235). Conclusion Prevention should include using appropriate machines to handle rock hardness, rock uniaxial compressive strength and density, and seat improvement using ergonomic approaches such as including a suspension system. PMID:26929838
Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques
ERIC Educational Resources Information Center
Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili
2009-01-01
In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…
30 CFR 18.97 - Inspection of machines; minimum requirements.
Code of Federal Regulations, 2013 CFR
2013-07-01
... all electrical components for materials, workmanship, design, and construction; (2) Examination of all components of the machine which have been approved or certified under Bureau of Mines Schedule 2D, 2E, 2F, or...
30 CFR 18.97 - Inspection of machines; minimum requirements.
Code of Federal Regulations, 2012 CFR
2012-07-01
... all electrical components for materials, workmanship, design, and construction; (2) Examination of all components of the machine which have been approved or certified under Bureau of Mines Schedule 2D, 2E, 2F, or...
30 CFR 18.97 - Inspection of machines; minimum requirements.
Code of Federal Regulations, 2014 CFR
2014-07-01
... all electrical components for materials, workmanship, design, and construction; (2) Examination of all components of the machine which have been approved or certified under Bureau of Mines Schedule 2D, 2E, 2F, or...
Kireeva, Natalia V; Ovchinnikova, Svetlana I; Kuznetsov, Sergey L; Kazennov, Andrey M; Tsivadze, Aslan Yu
2014-02-01
This study concerns the large margin nearest neighbors classifier and its multi-metric extension as efficient approaches to metric learning, which aims to learn an appropriate distance/similarity function for the considered case studies. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve performance in classification, clustering and retrieval tasks. The paper describes the application of the metric learning approach to in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in the drug discovery process; in silico assessment of chemical liabilities is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, distance-based metric learning procedures have been applied to in silico assessment of chemical liabilities, the impact of metric learning on structure-activity landscapes and on the predictive performance of the developed models has been analyzed, and the learned metric has been used in support vector machines. The metric learning results are illustrated using linear and non-linear data visualization techniques to indicate how the change of metric affected nearest-neighbor relations and descriptor space.
NASA Astrophysics Data System (ADS)
Kireeva, Natalia V.; Ovchinnikova, Svetlana I.; Kuznetsov, Sergey L.; Kazennov, Andrey M.; Tsivadze, Aslan Yu.
2014-02-01
This study concerns the large margin nearest neighbors classifier and its multi-metric extension as efficient approaches to metric learning, which aims to learn an appropriate distance/similarity function for the considered case studies. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve performance in classification, clustering and retrieval tasks. The paper describes the application of the metric learning approach to in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in the drug discovery process; in silico assessment of chemical liabilities is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, distance-based metric learning procedures have been applied to in silico assessment of chemical liabilities, the impact of metric learning on structure-activity landscapes and on the predictive performance of the developed models has been analyzed, and the learned metric has been used in support vector machines. The metric learning results are illustrated using linear and non-linear data visualization techniques to indicate how the change of metric affected nearest-neighbor relations and descriptor space.
Transfer learning for visual categorization: a survey.
Shao, Ling; Zhu, Fan; Li, Xuelong
2015-05-01
Regular machine learning and data mining techniques study the training data for future inferences under a major assumption that the future data are within the same feature space or have the same distribution as the training data. However, due to the limited availability of human labeled training data, training data that stay in the same feature space or have the same distribution as the future data cannot be guaranteed to be sufficient enough to avoid the over-fitting problem. In real-world applications, apart from data in the target domain, related data in a different domain can also be included to expand the availability of our prior knowledge about the target future data. Transfer learning addresses such cross-domain learning problems by extracting useful information from data in a related domain and transferring them for being used in target tasks. In recent years, with transfer learning being applied to visual categorization, some typical problems, e.g., view divergence in action recognition tasks and concept drifting in image classification tasks, can be efficiently solved. In this paper, we survey state-of-the-art transfer learning algorithms in visual categorization applications, such as object recognition, image classification, and human action recognition.
2011-01-01
Background Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. 
Conclusions AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of highly accurate QSAR models fulfilling regulatory requirements. PMID:21798025
Stålring, Jonna C; Carlsson, Lars A; Almeida, Pedro; Boyer, Scott
2011-07-28
Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. 
AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of highly accurate QSAR models fulfilling regulatory requirements.
Optimization of C4.5 algorithm-based particle swarm optimization for breast cancer diagnosis
NASA Astrophysics Data System (ADS)
Muslim, M. A.; Rukmana, S. H.; Sugiharti, E.; Prasetiyo, B.; Alimah, S.
2018-03-01
Data mining has become a basic methodology for computational applications in the medical domain. It can be applied in the health field, for example to the diagnosis of breast cancer, heart disease, diabetes and other conditions. Breast cancer is most common in women, with more than one million cases and nearly 600,000 deaths occurring worldwide each year. The most effective way to reduce breast cancer deaths is early diagnosis. This study aims to determine the level of breast cancer diagnosis. The research uses the Wisconsin Breast Cancer (WBC) dataset from the UCI machine learning repository. The methods used are the C4.5 algorithm and Particle Swarm Optimization (PSO), the latter for feature selection and to optimize C4.5. Ten-fold cross-validation and a confusion matrix are used for validation. The study compares the C4.5 algorithm alone with the PSO-based C4.5 algorithm: the result of the PSO-based C4.5 algorithm increased by 0.88% over that of C4.5 alone.
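The validation protocol named in this abstract, ten-fold cross-validation summarized in a confusion matrix, can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the majority-class learner is a placeholder standing in for C4.5, and all function names are hypothetical.

```python
from collections import Counter

def k_fold_indices(n, k=10):
    """Split range(n) into k disjoint test folds of nearly equal size."""
    return [list(range(i, n, k)) for i in range(k)]

def confusion_matrix(actual, predicted, labels):
    """counts[a][p] = number of items with true label a that were predicted as p."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

def cross_validate(features, targets, train, predict, k=10):
    """Pool the predictions from k held-out folds into one confusion matrix."""
    labels = sorted(set(targets))
    actual, predicted = [], []
    for test_idx in k_fold_indices(len(targets), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(targets)) if i not in held_out]
        model = train([features[i] for i in train_idx],
                      [targets[i] for i in train_idx])
        for i in test_idx:
            actual.append(targets[i])
            predicted.append(predict(model, features[i]))
    return confusion_matrix(actual, predicted, labels)

def accuracy(cm):
    """Fraction of items on the diagonal of the confusion matrix."""
    correct = sum(cm[a][a] for a in cm)
    total = sum(sum(row.values()) for row in cm.values())
    return correct / total

# Placeholder learner (NOT C4.5): always predicts the most common training label.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]

def predict_majority(model, x):
    return model
```

Any classifier exposing a comparable `train`/`predict` pair could be passed to `cross_validate` in place of the placeholder, which is how two variants of a learner could be compared on the same folds.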
Chemical named entities recognition: a review on approaches and applications
2014-01-01
The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to “text mine” these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted. PMID:24834132
Geometrical structure of Neural Networks: Geodesics, Jeffrey's Prior and Hyper-ribbons
NASA Astrophysics Data System (ADS)
Hayden, Lorien; Alemi, Alex; Sethna, James
2014-03-01
Neural networks are learning algorithms which are employed in a host of Machine Learning problems including speech recognition, object classification and data mining. In practice, neural networks learn a low dimensional representation of high dimensional data and define a model manifold which is an embedding of this low dimensional structure in the higher dimensional space. In this work, we explore the geometrical structure of a neural network model manifold. A Stacked Denoising Autoencoder and a Deep Belief Network are trained on handwritten digits from the MNIST database. Construction of geodesics along the surface and of slices taken from the high dimensional manifolds reveals a hierarchy of widths corresponding to a hyper-ribbon structure. This property indicates that neural networks fall into the class of sloppy models, in which certain parameter combinations dominate the behavior. Employing this information could prove valuable in designing both neural network architectures and training algorithms. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1144153.
The Efficacy of Machine Learning Programs for Navy Manpower Analysis
1993-03-01
This thesis investigated the efficacy of two machine learning programs for Navy manpower analysis. Two machine learning programs, AIM and IXL, were...to generate models from the two commercial machine learning programs. Using a held out sub-set of the data the capabilities of the three models were...partial effects. The author recommended further investigation of AIM’s capabilities, and testing in an operational environment.... Machine learning , AIM, IXL.
The Security of Machine Learning
2008-04-24
Machine learning has become a fundamental tool for computer security, since it can rapidly evolve to changing and complex situations. That...adaptability is also a vulnerability: attackers can exploit machine learning systems. We present a taxonomy identifying and analyzing attacks against machine ...We use our framework to survey and analyze the literature of attacks against machine learning systems. We also illustrate our taxonomy by showing
NASA Astrophysics Data System (ADS)
Zhao, Dekang; Wu, Qiang; Cui, Fangpeng; Xu, Hua; Zeng, Yifan; Cao, Yufei; Du, Yuanze
2018-04-01
Coal-floor water-inrush incidents account for a large proportion of coal mine disasters in northern China, and accurate risk assessment is crucial for safe coal production. A novel and promising assessment model for water inrush is proposed based on random forest (RF), which is a powerful intelligent machine-learning algorithm. RF has considerable advantages, including high classification accuracy and the capability to evaluate the importance of variables; in particular, it is robust in dealing with the complicated and non-linear problems inherent in risk assessment. In this study, the proposed model is applied to Panjiayao Coal Mine, northern China. Eight factors were selected as evaluation indices according to systematic analysis of the geological conditions and a field survey of the study area. Risk assessment maps were generated based on RF, and the probabilistic neural network (PNN) model was also used for risk assessment as a comparison. The results demonstrate that the two methods are consistent in the risk assessment of water inrush at the mine, and RF shows a better performance compared to PNN with an overall accuracy higher by 6.67%. It is concluded that RF is more practicable to assess the water-inrush risk than PNN. The presented method will be helpful in avoiding water inrush and also can be extended to various engineering applications.
A sentence sliding window approach to extract protein annotations from biomedical articles
Krallinger, Martin; Padron, Maria; Valencia, Alfonso
2005-01-01
Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). Conclusion We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831
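The abstract does not spell out how windows are scored, so the following is only a rough sketch of the general idea of sliding a fixed-size window of sentences over a text and averaging a per-sentence score. The scoring rule (count of GO-term words per sentence, zeroed when the protein name is absent), the naive sentence splitter and all function names are assumptions made for illustration.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text):
    """Lower-cased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def window_score(sentences, protein, term_words):
    """Average number of term words per sentence; 0 if the protein is not mentioned."""
    if protein.lower() not in " ".join(sentences).lower():
        return 0.0
    hits = sum(len(tokenize(s) & term_words) for s in sentences)
    return hits / len(sentences)

def best_window(text, protein, go_term, size=3):
    """Slide a window of `size` sentences over the text and keep the top-scoring one."""
    sentences = split_sentences(text)
    term_words = tokenize(go_term)
    best_score, best_sentences = 0.0, []
    for start in range(max(1, len(sentences) - size + 1)):
        window = sentences[start:start + size]
        score = window_score(window, protein, term_words)
        if score > best_score:
            best_score, best_sentences = score, window
    return best_score, best_sentences
```

Because the score needs no labeled examples, a scheme of this kind can run where conventional training data is unavailable, which is the setting the abstract emphasizes.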
Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin
2017-07-03
A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
ERIC Educational Resources Information Center
Kinnebrew, John S.; Biswas, Gautam
2012-01-01
Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…
Entanglement-Based Machine Learning on a Quantum Computer
NASA Astrophysics Data System (ADS)
Cai, X.-D.; Wu, D.; Su, Z.-E.; Chen, M.-C.; Wang, X.-L.; Li, Li; Liu, N.-L.; Lu, C.-Y.; Pan, J.-W.
2015-03-01
Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, which is ubiquitous in various fields such as computer sciences, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv:1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.
NASA Astrophysics Data System (ADS)
Houborg, Rasmus; McCabe, Matthew F.
2018-01-01
With an increasing volume and dimensionality of Earth observation data, enhanced integration of machine-learning methodologies is needed to effectively analyze and utilize these information rich datasets. In machine-learning, a training dataset is required to establish explicit associations between a suite of explanatory 'predictor' variables and the target property. The specifics of this learning process can significantly influence model validity and portability, with a higher generalization level expected with an increasing number of observable conditions being reflected in the training dataset. Here we propose a hybrid training approach for leaf area index (LAI) estimation, which harnesses synergistic attributes of scattered in-situ measurements and systematically distributed physically based model inversion results to enhance the information content and spatial representativeness of the training data. To do this, a complementary training dataset of independent LAI was derived from a regularized model inversion of RapidEye surface reflectances and subsequently used to guide the development of LAI regression models via Cubist and random forests (RF) decision tree methods. The application of the hybrid training approach to a broad set of Landsat 8 vegetation index (VI) predictor variables resulted in significantly improved LAI prediction accuracies and spatial consistencies, relative to results relying on in-situ measurements alone for model training. In comparing the prediction capacity and portability of the two machine-learning algorithms, a pair of relatively simple multi-variate regression models established by Cubist performed best, with an overall relative mean absolute deviation (rMAD) of ∼11%, determined based on a stringent scene-specific cross-validation approach. 
In comparison, the portability of RF regression models was less effective (i.e., an overall rMAD of ∼15%), which was attributed partly to model saturation at high LAI in association with inherent extrapolation and transferability limitations. Explanatory VIs formed from bands in the near-infrared (NIR) and shortwave infrared domains (e.g., NDWI) were associated with the highest predictive ability, whereas Cubist models relying entirely on VIs based on NIR and red band combinations (e.g., NDVI) were associated with comparatively high uncertainties (i.e., rMAD ∼ 21%). The most transferable and best performing models were based on combinations of several predictor variables, which included both NDWI- and NDVI-like variables. In this process, prior screening of input VIs based on an assessment of variable relevance served as an effective mechanism for optimizing prediction accuracies from both Cubist and RF. While this study demonstrated benefit in combining data mining operations with physically based constraints via a hybrid training approach, the concept of transferability and portability warrants further investigations in order to realize the full potential of emerging machine-learning techniques for regression purposes.
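The vegetation indices and the error measure mentioned in this abstract can be written out in a few lines. NDVI is the standard normalized difference of NIR and red reflectance; NDWI is taken here in its NIR/shortwave-infrared form, matching the band domains the abstract names. The rMAD definition used below (mean absolute deviation divided by the mean observation, in percent) is an assumption, since the abstract does not define it, and the function names are hypothetical.

```python
def normalized_difference(a, b):
    """Generic normalized difference index (a - b) / (a + b)."""
    return (a - b) / (a + b)

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)

def ndwi(nir, swir):
    """Normalized Difference Water Index in its NIR / shortwave-infrared form."""
    return normalized_difference(nir, swir)

def rmad(predicted, observed):
    """Relative mean absolute deviation in percent (assumed definition:
    mean |predicted - observed| divided by the mean of the observations)."""
    deviations = [abs(p - o) for p, o in zip(predicted, observed)]
    mean_deviation = sum(deviations) / len(deviations)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * mean_deviation / mean_observed
```

For example, under this definition predictions of 2.2, 2.7 and 4.4 against observed LAI values of 2.0, 3.0 and 4.0 would give an rMAD of about 10%.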
Mining machines effectiveness and OEE Indicator
NASA Astrophysics Data System (ADS)
Korski, Jacek; Tobór-Osadnik, Katarzyna; Wyganowska, Małgorzata
2017-11-01
The situation in the hard coal industry in Poland is forcing the identification of effectual and practical indicators of the effectiveness of machinery and equipment. In the article, the authors discuss the possible use of the OEE indicator for the evaluation of production processes in hard-coal mines. In summary, recommendations are made to enable efficiency assessment of mining machinery using the OEE.
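OEE (overall equipment effectiveness) is conventionally computed as the product of availability, performance and quality. The sketch below applies that standard definition to one shift of a mining machine; the parameter names and units are illustrative assumptions, not figures or definitions taken from the article.

```python
def oee(planned_time, run_time, ideal_rate, total_output, good_output):
    """Return (availability, performance, quality, OEE) for one shift.

    planned_time : scheduled production time (e.g. minutes)
    run_time     : time the machine actually operated (same unit)
    ideal_rate   : nominal output per unit of time (e.g. tonnes per minute)
    total_output : everything produced during run_time (e.g. tonnes)
    good_output  : the part of total_output that meets quality requirements
    """
    availability = run_time / planned_time
    performance = total_output / (ideal_rate * run_time)
    quality = good_output / total_output
    return availability, performance, quality, availability * performance * quality
```

With hypothetical inputs of a 480-minute shift, 360 minutes of operation, a nominal 10 t/min, 2880 t mined and 2592 t of acceptable quality, the three factors are 0.75, 0.80 and 0.90, for an OEE of 0.54.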
Coal Mining Machinery Development As An Ecological Factor Of Progressive Technologies Implementation
NASA Astrophysics Data System (ADS)
Efremenkov, A. B.; Khoreshok, A. A.; Zhironkin, S. A.; Myaskov, A. V.
2017-01-01
At present, a significant share of the energy consumed by mining machines and coal mining equipment in coal mines and open pits goes to grinding the coal as it is extracted at mining faces. Meanwhile, the increase of small fractions in mined coal not only reduces the profitability of its production, but also causes a further negative impact on the environment and degrades labor conditions for miners. These processes can be countered through the development of coal mining equipment. However, against the background of the technological decline of the coal mine equipment used in Russia, the negative impact on the environment is intensifying.
Predictive analysis and data mining among the employment of fresh graduate students in HEI
NASA Astrophysics Data System (ADS)
Rahman, Nor Azziaty Abdul; Tan, Kian Lam; Lim, Chen Kim
2017-10-01
Higher education management faces the problem of producing graduates who all meet the needs of industry, while industry faces the problem of finding skilled graduates who suit its needs, partly due to the lack of an effective method for assessing problem-solving skills and to weaknesses in the assessment of those skills. The purpose of this paper is to propose a suitable classification model that can be used for prediction and assessment of the attributes of the student dataset, so as to meet the selection criteria that industry demands of graduates in the academic field. Supervised and unsupervised machine learning algorithms were used in this research, namely K-Nearest Neighbor, Naïve Bayes, Decision Tree, Neural Network, Logistic Regression and Support Vector Machine. The proposed model will help university management make better long-term plans for producing graduates who are skilled, knowledgeable and able to fulfill industry needs.
Parallel and Scalable Clustering and Classification for Big Data in Geosciences
NASA Astrophysics Data System (ADS)
Riedel, M.
2015-12-01
Machine learning, data mining, and statistical computing are common techniques to perform analysis in earth sciences. This contribution will focus on two concrete and widely used data analytics methods suitable to analyse 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm that enables the identification of outliers or interesting anomalies. A new open source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus set on the support vector machines algorithm (SVMs), as one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
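To make concrete how DBSCAN separates clusters from outliers, here is a minimal single-process sketch of the algorithm. It is not the parallel implementation the abstract refers to; the brute-force neighbor search and the function names are illustrative assumptions.

```python
import math

NOISE = -1  # label given to points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels
```

Points left with the `NOISE` label are the outliers or anomalies the abstract mentions. The brute-force `region_query` makes this sketch quadratic in the number of points, which is one reason large data sets call for parallel, scalable implementations.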
Development of a HIPAA-compliant environment for translational research data and analytics.
Bradford, Wayne; Hurdle, John F; LaSalle, Bernie; Facelli, Julio C
2014-01-01
High-performance computing centers (HPC) traditionally have far less restrictive privacy management policies than those encountered in healthcare. We show how an HPC can be re-engineered to accommodate clinical data while retaining its utility in computationally intensive tasks such as data mining, machine learning, and statistics. We also discuss deploying protected virtual machines. A critical planning step was to engage the university's information security operations and the information security and privacy office. Access to the environment requires a double authentication mechanism. The first level of authentication requires access to the university's virtual private network and the second requires that the users be listed in the HPC network information service directory. The physical hardware resides in a data center with controlled room access. All employees of the HPC and its users take the university's local Health Insurance Portability and Accountability Act training series. In the first 3 years, researcher count has increased from 6 to 58.
Classification Algorithms for Big Data Analysis, a Map Reduce Approach
NASA Astrophysics Data System (ADS)
Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.
2015-03-01
For many years, the scientific community has been concerned with how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data generated every day by remote sensors raises more challenges to be overcome. In this work, a tool within the scope of InterIMAGE Cloud Platform (ICP), which is an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes for different cluster configurations demonstrate the potential of the tool, as well as aspects that affect its performance.
A Machine Learning and Optimization Toolkit for the Swarm
2014-11-17
Machine Learning and Optimization Toolkit for the Swarm Ilge Akkaya, Shuhei Emoto...3. DATES COVERED 00-00-2014 to 00-00-2014 4. TITLE AND SUBTITLE A Machine Learning and Optimization Toolkit for the Swarm 5a. CONTRACT NUMBER... machine learning methodologies by providing the right interfaces between machine learning tools and
Data mining for better material synthesis: The case of pulsed laser deposition of complex oxides
NASA Astrophysics Data System (ADS)
Young, Steven R.; Maksov, Artem; Ziatdinov, Maxim; Cao, Ye; Burch, Matthew; Balachandran, Janakiraman; Li, Linglong; Somnath, Suhas; Patton, Robert M.; Kalinin, Sergei V.; Vasudevan, Rama K.
2018-03-01
The pursuit of more advanced electronics, and finding solutions to energy needs often hinges upon the discovery and optimization of new functional materials. However, the discovery rate of these materials is alarmingly low. Much of the information that could drive this rate higher is scattered across tens of thousands of papers in the extant literature published over several decades but is not in an indexed form, and cannot be used in entirety without substantial effort. Many of these limitations can be circumvented if the experimentalist has access to systematized collections of prior experimental procedures and results. Here, we investigate the property-processing relationship during growth of oxide films by pulsed laser deposition. To do so, we develop an enabling software tool to (1) mine the literature of relevant papers for synthesis parameters and functional properties of previously studied materials, (2) enhance the accuracy of this mining through crowd sourcing approaches, (3) create a searchable repository that will be a community-wide resource enabling material scientists to leverage this information, and (4) provide through the Jupyter notebook platform, simple machine-learning-based analysis to learn the complex interactions between growth parameters and functional properties (all data/codes available on https://github.com/ORNL-DataMatls). The results allow visualization of growth windows, trends and outliers, which can serve as a template for analyzing the distribution of growth conditions, provide starting points for related compounds and act as a feedback for first-principles calculations. Such tools will comprise an integral part of the materials design schema in the coming decade.
Standardized data collection to build prediction models in oncology: a prototype for rectal cancer.
Meldolesi, Elisa; van Soest, Johan; Damiani, Andrea; Dekker, Andre; Alitto, Anna Rita; Campitelli, Maura; Dinapoli, Nicola; Gatta, Roberto; Gambacorta, Maria Antonietta; Lanzotti, Vito; Lambin, Philippe; Valentini, Vincenzo
2016-01-01
Advances in diagnostic and treatment technology are responsible for a remarkable transformation in the concept of internal medicine, with the establishment of a new idea of personalized medicine. Inter- and intra-patient tumor heterogeneity and the complexity of clinical outcomes and/or treatment toxicity justify the effort to develop predictive models from decision support systems. However, the number of evaluated variables coming from multiple disciplines (oncology, computer science, bioinformatics, statistics, genomics and imaging, among others) could be very large, making traditional statistical analysis difficult to exploit. Automated data-mining processes and machine learning approaches can be a solution for organizing the massive amount of data and trying to unravel important interactions. The purpose of this paper is to describe the strategy to collect and analyze data properly for decision support and introduce the concept of an 'umbrella protocol' within the framework of 'rapid learning healthcare'.
ezTag: tagging biomedical concepts via interactive learning.
Kwon, Dongseop; Kim, Sun; Wei, Chih-Hsuan; Leaman, Robert; Lu, Zhiyong
2018-05-18
Recently, advanced text-mining techniques have been shown to speed up manual data curation by providing human annotators with automated pre-annotations generated by rules or machine learning models. Due to the limited training data available, however, current annotation systems primarily focus only on common concept types such as genes or diseases. To support annotating a wide variety of biological concepts with or without pre-existing training data, we developed ezTag, a web-based annotation tool that allows curators to perform annotation and provide training data with humans in the loop. ezTag supports both abstracts in PubMed and full-text articles in PubMed Central. It also provides lexicon-based concept tagging as well as the state-of-the-art pre-trained taggers such as TaggerOne, GNormPlus and tmVar. ezTag is freely available at http://eztag.bioqrator.org.
Learning by doing at the Colorado School of Mines
NASA Astrophysics Data System (ADS)
Furtak, Thomas E.; Ruskell, Todd G.
2013-03-01
With over 260 majors, the undergraduate physics program at CSM is among the largest in the country. An underlying theme in this success is experiential learning, starting with a studio teaching method in the introductory calculus-based physics courses. After their second year students complete a 6-week full-time summer course devoted to hands-on practical knowledge and skills, including machine shop techniques, high-vacuum technology, applied optics, electronic control systems, and computational tools. This precedes a two-semester laboratory sequence that can be taught at an advanced level because of the students' experience. The required capstone senior course is a year-long open-ended challenge in which students partner with members of the faculty to work on authentic research projects, teaming with grad students or post-docs as contributing members to the department's externally funded scholarship. All of these features are important components of our B.S. degree, Engineering Physics, which is officially accredited by ABET.
Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling
Cuperlovic-Culf, Miroslava
2018-01-01
Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies. PMID:29324649
Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling.
Cuperlovic-Culf, Miroslava
2018-01-11
Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies.
Taniguchi, Hidetaka; Sato, Hiroshi; Shirakawa, Tomohiro
2018-05-09
Human learners can generalize a new concept from a small number of samples. In contrast, conventional machine learning methods require large amounts of data to address the same types of problems. Humans have cognitive biases that promote fast learning. Here, we developed a method to reduce the gap between human beings and machines in this type of inference by utilizing cognitive biases. We implemented a human cognitive model into machine learning algorithms and compared their performance with the currently most popular methods, naïve Bayes, support vector machine, neural networks, logistic regression and random forests. We focused on the task of spam classification, which has been studied for a long time in the field of machine learning and often requires a large amount of data to obtain high accuracy. Our models achieved superior performance with small and biased samples in comparison with other representative machine learning methods.
Machine learning: novel bioinformatics approaches for combating antimicrobial resistance.
Macesic, Nenad; Polubriaginof, Fernanda; Tatonetti, Nicholas P
2017-12-01
Antimicrobial resistance (AMR) is a threat to global health and new approaches to combating AMR are needed. Use of machine learning in addressing AMR is in its infancy but has made promising steps. We reviewed the current literature on the use of machine learning for studying bacterial AMR. The advent of large-scale data sets provided by next-generation sequencing and electronic health records makes applying machine learning to the study and treatment of AMR possible. To date, it has been used for antimicrobial susceptibility genotype/phenotype prediction, development of AMR clinical decision rules, novel antimicrobial agent discovery and antimicrobial therapy optimization. Application of machine learning to studying AMR is feasible but remains limited. Implementation of machine learning in clinical settings faces barriers to uptake with concerns regarding model interpretability and data quality. Future applications of machine learning to AMR are likely to be laboratory-based, such as antimicrobial susceptibility phenotype prediction.
Next-Generation Machine Learning for Biological Networks.
Camacho, Diogo M; Collins, Katherine M; Powers, Rani K; Costello, James C; Collins, James J
2018-06-14
Machine learning, a collection of data-analytical techniques aimed at building predictive models from multi-dimensional datasets, is becoming integral to modern biological research. By enabling one to generate models that learn from large datasets and make predictions on likely outcomes, machine learning can be used to study complex cellular systems such as biological networks. Here, we provide a primer on machine learning for life scientists, including an introduction to deep learning. We discuss opportunities and challenges at the intersection of machine learning and network biology, which could impact disease biology, drug discovery, microbiome research, and synthetic biology. Copyright © 2018 Elsevier Inc. All rights reserved.
Comparison between extreme learning machine and wavelet neural networks in data classification
NASA Astrophysics Data System (ADS)
Yahia, Siwar; Said, Salwa; Jemai, Olfa; Zaied, Mourad; Ben Amar, Chokri
2017-03-01
The extreme learning machine is a well-known learning algorithm in the field of machine learning. It is a feed-forward neural network with a single hidden layer, and it is an extremely fast learning algorithm with good generalization performance. In this paper, we aim to compare the extreme learning machine with wavelet neural networks, which are also widely used. We have used six benchmark data sets to evaluate each technique: Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition and Iris Plant. Experimental results have shown that both the extreme learning machine and wavelet neural networks have reached good results.
MLBCD: a machine learning tool for big clinical data.
Luo, Gang
2015-01-01
Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. The paper describes MLBCD's design in detail. By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.
Machine Learning and Radiology
Wang, Shijun; Summers, Ronald M.
2012-01-01
In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077
Evaluating the Security of Machine Learning Algorithms
2008-05-20
Two far-reaching trends in computing have grown in significance in recent years. First, statistical machine learning has entered the mainstream as a...computing applications. The growing intersection of these trends compels us to investigate how well machine learning performs under adversarial conditions... machine learning has a structure that we can use to build secure learning systems. This thesis makes three high-level contributions. First, we develop a
Bailey-Wilson, Joan E.; Brennan, Jennifer S.; Bull, Shelley B; Culverhouse, Robert; Kim, Yoonhee; Jiang, Yuan; Jung, Jeesun; Li, Qing; Lamina, Claudia; Liu, Ying; Mägi, Reedik; Niu, Yue S.; Simpson, Claire L.; Wang, Libo; Yilmaz, Yildiz E.; Zhang, Heping; Zhang, Zhaogong
2012-01-01
Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors. PMID:22128066
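The "collapsing" of rare variants mentioned in the abstract above can be illustrated with a minimal Python sketch. It reduces all rare variants in one gene to a single per-individual carrier indicator. The genotype layout, the frequency cutoff, and the name `collapse_rare_variants` are assumptions made for illustration; the sketch is not the workshop group's own method.

```python
from typing import List


def collapse_rare_variants(
    genotypes: List[List[int]],   # rows: individuals; columns: variants; values: minor-allele counts (0, 1, 2)
    maf_threshold: float = 0.01,  # assumed cutoff below which a variant counts as "rare"
) -> List[int]:
    """Return 1 for each individual carrying at least one rare variant, else 0."""
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0]) if genotypes else 0
    # Minor-allele frequency per variant: allele count / (2 * number of individuals).
    maf = [
        sum(row[j] for row in genotypes) / (2 * n_individuals)
        for j in range(n_variants)
    ]
    # Keep variants that are observed at all but stay at or below the cutoff.
    rare = [j for j in range(n_variants) if 0 < maf[j] <= maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


# Toy data (hypothetical): 5 individuals, 3 variants; the third variant is common.
toy = [
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 2],
    [0, 1, 0],
    [0, 0, 0],
]
carriers = collapse_rare_variants(toy, maf_threshold=0.15)
```

The resulting 0/1 vector can then be tested against a phenotype as a single predictor, which is the sense in which joint analysis of several variants per gene replaces many single-variant tests.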
INSIGHTS FROM MACHINE-LEARNED DIET SUCCESS PREDICTION.
Weber, Ingmar; Achananuparp, Palakorn
2016-01-01
To support people trying to lose weight and stay healthy, more and more fitness apps have sprung up, including the ability to track both calorie intake and expenditure. Users of such apps are part of a wider "quantified self" movement and many opt in to publicly share their logged data. In this paper, we use public food diaries of more than 4,000 long-term active MyFitnessPal users to study the characteristics of a (un-)successful diet. Concretely, we train a machine learning model to predict repeatedly being over or under self-set daily calorie goals and then look at which features contribute to the model's prediction. Our findings include both expected results, such as the token "mcdonalds" or the category "dessert" being indicative of being over the calorie goal, but also less obvious ones such as the difference between pork and poultry concerning dieting success, or the use of the "quick added calories" functionality being indicative of over-shooting calorie-wise. This study also hints at the feasibility of using such data for more in-depth data mining, e.g., looking at the interaction between consumed foods such as mixing protein- and carbohydrate-rich foods. To the best of our knowledge, this is the first systematic study of public food diaries.
PhenoLines: Phenotype Comparison Visualizations for Disease Subtyping via Topic Models.
Glueck, Michael; Naeini, Mahdi Pakdaman; Doshi-Velez, Finale; Chevalier, Fanny; Khan, Azam; Wigdor, Daniel; Brudno, Michael
2018-01-01
PhenoLines is a visual analysis tool for the interpretation of disease subtypes, derived from the application of topic models to clinical data. Topic models enable one to mine cross-sectional patient comorbidity data (e.g., electronic health records) and construct disease subtypes (each with its own temporally evolving prevalence and co-occurrence of phenotypes) without requiring aligned longitudinal phenotype data for all patients. However, the dimensionality of topic models makes interpretation challenging, and de facto analyses provide little intuition regarding phenotype relevance or phenotype interrelationships. PhenoLines enables one to compare phenotype prevalence within and across disease subtype topics, thus supporting subtype characterization, a task that involves identifying a proposed subtype's dominant phenotypes, ages of effect, and clinical validity. We contribute a data transformation workflow that employs the Human Phenotype Ontology to hierarchically organize phenotypes and aggregate the evolving probabilities produced by topic models. We introduce a novel measure of phenotype relevance that can be used to simplify the resulting topology. The design of PhenoLines was motivated by formative interviews with machine learning and clinical experts. We describe the collaborative design process, distill high-level tasks, and report on initial evaluations with machine learning experts and a medical domain expert. These results suggest that PhenoLines demonstrates promising approaches to support the characterization and optimization of topic models.
Cheng, Lu; Zhu, Mu; Poss, Jeffrey W; Hirdes, John P; Glenny, Christine; Stolee, Paul
2015-10-09
Resources for home care rehabilitation are limited, and many home care clients who could benefit do not receive rehabilitation therapy. The interRAI Contact Assessment (CA) is a new screening instrument comprised of a subset of interRAI Home Care (HC) items, designed to be used as a preliminary assessment to identify which potential home care clients should be referred for a full assessment, or for services such as rehabilitation. We investigated which client characteristics are most relevant in predicting rehabilitation use in the full interRAI HC assessment. We applied two algorithms from machine learning and data mining - the LASSO and the random forest - to frequency matched interRAI HC and service utilization data for home care clients in Ontario, Canada. Analyses confirmed the importance of functional decline and mobility variables in targeting rehabilitation services, but suggested that other items in use as potential predictors may be less relevant. Six of the most highly ranked items related to ambulation. Diagnosis of cancer was highly associated with decreased rehabilitation use; however, cognitive status was not. Inconsistencies between variables considered important for classifying clients who need rehabilitation and those identified in this study based on use may indicate a discrepancy in the client characteristics considered relevant in theory versus actual practice.
Intelligent excavator control system for lunar mining system
NASA Astrophysics Data System (ADS)
Lever, Paul J. A.; Wang, Fei-Yue
1995-01-01
A major benefit of utilizing local planetary resources is that it reduces the need for, and the cost of, lifting materials from the Earth's surface into Earth orbit. The location of the moon makes it an ideal site for harvesting the materials needed to assist space activities. Here, lunar excavation will take place in the dynamic unstructured lunar environment, in which conditions are highly variable and unpredictable. Autonomous mining (excavation) machines are necessary to remove human operators from this hazardous environment. Such a machine must use a control system structure that can identify, plan, sense, and control real-time dynamic machine movements in the lunar environment. The solution is a vision-based hierarchical control structure. However, excavation tasks require force/torque sensor feedback to control the excavation tool after it has penetrated the surface. A fuzzy logic controller (FLC) is used to interpret the forces and torques gathered from a bucket-mounted force/torque sensor during excavation. Experimental results from several excavation tests using the FLC are presented here. These results represent the first step toward an integrated sensing and control system for a lunar mining system.
Drilling side holes from a borehole
NASA Technical Reports Server (NTRS)
Collins, E. R., Jr.
1980-01-01
Machine takes long horizontal stratum samples from confines of 21 cm bore hole. Stacked interlocking half cylindrical shells mate to form rigid thrust tube. Drive shaft and core storage device is flexible and retractable. Entire machine fits in 10 meter length of steel tube. Machine could drill drainage or ventilation holes in coal mines, or provide important information for geological, oil, and geothermal surveys.
Code of Federal Regulations, 2013 CFR
2013-10-01
... machine. An acceptable method for measuring the concentration of carbon dioxide is described in Bureau of Mines Report of Investigations 6865, A Machine-Test Method for Measuring Carbon Dioxide in the Inspired... of 10.5 liters. (3) A sedentary breathing machine cam will be used. (4) The apparatus will be tested...
Code of Federal Regulations, 2012 CFR
2012-10-01
... machine. An acceptable method for measuring the concentration of carbon dioxide is described in Bureau of Mines Report of Investigations 6865, A Machine-Test Method for Measuring Carbon Dioxide in the Inspired... of 10.5 liters. (3) A sedentary breathing machine cam will be used. (4) The apparatus will be tested...
Code of Federal Regulations, 2014 CFR
2014-10-01
... machine. An acceptable method for measuring the concentration of carbon dioxide is described in Bureau of Mines Report of Investigations 6865, A Machine-Test Method for Measuring Carbon Dioxide in the Inspired... of 10.5 liters. (3) A sedentary breathing machine cam will be used. (4) The apparatus will be tested...
Using human brain activity to guide machine learning.
Fong, Ruth C; Scheirer, Walter J; Cox, David D
2018-03-29
Machine learning is a field of computer science that builds algorithms that learn. In many cases, machine learning algorithms are used to recreate a human ability like adding a caption to a photo, driving a car, or playing a game. While the human brain has long served as a source of inspiration for machine learning, little effort has been made to directly use data collected from working brains as a guide for machine learning algorithms. Here we demonstrate a new paradigm of "neurally-weighted" machine learning, which takes fMRI measurements of human brain activity from subjects viewing images, and infuses these data into the training process of an object recognition learning algorithm to make it more consistent with the human brain. After training, these neurally-weighted classifiers are able to classify images without requiring any additional neural data. We show that our neural-weighting approach can lead to large performance gains when used with traditional machine vision features, as well as to significant improvements with already high-performing convolutional neural network features. The effectiveness of this approach points to a path forward for a new class of hybrid machine learning algorithms which take both inspiration and direct constraints from neuronal data.
NASA Astrophysics Data System (ADS)
Ivanov, A. S.; Kalanchin, I. Yu; Pugacheva, E. E.
2017-09-01
One of the first electric motors based on the use of electromagnets was the reluctance motor, which appeared in the 19th century. Because of the complexity of implementing its control system, the development of switched reluctance electric machines was resumed only in 1960, thanks to the development of computers and power electronic devices. The main feature of these machines is the capacity to work both in motor mode and in generator mode. Thanks to a simple and reliable design with no rotor winding, commutator, or permanent magnets, the switched reluctance drive operating in motor mode is actively being introduced into areas such as the car industry, the production of household appliances, and wind power engineering, as well as critical production processes in the oil and mining industries. However, the existing shortcomings of switched reluctance electric machines, such as nonlinear pulsations of the electromagnetic torque, the need for a three- or four-phase supply system, and the need for a rotor position sensor, prevent the wide adoption of this kind of electric machine.
Quantum-Enhanced Machine Learning
NASA Astrophysics Data System (ADS)
Dunjko, Vedran; Taylor, Jacob M.; Briegel, Hans J.
2016-09-01
The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.
Machine learning reveals orbital interaction in materials
NASA Astrophysics Data System (ADS)
Lam Pham, Tien; Kino, Hiori; Terakura, Kiyoyuki; Miyake, Takashi; Tsuda, Koji; Takigawa, Ichigaku; Chi Dam, Hieu
2017-12-01
We propose a novel representation of materials named an 'orbital-field matrix (OFM)', which is based on the distribution of valence shell electrons. We demonstrate that this new representation can be highly useful in mining material data. Experimental investigation shows that the formation energies of crystalline materials, atomization energies of molecular materials, and local magnetic moments of the constituent atoms in bimetal alloys of lanthanide metal and transition-metal can be predicted with high accuracy using the OFM. Knowledge regarding the role of the coordination numbers of the transition-metal and lanthanide elements in determining the local magnetic moments of the transition-metal sites can be acquired directly from decision tree regression analyses using the OFM.
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
NASA Technical Reports Server (NTRS)
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
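The abstract above does not define its informativeness and coherence measures, so the following Python sketch does not attempt to reproduce them. It only shows the basic bookkeeping that such measures build on: checking a given partition against must-link and cannot-link constraints. The function name `violated_fraction` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two data points


def violated_fraction(
    labels: Sequence[int],      # cluster label assigned to each point
    must_link: List[Pair],      # pairs that should share a cluster
    cannot_link: List[Pair],    # pairs that should be in different clusters
) -> float:
    """Fraction of constraints that the partition `labels` fails to satisfy."""
    violations = sum(1 for a, b in must_link if labels[a] != labels[b])
    violations += sum(1 for a, b in cannot_link if labels[a] == labels[b])
    total = len(must_link) + len(cannot_link)
    return violations / total if total else 0.0


# Toy partition of four points into two clusters.
labels = [0, 0, 1, 1]
frac = violated_fraction(
    labels,
    must_link=[(0, 1), (1, 2)],    # (1, 2) is violated: the points are split
    cannot_link=[(2, 3), (0, 3)],  # (2, 3) is violated: the points are together
)
```

Evaluating this quantity on the output of an unconstrained clustering run gives a rough sense of how much a constraint set tells the algorithm that it would not have found on its own.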
Parsing Citations in Biomedical Articles Using Conditional Random Fields
Zhang, Qing; Cao, Yong-Gang; Yu, Hong
2011-01-01
Citations are used ubiquitously in biomedical full-text articles and play an important role in representing both the rhetorical structure and the semantic content of the articles. As a result, text mining systems will significantly benefit from a tool that automatically extracts the content of a citation. In this study, we applied the supervised machine-learning algorithm Conditional Random Fields (CRFs) to automatically parse a citation into its fields (e.g., Author, Title, Journal, and Year). With a subset of HTML-format open-access PubMed Central articles, we report an overall 97.95% F1-score. The citation parser can be accessed at: http://www.cs.uwm.edu/~qing/projects/cithit/index.html. PMID:21419403
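The F1-score reported above is the harmonic mean of precision and recall. A minimal Python sketch of how a per-field F1 could be computed from gold and predicted token labels is given below. The label sequences are invented for illustration, and the abstract does not say whether the study scored tokens or whole fields, so the token-level scoring here is an assumption.

```python
from typing import List, Tuple


def field_f1(gold: List[str], pred: List[str], field: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one citation field over aligned token labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == field and p == field)
    fp = sum(1 for g, p in zip(gold, pred) if g != field and p == field)
    fn = sum(1 for g, p in zip(gold, pred) if g == field and p != field)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical token labels for one short citation string.
gold = ["Author", "Author", "Title", "Title", "Journal", "Year"]
pred = ["Author", "Title", "Title", "Title", "Journal", "Year"]

title_scores = field_f1(gold, pred, "Title")    # one Author token mislabeled as Title
author_scores = field_f1(gold, pred, "Author")
```

In this toy case the Title field has perfect recall but imperfect precision, while the Author field shows the opposite pattern, which is why a single combined figure such as F1 is usually reported.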
Automated expert modeling for automated student evaluation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abbott, Robert G.
The 8th International Conference on Intelligent Tutoring Systems provides a leading international forum for the dissemination of original results in the design, implementation, and evaluation of intelligent tutoring systems and related areas. The conference draws researchers from a broad spectrum of disciplines ranging from artificial intelligence and cognitive science to pedagogy and educational psychology. The conference explores intelligent tutoring systems' increasing real-world impact on an increasingly global scale. Improved authoring tools and learning object standards enable fielding systems and curricula in real-world settings on an unprecedented scale. Researchers deploy ITSs in ever larger studies and increasingly use data from real students, tasks, and settings to guide new research. With high volumes of student interaction data, data mining, and machine learning, tutoring systems can learn from experience and improve their teaching performance. The increasing number of realistic evaluation studies also broadens researchers' knowledge about the educational contexts for which ITSs are best suited. At the same time, researchers explore how to expand and improve ITS/student communications, for example, how to achieve more flexible and responsive discourse with students, help students integrate Web resources into learning, use mobile technologies and games to enhance student motivation and learning, and address multicultural perspectives.
Future Sky Surveys: New Discovery Frontiers
NASA Astrophysics Data System (ADS)
Tyson, J. Anthony; Borne, Kirk D.
2012-03-01
Driven by the availability of new instrumentation, there has been an evolution in astronomical science toward comprehensive investigations of new phenomena. Major advances in our understanding of the Universe over the history of astronomy have often arisen from dramatic improvements in our capability to observe the sky to greater depth, in previously unexplored wavebands, with higher precision, or with improved spatial, spectral, or temporal resolution. Substantial progress in the important scientific problems of the next decade (determining the nature of dark energy and dark matter, studying the evolution of galaxies and the structure of our own Milky Way, opening up the time domain to discover faint variable objects, and mapping both the inner and outer Solar System) can be achieved through the application of advanced data mining methods and machine learning algorithms operating on the numerous large astronomical databases that will be generated from a variety of revolutionary future sky surveys. Over the next decade, astronomy will irrevocably enter the era of big surveys and of really big telescopes. New sky surveys (some of which will produce petabyte-scale data collections) will begin their operations, and one or more very large telescopes (ELTs = Extremely Large Telescopes) will enter the construction phase. These programs and facilities will generate a remarkable wealth of data of high complexity, endowed with enormous scientific knowledge discovery potential. New parameter spaces will be opened, in multiple wavelength domains as well as the time domain, across wide areas of the sky, and down to unprecedented faint source flux limits. 
The synergies of grand facilities, massive data collections, and advanced machine learning algorithms will come together to enable discoveries within most areas of astronomical science, including Solar System, exo-planets, star formation, stellar populations, stellar death, galaxy assembly, galaxy evolution, quasar evolution, and cosmology. Current and future sky surveys, comprising an alphabet soup of project names (e.g., Pan-STARRS, WISE, Kepler, DES, VST, VISTA, GAIA, EUCLID, SKA, LSST, and WFIRST; some of which are discussed in Chapters 17, 18, and 20), will contribute to the exponential explosion of complex data in astronomy. The scientific goals of these projects are as monumental as the programs themselves. The core scientific output of all of these will be their scientific data collection. Consequently, data mining and machine learning algorithms and specialists will become a common component of future astronomical research with these facilities. This synergistic combination and collaboration among multiple disciplines are essential in order to maximize the scientific discovery potential, the science output, the research efficiency, and the success of these projects.
Code of Federal Regulations, 2014 CFR
2014-07-01
... constitute an integral part of a circuit for transmitting electrical energy. (d) Cable reels for shuttle cars... MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Construction and Design Requirements § 18.45 Cable reels. (a) A self-propelled machine, that receives electrical energy through a portable...
Code of Federal Regulations, 2013 CFR
2013-07-01
... constitute an integral part of a circuit for transmitting electrical energy. (d) Cable reels for shuttle cars... MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Construction and Design Requirements § 18.45 Cable reels. (a) A self-propelled machine, that receives electrical energy through a portable...
Code of Federal Regulations, 2012 CFR
2012-07-01
... constitute an integral part of a circuit for transmitting electrical energy. (d) Cable reels for shuttle cars... MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Construction and Design Requirements § 18.45 Cable reels. (a) A self-propelled machine, that receives electrical energy through a portable...
Code of Federal Regulations, 2010 CFR
2010-07-01
... constitute an integral part of a circuit for transmitting electrical energy. (d) Cable reels for shuttle cars... MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Construction and Design Requirements § 18.45 Cable reels. (a) A self-propelled machine, that receives electrical energy through a portable...
Code of Federal Regulations, 2011 CFR
2011-07-01
... constitute an integral part of a circuit for transmitting electrical energy. (d) Cable reels for shuttle cars... MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Construction and Design Requirements § 18.45 Cable reels. (a) A self-propelled machine, that receives electrical energy through a portable...
Analyzing Student Inquiry Data Using Process Discovery and Sequence Classification
ERIC Educational Resources Information Center
Emond, Bruno; Buffett, Scott
2015-01-01
This paper reports on results of applying process discovery mining and sequence classification mining techniques to a data set of semi-structured learning activities. The main research objective is to advance educational data mining to model and support self-regulated learning in heterogeneous environments of learning content, activities, and…
Myths and legends in learning classification rules
NASA Technical Reports Server (NTRS)
Buntine, Wray
1990-01-01
A discussion is presented of machine learning theory on empirically learning classification rules. Six myths are proposed in the machine learning community that address issues of bias, learning as search, computational learning theory, Occam's razor, universal learning algorithms, and interactive learning. Some of the problems raised are also addressed from a Bayesian perspective. Questions are suggested that machine learning researchers should be addressing both theoretically and experimentally.
Machine Learning Based Malware Detection
2015-05-18
A TRIDENT SCHOLAR PROJECT REPORT NO. 440 Machine Learning Based Malware Detection by Midshipman 1/C Zane A. Markel, USN...suitably be projected into realistic performance. This work explores several aspects of machine learning based malware detection. First, we
Interpreting Medical Information Using Machine Learning and Individual Conditional Expectation.
Nohara, Yasunobu; Wakata, Yoshifumi; Nakashima, Naoki
2015-01-01
Recently, machine-learning techniques have spread to many fields. However, machine learning is still not popular in the medical research field because its results are difficult to interpret. In this paper, we introduce a method of interpreting medical information using machine learning techniques. The method gives a new explanation of the partial dependence plot and the individual conditional expectation plot from the viewpoint of the medical research field.
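For readers unfamiliar with the two plots named above, the Python sketch below shows how they are generally computed. An individual conditional expectation (ICE) curve varies one feature for a single record while holding that record's other features fixed, and the partial dependence curve is the pointwise average of the ICE curves. The toy model `toy_predict` and the data are hypothetical, and the sketch illustrates the standard definitions rather than the specific interpretation method proposed in the paper.

```python
from typing import Callable, List, Sequence


def ice_curves(
    predict: Callable[[Sequence[float]], float],  # any fitted model's prediction function
    rows: List[List[float]],                      # one feature vector per record
    feature_index: int,                           # feature to vary
    grid: Sequence[float],                        # values to substitute for that feature
) -> List[List[float]]:
    """One curve per record: predictions as the chosen feature sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)          # copy so the original record is untouched
            modified[feature_index] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: List[List[float]]) -> List[float]:
    """Pointwise mean of the ICE curves."""
    n = len(curves)
    return [sum(curve[k] for curve in curves) / n for k in range(len(curves[0]))]


def toy_predict(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained model.
    return 2 * x[0] + x[1]


records = [[0.0, 1.0], [0.0, 3.0]]
curves = ice_curves(toy_predict, records, feature_index=0, grid=[0.0, 1.0, 2.0])
pd_curve = partial_dependence(curves)
```

Because the toy model is additive, the ICE curves are parallel; with a model that contains interactions they would diverge, which is the heterogeneity that ICE plots expose and the averaged curve hides.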
Machine Learning Applications to Resting-State Functional MR Imaging Analysis.
Billings, John M; Eder, Maxwell; Flood, William C; Dhami, Devendra Singh; Natarajan, Sriraam; Whitlow, Christopher T
2017-11-01
Machine learning is one of the most exciting and rapidly expanding fields within computer science. Academic and commercial research entities are investing in machine learning methods, especially in personalized medicine via patient-level classification. There is great promise that machine learning methods combined with resting state functional MR imaging will aid in diagnosis of disease and guide potential treatment for conditions thought to be impossible to identify based on imaging alone, such as psychiatric disorders. We discuss machine learning methods and explore recent advances. Copyright © 2017 Elsevier Inc. All rights reserved.
Source localization in an ocean waveguide using supervised machine learning.
Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter
2017-09-01
Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.
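As a rough illustration of the preprocessing step named in the abstract above, the sketch below builds a normalized sample covariance matrix from array snapshots and flattens it into a real-valued input vector. The abstract does not give the exact formula, so the normalization used here (each snapshot scaled to unit norm before the outer products are averaged) and the names `normalized_scm` and `scm_to_features` are assumptions for illustration, not the authors' implementation.

```python
import math

def normalized_scm(snapshots):
    """Average of outer products p p^H over unit-norm snapshots.

    snapshots: list of equal-length lists of complex sensor pressures,
    one inner list per snapshot (assumed layout). Snapshots must be non-zero.
    """
    n_sensors = len(snapshots[0])
    scm = [[0j] * n_sensors for _ in range(n_sensors)]
    for p in snapshots:
        norm = math.sqrt(sum(abs(x) ** 2 for x in p))
        unit = [x / norm for x in p]  # scale the snapshot to unit norm
        for i in range(n_sensors):
            for j in range(n_sensors):
                scm[i][j] += unit[i] * unit[j].conjugate()
    n = len(snapshots)
    return [[value / n for value in row] for row in scm]

def scm_to_features(scm):
    """Real and imaginary parts of the upper triangle as one flat vector."""
    size = len(scm)
    upper = [scm[i][j] for i in range(size) for j in range(i, size)]
    return [v.real for v in upper] + [v.imag for v in upper]
```

Because the matrix is Hermitian, the upper triangle carries all of its information, so a vector of this kind could be fed to any of the three learners mentioned in the abstract.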
Manifold Regularized Experimental Design for Active Learning.
Zhang, Lining; Shum, Hubert P H; Shao, Ling
2016-12-02
Various machine learning and data mining tasks in classification require abundant data samples to be labeled for training. Conventional active learning methods aim at labeling the most informative samples for alleviating the labor of the user. Many previous studies in active learning select one sample after another in a greedy manner. However, this is not very effective because the classification models have to be retrained for each newly labeled sample. Moreover, many popular active learning approaches utilize the most uncertain samples by leveraging the classification hyperplane of the classifier, which is not appropriate since the classification hyperplane is inaccurate when the training data are small-sized. The problem of insufficient training data in real-world systems limits the potential applications of these approaches. This paper presents a novel method of active learning called manifold regularized experimental design (MRED), which can label multiple informative samples at one time for training. In addition, MRED gives an explicit geometric explanation for the selected samples to be labeled by the user. Different from existing active learning methods, our method avoids the intrinsic problems caused by insufficiently labeled samples in real-world applications. Various experiments on synthetic datasets, the Yale face database and the Corel image database have been carried out to show how MRED outperforms existing methods.
Machine Learning for Medical Imaging
Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L.
2017-01-01
Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. ©RSNA, 2017 PMID:28212054
Machine Learning for Medical Imaging.
Erickson, Bradley J; Korfiatis, Panagiotis; Akkus, Zeynettin; Kline, Timothy L
2017-01-01
Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. © RSNA, 2017.
Data Mining for Understanding and Improving Decision-Making Affecting Ground Delay Programs
NASA Technical Reports Server (NTRS)
Kulkarni, Deepak; Wang, Yao Xun; Sridhar, Banavar
2013-01-01
The continuous growth in the demand for air transportation results in an imbalance between airspace capacity and traffic demand. The airspace capacity of a region depends on the ability of the system to maintain safe separation between aircraft in the region. In addition to growing demand, the airspace capacity is severely limited by convective weather. During such conditions, traffic managers at the FAA's Air Traffic Control System Command Center (ATCSCC) and dispatchers at various Airlines' Operations Center (AOC) collaborate to mitigate the demand-capacity imbalance caused by weather. The end result is the implementation of a set of Traffic Flow Management (TFM) initiatives such as ground delay programs, reroute advisories, flow metering, and ground stops. Data Mining is the automated process of analyzing large sets of data and then extracting patterns in the data. Data mining tools are capable of predicting behaviors and future trends, allowing an organization to benefit from past experience in making knowledge-driven decisions. The work reported in this paper is focused on ground delay programs. Data mining algorithms have the potential to develop associations between weather patterns and the corresponding ground delay program responses. If successful, they can be used to improve and standardize TFM decisions, resulting in better predictability of traffic flows on days with reliable weather forecasts. The approach here seeks to develop a set of data mining and machine learning models and apply them to historical archives of weather observations and forecasts and TFM initiatives to determine the extent to which the theory can predict and explain the observed traffic flow behaviors.
A Learning System for Discriminating Variants of Malicious Network Traffic
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beaver, Justin M; Symons, Christopher T; Gillen, Rob
Modern computer network defense systems rely primarily on signature-based intrusion detection tools, which generate alerts when patterns that are pre-determined to be malicious are encountered in network data streams. Signatures are created reactively, and only after in-depth manual analysis of a network intrusion. There is little ability for signature-based detectors to identify intrusions that are new or even variants of an existing attack, and little ability to adapt the detectors to the patterns unique to a network environment. Due to these limitations, the need exists for network intrusion detection techniques that can more comprehensively address both known and unknown network-based attacks and can be optimized for the target environment. This work describes a system that leverages machine learning to provide a network intrusion detection capability that analyzes behaviors in channels of communication between individual computers. Using examples of malicious and non-malicious traffic in the target environment, the system can be trained to discriminate between traffic types. The machine learning provides insight that would be difficult for a human to explicitly code as a signature because it evaluates many interdependent metrics simultaneously. With this approach, zero day detection is possible by focusing on similarity to known traffic types rather than mining for specific bit patterns or conditions. This also reduces the burden on organizations to account for all possible attack variant combinations through signatures. The approach is presented along with results from a third-party evaluation of its performance.
Aural mapping of STEM concepts using literature mining
NASA Astrophysics Data System (ADS)
Bharadwaj, Venkatesh
Recent technological applications have made people's lives heavily dependent on Science, Technology, Engineering, and Mathematics (STEM) and its applications. Understanding basic science is a must in order to use and contribute to this technological revolution. Science education at the middle and high school levels, however, depends heavily on visual representations such as models, diagrams, figures, animations and presentations. This leaves visually impaired students with very few options to learn science and secure a career in STEM-related areas. Recent experiments have shown that small aural clues called Audemes are helpful in understanding and memorization of science concepts among visually impaired students. Audemes are non-verbal sound translations of a science concept. In order to present science concepts as Audemes for visually impaired students, this thesis presents an automatic system for audeme generation from STEM textbooks. This thesis describes the systematic application of multiple Natural Language Processing tools and techniques, such as a dependency parser, a POS tagger, an Information Retrieval algorithm, semantic mapping of aural words, and machine learning, to transform a science concept into a combination of atomic sounds, thus forming an audeme. We present a rule-based classification method for all STEM-related concepts. This work also presents a novel way of mapping and extracting the most related sounds for the words used in a textbook. Additionally, machine learning methods are used in the system to ensure that the output is customized according to a user's perception. The system presented is robust, scalable, fully automatic and dynamically adaptable for audeme generation.
Development of sensitized pick coal interface detector system
NASA Technical Reports Server (NTRS)
Burchill, R. F.
1979-01-01
One approach for detection of the coal interface is measurement of the pick cutting loads and shock through the use of pick strain gage load cells and accelerometers. The cutting drum of a longwall mining machine contains a number of cutting picks. In order to measure pick loads and shocks, one pick was instrumented and telemetry used to transmit the signals from the drum to an instrument-type tape recorder. A data system using FM telemetry was designed to transfer cutting bit load and shock information from the drum of a longwall shearer coal mining machine to a chassis-mounted data recorder.
Machine learning in heart failure: ready for prime time.
Awan, Saqib Ejaz; Sohel, Ferdous; Sanfilippo, Frank Mario; Bennamoun, Mohammed; Dwivedi, Girish
2018-03-01
The aim of this review is to present an up-to-date overview of the application of machine learning methods in heart failure including diagnosis, classification, readmissions and medication adherence. Recent studies have shown that the application of machine learning techniques may have the potential to improve heart failure outcomes and management, including cost savings by improving existing diagnostic and treatment support systems. Recently developed deep learning methods are expected to yield even better performance than traditional machine learning techniques in performing complex tasks by learning the intricate patterns hidden in big medical data. The review summarizes the recent developments in the application of machine and deep learning methods in heart failure management.
Human Machine Learning Symbiosis
ERIC Educational Resources Information Center
Walsh, Kenneth R.; Hoque, Md Tamjidul; Williams, Kim H.
2017-01-01
Human Machine Learning Symbiosis is a cooperative system where both the human learner and the machine learner learn from each other to create an effective and efficient learning environment adapted to the needs of the human learner. Such a system can be used in online learning modules so that the modules adapt to each learner's learning state both…
Calculation of parameters of technological equipment for deep-sea mining
NASA Astrophysics Data System (ADS)
Yungmeister, D. A.; Ivanov, S. E.; Isaev, A. I.
2018-03-01
The actual problem of extracting minerals from the bottom of the world ocean is considered. On the ocean floor, three types of minerals are of interest: iron-manganese concretions (IMC), cobalt-manganese crusts (CMC) and sulphides. The analysis of known designs of machines and complexes for the extraction of IMC is performed. These machines are based on the principle of excavating the bottom surface; however such methods do not always correspond to “gentle” methods of mining. The ecological purity of such mining methods does not meet the necessary requirements. Such machines require the transmission of high electric power through the water column, which in some cases is a significant challenge. The authors analyzed the options of transportation of the extracted mineral from the bottom. The paper describes the design of machines that collect IMC by the method of vacuum suction. In this method, the gripping plates or drums are provided with cavities in which a vacuum is created and individual IMC are attracted to the devices by a pressure drop. The work of such machines can be called “gentle” processing technology of the bottom areas. Their environmental impact is significantly lower than mechanical devices that carry out the raking of IMC. The parameters of the device for lifting the IMC collected on the bottom are calculated. With the use of Kevlar ropes of serial production up to 0.06 meters in diameter, with a cycle time of up to 2 hours and a lifting speed of up to 3 meters per second, a productivity of about 400,000 tons per year can be realized for IMC. The development of machines based on the calculated parameters and approbation of their designs will create a unique complex for the extraction of minerals at oceanic deposits.
Machine learning in cardiovascular medicine: are we there yet?
Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P
2018-01-19
Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Zhang, Lu; Tan, Jianjun; Han, Dan; Zhu, Hao
2017-11-01
Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potential biological active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era. Copyright © 2017 Elsevier Ltd. All rights reserved.
Myths and legends in learning classification rules
NASA Technical Reports Server (NTRS)
Buntine, Wray
1990-01-01
This paper is a discussion of machine learning theory on empirically learning classification rules. The paper proposes six myths in the machine learning community that address issues of bias, learning as search, computational learning theory, Occam's razor, 'universal' learning algorithms, and interactive learnings. Some of the problems raised are also addressed from a Bayesian perspective. The paper concludes by suggesting questions that machine learning researchers should be addressing both theoretically and experimentally.
Douglas, P K; Harris, Sam; Yuille, Alan; Cohen, Mark S
2011-05-15
Machine learning (ML) has become a popular tool for mining functional neuroimaging data, and there are now hopes of performing such analyses efficiently in real-time. Towards this goal, we compared accuracy of six different ML algorithms applied to neuroimaging data of persons engaged in a bivariate task, asserting their belief or disbelief of a variety of propositional statements. We performed unsupervised dimension reduction and automated feature extraction using independent component (IC) analysis and extracted IC time courses. Optimization of classification hyperparameters across each classifier occurred prior to assessment. Maximum accuracy was achieved at 92% for Random Forest, followed by 91% for AdaBoost, 89% for Naïve Bayes, 87% for a J48 decision tree, 86% for K*, and 84% for support vector machine. For real-time decoding applications, finding a parsimonious subset of diagnostic ICs might be useful. We used a forward search technique to sequentially add ranked ICs to the feature subspace. For the current data set, we determined that approximately six ICs represented a meaningful basis set for classification. We then projected these six IC spatial maps forward onto a later scanning session within subject. We then applied the optimized ML algorithms to these new data instances, and found that classification accuracy results were reproducible. Additionally, we compared our classification method to our previously published general linear model results on this same data set. The highest ranked IC spatial maps show similarity to brain regions associated with contrasts for belief > disbelief, and disbelief < belief. Copyright © 2010 Elsevier Inc. All rights reserved.
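The forward search described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the features are already ranked, and `score_fn` is a hypothetical callable (for example, a cross-validated accuracy estimate) that the caller supplies. The function name, the `tolerance` parameter and the stopping rule are illustrative choices, not the authors' procedure.

```python
def forward_search(ranked_features, score_fn, tolerance=0.0):
    """Add ranked features one at a time and keep the smallest good subset.

    ranked_features: feature identifiers, best-ranked first.
    score_fn: hypothetical callable mapping a list of features to a score.
    tolerance: how far below the best score a smaller subset may fall.
    Returns (subset, score, history), where history holds every step tried.
    """
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")
    subset, history = [], []
    for feature in ranked_features:
        subset.append(feature)
        history.append((list(subset), score_fn(subset)))
    best = max(score for _, score in history)
    # Return the first (smallest) subset whose score is within tolerance of the best.
    for candidate, score in history:
        if score >= best - tolerance:
            return candidate, score, history
```

A non-zero `tolerance` trades a small loss in score for a more parsimonious subset, which is the kind of trade-off the abstract has in mind for real-time decoding.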
Machine learning and radiology.
Wang, Shijun; Summers, Ronald M
2012-07-01
In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. Copyright © 2012. Published by Elsevier B.V.
Association Rule Mining from an Intelligent Tutor
ERIC Educational Resources Information Center
Dogan, Buket; Camurcu, A. Yilmaz
2008-01-01
Educational data mining is a very novel research area, offering fertile ground for many interesting data mining applications. Educational data mining can extract useful information from educational activities for better understanding and assessment of the student learning process. In this way, it is possible to explore how students learn topics in…
ERIC Educational Resources Information Center
Sagan, Carl
1975-01-01
The author of this article believes that human survival depends upon the ability to develop and work with machines of high artificial intelligence. He lists uses of such machines, including terrestrial mining, outer space exploration, and other tasks too dangerous, too expensive, or too boring for human beings. (MA)
Lapko, I V; Kir'iakov, V A; Antoshina, L I; Pavlovskaia, N A; Kondratovich, S V
2014-01-01
The authors studied the influence of vibration, noise, physical overexertion and microclimate on carbohydrate metabolism and insulin resistance in metal mining industry workers. The findings are that vibration disease had the greatest effect on insulin resistance test results and insulin level. The authors suggest biomarkers for early diagnosis of insulin resistance disorders in metal mining industry workers.
Zupanc, Christine M; Burgess-Limerick, Robin J; Wallis, Guy
2007-08-01
To investigate error and reaction time consequences of alternating compatible and incompatible steering arrangements during a simulated obstacle avoidance task. Underground coal mine shuttle cars provide an example of a vehicle in which operators are required to alternate between compatible and incompatible steering configurations. This experiment examines the performance of 48 novice participants in a virtual analogy of an underground coal mine shuttle car. Participants were randomly assigned to a compatible condition, an incompatible condition, an alternating condition in which compatibility alternated within and between hands, or an alternating condition in which compatibility alternated between hands. Participants made fewer steering direction errors and made correct steering responses more quickly in the compatible condition. Error rate decreased over time in the incompatible condition. A compatibility effect for both errors and reaction time was also found when the control-response relationship alternated; however, performance improvements over time were not consistent. Isolating compatibility to a hand resulted in reduced error rate and faster reaction time than when compatibility alternated within and between hands. The consequences of alternating control-response relationships are higher error rates and slower responses, at least in the early stages of learning. This research highlights the importance of ensuring consistently compatible human-machine directional control-response relationships.
Applications of Machine Learning and Rule Induction
1995-02-15
An important area of application for machine learning is in automating the acquisition of knowledge bases required for expert systems. In this paper...we review the major paradigms for machine learning, including neural networks, instance-based methods, genetic learning, rule induction, and analytic
75 FR 17529 - High-Voltage Continuous Mining Machine Standard for Underground Coal Mines
Federal Register 2010, 2011, 2012, 2013, 2014
2010-04-06
..., requires manufacturers to provide safeguards against corona on all 4,160-volt circuits in explosion-proof enclosures. Corona is a luminous discharge that occurs around electric conductors that are subject to high electric stresses. Corona can cause premature breakdown of insulating materials in explosion-proof...
30 CFR 18.48 - Circuit-interrupting devices.
Code of Federal Regulations, 2010 CFR
2010-07-01
..., AND APPROVAL OF MINING PRODUCTS ELECTRIC MOTOR-DRIVEN MINE EQUIPMENT AND ACCESSORIES Construction and.... Such a switch shall be designed to prevent electrical connection to the machine frame when the cable is... motor in the event the belt is stopped, or abnormally slowed down. Note: Short transfer-type conveyors...
TaggerOne: joint named entity recognition and normalization with semi-Markov Models
Leaman, Robert; Lu, Zhiyong
2016-01-01
Motivation: Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization. Methods: We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput. Results: We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance. 
Availability and Implementation: The TaggerOne source code and an online demonstration are available at: http://www.ncbi.nlm.nih.gov/bionlp/taggerone Contact: zhiyong.lu@nih.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27283952
TaggerOne: joint named entity recognition and normalization with semi-Markov Models.
Leaman, Robert; Lu, Zhiyong
2016-09-15
Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization. We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput. We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance. The TaggerOne source code and an online demonstration are available at: http://www.ncbi.nlm.nih.gov/bionlp/taggerone zhiyong.lu@nih.gov Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2016. 
This work is written by US Government employees and is in the public domain in the US.
ERIC Educational Resources Information Center
Winne, Philip H.; Baker, Ryan S. J. D.
2013-01-01
Our article introduces the "Journal of Educational Data Mining's" Special Issue on Educational Data Mining on Motivation, Metacognition, and Self-Regulated Learning. We outline general research challenges for data mining researchers who conduct investigations in these areas, the potential of EDM to advance research in this area, and…
Analysis of Availability of Longwall-Shearer Based On Its Working Cycle
NASA Astrophysics Data System (ADS)
Brodny, Jaroslaw; Tutak, Magdalena
2017-12-01
Effective use of equipment, and of machines in particular, is of great importance to mining enterprises. Because machines are costly to purchase and lease, these enterprises seek to make the best use of their technical potential. The nature of mining production, however, means that the process does not always run without disruption. Practical experience shows that determining an objective measure of machine utilization in a mining company is not simple. This paper presents a methodology for solving that problem. The analysis concerns the longwall shearer, the most important machine in the longwall mechanical complex. It was also assumed that the factor most significant for the shearer's effectiveness is its availability, i.e. its effective working time relative to standard time. This approach conforms to the OEE model. However, the specific nature of the mining industry means that availability determined in this way does not reflect the actual state of the shearer's operation. The availability was therefore related to the shearer's operating cycle. In the presented example, the longwall shearer works in a unidirectional mining cycle: in one direction it mines, moving at operating velocity, and in the other direction it does not mine and moves at manoeuvre velocity. The working cycle defined in this way became the basis for determining the shearer's availability. Using readings from the industrial automation system, the number of shearer cycles and the availability of each cycle were determined for every working shift. Determining availability in this way made it possible to analyse the availability losses accurately. These losses result from unplanned shutdowns of the shearer. The analysis, based on the shearer's operating cycle, yielded the standstill time for each phase of the cycle.
The presented methodology for determining longwall shearer availability provides information that can be used to optimize the mining process. Knowing the phases of the shearer's operation in which availability is reduced makes it possible to direct repair actions at exactly those areas. The developed methodology and the results obtained offer great opportunities for practical application and for improving the effectiveness of underground exploitation.
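The availability measure described in this abstract (effective working time relative to standard time, evaluated per working cycle) can be illustrated with a minimal sketch. The record layout, the field names, the sample durations, and the assumption that effective time equals standard time minus unplanned standstill are all hypothetical; none of them is taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    """One shearer working cycle; durations in minutes (hypothetical layout)."""
    standard_time: float    # planned duration of the cycle
    standstill_time: float  # unplanned stoppage recorded within the cycle


def availability(cycle: Cycle) -> float:
    """Availability = effective working time / standard time.

    Assumption: effective time is standard time minus unplanned standstill.
    """
    effective = cycle.standard_time - cycle.standstill_time
    return effective / cycle.standard_time


def shift_availability(cycles: list[Cycle]) -> float:
    """Availability over a whole shift, pooling all of its cycles."""
    total_standard = sum(c.standard_time for c in cycles)
    total_effective = sum(c.standard_time - c.standstill_time for c in cycles)
    return total_effective / total_standard


# Three made-up cycles from one shift.
shift = [Cycle(90.0, 9.0), Cycle(90.0, 0.0), Cycle(90.0, 27.0)]
per_cycle = [availability(c) for c in shift]
overall = shift_availability(shift)
```

Keeping the per-cycle values alongside the shift total shows which cycles account for the loss, which is the kind of breakdown the abstract describes.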
Experimental Realization of a Quantum Support Vector Machine
NASA Astrophysics Data System (ADS)
Li, Zhaokai; Liu, Xiaomei; Xu, Nanyang; Du, Jiangfeng
2015-04-01
The fundamental principle of artificial intelligence is the ability of machines to learn from previous experience and do future work accordingly. In the age of big data, classical learning machines often require huge computational resources in many practical cases. Quantum machine learning algorithms, on the other hand, could be exponentially faster than their classical counterparts by utilizing quantum parallelism. Here, we demonstrate a quantum machine learning algorithm to implement handwriting recognition on a four-qubit NMR test bench. The quantum machine learns standard character fonts and then recognizes handwritten characters from a set with two candidates. Because of the widespread importance of artificial intelligence and its tremendous consumption of computational resources, quantum speedup would be extremely attractive against the challenges of big data.
Workshop on Fielded Applications of Machine Learning
1994-05-11
This report summarizes the talks presented at the Workshop on Fielded Applications of Machine Learning, and draws some initial conclusions about the state of machine learning and its potential for solving real-world problems.
Revisit of Machine Learning Supported Biological and Biomedical Studies.
Yu, Xiang-Tian; Wang, Lu; Zeng, Tao
2018-01-01
Generally, machine learning includes many in silico methods that transform the principles underlying natural phenomena into information humans can understand, with the aims of saving human labor, assisting human judgment, and creating human knowledge. It should have wide application potential in biological and biomedical studies, especially in the era of big biological data. To trace the application of machine learning alongside developments in biology, this review presents a wide range of cases that introduce the selection of machine learning methods in the different practical scenarios arising across the whole biological and biomedical study cycle, and further discusses machine learning strategies for analyzing omics data in some cutting-edge biological studies. Finally, notes on the new challenges that small-sample, high-dimensional data pose for machine learning are summarized around three key points: sample imbalance, white box, and causality.
Manual of good practices for sanitation in coal mining operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
The purpose of the manual was to act as a guideline, setting reasonable recommendations relative to mine sanitation which will enable mines to install adequate facilities and make appropriate alterations conserving and improving the health and welfare of the mine worker. A systematic evaluation was undertaken of the sanitation facilities and maintenance at coal mines. Consideration was given to central facilities including building, floors, walls, partitions, ceilings, lockers, baskets and benches, showers, toilets, lavatories, lighting, ventilation and temperature control, and maintenance. Also discussed were food vending machines, water source, water quality, water treatment, water delivery systems for underground and surface mines, sanitary waste disposal, workplace toilets in underground and surface mines, refuse control and handling for underground and surface mines, and pest control.
Machine Learning. Part 1. A Historical and Methodological Analysis.
1983-05-31
Machine learning has always been an integral part of artificial intelligence, and its methodology has evolved in concert with the major concerns of the field. In response to the difficulties of encoding ever-increasing volumes of knowledge in modern AI systems, many researchers have recently turned their attention to machine learning as a means to overcome the knowledge acquisition bottleneck. Part 1 of this paper presents a taxonomic analysis of machine learning organized primarily by learning strategies and secondarily by
Toward Harnessing User Feedback For Machine Learning
2006-10-02
machine learning systems. If this resource (the users themselves) could somehow work hand-in-hand with machine learning systems, the accuracy of learning systems could be improved and the users' understanding and trust of the system could improve as well. We conducted a think-aloud study to see how willing users were to provide feedback and to understand what kinds of feedback users could give. Users were shown explanations of machine learning predictions and asked to provide feedback to improve the predictions. We found that users
Intelligible machine learning with malibu.
Langlois, Robert E; Lu, Hui
2008-01-01
malibu is an open-source machine learning workbench developed in C/C++ for high-performance real-world applications, namely bioinformatics and medical informatics. It leverages third-party machine learning implementations for more robust bug-free software. This workbench handles several well-studied supervised machine learning problems including classification, regression, importance-weighted classification and multiple-instance learning. The malibu interface was designed to create reproducible experiments ideally run in a remote and/or command line environment. The software can be found at: http://proteomics.bioengr.uic.edu/malibu/index.html.
Language Acquisition and Machine Learning.
1986-02-01
machine learning and examine its implications for computational models of language acquisition. As a framework for understanding this research, the authors propose four component tasks involved in learning from experience: aggregation, clustering, characterization, and storage. They then consider four common problems studied by machine learning researchers (learning from examples, heuristics learning, conceptual clustering, and learning macro-operators), describing each in terms of this framework. After this, they turn to the problem of grammar
Behavioral Profiling of Scada Network Traffic Using Machine Learning Algorithms
2014-03-27
Thesis by Jessica R. Werling, B.S.C.S., Captain, USAF (AFIT-ENG-14-M-81).
Statistical Machine Learning for Structured and High Dimensional Data
2014-09-17
AFRL-OSR-VA-TR-2014-0234. Statistical Machine Learning for Structured and High Dimensional Data. Larry Wasserman, Carnegie Mellon University. Final report, Dec 2009 - Aug 2014; report date 14-06-2014. ...area of resource-constrained statistical estimation. Subject terms: machine learning, high-dimensional statistics. John Lafferty, 773-702-3813. Research under
Machine learning in genetics and genomics
Libbrecht, Maxwell W.; Noble, William Stafford
2016-01-01
The field of machine learning promises to enable computers to assist humans in making sense of large, complex data sets. In this review, we outline some of the main applications of machine learning to genetic and genomic data. In the process, we identify some recurrent challenges associated with this type of analysis and provide general guidelines to assist in the practical application of machine learning to real genetic and genomic data. PMID:25948244
Translations on North Korea No. 622
1978-10-13
Pyongyang Power Station 5 July Electric Factory Hamhung Machine Tool Factory Kosan Plastic Pipe Factory Sog’wangea Plastic Pipe Factory 8...August Factory Double Chollima Hamhung Disabled Veterans’ Plastic Goods Factory Mangyongdae Machine Tool Factory Kangso Coal Mine Tongdaewon Garment...21 Jul 78 p 4) innovating in machine tool production (NC 21 Jul 78 p 2) in 40 days of the 蔴 days of combat" raised coal production 10 percent
Hepworth, Philip J.; Nefedov, Alexey V.; Muchnik, Ilya B.; Morgan, Kenton L.
2012-01-01
Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide. PMID:22319115
Addressing uncertainty in atomistic machine learning.
Peterson, Andrew A; Christensen, Rune; Khorshidi, Alireza
2017-05-10
Machine-learning regression has been demonstrated to precisely emulate the potential energy and forces that are output from more expensive electronic-structure calculations. However, to predict new regions of the potential energy surface, an assessment must be made of the credibility of the predictions. In this perspective, we address the types of errors that might arise in atomistic machine learning, the unique aspects of atomistic simulations that make machine-learning challenging, and highlight how uncertainty analysis can be used to assess the validity of machine-learning predictions. We suggest this will allow researchers to more fully use machine learning for the routine acceleration of large, high-accuracy, or extended-time simulations. In our demonstrations, we use a bootstrap ensemble of neural network-based calculators, and show that the width of the ensemble can provide an estimate of the uncertainty when the width is comparable to that in the training data. Intriguingly, we also show that the uncertainty can be localized to specific atoms in the simulation, which may offer hints for the generation of training data to strategically improve the machine-learned representation.
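The idea of using the width of a bootstrap ensemble as an uncertainty estimate can be sketched in a few lines. This is only an illustration under stated assumptions: straight-line least-squares fits stand in for the neural-network calculators used by the authors, and the toy data, the noise level, and the function names are all hypothetical.

```python
import random
import statistics

random.seed(0)

# Toy training data on [-1, 1]: a hypothetical 1-D "energy" with small noise.
xs = [-1.0 + 2.0 * i / 39 for i in range(40)]
ys = [0.5 * x + random.gauss(0.0, 0.05) for x in xs]


def fit_line(x, y):
    """Ordinary least-squares fit y ~ a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_ensemble(x, y, n_models=200):
    """Fit one model per resample (with replacement) of the training set."""
    n = len(x)
    models = []
    for _ in range(n_models):
        idx = [random.randrange(n) for _ in range(n)]
        models.append(fit_line([x[i] for i in idx], [y[i] for i in idx]))
    return models


def predict(models, x):
    """Ensemble mean and width (standard deviation of member predictions)."""
    preds = [a + b * x for a, b in models]
    return statistics.fmean(preds), statistics.pstdev(preds)


models = bootstrap_ensemble(xs, ys)
mean_in, width_in = predict(models, 0.0)    # inside the sampled region
mean_out, width_out = predict(models, 5.0)  # far outside the sampled region
```

The ensemble members agree closely where training data exist and diverge when asked to extrapolate, so the width grows away from the sampled region; that growth is the signal used as an uncertainty estimate.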
On the Conditioning of Machine-Learning-Assisted Turbulence Modeling
NASA Astrophysics Data System (ADS)
Wu, Jinlong; Sun, Rui; Wang, Qiqi; Xiao, Heng
2017-11-01
Recently, several researchers have demonstrated that machine learning techniques can be used to improve the RANS-modeled Reynolds stress by training on an available database of high-fidelity simulations. However, obtaining an improved mean velocity field remains an unsolved challenge, restricting the predictive capability of current machine-learning-assisted turbulence modeling approaches. In this work we define a condition number to evaluate the model conditioning of data-driven turbulence modeling approaches, and propose a stability-oriented machine learning framework to model Reynolds stress. Two canonical flows, the flow in a square duct and the flow over periodic hills, are investigated to demonstrate the predictive capability of the proposed framework. The satisfactory prediction of the mean velocity field for both flows demonstrates the predictive capability of the proposed framework for machine-learning-assisted turbulence modeling. By showing that it can improve the prediction of the mean flow field, the proposed stability-oriented machine learning framework bridges the gap between existing machine-learning-assisted turbulence modeling approaches and the predictive capability demanded of turbulence models in real applications.
Zeng, Xueqiang; Luo, Gang
2017-12-01
Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
Bypassing the Kohn-Sham equations with machine learning.
Brockherde, Felix; Vogt, Leslie; Li, Li; Tuckerman, Mark E; Burke, Kieron; Müller, Klaus-Robert
2017-10-11
Last year, at least 30,000 scientific papers used the Kohn-Sham scheme of density functional theory to solve electronic structure problems in a wide variety of scientific fields. Machine learning holds the promise of learning the energy functional via examples, bypassing the need to solve the Kohn-Sham equations. This should yield substantial savings in computer time, allowing larger systems and/or longer time-scales to be tackled, but attempts to machine-learn this functional have been limited by the need to find its derivative. The present work overcomes this difficulty by directly learning the density-potential and energy-density maps for test systems and various molecules. We perform the first molecular dynamics simulation with a machine-learned density functional on malonaldehyde and are able to capture the intramolecular proton transfer process. Learning density models now allows the construction of accurate density functionals for realistic molecular systems. Machine learning allows electronic structure calculations to access larger system sizes and, in dynamical simulations, longer time scales. Here, the authors perform such a simulation using a machine-learned density functional that avoids direct solution of the Kohn-Sham equations.
30 CFR 27.24 - Power-shutoff component.
Code of Federal Regulations, 2010 CFR
2010-07-01
... APPROVAL OF MINING PRODUCTS METHANE-MONITORING SYSTEMS Construction and Design Requirements § 27.24 Power... the machine or equipment when actuated by the methane detector at a methane concentration of 2.0... actuated by the methane detector, cause a control circuit to shut down the machine or equipment on which it...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brezovec, D.
1983-11-01
A new coal mining machine that was going to pull some 40 million tons of coal from the Appalachian coalfields by 1986 has had more than its share of start-up problems. The machine, known as the Thin Seam Miner (TSM), is a $2.7-million auger-type mining machine that is designed to bore 220 ft into new or abandoned highwalls (CA 5/82 p. 106). Gamma-ray sensors located near the continuous drum miner-type cutter head monitor for rock and other sensors monitor for methane. The machines are designed to produce about 425 tons per shift from a 36-in.-thick coal seam. The machines were introduced officially to the American coal industry at a luncheon Aug. 19, 1981, in a ballroom at the Lexington, Ky., Hyatt Regency Hotel. At the luncheon, some 200 coal industry executives and others sipped champagne and listened to glowing reports of how 24 of the machines would produce 2.2 million tons of coal by the end of 1981 and 64 of the machines would produce 6.6 million tons by the end of 1982. The machines would be built in Holland by RijnSchelde-Verolme (RSV), a major Dutch shipbuilder, and managed in the United States by Advanced Coal Management (ACM), a company formed for the purpose by James D. Stacy, a colorful, cigar-smoking stock car owner whose experience in the coal business dated from only the mid-1970s.
Neuromorphic Optical Signal Processing and Image Understanding for Automated Target Recognition
1989-12-01
Contents (fragment): Stochastic Learning Machine; Neuromorphic Target Identification; Cognitive Networks. 3. Conclusions. 4. Publications. 5. References. 6. Appendices: I. Optoelectronic Neural Networks and...Learning Machines. II. Stochastic Optical Learning Machine. III. Learning Network for Extrapolation and Radar Target Identification.
An iterative learning control method with application for CNC machine tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, D.I.; Kim, S.
1996-01-01
A proportional, integral, and derivative (PID) type iterative learning controller is proposed for precise tracking control of industrial robots and computer numerical control (CNC) machine tools performing repetitive tasks. The convergence of the output error by the proposed learning controller is guaranteed under a certain condition even when the system parameters are not known exactly and unknown external disturbances exist. As the proposed learning controller is repeatedly applied to the industrial robot or the CNC machine tool with the path-dependent repetitive task, the distance difference between the desired path and the actual tracked or machined path, which is one of the most significant factors in the evaluation of control performance, is progressively reduced. The experimental results demonstrate that the proposed learning controller can improve machining accuracy when the CNC machine tool performs repetitive machining tasks.
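The trial-to-trial error reduction described in this abstract can be illustrated with a minimal sketch. It is a simplified proportional-only (P-type) iterative learning update rather than the PID-type controller the authors propose, and the first-order plant, its coefficients, the learning gain, and the ramp reference are all hypothetical.

```python
def run_trial(u, a=0.3, b=0.5):
    """Simulate y[t+1] = a*y[t] + b*u[t] from y[0] = 0; returns y[1..T]."""
    y, out = 0.0, []
    for ut in u:
        y = a * y + b * ut
        out.append(y)
    return out


def ilc(reference, gain=1.0, trials=20):
    """Repeat the same task, refining the input from each trial's error.

    Returns the peak tracking error recorded on every trial.
    """
    u = [0.0] * len(reference)
    history = []
    for _ in range(trials):
        y = run_trial(u)
        err = [r - yt for r, yt in zip(reference, y)]
        history.append(max(abs(e) for e in err))
        # P-type update: adjust each input sample using the error that
        # appeared one step later, which is the output it directly drives.
        u = [ut + gain * e for ut, e in zip(u, err)]
    return history


reference = [t / 10 for t in range(1, 11)]  # hypothetical ramp to follow
errors = ilc(reference)
```

For this toy plant the update contracts the error because |1 - gain*b| = 0.5 is below one; a real machine tool would need the gain chosen against its own dynamics, which is where the convergence condition mentioned in the abstract comes in.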
Weber, Gerhard-Wilhelm; Ozöğür-Akyüz, Süreyya; Kropat, Erik
2009-06-01
An emerging research area in computational biology and biotechnology is devoted to mathematical modeling and prediction of gene-expression patterns; a deep understanding of its foundations now calls for mathematics. This article surveys data mining and machine learning methods for an analysis of complex systems in computational biology. It mathematically deepens recent advances in modeling and prediction by rigorously introducing the environment and aspects of errors and uncertainty into the genetic context within the framework of matrix and interval arithmetics. Given the data from DNA microarray experiments and environmental measurements, we extract nonlinear ordinary differential equations which contain parameters that are to be determined. This is done by a generalized Chebychev approximation and generalized semi-infinite optimization. Then, time-discretized dynamical systems are studied. By a combinatorial algorithm which constructs and follows polyhedra sequences, the region of parametric stability is detected. In addition, we analyze the topological landscape of gene-environment networks in terms of structural stability. As a second strategy, we will review recent model selection and kernel learning methods for binary classification which can be used to classify microarray data for cancerous cells or for discrimination of other kinds of diseases. This review is practically motivated and theoretically elaborated; it is devoted to a contribution to better health care, progress in medicine, better education, and healthier living conditions.
Learning dominance relations in combinatorial search problems
NASA Technical Reports Server (NTRS)
Yu, Chee-Fen; Wah, Benjamin W.
1988-01-01
Dominance relations commonly are used to prune unnecessary nodes in search graphs, but they are problem-dependent and cannot be derived by a general procedure. The authors identify machine learning of dominance relations and the applicable learning mechanisms. A study of learning dominance relations using learning by experimentation is described. This system has been able to learn dominance relations for the 0/1-knapsack problem, an inventory problem, the reliability-by-replication problem, the two-machine flow shop problem, a number of single-machine scheduling problems, and a two-machine scheduling problem. It is considered that the same methodology can be extended to learn dominance relations in general.
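To show what a dominance relation does in one of the problems listed, here is a minimal sketch for the 0/1-knapsack problem. It hard-codes a well-known relation (a partial solution that is no heavier and at least as valuable dominates another) instead of learning it as the authors' system does; the function names and the sample items are hypothetical, and non-negative values are assumed.

```python
def prune_dominated(states):
    """Keep only (weight, value) states not dominated by another state.

    A state dominates another if it is no heavier and at least as valuable.
    Assumes non-negative values.
    """
    kept, best_value = [], -1
    # Scan by increasing weight; on ties, the most valuable state comes first.
    for w, v in sorted(states, key=lambda s: (s[0], -s[1])):
        if v > best_value:
            kept.append((w, v))
            best_value = v
    return kept


def knapsack(items, capacity):
    """Best total value, extending partial solutions one item at a time."""
    states = [(0, 0)]
    for weight, value in items:
        extended = [
            (w + weight, v + value)
            for w, v in states
            if w + weight <= capacity
        ]
        states = prune_dominated(states + extended)
    return max(v for _, v in states)


items = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]  # (weight, value) pairs
best = knapsack(items, 15)
```

Pruning after every item keeps only the non-dominated partial solutions, so nodes that cannot lead to a better answer are never expanded.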